



# **Speeding-up Simulation-based Fault Injection** for Highly Complex HDL Models



PhD Student: Ilya Tuzov. **Collaborators and Supervisors:** Juan-Carlos Ruiz, David de Andrés and Pedro Gil. **Doctoral Program in Computer Science.** ITACA, {tuil, jcruizg, ddandres, pgil}@disca.upv.es

Abstract – Increasing integration scales and clock frequencies also increase the sensitivity of integrated circuits to different kinds of faults. Early design verification in the presence of faults and dependability assessment are commonly accomplished by means of simulation-based fault injection (SBFI) techniques, which can be applied at different levels of HDL description. The closer the models are to the implementation, the more representative the simulation results are. However, injecting faults into highly complex and detailed models is a very resource-intensive process that usually requires prohibitive simulation times. This work proposes an approach to speed up this process, making the dependability assessment of very detailed implementation-level HDL models feasible.



- The sensitivity of integrated circuits to faults should be estimated to ensure the device's safe behavior.

- Fault effects can be representatively simulated by means of implementation-level HDL models, which accurately reflect the functional/timing behavior and the structure of the resulting circuit.
- Simulation of implementation-level models is up to 4 orders of magnitude slower than that of source RTL (behavioral) models, resulting in prohibitive SBFI experimentation times in practice.

#### **Objectives:**

- Analyze the factors affecting simulation complexity at the implementation level.
- Optimize fault simulation procedures for speed-up, to enable sensitivity analysis and dependability assessment of complex HDL designs.

## 3. Proposed optimizations and expected speed-up

1. Mixed-level HDL assembly – reduces the computational complexity of the HDL model



Figure 1. Simulation time increases drastically with model accuracy along the semi-custom design flow (stages shown: behavioral, cycle-accurate RTL source code; post-synthesis gate-level circuit with generic libraries; technology-accurate model)



Figure 2. The target unit at the implementation level interacts with a high-level model of the rest of the design

2. Checkpoints – save/restore a pre-computed simulation state to bypass model initialization and shorten the workload execution in each of the N experiments

Figure 3. Checkpoints reduce the simulation time to just fault injection and observation of its effects

## 4. Experimental speed-up versus estimation

**Target model** – LEON3 soft-core processor synthesized for a Virtex-6 FPGA.

**Workload** – integer matrix multiplication (*MiBench* automotive benchmark).

**Faultload** – single transient (bit-flip) and permanent (stuck-at-1/0) faults.

#### **Computing platforms:**

a) Cluster 'Rigel' (UPV): 72 nodes, Xeon E5-2450, CentOS 6, Sun Grid Engine;

b) Multicore PC: Intel Core i5-4670, CentOS 6.7.

**Simulator:** Mentor Graphics ModelSim 10.4 in both environments.
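The two fault models of the faultload can be illustrated on a plain integer standing in for a register value. This is only a sketch: the helper names (`bit_flip`, `stuck_at`), the 8-bit width and the sample values are illustrative assumptions and not part of the SBFI tool, and the permanent faults are taken here to be the usual stuck-at-0/stuck-at-1 pair.

```python
def bit_flip(value: int, bit: int) -> int:
    """Transient fault: invert one bit once; later writes overwrite it."""
    return value ^ (1 << bit)


def stuck_at(value: int, bit: int, level: int) -> int:
    """Permanent fault: force one bit to `level` (0 or 1) on every read."""
    mask = 1 << bit
    return value | mask if level else value & ~mask


# Hypothetical 8-bit register contents over three clock cycles.
trace = [0b0000_0101, 0b0000_0100, 0b0000_0111]

# A bit-flip hits bit 1 in the first cycle only.
transient = [bit_flip(trace[0], 1)] + trace[1:]

# A stuck-at-0 on bit 2 corrupts every cycle in which that bit should be 1.
permanent = [stuck_at(v, 2, 0) for v in trace]

print([bin(v) for v in transient])
print([bin(v) for v in permanent])
```

The transient fault disturbs a single sample, whereas the permanent fault has to be re-applied for as long as the simulation runs, which is why the two are modeled as separate operations.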

Table 1. Experimentation time measured with respect to enabled optimizations

| Config. | ICP | MLA | WCP | Multicore | Grid | Single experiment $T_{avg}$ (seconds) | Standard deviation σ | Whole campaign $T_{total}$ (hours) |
|---------|-----|-----|-----|-----------|------|---------------------------------------|----------------------|------------------------------------|
| C1      | -   | -   | -   | -         | -    | 7327                                  | 11.0                 | 15896#                             |
| C2      | +   | -   | -   | -         | -    | 113                                   | 14.9                 | 246.8\*                            |
| C3      | +   | +   | -   | -         | -    | 40                                    | 2.2                  | 87.7\*                             |
| C4      | +   | +   | +   | -         | -    | 27                                    | 3.8                  | 59.5\*                             |
| C5      | +   | +   | +   | +         | -    | 30                                    | 5.8                  | 22.7                               |
| C6      | +   | -   | +   | +         | -    | 79                                    | 15.7                 | 59.1                               |
| C7      | +   | +   | +   | -         | +    | 43                                    | 10.9                 | 3.0                                |
| C8      | +   | -   | +   | -         | +    | 114                                   | 28.0                 | 5.7                                |

\# Estimation based on 10 injection experiments

\* Estimation based on 500 injection experiments

ICP/WCP – initialization/workload checkpoints, MLA – mixed-level assembly

Figure 4. Grid- and multicore-based SBFI flow
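The flow in Figure 4 relies on injection experiments being independent of each other, so they can be dispatched to cores or grid nodes in any order. A minimal sketch of that idea follows. A toy state dictionary stands in for the simulator, and every name in it (`step`, `golden_run`, `run_experiment`, the `acc` register, the cycle counts) is a hypothetical placeholder rather than the ModelSim or Sun Grid Engine interface. Each experiment restores a pre-computed checkpoint instead of re-running the fault-free prefix of the workload.

```python
import copy
from concurrent.futures import ThreadPoolExecutor

WORKLOAD_CYCLES = 100  # hypothetical workload length


def step(state):
    """Toy stand-in for one simulated clock cycle."""
    state["acc"] = (state["acc"] * 3 + 1) & 0xFFFF
    state["cycle"] += 1


def golden_run(checkpoint_cycle):
    """Fault-free run; returns the observed output and a saved checkpoint."""
    state = {"acc": 0, "cycle": 0}
    checkpoint = None
    while state["cycle"] < WORKLOAD_CYCLES:
        if state["cycle"] == checkpoint_cycle:
            checkpoint = copy.deepcopy(state)
        step(state)
    return state["acc"] & 0xFF, checkpoint  # only the low byte is observable


def run_experiment(checkpoint, bit, golden):
    """Restore the checkpoint, flip one bit, finish the workload."""
    state = copy.deepcopy(checkpoint)  # restore instead of re-simulating
    state["acc"] ^= 1 << bit           # transient bit-flip
    while state["cycle"] < WORKLOAD_CYCLES:
        step(state)
    return "failure" if (state["acc"] & 0xFF) != golden else "masked"


golden, checkpoint = golden_run(checkpoint_cycle=90)

# Independent experiments can be handed to any number of workers.
with ThreadPoolExecutor(max_workers=4) as pool:
    outcomes = list(pool.map(
        lambda bit: run_experiment(checkpoint, bit, golden), range(16)))

print(outcomes.count("failure"), "failures out of", len(outcomes))
```

Threads are used only to keep the sketch self-contained; a real campaign would launch separate simulator processes or grid jobs, each loading the same checkpoint file.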



Figure 5. Expected and experimentally obtained speed-up factors (logarithmic scale)

- **Global speed-up factor: 5047** (S<sub>MLA</sub> × S<sub>ICP</sub> × S<sub>WCP</sub> × S<sub>MP</sub>)
- **Storage space reduction factor: 1387**, plus a tracing speed-up of 1.36.
- No major discrepancies in SBFI results between implementation-level and mixed-level models (less than 2% difference for all fault models and failure modes).
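As a rough cross-check, the per-optimization factors can be recomputed from the rounded entries of Table 1. The sketch below assumes that each factor is the ratio of the measurement with the optimization disabled to the one with it enabled; this is a reading of the table, not a formula quoted from the poster, and the variable names are illustrative. Because the table entries are rounded and some are estimates, the ratios need not reproduce the reported global factor of 5047 exactly.

```python
# Values copied from Table 1 (rounded as published).
t_avg_s = {"C1": 7327, "C2": 113, "C3": 40, "C4": 27}          # seconds
t_total_h = {"C1": 15896, "C4": 59.5, "C5": 22.7, "C7": 3.0}   # hours

# Assumed definition: time without the optimization / time with it.
s_icp = t_avg_s["C1"] / t_avg_s["C2"]            # initialization checkpoint
s_mla = t_avg_s["C2"] / t_avg_s["C3"]            # mixed-level assembly
s_wcp = t_avg_s["C3"] / t_avg_s["C4"]            # workload checkpoint
s_multicore = t_total_h["C4"] / t_total_h["C5"]  # multicore PC
s_grid = t_total_h["C4"] / t_total_h["C7"]       # grid (cluster)

for name, s in [("ICP", s_icp), ("MLA", s_mla), ("WCP", s_wcp),
                ("multicore", s_multicore), ("grid", s_grid)]:
    print(f"S_{name} = {s:.2f}")

# End-to-end ratio for the grid configuration (C1 versus C7).
print(f"C1/C7 = {t_total_h['C1'] / t_total_h['C7']:.0f}")
```

The single-experiment times are used for the checkpoint and mixed-level factors, and the whole-campaign times for the parallel ones, since parallel execution shortens the campaign rather than an individual experiment.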

**Conclusions:** The proposed optimizations greatly accelerated the execution of fault simulation experiments and reduced the required storage space without any loss in the accuracy of results, supporting the stated hypotheses and the expressions for the speed-up factors. The latter can be used to estimate the efficiency of each optimization depending on the properties of a particular HDL model, workload and computing resources. This enables extensive SBFI campaigns for complex implementation-level models, which are required (among other applications) for the ongoing study of design space exploration for hardware design.

Acknowledgment: This work has been partially funded by the Ministerio de Economía, Industria y Competitividad de España under grant agreement no TIN2016-81075-R, and the "Programa de Ayudas de Investigación y Desarrollo" (PAID) de la Universitat Politècnica de València.