Safety Mechanisms in ISO 26262
About 2249 wordsAbout 8 min
2026-03-25
What Is a Safety Mechanism?
A safety mechanism is a technical implementation — in hardware, software, or a combination — that detects faults, controls failures, or prevents hazardous events. Safety mechanisms are the concrete means by which ASIL requirements are met.
Fault ───[Safety Mechanism]───► Detection ───► Reaction ───► Safe State
(detect fault before (transition to
safety goal is violated) safe condition)ISO 26262 assigns a Diagnostic Coverage (DC) value to each safety mechanism:
- DC ≥ 99%: High coverage
- 90% ≤ DC < 99%: Medium coverage
- 60% ≤ DC < 90%: Low coverage
- DC < 60%: No coverage
Watchdog Timer
Purpose
The watchdog timer detects SW execution failures: deadlocks, infinite loops, task overruns, and loss of CPU execution.
Simple Watchdog vs. Window Watchdog
| Feature | Simple / Timeout Watchdog | Window Watchdog |
|---|---|---|
| Principle | Triggers reset if NOT triggered within timeout period | Triggers reset if triggered TOO EARLY or TOO LATE |
| Fault detected | SW stall / deadlock | SW stall AND runaway (triggers too fast) |
| ASIL suitability | ASIL B— | ASIL C/D (required for ASIL D in many implementations) |
| Implementation | MCU internal WDT or external WDT IC | External WDT IC (e.g., TI TPS3431) or MCU WWDT peripheral |
Window Watchdog Timing Diagram:
| Forbidden Zone | Open Window |
|________________|_____________|__________________► time
0 T_open T_timeout
If SW triggers WDT before T_open → Reset (SW executed too fast: runaway)
If SW triggers WDT between T_open → OK (correct execution rate)
and T_timeout
If SW triggers WDT after T_timeout → Reset (SW stalled: deadlock or overrun)Complex WDT for ASIL D: Question-Answer Watchdog
A simple trigger (toggling a pin) can be faked by stuck-at-high/low faults on the GPIO output. A question-answer watchdog prevents this:
Q/A Watchdog Protocol:
1. Watchdog IC sends a pseudo-random challenge value to MCU (via SPI)
2. MCU computes the expected response using a known algorithm:
response = f(challenge) = CRC8(challenge ^ SECRET_KEY)
3. MCU writes response to watchdog IC
4. Watchdog IC verifies the response
5. If response is wrong or missing within T_window → Reset
WHAT THIS DETECTS additionally:
- SW stuck in wrong code path (wrong algorithm executed → wrong response)
- Stuck-at fault on SPI bus (constant response value detected)
- CPU bit-flip causing computation errorWatchdog Integration Patterns in AUTOSAR
WdgIf / WdgM (Watchdog Interface / Manager) in AUTOSAR:
┌────────────────────────────────────────────────────────────┐
│ WdgManager (WdgM) │
│ Supervision of: │
│ ├── Alive Supervision: task called within time period │
│ ├── Deadline Supervision: task completes within budget │
│ └── Logical Supervision: program flow check points │
│ │ │
└───────────────────────────┼───────────────────────────────┘
│
WdgIf (abstraction)
│
┌─────────┴─────────┐
Internal WDT External WDT IC
(MCU peripheral) (SBC: TLE9279, SBC FS8x)End-to-End (E2E) Communication Protection
Why Communication Needs Protection
Automotive networks (CAN, Ethernet, LIN, FlexRay) are subject to:
- Electromagnetic interference → bit flips
- Bus arbitration glitches → message loss or repetition
- ECU software errors → wrong data sent on the bus
- Systematic faults in a gateway ECU → wrong routing
The AUTOSAR E2E Library provides a standardized mechanism to detect these faults at the receiving end.
E2E Fault Detection Capability
| Fault Type | E2E Detection Mechanism |
|---|---|
| Bit errors (1–n bits flipped in transit) | CRC covers full data payload |
| Wrong sender (masquerade attack or routing error) | Data ID embedded in CRC calculation |
| Out-of-order messages | Counter (sequence number monotonically incremented) |
| Repeated messages (replay) | Counter mismatch detected |
| Lost messages (dropped) | Counter skips detected |
| Delayed messages (late arrival) | Timeout monitoring in WdgM/E2E status |
| Incorrect message length | CRC calculation over fixed-length payload catches truncation |
AUTOSAR E2E Profiles
| Profile | CRC Type | Counter Width | Data ID | Typical Use |
|---|---|---|---|---|
| P01 | CRC-8 (SAE J1850) | 4-bit | 11-bit | CAN, LIN |
| P02 | CRC-8H2F | 4-bit | 15-bit | CAN, SPI |
| P04 | CRC-32P4 | 8-bit | 32-bit | Ethernet, high-integrity |
| P05 | CRC-8 | 4-bit | 8-bit | Simple CAN FD |
| P06 | CRC-16 | 4-bit | 16-bit | Medium-complexity |
| P22 | CRC-8H2F | 4-bit | 16-bit | Modern CAN FD (latest) |
Diagnostic Coverage of E2E profiles (typical claims in FMEDA)
E2E Profile P02 (CRC-8H2F, 4-bit counter, 15-bit Data ID):
Residual fault probability: < 3.9 × 10⁻³
(1 in 256 corrupt messages undetected by CRC alone)
With counter (sequence number):
Combined residual: < 1/256 × probability of counter miss = very low
Claimed Diagnostic Coverage in FMEDA: up to 99% for transmission errors
(actual DC depends on fault model — see AUTOSAR E2E specification)E2E Implementation Example
/* Sender side (producer ECU) */
void SendTorqueSetpoint(float torque_Nm)
{
TorqueMsg_t msg = {0};
msg.torque_q16 = (int16_t)(torque_Nm * 256.0f); /* Q8.8 fixed-point */
/* AUTOSAR E2E Profile 2 protection */
E2E_P02Protect(&e2e_config_torque, &e2e_state_tx,
(uint8_t *)&msg, sizeof(msg));
Can_Write(CAN_CH_EPS, MSG_ID_TORQUE_SETPOINT, &msg);
}
/* Receiver side (EPS ECU) */
void ReceiveTorqueSetpoint(const uint8_t *raw_data, uint16_t len)
{
TorqueMsg_t msg;
E2E_P02CheckStatusType status;
memcpy(&msg, raw_data, len);
E2E_P02Check(&e2e_config_torque, &e2e_state_rx,
(uint8_t *)&msg, sizeof(msg), &status);
if (status == E2E_P02STATUS_OK || status == E2E_P02STATUS_SYNC) {
torque_setpoint_Nm = (float)msg.torque_q16 / 256.0f;
} else {
/* E2E fault: wrong CRC, counter skip, etc. */
DiagMgr_SetFault(FAULT_E2E_TORQUE_SETPOINT);
SafetyManager_RequestSafeState(SS_02);
}
}Plausibility Monitoring
Plausibility monitoring detects when a sensor or system output is implausible — outside physical, parametric, or cross-channel limits — even when no obvious signal fault is present.
Types of Plausibility Monitors
1. Range Check (Single Signal)
Range check: sensor value must be within [MIN, MAX] physical range
if (torque_Nm < TORQUE_SENSOR_MIN || torque_Nm > TORQUE_SENSOR_MAX) {
/* Sensor is broken (OOV: out of valid range) or shorted */
DiagMgr_SetFault(FAULT_TORQUE_SENSOR_RANGE);
}
Diagnostic coverage claim: ~60%
(detects stuck-high, stuck-low, but not within-range drift)2. Rate-of-Change Check (First Derivative)
Physical systems are constrained by inertia and bandwidth.
Signal derivative exceeding physical limits indicates a sensor fault.
dTorque_dt = (torque_now - torque_prev) / TASK_PERIOD_s;
if (fabsf(dTorque_dt) > TORQUE_RATE_MAX_Nm_s) {
/* Signal jump exceeds physics — likely sensor glitch or fault */
DiagMgr_SetFault(FAULT_TORQUE_SENSOR_RATE);
}
Diagnostic coverage contribution: ~20–30%
(combined with range check: ~80% total)3. Cross-Channel Comparison
Two independent sensors (or channels) measuring the same physical quantity.
If they disagree beyond a threshold → at least one has failed.
error = fabsf(torque_channel_A - torque_channel_B);
if (error > TORQUE_CROSS_CHECK_DELTA_MAX) {
/* Disagreement — either channel A or B is faulty */
DiagMgr_SetFault(FAULT_TORQUE_CROSS_CHECK);
/* With 3 sensors (triplex), voter can identify and isolate faulty channel */
}
Diagnostic coverage claim: up to 99% (cross-check with diverse sensors)4. Model-Based Monitoring (Physical Model)
An internal model of the physical system predicts the expected output
given the commanded input. Deviation from prediction indicates a fault.
expected_torque = vehicle_model_predict(steering_angle, vehicle_speed);
model_error = fabsf(measured_torque - expected_torque);
if (model_error > TORQUE_MODEL_TOLERANCE) {
DiagMgr_SetFault(FAULT_TORQUE_MODEL_MISMATCH);
}
Used for ASIL C/D as an independent plausibility channel.Memory Protection
RAM ECC (Error Correcting Code)
ECC is a hardware mechanism that detects and corrects bit errors in memory caused by:
- Cosmic ray single-event upsets (SEU)
- Alpha particle emission from IC packaging
- Electromagnetic interference inducing bit flips
ECC for SRAM (typical Hamming / SECDED code):
SEC-DED = Single Error Correct, Double Error Detect
Protection mechanism:
Write path: ECC encoder adds check bits to each data word
Read path: ECC decoder checks and corrects single-bit errors
reports but cannot correct double-bit errors
MCU register configuration:
ECC_CTRL_REG = ECC_ENABLE | ECC_ERROR_INJECT_DISABLE;
Error reporting:
Single-bit error → NMI or correctable interrupt → log fault, continue
Double-bit error → Hard fault / NMI → immediate safe state
Startup test:
MCU startup diagonal test: inject known single-bit error,
verify ECC corrects it. Fail to correct → SW fails startup self-test.Flash Memory CRC / Hash Verification
Flash CRC verification at startup:
1. Pre-calculated CRC over all Flash code and calibration regions
(computed and stored at end of Flash image by linker/programmer tool)
2. At ECU startup, SW recalculates CRC over Flash content
3. If recalculated CRC ≠ stored CRC:
→ Flash image corrupted (programming defect, cosmic ray, aging)
→ ECU reports fault, does not proceed to normal operation
CRC algorithm: CRC-32 (IEEE 802.3) or CRC-16-CCITT
Diagnostic coverage: high (~99.9997%) for random bit errors in Flash
Background monitoring (periodic):
Flash CRC also recalculated in background task during runtime
(not just at startup) to detect Flash degradation during vehicle lifeMemory Protection Unit (MPU) for Safety Partitioning
MPU Region Configuration (ARM Cortex-M/R example):
Region 0: Background region — default, no-access (trap all unassigned accesses)
Region 1: Flash code (all tasks): Execute | Read | No-Write (code is read-only)
Region 2: Safety task RAM (ASIL C/D): Read-Write for kernel mode, Read-only for QM mode
Region 3: QM task RAM: Read-Write for QM mode only
Region 4: Shared data (IPC): Read-Write for both partitions (explicitly allowed)
Region 5: Peripheral registers (safety peripherals): Privileged access only
Region 6: Peripheral registers (non-safety): Unprivileged access allowed
MPU context switch:
AUTOSAR OS saves/restores MPU config on each task switch.
OS protection hook called on any MPU violation:
void Os_ProtectionHook(StatusType FatalError) {
DiagMgr_SetFault(FAULT_MPU_VIOLATION);
SafetyManager_RequestSafeState(SS_01);
}Stack Guard Region
Stack overflow protection using MPU guard regions:
Task stack layout (growing downward in memory):
┌──────────────────────────────┐ ← Stack top (initial SP)
│ Task Stack Area │
│ (allocated in linker script) │
│ │
│ .... (stack grows downward) │
│ │
├──────────────────────────────┤ ← Stack guard region start
│ GUARD REGION (32 bytes) │ ← MPU region: NO-ACCESS
├──────────────────────────────┤ ← Stack bottom (maximum depth)
│ Other Task Stack │
If task overflows into guard region:
→ MemFault exception raised immediately
→ ProtectionHook called before stack corruption propagates to next taskCPU Monitoring: Lock-Step and Control Flow Integrity
Lock-Step Processor
Described in detail in asil-03-asil-decomposition.md. In summary:
- Two CPU cores execute the same instruction stream with 1–2 cycle offset
- A comparator checks the output of each instruction
- Any mismatch (stuck-at fault, bit flip in CPU pipeline) → NMI → safe state
- Diagnostic coverage: >99% for permanent faults; >90% for transient faults
Control Flow Monitoring (CFM)
Control flow monitoring detects SW execution that deviates from the intended control flow — e.g., a fault that causes the program counter to jump to an invalid address.
Software CFM using program flow signatures (ASIL C/D):
Concept:
1. At each checkpoint in the code, XOR the current flow signature
with a known constant
2. At the end of a function, verify the running signature matches
the expected end value
3. Mismatch → control flow was deviated (missed a branch, jumped to wrong code)
Example:
void SafetyTask(void) {
volatile uint32_t sig = 0;
sig ^= 0xAAAAAAAA; /* Checkpoint A */
Check_TorqueDeviation();
sig ^= 0x12345678; /* Checkpoint B */
Check_SensorRange();
sig ^= 0xDEADBEEF; /* Checkpoint C */
Check_E2EStatus();
/* Expected end signature: 0xAAAAAAAA ^ 0x12345678 ^ 0xDEADBEEF */
if (sig != EXPECTED_SAFETY_TASK_SIG) {
SafetyManager_RequestSafeState(SS_01);
}
}CAN / LIN Bus Error Detection
CAN frames have built-in error detection mechanisms. Understanding their DC is required for FMEDA analysis:
CAN Built-in Error Detection
| Mechanism | Type | Coverage |
|---|---|---|
| CRC field (15-bit CRC) | Detects random bit errors | Hamming Distance HD=6; undetected: < 5 × 10⁻⁵ per message |
| Bit stuffing (max 5 consecutive same bits) | Detects synchronization errors | — |
| Form check (end-of-frame check) | Detects frame structure errors | — |
| ACK check | Detects bus write fault (no receiver acknowledges) | — |
| Error-active/passive/bus-off state machine | Limits effect of faulty transmitter | — |
CAN alone is NOT sufficient for ASIL C/D: CAN's HD=6 CRC covers only random bit errors, not:
- Data substitution (wrong data from same ECU)
- Delayed messages
- Sequence errors
- Wrong sender (no message authentication in CAN)
This is why AUTOSAR E2E protection is required in addition to CAN's built-in error detection for safety-relevant data.
Redundant Sensor Reading and Voter Logic
For ASIL D systems where sensor failure is a significant contributor to PMHF, redundant sensors with voting logic provide both increased diagnostic coverage and fault tolerance.
Voter Architectures
1+1 (Dual): No voting — disagreement detected, system enters safe state
Channel A ──┐
Compare ──► Error detected → Safe state
Channel B ──┘
Advantage: Simple, low cost
Disadvantage: Cannot identify which channel is wrong
2+1 (Triplex with voter): Voter selects 2-of-3 majority
Channel A ──┐
Channel B ──┼── Voter (select majority) ──► System output
Channel C ──┘
If Channel A = 5 Nm, B = 5 Nm, C = 0 Nm (C failed):
Voter result = 5 Nm (A and B agree)
Voter reports: C is the outlier → isolate C, continue operation
Advantage: Fault tolerant — continues operation after single fault
Used for ASIL D systems where availability is also required (e.g., braking)
2+2 (Quad): Two pairs cross-check each other
Pair 1 (A+A'): internal compare
Pair 2 (B+B'): internal compare
Pair 1 vs Pair 2: cross compare
Used in highest-integrity aerospace; rare in automotiveASIL D Example: Redundant Steering Angle Sensors
Physical configuration:
SAS-A: Steering angle sensor, SPI channel A → Main MCU (ASIL D channel)
SAS-B: Steering angle sensor, SPI channel B → Safety MCU (ASIL D channel)
Both sensors read the same physical angle but via independent signal paths.
Cross-check in Safety MCU:
error = |SAS_A_deg - SAS_B_deg|
if (error > STEERING_ANGLE_COMPARE_DEG) {
DiagMgr_SetFault(FAULT_SAS_CROSS_CHECK);
SafetyManager_RequestSafeState(SS_01);
}
FMEDA DC claim for SAS cross-check: 99%
(Covers stuck-high, stuck-low, drift, open circuit per sensor)Safety Mechanism Coverage Summary
The following table summarizes typical DC claims used in automotive FMEDA:
| Safety Mechanism | Fault Mode Covered | Typical DC |
|---|---|---|
| Watchdog (simple) | SW deadlock / stall | 90% |
| Window watchdog | SW deadlock + runaway | 97–99% |
| Q/A watchdog | SW deadlock + wrong execution path | >99% |
| E2E CRC (Profile P02) | Communication bit error, wrong data | 97–99% |
| E2E counter | Message loss, repetition, sequence | 95–99% |
| Signal range check | Sensor stuck-high, stuck-low | 60–80% |
| Signal rate-of-change | Sensor step fault | 20–30% |
| Cross-channel comparison | Single-channel failure (either channel) | 90–99% |
| ADC self-test | ADC internal error | 90–95% |
| RAM ECC (SECDED) | Single-bit RAM flip | 97–99% |
| Flash CRC startup test | Flash data corruption | >99% |
| MCU lock-step | Permanent CPU faults | >99% |
| MCU lock-step | Transient CPU faults | >90% |
| MPU access violation | SW memory corruption | 90–97% |
| HW overcurrent latch | Motor phase short | 97–99% |
| Stack overflow guard (MPU) | Stack overflow → memory corruption | 95% |
| CPU CFM signatures | Control flow deviation | 90–95% |