AI Evaluation · June 23, 2026 · 7 min read

HARA Finds the Cliff Edge. It Cannot See the Fog: SOTIF Test Coverage for Machine Learning ADAS

ISO 26262 Part 6 and SOTIF Clause 8 together still leave a measurable coverage gap for ML-based ADAS perception — and UN R155 homologation auditors are now asking for the evidence that closes it.

Key takeaways

ISO 26262 Part 6 software testing was designed for deterministic embedded software; it has no native mechanism to evaluate ML model behaviour under distributional shift or adversarial perception conditions.
ISO 21448 Clause 8 validation measures address known and reasonably foreseeable misuse scenarios but do not mandate adversarial scenario simulation or runtime monitoring of feature distribution.
UN R155 Cyber Security Management System requirements, mandatory in the EU for new vehicle types since July 2022 and for all new vehicles since July 2024, create an indirect audit trail that connects to perception-layer safety evidence.
The coverage gap between the two standards can only be closed by adding a third layer: adversarial scenario simulation over long-tail ODD conditions plus runtime distribution-shift monitoring deployed on the target ECU.
A structured HARA-to-fault-injection workflow — starting with SOTIF hazard decomposition and ending with monitored scenario replay — gives homologation auditors the traceable safety argument they are beginning to require.

The Problem a HARA Cannot Frame

A Hazard Analysis and Risk Assessment identifies what the system does wrong. It asks: given a system-level malfunction, what harm can result, how severe is that harm, how often is the vehicle exposed to the situation, and how controllable is it? For a deterministic embedded function such as an ABS pressure modulator, that framing is complete. The failure modes are finite, the state space is bounded, and fault-tree analysis can trace every branch. Machine learning perception components in L2+ ADAS do not behave that way. The failure is not a malfunction in the classical sense. The model produces a confident output that is simply wrong for an input the training distribution did not represent well. HARA cannot enumerate a failure mode it has no mechanism to describe. That is the structural problem SOTIF test coverage for machine learning ADAS must answer.

What ISO 26262 Part 6 Actually Covers — and What It Does Not

ISO 26262 Part 6 addresses software-level development processes for safety-related systems. Its testing requirements (unit, integration, back-to-back) are structured around coverage metrics: statement coverage, branch coverage, and MC/DC for the highest ASILs. These are necessary disciplines for the deterministic software layers that surround an ML model: the preprocessing pipeline, the sensor fusion arbitration logic, the actuation command path. Applied to the ML model itself, they are largely inapplicable. You cannot achieve meaningful MC/DC coverage on a neural network with tens of millions of parameters, because its behaviour is encoded in learned weights rather than in branches and conditions. The standard acknowledges this only implicitly; its testing methods were written with conventional software in mind, before neural-network-based perception became production-relevant. A safety validation engineer using Part 6 alone has a rigorous framework for the wrong artifact.
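The coverage argument can be illustrated with a toy example. The sketch below is a hypothetical two-layer network in plain Python; `tiny_model`, `make_params`, and the layer sizes are assumptions chosen for illustration, not part of any production perception stack. It shows that the forward pass contains no decisions at all, so a single test vector already executes every statement, while the outputs still depend entirely on the weights.

```python
import random


def dense(inputs, weights, biases):
    """Fully connected layer with ReLU, written with max() so there is no if/else."""
    return [max(0.0, sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def tiny_model(inputs, params):
    """Toy two-layer network: the control flow is identical for every input."""
    hidden = dense(inputs, params["w1"], params["b1"])
    return dense(hidden, params["w2"], params["b2"])


def make_params(seed, n_in=4, n_hidden=8, n_out=2):
    """Random weights standing in for a trained model (illustrative only)."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

    return {"w1": matrix(n_hidden, n_in), "b1": [0.0] * n_hidden,
            "w2": matrix(n_out, n_hidden), "b2": [0.0] * n_out}


params = make_params(seed=0)
# Either call alone exercises every line of dense() and tiny_model().
out_a = tiny_model([0.1, 0.2, 0.3, 0.4], params)
out_b = tiny_model([-5.0, 9.0, 0.0, 2.5], params)
print(out_a, out_b)
```

Structural coverage is saturated after the first call, yet nothing has been learned about whether the outputs are correct. That is why coverage metrics remain useful for the surrounding code and say little about the model.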

What ISO 21448 Clause 8 Adds — and Where It Stops

ISO 21448, the SOTIF standard, was written precisely because ISO 26262 could not address performance insufficiency: cases where the system functions as designed but causes harm because of limitations in specification or triggering conditions. Clause 8 lays out validation measures: systematic scenario generation, coverage of the Operational Design Domain, evaluation of known unsafe scenarios and known safe scenarios, with a target of minimising residual unknown unsafe scenarios. This is conceptually the right frame for ML perception. In practice, Clause 8 validation relies heavily on scenario catalogues derived from stakeholder analysis and prior accident data. It does not mandate adversarial perturbation of sensor inputs. It does not require runtime monitoring of whether the feature distribution seen by the deployed model has shifted relative to the distribution it was validated against. Those are not deficiencies in intent; they are gaps in what the current clause text compels a supplier to produce.
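One way to picture the missing runtime check is a comparison between the feature distribution recorded at validation time and the one observed in service. The sketch below uses the Population Stability Index, a drift statistic borrowed from credit-risk monitoring; it is not prescribed by ISO 21448. The monitored feature, the bin edges, the synthetic data, and the 0.25 alert level (a common rule of thumb) are all assumptions for illustration.

```python
import math
import random


def bin_fractions(values, edges):
    """Fraction of values per bin; out-of-range values are clamped into the end bins."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        idx = 0
        while idx < len(counts) - 1 and v >= edges[idx + 1]:
            idx += 1
        counts[idx] += 1
    return [c / len(values) for c in counts]


def psi(reference, live, edges, eps=1e-6):
    """Population Stability Index between a reference sample and a live sample."""
    ref = bin_fractions(reference, edges)
    cur = bin_fractions(live, edges)
    # eps keeps the logarithm finite when a bin is empty in one of the samples.
    return sum(((c + eps) - (r + eps)) * math.log((c + eps) / (r + eps))
               for r, c in zip(ref, cur))


PSI_ALERT = 0.25  # assumed alert level; calibrate per feature in practice

rng = random.Random(1)
edges = [i / 10 for i in range(11)]
# Hypothetical scalar feature, e.g. a normalised activation statistic.
reference = [rng.gauss(0.50, 0.10) for _ in range(2000)]
live_shifted = [rng.gauss(0.65, 0.10) for _ in range(2000)]

print("unchanged input drifted:", psi(reference, reference, edges) > PSI_ALERT)
print("shifted input drifted:  ", psi(reference, live_shifted, edges) > PSI_ALERT)
```

A real deployment would track many features and would need thresholds calibrated against validated fleet data; the point here is only that the check is cheap enough to run continuously.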

Standard-to-Test-Type Mapping: Where Each Clause Lands

The two standards map to different test types at different lifecycle phases.

ISO 26262 Part 6 clause 9 (unit testing) maps to deterministic software modules with structural coverage targets. It applies to the pre- and post-processing code around an ML model, not to the model weights themselves.
ISO 26262 Part 6 clause 10 (integration testing) maps to interface behaviour between components. It is useful for validating that the perception output is correctly consumed by the path planning layer, but not for validating what the perception output says.
ISO 21448 Clause 8.3 (scenario-based validation) maps to ODD coverage testing against a scenario library. This is the right target artifact, but it depends on the completeness of the scenario library itself.
ISO 21448 Clause 8.4 (residual risk evaluation) maps to a probabilistic argument about unknown unsafe scenarios. It is necessary for the safety case but not equivalent to having found those scenarios through adversarial generation.

The combined framework tells you that a scenario library must exist and that residual risk must be argued; it does not tell you how to find the scenarios the library is missing.

ML Model Failure Modes the Combined Framework Misses

Three failure mode classes consistently fall outside what the ISO 26262 plus ISO 21448 combination compels a supplier to test. The first is adversarial perception: inputs that are within the sensor's nominal operating range but structured — by road conditions, lighting transitions, unusual object geometries, or deliberate markings — to produce high-confidence mis-classification. These are not out-of-ODD events; they occur within defined operating conditions. The second is silent distribution shift: the statistical properties of real-world inputs drift over time — seasonally, geographically, as road infrastructure changes — and model performance degrades without any fault code or diagnostic flag being raised. The system appears healthy by every in-vehicle monitor while its actual perception accuracy has moved materially below the validated baseline. The third is compounding rarity: individual triggering conditions that are each within the scenario library may combine in ways that produce unsafe behaviour at a rate the library's independence assumptions would not predict. Simulation at scale is the only practical way to exercise this class systematically.
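The compounding-rarity class lends itself to two simple checks, sketched below under stated assumptions. The condition names, the scenario library, and the drive log are invented for illustration. `uncovered_pairs` lists triggering-condition pairs that no library scenario exercises together, and `lift` compares how often two conditions co-occur in logged drives with what an independence assumption would predict.

```python
from itertools import combinations


def uncovered_pairs(conditions, library):
    """Condition pairs that never appear together in any library scenario."""
    covered = set()
    for scenario in library:
        covered.update(combinations(sorted(scenario), 2))
    return sorted(set(combinations(sorted(conditions), 2)) - covered)


def lift(drive_log, a, b):
    """Observed joint frequency divided by the product of the marginals.

    A value near 1 is consistent with independence; well above 1 means the
    pair co-occurs more often than an independence assumption predicts.
    """
    n = len(drive_log)
    p_a = sum(a in drive for drive in drive_log) / n
    p_b = sum(b in drive for drive in drive_log) / n
    p_ab = sum(a in drive and b in drive for drive in drive_log) / n
    return p_ab / (p_a * p_b)


# Hypothetical triggering conditions and scenario library.
conditions = {"rain", "low_sun", "hgv_spray", "occluded_markings"}
library = [
    {"rain", "occluded_markings"},
    {"low_sun", "occluded_markings"},
    {"hgv_spray"},
    {"rain", "low_sun"},
]
missing = uncovered_pairs(conditions, library)
print("pairs never tested together:", missing)

# Hypothetical log of 10 drives: spray only ever appears when it rains.
drive_log = [{"rain", "hgv_spray"}] * 3 + [{"rain"}] + [{"low_sun"}] * 6
print("lift(rain, hgv_spray):", lift(drive_log, "rain", "hgv_spray"))
```

In this invented log the rain and spray pair co-occurs about 2.5 times as often as independence would suggest, which is exactly the situation in which a per-condition residual-risk argument understates the joint rate.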

A Worked Scenario: HARA Through Fault Injection for a Highway Lane-Change Feature

Consider a Tier-1 supplier homologating an L2+ highway lane-change assist feature. The HARA identifies an unintended lateral movement hazard, assigns ASIL C, and derives a safety goal: the system shall not command a lane change when the target lane is occupied. ISO 26262 Part 6 testing covers the decision logic and the actuation path with full MC/DC. ISO 21448 Clause 8 validation produces a scenario library covering adjacent-vehicle cut-ins, occluded lane markings, rain and low-sun glare. The safety case argues residual risk is acceptable. What the library does not cover: a vehicle in the target lane partially occluded by spray from a heavy goods vehicle, where the visual texture of the spray pushes the object detection head's confidence for that vehicle just below the filter cutoff, so the bounding box is suppressed. This is not a sensor fault. It is not an out-of-ODD condition. It is a distributional edge case that adversarial scenario simulation would surface within a tractable test campaign by systematically varying spray density, vehicle colour contrast, and relative velocity in a physically accurate simulation environment. Adding a runtime monitor that flags when the entropy of the perception output distribution exceeds a calibrated threshold provides the in-service detection layer. Together, they give an R155 auditor a safety argument that can be traced end to end: here is how we searched for the gap, here is what we found, here is how the deployed system detects when it is operating outside its validated envelope.

What UN R155 Homologation Auditors Are Beginning to Require

UN Regulation 155, mandatory in the EU for new vehicle types since July 2022 and for all new vehicles since July 2024, requires OEMs and their supply chains to operate a Cyber Security Management System. Within that CSMS, threat analysis and risk assessment must be maintained through the vehicle lifecycle. The regulation's language around software update management and post-market monitoring creates an audit expectation that is functionally adjacent to what a runtime distribution-shift monitor provides for perception safety: evidence that the manufacturer knows when the operational environment has moved outside the validated envelope and has a response process. Homologation auditors are increasingly treating perception-layer safety evidence (adversarial test logs, scenario coverage arguments, runtime monitoring architectures) as part of the CSMS evidence package, not as a separate concern. Suppliers who arrive at type approval with only a Clause 8 scenario library and a Part 6 test report are finding those questions harder to answer.

Closing the Gap Before the Audit

The path forward is not to replace ISO 26262 Part 6 or ISO 21448 Clause 8; both remain necessary foundations. The path is to add a structured third layer: adversarial scenario generation that systematically targets the long-tail ODD conditions a manual scenario library will not reach, combined with runtime distribution monitoring deployed on the production ECU with a defined escalation path. The safety argument connecting HARA through to that runtime monitor, tracing from the ASIL C safety goal through the validation evidence to the in-service detection capability, is the artifact that makes a SOTIF test coverage position for machine learning ADAS defensible at homologation. Building that argument after the audit question arrives is significantly harder than building it into the validation programme from the start. That is the discipline that separates a safety case from a safety document.
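A runtime monitor with a defined escalation path might look like the following sketch. It is written in Python for readability; a production ECU implementation would differ. The class name `EntropyMonitor`, the 0.8 nat threshold, the window length, and the three-level response are hypothetical choices that would need calibration against validated data.

```python
import math
from collections import deque


def entropy(probs):
    """Shannon entropy in nats of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


class EntropyMonitor:
    """Flags sustained high-entropy perception outputs (illustrative only)."""

    def __init__(self, threshold, window=5, min_exceedances=3):
        self.threshold = threshold
        self.min_exceedances = min_exceedances
        self.recent = deque(maxlen=window)  # rolling record of exceedances

    def update(self, class_probs):
        exceeded = entropy(class_probs) > self.threshold
        self.recent.append(exceeded)
        if sum(self.recent) >= self.min_exceedances:
            return "ESCALATE"  # e.g. inhibit lane change, log for the safety case
        if exceeded:
            return "WARN"      # single uncertain frame, keep watching
        return "OK"


monitor = EntropyMonitor(threshold=0.8)
confident = [0.98, 0.01, 0.01]
uncertain = [0.40, 0.35, 0.25]

states = [monitor.update(p) for p in (confident, uncertain, uncertain, uncertain)]
print(states)
```

Requiring several exceedances within a window trades detection latency for fewer spurious escalations. Output entropy rises only when the model is uncertain, so it does not catch confidently wrong outputs on its own and is best paired with monitoring of the input feature distribution.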

“HARA tells you where the system can fail. It cannot tell you what the camera never recognised in the first place.”

By Qapitol · AI assurance & governance
