
EU AI Act: A Technical Compliance Guide

Decompose Articles 9–15 into engineering obligations your teams can actually test and verify.


What you’ll take away

Map each EU AI Act Article 9–15 obligation to a concrete engineering control, test type, or governance artifact
Apply a five-domain compliance framework covering risk management, data governance, documentation, human oversight, and robustness
Use the obligation-mapping worksheet to assign ownership and evidence requirements across your AI system inventory
Build a phased compliance roadmap that sequences quick wins, systemic changes, and continuous assurance activities
Understand where ISO/IEC 42001, NIST AI RMF, and DPDP intersect with Act obligations to avoid duplicating effort

Why Articles 9–15 Are the Compliance Core

The EU AI Act creates a tiered regulatory structure, but for engineering and QA teams the weight falls squarely on Articles 9 through 15. These articles govern what high-risk AI systems must actually do — not in abstract policy terms, but in operational, testable, and documentable ways. Providers placing high-risk AI systems on the EU market, and deployers who exercise meaningful control over those systems, both carry obligations under this section.

The challenge most enterprises face is not awareness. Leadership knows the Act exists. The challenge is translation: turning statutory language into engineering tasks, test plans, and governance artifacts that can be owned, executed, and evidenced. This guide performs that translation.

Each of the five obligation domains below maps directly to one or more articles, identifies the engineering control or test type the obligation implies, and specifies what evidence a conformity assessment or regulator would expect to see.

---

Domain 1 — Risk Management System (Article 9)

Article 9 requires a documented, iterative risk management system that runs across the full AI system lifecycle. It is not a one-time assessment; it is a continuous process.

Engineering obligations

Establish a risk identification procedure that covers both intended use and reasonably foreseeable misuse. This must be documented, versioned, and reviewed at defined intervals or on material change.
Define risk evaluation criteria before the system is deployed. Criteria should address the severity and reversibility of harm, the breadth of population affected, and the degree of human oversight available at point of use.
Implement and test residual risk controls. Where technical controls cannot reduce risk to an acceptable level, the remaining residual risk must be explicitly acknowledged and, where relevant, surfaced to deployers.

What to test and verify

Structured red-teaming exercises that probe the system against foreseeable misuse scenarios documented in the risk register
Boundary condition testing: inputs the system was not designed for but is likely to encounter
Regression testing after any model update to confirm that previously identified risks have not re-emerged

Evidence artifacts

A living risk register with dated entries, risk owner assignments, and linked mitigations
Test reports demonstrating coverage of identified risk scenarios
A change control log showing that risk review was triggered by material updates
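
As a minimal sketch of how a living risk register can drive review triggers, the Python below flags entries that were last reviewed before the most recent material change or that have exceeded a review interval. The `RiskEntry` fields, the 180-day interval, and the sample entries are illustrative assumptions, not requirements taken from the Act.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class RiskEntry:
    risk_id: str
    description: str
    owner: str
    mitigations: list
    last_reviewed: date


def entries_needing_review(register, last_material_change, today, max_age_days=180):
    """Return IDs of entries reviewed before the last material change
    or not reviewed within max_age_days (an assumed review interval)."""
    stale = []
    for entry in register:
        reviewed_before_change = entry.last_reviewed < last_material_change
        too_old = (today - entry.last_reviewed) > timedelta(days=max_age_days)
        if reviewed_before_change or too_old:
            stale.append(entry.risk_id)
    return stale


# Hypothetical register entries, for illustration only.
register = [
    RiskEntry("R-001", "Misuse outside intended purpose", "ai-safety",
              ["input filter"], date(2025, 1, 10)),
    RiskEntry("R-002", "Degraded accuracy on noisy inputs", "qe",
              ["robustness suite"], date(2025, 3, 5)),
]

stale = entries_needing_review(
    register,
    last_material_change=date(2025, 2, 1),
    today=date(2025, 4, 1),
)
print(stale)  # R-001 was last reviewed before the material change
```

A check like this could run in CI or on a schedule so that the change control log and the register stay in step.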

Article 9 aligns closely with the risk treatment processes in ISO/IEC 42001 Clause 6.1 and with the NIST AI RMF's MAP, MEASURE, and MANAGE functions. Organizations that have already implemented either standard can extend rather than rebuild.

---

Domain 2 — Training, Validation, and Test Data Governance (Article 10)

Article 10 imposes explicit data governance obligations on high-risk AI systems. Training, validation, and test datasets must be subject to data governance and management practices; must be relevant, sufficiently representative, and, to the best extent possible, free of errors and complete in view of the intended purpose; and must take into account the characteristics of the geographical, contextual, behavioural, or functional setting in which the system is intended to be used.

Engineering obligations

Document the provenance of every dataset used in training, validation, and testing: origin, collection method, any preprocessing transformations, and any known limitations.
Conduct and document a bias and representativeness assessment before training commences and before any significant retraining. The assessment should be referenced to the intended deployment context, not to the dataset in isolation.
Where personal data is used, align data governance practices with applicable requirements under the GDPR and, for systems with Indian data subjects, the Digital Personal Data Protection Act (DPDP). Article 10 does not replace data protection law; it adds an AI-specific governance layer on top of it.
Consider synthetic data generation as a risk control where representative real-world data cannot be obtained or where using real data creates unacceptable privacy exposure. Validate synthetic data for fidelity and bias before use.

What to test and verify

Automated data quality checks: schema validation, null rate thresholds, distributional drift detection against a baseline
Stratified performance evaluation across demographic and contextual subgroups relevant to the deployment setting
Hold-out test set governance: confirm the test set was not contaminated by training data and is representative of the target distribution
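
The three checks above can be sketched in plain Python. This is an illustrative outline, not a production data-quality framework: the function names are hypothetical, the drift measure is a population stability index (PSI) over categorical bins chosen here as one common option, and the 0.2 alert level is a rule-of-thumb assumption that each team should set for itself.

```python
import math
from collections import Counter


def null_rate(values):
    """Fraction of missing (None) values in a column."""
    return sum(v is None for v in values) / len(values)


def psi(baseline, current, bins):
    """Population stability index between two categorical samples."""
    eps = 1e-6  # avoids log(0) when a bin is empty
    b_counts, c_counts = Counter(baseline), Counter(current)
    total = 0.0
    for b in bins:
        p = max(b_counts[b] / len(baseline), eps)
        q = max(c_counts[b] / len(current), eps)
        total += (q - p) * math.log(q / p)
    return total


def contaminated_ids(train_ids, test_ids):
    """Record IDs that appear in both the training and the test set."""
    return set(train_ids) & set(test_ids)


# Hypothetical sample data, for illustration only.
column = [1, None, 3, 4]
baseline = ["a", "a", "b", "b"]
drifted = ["a", "a", "a", "b"]

print(null_rate(column))                 # share of nulls in the column
print(psi(baseline, baseline, "ab"))     # identical samples give zero drift
print(psi(baseline, drifted, "ab") > 0.2)  # assumed alert threshold
print(contaminated_ids(["r1", "r2", "r3"], ["r3", "r4"]))
```

Matching on record IDs only catches exact overlap; near-duplicate detection would need a separate similarity check.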

Evidence artifacts

A dataset card or data sheet for each dataset, covering provenance, limitations, and bias assessment outcomes
Subgroup performance reports with documented acceptance thresholds
A data lineage diagram covering the full pipeline from source to model input
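
A subgroup performance report with a documented acceptance threshold might be produced along these lines. The sketch assumes accuracy as the metric, a single shared threshold of 0.7, and hypothetical subgroup labels; real reports would use the metrics and thresholds recorded in the Technical File.

```python
from collections import defaultdict


def subgroup_accuracy(records):
    """records: iterable of (subgroup, y_true, y_pred) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, y_true, y_pred in records:
        total[group] += 1
        correct[group] += int(y_true == y_pred)
    return {group: correct[group] / total[group] for group in total}


def failing_subgroups(accuracies, threshold):
    """Subgroups whose accuracy falls below the acceptance threshold."""
    return sorted(g for g, acc in accuracies.items() if acc < threshold)


# Hypothetical evaluation records, for illustration only.
records = [
    ("region_a", 1, 1), ("region_a", 0, 0), ("region_a", 1, 1), ("region_a", 0, 1),
    ("region_b", 1, 0), ("region_b", 0, 0), ("region_b", 1, 0), ("region_b", 0, 0),
]

acc = subgroup_accuracy(records)
failing = failing_subgroups(acc, threshold=0.7)  # 0.7 is an assumed threshold
print(acc)
print(failing)
```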

---

Domain 3 — Technical Documentation (Article 11 and Annex IV)

Article 11, read with Annex IV, specifies what technical documentation must exist before a high-risk AI system is placed on the market or put into service. Annex IV enumerates nine categories of required content, ranging from the general description of the system to post-market monitoring plans.

Engineering obligations

Maintain a Technical File that is updated on every material change. The file is not a static deliverable; it is a living configuration of artifacts that collectively demonstrate conformity.
Document the system's intended purpose with precision: the specific task performed, the categories of natural persons it operates on or makes decisions about, the input data types, and the outputs produced. Vague intended-purpose statements are a common finding in conformity reviews.
Include architecture diagrams, model cards for any foundation model components used, hyperparameter configurations, and training infrastructure descriptions.
Document the metrics used to evaluate performance, the thresholds applied, and the reasoning behind threshold choices.

Practical structure for the Technical File

General system description and intended purpose
Development methodology and data governance summary (linking to Domain 2 artifacts)
Accuracy, robustness, and cybersecurity specifications (linking to Domain 5)
Risk management documentation (linking to Domain 1)
Changes made to the system over its lifecycle and the conformity review triggered by each change
Post-market monitoring plan and feedback loop description
Declaration of conformity reference

Evidence artifacts

A version-controlled Technical File with a clear change history
Model cards for all model components, including third-party or open-source models integrated into the system
A traceability matrix linking Annex IV requirements to specific file sections and artifact owners
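
One lightweight way to keep the traceability matrix honest is to check it mechanically for unmapped requirements and missing owners. The sketch below refers to the nine Annex IV categories only by number; the section paths and owner names are hypothetical placeholders.

```python
# The nine Annex IV categories, referenced by number only.
ANNEX_IV_POINTS = list(range(1, 10))

# Hypothetical matrix: Annex IV point -> Technical File section and owner.
matrix = {
    1: {"section": "technical-file/section-01", "owner": "product"},
    2: {"section": "technical-file/section-02", "owner": "ml-engineering"},
    3: {"section": "technical-file/section-03", "owner": ""},
    5: {"section": "technical-file/section-05", "owner": "ai-safety"},
}


def coverage_gaps(matrix, points):
    """List (point, reason) pairs for requirements without a section or owner."""
    gaps = []
    for point in points:
        row = matrix.get(point)
        if row is None or not row.get("section"):
            gaps.append((point, "no file section mapped"))
        elif not row.get("owner"):
            gaps.append((point, "no artifact owner assigned"))
    return gaps


gaps = coverage_gaps(matrix, ANNEX_IV_POINTS)
for point, reason in gaps:
    print(f"Annex IV point {point}: {reason}")
```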

In ISO/IEC 42001 Annex A, control group A.6 (AI system life cycle) covers AI system documentation, and A.5 covers AI system impact assessment. Organizations pursuing ISO/IEC 42001 certification can structure their technical documentation to satisfy both frameworks simultaneously with minimal duplication.

---

Domain 4 — Human Oversight (Article 14)

Article 14 requires that high-risk AI systems be designed and developed so that natural persons can effectively oversee them while they are in use. Oversight must work in practice, not merely exist on paper.

Engineering obligations

Design override mechanisms: every high-risk system must have a technically implemented means for a natural person to intervene, pause, or override the system's output or decision. The mechanism must be accessible to deployers and, where relevant, to end users.
Implement output flagging: the system must be capable of identifying and flagging outputs that fall outside expected confidence ranges or that correspond to high-stakes edge cases, directing those outputs for human review rather than automated action.
Ensure interpretability at the point of oversight: the person exercising oversight must be able to understand the system's output at the level of detail needed to make a meaningful intervention. This is an interface and documentation requirement, not just a model-level requirement.
Document oversight roles and responsibilities in the system's operational documentation. The deployer must understand what oversight actions are available and when they should be triggered.
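
The override and flagging obligations above could be combined in a small routing layer, sketched here under stated assumptions: the `OversightController` class, the 0.85 confidence floor, and the `high_stakes` flag are hypothetical names and values, and a real system would also log every routing decision.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    output: str
    confidence: float
    route: str  # "automated" or "human_review"


def route_output(output, confidence, high_stakes, min_confidence=0.85):
    """Send low-confidence or high-stakes outputs to a human reviewer."""
    needs_review = high_stakes or confidence < min_confidence
    return Decision(output, confidence, "human_review" if needs_review else "automated")


class OversightController:
    """Wraps routing with a pause switch a natural person can operate."""

    def __init__(self):
        self.paused = False

    def pause(self):
        self.paused = True

    def resume(self):
        self.paused = False

    def submit(self, output, confidence, high_stakes=False):
        # While paused, nothing is actioned automatically.
        if self.paused:
            return Decision(output, confidence, "human_review")
        return route_output(output, confidence, high_stakes)


controller = OversightController()
print(controller.submit("approve", 0.95).route)
print(controller.submit("approve", 0.60).route)
controller.pause()
print(controller.submit("approve", 0.99).route)
```

Keeping the pause switch outside the model makes it testable on its own, including under load.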

What to test and verify

Functional testing of all override and pause mechanisms under representative operational conditions, including high-load scenarios
Testing that flagging logic correctly identifies out-of-distribution inputs and high-uncertainty outputs
Usability assessment of oversight interfaces: can a person with the defined role actually exercise oversight effectively within the time constraints of the operational context?

Evidence artifacts

Oversight mechanism test reports with pass/fail criteria and results
Interface design documentation and, where relevant, usability test records
Operational guidance for deployers describing oversight responsibilities and escalation procedures

---

Domain 5 — Accuracy, Robustness, and Cybersecurity (Article 15)

Article 15 requires that high-risk AI systems achieve an appropriate level of accuracy, be resilient against errors and inconsistencies, and be protected against adversarial manipulation that could alter their behavior.

Engineering obligations

Define accuracy metrics appropriate to the task and deployment context before training. Selecting metrics after the results are known runs against the spirit of Article 15 and is hard to defend in a conformity review.
Conduct robustness testing across the full range of inputs the system is expected to encounter, including inputs that are degraded, noisy, incomplete, or structurally unusual.
For systems that process inputs in real time or interact with external data sources, implement and test protection against adversarial input attacks: prompt injection for LLM-based systems, data poisoning detection for continuously trained systems, and model extraction defenses where the system constitutes a proprietary asset.
Establish fallback behavior: when the system encounters inputs that exceed its operational design domain, it should fail in a defined and safe way rather than producing a high-confidence incorrect output.
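
Fallback behavior can be illustrated with a guard that refuses to predict when an input falls outside the operational design domain. This is a simplified sketch: the domain is modeled as numeric ranges per feature, and the feature names, bounds, and toy model are hypothetical.

```python
def in_design_domain(features, bounds):
    """True only if every bounded feature is present, numeric, and in range."""
    for name, (low, high) in bounds.items():
        value = features.get(name)
        if not isinstance(value, (int, float)):
            return False
        if not low <= value <= high:
            return False
    return True


def predict_with_fallback(model, features, bounds):
    """Return a defined fallback result instead of an out-of-domain prediction."""
    if not in_design_domain(features, bounds):
        return {
            "status": "fallback",
            "prediction": None,
            "reason": "input outside operational design domain",
        }
    return {"status": "ok", "prediction": model(features), "reason": None}


# Hypothetical bounds and model, for illustration only.
bounds = {"age": (18, 90), "income": (0, 500000)}


def toy_model(features):
    return "eligible" if features["income"] > 30000 else "refer"


print(predict_with_fallback(toy_model, {"age": 40, "income": 50000}, bounds))
print(predict_with_fallback(toy_model, {"age": 130, "income": 50000}, bounds))
print(predict_with_fallback(toy_model, {"age": 40}, bounds))
```

Range checks are only one way to define the domain; distribution-based out-of-domain detectors serve the same role for unstructured inputs.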

What to test and verify

Accuracy benchmarking on held-out test sets, with results stratified by subgroup and input type
Adversarial robustness testing: at minimum, perturbation-based testing for perception and classification systems; prompt injection and jailbreak testing for generative AI components
Failsafe behavior testing: confirm that out-of-domain inputs trigger the defined fallback rather than silent degradation
Regression testing cadence tied to the deployment monitoring plan
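
Perturbation-based robustness testing can be approximated with a prediction-stability measure: the share of predictions that stay unchanged when bounded random noise is added to each input. The sketch below uses uniform noise and a toy classifier as stand-ins; it is not an adversarial attack, and the noise level and trial count are assumptions.

```python
import random


def prediction_stability(model, inputs, noise, trials=20, seed=0):
    """Fraction of predictions unchanged under uniform noise in [-noise, noise]."""
    rng = random.Random(seed)  # fixed seed keeps the test reproducible
    stable = 0
    total = 0
    for x in inputs:
        baseline = model(x)
        for _ in range(trials):
            perturbed = [v + rng.uniform(-noise, noise) for v in x]
            stable += int(model(perturbed) == baseline)
            total += 1
    return stable / total


def toy_model(x):
    # Hypothetical classifier: positive when the feature sum exceeds 1.0.
    return int(sum(x) > 1.0)


# Both inputs sit far from the decision boundary at 1.0.
inputs = [[0.1, 0.2], [0.9, 0.8]]

small_noise = prediction_stability(toy_model, inputs, noise=0.05)
large_noise = prediction_stability(toy_model, inputs, noise=5.0)
print(small_noise, large_noise)
```

A stability score below an agreed threshold would then fail the robustness gate in the QA pipeline.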

Evidence artifacts

Accuracy and robustness test reports with documented thresholds and outcomes
Adversarial testing methodology description and results
Incident and anomaly log from post-market monitoring, demonstrating active tracking of real-world performance

---

Obligation-Mapping Worksheet

The following worksheet structure assigns ownership and evidence requirements across your AI system inventory. Replicate one row per system per obligation.

Column 1 — Article and obligation (e.g., Art. 9 / Risk evaluation criteria)
Column 2 — Engineering control or test type (e.g., Structured red-teaming)
Column 3 — Evidence artifact required (e.g., Red-team report with scenario coverage matrix)
Column 4 — Responsible team (e.g., AI Safety / QE)
Column 5 — Current status (Not started / In progress / Evidenced / Reviewed)
Column 6 — Target completion date and reviewer sign-off

For each high-risk AI system in scope, complete the worksheet before beginning implementation work. The worksheet becomes the source of record for gap identification and serves as a pre-audit checklist.
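
If the worksheet is kept as structured data rather than a spreadsheet, gap identification can be automated. The sketch below mirrors the six columns above; the `WorksheetRow` class, the system name, and the sample rows are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

STATUSES = ("Not started", "In progress", "Evidenced", "Reviewed")


@dataclass
class WorksheetRow:
    system: str
    obligation: str   # Column 1
    control: str      # Column 2
    evidence: str     # Column 3
    team: str         # Column 4
    status: str       # Column 5
    target_date: date  # Column 6
    reviewer: Optional[str] = None

    def __post_init__(self):
        if self.status not in STATUSES:
            raise ValueError(f"unknown status: {self.status}")


def open_gaps(rows, today):
    """Rows not yet evidenced, earliest target first, with an overdue flag."""
    gaps = [r for r in rows if r.status not in ("Evidenced", "Reviewed")]
    gaps.sort(key=lambda r: r.target_date)
    return [(r.obligation, r.status, r.target_date < today) for r in gaps]


# Hypothetical rows for one system, for illustration only.
rows = [
    WorksheetRow("credit-scoring-v2", "Art. 9 / Risk evaluation criteria",
                 "Structured red-teaming", "Red-team report with scenario coverage matrix",
                 "AI Safety / QE", "In progress", date(2025, 6, 30)),
    WorksheetRow("credit-scoring-v2", "Art. 14 / Override mechanism",
                 "Functional override testing", "Oversight mechanism test report",
                 "Platform Engineering", "Not started", date(2025, 5, 15)),
    WorksheetRow("credit-scoring-v2", "Art. 10 / Dataset provenance",
                 "Dataset card review", "Dataset card", "Data Governance",
                 "Evidenced", date(2025, 4, 30), reviewer="reviewer-1"),
]

gaps = open_gaps(rows, today=date(2025, 6, 1))
for gap in gaps:
    print(gap)
```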

---

Compliance Roadmap: Three Phases

Phase 1 — Inventory and Gap Assessment (Weeks 1–6)

Complete an inventory of all AI systems potentially in scope as high-risk under Annex III
For each in-scope system, complete the obligation-mapping worksheet to identify gaps
Assign executive ownership of compliance and establish a cross-functional working group covering legal, engineering, QA, and data governance
Prioritize gaps by exposure: systems already deployed to EU users are more urgent than those still in development

Phase 2 — Systemic Implementation (Months 2–6)

Establish or extend the risk management system to cover all in-scope systems, including the tooling, cadence, and ownership model
Build or retrofit technical documentation structures; adopt a Technical File template aligned to Annex IV
Implement data governance controls including dataset cards, bias assessment procedures, and data lineage tooling
Engineer and test human oversight mechanisms for any system lacking them
Integrate adversarial testing and robustness benchmarking into existing QA pipelines

Phase 3 — Continuous Assurance (Ongoing)

Implement post-market monitoring with defined trigger thresholds for re-evaluation
Establish a change management process that automatically triggers compliance review for material model updates
Conduct periodic internal audits using the obligation-mapping worksheet as the audit instrument
Where applicable, align the continuous assurance cadence with ISO/IEC 42001 internal audit cycles to reduce organizational overhead
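
The trigger logic for continuous assurance can be sketched as two checks: whether monitored metrics have dropped beyond an allowed margin, and whether a change counts as material. The metric names, margins, and the list of material change types below are assumptions for illustration; each organization would define its own.

```python
# Assumed set of change types treated as material.
MATERIAL_CHANGE_TYPES = {
    "model_retrained",
    "training_data_changed",
    "intended_purpose_changed",
}


def metrics_breaching(baseline, observed, max_drop):
    """Metrics whose observed value fell more than the allowed margin."""
    breaches = []
    for metric, base_value in baseline.items():
        drop = base_value - observed[metric]
        if drop > max_drop[metric]:
            breaches.append(metric)
    return sorted(breaches)


def needs_compliance_review(change_type, breaches):
    """A review is triggered by a material change or any metric breach."""
    return change_type in MATERIAL_CHANGE_TYPES or bool(breaches)


# Hypothetical monitoring snapshot, for illustration only.
baseline = {"accuracy": 0.92, "recall": 0.80}
observed = {"accuracy": 0.88, "recall": 0.79}
max_drop = {"accuracy": 0.03, "recall": 0.02}

breaches = metrics_breaching(baseline, observed, max_drop)
print(breaches)
print(needs_compliance_review("ui_copy_update", breaches))
print(needs_compliance_review("ui_copy_update", []))
```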

---

A Note on Why Assurance Cannot Be a One-Time Exercise

The EU AI Act's requirement for lifecycle risk management reflects something the engineering community has long understood about complex systems: properties like accuracy, fairness, and safety are not inherent to a model at a point in time. They are emergent properties of a model, its data environment, its deployment context, and its operational use — all of which change. A system that was compliant at deployment can drift out of compliance as data distributions shift, as use patterns evolve, or as adversarial techniques advance.

Technical compliance for high-risk AI is therefore not a project with an end date. It is a continuous assurance capability: the organizational ability to detect when something has changed, assess whether that change affects conformity, and respond before harm occurs. Building that capability — the processes, the tooling, the team competency, and the governance structure — is the real work that Articles 9 through 15 are asking enterprises to do.
