AI Assurance Glossary

The terms behind controlling enterprise AI.

Plain definitions of the words we use across this site — written so a QA lead, a CISO, and a board member can read the same sentence and agree on what it means. 22 terms, one shared vocabulary.

Run your AI Exposure Snapshot Talk to Qapitol

Filter the glossary

22 terms, A to Z.

Agentic Workflow: An AI system that does more than answer — it plans, calls tools, and takes multi-step actions to reach a goal, often with little human input between steps. Each tool call and decision is a point where the system can go wrong, which is why agentic workflows need assurance at the step level, not just the output.
AI Assurance: The practice of making an AI system trustworthy enough to run in production: evaluating how it behaves, controlling how it can fail, and producing the evidence that it was checked. Assurance is continuous, because the system and its inputs keep changing.
AI Control Layer: The set of mechanisms that sit across an AI system to make it visible, measurable, and accountable — discovery, evaluation, monitoring, controls, and evidence. It is the difference between hoping a model behaves and being able to show that it does.
AI Exposure: The AI systems and workflows running across an enterprise that are not fully understood, owned, or controlled — including the ones no one registered. Exposure is what you cannot yet sign off, and it is usually larger than people expect.
AI Sign-Off: A defensible decision that a specific AI system is approved to run — backed by evaluation results, controls, and a record an auditor or board would accept. Sign-off is the goal; "the model seems accurate" is not sign-off.
Atomic Compliance Unit: The smallest piece of evidence that proves one specific control held for one specific AI behaviour — for example, a single logged decision with its inputs, its evaluation result, and the policy it satisfied. Assembled together, atomic units become an audit-ready record instead of a narrative.
Audit Trail: A durable, time-ordered record of what an AI system did and why: the inputs, the decision, the evaluation, and any human action. Without it, you cannot reconstruct a past decision, and you cannot defend one.
Continuous Monitoring: Watching an AI system in production over time rather than testing it once before launch. Because models drift and inputs change, monitoring is how assurance stays true after the system is live.
Drift: The gradual change in an AI system’s behaviour or accuracy as the real world, the data, or the model itself shifts away from the conditions it was validated under. Drift is why a system that was signed off last quarter may not be defensible this quarter.
Evaluation (evals): Structured tests that measure how an AI system actually behaves — accuracy, safety, consistency, and failure modes — against criteria you define. Because AI does not give one fixed answer, evals replace the pass/fail test script with graded, repeatable measurement.
Guardrail: A control placed around an AI system to keep its inputs or outputs inside allowed boundaries — blocking unsafe requests, filtering disallowed content, or enforcing a policy. Guardrails constrain behaviour at runtime; evals measure whether they hold.
Hallucination: When a model produces output that is fluent and confident but false or unsupported. Hallucination is dangerous precisely because it looks correct, which is why it has to be caught by evaluation rather than by reading the answer.
Human Override: A designed point where a person can review, approve, or stop an AI system’s action before or after it takes effect. Several regimes expect a meaningful human in the loop for high-stakes decisions — override is how that expectation becomes real rather than nominal.
Model Risk: The risk that an AI or statistical model produces wrong, biased, or unexplainable outputs that lead to bad decisions or regulatory exposure. In regulated sectors, managing model risk — and evidencing that you do — is a supervisory expectation, not an optional discipline.
Non-Determinism: The property of most AI systems that the same input can produce different outputs. Non-determinism is why traditional QA — which assumes one correct answer — cannot assure AI, and why evaluation has to measure behaviour across many runs.
Observability: The ability to see inside a running AI system — its inputs, prompts, tool calls, retrievals, and outputs — well enough to explain any single result. Observability is the raw material; the audit trail is what you keep from it.
Prompt Injection: An attack where adversarial text — in a user message, a document, or a retrieved page — manipulates a model into ignoring its instructions or leaking data. It is a primary reason agentic and RAG systems need red-teaming before they are trusted with real actions.
RAG (Retrieval-Augmented Generation): A pattern where a model answers using documents retrieved at query time rather than only what it learned in training. RAG improves grounding, but it also moves the risk into the retrieval and data layers — what was retrieved becomes part of what you have to assure.
Red-Teaming: Deliberately attacking an AI system to find how it fails — jailbreaks, prompt injection, unsafe outputs, and edge cases — before an adversary or a customer does. Red-teaming turns "we think it’s safe" into a documented list of what breaks it and how.
Shadow AI: AI systems or tools adopted across an enterprise without governance, registration, or oversight. Shadow AI is a major source of exposure: you cannot control, monitor, or sign off on a system you do not know is running.
Traceability: The ability to follow an AI-assisted outcome back through every step that produced it — the data, the prompt, the model, the controls, and any human decision. Traceability is what lets you answer "why did the system do this?" after the fact.
Vaporware (assurance sense): A capability or compliance claim asserted without evidence to back it — a control that exists in a slide but not in the system. In assurance, vaporware is the failure mode you are guarding against: every external claim should trace to something you can actually show.

Definitions are the easy part. Evidence is the work.

Run a Snapshot to see which of these terms describe risks already live in your stack.

Run your AI Exposure Snapshot Talk to Qapitol