Designing for the AI Exception

Most teams build for the 99%. The workflow works, the model returns something plausible, and the output sails through unchecked. Until it doesn’t. Until it hits a clause you’ve never seen, a redline that slips through, or a regulator-ready document with a fabricated citation. That 1% matters more than the rest combined.

You can’t afford to treat it like an edge case; you have to build for it.

I'm going to cover a pragmatic approach to exception handling in AI workflows. It’s designed for legal, but it works anywhere output quality really matters. We’re talking about context-aware routing, deliberate downgrade to deterministic checks, smart second-model verification, and human sign-off with a real SLA. Add a small incident playbook and you’ve got something teams can actually run with.


The lane, and why it matters

An exception lane is a separate path that stops risky or uncertain outputs before they reach clients or decision-makers. Once something enters it, we stop treating it like just another AI response. No creativity or clever completions; instead, evidence, rules, and human eyes.

The system should push items into the lane when:

  • Model confidence is low or citation coverage is weak
  • Hard facts fail validation (totals don’t reconcile, dates conflict, references are missing)
  • The task is high-stakes (payments, redlines, public filings)
  • The input looks off (new clause types, scans, the wrong language, layout issues)

That’s the mechanics. The real impact depends on who the client is and what the matter involves, so you need context.


Context first, always

You can’t assess risk if you don’t know who the work is for. Any serious exception handling starts with context.

Pull it in early:

  • Client tier, public profile, policy flags, zero-retention rules, vendor restrictions
  • Matter value, deadlines, governing law, regulators, privilege, number of jurisdictions
  • Input quality signals: clause types, language detection, scan artefacts
  • Coverage signals: schema validation, citation rate, previous exception history

Bundle this into an impact score that tells the system how strict to be.
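
To make this concrete, here’s a minimal impact-scoring sketch in Python. The field names, thresholds, and weights are illustrative assumptions, not a fixed scheme; the point is that context arrives as structured data and leaves as a single strictness signal.

    from dataclasses import dataclass

    @dataclass
    class WorkContext:
        client_tier: str        # e.g. "strategic" or "standard" (assumed labels)
        matter_value_gbp: float
        regulator_facing: bool
        zero_retention: bool
        citation_rate: float    # fraction of claims with a source, 0 to 1
        schema_valid: bool

    def impact_score(ctx: WorkContext) -> int:
        """Return severity 1 (low) to 3 (critical), biased towards caution."""
        score = 1
        if ctx.citation_rate < 0.9 or not ctx.schema_valid:
            score = max(score, 2)   # weak coverage alone bumps strictness
        if ctx.client_tier == "strategic" or ctx.matter_value_gbp >= 5_000_000:
            score = max(score, 3)
        if ctx.regulator_facing or ctx.zero_retention:
            score = max(score, 3)
        return score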

A simple severity rubric:

S3 (Critical)
Regulator-facing, client-deliverable, or tied to authority or payment.
Strict schema, recomputed facts, full verification, and human sign-off within 2 hours.

S2 (Important)
Feeds client work or internal decisions that matter.
Schema plus recompute, with verification on high-risk fields. Sign-off within 1 day.

S1 (Low risk)
Drafts or exploratory work.
Schema only. If validators fail, it gets held. Review is optional.
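
One way to keep the rubric honest is to store it as data, so strictness is looked up rather than remembered. A sketch, with assumed control names, keyed by the severity that impact_score above returns:

    CONTROLS = {
        3: {"schema": "strict", "recompute": "all", "verify": "field_by_field",
            "human_signoff": True, "sla_hours": 2},
        2: {"schema": "strict", "recompute": "all", "verify": "high_risk_fields",
            "human_signoff": True, "sla_hours": 24},
        1: {"schema": "basic", "recompute": "none", "verify": "none",
            "human_signoff": False, "sla_hours": None},  # review optional
    }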


Downgrade, deliberately

Once something is flagged, your job isn’t to polish it. Your job is to prove it holds up.

Basics that should already be in place (a code sketch follows the list):

  • Enforce a proper schema. If key fields are missing or broken, block it.
  • Recalculate everything you can: totals, dates, timelines, internal references.
  • Where policy exists, code it. Don’t allow ambiguity where a rule is clear.
  • Tie every claim back to its source.
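
A sketch of those checks, assuming a parsed output dict with hypothetical field names:

    from datetime import date

    def validate(output: dict) -> list[str]:
        errors = []
        # Schema: block if key fields are missing.
        for field in ("parties", "effective_date", "line_items", "total_gbp"):
            if field not in output:
                errors.append(f"missing field: {field}")
        # Recompute: the stated total must reconcile with the line items.
        if "line_items" in output and "total_gbp" in output:
            computed = sum(item["amount_gbp"] for item in output["line_items"])
            if abs(computed - output["total_gbp"]) > 0.01:
                errors.append(f"total {output['total_gbp']} != recomputed {computed}")
        # Dates: expiry cannot precede the effective date.
        eff, exp = output.get("effective_date"), output.get("expiry_date")
        if isinstance(eff, date) and isinstance(exp, date) and exp < eff:
            errors.append("expiry_date precedes effective_date")
        return errors  # any entry means the item is blocked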

If the model helped, good. From here on, though, it earns nothing on trust.


How would this work?

Say the model flags a clause as triggering a change of control risk. That’s plausible, but this is a £20m divestment matter, the client has zero-retention rules, and there’s no actual trigger language in the clause.

Here’s how the system handles it:

  • Validator re-parses the clause and finds no termination right
  • Second model checks the classification and flags it as unsupported
  • Reviewer checks against the deal policy, confirms, and overrides the flag

It’s signed off in under 90 minutes with a full audit trail. If that had gone out as-is, someone on the client deal team would have caught it, and next time they’d trust the process a little less.


Use your second model properly

Another model only helps if it’s doing something different. It’s not there to nod along and be another Yes Man.

Give it a role:

  • Model A proposes, Model B checks each field.
  • Use a different model family or a lean verifier tuned for edge cases.
  • Keep the prompt sharp. Don’t ask for a vibe check; ask whether the risk classification is correct based on this clause and policy. Yes or no, cite the span.
  • Scale the coverage with impact. Spot checks are fine for S1. S3 should get full field-by-field confirmation.

If the two models disagree on a critical point, the item stays in the lane.
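
A sketch of that field-level check. Here call_model is a stand-in for whichever client you use, and the JSON shape is an assumption; the point is the forced yes/no verdict with a cited span, and what happens when it comes back unsupported.

    import json

    def verify_claim(call_model, clause: str, policy: str, claim: str) -> dict:
        prompt = (
            "Is this risk classification correct, based only on the clause "
            "and policy below? Reply as JSON: "
            '{"correct": true or false, "span": "exact supporting text or null"}\n'
            f"Claim: {claim}\nClause: {clause}\nPolicy: {policy}"
        )
        return json.loads(call_model(prompt))

    def route(severity: int, verdict: dict) -> str:
        # Disagreement, or a missing span, keeps critical items in the lane.
        if not verdict.get("correct") or not verdict.get("span"):
            return "exception_lane" if severity >= 2 else "spot_check_queue"
        return "release"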


Human sign-off with a clock attached

If you're going to say someone will check it, make sure someone actually does. Build in the SLA.

Set expectations clearly:

  • S3: Eyes on in 2 hours. Matter lead or delegate signs off.
  • S2: Within one working day. Workstream owner.
  • S1: During the sprint. Analyst or QA, or it can be skipped.

Give them a clean view:

  • Client and matter summary
  • Input and proposed output
  • Validation results
  • Verifier outcomes with spans
  • Blocked controls or flagged issues

Let them approve, edit, or reject in one click. Capture a short note if they make changes.

Track three things:

  • Time to first view
  • Time to decision
  • Rework rate

Keep those numbers visible. People pay attention when the clock is ticking.
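
A minimal tracking sketch; the record and field names are assumptions, but the timestamps are the whole game.

    from dataclasses import dataclass
    from datetime import datetime

    @dataclass
    class ReviewRecord:
        created_at: datetime
        first_viewed_at: datetime | None = None
        decided_at: datetime | None = None
        reworked: bool = False  # reviewer edited or rejected

        def hours_to_first_view(self) -> float | None:
            if self.first_viewed_at is None:
                return None
            return (self.first_viewed_at - self.created_at).total_seconds() / 3600

        def hours_to_decision(self) -> float | None:
            if self.decided_at is None:
                return None
            return (self.decided_at - self.created_at).total_seconds() / 3600

    def rework_rate(records: list[ReviewRecord]) -> float:
        return sum(r.reworked for r in records) / max(len(records), 1)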


Codify the rules. Don’t rely on instinct.

Routing logic should live in config or code, not in people’s heads.

Useful rules to start with:

  • Strategic clients over £5m → S3
  • FCA or PRA matter → full verifier, 4-hour incident alert window
  • Zero-retention → no logging, UK-only model
  • Privileged → no retrieval, human must review

Don’t overthink it. Start with a config file and build a service if the surface area grows.
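
As a starting point, here are those rules as a config structure, in Python for consistency with the other sketches (field names assumed; YAML works just as well):

    RULES = [
        {"when": {"client_tier": "strategic", "min_matter_value_gbp": 5_000_000},
         "then": {"severity": 3}},
        {"when": {"regulator": ["FCA", "PRA"]},
         "then": {"verify": "full", "incident_alert_hours": 4}},
        {"when": {"zero_retention": True},
         "then": {"logging": False, "model_region": "UK"}},
        {"when": {"privileged": True},
         "then": {"retrieval": False, "human_review": True}},
    ]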


What a full run looks like

  1. Document is uploaded
  2. System resolves client and matter context, scores impact
  3. Rule triggers the lane, blocks auto-release
  4. Validators run: schema, recompute, policy enforcement
  5. Verifier runs: claim-by-claim against source and policy
  6. Human reviewer gets compact view and makes a decision
  7. Outcome is logged. If rejected, the system routes it to manual
  8. Pattern is recorded for future rule updates

No mess. No debate. No gaps.
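
Put together, the run can be one orchestrating function. A sketch, with each step passed in as a dependency so the earlier pieces slot straight in:

    def process(document, resolve_context, score, validate, verify, review, log):
        ctx = resolve_context(document)        # step 2: client and matter context
        severity = score(ctx)                  # step 2: impact score
        held = severity >= 2                   # step 3: rules block auto-release
        errors = validate(document)            # step 4: schema, recompute, policy
        verdicts = verify(document, severity)  # step 5: claim-by-claim checks
        if held or errors or any(not v.get("correct") for v in verdicts):
            decision = review(document, ctx, errors, verdicts)  # step 6: human
        else:
            decision = "auto_release"
        log(document, ctx, severity, errors, verdicts, decision)  # steps 7 and 8
        return decision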


When something still slips

Things will go wrong. Be ready.

Trigger:
A broken output hits production or breaches SLA.

Contain:

  • Pause similar items
  • Add a visible notice for users

Triage:

  • Assign someone
  • Pull inputs, prompt, citations, validator/verifier logs
  • Confirm all policies were followed

Diagnose:

  • Reproduce the issue
  • Identify the weak point (routing, validation, verification, review)

Fix:

  • Patch the logic
  • Correct affected outputs
  • Run a check to confirm

Communicate:

  • Share what broke, what changed, and who was affected
  • Use a plain, honest template for clients if needed

Learn:

  • Add a test
  • Refine the routing or prompt
  • Record the pattern for next time

You want S3 incidents closed in a day. Then check again 48 hours later to make sure nothing slipped through.


Obvious traps to avoid

  • Assuming confidence scores are meaningful. They aren’t.
  • Mixing propose and verify into one prompt. Split the jobs.
  • Promising human review without scheduling it.
  • Treating every matter the same. Context shifts everything.
  • Ignoring the offline path. Your validators should still run when the model doesn’t.


If you're rolling this out internally

Frame this as basic operational hygiene, not some dramatic AI safety project.

Start small:

  • Choose one workflow that’s already reviewed manually
  • Add two context fields: client and matter
  • Wire a rule that bumps high-value or high-risk matters into S3
  • Add three validators and a simple verifier
  • Define your S3 and S2 SLAs. Start measuring eyes-on time

Run it for a month, see what it catches, and then share the patterns. The value speaks for itself.


You’ll know it’s working when the first real exception flows through the system, gets flagged, reviewed, and resolved before anyone even thinks to escalate.

That’s how you build trust, protect the business, and handle the 1% that really matters.