The Last Mile Liability Problem in Legal AI
You generate a summary. You read it and it looks right: nothing jumps out, nothing feels off, it does the job. You copy a paragraph into a draft, adjust a sentence or two, and move on.
A few minutes later, part of that same text is sitting in an email to a client.
Nothing about the text changed, but the risk did.
That change is so normal it barely registers. It happens dozens of times a day across most teams, and it happens without any real support from the systems people are using. The output moves from something informal to something relied upon, and the only thing governing that transition is judgement under time pressure.
When use changes the risk
The same piece of text can sit comfortably in one context and become problematic in another. Read privately, it’s harmless. Used to structure thinking, it’s useful. Dropped into a draft, it starts to take shape. Sent externally, it becomes something someone else relies on.
Each step feels incremental, almost invisible, but the consequence increases with every move. What makes this difficult is that nothing in the text itself signals the change. The words don’t suddenly become wrong. They simply start to carry weight.
That weight is where liability sits.
Where most systems are blind
Most firms are still orienting themselves around the model. Is it accurate, is it reliable, can we trust it, how do we test it. All reasonable questions, but they focus on the point where the text is created, not where it starts to matter.
Evaluation tends to stop at correctness. Did it extract the right clauses, did it summarise accurately, did it miss anything obvious. What it rarely considers is whether that same output is safe once it leaves the system and enters real work.
A summary can be broadly correct and still be unsafe in a client context. It may lack caveats, gloss over edge cases, or reflect assumptions that were never made explicit. None of that necessarily shows up in a prompt test. It only becomes visible when someone tries to use the output for something it was never designed to support.
This isn’t a model problem.
You can improve prompts, refine instructions, run more test cases, even move to stronger models. None of that changes the moment where someone decides the output is good enough to act on. That decision sits outside the model and, in most cases, outside the system entirely.
Risk increases as outputs move closer to action
The useful way to think about this is not internal versus client, but proximity to action.
Reading an output introduces context but carries little consequence. Drafting introduces persistence: something starts to take form. Sending creates reliance: someone else now depends on what was written. Decision-making creates exposure: the output shapes a real outcome.
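One way to make that concrete is to write the stages down as an ordered scale a system could actually check against. The sketch below is purely illustrative: the stage names and the threshold are my own shorthand, not a description of any existing product.

```python
from enum import IntEnum

class UseStage(IntEnum):
    """How close an output sits to action; higher means more consequence."""
    READ = 1    # consumed privately, little consequence
    DRAFT = 2   # persisted into a working document
    SEND = 3    # relied on by someone outside the team
    DECIDE = 4  # shapes a real outcome

def requires_extra_review(stage: UseStage) -> bool:
    """A stricter standard applies once an output moves beyond internal use."""
    return stage >= UseStage.SEND
```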
Most legal AI systems flatten these stages into one. Generate text and leave the rest to the user. In practice, they are very different moments, each with a different risk profile, but the system treats them as interchangeable.
That gap is where problems emerge.
What needs to change
If the risk sits at the point of use, then control needs to sit there too. That does not mean introducing heavy process or slowing people down unnecessarily. It means recognising that outputs are not neutral and should not be treated as such.
A useful starting point is to treat outputs as internal by default. Something to work from rather than something to rely on. The moment someone wants to move beyond that, the system should recognise the shift and respond accordingly.
That response does not need to be complex, but it does need to be deliberate. At the point where an output is about to leave the system or support a decision, a different standard should apply. That might involve a simple confirmation, a prompt to review key assumptions, or a requirement for a second pair of eyes. The detail matters less than the presence of that moment.
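As a rough sketch of what that checkpoint could look like, building on the illustrative UseStage scale above: the Output fields, the promote function and the specific checks are hypothetical examples, not a prescription.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Output:
    text: str
    stage: UseStage = UseStage.READ      # internal by default
    assumptions_reviewed: bool = False   # have the key assumptions been checked?
    second_reviewer: Optional[str] = None

def promote(output: Output, target: UseStage) -> Output:
    """Move an output closer to action, applying a stricter standard at the
    point where it is about to leave the system or support a decision."""
    if requires_extra_review(target) and not output.assumptions_reviewed:
        raise PermissionError("Review key assumptions before external use.")
    if target == UseStage.DECIDE and output.second_reviewer is None:
        raise PermissionError("A second pair of eyes is required for decisions.")
    output.stage = target
    return output
```

The detail of the checks matters less than the fact that promotion beyond internal use is an explicit call the system can see, rather than a copy and paste it cannot.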
Friction, in this context, is not something to eliminate entirely. It needs to be placed with intent. Internal use should remain fast and fluid. External use should feel slightly different, not obstructive, but considered.
There are a few practical signals that you’re getting this right:
- outputs are clearly positioned as draft or internal until explicitly moved beyond that
- moving to client-facing use requires a conscious step, not just a copy and paste
- the system captures where outputs are actually used, not just how they were generated (sketched below)
None of these are heavy changes. They are small shifts in how the system behaves at the point where it matters.
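For the third of those signals, capturing where outputs end up, the mechanism can be as light as an append-only usage log keyed by output and stage. Again, this is a sketch under assumed names (record_use, a JSONL log file), not how any particular system does it.

```python
import json
import time

def record_use(log_path: str, output_id: str, stage: UseStage, user: str) -> None:
    """Append a record of where an output was actually used, so usage can later
    be reviewed alongside however the output was generated."""
    entry = {
        "output_id": output_id,
        "stage": stage.name,   # READ, DRAFT, SEND or DECIDE
        "user": user,
        "timestamp": time.time(),
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
```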
The predictable objections
Whenever I raise this, the same few objections come up.
There is a concern that this introduces friction, but that friction already exists in practice. Advice is reviewed, drafts are checked, decisions are scrutinised. The difference is whether those controls sit in someone’s head or are reflected in the system itself.
There is also the view that better models will solve this. Model quality matters, but it does not remove the gap between generation and use. Outputs remain context sensitive, and that sensitivity is exactly where risk emerges.
And then there is always the question of whether people will simply work around it. They will if controls are constant, intrusive, or disconnected from how work actually happens, but they tend not to when controls appear at natural decision points and reflect existing professional expectations.
The risk isn’t when something looks wrong. It’s when something looks fine and slips through into real use without anyone stopping to question it.
That gap, between generation and action, is where the real risk sits.
Firms that focus only on improving outputs will continue to miss it. Firms that recognise and design for that transition will end up with something far more defensible, not because the model is perfect, but because the system understands where it actually matters.