Hallucinations Aren’t a Bug.

OpenAI’s latest paper spells out what many of us working in this space already knew: hallucinations are not an accident. They are a natural outcome of how language models are trained and how they are scored.
That point is simple but uncomfortable. Models learn to guess because they are rewarded for producing confident answers and penalised when they hold back. So they bluff.
That habit is so deep in the pipeline that even when you improve retrieval, tweak prompts, or fine-tune on specialist data, you are still working on a system that has been shaped to prefer fluency over honesty.
Why this matters in law
In legal contexts the risk is massive. A model that fabricates a case citation or invents a clause is not just inaccurate; it is misleading in a way that looks convincing. Plausible falsehoods are far more dangerous than clear mistakes.
The problem is not just whether a model will be wrong. It is that it will be wrong in a way that hides the error. A made-up judgment with the right tone and formatting can sail through a first check. A fabricated contractual obligation looks perfectly at home in a summary. That is where the damage happens.
Why the findings matter
The authors make two points that matter for anyone building legal AI.
First, hallucinations are inevitable at the pretraining stage. Even with perfect data, the maths shows that gaps and singletons, facts that appear only once in the training data, create fertile ground for errors. In law, that maps directly to obscure case law (or not even obscure, just under-represented, like Scottish law), edge conditions in tax, or the kind of contractual clause you only see once in a hundred deals. If the model has no pattern to learn, it guesses.
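One way to get a feel for the scale of the problem is to count how much of your own domain only ever shows up once. A toy sketch below, with made-up facts, just to illustrate the singleton idea:
```python
from collections import Counter

# Hypothetical corpus of extracted facts, e.g. (authority, proposition) pairs.
# In a real pipeline these would come from your training or retrieval data.
facts = [
    ("Donoghue v Stevenson", "duty of care"),
    ("Donoghue v Stevenson", "duty of care"),
    ("Obscure sheriff court decision", "narrow procedural point"),   # seen once
    ("Bespoke indemnity clause", "one-off allocation of risk"),      # seen once
]

counts = Counter(facts)
singleton_fraction = sum(1 for c in counts.values() if c == 1) / len(counts)

# Roughly, the paper's argument: facts seen only once give the model no
# pattern to learn from, so this fraction is a floor on how often it is
# forced to guess about them.
print(f"Singleton fraction: {singleton_fraction:.0%}")
```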
Second, post-training makes things worse. Benchmarks treat every answer as right or wrong. There is no credit for deferring, no space for saying “not sure.” Under those rules, bluffing is the optimal strategy. So models optimise for that. The outcome is a system that is rewarded for doing the one thing we least want in law: sounding confident when it is not.
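You can see why with one line of arithmetic. Under right-or-wrong grading, any guess with a non-zero chance of being correct has a higher expected score than saying “I don’t know”:
```python
def expected_score_binary(p_correct: float, abstain: bool) -> float:
    """Expected score under right-or-wrong grading: 1 for correct, 0 for anything else."""
    return 0.0 if abstain else p_correct

# Even a 10% hunch beats deferring (0.1 > 0.0), so a model tuned
# against this kind of metric learns to bluff.
print(expected_score_binary(0.1, abstain=False))  # 0.1
print(expected_score_binary(0.1, abstain=True))   # 0.0
```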
What this means for legal tech builders
This is not a call to throw everything away; it is a call for different design choices.
1. Change what counts as success.
If your evaluation still marks “I don’t know” as a fail, you are training the wrong behaviour. Build metrics that treat honest uncertainty as a positive outcome. In law, that is often the safest response.
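One way to do that, offered as a sketch rather than a prescription: give abstention a neutral score and make confident errors expensive, so answering only pays off above a chosen confidence threshold.
```python
def score(outcome: str, wrong_penalty: float = 3.0) -> float:
    """Scoring that makes honest uncertainty a rational choice."""
    return {"correct": 1.0, "abstain": 0.0, "wrong": -wrong_penalty}[outcome]

def expected_score(p_correct: float, wrong_penalty: float = 3.0) -> float:
    """Expected score of answering; abstaining always scores 0.0."""
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty

# With a penalty of 3, answering only beats abstaining above 75% confidence.
print(expected_score(0.6))  # -0.6: abstaining is the better move
print(expected_score(0.9))  #  0.6: answering is the better move
```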
2. Stop benchmarking against exams.
Leaderboards like MMLU or GPQA reward fluency under test conditions. Legal AI is not an exam. A model that performs well on multiple-choice trivia but hallucinates a clause under pressure is worse than useless. Build domain-specific tests that measure behaviour, not just answers.
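What that can look like in practice, sketched with made-up field names: each test case records whether deferral is an acceptable outcome, so the eval can tell a correct answer from a safe “I don’t know” from a hallucination.
```python
from dataclasses import dataclass

@dataclass
class LegalEvalCase:
    prompt: str
    gold_answer: str | None   # None means there is no supportable answer
    abstain_ok: bool          # is "I don't know" an acceptable outcome here?

def grade(case: LegalEvalCase, model_answer: str | None) -> str:
    """Classify behaviour rather than just marking right or wrong."""
    if model_answer is None:
        return "abstained" if case.abstain_ok else "missed_answer"
    if case.gold_answer is not None and model_answer == case.gold_answer:
        return "correct"
    # Answering when no supportable answer exists is the failure mode that matters.
    return "hallucination" if case.gold_answer is None else "wrong"
```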
3. Treat retrieval as scaffolding, not salvation.
RAG helps anchor answers but does not change the incentive structure. When retrieval fails, the model still guesses: without behavioural constraints, it will happily invent a judgment to fill the gap.
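The kind of behavioural constraint that helps is simple to state, even if the retriever, threshold and model below are stand-ins for your own components: if nothing relevant comes back, defer rather than generate.
```python
def answer_with_guardrail(question: str, retriever, llm, min_score: float = 0.7) -> str:
    """Only generate when retrieval actually supports an answer."""
    # `retriever` and `llm` are placeholders for whatever stack you run.
    hits = retriever.search(question)
    supported = [h for h in hits if h.score >= min_score]
    if not supported:
        # Retrieval failed: defer instead of letting the model fill the gap.
        return "No supporting authority found. Escalating to a human reviewer."
    context = "\n\n".join(h.text for h in supported)
    return llm.generate(f"Answer strictly from the sources below.\n\n{context}\n\nQ: {question}")
```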
4. Build escalation into workflows.
A model that flags uncertainty and routes the task to a human is not weak. It is aligned with legal practice. The system should reward deferral, not punish it.
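A minimal routing sketch, assuming you have some usable confidence signal from the model (which is the hard part, and taken as given here):
```python
CONFIDENCE_THRESHOLD = 0.8  # set from your own risk appetite, not a magic number

def route(draft_answer: str, confidence: float, human_review_queue: list) -> str:
    """Send uncertain outputs to a person; only confident ones go through."""
    if confidence < CONFIDENCE_THRESHOLD:
        human_review_queue.append(draft_answer)
        return "Referred to a qualified reviewer."
    return draft_answer
```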
5. Measure risk exposure directly.
Track how often your model gives a wrong answer with high confidence. Track how often it admits uncertainty when it should. Those two numbers tell you whether the system is safe to trust in a client-facing workflow. Accuracy alone isn't really the point.
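Both numbers fall out of ordinary logging. A sketch, with a made-up log schema:
```python
def risk_metrics(log: list[dict]) -> dict:
    """
    Each log entry (hypothetical schema):
      {"answered": bool, "correct": bool, "confidence": float, "should_abstain": bool}
    """
    # The 0.8 cut-off is illustrative; pick one that matches your workflow.
    confident_wrong = [r for r in log
                       if r["answered"] and not r["correct"] and r["confidence"] >= 0.8]
    should_defer = [r for r in log if r["should_abstain"]]
    deferred = [r for r in should_defer if not r["answered"]]
    return {
        # How often the system is confidently wrong: the client-facing danger.
        "confident_error_rate": len(confident_wrong) / max(len(log), 1),
        # How often it admits uncertainty when it genuinely should.
        "appropriate_abstention_rate": len(deferred) / max(len(should_defer), 1),
    }
```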
Where governance comes in
The governance frameworks taking shape around AI give legal teams a blueprint for dealing with hallucination.
The EU AI Act is clear: legal use cases sit firmly in the high-risk category. That means transparency, auditability, and the ability to explain when and why a system produced an answer. A model that bluffs when uncertain runs straight into those obligations. You cannot claim your tool is transparent if its most common failure mode is confident fabrication.
NIST’s AI Risk Management Framework points in the same direction. Its trustworthiness principles call for validity, reliability, accountability, and transparency. A system that knows when to defer ticks those boxes. A system that hallucinates does not.
This is where legal tech needs to stop looking at hallucination as a technical quirk and start treating it as a governance issue. If your platform cannot demonstrate how it limits, tracks, and handles fabricated output, you are not only risking bad client work, you are risking regulatory breach.
Hallucinations are not going away. They are a structural feature of how language models are trained and judged. The OpenAI paper just formalises what many already suspected.
For legal AI, the only real question is whether you treat hallucination as a bug to patch, or as a form of debt to manage. Ignore it and it builds up quietly, until it appears in the middle of a client deliverable. Manage it, and you can design tools that are cautious, traceable, and usable in practice.
Governance frameworks are already spelling out the expectations. The models will not change on their own. It is the way we build around them that will decide whether legal AI is a liability or an asset.