Agentic Legal AI Has A Proportionality Problem

For the past couple of years, most legal AI systems have operated in a fairly constrained way. A lawyer uploads documents, a prompt runs, a summary appears, perhaps clauses are extracted and perhaps a draft response gets generated.

Even when firms described these as "AI workflows", the orchestration was still fundamentally human. The lawyer decided what to analyse, what order to do things in, when enough work had been done, whether additional effort was commercially justified and whether the final output was proportionate to the task itself.

The AI performed tasks while the human coordinated the work. That is beginning to change.

The industry is now moving toward agentic systems capable of:

  • selecting tools
  • invoking workflows autonomously
  • dynamically retrieving context
  • validating outputs against policy
  • escalating reasoning to larger models
  • coordinating multi-step tasks with increasingly limited human orchestration

Most discussion around this still focuses on hallucinations and prompt quality. I think the more important issue is scale: not scale in the traditional infrastructure sense, but operational scale. Once systems begin coordinating work autonomously, small inefficiencies stop being localised and start becoming systemic.


One slightly over-cautious lawyer is manageable. So is one slightly inefficient process, or one associate spending too long perfecting a piece of work. Autonomous systems, however, change the scale characteristics of those behaviours completely.

A model can retrieve hundreds of documents, recursively expand context windows, generate multiple alternative drafts, rerun validation loops, invoke stronger models and repeatedly escalate uncertainty without ever naturally recognising that the workflow itself is becoming commercially disproportionate.

That matters because legal culture already rewards:

  • defensibility
  • caution
  • escalation
  • additional review
  • qualification
  • exhaustive analysis

Without explicit controls, agentic systems naturally drift toward additional retrieval, additional validation and additional reasoning depth, particularly inside legal environments where caution and defensibility are already culturally reinforced.

I have already seen versions of this behaviour while experimenting with coding agents and orchestration chains. Once models become responsible for coordinating work, they can become trapped in reasoning loops they effectively have no chance of escaping. They continue reformulating plans, retrying failed approaches, retrieving additional context and exhausting tokens long after a human engineer would have stopped and reassessed the situation.

Nothing is technically broken when this happens. The system is simply following the incentives it was given: continue reasoning, continue validating and continue attempting completion.

The problem is that legal workflows already contain large amounts of ambiguity, uncertainty and defensive process design. Agency amplifies all of it.


Proportionality may become one of the defining operational challenges of legal AI over the next few years.

Human lawyers instinctively apply proportionality.

Nobody sensible spends six hours manually reviewing a £40 NDA because experienced lawyers understand commercial reality, client expectations, matter value, urgency and when "good enough" is actually correct operationally.

Current agentic systems generally aren't as sensible.

Left unconstrained, they optimise toward:

  • completeness
  • confidence
  • additional reasoning
  • additional evidence gathering
  • minimising theoretical risk

That sounds responsible on paper, but operationally it creates a very different problem. Workflows become bloated, escalation becomes excessive, retrieval expands unnecessarily and infrastructure costs quietly increase in the background.

In some ways the behaviour resembles an enthusiastic junior lawyer spending three days trying to produce the perfect answer to something that realistically required an hour of commercially sensible judgement. Not because they are careless, but because they are trying extremely hard to produce excellent work product. Traditional legal culture often rewards that behaviour, whereas an autonomous system can apply it continuously and at scale.

This is where token consumption becomes particularly interesting. In many agentic systems, rising token usage is not purely a technical characteristic; it often becomes a proxy for organisational indecision.

An additional reasoning loop frequently reflects something unresolved elsewhere in the organisation, whether that is unclear policy, inconsistent risk appetite, lack of trust in outputs, poor workflow design or simply an inability to define what an acceptable threshold actually looks like in practice.

The system keeps spending because the organisation itself has not fully decided what "sufficient" means, which makes this far less of a pure model problem and much more of a governance problem.


Governance is shifting upward into the orchestration layer

Most firms still govern AI as though the model itself is the primary risk surface. The conversations are usually centred around questions such as:

  • Is Claude approved?
  • Is OpenAI approved?
  • Can this data leave the jurisdiction?
  • Is the provider secure?
  • Is this prompt compliant?

Those questions still matter, but agentic systems fundamentally change where the real operational risk sits.

Once systems begin coordinating work autonomously, the more important questions become:

  • Why did the workflow take this path?
  • Why did it escalate?
  • Why did it continue spending?
  • Why was additional retrieval triggered?
  • Why was this model selected?
  • What evidence was relied upon?
  • Why was the output released?
  • Why did the stop condition fail?

The orchestration layer increasingly becomes the real control surface because it determines how reasoning is routed, when escalation occurs, what evidence is retrieved, what permissions apply and when autonomous workflows are allowed to stop or release outputs.

Two firms using the exact same frontier model may behave completely differently because their orchestration reflects different institutional judgement, operational priorities and risk appetite. Over time, that may become a more important differentiator than the underlying model itself.


I suspect explicit proportionality controls become inevitable as agentic systems mature.

Legal AI systems cannot simply optimise for maximum reasoning depth. They need to optimise for proportionate reasoning by understanding the value of the matter, the cost of additional reasoning, the consequence of failure and when further escalation is no longer commercially justified.

A practical proportionality framework probably includes several things.

Economic boundaries

Every workflow should have:

  • expected cost ranges
  • escalation thresholds
  • hard stop limits
  • cost-to-matter proportionality rules

A £50 fixed-fee task should not trigger £14 worth of autonomous reasoning chains because a workflow became trapped in recursive validation loops.
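
As a rough illustration of how those four boundaries might fit together, here is a small sketch. The `CostPolicy` structure, the `check_spend` function and every threshold value are assumptions made for the example, not recommended figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostPolicy:
    """Hypothetical economic boundaries for one workflow type."""
    expected_max: float          # upper end of the expected cost range
    escalation_threshold: float  # spend at which a human must approve more
    hard_stop: float             # absolute ceiling for the workflow
    max_share_of_fee: float      # proportionality rule: spend / matter fee


def check_spend(policy: CostPolicy, spent: float, matter_fee: float) -> str:
    """Return 'continue', 'warn', 'escalate' or 'stop' for the current spend."""
    proportional_cap = policy.max_share_of_fee * matter_fee
    if spent >= policy.hard_stop or spent > proportional_cap:
        return "stop"
    if spent >= policy.escalation_threshold:
        return "escalate"
    if spent > policy.expected_max:
        return "warn"
    return "continue"


# Illustrative values only: a low-value fixed-fee task capped at 5% of the fee.
FIXED_FEE_POLICY = CostPolicy(
    expected_max=0.50,
    escalation_threshold=1.50,
    hard_stop=5.00,
    max_share_of_fee=0.05,
)
```

With these made-up numbers, a £50 matter is capped at £2.50 of autonomous spend, so `check_spend(FIXED_FEE_POLICY, 14.00, 50.00)` returns `"stop"` well before a runaway validation loop reaches £14.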

Matter-sensitive reasoning depth

Not every workflow deserves the same level of reasoning intensity. Internal administrative triage, low-risk commercial review, regulatory investigations and privileged litigation analysis should all operate under very different reasoning, retrieval and escalation policies.

Many current systems effectively apply the same cognitive posture everywhere, which becomes operationally dangerous once workflows begin scaling autonomously.
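
One way to avoid a single cognitive posture is to make the policy an explicit lookup keyed by matter type. The sketch below uses the four workflow types mentioned above. The `MatterTier` and `ReasoningPolicy` names and all of the limits are placeholders I have invented to show the shape of the idea.

```python
from dataclasses import dataclass
from enum import Enum


class MatterTier(Enum):
    ADMIN_TRIAGE = "internal administrative triage"
    LOW_RISK_COMMERCIAL = "low-risk commercial review"
    REGULATORY = "regulatory investigation"
    PRIVILEGED_LITIGATION = "privileged litigation analysis"


@dataclass(frozen=True)
class ReasoningPolicy:
    """Hypothetical limits applied to every workflow in a tier."""
    max_reasoning_passes: int
    max_retrieved_docs: int
    allow_larger_model: bool
    human_review_required: bool


# Placeholder numbers: the point is that the tiers differ, not these values.
POLICIES = {
    MatterTier.ADMIN_TRIAGE: ReasoningPolicy(1, 3, False, False),
    MatterTier.LOW_RISK_COMMERCIAL: ReasoningPolicy(2, 10, False, False),
    MatterTier.REGULATORY: ReasoningPolicy(5, 50, True, True),
    MatterTier.PRIVILEGED_LITIGATION: ReasoningPolicy(6, 100, True, True),
}


def policy_for(tier: MatterTier) -> ReasoningPolicy:
    """Look up the limits for a tier; an unknown tier raises KeyError."""
    return POLICIES[tier]
```

Failing loudly on an unclassified matter is a deliberate choice in this sketch: a workflow with no tier should not silently inherit the most permissive policy.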

Retrieval proportionality

One of the largest hidden cost multipliers in agentic systems is uncontrolled retrieval. Agents often behave as though more context automatically improves outcomes, when in practice more context frequently introduces more noise, more contradictions, larger validation burdens and significantly higher token consumption.
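
A simple way to express retrieval proportionality is to cap context by relevance, document count and token budget at the same time. The sketch below assumes an upstream retriever has already produced a relevance score between 0 and 1 and a token count for each candidate. `Doc`, `select_context` and the default thresholds are hypothetical.

```python
from typing import List, NamedTuple


class Doc(NamedTuple):
    doc_id: str
    relevance: float  # assumed 0..1 score from an upstream retriever
    tokens: int       # assumed token count for the document


def select_context(candidates: List[Doc], token_budget: int,
                   min_relevance: float = 0.5, max_docs: int = 5) -> List[Doc]:
    """Pick the most relevant documents that fit within the budget."""
    chosen: List[Doc] = []
    used = 0
    for doc in sorted(candidates, key=lambda d: d.relevance, reverse=True):
        # Candidates are sorted, so everything after this point is weaker.
        if doc.relevance < min_relevance or len(chosen) >= max_docs:
            break
        # Skip a document that would exceed the budget; a smaller one may fit.
        if used + doc.tokens > token_budget:
            continue
        chosen.append(doc)
        used += doc.tokens
    return chosen
```

The design choice worth noting is the relevance floor: without it, an agent with spare budget will fill that budget with marginal material simply because it can.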

Escalation discipline

Agents should justify escalation economically and operationally, not simply because confidence dropped marginally between reasoning passes. The more important question is whether additional reasoning materially changes the likely business outcome.

That is much closer to how experienced lawyers actually operate.
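
A crude version of that test can be written as an expected-value check: escalate only when the chance of changing the outcome, multiplied by what is at stake, exceeds the cost of escalating. This is a sketch under strong assumptions. In particular, it assumes the system can produce a usable estimate of `prob_outcome_changes`, which is itself a hard problem, and the function name and the `min_prob` floor are invented for illustration.

```python
def should_escalate(prob_outcome_changes: float, value_at_stake: float,
                    escalation_cost: float, min_prob: float = 0.05) -> bool:
    """Escalate only if the expected benefit exceeds the cost of escalating.

    prob_outcome_changes: estimated chance that more reasoning changes the
        business outcome (an assumed input, not something models report).
    value_at_stake: financial consequence of getting the outcome wrong.
    escalation_cost: cost of the extra reasoning or larger model.
    min_prob: below this, the change is treated as immaterial.
    """
    if prob_outcome_changes < min_prob:
        return False  # a marginal confidence wobble is not a reason to escalate
    expected_benefit = prob_outcome_changes * value_at_stake
    return expected_benefit > escalation_cost
```

Under this rule a marginal drop in confidence never triggers escalation on its own, however valuable the matter, because the gate is material change rather than uncertainty.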

Stop conditions

This may become one of the most important controls in agentic legal systems. Agents need retry limits, reasoning depth limits, timeout conditions, budget ceilings, diminishing-return detection and clear human escalation triggers because models do not naturally recognise when a task is no longer commercially worth pursuing.

Humans do. Well, at least most of the time.
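
Several of those controls can live in one small guard object that the orchestrator consults after every step. The sketch below is one possible shape, with hypothetical names and default limits. It assumes each step reports a cost and some quality score, and it treats "diminishing returns" as several consecutive steps with negligible score improvement.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StopGuard:
    """Hypothetical stop-condition checker; all defaults are placeholders."""
    max_steps: int = 8
    budget_ceiling: float = 5.0
    timeout_seconds: float = 300.0
    min_gain: float = 0.01   # smallest score improvement that counts
    patience: int = 2        # consecutive low-gain steps tolerated
    spent: float = 0.0
    steps: int = 0
    scores: List[float] = field(default_factory=list)

    def record(self, cost: float, score: float) -> None:
        """Register one completed step with its cost and quality score."""
        self.steps += 1
        self.spent += cost
        self.scores.append(score)

    def reason_to_stop(self, elapsed_seconds: float) -> Optional[str]:
        """Return why the workflow should stop, or None to continue."""
        if self.spent >= self.budget_ceiling:
            return "budget ceiling reached"
        if self.steps >= self.max_steps:
            return "step limit reached"
        if elapsed_seconds >= self.timeout_seconds:
            return "timeout"
        recent = self.scores[-(self.patience + 1):]
        if len(recent) == self.patience + 1:
            gains = [later - earlier for earlier, later in zip(recent, recent[1:])]
            if all(gain < self.min_gain for gain in gains):
                return "diminishing returns"
        return None
```

Whatever reason comes back is a natural trigger for handing the matter to a human, rather than letting the agent try one more reformulation.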


I still think the industry is underestimating how significant this transition is.

The move from "AI assistant" to "AI coordinator" changes the problem space entirely. The difficult questions are no longer just prompt engineering, hallucinations, summarisation quality or extraction accuracy. They become questions of economic governance, workflow control, proportionality, escalation logic, permissions, observability, auditability and operational accountability.

The firms that succeed with agentic legal AI probably will not simply be the firms with the best models. They will be the firms that understand how to contain autonomous operational scale before the economics and governance burden of autonomous legal workflows begin scaling faster than the organisation itself can realistically supervise.