Human in the Loop Is No Longer Enough for Legal AI

For years, “human in the loop” has been one of the most reassuring phrases in AI governance. It sounds sensible and controlled, particularly in legal work, where most people are rightly uncomfortable with the idea of AI systems acting without human judgement somewhere in the process.

In legal AI, it has become almost a default answer to any concern about risk. Can the system make mistakes? Yes, but there’s a human in the loop. Could the output be wrong? Yes, but there’s a human in the loop. What about accountability, privilege, regulatory duties, professional judgement, client reliance, hallucinations, bias or misuse? Again, human in the loop.

The phrase does a lot of work, probably too much, because it now carries almost every concern people have about AI without forcing us to be precise about what the human is actually doing.

The issue is not that humans shouldn’t be involved. Of course they should. The issue is that the phrase has become so broad that it no longer tells us whether the human is reviewing an output, monitoring a system, approving an action, owning the process, or simply sitting somewhere nearby in a governance diagram.

That matters because legal AI is moving from isolated drafting tools towards more agentic systems: systems that retrieve information, make choices, trigger workflows, draft documents, escalate issues, update records, route tasks and interact with other systems. In that world, saying “there is a human in the loop” is not enough.

We need better language.


The comfort of the loop

The appeal of human in the loop is easy to understand. Most legal teams are not looking to remove human judgement from legal work. They are looking to use AI in a way that is safe, useful and proportionate, while still preserving the parts of legal work that genuinely require expertise, judgement, context and responsibility.

So the loop becomes a kind of governance shortcut. It suggests that AI is not acting alone, that a person remains involved, and that the machine assists while the human decides. That is true in some cases, particularly where a lawyer reviews a generated clause before it goes to a client, a knowledge lawyer checks an AI-produced summary before it enters a precedent bank, or a risk team approves an AI-generated regulatory response before submission.

The problem starts when the same phrase is used for much weaker forms of involvement. A person receiving a dashboard once a week is not in the same position as someone approving each output. A lawyer who can technically override a system, but does not have time to review the underlying reasoning, is not in the same position as someone making the decision themselves. A senior person who is “accountable” for a system they do not operate, monitor or understand is not in the loop in any practical sense.

Those are different governance patterns. They carry different risks, require different controls and create different expectations for the people involved. Calling all of them human in the loop hides the thing we most need to understand.


Why this matters more with agents

The weakness in the phrase becomes more obvious as AI systems become more agentic.

A traditional AI drafting tool has a relatively simple interaction pattern. The user asks for something, the model produces something, the user reviews it, then the user decides what to do next. That is not risk-free, but the shape of control is at least visible.

Agentic systems are different because they can break a task into steps, select tools, call APIs, search repositories, update records, draft messages, generate documents, recommend actions and pass work between systems. The human may not see every step. They may only see the end result. In some cases, they may only see exceptions.

That changes the governance question. It is no longer enough to ask whether a human is involved somewhere in the process. We need to ask where the human sits, what they can see, what they can stop, what they are expected to check, and what happens if they miss something.

A legal AI agent that prepares a first draft of an email is one thing. An agent that interprets a client instruction, identifies the relevant legal process, pulls matter data, decides which jurisdictional workflow applies, drafts documents and proposes filings is something else entirely. Both might be described as having a human in the loop, but that phrase tells us very little about the actual risk position.


The Loop framework

The Loop framework, which I developed, is a way of being more precise about human involvement in AI systems. It separates the broad idea of “human in the loop” into distinct governance patterns, so we can describe whether the human is reviewing work, monitoring a system, or carrying responsibility for the overall process.

The three patterns are:

  1. Human-in-the-Loop
  2. Human-on-the-Loop
  3. Human-accountable-for-the-Loop

Those distinctions may sound subtle, but they are important. Each role can be valid, and in many legal AI systems more than one will be needed, but they should not be treated as interchangeable.


Human-in-the-Loop

Human-in-the-Loop is the most direct form of involvement. The AI produces something, but a human reviews and approves the output before it has effect, meaning the human is part of the active workflow rather than just supervising it from a distance.

In legal work, this is often the most familiar model. A lawyer reviews a draft, a partner approves advice, a compliance officer signs off an escalation, or a legal operations specialist checks a workflow recommendation before it is implemented. This model makes sense where the output has legal, commercial or regulatory significance and where the human can realistically assess the work.

That last point matters more than we often admit. Human-in-the-Loop only works where the human has enough context, capability and time to review properly. If the system produces twenty outputs and the reviewer can only skim them, the governance model may exist on paper but not in practice.

A rubber stamp is not a loop.

Used properly, Human-in-the-Loop remains highly valuable and will be essential for many uses of legal AI, particularly where advice, negotiation strategy, client communication or legal judgement is involved. It is not always the right model, though. It can be slow, expensive and unnecessary for lower-risk tasks, and it can create a false sense of safety if the human review is too shallow to catch meaningful errors.


Human-on-the-Loop

Human-on-the-Loop is different. Here, the system can operate without approval at every step, but humans monitor its behaviour by looking for patterns, exceptions, drift, failures and signs that the system is operating outside expected boundaries.

This is closer to supervision than approval.

In a legal AI context, Human-on-the-Loop might apply to systems that classify documents, route tasks, monitor matter progress, flag risks, extract data or generate internal summaries. A person may not check every individual output before it is used, but they monitor samples, alerts, audit logs and performance metrics to make sure the system remains within its intended operating range.

That can be entirely appropriate. Not every AI output needs a lawyer to approve it. A system that tags incoming documents by type may not require individual legal review for each classification. A system that routes low-risk internal queries may not need a senior lawyer to approve every routing decision. A tool that summarises internal knowledge documents may be better governed through sampling, feedback and escalation than through pre-release sign-off on every output.

But Human-on-the-Loop only works if the monitoring is real. There need to be clear thresholds, audit trails, escalation routes, sampling methods and ownership. Someone must know what they are looking for, someone must have authority to intervene, and someone must notice when performance changes.

Without that, Human-on-the-Loop becomes a passive dashboard that nobody acts on.


Human-accountable-for-the-Loop

The third category is the one legal AI needs to talk about much more seriously.

Human-accountable-for-the-Loop (HAL) is not about reviewing each output or monitoring each process step. It is about named accountability for the system as a whole. A single person owns the design, deployment, controls, limits, escalation points and governance of the AI-enabled process, and while they may not operate the system day to day or review every output, they are accountable for whether the loop itself is properly designed.

That distinction matters. A committee can review, a governance board can challenge, a risk function can set standards and the product team can build and maintain the system, but accountability cannot sit with “the AI working group”, “the platform” or “the business”. For an agentic workflow, someone has to be answerable by name, and that person needs the authority to stop the system if it moves outside its permitted boundaries.

This matters because many AI risks do not arise from a single bad output. They arise from poor system design. The wrong task gets automated, the wrong data is connected, the wrong people are given access, the model is used outside its intended context, the escalation threshold is too high, the workflow hides uncertainty, the audit trail is incomplete, or users trust the system more than they should because the interface makes the answer feel more authoritative than it is.

Those are loop design failures.

In practice, HAL comes down to a set of concrete domains. It asks whether the workflow has a named owner, explicit delegated authority, hard limits, clear escalation routes, reconstructable evidence, live monitoring, scheduled review and a clear understanding of liability. Those domains matter because once AI systems start taking action at scale, accountability cannot be left as a vague governance principle. It has to be designed into the workflow.

For higher-risk legal AI systems, the accountable owner should be able to answer fairly basic questions:

  • Who, by name, is accountable if this system causes harm?
  • Does that person have the authority to pause or revoke the system today?
  • What has the system explicitly been permitted to do?
  • What must the system never do?
  • When does uncertainty route to a human?
  • Can each decision or action be reconstructed later?
  • How is the system monitored in production?
  • When is the system formally reviewed?
  • Where does legal and financial liability sit if something goes wrong?
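One way to make these questions harder to skip is to capture them as a structured intake record and treat every unanswered question as an open gap. The Python sketch below is purely illustrative: `HALRecord`, `CHECKS` and `open_questions` are hypothetical names, not part of any published tool, and the field names are assumptions that simply mirror the questions above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class HALRecord:
    """Hypothetical intake record: one field per accountable-owner question."""
    accountable_owner: Optional[str] = None  # a named person, not a team or "the business"
    can_pause_today: bool = False            # authority to pause or revoke the system
    permitted_actions: list[str] = field(default_factory=list)
    prohibited_actions: list[str] = field(default_factory=list)
    escalation_rule: Optional[str] = None    # when uncertainty routes to a human
    reconstructable: bool = False            # each decision or action can be rebuilt later
    production_monitoring: Optional[str] = None
    review_schedule: Optional[str] = None
    liability_position: Optional[str] = None


# Each check pairs a question with a test of whether the record answers it.
CHECKS = [
    ("Who, by name, is accountable if this system causes harm?",
     lambda r: bool(r.accountable_owner)),
    ("Does that person have the authority to pause or revoke the system today?",
     lambda r: r.can_pause_today),
    ("What has the system explicitly been permitted to do?",
     lambda r: bool(r.permitted_actions)),
    ("What must the system never do?",
     lambda r: bool(r.prohibited_actions)),
    ("When does uncertainty route to a human?",
     lambda r: bool(r.escalation_rule)),
    ("Can each decision or action be reconstructed later?",
     lambda r: r.reconstructable),
    ("How is the system monitored in production?",
     lambda r: bool(r.production_monitoring)),
    ("When is the system formally reviewed?",
     lambda r: bool(r.review_schedule)),
    ("Where does legal and financial liability sit if something goes wrong?",
     lambda r: bool(r.liability_position)),
]


def open_questions(record: HALRecord) -> list[str]:
    """Return the questions the record leaves unanswered, in checklist order."""
    return [question for question, answered in CHECKS if not answered(record)]


# Example: a partially completed intake for a hypothetical summarisation workflow.
draft = HALRecord(
    accountable_owner="Head of Legal Operations",
    can_pause_today=True,
    permitted_actions=["draft internal matter summaries"],
)
for question in open_questions(draft):
    print("Unanswered:", question)
```

A filled-in field only shows that someone wrote an answer down, not that the answer holds up. The value of a check like this is that gaps become visible at intake rather than after something goes wrong.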

Those questions are not theoretical. They are the difference between a controlled legal AI system and a tool that simply feels controlled because a human appears somewhere near it.


Why precision matters

The Loop framework is not meant to create more terminology for its own sake. It is meant to stop organisations using one comforting phrase to describe very different realities.

A system where a lawyer approves every output is not the same as a system where a lawyer reviews a monthly report. A system that escalates uncertain cases is not the same as a system that lets users decide whether something “looks right”. A system with a named accountable owner is not the same as a system where everyone assumes someone else is responsible.

These differences matter for procurement, risk assessment, client assurance, regulatory engagement, professional duties and internal adoption. They also matter commercially. If every legal AI use case is forced into Human-in-the-Loop review, many tools will become too slow or too expensive to be useful. If every system is treated as safe because there is some vague human oversight, organisations will take risks they do not properly understand.

The answer is not maximum friction. It is proportional governance.

Low-risk tasks may need monitoring rather than approval. Higher-risk tasks may need direct review. System-level decisions need accountable ownership. Some tasks should not be automated at all. That is a more useful conversation than simply asking whether there is a human in the loop.
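The proportionality point can be sketched as an explicit mapping from an assessed risk tier to the oversight patterns a process should carry. This is a hypothetical illustration, not a prescribed rule: the two tiers, the names `Risk`, `Oversight` and `required_oversight`, and the assumption that every automated process keeps both monitoring and a named accountable owner are simplifications chosen for the example.

```python
from enum import Enum


class Risk(Enum):
    LOW = "low"
    HIGH = "high"
    DO_NOT_AUTOMATE = "do not automate"


class Oversight(Enum):
    IN_THE_LOOP = "Human-in-the-Loop"               # approves outputs before they have effect
    ON_THE_LOOP = "Human-on-the-Loop"               # monitors samples, alerts and drift
    ACCOUNTABLE = "Human-accountable-for-the-Loop"  # named owner of the whole process


def required_oversight(risk: Risk) -> set[Oversight]:
    """Map an assessed risk tier to the oversight patterns it should carry."""
    if risk is Risk.DO_NOT_AUTOMATE:
        raise ValueError("This task should not be automated at all")
    # Assumption: every automated process keeps a named accountable owner
    # and live monitoring, whatever its tier.
    required = {Oversight.ACCOUNTABLE, Oversight.ON_THE_LOOP}
    if risk is Risk.HIGH:
        # Higher-risk outputs also need direct review before they are used.
        required.add(Oversight.IN_THE_LOOP)
    return required


for tier in (Risk.LOW, Risk.HIGH):
    print(tier.value, sorted(o.value for o in required_oversight(tier)))
```

A real risk assessment would use more tiers and more context. The point of the sketch is only that the required form of human involvement is decided explicitly rather than assumed.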


For legal teams adopting AI, the practical point is simple: stop asking only whether a human is involved and ask what kind of involvement is required.

For each AI-enabled process, legal teams should be able to describe the loop clearly. Is the human approving the output before use? Are they monitoring the system after the fact? Are they accountable for the design of the process? Are all three present? Are none of them really present?

This can be built into AI intake, risk assessments, matter workflows, procurement reviews and client-facing explanations. A contract review assistant might use Human-in-the-Loop for suggested amendments, Human-on-the-Loop for monitoring accuracy across document types, and Human-accountable-for-the-Loop for overall process ownership, with a named owner drawn from the relevant legal team or technology function.

A legal triage model might not need a lawyer to approve every low-risk routing decision, but it may need monitoring for misclassification, bias, unexplained drift and escalation failures. It also needs someone accountable for deciding which matters can be routed automatically and which must be reviewed.

A client-facing legal AI tool may need stricter controls again, particularly where users might mistake generated content for legal advice. The framework does not answer every governance question, but it gives teams a clearer starting point and a more honest way to discuss the controls they actually have.


The risk of accountability theatre

There is a particular danger in legal AI that we should be honest about: organisations may use human involvement as a form of accountability theatre.

A policy says a lawyer remains responsible. A workflow includes an approval button. A dashboard exists. A governance committee meets quarterly. On paper, it all looks very nice: the system is controlled.

In reality, the lawyer may not understand how the AI reached its answer, the approval may be too late in the process to matter, the dashboard may show metrics nobody has defined, and the committee may have no real visibility into operational use.

That is not good governance. It is just liability wrapped in process language.

The Loop framework is a way of challenging that because it forces the next question: what is the human actually doing? Reviewing is not the same as supervising, supervising is not the same as owning, owning is not the same as understanding, and understanding is not the same as being able to intervene.

Legal AI governance needs that level of honesty.


I do not think “human in the loop” will disappear. It is too widely used, and at a high level it still captures something important. Humans should remain central to legal AI systems, but the phrase needs to become the beginning of the conversation rather than the end of it.

The next stage of legal AI adoption will not be won by firms and legal teams that simply say they have humans involved. Everyone will say that. The more interesting question is whether they can explain how human judgement, monitoring and accountability actually work inside the system.

The Loop framework is my attempt to give that design conversation better language, because when AI starts acting across legal workflows, the real question is not whether a human is somewhere in the loop.

It is whether the loop is worth trusting.