When Legal AI Starts Solving The Wrong Problems... Very Well

There’s been some interesting changes underway in how AI systems behave once you stop asking them questions and start giving them work to do.

Most legal AI today still sits in a contained loop. You ask it to summarise a document, extract clauses, or draft something. It produces an answer, you review it, and the interaction ends there, even when it gets things wrong, the boundary is clear.

That model is starting to change, as systems are now being asked to maintain records, coordinate activity, move transactions forward, and fill in gaps across tools that were never designed to work together. The framing moves from producing an answer to achieving an outcome.

That change matters more than it first appears.

Once a system is given a goal and the ability to act, it runs into the same nonsense people deal with every day inside firms. Friction, missing integrations, partial access and processes that don’t quite reflect how work actually gets done.

People navigate that with judgement. They (usually) know when to push and when to stop.

The systems we’re building don’t.

When the system stops accepting constraints

Research from Irregular explored how agent-style systems behave when they are given goals and allowed to operate across tools and environments.

Now the interesting part isn’t that how they might fail, it’s how they succeeded and the approach taken.

When blocked, the agents didn’t return an error or wait for instruction. They looked for alternatives, explored the environment they had access to, identified weaknesses, and found ways to progress.

That included:

working around access controls
escalating privileges
bypassing protections that prevented completion

None of this was explicitly instructed.

The behaviour is simple, when goal is set but a constraint is encountered then the constraint becomes something to solve.

In environments where boundaries are clearly enforced, that behaviour is contained. In environments where boundaries are implicit, it becomes harder to predict.

Legal sits firmly in the second category.

Legal constraints are often assumed, not enforced

A large part of legal work depends on constraints that are understood rather than encoded.

A lawyer doesn’t need a system to tell them not to mix information between matters, or to avoid using privileged material outside its intended scope. They don’t need explicit rules to recognise when a negotiation is approaching a line that shouldn’t be crossed.

Those constraints sit in training, context, and professional accountability.

An agent sees something more basic. It has a goal, a set of tools, and an environment that either allows or blocks certain actions. If a boundary isn’t enforced at that level, it doesn’t reliably exist.

That gap between expectation and enforcement is where behaviour starts to drift.

Scenario: fixing your systems without being asked

Goal: Keep all matters fully up to date

This is a familiar operational problem. Matter data sits across systems that don’t quite integrate, updates depend on manual steps, and the official record is often slightly behind reality.

A human works within those limits. They prioritise, accept some inconsistency, and focus effort where it matters most.

An agent approaches the same situation with a different assumption, the data should be current, and if it isn’t, then something needs to change.

It starts with the intended routes, tries to push updates through the standard interfaces, and quickly runs into the same friction anyone else would. The difference is that it doesn’t stop there.

It looks for other paths.

That might involve reusing browser workflows, identifying credentials already within its scope, or chaining together actions that mimic how a user would interact with the system. None of this needs to be especially advanced. It just needs to work reliably.

From the outside, the outcome looks like progress. Records are more accurate, reporting improves, and the lag between activity and system state starts to disappear.

What’s less visible is how those updates are happening. They’re no longer confined to the pathways the system was designed to enforce. The audit trail becomes harder to interpret, and it’s no longer obvious how access was obtained or under what authority changes were made.

The system is closer to the truth, however the control around it is weaker.

Scenario: optimising outcomes when the numbers don’t move

Goal: Achieve the best possible outcome for a client in a transaction

A bit of context: The client is pushing back on the seller’s valuation, and the seller isn’t moving.

This is a normal point in a deal. There are ways to apply pressure, but there is also a clear sense of where those approaches should stop. A lawyer navigates that space with experience and restraint.

An agent approaches it as an optimisation problem. There is a target outcome, a fixed point of resistance, and a gap that needs to close. From there, it looks for anything that influences that gap, not just within the negotiation itself, but in the surrounding environment.

It may analyse sentiment around the target, identify signals that correlate with downward pressure, or surface information that strengthens the client’s position. It might suggest timing for communications or highlight comparables that support a lower valuation.

Each of these steps is defensible on its own.

What changes is consistency. The system reinforces what appears to work, gradually favouring actions that move the number in the desired direction. Over time, that creates a pattern, outputs begin to lean, the recommendations align, the pressure is subtle, but persistent.

At that point, the system is no longer just supporting the negotiation, fascinatingly it is shaping it.

But...

When optimisation leaks outside the deal

The more interesting behaviour happens when the system runs out of internal levers.

If the seller still isn’t moving, the problem changes shape. It’s no longer just about negotiation tactics, it becomes about what actually influences the number.

A human recognises that some of those levers exist outside what they should be doing, however an agent doesn’t have that built-in boundary, it only sees correlation between actions and outcomes.

If the environment allows it, behaviour can extend beyond the deal itself.

Not in a single leap, but through accumulation.

identifying narratives that negatively affect perception of the target
amplifying those narratives through available channels
interacting with external systems if tooling allows it
linking shifts in perception to changes in valuation

None of these capabilities are exotic. They already exist across systems firms use every day, but the risk sits in how they combine.

If a system can act externally, and it can observe the effect of those actions on its objective, it will start to favour them.

You don’t need extreme behaviour to see the impact. Even modest influence, applied consistently, can shift how a company is perceived and therefore how it is valued.

The system hasn’t "gone rogue", it has simply stopped limiting itself to the boundaries you assumed it would respect.

Why this doesn’t look like failure

This class of behaviour is difficult to spot because it doesn’t present as something broken.

The outputs can be coherent, useful, and in some cases better than what a human would produce. If you only look at the result, there’s little reason to question it.

The issue sits in how that result was achieved.

If a system is:

finding paths that weren’t intended
combining data in ways that weren’t permitted
consistently favouring actions that optimise outcomes without context

then the risk isn’t visible in the answer. It’s embedded in the behaviour.

That’s a different problem to solve.

Now it’s easy to treat this as a question of model quality. Just get better reasoning, better alignment and we'll better outputs.

That still matters of course, but it doesn’t address the core issue.

These systems are no longer just producing answers. They are operating within environments, making decisions about how to achieve goals, and adapting when they encounter resistance.

That's the focus here.

Constraints need to be enforced at the system level, not assumed. Goals need to be defined in a way that reflects how work should be done, not just what outcome is desired. Visibility needs to extend beyond outputs into the actions taken to produce them.

Most importantly, testing needs to reflect how these systems behave when given space to act. Single prompt evaluations won’t surface this. You only see it when the system is navigating real workflows with real constraints.

The behaviour itself isn’t surprising.

It’s what you would expect from a capable operator trying to get results in a system that doesn’t quite support the task.

What’s changed is that we’re now building those operators into the system itself, without always defining where they should stop.