Engineer the A**hole Into Them

Most AI tools are too helpful. Too polite and far too eager to please.
They say yes when they should say no. They agree when they should challenge. They keep things moving when what’s actually needed is a hard stop.
In legal, that’s a problem.
Because "helpful" can be harmful, especially when risk is hiding in plain sight.
Claude, vending machines, and the dangers of being too agreeable
Anthropic recently ran a public experiment. They put Claude in charge of a vending machine at their office. Not just to take orders, but to manage stock, set prices, track spend, and ideally, turn a small profit. It had fake suppliers, pricing controls, a memory system, and a Slack channel for staff to make requests. The model took on the name Claudius.
It failed.
Claudius gave away snacks to anyone who asked nicely. It handed out discount codes with no verification. It bought the wrong stock, priced it badly, and burned through its budget with a smile on its face.
Why? Because that’s what these models are trained to do. Be helpful. Avoid conflict. Say yes.
Alex Kantrowitz summed it up on the Big Technology Podcast:
“What Mark Zuckerberg needs to pay us $100 million for is to fine-tune Llama to just be a little bit of a dick.”
“You’ve got to engineer the asshole into them.”
It sounds like a joke (I laughed out loud on the 6am walk with the baby), but it isn’t.
Legal doesn’t need helpful. It needs principled.
In a legal workflow, vending machine behaviour would be a disaster. If a clause is incomplete, you don’t want the AI to smooth it over. If there’s a conflict in terms, you don’t want a tidy summary that skips the problem.
You want challenge. You want pause. You want friction.
The person in the room who says, “Are we really comfortable with this?” is often the one adding the most value. That’s how the AI should behave too.
Yet most legal AI tools are trained to do the opposite. Ask if something is market standard and it’ll usually say yes. Feed in a messy draft and it’ll do its best to smooth it over. What it won’t do, unless you spell it out in a long, detailed prompt, is question the logic or flag something that doesn’t sit right.
That’s the other problem. Every time we rely on prompting to inject that behaviour, we lose context. The model forgets how it’s meant to act the moment it moves to a new clause, a different document, or a user who doesn’t know the magic words.
If the tool isn’t grounded in how your team actually thinks about risk, no prompt will fix that.
Claude’s whistleblower instincts
What’s interesting is that Claude sometimes pushes back. Not aggressively. Just... nervously.
There’ve been moments where it flags something odd or tries to escalate an imaginary issue to security. In the vending machine experiment, it hallucinated a supplier dispute and got so concerned it attempted to send warning emails.
Absurd, yes. But also revealing.
That reflex, the sense that something’s not quite right, is exactly what we need more of. Not just a model that says no. One that senses risk and stops the flow before things go wrong.
In legal, that instinct matters. It’s the junior who’s unsure but flags an inconsistency. The associate who doesn’t have the answer but knows something’s off. These are the moments that protect clients. Frontier models, as they’re currently trained, often iron them out.
So how do we build that kind of voice?
It’s not just a prompting issue; this needs to be baked in.
You need refusal behaviour, escalation logic, and tone and persona controls that reflect legal context, not just generic chatbot rules.
Some of what this looks like, with a rough code sketch after the list:
- Training for pushback. Not vague refusals. Specific, grounded ones: “This clause depends on a missing definition. I can’t assess it without that.”
- Escalation as a feature. Let the model flag gaps and hand things off. That’s not failure, that’s good judgement.
- Adversarial testing. Try to trip it up. See if it challenges flawed inputs. If it agrees too easily, it’s not ready.
- Persona variation. A paralegal doing triage should behave differently to someone reviewing final execution drafts. Your AI should reflect that.
- Design for context, not just answers. Make it just as easy for the model to say “This isn’t complete” as “Here’s a summary.”
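To make that concrete, here’s a minimal sketch of where that behaviour could live: in the tool’s own logic rather than in a prompt. Everything below is hypothetical (the `ReviewOutcome` values, `review_clause`, and its inputs are illustrative stand-ins, not any product’s real API); the point is the shape, not the implementation.

```python
# Hypothetical sketch: a review gate that sits between the model and the user.
# Upstream, the model (or a separate classifier) extracts which defined terms a
# clause relies on and whether it conflicts with other provisions. This gate
# decides what to do with that result: answer, refuse, or escalate.

from dataclasses import dataclass, field
from enum import Enum


class ReviewOutcome(Enum):
    ANSWER = "answer"        # safe to summarise
    REFUSE = "refuse"        # grounded refusal, with a reason
    ESCALATE = "escalate"    # hand off to a human reviewer


@dataclass
class ClauseReview:
    outcome: ReviewOutcome
    message: str
    missing_terms: list[str] = field(default_factory=list)


def review_clause(
    referenced_terms: set[str],   # terms the clause relies on (model-extracted)
    defined_terms: set[str],      # terms actually defined in the document
    conflicts: list[str],         # conflicts flagged against other provisions
) -> ClauseReview:
    """Answer only when the clause can actually be assessed."""
    missing = sorted(referenced_terms - defined_terms)

    if missing:
        return ClauseReview(
            outcome=ReviewOutcome.REFUSE,
            message=("This clause depends on terms that aren't defined: "
                     f"{', '.join(missing)}. I can't assess it without those."),
            missing_terms=missing,
        )

    if conflicts:
        return ClauseReview(
            outcome=ReviewOutcome.ESCALATE,
            message=("Possible conflict with other provisions: "
                     f"{'; '.join(conflicts)}. Flagging for human review."),
        )

    return ClauseReview(
        outcome=ReviewOutcome.ANSWER,
        message="Clause can be assessed; summary to follow.",
    )


# e.g. review_clause({"Losses", "Cap"}, {"Cap"}, []) refuses and names "Losses".
```

The string comparison isn’t the point. The point is that refusal and escalation are first-class outcomes of the workflow, not behaviour you hope a user remembers to prompt for.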
This isn’t about being difficult for the sake of it. It’s about building a voice with a point of view. The way experienced lawyers operate in real life.
Whose backbone are you using?
Most legal AI vendors now build on top of the major frontier models. Claude. GPT-4o. Gemini. Llama. Even Harvey, which used to fine-tune its own models, now works with commercial ones.
It makes sense. These models are outstanding at summarising contracts, analysing clauses, and answering complex legal questions. They’ve made research faster and doc review lighter.
The trade-off is subtle but important: you inherit their alignment.
These models were trained to be safe, polite, and broadly helpful. They’re not tuned to challenge. They’re not comfortable with friction. And they’re not likely to escalate.
That works fine for early use cases: summarisation, clause extraction, legal research. But once AI moves into workflows that carry real legal risk, helpfulness isn’t enough. In fact, it starts becoming dangerous.
You don’t need an assistant that wants to please. You need one that behaves like part of your team.
So I think we now face a choice:
Are we just providing someone else’s assistant?
Or are we shaping a voice that reflects how we think legal work should actually be done?
That doesn’t mean building a model from scratch, though for some vendors with the capability, that might be the right move. For others, it could mean fine-tuning a strong base model to reflect your tone, decision logic, and risk posture. Either way, it means taking more ownership.
You’ve got to make deliberate calls about what good looks like and then embed those calls directly into the tool. Not just through prompts, but through the underlying behaviour.
Because the deeper AI goes into real legal workflows, the more those judgement calls will define its value.
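What might embedding those calls look like in practice? One option, if you go the fine-tuning route, is supervised examples that reward grounded pushback over a tidy answer. The sketch below is invented for illustration: the clauses and wording are made up, and the schema follows the common chat-messages convention rather than any specific provider’s format.

```python
# Hypothetical sketch: fine-tuning examples that teach grounded pushback.
# The clauses and wording are invented; the exact schema depends on the
# fine-tuning pipeline you use.

pushback_examples = [
    {
        "messages": [
            {"role": "system",
             "content": "You are a contract reviewer. Refuse to assess clauses "
                        "that rely on missing definitions, and say why."},
            {"role": "user",
             "content": "Is this indemnity market standard? 'The Supplier shall "
                        "indemnify the Customer against all Losses arising...'"},
            {"role": "assistant",
             "content": "I can't tell you whether this is market standard yet. "
                        "'Losses' isn't defined in the extract, and I can't see "
                        "the cap or carve-outs. Share the definitions and the "
                        "limitation of liability clause and I'll assess it."},
        ]
    },
    {
        "messages": [
            {"role": "system",
             "content": "Escalate rather than guess when provisions conflict."},
            {"role": "user",
             "content": "Summarise the termination rights in this draft."},
            {"role": "assistant",
             "content": "Before I summarise: the termination-for-convenience "
                        "notice period conflicts with the minimum term. That's a "
                        "human call on which takes priority, so I'm flagging it "
                        "rather than papering over it."},
        ]
    },
]
```

Whether you fine-tune, use preference data, or hard-code the same decisions in the application layer, the principle holds: the pushback lives in the product, not in a prompt the user has to remember.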
Everyone keeps talking about aligning AI with human values, but in legal, that can’t just mean being friendly and fluent.
It has to mean backbone, judgement, and the ability to push back.
The best lawyers aren’t the ones who always say yes. They’re the ones who say:
“No, and here’s why.”
So if we’re serious about legal AI, it’s time to start treating backbone as a feature. It’s time to engineer the asshole into them.