When AI Starts Hunting Bugs in the Law

Google’s new AI-powered bug hunter, Big Sleep, just autonomously found and reproduced 20 security vulnerabilities in widely used open-source tools. No prompts, no checklists, no human steering. Just an unsupervised AI digging through code and surfacing real exploits.

Humans stepped in after the fact, validating the bugs before disclosure, but the hard part of discovery and proof of exploitability was done without them.

If you work in legal tech, this isn’t just interesting. It’s a glimpse of what’s coming for us.

Laws and contracts are systems. Complex, yes, but still structured and rule-based. They can be gamed, and the fact that they’re written in natural language makes them harder for traditional software to interpret, but easier for modern AI to manipulate.

So what happens when models like Big Sleep get turned loose not on code, but on legislative frameworks or contract libraries? A model like that could:

  • Find clauses that silently create obligations without alerting the counterparty
  • Spot contracts that allow one-sided termination under obscure conditions
  • Extract provisions where enforcement depends on undefined or circular references
  • Simulate malicious compliance across multiple jurisdictional contexts

This sort of analysis is already happening, just not yet at scale. Whether you're mapping clause libraries, running due diligence checks, or reverse-engineering a policy stack, you're edging closer to exploit discovery. AI will simply accelerate the process, and it will do it across hundreds or thousands of documents without pause.
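
To make that concrete, below is a minimal sketch of the adversarial pass described above, written against a generic text-generation API. Everything in it is an assumption for illustration: the probe wording, the ask_model callable (a stand-in for whichever LLM client you actually use), and the stubbed demo output. It is not how Big Sleep works; it is just the shape of the idea applied to contract text.

    # A minimal sketch of an adversarial "red team" pass over one contract.
    # ask_model is a placeholder for whichever LLM API you actually use; the
    # probe wording and prompt framing are illustrative assumptions.

    from typing import Callable, Dict, List

    ADVERSARIAL_PROBES: List[str] = [
        "List every clause that creates an obligation for the counterparty "
        "without requiring notice to, or acknowledgement from, that party.",
        "Identify any termination right only one party can exercise, and the "
        "conditions under which it becomes available.",
        "Find provisions whose enforcement depends on terms that are undefined, "
        "defined circularly, or defined only by cross-reference elsewhere.",
    ]

    def red_team_contract(contract_text: str,
                          ask_model: Callable[[str], str],
                          probes: List[str] = ADVERSARIAL_PROBES) -> Dict[str, str]:
        """Run each adversarial probe against one contract and collect the findings."""
        findings = {}
        for probe in probes:
            prompt = ("You are reviewing the contract below on behalf of an "
                      "adversarial counterparty looking for leverage.\n\n"
                      f"Task: {probe}\n\n"
                      f"Contract:\n{contract_text}\n\n"
                      "Quote clause numbers and explain the exposure briefly.")
            findings[probe] = ask_model(prompt)
        return findings

    if __name__ == "__main__":
        # Stub the model call so the sketch runs end to end; swap in a real client.
        def demo_ask(prompt: str) -> str:
            return "[model response would appear here]"

        for probe, answer in red_team_contract("Clause 1. ...", demo_ask).items():
            print(probe, "->", answer)

The interesting part is not any single probe; it is that the same loop runs unchanged over every document you can feed it.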

Right now, most legal AI tools are reactive. They help you review, summarise, and extract information. A proactive model, one that acts like an adversary and probes for weaknesses, plays a very different role.

It doesn't just support legal work. It tests it.

Of course, this isn’t a new idea. Lawyers have always tried to find leverage: the senior partner who flags the indemnity carve-out, the tax adviser who spots a loophole in entity timing, the litigator who knows how to exploit procedural gaps. That mindset already exists; what changes here is the scale and repeatability.

AI can now do this across hundreds of contracts, entire libraries of statute, or interconnected regulatory documents, all at once. Most importantly, it doesn’t forget, hesitate, or miss things because it’s on its fourth hour of reading.

Take a basic auto-renewal clause:

"This agreement shall renew for successive 12-month periods unless either party provides notice of termination at least 60 days before the end of the then-current term."

Now drop that into a dataset of 300 supplier agreements. A red-teaming model could:

  • Flag contracts where the clause is buried in an annex or schedule
  • Spot cases where local law overrides auto-renewal unless there's active consent
  • Identify clauses that trigger a cost increase on renewal
  • Detect asymmetry in notice periods between parties

Not revolutionary in isolation, but in bulk, across a portfolio, this sort of analysis becomes invaluable. It turns contract review into systemic risk analysis.
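
A rough sketch of that bulk pass, assuming the agreements sit as plain text files in a folder, might look like the following. The folder name, the regexes, and the "more than one distinct notice period" proxy for asymmetry are all illustrative assumptions, and the sketch only attempts a couple of the checks above; jurisdiction-specific overrides and clauses buried in annexes are exactly the cases that need a model rather than pattern matching.

    # A rough sketch of the bulk pass: scan a folder of supplier agreements for
    # auto-renewal language, notice periods, and renewal price rises, then flag
    # crude warning signs. All patterns here are illustrative assumptions.

    import re
    from pathlib import Path

    AUTO_RENEWAL = re.compile(r"renew(?:s|al)?\s+for\s+successive", re.IGNORECASE)
    NOTICE_DAYS = re.compile(r"at least\s+(\d+)\s+days", re.IGNORECASE)
    PRICE_RISE = re.compile(r"(?:fees|charges|prices)\s+(?:may|shall|will)\s+increase",
                            re.IGNORECASE)

    def scan_agreement(text: str) -> dict:
        """Collect basic auto-renewal signals from one agreement's text."""
        notice = [int(days) for days in NOTICE_DAYS.findall(text)]
        return {
            "auto_renewal": bool(AUTO_RENEWAL.search(text)),
            "notice_days": notice,
            # Several distinct notice periods in one document is a crude proxy
            # for asymmetry; a real pipeline would attribute each to a party.
            "asymmetric_notice": len(set(notice)) > 1,
            "renewal_price_rise": bool(PRICE_RISE.search(text)),
        }

    def scan_portfolio(folder: str) -> list:
        """Run the scan over every .txt agreement in a folder."""
        reports = []
        for path in sorted(Path(folder).glob("*.txt")):
            report = scan_agreement(path.read_text(encoding="utf-8"))
            report["file"] = path.name
            reports.append(report)
        return reports

    if __name__ == "__main__":
        for report in scan_portfolio("supplier_agreements"):
            if report["auto_renewal"]:
                print(report["file"], report["notice_days"],
                      "asymmetric" if report["asymmetric_notice"] else "ok")

The point is less the individual checks than the fact that the loop does not care whether it runs over three agreements or three thousand.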

From impossible to operational

It’s worth noting that not long ago, many in the security world claimed this kind of thing couldn’t be done. Models were too vague, too error-prone. They hallucinated. They couldn’t reason across multiple steps. They weren’t reliable enough to handle real-world code.

That was two years ago.

Now we have a deployed system, one Google stands behind, that finds and reproduces bugs on its own. Quietly, effectively, and at scale. If you still believe legal language is too nuanced or too context-dependent for AI to reason over, you may already be behind the curve.

The upside and the race

Used properly, this kind of tooling strengthens legal work. It allows teams to spot problems earlier, standardise where needed, and challenge risk before a regulator or counterparty does.

But this isn’t only a tool for compliance; it’s just as available to those looking to manipulate, delay, or exploit. Whether it’s gaming jurisdictional gaps or surfacing favourable readings, these models don’t have a moral compass. They just go where the prompt tells them.

I saw an RFP from a nation state exploring how to codify its laws and run models over them to detect and correct loopholes before they’re tested in court. The intent was good, but the same capability can be flipped the other way: once a model can map the law well enough to fix it, it can be used to break it too.

The only way this remains a net positive is if the good actors outpace the bad, and that means investment, a lot of it. Not just in AI tools that summarise and extract, but in tools that challenge, simulate, and probe. Google has DeepMind and Project Zero. Do we have anything close in legal?

Because the tooling is coming either way. The only real question is who puts it to work first and why.