Security Theatre vs Real Security in Legal Tech
Most legal tech platforms now lead with the same promise: bank-grade security (or military-grade, if they're feeling cool), ISO badges, and long lists of controls that look reassuring and feel familiar. On paper, everything appears solid and defensible.
In practice, many of these systems move sensitive client data through a web of third‑party services that few buyers ever map properly. That gap between perception and operation is where real risk tends to sit.
This isn’t about encryption at rest. Everyone does that now (apart from maybe some of the vibe-coded apps...) and they should. The more interesting questions are less visible: where prompts actually go, who can see logs, how long embeddings persist, and what really happens when a regulator asks you to evidence an audit trail.
At its core, this is about the difference between what law firms think they’re buying and what they’re actually running day to day.
The comfort of surface‑level security
Security theatre works because it speaks a language everyone recognises. AES‑256. SOC 2. Pen tests. Vendor questionnaires completed with confident ticks. None of this is meaningless, but none of it tells the whole story either.
In AI‑enabled legal systems, the biggest risks rarely sit in a single database. They emerge from movement. Prompts leaving the platform, context assembled dynamically, documents chunked and embedded, logs written for debugging, metrics exported for monitoring. Each step is defensible in isolation. Taken together, they form a system most firms never fully examine.
What’s sold as a secure product is often a distributed architecture made up of multiple services, stitched together with reasonable assumptions and limited visibility across the whole.
Where prompts really go
Ask a buyer where prompts are processed and you’ll usually hear a reference to the model provider. That answer is almost always incomplete.
Before inference, prompts may pass through orchestration layers, preprocessing pipelines, safety filters, logging services, or evaluation tooling. After inference, outputs may be stored, scored, replayed, or sampled for quality checks. Each hand‑off introduces another trust boundary, with its own access rules, retention policies, and operational risks.
The critical question isn’t whether a model provider trains on your data. It’s who else can access prompts and outputs along the way, and under what circumstances.
Many firms discover they can’t answer that with any real confidence.
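One way to make that question answerable is to write the journey down as data rather than as a slide. A minimal sketch in Python, where the hop names, operators, and retention values are purely illustrative assumptions rather than any particular vendor's architecture:

```python
from dataclasses import dataclass

@dataclass
class Hop:
    """One service a prompt passes through on its way to and from the model."""
    name: str
    operator: str           # who runs this component
    sees_prompt: bool       # can it read the raw prompt and context?
    persists_content: bool  # does it write prompt or output to storage?
    retention_days: int     # how long any persisted copy is kept

# Illustrative journey for a single request; real systems will differ.
journey = [
    Hop("web front end",       "vendor",         True,  False, 0),
    Hop("orchestration layer", "vendor",         True,  True,  30),   # request logs
    Hop("safety filter",       "third party",    True,  False, 0),
    Hop("model inference",     "model provider", True,  False, 0),
    Hop("evaluation sampling", "vendor",         True,  True,  365),  # QA review sets
    Hop("analytics export",    "third party",    False, True,  730),  # metadata only
]

def trust_boundaries(hops):
    """List every hand-off to a different operator, every external party that can
    read raw prompts, and every point where content is written to storage."""
    findings = []
    for prev, cur in zip(hops, hops[1:]):
        if prev.operator != cur.operator:
            findings.append(f"hand-off: {prev.name} ({prev.operator}) -> {cur.name} ({cur.operator})")
    for hop in hops:
        if hop.sees_prompt and hop.operator != "vendor":
            findings.append(f"access: {hop.name} ({hop.operator}) can read raw prompts")
        if hop.persists_content:
            findings.append(f"persistence: {hop.name} keeps content for {hop.retention_days} days")
    return findings

for finding in trust_boundaries(journey):
    print(finding)
```

Even a toy map like this surfaces the questions that matter: who operates each hop, who can read content, and where copies accumulate.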
Logs aren’t harmless exhaust
Logs are often treated as operational by‑products. In reality, they’re some of the most sensitive artefacts in an AI system.
They frequently contain client identifiers, deal structures, draft advice, internal reasoning, and edge‑case behaviour. They’re also commonly accessible to broader engineering or support teams because they’re needed to keep systems running.
When a regulator or client asks for an audit trail, logs stop being technical plumbing and start being evidence. If those logs are fragmented across vendors, inconsistently retained, or impossible to correlate, the issue isn’t just compliance. It’s whether the firm can credibly explain its own systems.
A platform you can’t account for under scrutiny isn’t secure, regardless of how strong its perimeter controls appear.
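What treating logs as evidence rather than exhaust can mean in practice is easier to see in code. A minimal sketch of a logging wrapper that redacts obvious client identifiers, attaches a correlation ID, and records a retention class at write time; the patterns and field names are assumptions for illustration, not a complete redaction scheme:

```python
import json
import logging
import re
import uuid
from datetime import datetime, timezone

logger = logging.getLogger("audit")
logging.basicConfig(level=logging.INFO)

# Illustrative patterns only; real redaction needs a proper classification pass.
REDACTIONS = [
    (re.compile(r"\b[A-Z]{2}\d{6,10}\b"), "[MATTER_REF]"),    # matter reference numbers
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),  # email addresses
]

def redact(text: str) -> str:
    for pattern, placeholder in REDACTIONS:
        text = pattern.sub(placeholder, text)
    return text

def audit_event(run_id: str, stage: str, content: str, retention_class: str) -> None:
    """Write one structured, redacted audit record.

    The retention class is decided when the event is written, not left to
    whatever the logging backend happens to do by default.
    """
    record = {
        "run_id": run_id,                    # correlates every event for one request
        "stage": stage,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "content": redact(content),
        "retention_class": retention_class,  # e.g. "audit-7y" vs "debug-30d"
    }
    logger.info(json.dumps(record))

run_id = str(uuid.uuid4())
audit_event(run_id, "prompt_received", "Summarise matter AB1234567 for client@example.com", "audit-7y")
audit_event(run_id, "model_response", "Draft summary of the share purchase agreement...", "audit-7y")
```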
The embedding blind spot
Embeddings are often described as anonymous vectors, which sounds comforting and technically neat.
In practice, embeddings persist. They’re stored, reused, sometimes shared across features. They often outlive the documents they were created from, and deleting a source file doesn’t necessarily remove its semantic footprint.
A simple test exposes the gap here. How long do embeddings live, where are they stored, and how are they destroyed? If the answers are vague or inconsistent, that’s not a theoretical concern. It’s an operational one.
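To make that concrete, here is a minimal sketch of what an explicit embedding lifecycle can look like: every vector keeps a reference to its source document and matter, carries an expiry, and deletion cascades from the document to everything derived from it. The store and field names are illustrative assumptions, not any specific product's API:

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

@dataclass
class EmbeddingRecord:
    vector_id: str
    source_doc_id: str   # every vector knows where it came from
    matter_id: str       # so matter closure can trigger cleanup
    created_at: datetime
    expires_at: datetime # embeddings get a lifetime, not indefinite storage

@dataclass
class EmbeddingStore:
    records: list[EmbeddingRecord] = field(default_factory=list)

    def add(self, vector_id, source_doc_id, matter_id, ttl_days=365):
        now = datetime.now(timezone.utc)
        self.records.append(EmbeddingRecord(
            vector_id, source_doc_id, matter_id, now, now + timedelta(days=ttl_days)))

    def delete_for_document(self, source_doc_id) -> int:
        """Cascade deletion: removing a document removes its semantic footprint too."""
        before = len(self.records)
        self.records = [r for r in self.records if r.source_doc_id != source_doc_id]
        return before - len(self.records)

    def purge_expired(self, now=None) -> int:
        """Enforce the lifetime instead of assuming someone will remember."""
        now = now or datetime.now(timezone.utc)
        before = len(self.records)
        self.records = [r for r in self.records if r.expires_at > now]
        return before - len(self.records)

store = EmbeddingStore()
store.add("vec-001", "doc-42", "matter-7")
store.add("vec-002", "doc-42", "matter-7")
print(store.delete_for_document("doc-42"))  # 2: nothing derived from doc-42 survives
```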
What audit day actually looks like
Security assurances tend to feel robust until audit day arrives.
Regulators don’t ask whether controls exist in principle. They ask you to demonstrate them:
- Who accessed which data, when, and for what purpose.
- How a particular output was produced.
- Which sources were used.
- Whether safeguards were enforced consistently or bypassed under pressure.
This is where many legal AI systems struggle. Not because they were built irresponsibly, but because replay, traceability, and forensic inspection were never designed in from the outset; those capabilities have rarely been what sells to clients.
Legal work demands the ability to reconstruct decisions after the fact. Systems optimised first for speed and usability, with governance added later, often fall short when that requirement becomes non‑negotiable.
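Designing for replay mostly means deciding, up front, what gets recorded about every output. A minimal sketch of the kind of trace record that makes reconstruction possible months later; the fields are assumptions about what an auditor typically asks for, not a standard:

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

def fingerprint(text: str) -> str:
    """Stable hash so content can be matched later without storing it twice."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

@dataclass
class OutputTrace:
    run_id: str
    timestamp: str
    user_id: str
    model_version: str       # the exact model and version, not just "the provider"
    prompt_hash: str
    source_doc_ids: list     # which documents were retrieved into context
    safeguards_applied: list # which checks ran, and whether any were bypassed
    output_hash: str

def record_trace(run_id, user_id, model_version, prompt, source_doc_ids, safeguards, output):
    trace = OutputTrace(
        run_id=run_id,
        timestamp=datetime.now(timezone.utc).isoformat(),
        user_id=user_id,
        model_version=model_version,
        prompt_hash=fingerprint(prompt),
        source_doc_ids=source_doc_ids,
        safeguards_applied=safeguards,
        output_hash=fingerprint(output),
    )
    # Append-only storage; in practice this would be a WORM bucket or audit table.
    with open("traces.jsonl", "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(trace)) + "\n")
    return trace

record_trace(
    run_id="run-2024-00042", user_id="u-17", model_version="provider-model-2024-06",
    prompt="Summarise the indemnity clauses in the attached SPA",
    source_doc_ids=["doc-42", "doc-57"], safeguards=["pii_filter:passed"],
    output="The SPA contains three indemnity provisions...",
)
```

With records like this, "how was this output produced" becomes a lookup rather than an investigation.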
What firms think they’re buying
Most firms believe they’re purchasing a product with clear boundaries. Data goes in, answers come out, and security sits neatly around the edge.
What they’re often operating instead is a distributed system spanning multiple vendors, overlapping responsibilities, and blurred lines of accountability. When something fails, no single party has a complete view of what happened.
That isn’t solely a vendor problem. It’s also a buying and design problem.
It’s worth pausing on a few assumptions that often surface at this point.
Some vendors will say they already do all of this. In practice, a small number do parts of it well, fewer do it consistently end to end, and very few make the full picture visible to buyers without being pushed. Good intent, certifications, and partial controls don’t automatically add up to an architecture that can be explained under pressure.
Others might argue that this level of scrutiny slows delivery. In my experience, the opposite tends to be true. Unclear data flows, ad‑hoc logging, and blurred ownership are what create delays when questions arise. Systems designed with traceability in mind usually move faster because far less needs to be reconstructed later.
There’s also a tendency to lean heavily on contractual assurances. Contracts matter, but they don’t replace operational clarity. When something goes wrong, the firm still has to explain what happened inside its own environment, regardless of what a vendor has committed to on paper.
None of this demands perfection. It demands an honest view of how systems actually behave.
What real security actually looks like
Real security in legal tech is rarely visible and almost never easy to sell.
It shows up in unglamorous places:
- Data-flow diagrams that reflect how information actually moves rather than how the architecture deck suggests it should.
- Prompt-handling rules that extend beyond the model provider and into orchestration, logging, and evaluation layers.
- Logs treated as sensitive operational records, not engineering exhaust, with explicit access controls and retention decisions that can be defended.
- Embedding lifecycle rules that acknowledge persistence and deletion as active responsibilities, not assumptions.
- Permission models that reflect how legal work is supervised in reality, not how it’s convenient to implement technically.
Most of all, it shows up in systems that can answer hard questions calmly, without scrambling, caveats, or retrospective reconstruction.
What firms should be asking vendors
If this all feels abstract, it shouldn’t. Firms can turn it into very practical conversations and, just as importantly, learn how to assess the quality of the answers they receive.
Ask vendors to walk you through the full data journey for a single prompt, from user input to final output.
Good answer: a concrete walkthrough with named components, clear stages, specific storage points, and an explanation of which steps are optional or configurable.
Poor answer: a high‑level diagram that never quite maps to reality, or a deflection back to certifications rather than describing what actually happens.
Ask where data is transformed, stored, logged, or exported, and which parties have access at each step.
Good answer: an explicit description of trust boundaries, with practical detail on how access is restricted and audited.
Poor answer: collapsing everything into "the platform", or vague assurances that internal access is rare without explaining how that’s enforced.
Ask what appears in logs by default, who can view them, and how long they’re retained.
Good answer: logs treated as sensitive data, with clear retention periods, role‑based access, and an understanding of what content is captured.
Poor answer: logs described as an engineering concern, indefinite retention because it’s convenient, or uncertainty about what’s actually recorded.
Ask whether logs can be correlated to reconstruct a specific output months later.
Good answer: a clear explanation of replay or traceability, showing how inputs, context, and outputs can be linked deterministically.
Poor answer: promises of cooperation during audits without any concrete mechanism for reconstruction.
Ask how embeddings are created, where they live, whether they’re shared across features, and what happens when underlying documents are deleted or a matter is closed.
Good answer: embeddings recognised as persistent artefacts, with explicit lifecycle management and deletion processes.
Poor answer: reliance on claims of anonymity, or an assumption that deleting the source document implicitly removes everything derived from it.
Ask how permissions are enforced in agentic or tool‑calling workflows, and how those permissions map to real legal roles and supervision models.
Good answer: permissioning that mirrors legal reality, approval chains, escalation paths, and separation of duties.
Poor answer: technically accurate descriptions of roles and scopes that don’t align to how legal work is actually supervised.
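As a rough illustration of what permissioning that mirrors legal reality can look like, here is a minimal sketch of an approval gate for tool calls in an agentic workflow. The roles, actions, and thresholds are assumptions for the example, not a recommended policy:

```python
from dataclasses import dataclass

# Illustrative role hierarchy and policy: who may request an action, and what
# level of sign-off is needed before it actually runs.
ROLE_RANK = {"trainee": 1, "associate": 2, "partner": 3}

POLICY = {
    "search_matter_docs": {"min_request": 1, "approve_rank": None},
    "draft_clause":       {"min_request": 2, "approve_rank": 3},  # partner sign-off
    "send_to_client":     {"min_request": 2, "approve_rank": 3},
}

@dataclass
class ToolCall:
    action: str
    requested_by: str  # user id
    role: str          # the user's legal role, from the firm's own systems

def authorise(call: ToolCall):
    """Return ('deny' | 'allow' | 'hold_for_approval', detail).

    Separation of duties is explicit: being able to request an action is not
    the same as being able to complete it.
    """
    rule = POLICY.get(call.action)
    if rule is None or ROLE_RANK[call.role] < rule["min_request"]:
        return "deny", f"{call.role} cannot request {call.action}"
    needed = rule["approve_rank"]
    if needed is not None and ROLE_RANK[call.role] < needed:
        return "hold_for_approval", f"needs sign-off at rank {needed}"
    return "allow", None

print(authorise(ToolCall("draft_clause", "u-17", "associate")))
# ('hold_for_approval', 'needs sign-off at rank 3')
```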
Finally, ask what happens on audit day. Not in theory, but in practice.
Good answer: a calm description of what evidence can be produced quickly, what takes longer, and where the limits genuinely are.
Poor answer: generic statements that audits are supported, without clarity on what that support looks like under time pressure.
Vendors who consistently give good answers here are worth paying attention to. Those who don’t are still early in their maturity, regardless of how polished the interface looks.
Law firms don’t need to become security specialists. They do need to stop mistaking reassurance for understanding.
The most important questions are straightforward:
- Where does the data go?
- Who can see it?
- How long does it live?
- Can we explain a specific output months later if we’re asked to?
Security theatre keeps everyone comfortable in the short term. Real security creates some friction early on, and avoids far worse conversations later. That distinction is only going to matter more as clients, regulators, and courts start looking past shiny badges and into how these systems actually behave.