AI's Carbon Footprint: We Are Still Chasing The Wrong Metric

In April 2025, I wrote an article arguing that much of the discussion around AI's environmental impact was focused on the wrong thing. Looking back a year later, I still believe that argument was correct. If anything, the rise of reasoning models and agentic AI has made the problem even more apparent.

When AI first entered the mainstream, environmental discussions naturally gravitated towards the most visible unit of measurement available: the prompt.

Articles attempted to calculate the carbon footprint of a ChatGPT interaction. Researchers estimated water consumption per query. Comparisons were made between AI requests and traditional web searches. These discussions helped bring attention to an important issue. AI systems consume resources, require substantial infrastructure and have environmental consequences that deserve scrutiny.

The problem was never the measurements themselves.

The problem was the assumption that a prompt represented a meaningful unit of work.

That assumption has become increasingly difficult to defend in 2026.

The AI systems being deployed today bear little resemblance to the simple question-and-answer interfaces that dominated public discussion only a year ago. A single user request may now trigger retrieval systems, web searches, tool calls, reasoning loops, validation checks, external system interactions and multiple model invocations before a response is returned. What appears to be one interaction from the user's perspective may represent a substantial amount of computational activity occurring behind the scenes.

As the gap between user interaction and computational effort grows, metrics built around counting prompts become progressively less useful.

Why The Original Metrics Made Sense

It is easy to criticise earlier sustainability discussions with the benefit of hindsight, but the focus on prompts was understandable.

At the time, prompts were one of the few observable units available. Most users interacted directly with a model, received a response and moved on. Measuring environmental impact at the interaction level provided a practical starting point for understanding a rapidly emerging technology.

The industry needed simple metrics.

The challenge is that simple metrics often survive long after the systems they were designed to measure have evolved.

Imagine trying to understand the environmental impact of a logistics company by measuring the fuel consumption of a single van journey. The figure itself may be accurate, but it tells us little about route optimisation, warehouse operations, fleet management or overall delivery efficiency. As systems become more sophisticated, meaningful measurement often requires moving further up the chain.

AI appears to be reaching that point.

A Prompt Is No Longer A Unit Of Work

A modern AI workflow increasingly resembles a process rather than a transaction.

A lawyer asking a question about a deal may trigger document retrieval from multiple repositories, extraction of relevant clauses, cross-referencing against prior matters, iterative reasoning steps, validation checks and structured report generation. Another request may involve a lightweight extraction model identifying dates or parties from a contract before returning a simple answer.

Both interactions may appear identical from the user's perspective.

Both may involve a single prompt.

The underlying resource consumption could differ dramatically.

This distinction becomes even more important as organisations deploy agentic systems. Agents can decide how much work to perform, when to invoke additional tools, whether to revisit earlier conclusions and how extensively to reason about a problem. Two systems attempting to solve the same task may consume vastly different amounts of compute depending on how they have been designed and governed.

The environmental footprint therefore becomes increasingly influenced by workflow architecture rather than simply model selection.
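
To make that gap concrete, here is a minimal Python sketch that adds up the estimated emissions of every step sitting behind a single prompt. The step names and the per-step gCO₂e values are invented purely for illustration and are not measurements; the point is only that two requests with a prompt count of one can carry very different totals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    kind: str         # e.g. "model_call", "retrieval", "tool_call"
    est_gco2e: float  # hypothetical per-step estimate, in grams of CO2e


def footprint(steps: list[Step]) -> float:
    """Sum the estimated emissions of every step triggered by one prompt."""
    return sum(step.est_gco2e for step in steps)


# Two requests that each look like ONE prompt to the user.
# All values below are illustrative assumptions, not published figures.
simple_extraction = [Step("model_call", 0.2)]
agentic_review = (
    [Step("retrieval", 0.05)] * 4
    + [Step("model_call", 1.0)] * 6
    + [Step("tool_call", 0.1)] * 3
    + [Step("validation", 0.5)] * 2
)

for name, steps in [("extraction", simple_extraction),
                    ("agentic review", agentic_review)]:
    print(f"{name}: 1 prompt, {len(steps)} steps, "
          f"~{footprint(steps):.2f} gCO2e")
```

Counting prompts would score both requests identically; counting steps at least reveals that one of them did fifteen times as much work.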

The Numbers Still Matter

None of this means environmental measurements should be ignored.

In fact, one of the most positive developments over the past year has been the emergence of more credible environmental disclosures from AI providers.

For example, Mistral's lifecycle assessment estimated approximately 1.14 grams of CO₂ equivalent for a 400-token response within its measured scenario. We are also beginning to see broader disclosures covering water consumption, training infrastructure, operational emissions and hardware utilisation. These efforts represent an important step towards greater transparency and accountability.

Without measurement there can be no meaningful governance.

Carbon emissions remain important. Water consumption remains important. Electricity demand remains important. Even the environmental impact associated with manufacturing and replacing increasingly specialised hardware deserves consideration.

The problem is not that these measurements are wrong.

The problem is that they increasingly describe individual components of a system rather than the system itself.

Knowing the environmental impact of a model invocation helps organisations compare technologies and understand trade-offs. It does not necessarily tell us whether an overall workflow has been designed efficiently, whether expensive reasoning was genuinely necessary or whether a simpler approach could have achieved the same outcome.

As AI systems become more autonomous, measuring isolated interactions begins to resemble measuring individual components of a supply chain while ignoring the performance of the overall operation.

The More Meaningful Question Is Outcome Efficiency

One of the reasons prompt-based measurements have always felt unsatisfactory is that they focus on activity rather than outcomes.

Consider two legal workflows. In the first, a lightweight AI system extracts dates, parties and governing law clauses from thousands of agreements, involving a large number of model calls but relatively little reasoning. In the second, an agentic due diligence workflow retrieves documents, performs iterative analysis, validates findings, generates risk assessments and produces structured recommendations for legal review.

Both workflows may begin with a single user request and appear as a single interaction within a user interface, yet the underlying computational effort could differ by orders of magnitude.

This is why the more meaningful question is not how much carbon was produced by a prompt, but how much carbon was required to complete a piece of work.

Organisations do not invest in AI to generate prompts. They invest in AI to reduce effort, improve quality, increase consistency and accelerate outcomes, and sustainability discussions should increasingly reflect those objectives.

Ultimately, the unit of measurement that matters is not the interaction itself but the completed outcome.
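
As a rough sketch of what measuring the completed outcome could look like, the Python below divides the emissions of every run, including failed ones, by the number of outcomes actually delivered. The function name and all figures are hypothetical; a real programme would substitute its own estimates and its own definition of "completed".

```python
def carbon_per_outcome(run_emissions_g: list[float],
                       completed: list[bool]) -> float:
    """Total emissions of all runs divided by the number of completed outcomes.

    Failed or abandoned runs still count in the numerator, because the
    compute was spent even though no usable outcome was produced.
    """
    if len(run_emissions_g) != len(completed):
        raise ValueError("one completion flag is required per run")
    n_completed = sum(completed)
    if n_completed == 0:
        raise ValueError("no completed outcomes to attribute emissions to")
    return sum(run_emissions_g) / n_completed


# Hypothetical figures: five contract-review runs, one of which failed.
emissions = [8.0, 7.5, 9.0, 6.5, 9.0]   # grams CO2e per run (assumed)
done = [True, True, False, True, True]

per_run = sum(emissions) / len(emissions)
per_outcome = carbon_per_outcome(emissions, done)
print(f"per run: {per_run:.1f} g, per completed review: {per_outcome:.1f} g")
```

In this made-up case the per-run average understates the cost of a delivered review, because the failed attempt is invisible when each interaction is measured on its own.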

Reasoning Changes The Economics

Another important shift over the past year has been the growing importance of test-time compute.

Historically, environmental discussions focused heavily on training. Large training runs required significant infrastructure and naturally attracted attention. While training remains important, reasoning models have shifted part of the discussion towards inference.

Modern systems often spend substantially more computational effort at inference time in order to produce better answers. Rather than responding immediately, they may explore alternatives, challenge assumptions, validate intermediate conclusions and revisit earlier reasoning.

From a quality perspective this is often desirable.

Most organisations would expect a complex regulatory analysis to require more effort than extracting a contract date.

From a sustainability perspective, however, it means the environmental footprint of a workflow is increasingly influenced by decisions made during execution rather than simply the model being used.

The same model can produce radically different resource consumption profiles depending on how much reasoning is performed and how many supporting systems are invoked.

Sustainability Is Becoming A Governance Challenge

This may be the most significant change since I wrote the original article.

Many organisations now have governance frameworks covering privacy, security, model risk, explainability and human oversight. Sustainability is often discussed separately, frequently as an infrastructure concern rather than an operational one.

I suspect that distinction will become harder to maintain.

As AI systems become capable of determining how much computational effort to expend, sustainability increasingly becomes a governance issue.

Organisations will need to decide when advanced reasoning is justified, when lightweight models are sufficient and how computational resources should be allocated across different categories of work.

This is where concepts such as proportionality and economic routing become increasingly relevant.

A contract classification task should not necessarily consume the same computational resources as a complex cross-border restructuring analysis. Just as professional services firms allocate work according to complexity, risk and expertise, mature AI estates will increasingly allocate computational effort according to the demands of the task being performed.

The goal is not simply to reduce cost.

The goal is to ensure that resources are being used proportionately to the value being delivered.

In that sense, sustainability becomes closely linked to broader questions of governance, accountability and operational design.
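
One way to picture proportionality is as an explicit routing policy. The sketch below is a deliberately simple Python illustration under my own assumptions: the three tiers, the 1 to 5 scoring scale and the thresholds are all hypothetical, and a real policy would be set by whoever owns governance for the workflow.

```python
from enum import Enum


class Tier(Enum):
    LIGHTWEIGHT = "lightweight"
    STANDARD = "standard"
    REASONING = "reasoning"


def route(complexity: int, risk: int) -> Tier:
    """Pick a compute tier from 1-5 complexity and risk scores.

    The scale and thresholds are illustrative assumptions, not a standard.
    """
    for name, score in (("complexity", complexity), ("risk", risk)):
        if not 1 <= score <= 5:
            raise ValueError(f"{name} must be between 1 and 5, got {score}")
    demand = max(complexity, risk)  # the more demanding dimension decides
    if demand <= 2:
        return Tier.LIGHTWEIGHT
    if demand == 3:
        return Tier.STANDARD
    return Tier.REASONING


# A contract classification task versus a cross-border restructuring analysis.
print(route(complexity=1, risk=2))
print(route(complexity=5, risk=4))
```

Taking the maximum of the two scores is a conservative design choice: a simple but high-risk task still receives the heavier tier, which keeps the policy easy to explain and audit.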

What Should Organisations Measure Instead?

If prompts are becoming a less useful metric, what should replace them?

The answer is unlikely to be a single number. As AI systems become more complex, organisations will need a broader set of measurements that capture activity at different layers of the technology stack.

Carbon and energy consumption remain valuable at the model level because they support procurement decisions, infrastructure planning and comparisons between different technologies. However, these metrics only tell part of the story.

At the workflow level, organisations should increasingly consider measures such as resource consumption per completed process, reasoning escalation rates, model routing decisions, workflow completion costs and the business outcomes ultimately delivered. These indicators provide a clearer view of how computational resources are being used to create value.

The objective is not to eliminate environmental measurement but to apply it at the level where meaningful decisions are actually made. A sustainability programme that understands how resources are consumed across an entire workflow will almost always be more useful than one focused exclusively on individual prompts, because it reflects the reality of how modern AI systems operate.
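
For illustration, several of these workflow-level measures can be derived from nothing more than a log of workflow runs. The Python sketch below assumes a hypothetical record format (`WorkflowRun`) and invented data; it shows one possible way to compute an escalation rate, a routing mix and model calls per completed process.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkflowRun:
    tier: str         # model tier the router chose, e.g. "lightweight"
    escalated: bool   # True if the run later escalated to heavier reasoning
    model_calls: int  # number of model invocations in the run
    completed: bool   # True if the run delivered a usable outcome


def summarise(runs: list[WorkflowRun]) -> dict[str, object]:
    """Aggregate workflow-level indicators from a list of run records."""
    if not runs:
        raise ValueError("at least one run is required")
    completed = sum(run.completed for run in runs)
    total_calls = sum(run.model_calls for run in runs)
    return {
        "escalation_rate": sum(run.escalated for run in runs) / len(runs),
        "completion_rate": completed / len(runs),
        # Calls from failed runs are included: that compute was still spent.
        "model_calls_per_completed": (
            total_calls / completed if completed else float("inf")
        ),
        "routing_mix": dict(Counter(run.tier for run in runs)),
    }


# Invented example log of four runs.
runs = [
    WorkflowRun("lightweight", False, 2, True),
    WorkflowRun("lightweight", True, 10, True),
    WorkflowRun("reasoning", False, 12, False),
    WorkflowRun("lightweight", False, 3, True),
]
summary = summarise(runs)
for key, value in summary.items():
    print(f"{key}: {value}")
```

None of these indicators is a carbon figure in itself, but each can be multiplied by model-level estimates to show where a workflow's footprint actually comes from.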

The AI Sustainability Framework

Much of the discussion around AI sustainability still focuses on individual interactions. While metrics such as carbon emissions per prompt or water consumption per model invocation remain useful, they only provide visibility into a small part of the overall system.

As AI workflows become more agentic, organisations need a broader framework that measures environmental impact across multiple layers.

Layer	Key Question	Example Measures
Infrastructure	What resources does the underlying technology consume?	gCO₂e per request, water consumption, electricity usage, hardware utilisation
Model	How efficient is the model itself?	Tokens processed, inference cost, reasoning depth, response efficiency
Workflow	What actually happened to complete the task?	Model calls, retrieval operations, tool usage, validation loops, workflow cost
Governance	Was the appropriate level of compute used?	Escalation rates, model routing decisions, lightweight vs advanced model usage
Outcome	What value was delivered?	Carbon per document reviewed, carbon per contract analysed, carbon per completed matter
Strategic Impact	What was the net organisational effect?	Productivity gains, avoided effort, overall sustainability improvement

The further an organisation moves through the framework, from infrastructure towards outcome and strategic impact, the closer it gets to understanding the true environmental impact of AI.

A carbon figure attached to a single prompt may be technically accurate, but it provides limited insight into whether a workflow was designed efficiently. Equally, measuring only business outcomes may obscure opportunities to improve the underlying infrastructure.

Effective governance requires visibility across all six layers.

As AI systems become capable of determining how much work to perform, sustainability becomes less about measuring isolated interactions and more about understanding how computational resources are consumed in pursuit of business outcomes.

Looking back at the argument I made just over a year ago, I still believe the central point was correct.

The industry focused on a metric that was easy to communicate rather than one that was genuinely useful.

What has changed is the scale of the problem.

Reasoning models, agentic systems and increasingly autonomous workflows have widened the gap between user interactions and the computational effort required to satisfy them. A prompt no longer represents a meaningful unit of work in many modern AI systems.

That does not make sustainability less important. It makes sustainability harder to understand and therefore more important to govern.

For decades organisations have governed people, budgets, infrastructure and business processes. Increasingly they will also need to govern how AI systems consume computational resources on their behalf. The firms that navigate this transition most successfully are unlikely to be those counting prompts or comparing isolated interactions. They will be the organisations measuring outcomes, applying proportionality, routing work intelligently and treating computational effort as a governed organisational resource.

The environmental debate around AI is maturing, and the questions we ask and the metrics we choose to govern by need to mature with it.

Note: I estimated the re-working and image generation for this article at roughly 0.25 to 0.4 kg CO₂e, using published AI lifecycle estimates as a rough benchmark. I have rounded that up and offset 1 kg CO₂e to account for uncertainty.