microsoft.com web signal

Microsoft Measures AI Query Energy Use, Projects 8-20x Savings

TL;DR

  • A peer-reviewed Joule study finds typical LLM queries consume 0.16 to 0.60 watt-hours, with water use under one drop per query.
  • Microsoft says combining optimized models, smarter serving techniques, and custom hardware can cut energy use per query by 8 to 20x.
  • One billion daily AI queries require roughly 0.7 GWh per day at baseline, dropping to approximately 0.3 GWh per day with efficiency improvements applied.

The argument that AI's energy footprint is unmanageably large now faces a rigorous public challenge. Citing a peer-reviewed study published in the journal *Joule*, Microsoft reports that a typical query to some of the largest and most capable LLMs consumes between 0.16 and 0.60 watt-hours (Wh) of electricity, roughly the equivalent of running a PC for 15 to 60 seconds or a microwave for under two seconds. Water consumption per query, the company says, ranges from zero to 0.067 milliliters, less than a single drop.
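The appliance comparisons follow from a unit conversion (1 Wh = 3,600 joules). A minimal sketch of that arithmetic, assuming a PC drawing about 40 W and a microwave drawing about 1,100 W; both wattages are illustrative assumptions, not figures from the article:

```python
# Per-query energy range reported in the article (watt-hours).
QUERY_WH_LOW, QUERY_WH_HIGH = 0.16, 0.60

# Assumed appliance power draws in watts (illustrative, not from the article).
PC_WATTS = 40.0
MICROWAVE_WATTS = 1100.0


def seconds_of_runtime(energy_wh: float, power_watts: float) -> float:
    """Seconds an appliance drawing power_watts can run on energy_wh."""
    joules = energy_wh * 3600.0  # 1 Wh = 3600 J
    return joules / power_watts


for wh in (QUERY_WH_LOW, QUERY_WH_HIGH):
    pc_s = seconds_of_runtime(wh, PC_WATTS)
    mw_s = seconds_of_runtime(wh, MICROWAVE_WATTS)
    print(f"{wh:.2f} Wh: PC for {pc_s:.1f} s, microwave for {mw_s:.2f} s")
```

With these assumed wattages, the range works out to roughly 14 to 54 seconds of PC time and under two seconds of microwave time, in line with the comparison quoted above.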

The efficiency case builds from three compounding layers. Optimized models, specifically smaller task-appropriate ones like Phi and Fara-7B, can reduce energy use by 5 to 10x compared to always routing queries to the largest available model; Microsoft's Model Router automates that routing decision. Smarter serving techniques, including disaggregated and adaptive serving, add up to another 5x improvement for long-query workloads. Hardware advances, including the company's Maia 200 custom inference chip, contribute a further 1.5 to 2.5x. Stack all three layers and the headline claim of Microsoft's blog post, an 8 to 20x energy reduction per query, comes into view.
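The article does not spell out how the three layers combine. As a rough sanity check, the sketch below multiplies the quoted per-layer factors as if they were fully independent; that independence, and the 1.0x floor used for the "up to 5x" serving gain, are assumptions made here for illustration:

```python
from math import prod

# Per-layer reduction factors (low, high) as quoted in the article.
LAYERS = {
    "optimized models": (5.0, 10.0),
    "smarter serving": (1.0, 5.0),  # "up to 5x"; the 1.0 floor is an assumption
    "hardware": (1.5, 2.5),
}


def naive_stack(layers):
    """Multiply the low ends and the high ends, assuming full independence."""
    low = prod(lo for lo, _ in layers.values())
    high = prod(hi for _, hi in layers.values())
    return low, high


low, high = naive_stack(LAYERS)
print(f"naive product: {low:.1f}x to {high:.1f}x (headline: 8x to 20x)")
```

Under these assumptions the low end (7.5x) lands near the headline's 8x, while the high end (125x) comes out far above 20x. That gap suggests the per-layer maxima do not all apply to the same workload, which fits the article's note that the serving gain is specific to long queries.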

At billion-query scale, the numbers become concrete. Microsoft calculates that one billion conversational queries per day require approximately 0.7 gigawatt-hours (GWh) of electricity daily at baseline, dropping to roughly 0.3 GWh when the efficiency improvements are applied.
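The fleet-scale figures can be checked against the per-query range with simple unit arithmetic. The sketch below uses only numbers quoted in the article; the helper names are hypothetical:

```python
WH_PER_GWH = 1e9
QUERIES_PER_DAY = 1_000_000_000

# Fleet-level figures quoted in the article (GWh per day).
BASELINE_GWH, IMPROVED_GWH = 0.7, 0.3


def daily_gwh(wh_per_query: float, queries_per_day: int = QUERIES_PER_DAY) -> float:
    """Daily fleet energy in GWh for a given per-query energy in Wh."""
    return wh_per_query * queries_per_day / WH_PER_GWH


def implied_wh_per_query(gwh_per_day: float, queries_per_day: int = QUERIES_PER_DAY) -> float:
    """Per-query energy in Wh implied by a daily fleet total in GWh."""
    return gwh_per_day * WH_PER_GWH / queries_per_day


print(f"per-query range at scale: {daily_gwh(0.16):.2f} to {daily_gwh(0.60):.2f} GWh/day")
print(f"implied baseline per query: {implied_wh_per_query(BASELINE_GWH):.2f} Wh")
print(f"implied improved per query: {implied_wh_per_query(IMPROVED_GWH):.2f} Wh")
print(f"fleet-level reduction: {BASELINE_GWH / IMPROVED_GWH:.2f}x")
```

At one billion queries per day, watt-hours per query and gigawatt-hours per day are numerically equal, so the 0.7 GWh baseline implies about 0.7 Wh per query, slightly above the 0.16 to 0.60 Wh range. The drop to 0.3 GWh is a reduction of roughly 2.3x, well short of the 8 to 20x per-query figure. The article does not say which workload mix or which layers sit behind the fleet-level numbers.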

The honest caveat is that the full efficiency range requires the entire stack: custom silicon, Microsoft's routing layer, and the right model mix. Organizations running on different infrastructure will likely see narrower gains. The study also covers inference only, not training, which carries its own energy burden. And per-query efficiency improvements do not automatically translate to lower aggregate demand if total query volume grows faster than efficiency improves.
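The last caveat reduces to a break-even condition: aggregate demand stays flat only when query volume grows by the same factor as per-query efficiency. A minimal sketch, starting from the article's 0.7 GWh per day baseline and using hypothetical volume-growth multiples that do not come from the article:

```python
BASELINE_GWH = 0.7  # daily baseline quoted in the article


def aggregate_gwh(baseline_gwh: float, volume_growth: float, efficiency_gain: float) -> float:
    """Daily demand after volume grows by volume_growth and
    per-query energy falls by efficiency_gain."""
    return baseline_gwh * volume_growth / efficiency_gain


# Hypothetical volume-growth multiples, chosen only for illustration.
for growth in (1, 8, 20, 50):
    for gain in (8, 20):  # ends of the article's per-query range
        demand = aggregate_gwh(BASELINE_GWH, growth, gain)
        trend = "higher" if demand > BASELINE_GWH + 1e-9 else "not higher"
        print(f"volume x{growth:>2}, efficiency x{gain:>2}: {demand:.3f} GWh/day ({trend} than baseline)")
```

Whenever the growth multiple exceeds the efficiency multiple, total demand ends up above the baseline despite the per-query savings.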

What the post does provide, backed by a peer-reviewed paper, is a concrete baseline: 0.16 to 0.60 Wh per query. For sustainability and procurement teams modeling AI deployments, that range is now the most grounded figure available to work from.
