reddit.com via Reddit

web_search Eats 50% of Agent Spend, Not the LLM

agents ai infrastructure agents cost-optimization observability

Key insights

  • web_search alone accounts for roughly 50% of total agent spend, outpacing LLM inference costs in production deployments.
  • p95 latency in agentic systems is dominated by tool round-trips, not model generation or context window size.
  • Per-session cost waste concentrates in a small number of expensive tool calls, not distributed evenly across operations.

Why this matters

Most agent cost models are built around LLM pricing tiers, so teams are optimizing inference while their external tool budgets scale unchecked. Production agents calling web search, code execution, or external APIs at volume face a cost structure that looks nothing like the benchmarks used during development or prototyping. Any team planning to productionize agents at scale needs per-call instrumentation before they can make defensible architectural, vendor, or caching decisions.

Summary

Production instrumentation of agentic systems is exposing a cost assumption most teams have wrong: the LLM isn't what's draining the budget. A developer who added per-call telemetry to their production agents found web_search alone consumed roughly 50% of total agent spend. LLM inference, widely assumed to be the dominant cost driver, wasn't the binding constraint at all. Essentially: developers building agentic stacks are optimizing the wrong layer. - p95 latency is driven by tool round-trips, not model generation time - Context window size is rarely the actual bottleneck in production workloads - Per-session cost waste concentrates in a small number of expensive tool calls As agentic architectures grow more tool-heavy, the real cost surface is shifting to external API calls and integrations that most teams aren't monitoring at the per-call level.

Potential risks and opportunities

Risks

  • Teams that have pre-committed to LLM-optimized reserved capacity (Azure OpenAI, AWS Bedrock) may face budget overruns as web search and tool costs scale faster than forecast
  • Agents deployed without per-call instrumentation will continue burning budget on tool calls invisibly, with no cost signal until invoices arrive, creating runway risk for early-stage AI startups running production agents
  • Web search API providers (Serper, SerpAPI, Tavily) face pricing pressure or rapid churn as developers now aware of the 50% overhead aggressively shop for caching solutions or cheaper alternatives

Opportunities

  • Observability vendors (Langfuse, Helicone, Arize AI) can position tool-level cost tracing as a first-class feature to capture developer teams now motivated to instrument their stacks
  • Web search providers offering semantic deduplication or tiered caching (Tavily, Exa) gain competitive advantage as cost-conscious teams shop for cheaper tool infrastructure to reclaim that 50%
  • Agent framework maintainers (LangChain, LlamaIndex, CrewAI) can ship native per-tool cost instrumentation dashboards to capture mindshare among production-focused developers before third-party observability tools lock in the category

What we don't know yet

  • Which web search providers were in use (Bing API, Serper, SerpAPI, Tavily) and whether switching providers meaningfully changes the 50% cost share
  • Whether semantic caching of repeated web_search queries was tested as a mitigation and what hit rates are realistic in the developer's specific production workload
  • No data on how the spend distribution shifts as agent complexity grows -- whether tool-cost dominance holds for multi-tool agents that also invoke code execution or browser automation