reddit.com via Reddit

r/artificial: 2026 Paper Finds Subagent Context Bloat Is the Top Token Cost Driver in Multi-Agent AI Sessions — Developer Shares 70–90% Reduction Techniques

agents inference multi-agent token-costs optimization

Summary

A developer summarizing a 2026 Bai et al. paper studying SWE-bench across eight frontier models argues that repeated full tool schema resends, growing conversation history, and unstructured subagent handoffs — not inference itself — account for most token costs in long agentic runs. Mitigation techniques highlighted include selective tool schema injection (only sending schemas relevant to the current subtask), rolling history compression, and structured handoff summaries. The developer reports 70–90% token usage reductions in production multi-turn and multi-agent sessions applying these patterns.