reddit.com via Reddit

Claude Code Skills Waste 23K Tokens Per Session

anthropic coding tools claude-code context-optimization coding-tools

Key insights

  • Roughly half of installed Claude Code skills showed zero auto-activations in one developer's session log audit.
  • By the developer's account, every installed skill loads its full instruction text into context at session start, whether or not it ever triggers.
  • The developer estimated 23,000 tokens wasted per session from inactive skills, with compounding cost implications for teams.

Why this matters

Context window efficiency directly affects both model performance and API cost, so a structural design where unused skills unconditionally consume thousands of tokens per session is a scalability problem that grows with team size and skill library depth. Enterprises evaluating Claude Code for standardized developer tooling need to account for this overhead when projecting costs and assessing whether available context is sufficient for complex, long-horizon tasks. The publicly shared audit method also surfaces a reproducible diagnostic that could pressure Anthropic to ship lazy-loading or conditional skill inclusion before enterprise adoption accelerates.
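To make the cost-projection point concrete, here is a minimal back-of-the-envelope sketch. Only the 23,000-token figure comes from the audit. Sessions per developer, team size, working days, token price, and context window size are hypothetical placeholders to be replaced with a team's own numbers. The sketch also assumes the overhead is paid once per session at a flat input price, which is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OverheadEstimate:
    tokens_per_day: int
    tokens_per_month: int
    cost_per_month: float   # in whatever currency the price is quoted in
    context_share: float    # fraction of the context window used by idle skills


def project_overhead(
    wasted_tokens_per_session: int,
    sessions_per_dev_per_day: int,
    developers: int,
    working_days_per_month: int,
    price_per_million_input_tokens: float,
    context_window_tokens: int,
) -> OverheadEstimate:
    """Scale a per-session overhead estimate up to a team and a month.

    Every argument except the per-session overhead is an assumption
    supplied by the caller; none of these values come from the audit.
    """
    tokens_per_day = wasted_tokens_per_session * sessions_per_dev_per_day * developers
    tokens_per_month = tokens_per_day * working_days_per_month
    cost_per_month = tokens_per_month / 1_000_000 * price_per_million_input_tokens
    context_share = wasted_tokens_per_session / context_window_tokens
    return OverheadEstimate(tokens_per_day, tokens_per_month, cost_per_month, context_share)


# Placeholder inputs: 10 developers, 4 sessions each per day, 20 working days,
# a hypothetical price of 3.0 per million input tokens, a 200,000-token window.
estimate = project_overhead(
    wasted_tokens_per_session=23_000,
    sessions_per_dev_per_day=4,
    developers=10,
    working_days_per_month=20,
    price_per_million_input_tokens=3.0,
    context_window_tokens=200_000,
)
print(estimate)
```

With these placeholder inputs the overhead comes to 920,000 tokens per day across the team. The point of the sketch is the multiplication, not the specific result: the total scales linearly with team size and session count under these assumptions.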

Summary

A developer auditing Claude Code session logs found that roughly half of their installed skills never auto-activated across recorded sessions, yet every skill loads its full instruction text into context at session start regardless of whether it fires. The audit method, shared publicly on r/ClaudeAI, involved parsing ~/.claude/projects/*.jsonl log files to count per-skill activation events. The developer estimated the dead-weight context load at approximately 23,000 tokens per session, a figure that compounds quickly for teams running multiple long sessions daily or maintaining large skill libraries.

Essentially, Anthropic and Claude Code users face a structural mismatch between how skills are loaded and how they are actually used:

  • Every installed skill contributes to context overhead unconditionally, with no lazy-loading or activation-gated inclusion mechanism.
  • At 23,000 wasted tokens per session, teams on high-volume deployments face both cost drag and a meaningfully reduced effective context window for actual work.
  • The audit method is reproducible by any Claude Code user with access to their local project log files.

For enterprises standardizing on Claude Code across engineering teams, unaudited skill libraries represent a quiet but compounding tax on both API spend and model performance.
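The audit described above can be approximated with a short script. This is a sketch under stated assumptions, not the developer's actual code: the post's summary does not give the log schema, so the code assumes a skill activation appears somewhere in a log record as a block with "type": "tool_use", "name": "Skill", and an "input" object with a "skill" key. That shape is a guess and should be checked against real log lines first. The list of installed skill names is also supplied by the caller.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Any, Iterable, Iterator, List, Tuple


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per JSONL line, skipping blank or malformed lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict):
            yield record


def find_skill_names(node: Any) -> Iterator[str]:
    """Recursively yield skill names from blocks matching the ASSUMED schema."""
    if isinstance(node, dict):
        # ASSUMPTION: activations look like
        # {"type": "tool_use", "name": "Skill", "input": {"skill": "<name>"}}
        if node.get("type") == "tool_use" and node.get("name") == "Skill":
            tool_input = node.get("input")
            if isinstance(tool_input, dict) and isinstance(tool_input.get("skill"), str):
                yield tool_input["skill"]
        for value in node.values():
            yield from find_skill_names(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_skill_names(item)


def count_activations(lines: Iterable[str]) -> Counter:
    """Count activation events per skill across an iterable of JSONL lines."""
    counts: Counter = Counter()
    for record in iter_records(lines):
        counts.update(find_skill_names(record))
    return counts


def never_activated(installed: Iterable[str], counts: Counter) -> List[str]:
    """Return installed skill names with zero recorded activations, sorted."""
    return sorted(name for name in installed if counts[name] == 0)


def audit_logs(log_dir: Path, installed: Iterable[str]) -> Tuple[Counter, List[str]]:
    """Scan every .jsonl file under log_dir and report unused skills."""
    counts: Counter = Counter()
    if log_dir.is_dir():
        # rglob also descends into subdirectories, which goes slightly
        # beyond the flat ~/.claude/projects/*.jsonl pattern in the post.
        for path in sorted(log_dir.rglob("*.jsonl")):
            with path.open(encoding="utf-8", errors="replace") as handle:
                counts.update(count_activations(handle))
    return counts, never_activated(installed, counts)
```

A call such as audit_logs(Path("~/.claude/projects").expanduser(), ["pdf", "code-review"]) returns the per-skill counts and the list of skills that never fired. The two skill names here are placeholders. A skill with zero activations is only a candidate for removal: a skill that fires rarely can still be valuable when it does.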

Potential risks and opportunities

Risks

  • Enterprise teams that standardized Claude Code skill libraries without auditing activation rates may find effective context windows materially smaller than provisioned, causing silent degradation in long-session task quality.
  • If Anthropic does not address the unconditional loading behavior, third-party skill marketplaces could proliferate bloated skills that further erode context budgets for unsuspecting users.
  • Developers optimizing around the 23,000-token overhead by aggressively pruning skill libraries could inadvertently remove skills that activate rarely but are high-value when triggered, creating a difficult tradeoff without better tooling.

Opportunities

  • Anthropic could ship a skill activation dashboard or built-in audit command directly in Claude Code CLI, turning this community-discovered pain point into a retention and trust feature before competitors respond.
  • Third-party Claude Code tooling developers (e.g., those building around the .jsonl log format) have an opening to productize the audit method as a standalone context-efficiency analyzer.
  • Teams that proactively audit and trim their skill libraries now gain a competitive advantage in context efficiency, particularly for AI coding workflows where long-context tasks like large refactors or multi-file reviews are the norm.

What we don't know yet

  • Whether Anthropic has an internal roadmap for conditional or lazy skill loading, and if so, any committed timeline as of May 2026.
  • How activation rates vary across different skill categories (e.g., scheduling vs. code-review skills) and whether certain skill types are structurally unlikely to auto-activate.
  • What the token overhead looks like at scale for teams with 20+ skills installed, and whether the 23,000-token estimate holds or compounds non-linearly.