'Caveman' plugin trims Claude and Codex output to cut AI bills
TL;DR
- A plugin called caveman, written by Julius Brussee in early April, strips verbose model output and, in his tests, cut output tokens by roughly 65 to 75 percent.
- Shayne Sweeney, OpenAI's director of engineering, has reportedly contributed code to caveman to support Codex, and developers at Nvidia and GitHub are reportedly using it.
- GitHub shifted to per-token billing in April, Uber blew through its entire AI budget in four months, and Legrand's internal memo points staff at caveman.
The most interesting cost-control move in coding tools right now is not a new model or a new contract. It is a small open-source plugin that makes Claude Code and Codex talk like, well, cavemen. 404 Media reports that developers at companies including Nvidia and GitHub are using a tool called caveman, written by Julius Brussee in early April, to strip the polite chatbot prose out of model output and keep only the parts that matter.
Brussee's claim is that caveman cuts output tokens by 'roughly 65–75 percent versus default verbose output,' and 404 Media's own test with Claude Code logged a saving of 5,800 tokens, or 65 percent, in one session. The plugin preserves code, commands, URLs and numbers while compressing the surrounding language, so the model 'speaks less like a polite chatbot and more like a terse tool,' as Brussee put it. Shayne Sweeney, OpenAI's director of engineering, has reportedly contributed code to the project to support Codex.
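As a rough check on what those figures imply, the arithmetic can be sketched as follows. The token count and percentage come from the 404 Media test; the price per million output tokens is a placeholder assumption, not a quoted vendor rate, and the function names are purely illustrative.

```python
def implied_baseline(tokens_saved: int, saving_fraction: float) -> float:
    """Output tokens the session would have used without compression."""
    if not 0 < saving_fraction <= 1:
        raise ValueError("saving_fraction must be in (0, 1]")
    return tokens_saved / saving_fraction


def dollars_saved(tokens_saved: int, price_per_million: float) -> float:
    """Cost avoided at a given output-token price (USD per million tokens)."""
    return tokens_saved * price_per_million / 1_000_000


# Figures reported for the single 404 Media test session.
TOKENS_SAVED = 5_800
SAVING_FRACTION = 0.65

# Hypothetical price, chosen only to make the arithmetic concrete.
ASSUMED_PRICE_PER_MILLION = 15.0

baseline = implied_baseline(TOKENS_SAVED, SAVING_FRACTION)
remaining = baseline - TOKENS_SAVED

print(f"implied verbose output: {baseline:.0f} tokens")
print(f"output after caveman:   {remaining:.0f} tokens")
print(f"saved per session:      ${dollars_saved(TOKENS_SAVED, ASSUMED_PRICE_PER_MILLION):.3f}")
```

Under these assumptions the session would have produced roughly 8,900 output tokens without the plugin and about 3,100 with it. The saving per session is small in absolute terms; it only matters at the scale of many sessions across many developers.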
The reason this kind of micro-optimization is suddenly newsworthy is the bill. GitHub moved to per-token billing in April, Uber blew through its entire AI budget in four months, and both Uber and Walmart have started capping usage. An internal Legrand memo cited in the piece tells staff that 'since the billing system changed and the new quotas were implemented, we all need to be mindful of our usage,' and points them at the caveman skill.
The honest caveat is that the reporting does not give a representative benchmark of whether terse output costs anything in accuracy, just one favourable token count, and it does not say what Anthropic, OpenAI or Google think of users systematically suppressing the verbosity their products produce by default. If terseness becomes standard, vendors may respond by changing pricing or shipping a 'terse mode' of their own.
For small teams and individual developers paying out of pocket for Claude Code, Codex or Gemini, this is a cheap intervention worth trying before downgrading to a smaller model. Prompt engineering is now partly a budget tool, not just a quality tool.
Originally reported by 404media.co
Original headline: Companies Are Making Claude and Codex Talk Like Cavemen to Stop AI’s Soaring Costs