github.com via Hacker News

pxpipe renders Claude context to PNGs to cut bills 59-70%

TL;DR

  • pxpipe is a local proxy that renders bulky Claude Code context (tool docs, older history, large tool results) as PNGs for the model to read.
  • The project reports 59 to 70% lower end-to-end bills, plus one demo session that dropped from $42.21 in plain text to $6.06 with pxpipe.
  • It is explicitly lossy: exact recall of 12-character hex strings came back 0/15 on Opus and 13/15 on Fable 5, with silent confabulation as the failure mode.

pxpipe is a local proxy that intercepts Claude Code requests, takes the bulky text context (system prompt, tool docs, older history, big tool_result blobs), and renders it as PNG images before sending it upstream. The model then reads the text back from the pixels through its vision input. According to the project README on GitHub, a 1928×1928 image costs about 4,761 vision tokens but holds up to ≈92,000 characters, and the README puts dense content at roughly 3.1 characters per image-token versus about 1 per text-token.
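The density figures above can be sanity-checked with a few lines of arithmetic. This is only a sketch using the numbers as quoted from the README; the constant and function names are made up for illustration and are not part of pxpipe.

```python
# Figures as quoted from the README; all names here are illustrative only.
IMAGE_TOKENS = 4_761             # reported cost of one 1928x1928 image
MAX_CHARS_PER_IMAGE = 92_000     # reported upper bound ("up to")
QUOTED_CHARS_PER_IMAGE_TOKEN = 3.1


def chars_per_image_token(chars: int, image_tokens: int = IMAGE_TOKENS) -> float:
    """Characters carried per billed image token for one page."""
    return chars / image_tokens


# Density if a page were filled to the reported ceiling.
ceiling_density = chars_per_image_token(MAX_CHARS_PER_IMAGE)

# Characters per page implied by the quoted 3.1 chars per image-token.
implied_chars_per_page = QUOTED_CHARS_PER_IMAGE_TOKEN * IMAGE_TOKENS

if __name__ == "__main__":
    print(f"ceiling density: {ceiling_density:.1f} chars per image-token")
    print(f"chars per page at 3.1 chars/token: {implied_chars_per_page:,.0f}")
```

The two quoted figures describe different things: a page filled to the ≈92,000-character ceiling would carry about 19 characters per image-token, while 3.1 characters per image-token corresponds to roughly 14,800 characters on a page. The summary here does not say how the 3.1 figure was measured.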

The claimed savings are large. The README reports a 59 to 70 percent lower end-to-end bill on production workloads, 72 to 74 percent on the compressed portion, and one demo session that cost $42.21 in plain text versus $6.06 with pxpipe on identical tasks. In a SWE-bench Lite pilot, both arms resolved 10 of 10 tasks, with a 65 percent reduction in request size. On SWE-bench Pro, 14 of 19 tasks were resolved with pxpipe on versus 15 of 19 without.
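To put the reported numbers on a common scale, the short sketch below converts the demo session and the SWE-bench Pro counts into percentages. It uses only the figures quoted above; the helper name is illustrative.

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative saving, as a percentage of the original cost."""
    return (before - after) / before * 100


# Demo session: $42.21 in plain text versus $6.06 with pxpipe.
demo_saving = percent_reduction(42.21, 6.06)

# SWE-bench Pro resolve rates from the reported counts.
rate_with_pxpipe = 14 / 19
rate_without = 15 / 19
rate_gap = rate_without - rate_with_pxpipe

if __name__ == "__main__":
    print(f"demo session saving: {demo_saving:.1f}%")
    print(f"resolve rate with pxpipe: {rate_with_pxpipe:.1%}")
    print(f"resolve rate without:     {rate_without:.1%}")
    print(f"gap: {rate_gap:.1%} (one task out of 19)")
```

The demo session works out to a saving of about 86 percent, which is above the 59 to 70 percent range quoted for production workloads, so it reads as a favourable case rather than a typical one. The SWE-bench Pro gap is a single task out of 19, which is too small a sample to say much in either direction.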

The mechanism exploits how images are priced, not any change in the model. The README notes that an image's token cost is fixed by its pixel dimensions, not by how much text it contains, so filling an image with tightly reflowed code is arbitrage against the pricing schedule. The default target is Claude Fable 5, where the author claims full reading accuracy on dense pages; Opus 4.8 is opt-in and degrades noticeably.
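A fixed per-image cost implies a break-even point: rendering only pays off once the text would have cost more tokens than the image does. The sketch below models that under loudly simplified assumptions: every page is billed at the reported 4,761 tokens regardless of how full it is, a page holds at most the reported ≈92,000 characters, and the characters-per-text-token ratio is a parameter you supply. None of this is pxpipe's actual decision logic.

```python
import math

TOKENS_PER_IMAGE = 4_761        # reported cost of one 1928x1928 page
MAX_CHARS_PER_IMAGE = 92_000    # reported upper bound on characters per page


def image_tokens(chars: int) -> int:
    """Tokens billed if the text is rendered, assuming full-price pages."""
    pages = math.ceil(chars / MAX_CHARS_PER_IMAGE)
    return pages * TOKENS_PER_IMAGE


def text_tokens(chars: int, chars_per_text_token: float) -> int:
    """Tokens billed if the text is sent as text (ratio is an assumption)."""
    return math.ceil(chars / chars_per_text_token)


def rendering_pays_off(chars: int, chars_per_text_token: float) -> bool:
    """True when the rendered form is strictly cheaper than plain text."""
    return image_tokens(chars) < text_tokens(chars, chars_per_text_token)


if __name__ == "__main__":
    for chars in (1_000, 10_000, 92_000):
        # 1.0 is the "about 1 per text-token" figure quoted above.
        print(chars, rendering_pays_off(chars, chars_per_text_token=1.0))
```

Under these assumptions a 1,000-character blob is cheaper left as text, because the image costs 4,761 tokens no matter how little is on it. The saving only appears for context large enough to amortise that fixed cost, which fits the project's focus on bulky context.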

The honest caveat is right in the README and deserves to be the headline: this is lossy. Exact recall of 12-character hex strings came back 0 out of 15 on Opus and 13 out of 15 on Fable 5, and the failure mode is described as silent confabulation: a plausible but wrong value rather than an error. Anything you need back byte-exact (IDs, hashes, secrets, exact numbers) has to stay text. What the reporting does not give you is the retry rate once an agent acts on a confabulated file path or commit hash, which is where the real bill lives for tool-heavy agents.
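One way to act on that caveat is to screen context chunks before rendering and keep anything that looks byte-exact as text. The sketch below is a hypothetical heuristic, not pxpipe's implementation: the patterns, `needs_exact_text`, and `partition` are all invented for illustration, and the patterns deliberately err toward keeping too much as text.

```python
import re

# Hypothetical patterns for content that must survive byte-exact.
EXACT_PATTERNS = [
    # Hex-like runs of 7+ characters containing at least one digit
    # (commit hashes, digests, long numeric IDs).
    re.compile(r"\b(?=[0-9a-fA-F]*\d)[0-9a-fA-F]{7,64}\b"),
    # UUIDs in the usual 8-4-4-4-12 layout.
    re.compile(
        r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
        r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
    ),
    # Assignments that look like credentials.
    re.compile(r"(?i)\b(api[_-]?key|secret|password)\b\s*[:=]"),
]


def needs_exact_text(chunk: str) -> bool:
    """True if the chunk contains something that should not be rendered."""
    return any(pattern.search(chunk) for pattern in EXACT_PATTERNS)


def partition(chunks: list[str]) -> tuple[list[str], list[str]]:
    """Split chunks into (keep_as_text, candidates_for_rendering)."""
    keep, render = [], []
    for chunk in chunks:
        (keep if needs_exact_text(chunk) else render).append(chunk)
    return keep, render


if __name__ == "__main__":
    keep, render = partition([
        "commit 3f9a1c2b7d4e landed on main",
        "The parser walks the tree and emits nodes.",
        "API_KEY=abc",
    ])
    print("keep as text:", keep)
    print("render:", render)
```

A filter like this cannot catch everything, for example file paths or short numbers that matter, so it reduces the risk of silent confabulation without removing it.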

The direction is the part worth watching. If a solo project can plausibly halve the bill for coding agents by leaning on how vision tokens are billed, both Anthropic and the agent-framework vendors will notice, and this specific arbitrage window is unlikely to stay open indefinitely.