Workweave Router Targets 40-70% AI Cost Cuts With Endpoint Swap
TL;DR
- Workweave Router routes each AI request to the cheapest capable model in under 50ms, requiring only an endpoint change from existing tools like Claude Code or Codex CLI.
- The project claims 60 to 70 percent of production Claude Code requests are short completions open-source models handle at parity, at roughly one-fortieth the cost of frontier models.
- The router is licensed under Elastic License v2, meaning it is source-available but restricts commercial redistribution and managed-service use.
Routing every AI request to the cheapest model that can actually handle it has become one of the more practical cost-reduction levers in production AI systems. The catch has always been the routing call itself: if classifying a request requires another expensive LLM call, you can end up spending more, not less. Workweave Router, a proxy released under the Elastic License v2, sidesteps that by running a small ONNX model locally to embed each incoming request and score it against a frozen set of intent clusters. The project describes the approach as "a tiny on-box embedder, not a vibes-based prompt." Notably, the scorer does not simply pick the highest-scoring model -- according to the project's introductory blog post, it picks the candidate that wins on a cost, latency, and verbosity composite.
The cost-reduction claims are significant but worth reading carefully. The blog post reports 80 to 85 percent cost reductions with no observable quality regression on production Claude Code traffic, attributing the savings to the observation that 60 to 70 percent of requests in that workload are short, structurally simple completions that an open-source model handles at parity, at roughly one-fortieth of the cost. The GitHub repository's headline figure is more conservative: "Cut costs 40-70% with just an endpoint change." The routing path itself adds low single-digit milliseconds of overhead. Both ranges are self-reported, not independently validated, so treat them as directional rather than guaranteed.
One design decision worth understanding is session pinning. The router keeps model decisions sticky per session rather than re-evaluating each turn. The rationale is prompt cache efficiency: as the blog explains, routing turn one to provider A and turn two to provider B means paying full cost to re-prime an identical context on the second provider, eroding the savings. That is a sensible trade-off for interactive coding sessions but may not suit workloads where complexity escalates sharply mid-session.
The ELv2 licensing is a meaningful caveat for teams evaluating this as open source. Elastic License v2 is source-available and allows self-hosting, but restricts commercial redistribution and offering the software as a managed service. For teams with strict open-source policies or plans to build a product on top of the router, that distinction matters.
For developers already using Claude Code, OpenAI Codex CLI, opencode, or Cursor, the described integration path requires only an endpoint change, with provider API keys staying local and encrypted at rest. If the cost-reduction claims hold on your particular workload mix, that is a low-friction test to run.
Originally reported by github.com
Read the original article →Original headline: Show HN: Workweave Router — Open-Source Model Routing Proxy Integrates With Claude Code, Codex CLI, and Cursor, Claims 40–70% Cost Reduction