Semble cuts AI agent token use 98% with fast code search
Key insights
- Semble achieves NDCG@10 of 0.854, matching transformer-based code search while consuming 98% fewer tokens than grep workflows.
- Queries resolve in approximately 1.5ms on CPU with no GPU, API keys, or external service dependencies required.
- The library ships with MCP server and AGENTS.md support, enabling drop-in integration with Claude Code, Cursor, and Codex.
Why this matters
Token consumption is the practical rate-limiter for long-horizon agentic coding sessions, and Semble directly attacks that bottleneck by replacing verbose grep-and-read patterns with sub-2ms indexed retrieval. For founders and teams building coding agents or internal developer tools, a 98% reduction in retrieval tokens translates directly into more reasoning steps per dollar and longer effective context before hitting limits. The MCP-compatible packaging lowers the adoption barrier enough that this could become a default component in agentic coding stacks the same way vector databases became default in RAG pipelines.
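To make the "more reasoning steps per dollar" claim concrete, here is a rough back-of-the-envelope sketch. Only the 98% reduction comes from the article; the session budget and the per-step cost of a grep-and-read call are assumed figures chosen for illustration, and the function names are hypothetical.

```python
# Illustrative arithmetic only. CONTEXT_BUDGET and GREP_STEP_TOKENS are
# assumptions, not measurements from Semble's benchmarks.

def tokens_after_reduction(baseline_tokens: int, reduction_percent: int) -> int:
    """Tokens per step after a percentage reduction (98 means 98% fewer)."""
    if not 0 <= reduction_percent < 100:
        raise ValueError("reduction_percent must be in [0, 100)")
    # Integer math avoids floating-point rounding; never return less than 1 token.
    return max(1, baseline_tokens * (100 - reduction_percent) // 100)


def steps_within_budget(budget_tokens: int, tokens_per_step: int) -> int:
    """Whole retrieval steps that fit in a fixed token budget."""
    if tokens_per_step <= 0:
        raise ValueError("tokens_per_step must be positive")
    return budget_tokens // tokens_per_step


CONTEXT_BUDGET = 100_000    # assumed tokens available for retrieval in one session
GREP_STEP_TOKENS = 5_000    # assumed cost of one grep-and-read step
REDUCTION_PERCENT = 98      # reduction figure quoted in the article

indexed_step_tokens = tokens_after_reduction(GREP_STEP_TOKENS, REDUCTION_PERCENT)
grep_steps = steps_within_budget(CONTEXT_BUDGET, GREP_STEP_TOKENS)
indexed_steps = steps_within_budget(CONTEXT_BUDGET, indexed_step_tokens)

print(f"grep-and-read: {grep_steps} retrieval steps per session")
print(f"indexed search: {indexed_steps} retrieval steps per session")
```

Under these assumed figures, the same budget covers 20 grep-and-read steps or 1,000 indexed lookups. Real ratios depend on file sizes and on how much of each result the agent actually reads.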
Summary
MinishLab released Semble, an open-source code search library purpose-built for AI coding agents that retrieves results in roughly 1.5ms on CPU with no GPU, API keys, or external services needed.
The core value proposition is token efficiency: conventional grep-and-read workflows force agents to consume large chunks of raw file content, which burns context and hits rate limits fast during long agentic sessions. Semble indexes a typical repository in about 250ms and achieves an NDCG@10 score of 0.854, matching code-specialized transformer models while using 98% fewer tokens per query.
In essence, MinishLab is treating the token budget, not raw model capability, as the binding constraint on agentic coding.
- Ships as an MCP server or AGENTS.md shell command, compatible with Claude Code, Cursor, Codex, and OpenCode out of the box.
- Retrieval quality is on par with heavier transformer-based code search without any network round-trips or inference costs.
- The Show HN post hit 137 points within hours, with discussion validating token efficiency as the central bottleneck for multi-step coding agents.
Because context windows fill faster in agentic loops than in single-turn chat, lightweight local retrieval tools like Semble increasingly determine how far an agent can reason over a large codebase in one session.
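For readers unfamiliar with the retrieval metric quoted in the summary, NDCG@10 scores how well the top ten results are ordered relative to an ideal ordering, with 1.0 meaning a perfect ranking. The sketch below is a minimal illustration using the linear-gain variant. The article does not say which gain formula or relevance labels the Semble benchmark uses, so the labels here are made up, and the ideal ordering is computed from the same list of labels, which is a simplification.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k results (linear gain)."""
    # Rank 0 is discounted by log2(2) = 1, rank 1 by log2(3), and so on.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """DCG of the ranking divided by the DCG of the ideal ordering of the same labels."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical relevance labels for one query's returned results (3 = best match).
well_ranked = [3, 2, 1, 0]
badly_ranked = [0, 1, 2, 3]

print(f"well ranked:  {ndcg_at_k(well_ranked):.3f}")
print(f"badly ranked: {ndcg_at_k(badly_ranked):.3f}")
```

A benchmark score such as 0.854 would be the average of this per-query value over an evaluation set, so it reflects ranking quality on that set rather than a guarantee for any particular repository.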
Potential risks and opportunities
Risks
- Agents relying on Semble's retrieval quality could silently miss relevant code if the embedding model underperforms on domain-specific languages or proprietary frameworks not represented in its training data.
- If Semble becomes a default dependency in widely deployed agentic coding tools, a supply-chain compromise of the MinishLab PyPI package could affect a large number of developer environments simultaneously.
- Competing MCP-native retrieval tools from better-resourced vendors (e.g., Sourcegraph, GitHub via Copilot Extensions) could absorb the same niche with tighter IDE integration, leaving Semble dependent on community maintenance momentum.
Opportunities
- AI coding agent platforms (Cursor, Replit, Codeium) could integrate Semble or a similar local retrieval layer to extend effective session length without increasing API spend, a direct cost and retention lever.
- Enterprise developer-tooling vendors (JetBrains, Atlassian) gain a tested open-source reference design for token-efficient code search they can adapt into proprietary offerings targeting on-premise deployments.
- MinishLab is positioned to commercialize around the MCP server packaging, offering hosted index management or enterprise support for teams that want Semble's retrieval quality without self-hosting the indexing pipeline.
What we don't know yet
- Whether Semble's NDCG@10 benchmark holds on polyglot or monorepo-scale codebases beyond the evaluation set used in the Show HN post.
- How index freshness is handled in active development environments where files change continuously during an agent session.
- Whether MinishLab intends to maintain the MCP server interface as the MCP specification evolves, or whether that compatibility will lag behind protocol updates.
Originally reported by github.com
Original headline: Show HN: Semble — Open-Source Code Search for AI Agents Using 98% Fewer Tokens Than grep, Sub-2ms Queries on CPU