github.com web signal

Semble cuts AI agent token use 98% with fast code search

Key insights

  • Semble achieves NDCG@10 of 0.854, matching transformer-based code search while consuming 98% fewer tokens than grep workflows.
  • Queries resolve in approximately 1.5ms on CPU with no GPU, API keys, or external service dependencies required.
  • The library ships with MCP server and AGENTS.md support, enabling drop-in integration with Claude Code, Cursor, and Codex.
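For readers unfamiliar with the metric, NDCG@10 scores the top ten results of a ranked list against graded relevance labels, rewarding rankings that place the most relevant items first. A score of 1.0 means the ordering is ideal. The sketch below uses the common linear-gain formulation and, as a simplification, normalizes against the labels in the list itself rather than against every relevant item in a corpus. The relevance labels are invented for illustration and are not drawn from Semble's benchmark.

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k results.

    relevances[i] is the graded relevance of the result at rank i (0-based).
    Each gain is discounted by log2(rank + 2), so rank 0 is undiscounted.
    """
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    """DCG normalized by the DCG of the best possible ordering of the same labels.

    Simplifying assumption: the ideal ranking is built only from the labels
    passed in, not from all relevant items in a corpus.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Hypothetical labels: 3 = exact match, 0 = irrelevant.
example_ranking = [3, 0, 2, 1]
example_score = ndcg_at_k(example_ranking)
```

A ranking that is already sorted by relevance scores exactly 1.0. The example above scores lower because an irrelevant result sits in second place.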

Why this matters

Token consumption is the practical rate-limiter for long-horizon agentic coding sessions, and Semble attacks that bottleneck by replacing verbose grep-and-read patterns with sub-2ms indexed retrieval. For founders and teams building coding agents or internal developer tools, a 98% reduction in retrieval tokens means more reasoning steps per dollar and more usable context before hitting limits. The MCP-compatible packaging lowers the adoption barrier enough that Semble could become a standard component in agentic coding stacks, much as vector databases did in RAG pipelines.
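To make the arithmetic concrete, here is a back-of-the-envelope sketch. Only the 98% reduction comes from the source. The token budget and the per-retrieval cost of a grep-and-read step are hypothetical round numbers chosen for illustration.

```python
def retrievals_within_budget(budget_tokens, tokens_per_retrieval):
    """How many retrieval steps fit into a fixed token budget."""
    return budget_tokens // tokens_per_retrieval

# Hypothetical figures, not measurements from Semble.
BUDGET_TOKENS = 100_000          # tokens an agent can spend on retrieval
GREP_TOKENS_PER_STEP = 20_000    # assumed cost of one grep-and-read step

# The source's claimed 98% reduction leaves 2% of the original cost.
# Integer arithmetic avoids floating-point rounding noise.
reduced_tokens_per_step = GREP_TOKENS_PER_STEP * 2 // 100

grep_steps = retrievals_within_budget(BUDGET_TOKENS, GREP_TOKENS_PER_STEP)
reduced_steps = retrievals_within_budget(BUDGET_TOKENS, reduced_tokens_per_step)
```

Under these assumptions the same budget covers 5 grep-and-read steps or 250 reduced-cost retrievals, a 50x difference. The actual ratio depends on how much raw file content a given agent reads per step.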

Summary

MinishLab released Semble, an open-source code search library purpose-built for AI coding agents. It returns results in roughly 1.5ms on CPU and needs no GPU, API keys, or external services. The core value proposition is token efficiency: conventional grep-and-read workflows force agents to consume large chunks of raw file content, which burns context and hits rate limits quickly during long agentic sessions. Semble indexes a typical repository in about 250ms and achieves an NDCG@10 score of 0.854, matching code-specialized transformer models while using 98% fewer tokens per query than grep-and-read. In short, MinishLab is targeting the token budget, not raw model capability, as the binding constraint on agentic coding.

  • Ships as an MCP server or AGENTS.md shell command, compatible with Claude Code, Cursor, Codex, and OpenCode out of the box.
  • Retrieval quality is on par with heavier transformer-based code search, without any network round-trips or inference costs.
  • The Show HN post hit 137 points within hours, with discussion validating token efficiency as the central bottleneck for multi-step coding agents.

Because context windows fill faster in agentic loops than in single-turn chat, lightweight local retrieval tools like Semble increasingly determine how far an agent can reason over a large codebase in one session.

Potential risks and opportunities

Risks

  • Agents relying on Semble's retrieval quality could silently miss relevant code if the embedding model underperforms on domain-specific languages or proprietary frameworks not represented in its training data.
  • If Semble becomes a default dependency in widely deployed agentic coding tools, a supply-chain compromise of the MinishLab PyPI package could affect a large number of developer environments simultaneously.
  • Competing MCP-native retrieval tools from better-resourced vendors (e.g., Sourcegraph, GitHub via Copilot Extensions) could absorb the same niche with tighter IDE integration, leaving Semble dependent on community maintenance momentum.

Opportunities

  • AI coding agent platforms (Cursor, Replit, Codeium) could integrate Semble or a similar local retrieval layer to extend effective session length without increasing API spend, a direct cost and retention lever.
  • Enterprise developer-tooling vendors (JetBrains, Atlassian) gain a tested open-source reference design for token-efficient code search they can adapt into proprietary offerings targeting on-premise deployments.
  • MinishLab is positioned to commercialize around the MCP server packaging, offering hosted index management or enterprise support for teams that want Semble's retrieval quality without self-hosting the indexing pipeline.

What we don't know yet

  • Whether Semble's NDCG@10 benchmark holds on polyglot or monorepo-scale codebases beyond the evaluation set used in the Show HN post.
  • How index freshness is handled in active development environments where files change continuously during an agent session.
  • Whether MinishLab intends to maintain the MCP server interface as the MCP specification evolves, or if that compatibility will lag behind protocol updates.