Dropbox open-sources Rust semantic search in SQLite
Key insights
- Witchcraft achieves 21ms p95 latency on NFCorpus, beating Stanford's original XTR-Warp on equivalent hardware.
- The entire retrieval stack fits in a single SQLite file with no external API, vector DB, or embedding server dependency.
- Companion app Pickbrain indexes Claude Code and Codex session transcripts, targeting AI coding agent workflows directly.
Why this matters
Local-first retrieval has been blocked by infrastructure complexity for most developers, and Witchcraft collapses that stack to a single SQLite file, making semantic search a zero-dependency library rather than a service. For AI agent builders, this removes the vector database as a deployment requirement, which directly lowers latency, cost, and attack surface for on-device or air-gapped agent memory. The Pickbrain integration with Claude Code and Codex transcripts signals that retrieval over agent session history is becoming a first-class product pattern, not just a research problem.
Summary
Dropbox has released Witchcraft, a Rust-based semantic search engine that runs entirely within a single SQLite file, requiring no API keys, embedding servers, or vector databases.
Built by a Dropbox engineer as a ground-up reimplementation of Stanford's XTR-Warp ColBERT late-interaction retrieval system, Witchcraft hits 21ms p95 end-to-end latency on the NFCorpus benchmark, outperforming the original XTR-Warp on server-class hardware. The library ships with a companion tool called Pickbrain that indexes Claude Code and Codex session transcripts for local search.
Essentially: (Dropbox, Stanford) the dependency stack for local semantic search just collapsed to a single file.
- ColBERT late-interaction retrieval is typically expensive infrastructure; Witchcraft packages it in SQLite with no external process required.
- Pickbrain targets AI coding agent workflows directly, indexing session transcripts from Claude Code and Codex out of the box.
- The project is gaining traction simultaneously on r/MachineLearning and Hacker News as teams evaluate it for local-first RAG pipelines.
The release puts production-grade semantic retrieval within reach of any developer who can ship a SQLite file, which meaningfully lowers the floor for local AI agent memory and context systems.
Potential risks and opportunities
Risks
- Teams that build production RAG pipelines on Witchcraft before it stabilizes could face breaking API changes, as the library is a solo-engineer release with no stated SLA or versioning commitment.
- SQLite's write-concurrency limits could become a bottleneck for multi-agent deployments where several coding agents attempt simultaneous index writes, potentially blocking adoption in team-scale use cases.
- If Dropbox reassigns the engineer or shifts internal priorities, the project could stall without a maintained release, leaving dependent pipelines on an unmaintained retrieval core.
Opportunities
- Local AI agent framework maintainers (LangChain, LlamaIndex, CrewAI) could integrate Witchcraft as a zero-dependency retrieval backend within the next 30 to 60 days, reducing their infrastructure requirements for on-device deployments.
- Developer tool companies building on top of Claude Code or Codex (Cursor, Replit, Sourcegraph) could adopt Pickbrain-style session indexing to offer persistent agent memory without cloud retrieval costs.
- Embedded and edge AI vendors targeting air-gapped or regulated environments gain a credible semantic search option for compliance-sensitive deployments where external vector DB calls are prohibited.
What we don't know yet
- Whether Witchcraft's NFCorpus benchmark performance holds on longer context corpora typical of real AI coding agent sessions, which can run to hundreds of thousands of tokens.
- Licensing terms for commercial use of Witchcraft are not yet widely discussed in the thread, and Dropbox's open-source licensing history includes restrictions that have surprised downstream users.
- Whether Pickbrain supports session transcript formats beyond Claude Code and Codex, and when or if indexing for other major agents (Cursor, Gemini CLI) is planned.
Originally reported by reddit.com
Read the original article →Original headline: r/MachineLearning: Dropbox Open-Sources Witchcraft — Rust Semantic Search Engine in Single SQLite File, 21ms p95, Built for AI Coding Agents Without API Keys or Vector DB