r/MachineLearning: Developer Abandons Semantic Embeddings for Tool Selection in 140-Tool MCP Agent, Switches to BM25 — Finds Cosine Similarity Actively Harmful on Schema Names
Summary
A practitioner routing ~140 MCP-exposed tools in a production agent switched from cosine-similarity over embedding vectors to BM25 keyword retrieval after finding that semantic search consistently mis-routed queries onto plausible-sounding but wrong tools — a failure mode that worsened as the tool count grew. The root cause identified: embedding models trained on natural language prose don't generalize to the naming conventions used in tool schemas, whereas BM25's exact-term matching handles schema-style identifiers reliably. The post includes concrete failure examples from a live deployment and is drawing broad validation from the ML practitioner community.
Originally reported by reddit.com
Read the original article →Original headline: r/MachineLearning: Developer Abandons Semantic Embeddings for Tool Selection in 140-Tool MCP Agent, Switches to BM25 — Finds Cosine Similarity Actively Harmful on Schema Names