reddit.com via Reddit June 8th 2026

r/MachineLearning: Developer Abandons Semantic Embeddings for Tool Selection in 140-Tool MCP Agent, Switches to BM25 — Finds Cosine Similarity Actively Harmful on Schema Names

agents rag prompt engineering agents tool-selection retrieval

Summary

A practitioner routing ~140 MCP-exposed tools in a production agent switched from cosine-similarity over embedding vectors to BM25 keyword retrieval after finding that semantic search consistently mis-routed queries onto plausible-sounding but wrong tools — a failure mode that worsened as the tool count grew. The root cause identified: embedding models trained on natural language prose don't generalize to the naming conventions used in tool schemas, whereas BM25's exact-term matching handles schema-style identifiers reliably. The post includes concrete failure examples from a live deployment and is drawing broad validation from the ML practitioner community.

Originally reported by reddit.com

Read the original article →

Original headline: r/MachineLearning: Developer Abandons Semantic Embeddings for Tool Selection in 140-Tool MCP Agent, Switches to BM25 — Finds Cosine Similarity Actively Harmful on Schema Names