reddit.com via Reddit

r/MachineLearning: n=120 Task Experiment Routes LLMs by Verifiability — Cheap Models Match Premium on Objective Tasks, Premium Only Justified Where Ground Truth Is Absent

agents inference model-routing cost-optimization

Summary

An LLM infrastructure developer posted results from an internal experiment (n=120 tasks) testing whether routing models by task verifiability, an approach inspired by Karpathy's framework, reduces cost without degrading quality. The reported data shows cheaper models matching premium models on objectively verifiable tasks (code execution, math, factual lookup), with premium models justified only on subjective tasks that lack ground-truth checks. The author explicitly cautions that the results are directional (single evaluator, one company's workloads) but frames routing as a structural production-architecture choice rather than a prompt optimization.
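To make the routing idea concrete, here is a minimal Python sketch of a verifiability-based router. It is an illustration only, not the author's implementation: the task-kind labels, the `Task` type, the `classify` and `route` functions, and the model tier names are all hypothetical. It also assumes that any task kind not known to be verifiable is treated as subjective and sent to the premium tier.

```python
from dataclasses import dataclass
from enum import Enum


class Verifiability(Enum):
    OBJECTIVE = "objective"    # output can be checked against ground truth
    SUBJECTIVE = "subjective"  # no automatic ground-truth check available


# Hypothetical task-kind labels, based on the examples named in the summary.
OBJECTIVE_KINDS = {"code_execution", "math", "factual_lookup"}

# Placeholder tier names (assumptions), not real model identifiers.
CHEAP_MODEL = "cheap-model"
PREMIUM_MODEL = "premium-model"


@dataclass(frozen=True)
class Task:
    kind: str
    prompt: str


def classify(task: Task) -> Verifiability:
    """Label a task by whether its output can be verified objectively."""
    if task.kind in OBJECTIVE_KINDS:
        return Verifiability.OBJECTIVE
    # Assumption: unknown kinds are treated as subjective (the safer default).
    return Verifiability.SUBJECTIVE


def route(task: Task) -> str:
    """Return the model tier for a task: cheap if verifiable, else premium."""
    if classify(task) is Verifiability.OBJECTIVE:
        return CHEAP_MODEL
    return PREMIUM_MODEL


if __name__ == "__main__":
    tasks = [
        Task("math", "Compute 17 * 23."),
        Task("code_execution", "Write a function that reverses a list."),
        Task("tone_review", "Does this email sound too formal?"),
    ]
    for t in tasks:
        print(f"{t.kind} -> {route(t)}")
```

In this sketch the routing decision is made before any model is called, which is what makes it an architecture-level choice rather than a prompt-level one. How tasks get their `kind` label in practice is left open here.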