🧵 There is a fundamental issue with reference-based LLM-judges. People implicitly assume a reference-based judge behaves like: score = f(candidate, reference). However, the actual behavior is closer to: score = f(candidate, reference, parametric knowledge, pr…
Who's Who of AI
Leo Boytsov
Machine learning scientist and engineer (PhD, CMU) working on (un)natural language processing, speaking πtorch & C++. Opinions sampled from MY OWN 100T param LM.
What they're sharing
Judging Against the Reference: Uncovering Knowledge-Driven Failures in LLM-Judges on QA Evaluation (arxiv.org)