huggingface.co web signal June 30th 2026

Yale + Google 'RLMF' beats standard RL by 63% on calibration


TL;DR

RLMF, a joint Yale and Google Research method, surpasses standard RL by up to 63% on 'faithful calibration' across 10 evaluation datasets.
Small 8B open models tuned with RLMF are reported to beat GPT-5 by 37% and Gemini-3.1-Pro by 17% on the same calibration metric.
A two-stage pipeline first calibrates numerical confidence, then rewrites answers into natural linguistic hedging, trained on 2,000 PopQA samples.

A small joint paper from Yale University and Google Research, posted this week, is worth flagging less for its leaderboard number than for what it targets. The claim is that a training tweak called reinforcement learning with metacognitive feedback (RLMF) can make an 8B open model hedge more honestly than GPT-5 on a family of question-answering benchmarks.

The setup, described in the paper on Hugging Face, starts from a real weakness in current LLMs: they "hallucinate with high confidence" and misrepresent how sure they actually are. RLMF scales the standard RL preference signal by how well a model can judge its own performance, so completions that come with accurate self-assessment get more weight in preference optimization. A two-stage pipeline first calibrates numerical confidence scores, then rewrites the answer into natural linguistic hedging.
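To make the weighting idea concrete, here is a minimal sketch of a preference loss scaled by a self-assessment score. It is not the paper's formulation: the function names, the squared-gap weight, and the DPO-style logistic loss are all assumptions chosen for illustration, since the article does not give the actual objective.

```python
import math


def metacognitive_weight(stated_confidence: float, was_correct: bool) -> float:
    """Hypothetical weight in [0, 1] for one completion.

    Assumption: self-assessment quality is 1 minus the squared gap between
    the model's stated confidence and the actual outcome (1 if correct, 0 if not).
    """
    if not 0.0 <= stated_confidence <= 1.0:
        raise ValueError("stated_confidence must be in [0, 1]")
    outcome = 1.0 if was_correct else 0.0
    return 1.0 - (stated_confidence - outcome) ** 2


def weighted_preference_loss(margin: float, weight: float) -> float:
    """Logistic preference loss on a chosen-minus-rejected margin, scaled by weight.

    Assumption: a DPO-style loss, -log(sigmoid(margin)), stands in for
    whatever preference objective the authors actually use.
    """
    # -log(sigmoid(m)) is equal to log(1 + exp(-m)).
    return weight * math.log1p(math.exp(-margin))


# A confident, correct completion keeps nearly full weight (about 0.99) ...
well_judged = metacognitive_weight(0.9, was_correct=True)
# ... while a confident, wrong one is heavily down-weighted (about 0.19).
overconfident = metacognitive_weight(0.9, was_correct=False)

loss_a = weighted_preference_loss(margin=0.5, weight=well_judged)
loss_b = weighted_preference_loss(margin=0.5, weight=overconfident)
```

Under these assumptions, two preference pairs with the same margin contribute very differently to training: the pair whose completion judged itself accurately dominates the gradient, which is the behavior the article describes.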

Across 10 evaluation datasets, the authors report that RLMF "surpasses standard RL by up to 63%" on their faithful calibration score while task accuracy holds. Small 8B models tuned with RLMF are reported to beat GPT-5 by 37% and Gemini-3.1-Pro by 17% on that same metric. Training used only 2,000 metacognitively selected samples from a single QA dataset, PopQA, so the generalization to the other nine benchmarks is what actually earns the paper attention.

The main caveat is that faithful calibration is scored using cMFG*, the authors' own refined metric, and that the training corpus is short-form QA. The paper does not show how the pipeline behaves in multi-turn or agentic settings, where there is no gold answer to score self-judgment against, or whether the linguistic rewriting stage can drift away from the numerical confidence it was seeded with. Take the 37%-over-GPT-5 headline as a reported result on this metric, not as a general claim about all uncertainty behavior.
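One way to picture the drift concern is a consistency check between the numerical confidence and the hedge that ends up in the rewritten answer. The sketch below is purely illustrative: the confidence bands, the hedge phrases, and the `has_drifted` helper are hypothetical and do not come from the paper.

```python
# Hypothetical confidence floors and the hedge phrase expected at or above each.
HEDGE_BANDS = [
    (0.9, "almost certainly"),
    (0.7, "probably"),
    (0.4, "possibly"),
    (0.0, "I am not sure, but"),
]


def hedge_for(confidence: float) -> str:
    """Return the hedge phrase for the highest band the confidence reaches."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    for floor, phrase in HEDGE_BANDS:
        if confidence >= floor:
            return phrase
    raise AssertionError("unreachable: the last band has floor 0.0")


def has_drifted(confidence: float, rewritten_answer: str) -> bool:
    """True if the rewritten answer lacks the hedge expected for this confidence."""
    return hedge_for(confidence).lower() not in rewritten_answer.lower()


answer = "It is almost certainly Paris."
consistent = not has_drifted(0.95, answer)  # hedge matches a 0.95 confidence
drifted = has_drifted(0.50, answer)         # hedge overstates a 0.50 confidence
```

A substring match like this is far too crude for real text, but it shows the kind of check that would be needed to confirm the second stage stays faithful to the first.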

If the result holds up outside QA, it points to a cheaper path to more trustworthy hedging than simply scaling parameters. That matters most for teams building on small open models like Llama 3.1 and Qwen3, and for domains where a confidently wrong answer is worse than an honestly partial one.

Originally reported by huggingface.co


Original headline: Yale + Google Research Paper 'RLMF' — Reinforcement Learning With Metacognitive Feedback Improves Faithful Calibration by 63%, Beats GPT-5 by 37% on 10 Datasets While Preserving Accuracy