Gemma-3-27B Alone Shows Causal Latent Planning in New Study
TL;DR
- Across Qwen3, Gemma-3, and Llama-3 at more than ten scales, all families encode future rhyme info at line boundaries.
- Only Gemma-3-27B causally relies on that encoding; all other tested models show near-zero causal effect despite strong probe signals.
- Path patching localized Gemma-3-27B's planning handoff to five attention heads recovering roughly 90% of rhyme-routing capacity.
A finding that a model encodes information and a finding that it *uses* that information are not the same thing. A paper by Nicole Ma and Nick Rui, submitted to arXiv in May 2026, makes that gap concrete by asking whether language models plan ahead when completing rhyming couplets, and whether any internal planning actually drives what they generate.
The experimental setup uses two lightweight tools, linear probing and activation patching, applied to Qwen3, Gemma-3, and Llama-3 across more than ten model scales. The probing result is consistent across all three families: future rhyme information is linearly decodable at the line boundary, and the signal strengthens with scale. Every model family studied encodes something about what the rhyme word will be before reaching it.
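To make "linearly decodable" concrete, here is a minimal sketch of a linear probe. Everything in it is an assumption for illustration: the activations are synthetic Gaussian vectors rather than real model states, the rhyme target is reduced to a binary class, and the probe is a simple mean-difference classifier, which is not necessarily the probe the paper trains. The function names are hypothetical.

```python
import random


def make_activations(n, dim, signal, rng):
    """Synthetic stand-in for activations at the line boundary.

    The 0/1 label plays the role of the future rhyme class and is
    encoded along dimension 0 with strength `signal` (an assumption).
    """
    data = []
    for _ in range(n):
        label = rng.randrange(2)
        vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        vec[0] += signal if label == 1 else -signal
        data.append((vec, label))
    return data


def fit_mean_difference_probe(train):
    """Fit a linear probe whose weight vector is the difference of class means."""
    dim = len(train[0][0])
    sums = {0: [0.0] * dim, 1: [0.0] * dim}
    counts = {0: 0, 1: 0}
    for vec, label in train:
        counts[label] += 1
        for i, value in enumerate(vec):
            sums[label][i] += value
    mean0 = [s / counts[0] for s in sums[0]]
    mean1 = [s / counts[1] for s in sums[1]]
    weights = [b - a for a, b in zip(mean0, mean1)]
    # Place the decision boundary halfway between the two class means.
    midpoint = [(a + b) / 2 for a, b in zip(mean0, mean1)]
    bias = -sum(w * m for w, m in zip(weights, midpoint))
    return weights, bias


def probe_accuracy(probe, data):
    """Fraction of examples whose label the linear readout predicts correctly."""
    weights, bias = probe
    correct = 0
    for vec, label in data:
        score = sum(w * v for w, v in zip(weights, vec)) + bias
        correct += int((score > 0) == (label == 1))
    return correct / len(data)


rng = random.Random(0)
train = make_activations(400, 8, 3.0, rng)
held_out = make_activations(200, 8, 3.0, rng)
probe = fit_mean_difference_probe(train)
print(f"held-out probe accuracy: {probe_accuracy(probe, held_out):.2f}")
```

High held-out accuracy here only says the label can be read off the activations with a linear map. It says nothing about whether anything downstream consumes that direction, which is the question the patching experiments address.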
The activation patching result is where the families diverge sharply. Only Gemma-3-27B causally relies on that encoding. Every other model tested conditions on the rhyme word throughout generation, with near-zero causal effect at the line boundary despite the strong probe signal. For Gemma-3-27B, Ma and Rui document a "handoff" in which the causal driver migrates from the rhyme word to the line boundary around layer 30. Two-stage path patching then localized that handoff to five attention heads that together recover approximately 90% of the rhyme-routing capacity at the newline.
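The contrast between encoding and causal reliance can be sketched with a toy example. This is not the paper's setup: the two-site "model" below is a hypothetical stand-in with one activation at the rhyme word and one at the line boundary, and the `reads_boundary` flag is an invented switch between a model that keeps conditioning on the rhyme word and one that has handed control to the boundary. In both variants the boundary activation carries the rhyme information, so a probe would succeed on either.

```python
def run_model(rhyme_id, reads_boundary, patch_boundary=None):
    """Toy two-site model (hypothetical).

    Returns (boundary_activation, logit favoring rhyme class 1).
    """
    rhyme_act = 1.0 if rhyme_id == 1 else -1.0  # activation at the rhyme word
    boundary_act = rhyme_act                    # boundary copies the rhyme info
    if patch_boundary is not None:
        boundary_act = patch_boundary           # the intervention: overwrite it
    # The output reads from exactly one of the two sites.
    source = boundary_act if reads_boundary else rhyme_act
    return boundary_act, 2.0 * source


def boundary_patching_effect(reads_boundary):
    """Normalized effect of patching the boundary activation.

    0.0 means the patch changed nothing; 1.0 means the output moved
    all the way to the corrupted run's output.
    """
    _, clean_logit = run_model(1, reads_boundary)
    corrupt_boundary, corrupt_logit = run_model(0, reads_boundary)
    # Clean input, but with the boundary activation taken from the corrupted run.
    _, patched_logit = run_model(1, reads_boundary, patch_boundary=corrupt_boundary)
    return (clean_logit - patched_logit) / (clean_logit - corrupt_logit)


print("reads rhyme word directly:", boundary_patching_effect(False))
print("reads line boundary:      ", boundary_patching_effect(True))
```

The variant that reads the rhyme word directly is unaffected by the boundary patch even though its boundary activation is perfectly decodable, which mirrors the near-zero causal effect reported for most models. The variant that reads from the boundary flips completely, the toy analogue of a planning site that actually drives the output.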
The honest caveat is that rhyming couplet completion is a deliberately narrow test case, and the paper does not address whether planning-site formation generalizes to less structured tasks. It also leaves open why other model families encode the future rhyme information at all when it has near-zero causal effect, and whether Gemma's handoff mechanism appears at other scales or only at 27B.
For mechanistic interpretability work, the practical value is the method itself: the combination of probing and lightweight path patching is tractable across many scales, and the rhyme task provides a clean, verifiable signal for forward planning. If the same approach extends to harder constrained tasks like code generation or structured reasoning, it offers a replicable way to audit not just whether a model represents future states, but whether those representations actually shape what the model outputs.
Originally reported by arxiv.org
Original headline: Where's the Plan? Locating Latent Planning in Language Models with Lightweight Mechanistic Interventions