Frontier Research: circuits got cheap as consciousness got loud


Interpretability moved this week from per-edge ablation toward a single regression sweep, in the same week that DeepMind argued the machine-consciousness question may never resolve. A new method recovers LLM circuits with sparse regression over sparse-autoencoder features instead of thousands of activation patches, while DeepMind reframes consciousness from a test to be passed into a disagreement to be governed. Underneath both, the efficiency frontier moved: sparse attention cut per-token attention compute 28.4× at 1M tokens, and KV-cache compression reached an effective 2.28 bits.


Key Takeaways

  • Circuit discovery just got much cheaper. CircuitLasso reports structural accuracy matching intervention methods "at a fraction of the computational cost," using SAE features and sparse linear regression, with no per-edge ablation.
  • The consciousness debate is now a deliberation problem, not a detection problem. DeepMind argues there may be no test that settles it, which pushes it toward politics — and politics doesn't converge.
  • Long-context attention is no longer quadratic in practice. MiniMax's MSA cuts per-token attention compute 28.4× at 1M tokens while matching dense quality — out of a Chinese lab, not a US one.
  • The KV cache, not the weights, is now the binding memory constraint. Three labs attacked it in a single week from three independent directions: two quantizers, at 3.5 bits and an effective 2.28 bits, plus an episodic cache-management scheme. The methods compose.
  • Attention maps lie about multimodal models. A new study shows VLM accuracy tracks per-layer representation quality, not attention allocation.

The Big Story

Scalable Circuit Learning for Interpreting Large Language Models — recovering circuits via sparse regression over SAE features at "a fraction of the computational cost" · arXiv · June 15, 2026
Naiyu Yin, Dennis Wei, Tian Gao and colleagues reframe circuit discovery as sparse linear regression over sparse-autoencoder latents, presenting a method they call CircuitLasso whose structural accuracy "matches that of state-of-the-art intervention-based methods" like activation patching "at a fraction of the computational cost." The bet is that you don't have to causally ablate every component to find the subgraph driving a behavior — a learned sparse map over SAE features recovers the same structure and shows how interpretable features propagate layer to layer. If it holds at scale, mechanistic interpretability stops being a patch-by-patch craft and becomes a regression sweep you can run across many behaviors and checkpoints — the precondition for it ever keeping pace with model releases instead of trailing them by a year.
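
To make the regression framing concrete, here is a minimal sketch, not the authors' code. It assumes a toy setup: each row of X holds the activations of 12 hypothetical upstream SAE features on one input, y is the activation of one downstream feature, and a hand-written coordinate-descent Lasso stands in for whatever solver the paper uses. Feature indices, the penalty lam, and the synthetic data are all illustrative assumptions.

```python
import random


def soft_threshold(value, lam):
    """Shrink value toward zero by lam (the proximal step for an L1 penalty)."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iters=200):
    """Minimise (1/2n)*||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # residual = y - Xw, and w starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iters):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, lam) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w


# Synthetic stand-in for SAE activations: non-negative and mostly small.
rng = random.Random(0)
n_samples, n_upstream = 200, 12
X = [[max(0.0, rng.gauss(0, 1)) for _ in range(n_upstream)] for _ in range(n_samples)]

# Assumed ground truth: only upstream features 2 and 7 drive the downstream one.
true_w = [0.0] * n_upstream
true_w[2], true_w[7] = 1.5, -0.8
y = [sum(x * t for x, t in zip(row, true_w)) + rng.gauss(0, 0.05) for row in X]

w = lasso_coordinate_descent(X, y, lam=0.05)
# Non-zero weights are the candidate edges into the downstream feature.
edges = [j for j, wj in enumerate(w) if abs(wj) > 1e-6]
print("candidate upstream features:", edges)
```

The appeal of this framing is cost: one fit per downstream feature replaces one forward pass per ablated component. What the sketch cannot show is whether a regression edge is causal in a real model, which is the claim the paper's comparison against activation patching is meant to test.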


Also This Week

DeepMind's "Artificial Minds, Human Disagreement": AI-consciousness disagreement may be deep and durable — design for "overlapping consensus," not a verdict · DeepMind · June 15, 2026
Adam Bales and Iason Gabriel argue the field should stop chasing a yes/no consciousness test and instead pursue "overlapping consensus, where people agree on certain policies for AI systems, even though they continue to disagree about more fundamental questions" — reframing consciousness from a measurement target into a governance object.

The Hidden Evolution of Disguised Visual Context inside the VLM — accuracy tracks representation quality, not attention allocation · arXiv · June 18, 2026
Wish Suharitdamrong, Tony Alex, Muhammad Awais and Sara Atito find that visual tokens "enter the LLM as disguised visual context, raw representations lacking linguistic structure," reshaped layer by layer depending on the integration paradigm — and that "attention allocation alone is insufficient" to explain VLM behavior. A direct caution against reading multimodal models off their attention maps.


From the Lab

MiniMax Sparse Attention (MSA): a two-branch block-sparse design on a 109B-parameter MoE trained for 3T tokens · MarkTechPost · June 17, 2026
MSA splits attention into an Index Branch that picks which key-value blocks each query reads (block size 128, capped at 16 blocks — a fixed 2,048-token budget per query) and a Main Branch running exact softmax only on those blocks. The headline: a 28.4× drop in per-token attention compute at 1M context, plus 14.2× prefill and 7.6× decode speedups on H800, while matching dense attention on downstream benchmarks — learned top-k selection that makes million-token attention behave sub-quadratically in production, not just in an ablation table. That it ships inside MiniMax-M3 from a Chinese lab is the quiet geopolitics under the math.
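
The two-branch idea can be sketched for a single query as below. This is an illustration under stated assumptions, not MiniMax's implementation: the block size and block cap come from the description above, but the index-branch scoring rule (query dotted with each block's mean key) is a hypothetical stand-in for the learned selector, and the data is random.

```python
import math
import random

BLOCK_SIZE = 128    # tokens per key-value block (from the write-up)
TOP_K_BLOCKS = 16   # blocks each query may read (from the write-up)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def select_blocks(query, keys, block_size, top_k):
    """Index branch (assumed rule): rank blocks by query . mean(keys in block)."""
    n_blocks = math.ceil(len(keys) / block_size)
    scores = []
    for b in range(n_blocks):
        block = keys[b * block_size:(b + 1) * block_size]
        mean_key = [sum(col) / len(block) for col in zip(*block)]
        scores.append((dot(query, mean_key), b))
    chosen = sorted(scores, reverse=True)[:top_k]
    return sorted(b for _, b in chosen)


def sparse_attention(query, keys, values, block_size=BLOCK_SIZE, top_k=TOP_K_BLOCKS):
    """Main branch: exact softmax attention restricted to the selected blocks."""
    blocks = select_blocks(query, keys, block_size, top_k)
    idx = [
        i
        for b in blocks
        for i in range(b * block_size, min((b + 1) * block_size, len(keys)))
    ]
    scale = 1.0 / math.sqrt(len(query))
    logits = [dot(query, keys[i]) * scale for i in idx]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    dim_v = len(values[0])
    out = [sum(w * values[i][d] for w, i in zip(weights, idx)) / total for d in range(dim_v)]
    return out, idx


rng = random.Random(0)
seq_len, dim = 4096, 8
keys = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(seq_len)]
values = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(seq_len)]
query = [rng.gauss(0, 1) for _ in range(dim)]

out, idx = sparse_attention(query, keys, values)
tokens_read = len(idx)
print("keys read by the main branch:", tokens_read, "of", seq_len)
```

The point of the design is that the main-branch budget is fixed at 16 × 128 = 2,048 keys per query whether the context holds 4K or 1M tokens. The cost of the index branch itself is not modelled here beyond one dot product per block.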

The KV-cache compression race: TurboQuant's ≈2.7× bound, OSCAR's 7.83× throughput, EpiCache's 40% accuracy lift · MarkTechPost · June 18, 2026
Three labs hit the same long-context memory wall in one week. TurboQuant (Google & NYU) is data-oblivious: random rotation plus optimal scalar quantization, provably within ≈2.7× of the distortion lower bound and quality-neutral at 3.5 bits with no calibration. OSCAR (Together AI) calibrates an attention-aware rotation offline and uses paged mixed precision to reach an effective 2.28 bits, reporting up to 7.83× throughput and roughly 8× cache-memory reduction at 100K context. EpiCache (Apple) tackles multi-turn conversations with episodic clustering and layer-wise budgets, reporting up to 40% higher accuracy than eviction baselines and up to 3.5× lower peak memory. It is orthogonal to the quantizers, so the three compose. The KV cache, not the weights, is now the binding memory constraint, and the two quantizers are pushing it toward 2 bits per value.
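
Why rotate before quantizing? A rough sketch of the intuition follows, and it is not TurboQuant's or OSCAR's actual construction. It swaps in two simpler pieces, both assumptions: a randomized Hadamard transform as the rotation and a uniform per-vector quantizer in place of an optimal one. The toy key vectors carry one large outlier channel, and 3 bits is used only because the quantizer here needs an integer width.

```python
import math
import random


def fwht(vec):
    """Orthonormal fast Walsh-Hadamard transform; len(vec) must be a power of two."""
    out = list(vec)
    n = len(out)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    norm = math.sqrt(n)
    return [x / norm for x in out]


def rotate(vec, signs):
    """Random sign flip followed by Hadamard: an orthogonal, data-oblivious rotation."""
    return fwht([s * x for s, x in zip(signs, vec)])


def unrotate(vec, signs):
    """Inverse rotation: the orthonormal Hadamard transform is its own inverse."""
    return [s * x for s, x in zip(signs, fwht(vec))]


def quantize(vec, bits):
    """Uniform scalar quantization to 2**bits levels with a per-vector range."""
    levels = 2 ** bits - 1
    lo, hi = min(vec), max(vec)
    step = (hi - lo) / levels or 1.0  # avoid a zero step for constant vectors
    codes = [round((x - lo) / step) for x in vec]
    return codes, lo, step


def dequantize(codes, lo, step):
    return [lo + c * step for c in codes]


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


rng = random.Random(0)
dim, bits = 64, 3
signs = [rng.choice((-1.0, 1.0)) for _ in range(dim)]

vectors = []
for _ in range(32):
    v = [rng.gauss(0, 1) for _ in range(dim)]
    v[5] = 20.0  # one outlier channel, a toy stand-in for real key statistics
    vectors.append(v)

# Quantize in the original basis versus in the rotated basis, then compare error.
mse_direct = sum(mse(v, dequantize(*quantize(v, bits))) for v in vectors) / len(vectors)
mse_rotated = sum(
    mse(v, unrotate(dequantize(*quantize(rotate(v, signs), bits)), signs))
    for v in vectors
) / len(vectors)
print("direct MSE:", mse_direct, "rotated MSE:", mse_rotated)
```

The rotation spreads the outlier's energy across all coordinates, so the quantizer's range is no longer dictated by a single channel. Nothing here depends on calibration data, which is the sense in which a scheme like this is data-oblivious.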


The frontier this week wasn't a bigger model — it was a cheaper way to look inside one, arriving just as the field admitted it can't yet define what it's looking for.

— Alexis