OpenAI's autonomous proof, Cursor's post-training moat, and Google's Flash-first keynote defined this week.
This week ML went two ways at once: research that wasn't supposed to happen yet, and shipping that wasn't supposed to happen this fast. A general-purpose reasoning model from OpenAI autonomously disproved Paul Erdős's 1946 unit distance conjecture with no math-specific scaffolding, while Google compressed a year of frontier launches into 22 days. Underneath the headlines, Cursor and Together AI made the quieter case that post-training and inference compression — not parameter counts — are where the next twelve months get decided.
Watch & Listen First
- MLST: Michael I. Jordan on why he never followed the field into "AGI" (May 21) — the most influential living computer scientist explains why ML's roots are in statistics and operations research, not AI, and why data markets are Stackelberg games, not optimization problems.
- No Priors Ep. 163 with Elad Gil and Sarah Guo (May 21) — 38-minute conversation on inference economics, post-training moats, and where this cycle's defensibility actually lives.
Key Takeaways
- Frontier reasoning crossed a research threshold. General-purpose models can now produce publishable proofs of open math problems without domain-specific scaffolding.
- The moat moved into post-training. Cursor spent 85% of Composer 2.5's compute budget on RL and continued pretraining over an open checkpoint, and matched Opus 4.7 on SWE-Bench Multilingual at roughly one-tenth the cost.
- 2-bit KV caches are production-ready. Together AI's OSCAR cuts cache memory ~8x and lifts throughput up to 7.83x at large batches, with no client changes.
- Flash leads the keynote now. Google opening I/O with 3.5 Flash (not Pro) signals that latency and unit economics dominate the agent-era roadmap.
- Open source isn't slowing. Hugging Face's spring report shows Chinese labs at ~41% of downloads and robotics datasets up 23x year-on-year — supply, not demand, is shifting.
The Big Story
An OpenAI model autonomously disproved Erdős's unit distance conjecture · May 20, 2026 · OpenAI
→ The model produced an infinite family of point configurations that beats the long-assumed square-grid bound by an explicit polynomial factor (refined to n^1.014 by Will Sawin at Princeton), built on Golod–Shafarevich theory and infinite class field towers — none of which was prompted. The load-bearing claim for ML practitioners isn't "AI did math"; it's that long-horizon proof search emerged from a general-purpose post-training stack, not a math-specialized scaffold like AlphaProof. External mathematicians verified the proof and wrote a companion paper explaining the construction.
Also This Week
Cursor's Composer 2.5 matches Opus 4.7 on SWE-Bench Multilingual at one-tenth the cost · May 18 · Cursor
→ Built on Moonshot's open Kimi K2.5 checkpoint with 85% of total compute spent on Cursor's RL and continued-pretraining pipeline — the strongest production signal yet that coding agents are won in post-training, not pretraining.
Google ships Gemini 3.5 Flash, Gemini Omni, and Antigravity 2.0 in one I/O · May 19 · Google
→ 3.5 Flash at $1.50/$9.00 per million tokens beats Gemini 3.1 Pro on Terminal-Bench 2.1 (76.2% vs 70.3%) and runs ~4x faster — the price/latency frontier is now the keynote story, not raw capability.
Together AI open-sources OSCAR, a 2-bit attention-aware KV cache · May 25 · MarkTechPost
→ Rotating activations with attention-aware covariance matrices (not generic Hadamard transforms) hits 3x decode speedup at 100K context and 7.83x throughput at large batches — drop-in for SGLang with full paged-cache compatibility.
Gemini Omni Flash brings native multi-input video generation to consumer surfaces · May 19 · Google Blog
→ Text + image + audio + video → high-resolution video output with SynthID watermarking, shipping inside YouTube Shorts the same week — the multimodal stack stopped being an API demo this quarter.
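For readers sizing the OSCAR item above: the ~8x memory figure is what you would expect from going from 16-bit to 2-bit cache entries. The sketch below is back-of-envelope arithmetic only. The model shape (32 layers, 8 KV heads, head dimension 128) is a hypothetical example, not OSCAR's benchmark setup, and the calculation ignores the per-group scale and zero-point metadata that a real 2-bit format carries, so real savings would land a little under 8x.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bits_per_value, batch_size=1):
    """Bytes needed to hold keys and values for seq_len tokens at a given precision.

    Ignores quantization metadata (scales, zero-points); that is an assumption
    of this sketch, not a property of any particular library.
    """
    # 2 = one key vector plus one value vector per layer, per KV head, per token.
    values_per_token = 2 * n_layers * n_kv_heads * head_dim
    total_bits = batch_size * seq_len * values_per_token * bits_per_value
    return total_bits // 8


# Hypothetical model shape, chosen only for illustration.
cfg = dict(n_layers=32, n_kv_heads=8, head_dim=128)

fp16 = kv_cache_bytes(100_000, bits_per_value=16, **cfg)
int2 = kv_cache_bytes(100_000, bits_per_value=2, **cfg)

print(f"16-bit cache at 100K context: {fp16 / 2**30:.2f} GiB")
print(f" 2-bit cache at 100K context: {int2 / 2**30:.2f} GiB")
print(f"ratio: {fp16 / int2:.1f}x")
```

The ratio depends only on the bit widths, so it holds for any model shape; the absolute sizes are what change with layers, heads, and context length.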
From the Lab
OScaR: The Occam's Razor for Extreme KV Cache Quantization in LLMs and Beyond · arXiv 2605.19660
→ The academic cousin to Together AI's OSCAR — different team, same problem space. Uses omni-scaled canalized rotation to reach near-lossless INT2 quantization across X-LLMs. If you're serving long-context models on constrained VRAM, this is the cleanest write-up of the new accuracy-efficiency Pareto front. Code at ZunhaiSu/OScaR-KV-Quant.
Rethinking LLM Ensembling from the Perspective of Mixture Models · arXiv 2605.00419
→ Reinterprets ensembles as mixture models, allowing stochastic per-token selection of a single component — 1.78–2.68x faster than conventional token-vote ensembles with comparable quality. Practical for cheap "router over open checkpoints" inference stacks.
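A toy sketch of the mixture-model idea in the ensembling paper above, not the authors' implementation. It assumes a weighted-average ensemble as the baseline (the paper's token-vote baseline may differ) and uses hypothetical stand-in "models" that return fixed next-token distributions. The point it illustrates is a standard fact about mixtures: picking one component by its weight and sampling from it alone yields the same token distribution as sampling from the weighted average, while running only one component per token.

```python
import random


def make_toy_model(dist, call_log, name):
    """Return a stand-in 'model': a function from context to a next-token distribution."""
    def model(context):
        call_log.append(name)  # record each simulated forward pass
        return dist
    return model


def sample_from(dist, rng):
    """Draw one token from a {token: probability} dict."""
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


def mixture_step(models, weights, context, rng):
    """Pick ONE component by mixture weight, then sample from that component only."""
    k = rng.choices(range(len(models)), weights=weights, k=1)[0]
    return sample_from(models[k](context), rng)


def averaged_step(models, weights, context, rng):
    """Baseline: run EVERY component, average their distributions, then sample."""
    total = sum(weights)
    mixed = {}
    for model, w in zip(models, weights):
        for tok, p in model(context).items():
            mixed[tok] = mixed.get(tok, 0.0) + (w / total) * p
    return sample_from(mixed, rng)


rng = random.Random(0)
calls = []
models = [
    make_toy_model({"a": 0.9, "b": 0.1}, calls, "m0"),
    make_toy_model({"a": 0.2, "b": 0.8}, calls, "m1"),
]
weights = [0.5, 0.5]  # exact mixture probability of "a" is 0.5*0.9 + 0.5*0.2 = 0.55

n = 20_000
mix_a = sum(mixture_step(models, weights, "ctx", rng) == "a" for _ in range(n)) / n
mix_calls = len(calls)

calls.clear()
avg_a = sum(averaged_step(models, weights, "ctx", rng) == "a" for _ in range(n)) / n
avg_calls = len(calls)

print(f"P(a) via per-token selection: {mix_a:.3f} using {mix_calls} model calls")
print(f"P(a) via averaging:           {avg_a:.3f} using {avg_calls} model calls")
```

Both estimates should sit near 0.55, with the per-token selection path making half as many model calls in this two-component toy. A real serving stack has costs this sketch ignores, such as keeping each component's context state current, so the call count here is an upper bound on the savings rather than a prediction of the paper's reported speedups.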
Worth Reading
- State of Open Source on Hugging Face: Spring 2026 — China at ~41% of downloads, independent developers up from 17% to 39%, robotics datasets up 23x; the supply side of open ML is no longer Western-led.
- Gil Kalai: "Amazing — Erdős' Unit Distance Problem was Disproved! It was achieved by AI!" — a working combinatorialist's reaction to the OpenAI result; the most grounded take you'll find on what the proof actually does and doesn't say about ML.
- Cursor's Composer 2.5 hits third on Artificial Analysis's Coding Agent Index — independent benchmark commentary on the 10–60x cost gap vs. higher-effort Opus 4.7 and GPT-5.5 variants.
The week the moat moved twice: away from frontier scale, and toward proofs no one expected an LLM to write yet.