Machine Learning News: An OpenAI model autonomously disproved Erdős's unit distance conjecture · May 26, 2026

OpenAI's autonomous proof, Cursor's post-training moat, and Google's Flash-first keynote reshaped this week.


This week ML went two ways at once: research that wasn't supposed to happen yet, and shipping that wasn't supposed to happen this fast. A general-purpose reasoning model from OpenAI autonomously disproved Paul Erdős's 1946 unit distance conjecture with no math-specific scaffolding, while Google compressed a year of frontier launches into 22 days. Underneath the headlines, Cursor and Together AI made the quieter case that post-training and inference compression — not parameter counts — are where the next twelve months get decided.




Key Takeaways

  • Frontier reasoning crossed a research threshold. General-purpose models can now produce publishable proofs of open math problems without domain-specific scaffolding.
  • The moat moved into post-training. Cursor spent 85% of Composer 2.5's compute budget on RL and continued pretraining on top of an open checkpoint (Moonshot's Kimi K2.5), and matched Opus 4.7 on SWE-Bench Multilingual at roughly one-tenth the cost.
  • 2-bit KV caches are production-ready. Together AI's OSCAR cuts cache memory by ~8x and lifts throughput by up to 7.83x with no client changes.
  • Flash leads the keynote now. Google opening I/O with 3.5 Flash (not Pro) signals that latency and unit economics dominate the agent-era roadmap.
  • Open source isn't slowing. Hugging Face's spring report shows Chinese labs at ~41% of downloads and robotics datasets up 23x year-on-year — supply, not demand, is shifting.

The Big Story

An OpenAI model autonomously disproved Erdős's unit distance conjecture · May 20, 2026 · OpenAI
The model produced an infinite family of point configurations that beats the long-assumed square-grid bound by an explicit polynomial factor (refined to n^1.014 by Will Sawin at Princeton), built on Golod–Shafarevich theory and infinite class field towers — none of which was prompted. The load-bearing claim for ML practitioners isn't "AI did math"; it's that long-horizon proof search emerged from a general-purpose post-training stack, not a math-specialized scaffold like AlphaProof. External mathematicians verified the proof and wrote a companion paper explaining the construction.


Also This Week

Cursor's Composer 2.5 matches Opus 4.7 on SWE-Bench Multilingual at one-tenth the cost · May 18 · Cursor
Built on Moonshot's open Kimi K2.5 checkpoint with 85% of total compute spent on Cursor's RL and continued-pretraining pipeline — the strongest production signal yet that coding agents are won in post-training, not pretraining.

Google ships Gemini 3.5 Flash, Gemini Omni, and Antigravity 2.0 in one I/O · May 19 · Google
3.5 Flash at $1.50/$9.00 per million tokens beats Gemini 3.1 Pro on Terminal-Bench 2.1 (76.2% vs 70.3%) and runs ~4x faster — the price/latency frontier is now the keynote story, not raw capability.
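
To see what the quoted prices mean per agent step, here is a minimal sketch. It assumes "$1.50/$9.00" means USD per million input tokens and per million output tokens respectively, which the item does not spell out, and the token counts are hypothetical.

```python
# Assumption: the quoted pair is (input, output) USD per million tokens.
INPUT_USD_PER_MTOK = 1.50
OUTPUT_USD_PER_MTOK = 9.00


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request at the quoted per-million-token prices."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Hypothetical agent step: 20k tokens of context in, 1k tokens out.
step_cost = request_cost_usd(20_000, 1_000)
print(f"{step_cost:.4f} USD per step")
```

Under these assumptions, input context dominates the bill for long-running agents unless output length grows, which is one reason the price/latency frontier matters more for agent workloads than for single-shot chat.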

Together AI open-sources OSCAR, a 2-bit attention-aware KV cache · May 25 · MarkTechPost
Rotating activations with attention-aware covariance matrices (not generic Hadamard transforms) hits 3x decode speedup at 100K context and 7.83x throughput at large batches — drop-in for SGLang with full paged-cache compatibility.
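
Why rotate before quantizing at all? The sketch below is not OSCAR's method: it uses the generic Hadamard rotation that the item names as the baseline, plus a simple min-max 2-bit quantizer, both chosen here for illustration. It shows only the general idea that a rotation spreads one outlier across all coordinates, so the four quantization levels cover a narrower range.

```python
import math


def hadamard_rotate(x):
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of 2.
    Applying it twice returns the original vector."""
    y = list(x)
    n = len(y)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = y[i], y[i + h]
                y[i], y[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in y]


def quantize_2bit(x):
    """Min-max quantization to 4 levels; returns (codes, minimum, step)."""
    lo, hi = min(x), max(x)
    step = (hi - lo) / 3 or 1.0  # 3 intervals between 4 levels; guard constant input
    codes = [round((v - lo) / step) for v in x]
    return codes, lo, step


def dequantize_2bit(codes, lo, step):
    """Map 2-bit codes back to approximate values."""
    return [lo + c * step for c in codes]


# Toy activation vector with a single large outlier (illustrative values).
x = [0.1, -0.2, 0.15, -0.05, 0.12, -0.18, 0.07, 8.0]

_, _, step_plain = quantize_2bit(x)           # step when quantizing directly
rotated = hadamard_rotate(x)                  # outlier energy is spread out
codes, lo, step_rot = quantize_2bit(rotated)  # step after rotation
restored = hadamard_rotate(dequantize_2bit(codes, lo, step_rot))

print(f"step without rotation: {step_plain:.3f}, with rotation: {step_rot:.3f}")
```

The rotation is orthonormal, so it can be undone exactly after dequantization. An attention-aware rotation, as the item describes, would replace `hadamard_rotate` with a data-dependent transform.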

Gemini Omni Flash brings native multi-input video generation to consumer surfaces · May 19 · Google Blog
Text + image + audio + video → high-resolution video output with SynthID watermarking, shipping inside YouTube Shorts the same week — the multimodal stack stopped being an API demo this quarter.


From the Lab

OScaR: The Occam's Razor for Extreme KV Cache Quantization in LLMs and Beyond · arXiv 2605.19660
The academic cousin to Together AI's OSCAR — different team, same problem space. Uses omni-scaled canalized rotation to reach near-lossless INT2 quantization across X-LLMs. If you're serving long-context models on constrained VRAM, this is the cleanest write-up of the new accuracy-efficiency Pareto front. Code at ZunhaiSu/OScaR-KV-Quant.
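
For a sense of what INT2 buys on constrained VRAM, the following back-of-the-envelope sketch compares KV cache size at 16-bit and 2-bit precision. The model shape is hypothetical, a 16-bit baseline is assumed, and the formula ignores the per-group scales and zero-points that real quantizers store, so actual savings land somewhat below the ideal ratio.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch, bits):
    """Bytes for keys and values across all layers (ignores quantization metadata)."""
    elements = 2 * layers * kv_heads * head_dim * seq_len * batch  # 2 = keys + values
    return elements * bits // 8


# Hypothetical model shape, chosen only for illustration.
shape = dict(layers=32, kv_heads=8, head_dim=128, seq_len=100_000, batch=1)

fp16 = kv_cache_bytes(bits=16, **shape)
int2 = kv_cache_bytes(bits=2, **shape)

print(f"16-bit: {fp16 / 2**30:.2f} GiB")
print(f"2-bit:  {int2 / 2**30:.2f} GiB")
print(f"ratio:  {fp16 / int2:.0f}x")
```

Because the cache grows linearly with both sequence length and batch size, the same ratio translates directly into longer contexts or larger batches on a fixed memory budget.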

Rethinking LLM Ensembling from the Perspective of Mixture Models · arXiv 2605.00419
Reinterprets ensembles as mixture models, allowing stochastic per-token selection of a single component — 1.78–2.68x faster than conventional token-vote ensembles with comparable quality. Practical for cheap "router over open checkpoints" inference stacks.
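
The identity behind the mixture view can be shown with a toy example. This is a sketch of the general idea rather than the paper's algorithm: the two "models" are hypothetical fixed distributions and the mixture weights are assumed uniform. Picking one component at random and sampling from it alone yields the same token distribution as averaging every model's output, but queries only one model per token.

```python
import random


# Hypothetical stand-ins for models: each maps a prefix to a next-token distribution.
def model_a(prefix):
    return {"cat": 0.7, "dog": 0.2, "eel": 0.1}


def model_b(prefix):
    return {"cat": 0.1, "dog": 0.6, "eel": 0.3}


MODELS = [model_a, model_b]
WEIGHTS = [0.5, 0.5]  # mixture weights, assumed uniform for this sketch


def averaged_distribution(prefix):
    """Conventional ensemble: query every model, then take the weighted average."""
    dists = [m(prefix) for m in MODELS]
    return {tok: sum(w * d[tok] for w, d in zip(WEIGHTS, dists)) for tok in dists[0]}


def sample_mixture_token(prefix, rng):
    """Mixture view: pick one component, then query only that model."""
    model = rng.choices(MODELS, weights=WEIGHTS, k=1)[0]
    dist = model(prefix)
    return rng.choices(list(dist), weights=list(dist.values()), k=1)[0]


rng = random.Random(0)
draws = [sample_mixture_token("the", rng) for _ in range(20_000)]
empirical = {tok: draws.count(tok) / len(draws) for tok in ("cat", "dog", "eel")}
target = averaged_distribution("the")

for tok in target:
    print(f"{tok}: averaged={target[tok]:.3f} sampled={empirical[tok]:.3f}")
```

With K components, the averaged path costs K forward passes per token while the sampled path costs one, which is where a speedup of the kind the paper reports would come from in principle.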



The week the moat moved twice: away from frontier scale, and toward proofs no one expected an LLM to write yet.