The model-release calendar went quiet this week, but the distribution layer moved more than it did in all of April. OpenAI flipped ChatGPT's default to a model that hallucinates roughly half as often on the prompts most likely to hurt users. A Miami startup with no peer-reviewed paper claimed a 12M-token context window and raised $29M the same day. Apple confirmed iOS 27 will plug Claude or Gemini into Siri system-wide. If you build on the foundation layer, the surface underneath you just shifted.

Get more from AI Weekly

More signal, less noise — pick your channels.

You're reading the weekly brief. Below are the other ways to follow the story — every channel free, easy to leave.

  • → Explore 16 deep dives
    Weekly topic-specific newsletters: Generative AI, Machine Learning, AI in Business, Robotics, Frontier Research, Geopolitics, Healthcare, and more.
    Browse all 16 deep dives →
  • → Breaking AI alerts
    When something major breaks (a $60B acquisition, a regulator's emergency meeting, a frontier model leak), alert subscribers know within hours. Typically 0-2 emails per day.
    Get breaking alerts →
  • → AI News Today (live)
    Live dashboard updated as the scanner finds news: scored stories from the last 48 hours, weekly entity movers, and quarterly trend lines across 113 AI companies, people, and topics.
    Open AI News Today →

Watch & Listen First

State of AI in 2026: LLMs, Coding, Agents — Lex Fridman #490 (YouTube)
Lambert and Raschka spend four hours on where reasoning models stand and whether open-weight labs can structurally close the gap with the frontier.

AI + a16z: MCP Co-Creator on the Next Wave of LLM Innovation (Spotify)
Anthropic's David Soria Parra on MCP's origin and the next integrations — the protocol layer underneath half the agentic tooling shipping now.

Latent Space: The AI Engineer Podcast (Spotify)
swyx and Alessio on coding agents and how production engineers actually wire frontier models together.


Key Takeaways

  • Defaults are the new benchmark. OpenAI cut hallucinations 52.5% on high-stakes topics by changing what ships at chatgpt.com, not by training a smarter Pro tier.
  • Sub-quadratic attention is a product, not a paper. SubQ's 12M-token window is unverified, but it's the first commercial deployment of post-transformer attention research that's been circling arXiv for two years.
  • Apple is unbundling Apple Intelligence. iOS 27 Extensions lets third-party models drive Siri, Writing Tools, and Image Playground system-wide — plug-and-play on 1B+ devices.
  • RL for reasoning is being reframed, not scaled. Two papers argue RL doesn't teach new capability — it selects sparse policies the base model already contains, wiping out most of the claimed compute moat.
  • Agentic Android ships in months. Gemini Intelligence builds a shopping cart from a grocery list screenshot with one confirm — a full quarter before Apple's equivalent.

The Big Story

OpenAI Flips ChatGPT's Default to GPT-5.5 Instant, Cuts Hallucinations 52.5% · May 5, 2026 · OpenAI

The model isn't the headline — the rollout is. 5.5 Instant produced 52.5% fewer hallucinated claims than 5.3 on high-stakes prompts (medicine, law, finance), with 37.3% fewer on user-flagged factual-error conversations. It also uses ~30% fewer words per answer, so token-cost per useful reply drops without any pricing change. It matters because it's the default: every free chatgpt.com user gets it, and developers hitting chat-latest inherit it on next deploy. RAG guardrails built against 5.3's failure modes are now over-engineered; teams pinned to 5.3 need to retest evals before users notice answers feel different.
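For teams pinned to the old model, the retest can be as small as a diffed eval run against both the pin and the floating alias. A minimal sketch: the client is a stub standing in for a real chat-completions call, the eval case is a placeholder, and the "gpt-5.3" pin name is illustrative; only the chat-latest alias comes from the story.

```python
# Placeholder eval cases: high-stakes prompts with a fact the answer must contain.
CASES = [
    {"prompt": "What is the maximum adult daily dose of acetaminophen?",
     "must_contain": "4"},  # grams; a medical fact the answer should state
]

def call_model(model: str, prompt: str) -> str:
    # Stub: swap in a real chat-completions request pinned to `model`.
    return "Up to 4 grams per day for healthy adults."

def run_eval(model: str) -> float:
    """Pass rate: fraction of cases whose answer contains the required fact."""
    hits = sum(c["must_contain"] in call_model(model, c["prompt"]) for c in CASES)
    return hits / len(CASES)

# Diff the pinned model against the new floating default before users notice.
pinned, floating = run_eval("gpt-5.3"), run_eval("chat-latest")
print(f"pinned: {pinned:.0%}  default: {floating:.0%}")
```

The point is the diff, not the harness: if pass rates diverge, either re-pin deliberately or retune the RAG guardrails that were built against the old failure modes.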


Also This Week

Subquadratic Launches With $29M, Claims 12M Context at 1,000x Lower Compute · May 5, 2026 · Subquadratic
→ The core idea (SSA learning which token pairs actually matter) is real research, but the company shipped no peer-reviewed paper, its claimed 92.1% needle-in-a-haystack at 12M is unverified, and earlier sub-quadratic attempts (Mamba, RWKV) went hybrid when they couldn't match dense-attention quality; benchmark before you migrate.
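Subquadratic has published no technical details, so purely as intuition, here is a toy top-k version of the "learn which token pairs matter" idea. This is our sketch, not SSA's actual mechanism, and note the selection score below is itself quadratic; making that selection cheap is the unsolved part any real system must deliver.

```python
import numpy as np

def topk_sparse_attention(Q, K, V, k):
    """Toy sparse attention: each query attends only to its k highest-scoring
    keys, so the softmax and value mixing cost O(n*k) instead of O(n^2).
    (The selection score here is the full dot product for clarity; a real
    system needs a cheaper proxy, or selection itself stays quadratic.)"""
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                      # (n, n) pair scores
    idx = np.argpartition(-scores, k, axis=1)[:, :k]   # top-k key ids per query
    kept = np.take_along_axis(scores, idx, axis=1)     # (n, k) surviving scores
    w = np.exp(kept - kept.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                  # softmax over k keys only
    return np.einsum('nk,nkd->nd', w, V[idx])          # mix only selected values
```

With k fixed as n grows, attention cost grows linearly in sequence length, which is the shape of the compute claim being sold.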

Apple to Let iOS 27 Users Swap in Gemini or Claude for Siri · May 5, 2026 · Bloomberg
→ Extensions wires installed AI apps directly into Siri, Writing Tools, and Image Playground with distinct voices per provider — iOS 27 becomes a model-routing OS, and OpenAI's exclusive Apple slot dies the day Anthropic and Google ship.

Google Ships Agentic Gemini Intelligence Across Android Ahead of I/O · May 12, 2026 · Google Blog
→ Long-press the power button over a grocery list, ask for a cart in your shopping app, confirm checkout — the first agentic phone behavior shipping to the actual Samsung and Pixel install base.

Cursor Composer 2 Drops to $0.50/$2.50 per Million Tokens · May 2026 · Cursor
→ Frontier-tier coding priced 5x below GPT-5.4 with sub-30-second turns and 8 parallel agents in isolated worktrees — cost-per-finished-task is falling faster than headline token price.
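The cost-per-finished-task point is easy to make concrete. A back-of-envelope sketch: the Composer 2 prices are the announced ones and the frontier row applies the story's 5x gap, but the token counts and retry rates are illustrative guesses, not measurements.

```python
# $/M tokens: announced Composer 2 rates, and a frontier tier at the story's 5x.
COMPOSER2 = {"in": 0.50, "out": 2.50}
FRONTIER  = {"in": 2.50, "out": 12.50}

def task_cost(price, in_tok, out_tok):
    """Dollar cost of one attempt given input/output token counts."""
    return (in_tok * price["in"] + out_tok * price["out"]) / 1e6

# Assume one attempt burns 40k input + 8k output tokens.
c2 = task_cost(COMPOSER2, 40_000, 8_000)
fr = task_cost(FRONTIER, 40_000, 8_000)
print(f"per attempt: ${c2:.3f} vs ${fr:.3f} ({fr/c2:.0f}x)")

# If sub-30-second turns also cut retries (say 2 attempts instead of 3),
# the per-task gap widens past the per-token gap:
print(f"per task: ${2*c2:.3f} vs ${3*fr:.3f} ({3*fr/(2*c2):.1f}x)")
```

That second print is the whole claim: faster iteration compounds with cheaper tokens, so cost-per-finished-task falls faster than the headline price.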


From the Lab

Rethinking RL for LLM Reasoning: It's Sparse Policy Selection, Not Capability Learning · May 7, 2026 · arXiv 2605.06241
→ Akgül et al. show RL post-training doesn't teach reasoning — it redistributes probability onto solutions the base model already contained. ReasonMaxxer matches full RL with tens of problems and minutes of single-GPU training, wiping out three orders of magnitude of the compute moat big RL pipelines claimed.
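A toy way to see the "selection, not capability" claim: the closed form of KL-regularized RL reweights the base distribution and can never create new support. This is a textbook identity with invented numbers, not the papers' method.

```python
import numpy as np

def rl_as_reweighting(base_probs, rewards, beta=0.1):
    """Closed form of KL-regularized RL: pi* is proportional to pi0 * exp(r/beta).
    Anything with zero base probability stays at zero, which is the
    'selection, not capability' reading of the papers above."""
    w = base_probs * np.exp(rewards / beta)
    return w / w.sum()

base = np.array([0.02, 0.58, 0.40, 0.00])  # base model's mass on 4 candidate solutions
rewards = np.array([1.0, 0.0, 0.0, 1.0])   # verifier accepts solutions 0 and 3
post = rl_as_reweighting(base, rewards)
# Solution 0, a 2% tail the base model already contained, gets amplified
# toward 1.0; solution 3 is also correct but has zero base mass, so this
# kind of "RL" can never surface it.
```

If that picture is right, a small amount of reweighting data buys most of what a giant RL pipeline buys, which is where the compute-moat claim collapses.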

Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key · May 7, 2026 · arXiv 2605.06638
→ RL training compute follows a power law in reasoning depth with the exponent climbing as logical expressiveness rises — deeper reasoning gets exponentially more expensive to train, which is why today's reasoning models hit a wall around 50–60 steps.
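To feel how fast a rising exponent bites, a tiny sketch with illustrative numbers only (the paper's fitted exponents are its own):

```python
def relative_compute(depth, alpha, base_depth=10):
    """If training compute scales as depth**alpha, this is the cost of
    reaching `depth` relative to `base_depth` at the same alpha."""
    return (depth / base_depth) ** alpha

# As expressiveness pushes alpha up, the 50-60-step region gets brutal:
for alpha in (2.0, 3.0, 4.0):
    print(f"alpha={alpha}: depth 60 costs {relative_compute(60, alpha):.0f}x depth 10")
```

Same target depth, modestly higher exponent, orders of magnitude more training compute: that is the shape of the wall the paper describes.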


When the model layer pauses, distribution and architecture move.