The cutting edge, and the cutting edge already on the clock. On the frontier this week: OpenAI shipped its strongest model to ~20 vetted partners, DeepSeek open-sourced the tricks that make models fast, and a new paper squeezed more reasoning out of far less memory. The edge also left the screen: humanoid robots got their first real safety stack, and China gave AI agents ID cards. And it's already at work: GPT-5 Pro cracked a three-year immunology mystery, investors put $200M behind an AI fraud-detection vendor that already serves six of the world's top-10 banks, and coding agents landed on every phone. The lag between a lab result and a deployed system used to be years. This week it was days.

Quick Hits

The Cutting Edge

  • OpenAI's GPT-5.6 is its strongest model yet — and almost no one can use it — The new lineup is Sol (the flagship and, OpenAI says, its strongest model to date, with an "Ultra Subagent Mode" that deploys sub-agents on complex tasks and a "Max Reasoning" effort setting), Terra (matches GPT-5.5 at half the price), and Luna (the lowest-cost option). But under the June 2 executive order, initial access is restricted to roughly 20 vetted API and Codex partners — broader ChatGPT, Codex and API availability is only "coming soon." The best model OpenAI has built launched to almost no one. [MacRumors]
  • DeepSeek open-sourced the training stack behind fast inference — DeepSpec is a full-stack, MIT-licensed codebase for training and evaluating the speculative-decoding "draft models" — DSpark, DFlash and Eagle3 — that make large models generate faster, with data-prep, training and eval scripts that work across target architectures including Gemma and Qwen. Speculative decoding is the main lever labs pull to cut latency; the whole recipe to build and benchmark it is now public, not a proprietary edge. [GitHub]
  • A new paper throws away 87% of an LLM's memory and gets better answers — InfoKV adds two signals to KV-cache compression — predictive entropy and layer-wise representation change — to keep the tokens attention-only methods discard. On a long-context benchmark it kept just 12.5–25% of the cache and beat the full-cache baseline, with the gap widening as context grew to 64k tokens. The binding constraint on long-context reasoning isn't the weights; it's the cache, and this is a cheaper way to manage it. [Hugging Face]

The Edge Leaves the Screen

  • Humanoid robots just got their first full-stack safety system — NVIDIA unveiled Halos for Robotics, what it calls the industry's first full-stack safety system for physical AI: industrial-grade safety compute (IGX Thor), a Holoscan sensor bridge, a Halos OS safety layer, and a dedicated AI Systems Inspection Lab for certification. First partner Agility is building it into Digit, the humanoid already working in Amazon's warehouses. Embodied AI's bottleneck is shifting from "can it move" to "can it move safely next to people." [NVIDIA]
  • China just gave every AI agent an ID card — China's market regulator (SAMR) issued the country's first national standard for AI-agent interconnection: a seven-part "Artificial Intelligence Agent Interconnection" framework that, among other things, defines how agents get unified identifiers across domains for what state media calls "secure cross-domain interaction." As autonomous agents start talking to each other, which agent is which stops being a detail and becomes infrastructure. [SCMP]

The Cutting Edge, Applied

  • GPT-5 Pro cracked a three-year immunology mystery at The Jackson Laboratory — Since 2022, immunologist Derya Unutmaz had flow-cytometry data he couldn't explain: blocking glucose metabolism in human T cells, then priming them, pushed them toward an inflammatory state. GPT-5 Pro proposed the mechanism — disrupted N-linked glycosylation — and, as a check, correctly predicted the outcome of a held-out lymphoma experiment he'd already run. Unutmaz called it "a remarkable insight." Not a benchmark score — a working lab's open question, closed. [OpenAI]
  • Six of the top-10 banks just bet $200M that AI catches the fraud they miss — Quantifind raised $200M led by Summit Partners, with Citi Ventures and S&P Global in the round, to run governed AI agents against financial-crime alerts; it already serves six of the world's ten largest banks. A Celent analysis cited in the raise estimates a Tier-1 bank could cut alert-processing costs by up to $177.9M a year — the number that turns a pilot into a line item. [PR Newswire]
  • OpenAI Codex Remote is now on every ChatGPT plan — and runs from your phone — Codex's autonomous coding agent reached general availability across all subscription tiers, with iOS/Android apps that pair to a Mac or Windows host via QR code and a DigitalOcean plugin that auto-provisions a cloud workspace. The coding agent left the IDE: you can now kick off, monitor and approve a build from a train platform. [OpenAI]

The Distance From Lab to Field Just Collapsed

For most of the AI era there was a comfortable lag between a research result and a system you could actually use. A clever decoding trick lived in a paper for a year before it shipped. A model that could reason about cell biology was a benchmark score, not a lab partner. That lag is the thing that broke this week.

The two halves of this issue happened at once. DeepSeek didn't publish a paper about faster inference — it open-sourced the training stack, the same week OpenAI put the agent that speed enables on every phone. A frontier model wasn't demoed on a held-out test set — it resolved a three-year question inside a working immunology lab and predicted an experiment's result before anyone showed it the answer. The robots learning to work next to people got a safety certification path the same week the agents in our software got identity cards. Capability and the plumbing to deploy it are arriving together now, not years apart.

The state of the art used to be something you read about and waited for. This week it's a lab partner, a fraud analyst and a coworker — already on the clock.

Key Takeaways

  • The cutting edge moved on every front at once. In a single week: a stronger frontier model (GPT-5.6), open-sourced inference speed (DeepSpec), a memory trick that beats full-context (InfoKV), the first humanoid-robot safety stack (NVIDIA Halos), and China's first national standard for AI agents. Not one breakthrough — a whole field moving.
  • The expensive part is going open. DeepSpec and InfoKV are free to download. The capabilities labs treat as moats — inference speed and long-context memory — are increasingly public recipes anyone can build on.
  • The edge is leaving the screen. Physical AI got a safety-certification path and autonomous agents got identity infrastructure in the same week. The frontier is no longer just a model you call — it's robots on a warehouse floor and agents that have to prove who they are.
  • The lab-to-field gap collapsed. A three-year immunology mystery solved with a verifiable prediction; $200M and six of the top-10 banks behind AI fraud detection; autonomous coding agents on every plan. Applied AI stopped trailing the research by years.

Worth Reading

  • Claude is now a member of your Slack — not a chat window — Claude Tag lets teams tag @Claude in a channel; it builds context from the channel's history and acts with whatever tools, data and codebases it's granted. Anthropic says its internal version already writes 65% of its product team's code. The agent is moving from a tab you open to a colleague you @-mention. Shared this week by 5 of the AI experts we track. [Anthropic]
  • One poisoned config file in a repo can drain your AWS keys through Amazon Q — CVE-2026-12957 (CVSS 8.5), found by Wiz Research: a malicious .amazonq/mcp.json in a cloned repository auto-launches an MCP server that inherits the developer's live AWS credentials, API tokens and SSH keys — no extra click required. Amazon patched it (Language Servers 1.65.0+), but it's a clean look at how the agent-config layer became the soft target. [The Hacker News]
  • Nature: a model's bias isn't designed in — it's baked into the training data — Chinese-language documents matching state-coordinated media appear in a typical training set at roughly 41× the rate of Chinese Wikipedia. Pretraining on just 6,400 state-scripted documents made an open-weight model produce pro-government answers nearly 80% of the time, and annotators rated its Chinese-language replies as more regime-favorable in 75.3% of comparisons. The supply chain you can't audit is the corpus. [Nature]
  • AI hiring tools don't just discriminate — they reject you everywhere at once — Stanford HAI studied 4 million applications across 1,700 postings from 150 employers and found 10% of applicants who applied to four jobs were rejected from all of them — a "systemic rejection" pattern that doesn't appear without algorithmic screening, on top of measurable racial disparities masked by pooled audits. [Stanford HAI]
  • The AI-powered World Cup runs on thousands of human data workers — The torrent of real-time match data behind the 2026 World Cup is produced by annotators in Brazil, the Philippines, India, Egypt and Eastern Europe who hand-tag up to 3,000 actions per match — passes, shots, tackles — for about $70 a game, feeding betting platforms, team analytics and broadcasters. Behind every "automated" stat is a person watching the tape. Shared this week by 5 of the AI experts we track. [Rest of World]

Wait, What?

  • An AI designed a burger that beats the Big Mac — and the planet wins too — In a peer-reviewed npj Science of Food paper, Stanford researchers built "BurgerAI" on 2,216 Food.com recipes using the same diffusion math behind image generators. In a blinded taste test with 101 people, its burgers matched or beat the Big Mac on liking, flavor and texture; its mushroom version scored an order of magnitude lower on environmental impact, and its bean version nearly doubled the nutrition. The authors' framing is the real headline: this moves generative AI "from prediction to design." [npj Science of Food]

Worth Watching

The videos AI practitioners are passing around right now — curated on AI TV.

  • AI is Losing and the Left is Winning, with Brennan Lee Mulligan and Ed Zitron [Adam Conover]
  • CRASH IMMINENT: Ed Zitron Says AI Valuations Are Complete FRAUDS [Breaking Points]
  • Ed Zitron explains OpenAI’s leaked financials [The Tech Report]

This week's poll

We split the week into the raw cutting edge, the edge leaving the screen, and the cutting edge already at work. Which front matters most to you right now?

Last week, 228 of you voted:

Anthropic says Alibaba industrialized the theft of Claude and took it to Washington. Whose problem is this, really?

  • It's theft: labs need legal and technical walls around their models now (42%)
  • It's inevitable: distillation is how the frontier diffuses, and that's fine (23%)
  • It's a distraction: the real risk is the talent walking out the door (17%)
  • It's Washington's call: this is an export-control fight, not a corporate one (18%)


— Alexis