Generative AI News: DeepSeek makes 75% V4-Pro price cut permanent, escalating the inference price war · May 26, 2026

Frontier-class output for under a dollar per million tokens, while Google I/O ships video, agents, and a Flash that actually competes.


A week of pricing shocks and shipped product. DeepSeek made its 75% V4-Pro discount permanent, Google packed I/O with Gemini 3.5 Flash, Omni video and Spark agents, Cursor unveiled Composer 2.5, and Alibaba threw Qwen3.7-Max into the long-horizon agent race. If you're a builder, the cost-quality curve just bent again — and the routing logic that worked last week is already stale.


Watch & Listen First

Daytona: AI Sandboxes for Agents — Latent Space (May 21) — Ivan Burazin on running stateful sandboxes on bare metal as the agent-compute market finally productizes.

Railway, agent-native infrastructure — Latent Space (May 20) — Jake Cooper on what hosting agents on bare-metal data centers actually looks like in 2026.


Key Takeaways

  • The pricing floor collapsed. DeepSeek V4-Pro is now $0.435 / $0.87 per M tokens (input / output), roughly 11× cheaper on input and 29× cheaper on output than Claude Opus 4.7 ($5 / $25) at comparable coding scores.
  • Gemini 3.5 Flash is the new default value pick. $1.50 / $9 per M tokens, 1M-token context, 76.2% on Terminal-Bench 2.1, outperforming Gemini 3.1 Pro on agentic evals.
  • Agent runtimes are now a product, not glue code. Google's Managed Agents API and Cursor's Composer 2.5 both ship hosted Linux environments for long-horizon work.
  • Open weights are scoring at the frontier. Kimi K2.6 sits #4 on Artificial Analysis Intelligence Index; Mistral Medium 3.5 hits 77.6% on SWE-Bench Verified.
  • Video generation got cheaper and more multimodal. Gemini Omni Flash takes image, audio, video and text as input; Midjourney V1 launched roughly 25× cheaper per second of video than rivals.

The Big Story

DeepSeek makes 75% V4-Pro price cut permanent, escalating the inference price war · May 25 · Reuters / Yahoo Finance
V4-Pro now lists from $0.003625 per M tokens for cached input up to $0.87 per M output tokens, vs Opus 4.7 at $5 / $25. It is the first time a frontier-class 1.6T MoE with 1M context has been permanently priced below the Chinese domestic floor. The strategic tell: DeepSeek is pre-committing to a cost structure that assumes Huawei's Ascend 950 supernodes arrive in volume in H2, betting that hardware sovereignty closes the gap. For builders, the routing calculus now favors DeepSeek for any agentic or batch workload not bound by data-residency rules, and it puts real pressure on Anthropic and OpenAI to either cut Sonnet/GPT-5.5 Mini pricing or argue capability harder.
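To make the pricing gap concrete, here is a rough per-workload cost sketch using the list prices quoted in this issue. The model keys, the `workload_cost` helper, and the 40M-in / 10M-out job are illustrative assumptions, not real API identifiers or a measured workload, and cache discounts are ignored.

```python
# List prices in USD per million tokens (input, output), as quoted in this issue.
# The dictionary keys are hypothetical labels, not official API model names.
PRICES = {
    "deepseek-v4-pro": (0.435, 0.87),
    "claude-opus-4.7": (5.00, 25.00),
    "gemini-3.5-flash": (1.50, 9.00),
    "qwen3.7-max": (2.50, 7.50),
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a workload at list price (no cache discounts)."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Hypothetical agentic batch job: 40M input tokens, 10M output tokens.
JOB = (40_000_000, 10_000_000)

# Print models from cheapest to most expensive for this job.
for model in sorted(PRICES, key=lambda m: workload_cost(m, *JOB)):
    print(f"{model:18s} ${workload_cost(model, *JOB):8.2f}")
```

On that hypothetical job, list-price cost comes to about $26 on V4-Pro versus $450 on Opus 4.7, roughly a 17× gap. The exact ratio depends on your input/output mix, since the output-price gap is wider than the input-price gap.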


Also This Week

Google I/O 2026: Gemini 3.5 Flash, Omni video model, Spark agent and a Managed Agents API · May 19 · Google Blog
Gemini 3.5 Flash ($1.50 / $9 per M tokens, 1M context, 76% Terminal-Bench) becomes the obvious cost-sensitive default, and the Managed Agents API hands you a remote Linux box per API call. Agent infra is now table stakes from the big three.

Qwen3.7-Max ships with 35-hour autonomy and 1M context, but no open weights · May 20 · VentureBeat
Qwen3.7-Max is API-only at $2.50 / $7.50 per M tokens and supports Claude Code as a harness. Even Chinese labs increasingly gate their best agentic models behind closed APIs.

Cursor Composer 2.5 beats GPT-5.5 on SWE-Bench Multilingual at $0.07 per task · May 18 · Cursor Blog
Built on a Kimi K2.5 base with 85% of compute spent on Cursor's own RL post-training. That is proof that post-training on open-weight foundations is now a real frontier path, not a cheap imitation.

MiniMax M2.7 and Qwen3 Coder Next ship in the same 48 hours · May 22 · LLM-Stats
The release cadence from Chinese labs has compressed from monthly to weekly — Western teams now have to design routing logic that can hot-swap models without re-evals every Friday.
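One way to picture hot-swappable routing is a small registry where adding or replacing a model is a data change rather than a code change. The sketch below is a minimal illustration under stated assumptions: the `Router` and `ModelEntry` names, the eval scores, and the residency flags are all hypothetical placeholders (the scores are not benchmark results); only the output prices come from this issue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEntry:
    name: str            # hypothetical label, not an official API model name
    output_price: float  # USD per million output tokens
    eval_score: float    # cached score from your own regression suite, 0..1
    residency_ok: bool   # whether the model is usable under your residency rules


class Router:
    def __init__(self, min_score: float):
        self.min_score = min_score
        self._models: dict[str, ModelEntry] = {}

    def register(self, entry: ModelEntry) -> None:
        """Add a model, or replace an existing one with the same name."""
        self._models[entry.name] = entry

    def pick(self, needs_residency: bool) -> str:
        """Return the cheapest registered model that meets the constraints."""
        candidates = [
            m for m in self._models.values()
            if m.eval_score >= self.min_score
            and (m.residency_ok or not needs_residency)
        ]
        if not candidates:
            raise LookupError("no registered model satisfies the constraints")
        return min(candidates, key=lambda m: m.output_price).name


# Placeholder scores and residency flags, chosen only to exercise the logic.
router = Router(min_score=0.70)
router.register(ModelEntry("deepseek-v4-pro", 0.87, 0.80, residency_ok=False))
router.register(ModelEntry("gemini-3.5-flash", 9.00, 0.78, residency_ok=True))
router.register(ModelEntry("claude-opus-4.7", 25.00, 0.82, residency_ok=True))

unrestricted_choice = router.pick(needs_residency=False)
restricted_choice = router.pick(needs_residency=True)

# A new release lands mid-week: register it and routing updates immediately.
router.register(ModelEntry("qwen3.7-max", 7.50, 0.79, residency_ok=True))
restricted_after_swap = router.pick(needs_residency=True)
```

The design choice here is that callers only ever ask for `pick(...)`, so a new release changes the registry contents and nothing else. In practice the cached `eval_score` would come from whatever regression suite you already trust.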


From the Lab

Equilibrium Reasoners: Learning Attractors Enables Scalable Reasoning · arXiv 2605.21488 (May 20)
Reframes iterative latent-state reasoning as learning task-conditioned attractors with stable fixed points. In plain English: it gives a theoretical handle on why some test-time compute approaches plateau while others keep scaling — your inference loop is effectively gradient descent toward an attractor basin. Practical hint: architectures that explicitly target attractor depth may scale reasoning without exploding token budgets, the bottleneck behind today's $/answer wall on parallel-sampling approaches like OpenDeepThink (also dropped this week).
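For intuition only, here is a toy fixed-point iteration showing what "converging to an attractor" means. This is a generic textbook illustration, not the paper's method: the update rule `z = tanh(w*z + x)`, the function name, and all parameter values are assumptions chosen so the map is a contraction (with |w| < 1 there is a single stable fixed point).

```python
import math


def iterate_to_fixed_point(x, z0, w=0.5, tol=1e-9, max_steps=1000):
    """Repeat z <- tanh(w*z + x) until successive states differ by less than tol.

    Returns the final state and the number of update steps taken.
    """
    z = z0
    for step in range(1, max_steps + 1):
        z_next = math.tanh(w * z + x)
        if abs(z_next - z) < tol:
            return z_next, step
        z = z_next
    raise RuntimeError("did not converge within max_steps")


# Two very different starting states, same input x.
z_a, steps_a = iterate_to_fixed_point(x=0.3, z0=-5.0)
z_b, steps_b = iterate_to_fixed_point(x=0.3, z0=5.0)

print(f"from z0=-5.0: z*={z_a:.6f} in {steps_a} steps")
print(f"from z0=+5.0: z*={z_b:.6f} in {steps_b} steps")
```

Both runs settle on the same state regardless of where they start, which is the attractor behavior the summary describes. The iteration count is set by how strongly the map contracts, not by how much output is generated.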


Worth Reading


The frontier is no longer one number. It's a Pareto curve — and this week DeepSeek dragged the cost axis hard while Google quietly redrew the capability axis. Pick your axis.