This was the week the physical-AI thesis stopped being a contrarian bet and crossed into analyst-consensus territory. The question is no longer whether AI will control agents in the real world, but which representational substrate wins the decade — and the answer is fragmenting into rival families before the first one has fully shipped. Regulators, enterprise budgets, and deployed products all moved in the same direction at the same time. When the analysts, the shipyards, and the road authorities align in a single week, the paradigm shift has already happened; what remains is a taxonomy fight.

Key Takeaways

  • Re-line-item your 2026 roadmap for AI that operates in your physical environment. Physical AI is now an analyst-sanctioned enterprise category, which means procurement, legal, and IT will start asking for world-model vendor shortlists this quarter — have an answer before they ask.
  • Pick a world-model family before your integrators pick one for you. Generative video, JEPA-style representation, physics-engine-backed, and VLA-embedded are not interchangeable — each solves a different deployment constraint and locks you into a different tooling ecosystem.
  • Treat European neural-driving approval as a blueprint, not a Tesla story. The Dutch regulatory pathway is now a template any AV or physical-AI operator can reference — map your homologation roadmap to it before competitors do.
  • Demand deployed references, not sim reels, from every physical-AI vendor. The bar has moved from "we have a demo" to "we are live in a customer's facility" — anyone still pitching synthetic footage is already a cycle behind.
  • Budget for generative evaluation in your AV or robotics validation stack. Policy testing against 15B-parameter world models is becoming the new minimum — closed-course miles alone will not satisfy regulators or boards much longer.
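What "generative evaluation" means in practice can be sketched in a few lines: roll a candidate policy inside a learned world model across many seeded scenarios and count predicted failures, instead of (or before) burning closed-course miles. The interfaces below are hypothetical stand-ins, not any vendor's API:

```python
import random

# Hypothetical stand-in for a learned generative world model (a GAIA- or
# Cosmos-class simulator in practice). step() returns the next observation
# plus a flag for safety-relevant events the model predicts. Toy dynamics.
class WorldModel:
    def reset(self, scenario_seed):
        random.seed(scenario_seed)          # deterministic per scenario
        return {"speed": 0.0, "hazard": False}

    def step(self, obs, action):
        speed = max(0.0, obs["speed"] + action)
        hazard = random.random() < 0.02 * speed   # toy hazard rate
        return {"speed": speed, "hazard": hazard}

def evaluate_policy(policy, model, n_scenarios=100, horizon=50):
    """Fraction of scenarios in which the policy triggers a predicted hazard."""
    failures = 0
    for seed in range(n_scenarios):
        obs = model.reset(seed)
        for _ in range(horizon):
            obs = model.step(obs, policy(obs))
            if obs["hazard"]:
                failures += 1
                break
    return failures / n_scenarios

# Example policy: accelerate gently below 2 m/s, otherwise brake gently.
cautious = lambda obs: 0.1 if obs["speed"] < 2.0 else -0.1
print(f"predicted failure rate: {evaluate_policy(cautious, WorldModel()):.2%}")
```

The structural point is the one boards and regulators care about: scenario coverage becomes a function of seeds and model fidelity, not of physical track time.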

The Big Picture

Forrester's 2026 Top 10 Emerging Technologies Puts Physical AI at the Center · April 15, 2026 · Forrester Press Release

Forrester's list marks the analyst-consensus tipping point — AI is "no longer confined to digital workflows" and enterprise buyers are expected to fund Physical AI, world models, and robotic foundation models in 2026 budgets. The LLM-era stack is now table stakes; the differentiated capability is AI that controls an agent in the real world. That is a direct vindication of the JEPA / world-model thesis: representation learning of physical dynamics, not token prediction, is where the next productivity curve comes from. AMI Labs' $1.03B raise supplies the regional sovereignty play Brussels has been waiting for, and Tesla's Dutch FSD approval shows the regulatory door is open.


Also This Week

Path Robotics Launches Rove: Mobile Robotic Welding With Obsidian Physical AI · April 16, 2026 · Robotics Tomorrow

Saronic Technologies is the first customer — a world model trained on welding is live in a U.S. shipyard, not a demo reel.

Tesla Wins First European FSD Approval from Dutch RDW · April 10, 2026 · Electrek

The RDW approval under UN R-171 followed 18 months of testing and 1.6 million km driven on EU roads — every other EU jurisdiction now has a Dutch precedent to reference.

Wayve Wants to Take On Waymo · April 2, 2026 · TIME

Wayve pits GAIA-3, its 15B-parameter generative-evaluation model, against Waymo's Genie-3-based simulator — the first mainstream head-to-head between world-model-driven AV companies.

NVIDIA's Cosmos 3 and GR00T N2 Anchor Physical AI Stack at GTC 2026 · March 2026 · NVIDIA Newsroom

Cosmos 3 unifies synthetic world generation, vision reasoning and action simulation in one foundation model; GR00T N2 more than doubles success on novel tasks vs. prior VLA baselines.

AGIBOT Unveils Genie Envisioner 2.0 · April 2026 · The Robot Report

Genie Envisioner 2.0 treats "action" as a first-class variable in video generation, delivering minute-level stable simulation — the most credible answer yet to the "drift" problem.
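The "drift" problem is worth making concrete: an autoregressive simulator consumes its own slightly-wrong predictions, so per-step error compounds over long rollouts. The toy model below uses made-up constants and is not AGIBOT's formulation — it only shows why second-scale stability does not imply minute-scale stability:

```python
# Toy illustration of rollout drift: inherited error is amplified each
# step (feedback_gain > 1) and fresh per-step error is added on top.
# Constants are illustrative, not measured from any real model.
def accumulated_error(steps, per_step_err=0.01, feedback_gain=1.02):
    err = 0.0
    for _ in range(steps):
        err = err * feedback_gain + per_step_err
    return err

for steps in (10, 100, 1000):   # roughly: seconds vs. minute-scale horizons
    print(steps, round(accumulated_error(steps), 3))
```

Even a 2% per-step amplification turns negligible short-horizon error into runaway long-horizon error, which is why minute-level stable generation is the headline claim.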


From the Lab

Cosmos World Foundation Model Platform for Physical AI · NVIDIA · arXiv:2501.03575

The canonical reference for the Cosmos stack — world-foundation models, tokenizers, guardrails, data pipelines. Still the clearest articulation of what "world model for robots" means in production, and Cosmos 3 is the next iteration previewed at GTC 2026.

V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning · Meta FAIR · arXiv:2506.09985

The baseline anyone comparing to a "JEPA-style world model" must beat: 1M+ hours of video pre-training, SOTA visual prediction, zero-shot robot planning. The V-JEPA 2 release page remains the reference open implementation.
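The core JEPA idea — predict the embedding of a future observation rather than its pixels — fits in a few lines. The sketch below uses plain linear maps as encoder and predictor; it illustrates the loss structure only, not the V-JEPA 2 architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy JEPA-style objective: predict the *embedding* of the next frame
# from the embedding of the current one, never reconstructing pixels.
# Linear encoder/predictor for illustration; both are deep networks
# (with a stop-gradient on the target branch) in real JEPA training.
D_PIX, D_EMB = 32, 8
enc = rng.normal(size=(D_EMB, D_PIX)) / np.sqrt(D_PIX)   # shared encoder
pred = np.eye(D_EMB)                                     # predictor (learned in practice)

def jepa_loss(frame_t, frame_t1):
    """L2 distance in embedding space between predicted and actual futures."""
    z_t = enc @ frame_t
    z_t1 = enc @ frame_t1        # target embedding
    z_hat = pred @ z_t           # predicted future embedding
    return float(np.mean((z_hat - z_t1) ** 2))

frame = rng.normal(size=D_PIX)
print(jepa_loss(frame, frame + 0.01 * rng.normal(size=D_PIX)))
```

The design choice the taxonomy debate hinges on is visible here: the loss lives in representation space, so the model never spends capacity on pixel detail that is irrelevant to prediction and planning.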


The Debate

The week's most useful framing came from Humanoids Daily's World Model Taxonomy — arguing "world model" has splintered into distinct programs (generative video simulators like Genie, JEPA-style representation learners, physics-engine-backed simulators, VLA-embedded models like GR00T) and that treating them as one movement now obscures more than it clarifies. The LeCun-vs-LLM binary has stopped being the useful axis; the real debates are over which subfamily solves which deployment constraint. That is the maturity signal — when a movement fragments into taxonomy, the field is actually shipping.
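One way to operationalize that taxonomy for a vendor shortlist is a simple lookup from deployment constraint to family. The family names and examples come from the discussion above; the constraint mapping is an editorial gloss, not Humanoids Daily's or any standard's:

```python
# Illustrative only: which world-model family tends to fit which
# deployment constraint. The "fits" strings are an editorial gloss.
FAMILIES = {
    "generative-video":    {"examples": ["Genie", "GAIA-3"],
                            "fits": "broad-scenario evaluation, simulation at scale"},
    "jepa-representation": {"examples": ["V-JEPA 2"],
                            "fits": "sample-efficient planning, transfer"},
    "physics-backed":      {"examples": ["engine-in-the-loop simulators"],
                            "fits": "contact-rich tasks, certifiable dynamics"},
    "vla-embedded":        {"examples": ["GR00T N2"],
                            "fits": "end-to-end manipulation policies"},
}

def shortlist(constraint_keyword):
    """Return family names whose 'fits' description mentions the keyword."""
    return [name for name, info in FAMILIES.items()
            if constraint_keyword in info["fits"]]

print(shortlist("planning"))
```

The point of the exercise is the one in the takeaways: the families are not interchangeable, so the shortlist question has to start from the constraint, not the demo.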


When Forrester, Tesla, Path Robotics, and NVIDIA all ship world-model-first news in the same week, the argument is over. The open question is which family — generative, JEPA, hybrid, or embedded — becomes the default substrate for the physical-AI decade.