nvidianews.nvidia.com via Reddit

NVIDIA Cosmos 3 open-sources full physical AI stack

6 sources tracking this story
nvidia robotics open source physical-ai open-source robotics

Key insights

  • The mixture-of-transformers architecture separates a reasoning tower from a generation tower, removing the need to orchestrate multiple specialized models.
  • Two model sizes (Nano 8B and Super 32B) ship with NVFP4 quantization delivering 2x inference speedup on Blackwell hardware.
  • NVIDIA released six open synthetic datasets covering robotics, physics, autonomous driving, and warehouse operations, reducing data collection barriers for physical AI teams.

Why this matters

NVIDIA's Cosmos 3 launch at GTC Taipei delivers the first open physical AI foundation model that unifies perception, world simulation, and robot action generation in a single architecture, removing the need for separate systems that previously required months of integration work. The simultaneous release of weights (Nano 8B and Super 32B), six synthetic training datasets, and full post-training code on Hugging Face and GitHub signals a deliberate open-source platform play, with the Cosmos Coalition of six founding AI labs extending NVIDIA's ecosystem leverage into the world-model layer. Jensen Huang framed the announcement as the precursor to 'the big bang of physical AI,' and the accompanying HUE benchmark framework addresses a known credibility gap in physical AI reasoning evaluation. Together, the technical architecture, licensing strategy, and coalition structure position NVIDIA's Cosmos stack as the default training and evaluation infrastructure for robotics and autonomous vehicle teams.

Summary

NVIDIA launched Cosmos 3 at GTC Taipei, an open foundation model unifying vision reasoning, world generation, and action prediction in one architecture. Cosmos 3 Nano and Cosmos 3 Super are live on HuggingFace today with training scripts and datasets on GitHub. The mixture-of-transformers design processes text, images, video, ambient sound, and robot actions natively. NVIDIA claims this cuts physical AI training from months to days. Essentially: (NVIDIA, Agile Robots, Black Forest Labs, Runway, Skild AI) are building shared open infrastructure for physical AI via the Cosmos Coalition. - Both model sizes immediately available with full tooling open-sourced on GitHub. - Architecture spans five modalities natively: text, image, video, sound, robot actions. The open release bets on ecosystem gravity over proprietary control.

Potential risks and opportunities

Risks

  • Robotics startups building core pipelines on Cosmos 3 face future vendor lock-in if NVIDIA changes licensing terms, a pattern precedented by CUDA and early PyTorch ecosystem dependencies
  • Open-sourcing a world generation model on HuggingFace creates risk of misuse for synthetic training data at scale, potentially exposing NVIDIA and Cosmos Coalition partners to EU AI Act scrutiny during the 2026 compliance review cycle
  • Google DeepMind, Boston Dynamics, and other closed physical AI developers face accelerated pressure to open-source comparable stacks within 6 to 12 months or cede the developer ecosystem to NVIDIA

Opportunities

  • Hardware robotics vendors (Unitree, Agility Robotics, Figure AI) can reduce internal AI R&D spend immediately by fine-tuning Cosmos 3 rather than training proprietary perception and planning models from scratch
  • Cloud providers with NVIDIA GPU partnerships (AWS, Google Cloud, Azure) have a natural upsell surface offering managed Cosmos 3 fine-tuning as a service for enterprise robotics and AV customers
  • Autonomous vehicle startups outside NVIDIA's current AV partnerships have a roughly 90-day decision window to either adopt Cosmos 3 and gain training efficiency or commit resources to building a competing open alternative before ecosystem lock-in solidifies

What we don't know yet

  • Benchmark comparisons against closed physical AI systems like Google DeepMind's RT-X and Boston Dynamics' internal research stack are absent from the launch materials
  • Whether the months-to-days training claim holds for custom robotics hardware outside NVIDIA's Jetson and Hopper ecosystem remains untested publicly
  • Commercial licensing terms for Cosmos Coalition partners, specifically whether Runway and Black Forest Labs can monetize fine-tuned derivatives, were not disclosed at launch

What others are reporting

Coverage cluster as of 2h after publish

  1. NVIDIA Blog Read →

    First-party narrative covering the dual-tower architecture, VANTAGE-Bench and TAR challenge benchmark leadership, native robot control output generation, and OpenMDW 1.1 licensing.

    Physical AI systems need to understand not just what they see and what caused that to happen, but what's likely to happen next.
  2. NVIDIA Technical Blog Read →

    Developer-facing deep dive: dual-tower autoregressive and diffusion architecture, NVFP4 2x speedup, vLLM integration, six open synthetic datasets, and HUE benchmark for leaderboard-resistant evaluation.

    NVIDIA Cosmos 3 is a frontier foundation model for physical AI that combines physical reasoning, world generation, and action generation within a single open model.
  3. Hugging Face Blog Read →

    First-party model availability post: Nano (8B) and Super (32B) weights on Hugging Face, Diffusers Cosmos3OmniPipeline integration with code examples, and six open synthetic datasets for post-training.

    Cosmos 3 represents a major leap forward in world foundation models for physical AI: a unified omni-model combining generation, reasoning, and action in one model.
  4. GamesBeat Read →

    Industry outlet covering Cosmos Coalition formation with named partners (Agile Robots, Runway, Skild AI) and Jensen Huang's 'big bang of physical AI' framing, plus developer access paths on Hugging Face and build.nvidia.com.

    The big bang of physical AI is just around the corner thanks to breakthroughs in multimodal reasoning language, vision and world models.
  5. The Elec Read →

    Korean tech publication providing regional Computex coverage, noting the training-cycle reduction from months to days and the coalition structure for a Korea-adjacent hardware and robotics audience.