nvidianews.nvidia.com web signal June 1st 2026

NVIDIA releases Nemotron 3 Ultra, a 500B open-weight model, at 30% lower inference cost


Key insights

Nemotron 3 Ultra, a 500-550B parameter model, is claimed to run 30% cheaper than leading alternatives while generating 300+ output tokens per second.
The model scores 48 on the U.S. Intelligence Index, directly outperforming Google's Gemma 4 31B in published benchmarks.
NVIDIA confirmed Nemotron 4 is in active development, indicating a sustained open-weight model roadmap beyond this release.

Why this matters

A hardware company that controls both the chips and the model weights can optimize the full inference stack in ways that pure-software labs cannot, which makes NVIDIA's claimed 30% cost advantage structurally harder for competitors to replicate. The fully open release puts direct pressure on inference hosting providers like Together AI, Fireworks, and Groq, whose margin models depend on being the cheapest path to frontier open weights. With Nemotron 4 already confirmed in development, NVIDIA is signaling a sustained competitive program, which compresses the response timeline for Meta, Mistral, and Google on the open-weight frontier.

Summary

NVIDIA's Jensen Huang unveiled Nemotron 3 Ultra at Computex on June 1, a 500-550B open-weight model built for agentic reasoning workloads and claiming a 30% inference cost advantage over leading alternatives. The Ultra is the flagship of a three-tier family alongside Nano and Super, and ships with fully open weights. At 300+ output tokens per second, it scores 48 on the U.S. Intelligence Index, outperforming Google's Gemma 4 31B. NVIDIA confirmed Nemotron 4 is already in active development.

Essentially: open-weight agentic reasoning now has a 500B contender from the company that makes the chips.

30% cheaper inference than named alternatives, benchmarked at 300+ tokens per second throughput.
U.S. Intelligence Index score of 48 beats Gemma 4 31B in direct comparison.
Fully open weights mean enterprise teams can self-host without licensing constraints.

With Nemotron 4 already signaled, NVIDIA is treating open-weight models as a roadmap product, not a one-time research release.
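To make the two headline figures concrete, here is a rough arithmetic sketch of what a 30% cost reduction and a 300 tokens-per-second output rate would mean for a single workload. The baseline price and the token count are hypothetical placeholders, not figures from NVIDIA; the report does not say which alternatives or hardware the comparison used.

```python
# Rough arithmetic sketch. The 30% discount and 300 tokens/s come from the
# report; the baseline price and workload size below are HYPOTHETICAL.

def discounted_cost(baseline_cost: float, discount: float = 0.30) -> float:
    """Cost after applying a fractional discount (0.30 means 30% cheaper)."""
    return baseline_cost * (1.0 - discount)


def generation_seconds(num_tokens: int, tokens_per_second: float = 300.0) -> float:
    """Seconds needed to emit num_tokens at a steady output rate."""
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    hypothetical_baseline = 10.00   # assumed cost per million output tokens
    hypothetical_tokens = 60_000    # assumed size of one agentic job

    cost = discounted_cost(hypothetical_baseline)
    seconds = generation_seconds(hypothetical_tokens)
    print(f"Cost per million tokens after discount: {cost:.2f}")
    print(f"Time to generate {hypothetical_tokens} tokens: {seconds:.0f} s")
```

Swapping in a real provider's price and a measured throughput would show whether the claimed advantage holds for a given deployment.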

Potential risks and opportunities

Risks

If the 30% cost advantage only materializes on NVIDIA's own H100/H200 hardware, cloud-neutral inference providers (Together AI, Fireworks, Groq) face customer defection to NVIDIA-hosted endpoints within the next 60-90 days.
Google's Gemma team was named publicly as a performance comparison target in a Computex keynote, creating reputational pressure that may force an accelerated Gemma release before internal timelines support it.
Fully open 500B weights with no licensing guardrails lower the barrier for adversarial agentic deployments, giving well-resourced threat actors access to frontier reasoning capability without usage monitoring.

Opportunities

Inference optimization platforms (vLLM, Modal, Anyscale) can build Nemotron Ultra-specific deployment tiers and capture early enterprise traffic before larger cloud providers productize the model.
Enterprises currently on closed-model API contracts with OpenAI or Anthropic now have a credible cost benchmark to justify internal self-hosted migration proposals, opening consulting and integration work for AI deployment firms.
Hyperscalers (AWS, Azure, GCP) can offer Nemotron Ultra as a managed model endpoint to capture inference revenue from customers who want open-weight flexibility without managing their own H100 clusters.

What we don't know yet

Which specific models constitute the 'leading alternatives' in NVIDIA's 30% inference cost comparison, and on what hardware configuration the benchmark was run.
Whether the U.S. Intelligence Index score of 48 was measured on NVIDIA's own H100/H200 infrastructure or a neutral third-party benchmark environment.
Parameter scale and target release window for Nemotron 4, given NVIDIA's June 1 confirmation that it is already in active development.

Originally reported by nvidianews.nvidia.com


Original headline: NVIDIA Launches Nemotron 3 Ultra at Computex — 500B Open-Weight Agentic AI Model, 30% Cheaper to Run Than Alternatives