nvidianews.nvidia.com web signal

NVIDIA Nemotron 3 Launches 500B-Param Open Ultra Model

By Alexis Dufresne. Published June 1, 2026 at 09:36 UTC. Updated June 1, 2026 at 13:55 UTC.

nvidia open source inference open-weight-models agentic-ai

Key insights

NVIDIA Nemotron 3 Ultra packs ~500B total parameters but activates only 50B per token through a mixture-of-experts architecture.
Nano is available immediately on Hugging Face and AWS Bedrock; Super and Ultra models are scheduled for the first half of 2026.
Nemotron 3 Nano achieves 4x higher throughput than its Nemotron 2 predecessor while reducing reasoning-token generation by up to 60%.

Why this matters

The Nemotron 3 family's mixture-of-experts design makes 500B-parameter agents economically viable by capping active parameters at 50B per token, and Artificial Analysis rates the family the most efficient among same-size open models. Nano's immediate open availability on Hugging Face, alongside distribution through AWS Bedrock and Google Cloud, gives enterprises a path to frontier-scale agentic reasoning without closed-API lock-in, eroding the moat that closed-model providers have built on proprietary access. With thirteen major enterprises, including Palantir, CrowdStrike, and Perplexity, already integrating the models, NVIDIA is establishing an open-model standard for agentic workloads before its most capable tiers even ship.

Summary

NVIDIA's Nemotron 3 family launches in three open-weight tiers for agentic AI: Nano (30B parameters), Super (~100B), and Ultra (~500B). All three use a hybrid latent mixture-of-experts design that keeps active parameters well below the total count. Ultra activates 50B of its ~500B parameters per token; Nano runs on just 3B active. Nano ships now on Hugging Face, AWS Bedrock, and Google Cloud; Super and Ultra arrive in the first half of 2026. In short, NVIDIA, AWS, and Google Cloud are building the open agentic model stack together.

- Nano delivers 4x the throughput of Nemotron 2 Nano and cuts reasoning-token use by up to 60%.
- All tiers carry a 1-million-token context window.
- Artificial Analysis ranks Nemotron 3 most efficient among same-size open models.

Thirteen early adopters, including Accenture, CrowdStrike, Palantir, and Perplexity, are already integrating the family.
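The active-versus-total parameter figures quoted in the summary can be checked with a few lines of arithmetic. The sketch below is illustrative only: the dictionary and function names are hypothetical, the parameter counts are the approximate values reported above, and Super is omitted because the summary does not give its active-parameter count.

```python
# Approximate figures from the summary, in billions of parameters.
# TIERS and active_fraction are illustrative names, not part of any NVIDIA API.
TIERS = {
    "Nano": {"total_b": 30, "active_b": 3},
    "Ultra": {"total_b": 500, "active_b": 50},
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Return the share of total parameters activated per token."""
    if total_b <= 0:
        raise ValueError("total_b must be positive")
    return active_b / total_b


for name, params in TIERS.items():
    frac = active_fraction(params["total_b"], params["active_b"])
    print(f"{name}: {params['active_b']}B of {params['total_b']}B active ({frac:.0%})")
```

On these reported numbers, both tiers activate roughly one tenth of their total parameters per token, which is the basis for the efficiency argument made above.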

Potential risks and opportunities

Risks

Because Super and Ultra are not due until the first half of 2026, closed-model providers gain additional runway to lock in agentic enterprise contracts before NVIDIA's frontier open tiers are available.
Enterprises adopting Nano now face potential migration friction if Super and Ultra use different tool-calling or context-window formats than the 30B tier.
Artificial Analysis efficiency rankings serve as NVIDIA's primary proof of competitiveness, creating reputational risk if rival open-weight releases publish contradictory benchmarks before Super and Ultra ship.

Opportunities

Listed early adopters Perplexity and Cursor gain a head start building Nemotron-native agentic products before Super and Ultra open to the broader developer ecosystem.
Enterprise software vendors ServiceNow, Oracle, and Zoom integrating Nemotron 3 can embed agentic reasoning without paying per-token closed-API fees, reducing vendor concentration risk.
The NeMo Gym and NeMo RL open-source libraries, released alongside the model family, create an opening for AI tooling companies to build Nemotron-specific fine-tuning and evaluation infrastructure.

What we don't know yet

Benchmark comparisons against competing open-weight MoE families were not published in the announcement, leaving Artificial Analysis as the sole efficiency reference.
Pricing for enterprise access through AWS Bedrock and Google Cloud tiers is undisclosed in the release.
Whether the first-half 2026 availability timeline for Super and Ultra will hold given demand on NVIDIA Blackwell infrastructure is unaddressed.

Originally reported by nvidianews.nvidia.com


Original headline: NVIDIA Launches Nemotron 3 Ultra at Computex — 500B Open-Weight Agentic AI Model, 30% Cheaper to Run Than Alternatives