nvidia.com web signal

NVIDIA GB300 Tops All Seven MLPerf 6.0 Benchmarks

By Alexis Dufresne Published June 17, 2026 at 03:51 UTC Updated June 18, 2026 at 05:17 UTC

7 sources tracking this story

nvidia chips ai infrastructure ai-benchmarks gpu-performance ai-infrastructure

Key insights

MLCommons counted 95 unique systems across 13 hardware accelerators from 24 organizations — four of them first-time submitters — making v6.0 the most diverse Training round on record.
Cloud system submissions more than doubled versus v5.1 six months earlier, a trend MLCommons frames as a structural shift rather than a cyclical uptick.
CoreWeave's DeepSeek-V3 training scaled near-linearly: 5.54 min at 2,048 GPUs, 3.09 min at 4,096, 2.02 min at 8,192 — all on the same production hardware customers use today.

Why this matters

NVIDIA's clean sweep of all seven MLPerf Training 6.0 benchmarks carries weight because it was audited by MLCommons across 95 unique systems from 24 organizations — making it an independently validated result, not a self-reported claim. The round's two new MoE benchmarks (DeepSeek-V3 671B and GPT-OSS-20B) formally establish sparse computation as the paradigm against which training hardware will be measured going forward. Cloud system submissions more than doubled versus v5.1 six months earlier, reflecting a structural shift in where frontier training workloads run. CoreWeave's 2.02-minute DeepSeek-V3 result and Azure's 7.07-minute Llama 3.1 405B result both ran on production customer infrastructure, confirming that the benchmark numbers translate directly to deployable capacity.

Summary

NVIDIA's GB300 NVL72 posted the fastest results across all seven MLPerf Training 6.0 benchmarks at 8,192-GPU scale, beating its predecessor GB200 NVL72 by up to 1.6x. The performance gains come from higher compute density via NVFP4, expanded memory capacity, and higher power capabilities. CoreWeave completed DeepSeek-V3 671B training in 2.02 minutes at 8,192 GPUs using the GB300 NVL72; Microsoft Azure hit 7.07 minutes for Llama 3.1 405B at the same scale. Essentially: NVIDIA and nineteen partner organizations including Google Cloud, Microsoft Azure, CoreWeave, Hewlett Packard Enterprise, and Dell Technologies swept every MLPerf Training category. - MLPerf 6.0 added two new mixture-of-experts benchmarks: DeepSeek-V3 671B and GPT-OSS-20B. - NVIDIA was the only platform to submit results across all seven benchmark categories. - The GB300 NVL72 uses fifth-generation NVLink Switches connecting all 72 GPUs, backed by NVRx fault detection and checkpoint-based recovery. Frontier AI training is consolidating on Blackwell-class hardware at a scale most cloud providers cannot replicate.

Potential risks and opportunities

Risks

Competing hardware vendors (AMD, Google) could challenge NVFP4 as a non-comparable precision format, narrowing the perceived performance gap if benchmarks are re-run at matched numerical precision.
Enterprises and cloud customers who purchased GB200 NVL72 systems face near-term obsolescence pressure now that GB300 NVL72 results have been publicly verified with a 1.6x delta.
The concentration of nineteen partner submissions across Google Cloud, Microsoft Azure, CoreWeave, HPE, and Dell creates a single-vendor hardware dependency that raises supply-chain risk for buyers building multi-year training infrastructure plans.

Opportunities

CoreWeave and Microsoft Azure, who posted the headline DeepSeek-V3 671B and Llama 3.1 405B numbers respectively, can use these MLPerf-verified results as direct sales collateral in competitive GPU cloud procurement.
HPE, Dell Technologies, and other OEM partners submitting results gain certification-adjacent positioning for enterprise buyers evaluating on-premises GB300 NVL72 deployments.
NVIDIA's NVIDIA Resiliency Extension (NVRx) -- covering fault detection and checkpoint-based recovery validated across 30-plus manufacturing tests -- creates an upsell path for managed resiliency software on top of GB300 NVL72 hardware in long-running frontier training jobs.

What we don't know yet

No non-NVIDIA platform submissions were reported for the two new mixture-of-experts benchmarks (DeepSeek-V3 671B and GPT-OSS-20B) -- whether AMD or Google submitted competing results is unaddressed.
Whether the up-to-1.6x speedup over GB200 NVL72 is consistent across all seven benchmarks or reflects best-case gains on NVFP4-optimized workloads is not broken out per benchmark.
Cost per training run at 8,192-GPU scale is entirely absent -- no pricing context is given for CoreWeave's 2.02-minute DeepSeek-V3 result or Microsoft Azure's 7.07-minute Llama result.

What others are reporting

Coverage cluster as of 8h after publish

MLCommons Read →

The authoritative consortium source: 95 systems, 13 accelerators, 24 orgs, 4 first-timers, and the official framing that sparse computation is now the dominant AI training architecture.

Sparse computation is a dominant trend in AI right now. Over the past two years, all of the major new generative AI models have utilized a sparse computation architecture.
CoreWeave Read →

First-party confirmation that the 2.02-minute DeepSeek-V3 result ran on production customer infrastructure, with documented near-linear scaling across three cluster sizes (2,048 / 4,096 / 8,192 GPUs).

Training DeepSeek-V3 in two minutes on the largest GB300 cluster reflects years of metal-to-model engineering investment.
Microsoft Azure Read →

Azure's first-party account of assembling the largest GB200 NVL72 cluster in MLPerf Training history — 2,048 tray nodes across 128 racks — to claim the fastest Llama 3.1 405B result at 7.07 minutes.
NVIDIA Developer Blog Read →

NVIDIA's technical companion to its marketing blog: quantified kernel-level gains (MXFP8 attention adds 8% end-to-end on DeepSeek-V3), CuTe DSL kernel fusions, and a throughput chart across six NeMo container releases from Nov 2025 to June 2026.

DeepSeek-V3 training throughput increased by 1.3x in just three months without hardware changes.
Lambda Read →

A participant cloud provider's view: Lambda beat NVIDIA Theia, HPE, and GigaComputing on Llama 3.1 8B convergence speed, and credits an 18.7% round-over-round gain entirely to software tuning on unchanged hardware.

This represents an 18.7% improvement in training speed attributed purely to software improvements over the last round.
Nebius Read →

A participant cloud provider quantifying the HGX B300 vs GB300 NVL72 trade-off directly: GB300 NVL72 is 11-18% faster than HGX B300 at equivalent GPU counts, and Nebius's GB300 results landed within 0.8-3.1% of the fastest submissions at 72-GPU scale.

Independently verified results mean more than any claim we could make ourselves. That's why we submit to MLPerf every round.

Originally reported by nvidia.com

Read the original article →

Original headline: NVIDIA Blackwell Sweeps All Seven MLPerf Training 6.0 Benchmarks at 8,192-GPU Scale — 1.6× Faster Than GB200