nvidia.com web signal

NVIDIA GB300 Tops All Seven MLPerf Training 6.0 Benchmarks

Key insights

  • MLCommons counted 95 unique systems from 24 organizations and 13 hardware accelerators in Training 6.0, yet NVIDIA claimed all seven benchmark wins.
  • GB300 NVL72 ran up to 1.6x faster than GB200 NVL72 at identical GPU scale, with gains from higher power headroom and NVFP4 compute density.
  • NVIDIA's DeepSeek-V3 throughput improved 1.3x in three months through software alone, via CUDA graphs, MXFP8 attention, and near-100% all-to-all overlap.
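
The speedup multiples above can be restated as wall-clock savings. The following is a minimal sketch, assuming a fixed workload and that a throughput gain translates one-to-one into shorter training time (an assumption; the source does not state this). The function names are illustrative and do not come from any MLPerf tooling.

```python
def time_fraction(speedup: float) -> float:
    """Fraction of the original wall-clock time that remains after a speedup."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 / speedup


def percent_time_saved(speedup: float) -> float:
    """Percentage of wall-clock time removed by a given speedup multiple."""
    return (1.0 - time_fraction(speedup)) * 100.0


# Multiples quoted in the key insights; "up to" figures are best-case values.
quoted_speedups = {
    "GB300 NVL72 vs GB200 NVL72 (up to)": 1.6,
    "DeepSeek-V3, software only, three months": 1.3,
}

for label, speedup in quoted_speedups.items():
    saved = percent_time_saved(speedup)
    print(f"{label}: {speedup}x -> about {saved:.1f}% less time")
```

Under that assumption, a 1.6x speedup removes 37.5% of the run time and a 1.3x speedup removes roughly 23%, which is why a multiple and a percentage quoted for the same gain can look very different.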

Why this matters

NVIDIA's clean sweep of all seven MLPerf Training 6.0 benchmarks is validated by MLCommons, the independent consortium that tracked 95 unique systems across 24 organizations and 13 hardware accelerators, giving the result standing as an audited industry standard rather than a self-reported claim. The round's addition of DeepSeek-V3 671B and GPT-OSS-20B formalizes sparse MoE computation as the frontier training paradigm that hardware will be measured against going forward. Software optimizations, including CUDA graphs, MXFP8 attention, and near-100% all-to-all communication overlap, pushed DeepSeek-V3 throughput up 1.3x in three months without hardware changes, meaning the performance gap between generations is only partly a procurement question. CoreWeave's confirmation that its record 2.02-minute DeepSeek-V3 result ran on production customer infrastructure closes the gap between benchmark marketing and what operators can actually deploy.

Summary

NVIDIA's GB300 NVL72 posted the fastest results across all seven MLPerf Training 6.0 benchmarks at 8,192-GPU scale, beating its predecessor GB200 NVL72 by up to 1.6x. The performance gains come from higher compute density via NVFP4, expanded memory capacity, and higher power capabilities. CoreWeave completed DeepSeek-V3 671B training in 2.02 minutes at 8,192 GPUs using the GB300 NVL72; Microsoft Azure hit 7.07 minutes for Llama 3.1 405B at the same scale.

Essentially: NVIDIA and nineteen partner organizations including Google Cloud, Microsoft Azure, CoreWeave, Hewlett Packard Enterprise, and Dell Technologies swept every MLPerf Training category.

  • MLPerf Training 6.0 added two new mixture-of-experts benchmarks: DeepSeek-V3 671B and GPT-OSS-20B.
  • NVIDIA was the only platform to submit results across all seven benchmark categories.
  • The GB300 NVL72 uses fifth-generation NVLink Switches connecting all 72 GPUs, backed by NVRx fault detection and checkpoint-based recovery.

Frontier AI training is consolidating on Blackwell-class hardware at a scale most cloud providers cannot replicate.

Potential risks and opportunities

Risks

  • Competing hardware vendors (AMD, Google) could challenge NVFP4 as a non-comparable precision format, narrowing the perceived performance gap if benchmarks are re-run at matched numerical precision.
  • Enterprises and cloud customers who purchased GB200 NVL72 systems face near-term obsolescence pressure now that GB300 NVL72 results have been publicly verified with a delta of up to 1.6x.
  • The concentration of nineteen partner submissions across Google Cloud, Microsoft Azure, CoreWeave, HPE, and Dell creates a single-vendor hardware dependency that raises supply-chain risk for buyers building multi-year training infrastructure plans.

Opportunities

  • CoreWeave and Microsoft Azure, which posted the headline DeepSeek-V3 671B and Llama 3.1 405B numbers respectively, can use these MLPerf-verified results as direct sales collateral in competitive GPU cloud procurement.
  • HPE, Dell Technologies, and other OEM partners submitting results gain certification-adjacent positioning for enterprise buyers evaluating on-premises GB300 NVL72 deployments.
  • The NVIDIA Resiliency Extension (NVRx), covering fault detection and checkpoint-based recovery validated across 30-plus manufacturing tests, creates an upsell path for managed resiliency software on top of GB300 NVL72 hardware in long-running frontier training jobs.

What we don't know yet

  • No non-NVIDIA platform submissions were reported for the two new mixture-of-experts benchmarks (DeepSeek-V3 671B and GPT-OSS-20B); whether AMD or Google submitted competing results is unaddressed.
  • The up-to-1.6x speedup over GB200 NVL72 is not broken out per benchmark, so it is unclear whether it holds across all seven or reflects best-case gains on NVFP4-optimized workloads.
  • Cost per training run at 8,192-GPU scale is entirely absent: no pricing context is given for CoreWeave's 2.02-minute DeepSeek-V3 result or Microsoft Azure's 7.07-minute Llama result.
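
The missing cost figure can at least be bounded in GPU-hours from the numbers that are reported. This is a rough sketch: the GPU count and times come from the summary above, but `HYPOTHETICAL_RATE` is a placeholder and not a real price from CoreWeave, Microsoft Azure, or any other provider. It also assumes the reported time-to-train is the only billable time, which ignores provisioning, warm-up, and idle capacity.

```python
def gpu_hours(num_gpus: int, minutes: float) -> float:
    """Total GPU-hours consumed by a run of the given length."""
    return num_gpus * minutes / 60.0


def run_cost(num_gpus: int, minutes: float, rate_per_gpu_hour: float) -> float:
    """Cost of a run at a given per-GPU-hour rate (rate supplied by the caller)."""
    return gpu_hours(num_gpus, minutes) * rate_per_gpu_hour


NUM_GPUS = 8192  # scale reported for both headline results

# Reported time-to-train in minutes.
RESULTS = {
    "CoreWeave, DeepSeek-V3 671B": 2.02,
    "Microsoft Azure, Llama 3.1 405B": 7.07,
}

# Placeholder only: 1.0 expresses cost as a multiple of whatever the
# real per-GPU-hour rate turns out to be. It is NOT a quoted price.
HYPOTHETICAL_RATE = 1.0

for name, minutes in RESULTS.items():
    hours = gpu_hours(NUM_GPUS, minutes)
    cost = run_cost(NUM_GPUS, minutes, HYPOTHETICAL_RATE)
    print(f"{name}: {hours:.1f} GPU-hours, cost = {cost:.1f} x hourly rate")
```

On these inputs the DeepSeek-V3 run works out to roughly 276 GPU-hours and the Llama 3.1 405B run to roughly 965 GPU-hours. Multiplying by an actual contracted rate would give a floor on the cost of each run.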

What others are reporting

Coverage cluster as of 2h after publish

  1. MLCommons via GlobeNewsWire

    The benchmark organizer's neutral release counts 24 submitters and 13 accelerator types, framing NVIDIA's sweep as a market structure finding rather than a vendor claim.

    Sparse computation is a dominant trend in AI right now. All of the major new generative AI models have utilized a sparse computation architecture.
  2. NVIDIA Developer Blog

    Technical blog details six software optimizations driving the gains and documents DeepSeek-V3 throughput rising 1.3x in three months without any hardware changes.

    achieved the fastest time to train at scale, and also delivered the highest performance when normalized on a per-accelerator basis on every benchmark.
  3. CoreWeave

    Confirms its 2.02-minute DeepSeek-V3 result on 8,192 GB300 GPUs ran on production customer infrastructure, not a benchmark-only configuration, closing the deployment credibility gap.

    The gap between benchmark performance and production reality remains one of the most persistent challenges in AI infrastructure.
  4. Lambda

    Names specific per-system training times across GB300 NVL72 competitors and quantifies an 18.7% software-only speed gain over the previous round on the same hardware class.

    This represents an 18.7% improvement in training speed attributed purely to software improvements over the last round.