tomshardware.com web signal

NVIDIA Vera Rubin VR200 Slashes Inference Cost by 10x

nvidia chips ai-infrastructure

Key insights

  • The VR200 NVL72 rack delivers 50 PFLOPS FP4 inference, a 3.3x throughput gain over the Blackwell B300 system.
  • Each Rubin GPU includes 288 GB of HBM4 memory at 22 TB/s bandwidth, a substantial jump from Blackwell's HBM3e.
  • Jensen Huang confirmed full production ramp in H2 2026, with engineering samples already shipping to select customers.

Why this matters

A 10x reduction in inference cost per token directly compresses margins for cloud providers currently charging premium rates on Blackwell capacity, forcing a repricing of AI inference contracts before the hardware is even in production. The 50 PFLOPS FP4 figure means a single VR200 rack can theoretically serve workloads that currently require multiple B200 NVL72 deployments, reshaping how hyperscalers model their 2027 capacity builds. By announcing engineering samples in mid-2026 and targeting production in H2 2026, NVIDIA gains a window to lock in purchasing commitments from Microsoft, Google, and Amazon before AMD MI400 reaches customer qualification.

Summary

NVIDIA has shipped the first Vera Rubin VR200 engineering samples to select customers, the first physical hardware from its post-Blackwell data center generation. Each VR200 NVL72 rack pairs 72 Rubin GPUs (each with 288 GB of HBM4 memory and 22 TB/s of bandwidth) with 36 Vera CPUs, each carrying 88 custom Armv9.2 cores. The rack delivers 50 PFLOPS of FP4 inference, a 3.3x throughput gain over the B300 and a 10x reduction in inference cost per token versus Grace-Blackwell NVL72. In short, NVIDIA is resetting what affordable inference looks like before hyperscalers finish deploying Blackwell.

  • 50 PFLOPS FP4 per rack, versus roughly 15 PFLOPS for the B300 equivalent
  • HBM4 at 22 TB/s per GPU, up from HBM3e in Blackwell systems
  • Full production ramp targeted for H2 2026, confirmed by Jensen Huang at GTC Taipei

This timeline puts NVIDIA ahead of AMD MI400 and competing accelerators before customers can qualify alternatives at scale.
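The per-unit figures in the summary can be rolled up into rack-level totals with a few multiplications. The sketch below is only a back-of-envelope check, not an NVIDIA-published calculation. It assumes the 88-core count applies to each Vera CPU and takes the "roughly 15 PFLOPS" B300 figure at face value. The constant and function names are illustrative.

```python
# Back-of-envelope rack totals derived from the figures quoted in the summary.
GPUS_PER_RACK = 72
CPUS_PER_RACK = 36
HBM4_GB_PER_GPU = 288
HBM4_TBPS_PER_GPU = 22
CORES_PER_CPU = 88        # assumption: the 88-core figure is per Vera CPU
VR200_FP4_PFLOPS = 50     # as quoted in the summary
B300_FP4_PFLOPS = 15      # "roughly 15" in the summary, so the ratio is approximate


def rack_totals() -> dict:
    """Return aggregate memory, bandwidth, core count, and the FP4 throughput ratio."""
    return {
        "hbm4_gb": GPUS_PER_RACK * HBM4_GB_PER_GPU,
        "hbm4_tbps": GPUS_PER_RACK * HBM4_TBPS_PER_GPU,
        "cpu_cores": CPUS_PER_RACK * CORES_PER_CPU,
        "fp4_gain": round(VR200_FP4_PFLOPS / B300_FP4_PFLOPS, 1),
    }


if __name__ == "__main__":
    for name, value in rack_totals().items():
        print(f"{name}: {value}")
```

Under these assumptions the rack carries 20,736 GB of HBM4 with 1,584 TB/s of aggregate memory bandwidth and 3,168 CPU cores, and 50 / 15 matches the quoted 3.3x throughput gain.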

Potential risks and opportunities

Risks

  • Hyperscalers that pre-committed large Blackwell (B200/B300) orders may face stranded capacity or renegotiation pressure as Vera Rubin's 10x cost advantage becomes a baseline customer expectation within 12 to 18 months.
  • If HBM4 supply from SK Hynix and Samsung cannot scale to meet VR200 NVL72 demand, NVIDIA's H2 2026 production ramp risks delays that would compress its competitive window against AMD MI400 and Google TPU v6.
  • AMD, Intel, and custom silicon teams at Google and Amazon risk losing customers to NVIDIA lock-in, as NVIDIA secures purchasing commitments during the engineering sample phase, well before competing platforms reach qualification.

Opportunities

  • HBM4 suppliers SK Hynix and Samsung gain significant pricing leverage as NVIDIA's production ramp creates concentrated demand before competing accelerator platforms qualify HBM4 at scale.
  • Cloud inference providers such as Together AI, Fireworks AI, and Groq that access early VR200 capacity in H2 2026 have a window to undercut hyperscaler inference pricing before competitors qualify the hardware.
  • Data center infrastructure vendors (Vertiv, Schneider Electric) stand to benefit from an accelerated rack replacement cycle as enterprise customers move from Grace-Blackwell to Vera Rubin, creating near-term demand for thermal and power delivery upgrades.

What we don't know yet

  • NVIDIA has not disclosed which customers received engineering samples, nor confirmed whether hyperscalers like Microsoft, Google, or Amazon are among them.
  • Power consumption per VR200 NVL72 rack has not been reported, a critical figure for data center operators planning 2027 infrastructure buildouts.
  • Whether the 10x inference cost reduction holds at FP8 or BF16 precision rather than only FP4 has not been addressed in NVIDIA's public disclosures.