Open models got scary good, the inference-hardware arms race escalated, and the ecosystem quietly doubled.
This was the week open-source ML stopped being "good enough" and started being genuinely frontier-competitive. Google dropped Gemma 4 under Apache 2.0, MLCommons published the most ambitious MLPerf Inference benchmark round ever, and Hugging Face's spring census confirmed what practitioners already felt: the open ecosystem has doubled in a year, and the center of gravity is shifting east. If you deploy models for a living, every story below affects your stack.
The Big Story
Google Releases Gemma 4: Four Open Models Under Apache 2.0 · April 2, 2026 · Google Blog Google DeepMind shipped four variants -- E2B (2.3B effective), E4B (4.5B), 26B MoE (4B active), and 31B dense -- all under a fully permissive Apache 2.0 license, a first for the Gemma family. The 31B model supports 256K context, native vision and audio, fluency in 140+ languages, and scores 85.7% on GPQA Diamond and 80.0% on LiveCodeBench v6. Architecture-wise, the dense model keeps Gemma 3's hybrid sliding-window + GQA attention with added QK/V normalization and softcapping, while the MoE variants use separate expert blocks alongside standard MLP layers rather than the DeepSeek-style replacement pattern.
→ The Apache 2.0 pivot is the real story. Gemma 3's custom license was a friction point for startups building products on top of it. Combined with immediate support across vLLM, Ollama, and llama.cpp, Google is clearly betting that permissive licensing plus competitive benchmarks will win the derivative-model war -- where Alibaba's Qwen currently dominates with 113K+ forks on Hugging Face.
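The two attention tricks mentioned above, QK normalization and logit softcapping, are simple to sketch. Here is a minimal single-head illustration in NumPy -- the cap value of 50, the RMS-norm placement, and the tensor shapes are assumptions for clarity, not Gemma 4's actual configuration:

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    # RMS-normalize along the head dimension (QK-norm style).
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)

def softcap(logits, cap=50.0):
    # Softcapping squashes attention logits into (-cap, cap) via
    # cap * tanh(logits / cap), preventing extreme scores.
    return cap * np.tanh(logits / cap)

def attention_scores(q, k, cap=50.0):
    # q, k: (seq, head_dim). Normalize queries/keys, cap the logits,
    # then softmax over keys.
    q, k = rms_norm(q), rms_norm(k)
    logits = softcap(q @ k.T / np.sqrt(q.shape[-1]), cap)
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)
```

The practical point: capped logits keep softmax inputs bounded no matter how large the raw dot products get, which stabilizes long-context training.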
Also This Week
MLPerf Inference v6.0: NVIDIA Hits 2.49M Tokens/Sec on DeepSeek-R1 · April 1 · MLCommons The largest system ever submitted -- 288 NVIDIA GPUs across 72 nodes -- achieved 2.49M tokens/sec on DeepSeek-R1 offline. Software optimizations alone delivered 2.7x throughput gains on the same hardware vs. six months ago, cutting per-token cost by 60%. Twenty-four organizations submitted; multi-node systems jumped 30%.
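The throughput-to-cost relationship is straightforward arithmetic: if the same hardware serves 2.7x the tokens, per-token cost falls to 1/2.7 ≈ 37% of baseline, a ~63% reduction, consistent with the ~60% figure. A quick check:

```python
speedup = 2.7
cost_ratio = 1 / speedup    # per-token cost relative to six months ago
reduction = 1 - cost_ratio  # fractional cost reduction
print(f"per-token cost: {cost_ratio:.0%} of baseline "
      f"({reduction:.0%} reduction)")
# → per-token cost: 37% of baseline (63% reduction)
```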
AMD MI355X Crosses 1M Tokens/Sec · April 1 · AMD Blog In single-node head-to-heads (8 GPUs), MI355X hit 92-119% of NVIDIA B300 performance depending on model and scenario, with FP4 quantization driving a 4.4x offline improvement on Llama 2 70B over prior rounds. The interactive mode result -- 104% of B300 -- signals real competition for latency-sensitive workloads.
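FP4's gains come largely from packing weights into 4 bits. A toy symmetric integer quantizer shows the memory math -- note this is a plain int4 scheme for illustration, not the floating-point FP4 encoding the benchmark actually uses:

```python
import numpy as np

def quantize_int4(w):
    # Symmetric per-tensor 4-bit quantization: map weights to
    # integers in [-7, 7] with a single scale factor.
    scale = np.abs(w).max() / 7.0
    q = np.clip(np.round(w / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(1).normal(size=1024).astype(np.float32)
q, s = quantize_int4(w)
err = np.abs(dequantize(q, s) - w).max()
print(f"memory: {w.nbytes}B fp32 -> {len(q) // 2}B packed 4-bit; "
      f"max abs error {err:.3f}")
```

An 8x memory shrink per weight means more of the model stays resident in fast memory, which is where the offline-throughput gains come from.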
Hugging Face: State of Open Source, Spring 2026 · April 4 · Hugging Face Blog The platform hit 13M users and 2M+ models. Independent developers now account for 39% of downloads (up from 17%), while industry share fell from 70% to 37%. Robotics datasets exploded from 1,145 to 26,991 in one year, jumping from rank #44 to #1 dataset category.
PyTorch/XLA 2.7 Ships JAX Bridge and Ragged Paged Attention · April 2026 · PyTorch Blog The experimental JAX Bridge lets you call `jax.experimental.shard_alike` and other JAX functions directly inside PyTorch/XLA graphs. The new Pallas-based ragged paged attention kernel delivers up to 5x speedup over padded attention for variable-length sequences on Llama 3 8B, plus GPU CI is back with CUDA 12.6 support.
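The payoff from ragged attention is easy to quantify: padding a batch to its longest sequence wastes compute on empty slots. A toy calculation (the batch lengths here are invented for illustration):

```python
def padded_waste(seq_lens):
    # Fraction of token slots wasted when a batch is padded
    # to its longest sequence -- the work ragged attention skips.
    max_len = max(seq_lens)
    total = max_len * len(seq_lens)
    used = sum(seq_lens)
    return 1 - used / total

# A skewed batch: one long sequence forces heavy padding.
print(f"{padded_waste([128, 2048, 64, 256]):.0%} of slots are padding")
# → 70% of slots are padding
```

The more skewed the length distribution, the closer a ragged kernel gets to its headline speedup over the padded baseline.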
State of MLOps Newsletter Highlights Kubernetes GPU Scheduling · April 6 · Substack This week's roundup flagged KAI Scheduler (open-source Kubernetes GPU scheduling), Google's five strategies for efficient LLM inference, and a comparative benchmark of 10 embedding models for RAG from Zilliz.
From the Lab
"Smoothing the Landscape: Causal Structure Learning via Diffusion Denoising Objectives" · CLeaR 2026 · arXiv cs.LG Accepted to the 5th Conference on Causal Learning and Reasoning. Uses diffusion model denoising objectives to smooth the combinatorial landscape of causal graph search, making structure learning more tractable on high-dimensional observational data.
"Deconfounding Scores and Representation Learning for Causal Effect Estimation with Weak Overlap" · AISTATS 2026 · arXiv stat.ML Proposes learned deconfounding scores that maintain valid causal inference even when treatment and control groups have poor covariate overlap -- a persistent headache in observational studies. Accepted at AISTATS 2026.
Worth Reading
The 2026 open-model landscape has a new shape: permissive licenses, multimodal by default, and a derivative ecosystem that matters more than any single benchmark score. If your inference stack hasn't changed in six months, it's already behind.