reddit.com via Reddit

r/LocalLLaMA: Google Gemma Team Confirms QAT Variants of Gemma 4 Releasing Soon — Community Urged to Wait Before Quantizing

google open source inference open-source-ai quantization

Summary

A Google Gemma team member (Omar) confirmed in a Reddit comment that Quantization-Aware Training (QAT) variants of the Gemma 4 models are releasing soon, prompting the r/LocalLLaMA community to hold off on its own post-training quantization (PTQ) experiments. QAT models simulate quantization during the training run itself, so the weights adapt to low-precision rounding; this typically preserves accuracy significantly better than PTQ at the same bit width. That makes the official QAT release the recommended baseline for local inference.
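
To make the difference between the two approaches concrete, here is a minimal pure-Python sketch of the general idea behind quantization-aware training. It is not the Gemma team's recipe, which the post does not describe. The names `fake_quantize` and `qat_step` are hypothetical, and the symmetric per-tensor 4-bit scheme, the toy linear model, and the straight-through gradient are all assumptions chosen for illustration.

```python
def fake_quantize(values, bits=4):
    """Quantize then dequantize: snap each value to a symmetric integer grid.

    Assumption: one scale per tensor, round-to-nearest, signed range
    [-qmax, qmax]. Real schemes often use per-channel or per-block scales.
    """
    qmax = 2 ** (bits - 1) - 1                  # 7 for 4-bit
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)                     # nothing to scale
    scale = max_abs / qmax
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def qat_step(weights, inputs, target, lr=0.05, bits=4):
    """One training step for a toy linear model y = sum(w_i * x_i).

    The forward pass uses the quantized weights, so the loss already
    includes rounding error. The backward pass treats the rounding as
    identity (straight-through estimator) and updates the full-precision
    weights that are kept alongside.
    """
    quantized = fake_quantize(weights, bits)
    prediction = sum(w * x for w, x in zip(quantized, inputs))
    error = prediction - target
    grads = [2 * error * x for x in inputs]     # d(error^2)/dw via STE
    updated = [w - lr * g for w, g in zip(weights, grads)]
    return updated, error ** 2


# Post-training quantization is a single call on finished weights:
ptq_weights = fake_quantize([7.0, 3.2, -1.9, 0.4], bits=4)

# QAT instead runs the quantizer inside every training step:
new_weights, loss = qat_step([0.5, -0.5], inputs=[1.0, 1.0], target=1.0)
```

In this sketch, post-training quantization applies `fake_quantize` once to weights that never saw rounding error, whereas `qat_step` exposes the model to that error on every update, which is the intuition behind the accuracy gap described above.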