reddit.com via Reddit June 8th 2026

r/LocalLLaMA: Google's Official Gemma 4 QAT Q4_0 Quantizations Show Measurably Higher Precision Than Unsloth Q4_K_XL Across Multiple Models

google open source inference local-llm quantization gemma

Summary

A LocalLLaMA community member compared Google's official Gemma 4 QAT Q4_0 checkpoints with Unsloth's Q4_K_XL quantizations and found measurably higher precision in the QAT variants for E4B and several other models. This supports the Gemma team's earlier advice to wait for the official QAT releases before quantizing. The result is consistent with quantization-aware training producing better weights than standard post-training quantization at a comparable 4-bit width, and it positions Google's QAT Q4_0 as the community's reference point for the efficiency-quality tradeoff in Gemma 4 deployment. Commenters noted that this matters because QAT Q4_0 already shrinks E2B from 9.6 GB to 3.2 GB, so the precision gain comes at no extra storage cost relative to alternative quantization approaches.
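To make the storage side of this concrete, here is a minimal Python sketch of the Q4_0 block format, assuming the layout used by llama.cpp's reference quantizer: blocks of 32 weights, one fp16 scale per block, and 4-bit codes offset by 8. It is plain round-to-nearest post-training quantization. It shows the grid that QAT Q4_0 weights are trained to fit, not the QAT training itself, and it does not model Unsloth's Q4_K_XL k-quant format. The helper names (`quantize_block_q4_0`, `rms_error`, etc.) are hypothetical, and RMS error is only one simple proxy for "precision"; the source does not say which metric the comparison used.

```python
import math
import struct

QK4_0 = 32  # weights per block in the (assumed) GGML Q4_0 layout


def to_fp16(x: float) -> float:
    """Round to IEEE half precision, the width Q4_0 uses for its per-block scale."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def quantize_block_q4_0(block: list[float]) -> tuple[float, list[int]]:
    """Return (fp16 scale, 32 codes in 0..15), modeled on llama.cpp's reference Q4_0."""
    assert len(block) == QK4_0
    # The signed value with the largest magnitude maps to code 0, i.e. -8 * d.
    max_val = max(block, key=abs)
    d = max_val / -8.0
    inv_d = 1.0 / d if d != 0.0 else 0.0
    # Offset by 8 and round half up; int() truncates, min() clamps the +8 end to 15.
    codes = [min(15, int(x * inv_d + 8.5)) for x in block]
    return to_fp16(d), codes


def dequantize_block_q4_0(d: float, codes: list[int]) -> list[float]:
    """Map 4-bit codes back to floats: w = (q - 8) * d."""
    return [(q - 8) * d for q in codes]


def rms_error(a: list[float], b: list[float]) -> float:
    """Root-mean-square reconstruction error (hypothetical precision proxy)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Storage per block: 32 x 4-bit codes + one 16-bit scale.
BITS_PER_WEIGHT = (QK4_0 * 4 + 16) / QK4_0  # 4.5


if __name__ == "__main__":
    block = [math.sin(0.37 * i) * 0.05 for i in range(QK4_0)]
    d, codes = quantize_block_q4_0(block)
    approx = dequantize_block_q4_0(d, codes)
    print(f"bits/weight: {BITS_PER_WEIGHT}")
    print(f"RMS error:   {rms_error(block, approx):.6f}")
```

At 4.5 bits per weight, a pure Q4_0 tensor is about 3.6× smaller than a 16-bit one; the reported E2B figures (9.6 GB to 3.2 GB) are a 3× reduction, and the source does not say which tensors, if any, are kept at higher precision. Under QAT, a quantize/dequantize round trip like this one is applied during training so the weights adapt to the grid; the sketch applies it only once, after the fact, which is the post-training approach.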

Originally reported by reddit.com

