reddit.com via Reddit June 8th 2026

r/LocalLLaMA: Google's Gemma 4 QAT Quantization Contains Token Embedding Bug — '--pure' Flag Omission Reverses Previous Community Assessment, Unsloth UD Q4_K_XL Now Recommended

google open source inference local-llm

Summary

An r/LocalLLaMA developer post identifies a reproducible bug in Google's official Gemma 4 QAT quantizations: because Google omitted the '--pure' flag when running llama-quantize, the token embedding layer was incorrectly quantized to Q6_K. The post lists other issues as well. The finding reverses the earlier community view that Google's official Q4_0 quants were preferable to Unsloth's variants; the post now recommends Unsloth's UD Q4_K_XL as the best current option for running Gemma 4 locally. Anyone who downloaded the official Google QAT release should check how its tensors are quantized, since the embedding-layer bug may explain the performance inconsistencies reported since release.
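
For readers who want to check a downloaded file, the following is a minimal sketch rather than the method used in the post. It assumes the file is in GGUF format, that the `gguf` Python package is installed, and that the token embedding tensor carries llama.cpp's conventional name `token_embd.weight`. The expected type of Q4_0 is also an assumption, based on the official quants being described as Q4_0; the helper names are hypothetical.

```python
import sys

# Assumption: llama.cpp's conventional name for the token embedding tensor.
EMBEDDING_TENSOR = "token_embd.weight"


def read_tensor_types(path):
    """Return (tensor name, quantization type name) pairs from a GGUF file."""
    # Imported lazily so the pure helpers below work without the package.
    from gguf import GGUFReader  # pip install gguf

    reader = GGUFReader(path)
    return [(t.name, t.tensor_type.name) for t in reader.tensors]


def embedding_type(tensors, name=EMBEDDING_TENSOR):
    """Return the type name of the embedding tensor, or None if it is absent."""
    for tensor_name, type_name in tensors:
        if tensor_name == name:
            return type_name
    return None


def embedding_matches(tensors, expected="Q4_0"):
    """True if the embedding tensor exists and has the expected type.

    The default of Q4_0 is an assumption, not something the post specifies.
    """
    found = embedding_type(tensors)
    return found is not None and found.upper() == expected.upper()


if __name__ == "__main__" and len(sys.argv) == 2:
    pairs = read_tensor_types(sys.argv[1])
    print("token embedding type:", embedding_type(pairs))
    print("matches expected Q4_0:", embedding_matches(pairs))
```

Run it with the path to the model file as the only argument. A reported type of Q6_K for the embedding tensor would be consistent with the behavior the post describes.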

Originally reported by reddit.com

