reddit.com via Reddit

r/LocalLLaMA: Google's Gemma 4 QAT Quantization Contains Token Embedding Bug — '--pure' Flag Omission Reverses Previous Community Assessment, Unsloth UD Q4_K_XL Now Recommended

Summary

An r/LocalLLaMA developer post identifies a reproducible bug in Google's official Gemma 4 QAT quantizations: because Google omitted the '--pure' flag (which makes llama-quantize use a single type for every tensor) during quantization, llama-quantize incorrectly quantized the token embedding layer to Q6_K. The post also identifies other issues. The finding reverses the earlier community assessment that favored Google's official Q4_0 quants over Unsloth's variants; the post now recommends Unsloth's UD Q4_K_XL as the best current option for running Gemma 4 locally. Anyone who downloaded the official Google QAT release should check the quantization type of the file's token embedding tensor, since the bug may explain the performance inconsistencies reported since release.
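
One way to check a downloaded file is to read the tensor table in its GGUF header and see which type the token embedding tensor was stored with. The sketch below is illustrative and does not come from the Reddit post. It assumes a little-endian GGUF version 2 or 3 file and assumes the embedding tensor uses llama.cpp's conventional name `token_embd.weight`. The function names (`tensor_types`, `embedding_type`) and the file path in the usage note are hypothetical.

```python
import struct

# Subset of GGML tensor type ids, as listed in the GGUF specification.
GGML_TYPES = {0: "F32", 1: "F16", 2: "Q4_0", 3: "Q4_1", 6: "Q5_0", 7: "Q5_1",
              8: "Q8_0", 10: "Q2_K", 11: "Q3_K", 12: "Q4_K", 13: "Q5_K", 14: "Q6_K"}

# Byte widths of fixed-size metadata value types.
# Type 8 (string) and type 9 (array) are variable-length and handled separately.
SCALAR_SIZES = {0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 1, 10: 8, 11: 8, 12: 8}


def _read(f, fmt):
    """Read and unpack a single little-endian value."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("unexpected end of file")
    return struct.unpack(fmt, data)[0]


def _read_string(f):
    """GGUF strings are a uint64 length followed by UTF-8 bytes."""
    length = _read(f, "<Q")
    return f.read(length).decode("utf-8", errors="replace")


def _skip_value(f, vtype):
    """Skip one metadata value without decoding it."""
    if vtype == 8:                      # string
        f.seek(_read(f, "<Q"), 1)
    elif vtype == 9:                    # array: element type, count, elements
        elem_type = _read(f, "<I")
        count = _read(f, "<Q")
        if elem_type in SCALAR_SIZES:
            f.seek(SCALAR_SIZES[elem_type] * count, 1)
        else:
            for _ in range(count):
                _skip_value(f, elem_type)
    elif vtype in SCALAR_SIZES:
        f.seek(SCALAR_SIZES[vtype], 1)
    else:
        raise ValueError(f"unknown metadata value type {vtype}")


def tensor_types(f):
    """Return {tensor name: quantization type name} from an open GGUF file."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    version = _read(f, "<I")
    if version < 2:
        raise ValueError("this sketch only handles GGUF v2 and later")
    tensor_count = _read(f, "<Q")
    kv_count = _read(f, "<Q")
    for _ in range(kv_count):           # skip all metadata key/value pairs
        _read_string(f)
        _skip_value(f, _read(f, "<I"))
    result = {}
    for _ in range(tensor_count):       # tensor info entries follow the metadata
        name = _read_string(f)
        n_dims = _read(f, "<I")
        f.seek(8 * n_dims, 1)           # skip the uint64 dimensions
        type_id = _read(f, "<I")
        f.seek(8, 1)                    # skip the uint64 data offset
        result[name] = GGML_TYPES.get(type_id, f"type_{type_id}")
    return result


def embedding_type(path, name="token_embd.weight"):
    """Return the stored type of the embedding tensor, or None if it is absent."""
    with open(path, "rb") as f:
        return tensor_types(f).get(name)
```

Calling `embedding_type("gemma-4-qat-q4_0.gguf")` (a placeholder path) returns a type name such as `"Q6_K"` or `"Q4_0"`, which can then be compared against what the post describes. The parser only reads the header, so it does not load the model weights.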