reddit.com via Reddit June 8th 2026

r/LocalLLaMA: Developer Abandons Gemma 4 12B QAT After Days of Testing — Tool Calling Too Unreliable for Production Despite Strong Benchmarks

google open source inference

Summary

A developer on r/LocalLLaMA describes spending several days trying to get consistent tool calling from the Gemma 4 12B QAT (quantization-aware training) model before abandoning it. The developer praises the QAT release's general quality but finds its tool-call reliability too low for their agentic workflows. The post is a useful counterpoint to the community's prevailing enthusiasm for Gemma 4 QAT quantizations, and it prompts discussion of whether QAT introduces instability in structured-output modes even when perplexity and general benchmark scores improve.

Originally reported by reddit.com

