Gemma 4 Uncensored Fine-Tune Posts Just 12/100 Refusals
Key insights
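- A KL divergence (KLD) of 0.0152 from the base model suggests the fine-tune stays very close to Gemma 4's original behavior while removing its safety refusals, although KLD is a proxy for capability rather than a benchmark.
- The Heretic uncensoring methodology, originally applied to Llama, now demonstrably works on Google's Gemma 4 mixture-of-experts (MoE) architecture.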
- Only 12 of 100 red-team probes triggered refusals, versus near-100/100 for the standard instruction-tuned Gemma 4 release.
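KL divergence measures how far one probability distribution drifts from another, and a value of 0 means the two models assign identical next-token probabilities. The sketch below shows how a figure like 0.0152 could be computed. It assumes the metric is the mean KL(base || modified) over next-token distributions on a set of prompts. The post does not document the prompts, token positions, or direction Heretic actually uses, and the function names here are illustrative only.

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) in nats for two distributions over the same vocabulary."""
    # Terms with p_i == 0 contribute nothing; eps guards against division by zero.
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def mean_kl(base_logits, tuned_logits):
    """Average KL(base || tuned) over paired next-token logit vectors, one per prompt."""
    pairs = list(zip(base_logits, tuned_logits))
    return sum(kl_divergence(softmax(b), softmax(t)) for b, t in pairs) / len(pairs)

# Toy 3-token vocabulary and two prompts: the first is unchanged, the second shifts slightly.
base = [[2.0, 1.0, 0.1], [0.5, 1.5, 0.2]]
tuned = [[2.0, 1.0, 0.1], [0.6, 1.4, 0.2]]
print(f"mean KL = {mean_kl(base, tuned):.4f}")
```

A low mean KL only says the modified model's token probabilities stay close on the prompts measured. It does not show that task accuracy is unchanged, which is why benchmark scores would still be needed.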
Why this matters
Safety behavior instilled during post-training can be stripped by third parties with minimal measurable capability loss, which undermines the core assumption behind post-training alignment for open-weight models. The Heretic methodology has now crossed vendor lines from Meta's Llama to Google's Gemma 4, suggesting that no instruction-tuned open-weight release should be assumed immune to this class of intervention. For practitioners and founders building on open-weight models, any safety guarantees derived from instruction tuning should be treated as advisory rather than enforceable once weights are publicly distributed.
Summary
Community developer llmfan46 has published G4-MeroMero-26B-A4B on Hugging Face, an uncensored GGUF fine-tune of Google's Gemma 4 26B built with the Heretic methodology, a workflow that recently drew a legal notice from Meta over its Llama applications.
The model reports a KLD of 0.0152 relative to the base model, and only 12 of 100 red-team probes triggered a refusal. It is packaged in GGUF format for immediate use on consumer hardware via llama.cpp.
Essentially: llmfan46, using the Heretic workflow, has extended an uncensoring pipeline that Meta tried to suppress legally to Google's Gemma 4 MoE architecture.
- KLD 0.0152 signals minimal capability drift with safety refusals stripped.
- 12/100 refusal rate versus near-100/100 for the unmodified instruction-tuned release.
- Heretic now generalizes beyond Llama to Gemma 4's distinct MoE design.
Google's safety tuning faces the same third-party bypass dynamic Meta tried and failed to stop.
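A headline figure like 12/100 is typically produced by running a fixed probe set through the model and counting the responses classified as refusals. The sketch below assumes a simple substring-marker classifier. The marker list and function names are hypothetical, and the post does not say how its refusals were detected.

```python
# Hypothetical marker list for illustration only; real harnesses may use
# different phrases or an LLM judge instead of substring matching.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm sorry", "as an ai")

def is_refusal(response: str) -> bool:
    """Flag a response as a refusal if it contains any listed refusal phrase."""
    # Normalize case and curly apostrophes so "I\u2019m sorry" matches "i'm sorry".
    text = response.lower().replace("\u2019", "'")
    return any(marker in text for marker in REFUSAL_MARKERS)

def refusal_tally(responses: list[str]) -> str:
    """Return a 'refusals/total' string in the same shape as the 12/100 headline."""
    refusals = sum(is_refusal(r) for r in responses)
    return f"{refusals}/{len(responses)}"

sample = [
    "Sure, here is an overview of the topic.",
    "I'm sorry, but I can't help with that.",
    "I cannot assist with this request.",
]
print(refusal_tally(sample))  # prints "2/3"
```

Substring matching is cheap but brittle. A compliant answer that happens to contain "I'm sorry" counts as a refusal, and a soft refusal phrased differently slips through. The choice of classifier can therefore move a 12/100 figure noticeably.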
Potential risks and opportunities
Risks
- Google's enterprise API trust agreements rest partly on safety tuning assurances; public availability of a near-fully uncensored Gemma 4 variant could trigger customer audits or contract renegotiations within 60 days
- Hugging Face could face renewed regulatory pressure in the EU and UK to implement upload-time screening for models with documented safety removal, potentially affecting all open GGUF hosting
- The Heretic methodology's authors could face escalating legal exposure now that the method demonstrably applies to a second major model family, widening the potential plaintiff pool beyond Meta
Opportunities
- Red-teaming and model evaluation firms (Haize Labs, Scale AI's safety team) gain a documented, publicly reproducible attack with published KLD metrics to benchmark their refusal-testing pipelines against
- On-device inference vendors and privacy-focused AI startups can use the GGUF availability as a concrete argument for local deployment over API dependency for sensitive enterprise use cases
- Safety researchers developing alignment techniques that cannot be removed without a large KLD (capability) cost now have a real-world, low-cost attack with measurable parameters to test countermeasures against before the next major open-weight release cycle
What we don't know yet
- Whether Google has a technical response to the Heretic methodology beyond the legal approach Meta already attempted and could not enforce
- Benchmark scores (MMLU, HumanEval) for the fine-tune versus base Gemma 4 are unreported, so the capability-preservation claim rests solely on KLD
- Composition of the 100-probe red-team set is undisclosed, making the 12/100 refusal figure difficult to compare against standardized safety evaluations like HarmBench
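With only 100 probes, the 12/100 figure also carries sampling uncertainty that the post does not report. The sketch below uses the Wilson score interval, a standard binomial confidence interval chosen here for illustration. It assumes the probes are independent and equally weighted, which an undisclosed probe set may not satisfy.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95% coverage)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - margin, center + margin

low, high = wilson_interval(12, 100)
print(f"12/100 refusals: 95% CI = {low:.3f} to {high:.3f}")
```

For 12 refusals out of 100 this prints an interval of roughly 0.070 to 0.198. In other words, a second run of a similar-sized probe set could plausibly land anywhere from about 7 to 20 refusals, before accounting for differences in probe composition.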
Originally reported by huggingface.co
Original headline: r/LocalLLaMA: G4-MeroMero-26B-A4B Uncensored Gemma 4 Fine-Tune Ships With Heretic Methodology — KLD 0.0152, 12/100 Refusals