huggingface.co via Reddit May 23rd 2026

Gemma 4 Uncensored Fine-Tune Posts Just 12/100 Refusals


Key insights

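A KL divergence (KLD) of 0.0152 from the base model indicates that the fine-tune's output distribution stays close to the original, which suggests most base capability is preserved while Gemma 4's safety refusals are removed.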
The Heretic uncensoring methodology, originally targeting Llama, now demonstrably applies to Google's Gemma 4 MoE architecture.
Only 12 of 100 red-team probes triggered refusals, versus near-100/100 for the standard instruction-tuned Gemma 4 release.
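
As a rough illustration of what a figure like KLD 0.0152 measures, the sketch below averages per-token KL divergence between a base model's and a fine-tune's next-token distributions. The logits are made-up toy values, the function names are hypothetical, and the direction KL(base || tuned) is an assumption; the source does not say how the reported number was computed.

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) in nats for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def mean_token_kld(base_logits, tuned_logits):
    """Average per-position KL(base || tuned) over aligned sequences of logit vectors."""
    per_position = [
        kl_divergence(softmax(b), softmax(t))
        for b, t in zip(base_logits, tuned_logits)
    ]
    return sum(per_position) / len(per_position)

# Toy example: two token positions over a three-token vocabulary (illustrative values only).
base = [[2.0, 1.0, 0.1], [0.5, 0.5, 3.0]]
tuned = [[2.1, 0.9, 0.1], [0.4, 0.6, 3.0]]
print(f"mean KLD: {mean_token_kld(base, tuned):.4f}")
```

A value near zero means the two models assign almost the same probabilities to the next token on the prompts measured. It says nothing directly about benchmark accuracy, so it is a proxy for capability preservation rather than a measurement of it.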

Why this matters

Safety tuning applied at fine-tune time can be stripped by third parties with minimal measurable capability loss, which undermines the core assumption behind post-training alignment for open-weight models. The Heretic methodology has now crossed vendor lines from Meta's Llama to Google's Gemma 4, suggesting that instruction-tuned open-weight releases in general are exposed to this class of intervention. For practitioners and founders building on open-weight models, any safety guarantees derived from instruction tuning should be treated as advisory rather than enforceable once weights are publicly distributed.

Summary

Community developer llmfan46 has shipped G4-MeroMero-26B-A4B on HuggingFace, an uncensored GGUF fine-tune of Google's Gemma 4 26B built with the Heretic methodology, a workflow that recently drew a legal notice from Meta over its Llama applications. The release reports a KLD of 0.0152 relative to the base model and refusals on only 12 of 100 red-team probes, and it is packaged for immediate use on consumer hardware via llama.cpp.

In short, llmfan46 has extended the Heretic workflow, an uncensoring pipeline that Meta tried to suppress legally, to Google's Gemma 4 MoE architecture:

- KLD 0.0152 signals minimal capability drift with safety refusals stripped.
- 12/100 refusal rate versus near-100/100 for the unmodified instruction-tuned release.
- Heretic now generalizes beyond Llama to Gemma 4's distinct MoE design.

Google's safety tuning faces the same third-party bypass dynamic Meta tried and failed to stop.

Potential risks and opportunities

Risks

Google's enterprise API trust agreements rest partly on safety tuning assurances; public availability of a near-fully uncensored Gemma 4 variant could trigger customer audits or contract renegotiations within 60 days
HuggingFace faces renewed regulatory pressure in the EU and UK to implement upload-time screening for models with documented safety removal, potentially affecting all open GGUF hosting
Heretic methodology authors face escalating legal exposure now that it demonstrably applies to a second major model family, expanding the potential plaintiff pool beyond Meta

Opportunities

Red-teaming and model evaluation firms (Haize Labs, Scale AI's safety team) gain a documented, publicly reproducible attack with published KLD metrics to benchmark their refusal-testing pipelines against
On-device inference vendors and privacy-focused AI startups can use the GGUF availability as a concrete argument for local deployment over API dependency for sensitive enterprise use cases
Safety researchers developing KLD-resistant alignment techniques now have a real-world, low-cost attack with measurable parameters to test countermeasures against before the next major open-weight release cycle

What we don't know yet

Whether Google has a technical response to the Heretic methodology beyond the legal approach Meta already attempted and could not enforce
Benchmark scores (MMLU, HumanEval) for the fine-tune versus base Gemma 4 are unreported, leaving the capability preservation claim based solely on KLD
Composition of the 100-probe red-team set is undisclosed, making the 12/100 refusal figure difficult to compare against standardized safety evaluations like HarmBench
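
One reason the 12/100 figure is hard to compare is its sample size. The sketch below computes a 95% Wilson score interval for a refusal rate of 12 out of 100. It assumes the probes are independent and equally likely to be refused, which the source does not establish, and the function name is hypothetical.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half_width = z * math.sqrt(
        p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)
    ) / denom
    return center - half_width, center + half_width

# Reported figure from the release: 12 refusals out of 100 probes.
low, high = wilson_interval(12, 100)
print(f"refusal rate 12/100, 95% interval: {low:.3f} to {high:.3f}")
```

Under those assumptions the interval spans roughly 7% to 20%, so a different 100-probe set could plausibly yield a noticeably different headline number even for the same model.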

Originally reported by huggingface.co


Original headline: r/LocalLLaMA: G4-MeroMero-26B-A4B Uncensored Gemma 4 Fine-Tune Ships With Heretic Methodology — KLD 0.0152, 12/100 Refusals