huggingface.co via Reddit

Gemma 4 Heretic merge cuts safety refusals to 12/100

Key insights

  • The Heretic deep neural consolidation merge reports a KL divergence (KLD) of 0.0152, indicating minimal distributional drift from the base Gemma-4-31B model.
  • The refusal count of 12 out of 100 matches the figure reported for earlier releases in the Heretic series, which suggests the results are reproducible.
  • This release continues a community pattern of systematically applying Heretic merging to strip safety filters from Google's Gemma 4 family.

Why this matters

The Heretic methodology has now produced consistent, benchmark-verified results across multiple Gemma 4 merges, meaning uncensoring is becoming a repeatable pipeline rather than a one-off effort. For AI practitioners and founders evaluating open-weight models, this signals that any organization releasing model weights should treat safety filtering as permanently porous once weights are public. Technical leaders watching AI governance need to note that a quantitative KLD metric gives the community a measurable handle on merge quality, which accelerates iteration and lowers the barrier to future uncensored releases across other model families.
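To make the KLD figure concrete, here is a minimal sketch of how a KL divergence between a base model and a merged model could be computed from next-token logits. It is not the Heretic pipeline: the direction KL(base || merged), the averaging over token positions, and the toy logits are all assumptions for illustration, since the release does not document how its 0.0152 was measured.

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) in nats for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def mean_token_kld(base_logits, merged_logits):
    """Average KL(base || merged) over token positions (assumed convention)."""
    per_position = [
        kl_divergence(softmax(b), softmax(m))
        for b, m in zip(base_logits, merged_logits)
    ]
    return sum(per_position) / len(per_position)

# Hypothetical toy logits: 2 token positions, vocabulary of 4 tokens.
base = [[2.0, 1.0, 0.5, -1.0], [0.1, 0.2, 3.0, 0.0]]
merged = [[1.9, 1.1, 0.5, -1.0], [0.1, 0.3, 2.9, 0.0]]

print(f"mean KLD: {mean_token_kld(base, merged):.6f}")
```

A value of zero means the two models assign identical next-token probabilities on the evaluated text; small positive values mean the merged model's predictions differ only slightly from the base model's.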

Summary

The Gemma-4-Harmonia-31B-Uncensored-Heretic model landed on Hugging Face this week. Community developer llmfan46 built it with the Heretic deep neural consolidation methodology, merging multiple Gemma-4-31B instruction-tuned fine-tunes into a single uncensored release. The Heretic merge process is designed to minimize distributional drift while stripping safety filters. This release posts a KLD of 0.0152 and a refusal count of 12 out of 100, consistent with earlier Heretic series benchmarks, which suggests the methodology produces repeatable results across different source fine-tunes. In short, llmfan46 and the wider Heretic community are systematically applying a standardized merge recipe to Google's Gemma 4 family.

  • A KLD of 0.0152 signals that the merged model stays very close to the base Gemma-4-31B distribution despite safety filter removal.
  • The 12/100 refusal figure now functions as a community reference point rather than a one-off outcome.

Open-weight models with documented, reproducible uncensoring pipelines represent a structural challenge for any company trying to enforce safety properties on released weights.

Potential risks and opportunities

Risks

  • Google's enterprise Gemma 4 sales face reputational drag as uncensored Heretic derivatives undermine its safety posture with compliance-sensitive customers evaluating open-weight deployments
  • Hugging Face could face regulatory pressure under the EU AI Act if hosting a documented, growing series of uncensored large-model releases is classified as high-risk distribution
  • Organizations self-hosting Heretic-merged models in production expose themselves to liability if outputs cause downstream harm with no vendor safety layer and no audit trail

Opportunities

  • Post-deployment guardrail vendors (Guardrails AI, Llama Guard-based systems, Rebuff) gain a clear sales motion targeting teams running uncensored open-weight models without built-in safety layers
  • Cloud providers offering managed Gemma 4 deployments can differentiate on audited safety compliance that self-hosted Heretic merges structurally cannot provide to enterprise buyers
  • Researchers studying model merging can use the Heretic KLD benchmark series as a reproducible baseline for evaluating merge quality across other open-weight model families beyond Gemma

What we don't know yet

  • Which specific Gemma-4-31B fine-tunes were merged, and what their individual refusal rates were before consolidation
  • Whether Google has issued any response or updated its Gemma 4 usage policy in light of the growing Heretic merge series
  • How the 12/100 refusal benchmark is operationally defined and what prompt categories account for the remaining 12 refusals
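Because the release does not say how a refusal is scored, the sketch below shows one plausible operationalization, purely as an assumption: flag a response as a refusal when it contains a common refusal phrase, then count flagged responses across the prompt set. The marker list, function names, and sample responses are hypothetical and are not taken from the Heretic benchmark.

```python
# Hypothetical refusal phrases; a real benchmark may use a different list or a classifier.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm sorry", "as an ai")

def is_refusal(response):
    """Return True if the response contains any assumed refusal phrase."""
    text = response.lower().replace("\u2019", "'")  # normalize curly apostrophes
    return any(marker in text for marker in REFUSAL_MARKERS)

def count_refusals(responses):
    """Count how many responses are flagged as refusals."""
    return sum(is_refusal(r) for r in responses)

# Illustrative responses only, not real model output.
sample = [
    "I'm sorry, but I can't help with that.",
    "Sure, here is an overview.",
    "I cannot assist with this request.",
]

print(f"{count_refusals(sample)}/{len(sample)} refusals")
```

Under a scheme like this, a reported 12/100 would mean 12 of 100 test prompts produced a flagged response. Keyword matching can miss soft refusals and can misflag compliant answers that happen to include an apology, which is one reason the exact definition matters when comparing releases.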