Ideogram 4: 9.3B Open-Weight DiT Tops Design Arena
Key insights
- Ideogram 4 is a 9.3B-parameter single-stream DiT trained from scratch using Qwen3-VL-8B-Instruct as its text encoder, not CLIP or T5.
- The model tops open-weight rankings on Design Arena and LMArena, trailing only proprietary GPT and Gemini models globally.
- Weights ship in nf4 and fp8 quantized variants on Hugging Face under a non-commercial license, supporting resolutions up to 2048px.
Why this matters
Ideogram 4 beating FLUX.2 [dev] at 32B and HunyuanImage 3.0 at 80B MoE on text rendering resets expectations for what a sub-10B image model can deliver to the open-source community. The architectural choice to use Qwen3-VL-8B-Instruct as the text encoder, extracting hidden states from 13 intermediate layers instead of relying on CLIP or T5, gives competing labs a concrete alternative design pattern to evaluate for their own releases. The non-commercial license means Ideogram AI retains leverage over any production use of these weights, making the release a research-community land-grab that preserves the company's commercial position.
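The multi-layer extraction described above can be illustrated with a toy sketch. It is not Ideogram's implementation: the source does not say which 13 layers are tapped or how their hidden states are combined. The 36-layer depth, the evenly spaced selection rule, the feature-wise concatenation, and every function name here are assumptions made for illustration, and `toy_layer` stands in for a real transformer block.

```python
from typing import Callable, List

Tokens = List[List[float]]  # indexed as [token][feature]


def evenly_spaced_layers(num_layers: int, k: int) -> List[int]:
    """Pick k 1-based layer indices spread evenly from 1 to num_layers.

    Hypothetical selection rule; the release does not state which layers are used.
    """
    if not 2 <= k <= num_layers:
        raise ValueError("need 2 <= k <= num_layers")
    return [1 + (i * (num_layers - 1)) // (k - 1) for i in range(k)]


def collect_hidden_states(
    x: Tokens, layers: List[Callable[[Tokens], Tokens]], taps: List[int]
) -> List[Tokens]:
    """Run the layer stack and keep the output of each tapped layer."""
    wanted = set(taps)
    collected = []
    h = x
    for idx, layer in enumerate(layers, start=1):
        h = layer(h)
        if idx in wanted:
            collected.append(h)
    return collected


def concat_features(states: List[Tokens]) -> Tokens:
    """Concatenate tapped states along the feature axis, token by token.

    Concatenation is an assumed fusion rule, chosen only to keep the sketch simple.
    """
    num_tokens = len(states[0])
    return [[v for s in states for v in s[t]] for t in range(num_tokens)]


def toy_layer(h: Tokens) -> Tokens:
    """Stand-in for a transformer block: adds 1.0 to every feature."""
    return [[v + 1.0 for v in tok] for tok in h]


NUM_LAYERS = 36  # assumed encoder depth, for illustration only
taps = evenly_spaced_layers(NUM_LAYERS, 13)
prompt_tokens = [[0.0] * 4, [1.0] * 4]  # 2 tokens, 4 features each
states = collect_hidden_states(prompt_tokens, [toy_layer] * NUM_LAYERS, taps)
conditioning = concat_features(states)
print(len(taps), len(conditioning), len(conditioning[0]))
```

With a real encoder, the per-layer outputs would come from the model's own hidden-state outputs rather than a hand-written loop. The point of the sketch is only that the conditioning signal draws on several depths of the encoder instead of its final layer alone.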
Summary
Ideogram AI released Ideogram 4 on June 3 as an open-weight text-to-image model: a 9.3B-parameter single-stream DiT trained from scratch, with Qwen3-VL-8B-Instruct as its text encoder.
It ranks first among open-weight models on Design Arena and LMArena, behind only the proprietary GPT and Gemini models. In a ContraLabs review by 10 professional designers, it took 47.9% of first-place wins, versus 30.0% for Gemini 3.1 Flash and 15.5% for FLUX.2 [max].
In short, a 9.3B model outperforms larger open-weight rivals, including FLUX.2 [dev] (32B) and HunyuanImage 3.0 (80B MoE), on text rendering. The release also includes:
- JSON prompting, bounding-box layout, and hex color conditioning
- Resolutions from 256px to 2048px, aspect ratios up to 6:1
- nf4 and fp8 quantized weights on Hugging Face under a non-commercial license
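To make the structured-prompt features concrete, here is a minimal sketch of a request builder. The JSON field names (`description`, `elements`, `bbox`, `palette`) and the normalized 0-1 box coordinates are a hypothetical schema, not the model's documented format. The size check assumes the 256-2048px range applies to each side and that 6:1 bounds the long-to-short side ratio.

```python
import json
import re

MIN_SIDE, MAX_SIDE, MAX_ASPECT = 256, 2048, 6.0  # limits as read from the release summary
HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")


def is_supported_size(width: int, height: int) -> bool:
    """Check both sides against the pixel range, then the aspect-ratio cap."""
    if not all(MIN_SIDE <= side <= MAX_SIDE for side in (width, height)):
        return False
    return max(width, height) / min(width, height) <= MAX_ASPECT


def build_prompt(description, width, height, elements, palette):
    """Validate inputs and serialize them into a hypothetical JSON prompt."""
    if not is_supported_size(width, height):
        raise ValueError(f"unsupported size: {width}x{height}")
    for color in palette:
        if not HEX_COLOR.fullmatch(color):
            raise ValueError(f"not a 6-digit hex color: {color!r}")
    for element in elements:
        x0, y0, x1, y1 = element["bbox"]  # assumed normalized corner coordinates
        if not (0 <= x0 < x1 <= 1 and 0 <= y0 < y1 <= 1):
            raise ValueError(f"bad bounding box: {element['bbox']}")
    return json.dumps(
        {
            "description": description,
            "size": {"width": width, "height": height},
            "elements": elements,
            "palette": palette,
        },
        indent=2,
    )


poster = build_prompt(
    description="Minimal concert poster",
    width=1024,
    height=1536,
    elements=[{"text": "LIVE TONIGHT", "bbox": [0.1, 0.05, 0.9, 0.25]}],
    palette=["#1A1A2E", "#E94560"],
)
print(poster)
```

Under these assumptions, 1536x256 sits exactly on the 6:1 limit and passes, while 2048x256 (8:1) is rejected even though both sides are within range.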
The swap from CLIP or T5 to a VLM encoder is the architectural bet practitioners will probe first.
Potential risks and opportunities
Risks
- Developers and startups building pipelines on Ideogram 4 weights face legal and business risk if Ideogram AI later tightens or adds commercial licensing terms
- State-of-the-art in-image text rendering at open-weight accessibility lowers the barrier for text-embedded disinformation, creating regulatory and reputational exposure for Ideogram AI and Hugging Face
- Competing labs releasing open-weight models under permissive commercial licenses in the near term would erode Ideogram 4 adoption among production teams blocked by the non-commercial restriction
Opportunities
- Diffusers-compatible toolchain developers can integrate Ideogram 4 immediately via the nf4 variant, which carries full Diffusers support, expanding the open-source creative image generation ecosystem
- The fp8 variant's broad hardware support beyond CUDA opens deployment paths for AMD and other non-NVIDIA accelerators where the larger FLUX.2 [dev] and HunyuanImage 3.0 models have less traction
- Ideogram AI can convert open-weight research adoption into revenue by offering a paid commercial tier to the organizations currently prototyping on the non-commercial weights
What we don't know yet
- Commercial licensing terms or pricing for enterprise or production use of Ideogram 4 weights are not disclosed in the model card
- Whether the non-commercial license explicitly permits academic fine-tuning for published research is not addressed
- Minimum VRAM requirements for running the fp8 variant at 2048px are not specified in the model card
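Although the model card gives no VRAM figure, a back-of-envelope lower bound on weight memory follows from the parameter count alone. The sketch below is arithmetic, not a measurement. It covers only the 9.3B DiT weights and ignores the text encoder, activations at 2048px, and the small overhead of quantization constants. The 16-bit row is a reference point for comparison, not a variant the release is stated to ship.

```python
DIT_PARAMS = 9.3e9  # parameter count reported for the DiT

# Nominal bits stored per weight; real formats add some metadata overhead.
BITS_PER_PARAM = {
    "16-bit (reference only)": 16,
    "fp8": 8,
    "nf4": 4,
}


def weight_gigabytes(params: float, bits: int) -> float:
    """Weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits / 8 / 1e9


estimates = {name: weight_gigabytes(DIT_PARAMS, bits) for name, bits in BITS_PER_PARAM.items()}
for name, gigabytes in estimates.items():
    print(f"{name}: about {gigabytes:.2f} GB of weights")
```

Actual requirements will be higher once the encoder and activations are loaded, which is why the missing 2048px figure matters for anyone sizing hardware.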
Originally reported by huggingface.co
Original headline: Ideogram 4 Releases as Open-Weights Text-to-Image Model - 9.3B-Parameter DiT, #1 on Design Arena, Non-Commercial License