github.com via Reddit

LG EXAONE 4.5 33B VLM Added to llama.cpp Mainline

By Alexis Dufresne Published June 1, 2026 at 11:06 UTC Updated June 1, 2026 at 13:55 UTC

open source edge ai local-llm open-source vision-language llama-cpp

Key insights

PR #21733, authored by nuxlear from LGAI-EXAONE and merged by ngxson, added EXAONE 4.5 to llama.cpp mainline on June 1, 2026.
EXAONE 4.5's vision encoder adapts Qwen2-VL's architecture, replacing M-RoPE with GQA and 2D RoPE and using different image boundary tokens.
GGUF quantizations for EXAONE 4.5 33B are already published on Hugging Face under LGAI-EXAONE, enabling immediate local inference.

Why this matters

Mainline llama.cpp integration means EXAONE 4.5 33B is now accessible through the standard toolchain, removing the friction of maintaining or building from a separate repository for local vision-language inference. The vision encoder design, adapted from Qwen2-VL with targeted GQA and 2D RoPE substitutions, provides AI practitioners a direct architectural reference point between two prominent open multimodal implementations. The PR's explicit disclosure that modeling code was written with AI assistance marks a notable data point in how open-source model integrations are now being authored and reviewed by the llama.cpp maintainer community.

Summary

LG AI Research's EXAONE 4.5, a 33B open-weight vision-language model, landed in llama.cpp mainline on June 1, 2026, via PR #21733. Contributor nuxlear from LGAI-EXAONE authored the implementation, reviewed by CISC and merged by ngxson. EXAONE 4.5 builds on the same LLM architecture as EXAONE 4, extending it with multimodal vision support through an encoder adapted from Qwen2-VL that substitutes GQA and 2D RoPE for M-RoPE, and uses distinct image boundary tokens rather than Qwen's format. Essentially: (LG AI Research, llama.cpp community) a 33B multimodal model now runs natively in mainline with no separate build stack required. - n_kv_heads was added to the CLIP model to make the ViT compatible with the GQA structure - GGUF quantizations are already available at LGAI-EXAONE/EXAONE-4.5-33B-GGUF on Hugging Face - Authors disclosed the modeling code was implemented with help from an AI assistant With mainline support, EXAONE 4.5 joins the growing set of locally runnable vision-language models available directly through the standard llama.cpp toolchain.

Potential risks and opportunities

Risks

AI-assisted modeling code reviewed by CISC and ngxson may contain subtle bugs in the GQA or 2D RoPE vision encoder path, potentially surfacing only with specific image inputs or quantization levels.
EXAONE 4.5 shares architectural dependencies with Qwen2-VL's vision encoder in llama.cpp, meaning any future fixes required in that path could require simultaneous patches to EXAONE 4.5 as well.
Without benchmark figures published in the PR, users deploying EXAONE 4.5 via llama.cpp have no grounded performance baseline from this integration, risking adoption built on unverified expectations.

Opportunities

Inference tooling projects like Ollama and LM Studio can now add EXAONE 4.5 to their supported model libraries directly, with no custom build steps required following the llama.cpp mainline merge.
Multimodal application developers gain a 33B VLM option from LG AI Research with GGUF quantizations already live at LGAI-EXAONE/EXAONE-4.5-33B-GGUF, lowering integration overhead significantly.
LG AI Research gains community-driven testing and optimization across the broad llama.cpp user base, potentially surfacing inference quality issues faster than internal benchmarking alone could.

What we don't know yet

No benchmark comparisons appear in the PR, so how EXAONE 4.5 performs against other 33B vision-language models in llama.cpp inference remains unestablished from this integration.
Whether the GQA and 2D RoPE deviations from Qwen2-VL's vision encoder affect output quality at lower quantization levels is not addressed in the PR.
The PR discloses AI assistance was used for the modeling code but does not document what additional verification steps reviewers CISC and ngxson applied beyond standard PR review.

Originally reported by github.com

Read the original article →

Original headline: r/LocalLLaMA: EXAONE 4.5 — LG's 33B Open-Weight Vision-Language Model — Added to llama.cpp Mainline via PR #21733