Cohere Command A+ 218B runs on Apple Silicon via MLX
Key insights
- Cohere Command A+ activates only 25B of 218B parameters per token, making local inference feasible on high-memory M-series Macs.
- The Apache 2.0 license permits commercial deployment without a separate licensing agreement with Cohere, unlike many models of comparable scale.
- A community-driven MLX pull request, not an official Cohere release, enabled Apple Silicon support for this 218B MoE model.
Why this matters
Running a 218B-parameter enterprise model locally on Apple Silicon removes the cloud API dependency for organizations with data-residency or latency constraints, widening the range of environments where Cohere's open-weight stack can be deployed. That the MLX community ported a model of this scale before Cohere did suggests that permissive open-weight releases increasingly shift the integration roadmap from the model provider to the ecosystem. For founders and infrastructure teams, it confirms that MoE models with roughly 25B active parameters are now within reach of prosumer hardware, which resets the cost baseline for on-device enterprise inference in 2026.
Summary
Cohere's Command A+, a 218-billion-parameter mixture-of-experts model released under Apache 2.0, can now run natively on Apple Silicon after a community developer built a cohere2_moe implementation for the MLX framework and opened a pull request in the mlx-lm repository.
The model's architecture activates only 25 billion of its 218 billion parameters per forward pass, routing each token to 8 of 128 experts plus a single shared expert. That sparsity is what makes local inference on M-series hardware plausible at all: per-token compute and weight traffic scale with the 25B active parameters rather than the nominal 218B. Memory capacity does not shrink, though, because all 218B weights must stay resident in unified memory for the router to choose among them, and per-token bandwidth demands remain steep.
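The routing pattern described above (top-8 of 128 routed experts, plus one shared expert that every token passes through) can be sketched in a few lines. This is a toy illustration under stated assumptions, not the cohere2_moe code from the PR: experts are scalar functions, and the softmax renormalized over only the selected experts is a common MoE gating choice assumed here, since the post does not describe Cohere's exact gating formula. The names `route` and `moe_layer` are hypothetical.

```python
import math
import random

NUM_EXPERTS = 128  # routed experts per MoE layer (from the article)
TOP_K = 8          # routed experts selected per token (from the article)


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(router_logits, k=TOP_K):
    """Pick the top-k experts for one token; ASSUMED gating renormalizes over those k only."""
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    gates = softmax([router_logits[i] for i in top])
    return list(zip(top, gates))


def moe_layer(x, router_logits, experts, shared_expert):
    """Toy MoE layer: shared expert output plus a gate-weighted sum of the k routed experts.

    Only k + 1 experts are evaluated per token; the other 120 are never touched,
    which is why active compute is a fraction of the total parameter count.
    """
    out = shared_expert(x)
    for idx, gate in route(router_logits):
        out += gate * experts[idx](x)
    return out


if __name__ == "__main__":
    random.seed(0)
    logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    chosen = route(logits)
    print(len(chosen), "of", NUM_EXPERTS, "routed experts run, plus 1 shared expert")
```

Note that every one of the 128 experts still has to be loaded, because a different token may pick a different subset; the sketch saves compute, not storage.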
In short: a large open-weight enterprise model from Cohere now runs on Apple's consumer-grade silicon without a GPU cluster in between.
- 218B total parameters, 25B active per token, 128 experts with top-8 routing and one shared expert
- Pull request now open in mlx-lm; commenters independently confirmed successful loading on M-series hardware
- Apache 2.0 license means downstream commercial use requires no royalty or special agreement with Cohere
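To see why the 25B-active figure matters more than the 218B total for speed, a rough bandwidth-bound ceiling helps: during single-stream decoding, each token must stream its active weights from memory, so tokens per second cannot exceed bandwidth divided by bytes read per token. The sketch below is an idealized estimate, not a benchmark (the post reports none). The 800 GB/s bandwidth is an ASSUMPTION, roughly the figure Apple quotes for Ultra-class chips, and the estimate ignores KV-cache reads, activations, and compute, so real throughput will be lower.

```python
ACTIVE_PARAMS = 25e9    # active parameters per token (from the article)
TOTAL_PARAMS = 218e9    # total parameters (from the article)
BANDWIDTH_GB_S = 800.0  # ASSUMPTION: roughly Ultra-class unified-memory bandwidth


def decode_ceiling_tokens_per_s(params_read, bits_per_weight, bandwidth_gb_s):
    """Upper bound on decode speed if each token must stream params_read weights.

    Ignores KV-cache traffic, activations, and compute time, so it is optimistic.
    """
    bytes_per_token = params_read * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


if __name__ == "__main__":
    print(f"active fraction: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")
    for bits in (16, 8, 4):
        sparse = decode_ceiling_tokens_per_s(ACTIVE_PARAMS, bits, BANDWIDTH_GB_S)
        dense = decode_ceiling_tokens_per_s(TOTAL_PARAMS, bits, BANDWIDTH_GB_S)
        print(f"{bits:>2}-bit: MoE ceiling {sparse:5.1f} tok/s vs dense-218B ceiling {dense:4.1f} tok/s")
```

Under these assumptions, a 4-bit model reading 25B weights per token has a ceiling of about 64 tokens per second, while a dense 218B model at the same precision would top out near 7.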
The successful port extends the frontier of what counts as "locally runnable" for enterprise-grade open-weight models, compressing a capability that previously required multi-GPU infrastructure into hardware that sits on a desk.
Potential risks and opportunities
Risks
- If the mlx-lm PR stalls unmerged, downstream developers building on the cohere2_moe branch will be left on an unmaintained fork while MLX continues to evolve rapidly
- Users loading the full 218B weights without confirming memory headroom risk system instability on 128GB unified-memory Macs, potentially discouraging adoption before official hardware guidance exists
- Cohere's enterprise positioning could be complicated if the Apache 2.0 local path becomes the default for cost-sensitive buyers, reducing its commercial API revenue and its leverage against OpenAI and Anthropic
Opportunities
- Apple has a direct incentive to publicize Command A+ MLX performance on the M3 Ultra and upcoming M4 Ultra as a proof point for its high-end desktop Macs in enterprise AI
- MLX tooling vendors and fine-tuning platforms (Axolotl, Unsloth) could rapidly add Command A+ support to capture developer mindshare while the model is generating community momentum
- Enterprises in regulated industries (legal, healthcare, finance) evaluating on-premises LLM deployments now have a credible 218B-parameter Apache 2.0 option to benchmark against hosted alternatives, strengthening the negotiating position of on-prem AI infrastructure vendors
What we don't know yet
- Peak unified memory required to load the full weights on M-series hardware; whether a 192GB Mac Studio or a 128GB MacBook Pro is enough is not confirmed in the thread
- Whether Cohere plans to officially support or co-maintain the MLX implementation, or whether mlx-lm will carry it as a community contribution only
- Inference throughput benchmarks (tokens per second) on specific M-series chips are absent from the post and comments as of the thread date
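While the thread leaves the memory threshold open, the raw weight footprint can be estimated from the parameter count alone, since all 218B parameters must be resident. The sketch below is an estimate, not a measurement. Two inputs are ASSUMPTIONS: the 4.5 bits per weight approximates 4-bit affine quantization with a 16-bit scale and bias per 64-weight group, and the 16 GB headroom for the OS and KV cache in the hypothetical `fits` helper is an illustrative guess.

```python
TOTAL_PARAMS = 218e9  # all experts must be resident, not just the 25B active ones


def weight_gb(params, bits_per_weight):
    """Raw weight storage in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


def fits(weights_gb, capacity_gb, headroom_gb=16.0):
    """HYPOTHETICAL check: weights plus an assumed headroom must fit in unified memory."""
    return weights_gb + headroom_gb <= capacity_gb


# 4.5 bits approximates 4-bit quantization plus per-group scale/bias overhead (assumed).
SCHEMES = {"bf16": 16, "8-bit": 8, "4-bit (no overhead)": 4, "4-bit + group scales": 4.5}

if __name__ == "__main__":
    for name, bits in SCHEMES.items():
        gb = weight_gb(TOTAL_PARAMS, bits)
        verdicts = ", ".join(f"{cap} GB: {'ok' if fits(gb, cap) else 'no'}" for cap in (128, 192))
        print(f"{name:<22} {gb:7.1f} GB  ({verdicts})")
```

On these assumptions, 4-bit weights with group overhead come to roughly 123 GB, which leaves almost no room on a 128GB machine and is consistent with the instability risk noted above; 8-bit weights alone (218 GB) exceed 192GB.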
Originally reported by Reddit / r/LocalLLaMA
Original headline: r/LocalLLaMA: Community Developer Ports Cohere Command A+ (218B MoE, Apache 2.0) to Apple Silicon via MLX — PR Open, Native M-Series Run Confirmed