
Apple Core AI Backs On-Device LLMs From 3B to 70B Parameters

By Alexis Dufresne | Published June 21, 2026 at 06:42 UTC | Updated June 21, 2026 at 06:45 UTC

Tags: apple, generative ai, edge ai, on-device-ai, developer-tools, apple-silicon

TL;DR

Apple's Core AI, the official successor to Core ML for neural networks, supports transformer models from 3B to 70B parameters across iPhone, iPad, Mac, and Vision Pro.
Ahead-of-time compilation and zero-copy data paths target near-instant model load times, while keeping inference on-device avoids per-token cloud costs and server infrastructure.
Apple now recommends three frameworks: Core ML for classic ML, Core AI for transformers, and MLX for custom model weights.

Apple's announcement of Core AI at WWDC 26 stakes out the on-device AI runtime for the transformer era. InfoQ's Sergio De Simone reports that Apple now recommends a three-tier framework lineup: Core ML for classic non-neural machine learning, Core AI for neural networks and transformers, and MLX for custom model weights. Core AI, positioned as Core ML's official successor for that middle tier, supports models ranging from compact 3B-parameter vision models to 70B-parameter reasoning models across iPhone, iPad, Mac, and Apple Vision Pro through a unified Swift API spanning CPU, GPU, and Neural Engine.

The technical choices track what developers need in order to ship LLM-powered features. Ahead-of-time compilation moves expensive model preparation offline so users see near-instant load times. A memory-safe Swift API with zero-copy data paths and fine-grained control over inference memory is aimed at getting the most out of constrained hardware. The PyTorch conversion path, via `torch.export.ExportedProgram` and a TorchConverter, means existing open-weight models can be converted rather than rebuilt from scratch for Apple's platform.

The business case is straightforward: every inference call that stays on-device is one that doesn't generate per-token cloud costs or require server infrastructure. For independent developers and small teams building generative features into apps, that arithmetic matters considerably. Healthcare, legal, and finance apps handling sensitive user data also get a credible path to powerful LLMs without the compliance exposure of routing data through cloud endpoints.
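
The arithmetic behind that claim can be sketched as a small calculation. The function below is illustrative only: `monthly_cloud_cost` and all of its inputs (request volume, tokens per request, and price per million tokens) are hypothetical placeholders, not figures from the reporting or from any provider's price list.

```python
def monthly_cloud_cost(
    requests_per_day: int,
    tokens_per_request: int,
    usd_per_million_tokens: float,
    days: int = 30,
) -> float:
    """Estimate monthly spend for cloud inference billed per token."""
    total_tokens = requests_per_day * tokens_per_request * days
    return total_tokens / 1_000_000 * usd_per_million_tokens


# Hypothetical workload: 10,000 requests a day, 500 tokens each,
# at a placeholder price of $1.00 per million tokens.
cost = monthly_cloud_cost(
    requests_per_day=10_000,
    tokens_per_request=500,
    usd_per_million_tokens=1.00,
)
print(f"${cost:,.2f} per month")  # prints "$150.00 per month"
```

On-device inference removes this line item, although the cost does not vanish entirely: it moves to the user's hardware, battery, and storage.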

The honest caveat is that 'up to 70B parameters across iPhone and Mac' spans an enormous range of real hardware. What runs comfortably on a high-RAM Mac is a very different deployment from what any iPhone will support, and the reporting doesn't clarify which devices can handle which model sizes in practice. Apple Silicon is required across the board, which excludes older devices and creates fragmentation decisions for developers who need broad install base coverage.
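
A rough weight-size calculation shows why that range is so wide. The sketch below assumes weights dominate memory use and ignores the KV cache, activations, and runtime overhead. The bit widths are assumptions chosen for illustration, since the reporting does not say which precisions or quantization schemes Core AI uses.

```python
def weight_footprint_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate weight storage in decimal GB: parameters x bits / 8."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Both ends of the reported 3B to 70B range, at three assumed precisions.
for params in (3, 70):
    for bits in (16, 8, 4):
        size = weight_footprint_gb(params, bits)
        print(f"{params}B @ {bits}-bit: {size:.1f} GB")
```

Under these assumptions a 3B model needs about 6 GB at 16-bit and 1.5 GB at 4-bit, while a 70B model needs about 140 GB at 16-bit and still about 35 GB at 4-bit, before any working memory is counted. Whether a given device can hold that is the question the reporting leaves open.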

The coverage addresses neither the performance ceiling nor the App Store posture: how on-device inference via Core AI compares to cloud-hosted endpoints at equivalent parameter counts, and whether App Store review will allow apps to ship large open-weight model files. Those answers will determine whether Core AI becomes a primary deployment target or a supplementary option reserved for privacy-sensitive use cases.

Originally reported by infoq.com


Original headline: Apple Launches Core AI at WWDC 26 — On-Device LLM Framework Succeeds Core ML, Supports Up to 70B-Parameter Models Across iPhone, iPad, Mac, and Vision Pro