williamangel.net web signal May 17th 2026

Apple Silicon local inference costs 3x more than OpenRouter

apple inference edge ai local-inference llm-economics edge-ai

Key insights

Local LLM inference on an M5 MacBook Pro costs roughly $1.50 per million tokens when hardware depreciation and electricity are included.
Under baseline assumptions, equivalent models on OpenRouter cost roughly one-third as much and run about twice as fast as local Apple Silicon inference.
Under pessimistic hardware-cost scenarios, on-device Apple Silicon inference can be up to 10x more expensive than cloud API alternatives.

Why this matters

Developers and founders building privacy-first products on local inference have often assumed that local inference is cost-competitive with cloud once the hardware is owned. This analysis challenges that assumption, so product pricing and margin models may need revision. The finding reframes on-device AI from a cost-saving strategy to an explicit privacy tax, which matters for enterprise procurement, where CFOs, not engineers, approve infrastructure spend. As Apple continues to position Apple Silicon as an AI-capable platform and OpenRouter expands its model catalog, the relative economics of local and cloud inference will increasingly drive architecture decisions at scale.

Summary

Running local LLMs on Apple Silicon costs materially more than most privacy-focused developers assume, according to a developer analysis published May 17 that is trending on Hacker News at 253 points. The author benchmarked an M5 MacBook Pro against equivalent models on OpenRouter, amortizing hardware depreciation and electricity into a per-token cost. Under baseline assumptions, local inference lands at roughly $1.50 per million tokens, around three times the cloud price, and delivers about half the throughput. Under pessimistic hardware-cost assumptions, the gap widens to 10x. Essentially, the privacy premium for on-device inference is a real, measurable line item, not a rounding error.

- Local M5 inference: ~$1.50 per million tokens vs. OpenRouter equivalents at roughly $0.50, at approximately half the tokens-per-second.
- The 10x pessimistic estimate factors in faster hardware depreciation cycles and higher electricity costs, both plausible for power users.
- The analysis directly targets a widespread assumption in the developer community that local inference is cost-competitive with cloud once you own the hardware.

For practitioners deciding where to run inference, the choice is no longer just technical but financial, and the privacy premium now has a dollar figure attached to it.
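To make the amortization idea concrete, here is a minimal Python sketch of how hardware depreciation and electricity might be folded into a per-token cost. It is not the author's model: it assumes straight-line depreciation to zero resale value, and the names LocalRig and usd_per_million_tokens, along with every number in the example, are hypothetical placeholders rather than figures from the analysis.

```python
from dataclasses import dataclass


@dataclass
class LocalRig:
    """Hypothetical inputs for a local-inference cost estimate."""
    hardware_usd: float        # purchase price, written off to zero (assumption)
    lifespan_years: float      # depreciation period
    busy_hours_per_day: float  # hours per day spent actually generating tokens
    power_watts: float         # average draw while generating
    usd_per_kwh: float         # electricity price
    tokens_per_second: float   # sustained generation throughput


def usd_per_million_tokens(rig: LocalRig) -> float:
    """Spread hardware and electricity cost over all tokens generated."""
    busy_hours = rig.lifespan_years * 365 * rig.busy_hours_per_day
    if busy_hours <= 0 or rig.tokens_per_second <= 0:
        raise ValueError("busy hours and throughput must be positive")
    total_tokens = busy_hours * 3600 * rig.tokens_per_second
    energy_kwh = busy_hours * rig.power_watts / 1000
    total_usd = rig.hardware_usd + energy_kwh * rig.usd_per_kwh
    return total_usd / total_tokens * 1_000_000


if __name__ == "__main__":
    # Placeholder values for illustration only; not taken from the analysis.
    example = LocalRig(
        hardware_usd=3000.0,
        lifespan_years=3.0,
        busy_hours_per_day=8.0,
        power_watts=60.0,
        usd_per_kwh=0.15,
        tokens_per_second=40.0,
    )
    print(f"${usd_per_million_tokens(example):.2f} per million tokens")
```

In a model of this shape, a shorter lifespan or a higher electricity price raises the total cost, while fewer busy hours or lower throughput shrinks the token count it is spread over. That is one plausible way a pessimistic scenario could diverge sharply from a baseline one.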

Potential risks and opportunities

Risks

Privacy-focused SaaS vendors (Notion, Obsidian, Bear) that have publicly committed to local inference as a selling point face credibility pressure if enterprise customers absorb this cost analysis and push back on pricing.
Apple risks losing developer mindshare for on-device AI workloads to cloud-first inference providers if the cost gap hardens into a community consensus before Apple can respond with updated efficiency benchmarks.
Startups that raised on a 'local-first AI' thesis and projected infrastructure savings from avoiding cloud APIs may face investor scrutiny over unit economics within the next one to two quarters.

Opportunities

OpenRouter and competing inference aggregators (Together AI, Fireworks AI) can directly target privacy-sensitive developer segments with transparent per-token pricing that undercuts local inference TCO, using this analysis as third-party validation.
Hardware leasing and subscription models for Apple Silicon (e.g., MacStadium, Hetzner dedicated Mac) could reframe depreciation as a predictable OPEX, narrowing the cost gap and making local inference more financially legible for teams.
Inference optimization tooling vendors (Ollama, LM Studio, llama.cpp contributors) have a clear opening to publish competing benchmarks showing optimized local inference performance, capturing the developer audience currently being pushed toward cloud by this analysis.

What we don't know yet

Whether the analysis accounts for network egress costs and latency penalties that cloud API calls incur in bandwidth-constrained or air-gapped enterprise environments, which could partially offset the price gap.
How the cost comparison shifts for smaller, quantized models (e.g., 7B or 13B parameter variants) that may run more efficiently on Apple Silicon than the larger models likely benchmarked here.
Whether Apple's forthcoming M5 Pro and M5 Max chips, with higher unified memory bandwidth, materially change the tokens-per-second figure used in this analysis.
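
The throughput question can be bounded with simple arithmetic. The sketch below assumes, purely for illustration, that hardware price and power draw stay fixed, so per-token cost scales inversely with tokens-per-second. A faster chip would likely cost more, so this is a lower bound on the speedup needed. The function name breakeven_tokens_per_second and the current_tps value are hypothetical; the $1.50 and $0.50 figures come from the summary above.

```python
def breakeven_tokens_per_second(
    local_usd_per_mtok: float,
    cloud_usd_per_mtok: float,
    current_tps: float,
) -> float:
    """Throughput at which local cost per token would match the cloud price,
    assuming hourly cost is unchanged (an illustrative simplification)."""
    if min(local_usd_per_mtok, cloud_usd_per_mtok, current_tps) <= 0:
        raise ValueError("all inputs must be positive")
    return current_tps * local_usd_per_mtok / cloud_usd_per_mtok


if __name__ == "__main__":
    # current_tps is a placeholder; the summary does not state the measured rate.
    needed = breakeven_tokens_per_second(1.50, 0.50, current_tps=20.0)
    print(f"break-even throughput: {needed:.0f} tokens/s")
```

Under these assumptions, closing the baseline 3x gap would take roughly a threefold throughput gain at the same hardware price and power draw.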

Originally reported by williamangel.net


Original headline: HN: 'Apple Silicon Costs More Than OpenRouter' — Developer Analysis Finds Local M5 Inference Runs ~3× Pricier and ~2× Slower Than Cloud API; 253 Points