reddit.com via Reddit

r/LocalLLaMA: Qwen3.6-35B on MacBook M4 Pro Scores 37.8% on Terminal-Bench 2.0, Rivalling Claude Code + Sonnet 4.5 on Local Hardware

open source coding tools edge ai local-llm benchmark

Summary

An r/LocalLLaMA developer ran Qwen3.6-35B-A3B (Q6_K_XL) locally via llama.cpp on a 48GB M4 Pro MacBook and evaluated it on Terminal-Bench 2.0, averaging 37.8% over three runs. That is comparable to the published benchmark scores of Claude Code + Sonnet 4.5, a frontier commercial agentic stack. The result is one of the first published data points showing a local open-weight model, running on consumer-grade hardware with no cloud costs, competing with a paid frontier service on a real agentic coding benchmark.