reddit.com via Reddit

r/LocalLLaMA: AMD Strix Halo Crosses 100 t/s on Qwen3 30B-A3B in llama.cpp — First Triple-Digit Result Documented on Consumer APU at 30B Scale

amd inference edge ai local-llm benchmark

Summary

A developer benchmarked AMD Strix Halo (Ryzen AI MAX+ 395) in a ~$4,000 unified-memory laptop and documented 100+ tokens per second on Qwen3 30B-A3B via llama.cpp. The post describes this as the first triple-digit speed confirmed on a consumer APU at 30B-parameter scale. Qwen3 30B-A3B is a mixture-of-experts model with roughly 3B parameters active per token, which is a large part of why this speed is reachable on memory shared between the CPU and the integrated GPU. The post pairs raw llama-bench data with context on what the result means for developers considering high-end x86 laptops as serious local-AI workstations without a discrete GPU. Community discussion is validating the numbers and mapping the remaining gap versus CUDA on discrete NVIDIA GPUs.
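The throughput claim follows from simple arithmetic. Token generation on a unified-memory APU is usually limited by how many weight bytes must be read per generated token, and a mixture-of-experts model reads only its active experts. The Python sketch below estimates a bandwidth-bound ceiling under loudly stated assumptions: about 256 GB/s of theoretical memory bandwidth, about 3.3B active parameters, and about 4.5 bits per weight for a 4-bit-class GGUF quantization. None of these figures come from the post, and the dense 30B comparison model is hypothetical.

```python
"""Back-of-envelope ceiling for bandwidth-bound token generation.

All hardware and model figures below are illustrative assumptions,
not measurements from the benchmark post.
"""

from dataclasses import dataclass


@dataclass(frozen=True)
class ModelShape:
    name: str
    active_params: float    # parameters read per generated token
    bits_per_weight: float  # average bits per weight after quantization


def bytes_per_token(model: ModelShape) -> float:
    """Approximate weight bytes streamed from memory for one decode step."""
    return model.active_params * model.bits_per_weight / 8


def decode_ceiling_tps(bandwidth_gb_s: float, model: ModelShape) -> float:
    """Upper bound on tokens/s if decoding is purely memory-bandwidth bound.

    Ignores KV-cache reads, activations, and compute limits, so real
    throughput lands below this number.
    """
    return bandwidth_gb_s * 1e9 / bytes_per_token(model)


# Assumption: ~256 GB/s theoretical bandwidth (256-bit LPDDR5X-8000).
BANDWIDTH_GB_S = 256.0

# Assumptions: ~3.3B active params for the MoE model and ~4.5 bits/weight
# for a 4-bit-class GGUF quant; the dense model is a hypothetical
# 30B comparison at the same quantization.
moe = ModelShape("Qwen3 30B-A3B (MoE)", active_params=3.3e9, bits_per_weight=4.5)
dense = ModelShape("hypothetical dense 30B", active_params=30e9, bits_per_weight=4.5)

if __name__ == "__main__":
    for m in (moe, dense):
        print(f"{m.name}: ~{decode_ceiling_tps(BANDWIDTH_GB_S, m):.0f} t/s ceiling")
```

With these inputs the sketch prints a ceiling of about 138 t/s for the MoE model versus about 15 t/s for the dense 30B model, which is consistent with a measured 100+ t/s sitting below the theoretical bound. Real results depend on the quantization, context length, and backend, so treat the numbers as an order-of-magnitude check rather than a prediction.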