reddit.com via Reddit May 21st 2026

AMD BC-250 PS5 APU Runs 35B LLM at 37 tok/s

amd open source inference edge ai local-llm cheap-compute hardware

Key insights

AMD BC-250 boards deliver 37.5 tok/s on a 35B MoE model using 16GB GDDR6 unified memory for under $150.
Vulkan is the sole working GPU compute path; ROCm is unsupported and Linux is mandatory for this setup.
A public GitHub guide documents Ollama and stable-diffusion.cpp on this hardware, lowering the barrier to entry.
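A tokens-per-second figure like the one above can be checked on any Ollama install. The following is a minimal sketch, not the method used in the guide: it assumes Ollama is listening on its default local port and reads the `eval_count` and `eval_duration` fields that Ollama returns from a non-streaming `/api/generate` call. The helper names and the model tag passed to `run_prompt` are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(response: dict) -> float:
    """Decode throughput from an Ollama generate response.

    eval_count is the number of generated tokens and
    eval_duration is the generation time in nanoseconds.
    """
    return response["eval_count"] / (response["eval_duration"] / 1e9)


def run_prompt(model: str, prompt: str) -> dict:
    """Send one non-streaming generate request and return the parsed JSON reply."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)
```

With a server running, `tokens_per_second(run_prompt("your-model-tag", "Explain Vulkan in one paragraph."))` returns the decode rate for that single request. Averaging several runs with a fixed prompt gives a steadier number than one call.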

Why this matters

Sub-$150 hardware running a 35B MoE model at usable inference speeds resets the cost floor for local AI deployment, which matters for anyone building privacy-sensitive or air-gapped applications. The Vulkan-only constraint reveals a gap in AMD's software ecosystem that leaves capable silicon stranded outside the mainstream ROCm toolchain, a problem that compounds as salvaged and off-label hardware becomes a real category. Community-driven integration work on orphaned mining hardware is now outpacing vendor support timelines, which signals where future low-cost inference capacity will actually come from.

Summary

Salvaged AMD BC-250 boards (Zen 2 / RDNA2 APUs pulled from decommissioned crypto mining rigs, sharing silicon with the PlayStation 5) are showing up on eBay for $50 to $150 and turning out to be surprisingly capable local inference machines. A developer in the LocalLLaMA community has published benchmark results and a full GitHub guide covering Ollama, Vulkan-based GPU inference, and stable-diffusion.cpp image generation on this hardware. The headline number: 37.5 tokens per second on a 35-billion-parameter MoE model, running entirely on 16GB of unified GDDR6 memory with a 64K context ceiling. Vulkan is the only working GPU compute path; ROCm support is absent and Linux is required. In short, AMD and the PS5 supply chain produced hardware that crypto miners bought and then abandoned, and that hardware is now quietly powering local AI workloads.

- 16GB GDDR6 unified memory handles a 35B MoE model without offloading, which consumer GPU cards at this price point cannot match.
- The Vulkan-only constraint means standard ROCm tooling and most GPU-optimized inference stacks won't run without workarounds.
- A public GitHub guide lowers the setup barrier, making this a documented path rather than a one-off experiment.

The broader pattern is hardware designed for one compute boom getting repurposed for the next one, with the community doing the integration work that AMD has no commercial incentive to fund.
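The claim that a 35B model fits in 16GB without offloading comes down to simple arithmetic on bits per weight. The sketch below is a rough illustration under stated assumptions: the summary does not say which quantization was used, 16GB is treated as 16 GiB, and the `reserved_bytes` allowance for KV cache, OS, and runtime overhead is a hypothetical parameter rather than a measured value.

```python
GIB = 2**30  # bytes per GiB; assumption: "16GB" means 16 GiB


def weight_bytes(params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in bytes."""
    return params * bits_per_weight / 8


def max_bits_per_weight(
    params: float, memory_bytes: float, reserved_bytes: float = 0.0
) -> float:
    """Largest average bits per weight that still fits in the given memory."""
    return (memory_bytes - reserved_bytes) * 8 / params


def fits(
    params: float,
    bits_per_weight: float,
    memory_bytes: float,
    reserved_bytes: float = 0.0,
) -> bool:
    """True if weights plus the reserved allowance fit in memory."""
    return weight_bytes(params, bits_per_weight) + reserved_bytes <= memory_bytes


if __name__ == "__main__":
    params = 35e9  # 35B parameters, from the summary
    memory = 16 * GIB
    # Hypothetical bit widths, chosen only to show where the limit falls.
    for bits in (3.0, 3.5, 4.0, 8.0):
        print(bits, fits(params, bits, memory))
    print(round(max_bits_per_weight(params, memory), 2))
```

Under these assumptions the weights must average below roughly 3.9 bits each before any memory is set aside for context, and the budget tightens further once a KV cache for long contexts is reserved. This is why the 64K context ceiling and the model size trade off against each other on a fixed 16GB pool.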

Potential risks and opportunities

Risks

Developers building production pipelines on BC-250 hardware face no vendor support path if Vulkan inference regressions appear in future llama.cpp or Ollama releases.
AMD could face reputational pressure from the open-source community if ROCm continues to exclude salvaged consumer silicon while Nvidia's CUDA ecosystem covers a broader device range.
eBay supply of BC-250 boards depends entirely on mining-rig liquidations and could dry up or reprice sharply within months, stranding users mid-deployment on hardware they cannot replace at cost.

Opportunities

Ollama and llama.cpp maintainers could capture a growing low-cost hardware segment by formally documenting and testing Vulkan inference paths on AMD APU silicon.
Local AI appliance vendors (Framework, Minisforum, similar) could productize the BC-250 form factor as a purpose-built inference node before eBay supply dries up.
AMD has a narrow window to publish official ROCm support for Oberon-class silicon and convert an active hobbyist community into a documented enterprise-grade low-cost inference tier.

What we don't know yet

Whether AMD has any plans to extend ROCm support to BC-250 / Oberon silicon, or whether Vulkan remains the permanent ceiling for this hardware.
How BC-250 throughput and memory bandwidth compare to similarly priced discrete GPUs (e.g., used RX 6600 XT) under the same Vulkan inference stack.
Current eBay supply volume and price stability for BC-250 boards, given that mining rig liquidation is the sole source and inventory is finite.

Originally reported by reddit.com


Original headline: r/LocalLLaMA: Developer Explores Salvaged AMD BC-250 (PS5 APU) Boards at $50–150 for Local LLM Inference — 35B MoE at 37.5 tok/s on 16GB GDDR6, Vulkan-Only Stack