Reddit, June 10th 2026

r/LocalLLaMA: Bonsai LM 1-Bit and 1.58-Bit Models (1.7B–8B) Benchmarked Across All Power Modes on $250 Jetson Orin Nano Super via llama.cpp CUDA


Summary

A developer benchmarked five Bonsai LM models, quantized to 1-bit and 1.58-bit precision and ranging from 1.7B to roughly 8B parameters, on a $250 Jetson Orin Nano Super 8GB using the llama.cpp CUDA backend. The runs covered all four of the board's power modes, from 7W up to MAXN. The results show usable inference at ultra-low bit precision on sub-$300 edge hardware with full CUDA acceleration, which is directly relevant to embedded and IoT AI deployments that cannot depend on the cloud. The post includes tokens-per-second figures across the full power envelope for comparison.
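To give a rough sense of why such low bit widths matter on an 8GB board, the sketch below estimates the idealized size of the weights alone as parameters times bits per weight. The function name and the 16-bit comparison column are illustrative assumptions, not taken from the post. Real model files are larger, since they typically store scale factors and keep some tensors at higher precision, and inference also needs memory for the KV cache, so treat the numbers as a lower bound rather than a measurement.

```python
# Back-of-the-envelope estimate: weight storage = parameters * bits per weight.
# This is an idealized lower bound (an assumption of this sketch); it ignores
# quantization scales, higher-precision tensors, and the KV cache.

def weight_footprint_gib(params_billion: float, bits_per_weight: float) -> float:
    """Return the idealized size of the weights alone, in GiB."""
    total_bits = params_billion * 1e9 * bits_per_weight
    total_bytes = total_bits / 8
    return total_bytes / 2**30


# Parameter counts span the range named in the summary (1.7B to ~8B).
# 16 bits is included only as a familiar reference point.
for params in (1.7, 8.0):
    for bits in (1.0, 1.58, 16.0):
        size = weight_footprint_gib(params, bits)
        print(f"{params:>4.1f}B params at {bits:>5.2f} bits/weight: {size:6.2f} GiB")
```

Under these assumptions, an 8B model at 16 bits per weight would not fit in 8GB at all, while the same parameter count at 1 or 1.58 bits per weight leaves most of the board's memory free for context and the operating system.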

Originally reported by Reddit
