mistral.rs v0.8.2: Up to 2.8x Faster CUDA Inference Than llama.cpp on GB10, B200, and H100
Summary
The mistral.rs maintainer released v0.8.2 on June 1, 2026, claiming up to 2.8x higher CUDA throughput than llama.cpp on Gemma 4 dense and MoE models, based on benchmarks run on NVIDIA GB10, H100, and B200 hardware. The release focuses exclusively on CUDA performance optimization. In the developer's self-posted benchmark tables, mistral.rs leads llama.cpp at every tested point across that hardware lineup. mistral.rs is an actively developed open-source inference engine built for local model serving.
Originally reported by reddit.com