PaddlePaddle's PP-OCRv6 beats billion-scale VLMs at OCR
TL;DR
- PP-OCRv6 ships three open-weight tiers on a PPLCNetV4 backbone: tiny 1.5M, small 7.7M, and medium 34.5M parameters.
- The 34.5M medium tier reaches 86.2% detection Hmean and 83.2% recognition accuracy on PaddleOCR's in-house benchmark.
- The authors report that the medium model surpasses Qwen3-VL-235B, GPT-5.5, and Gemini-3.1-Pro despite having orders of magnitude fewer parameters.
A 34.5-million-parameter OCR model reportedly beats Qwen3-VL-235B, GPT-5.5, and Gemini-3.1-Pro at reading text in images. That is the headline claim PaddlePaddle is making for PP-OCRv6, released on Hugging Face alongside a writeup on the Hugging Face blog and a paper on arXiv.
The lineup is three tiers built on a shared PPLCNetV4 backbone: a tiny 1.5M model, a small 7.7M model, and a medium 34.5M model. On PaddleOCR's in-house multi-scenario OCR benchmark, the medium tier hits 86.2% detection Hmean and 83.2% recognition accuracy, gains of 4.6 and 5.1 percentage points over PP-OCRv5_server. Against the VLMs, the paper claims the medium model beats Qwen3-VL-235B by 8.3 points on recognition while using roughly 6,800× fewer parameters, and exceeds Gemini-3.1-Pro on detection Hmean by 39.4 points.
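The ratios and gains above can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only. It assumes the 235B figure in the name Qwen3-VL-235B is the model's total parameter count, and that Hmean is the usual harmonic mean of detection precision and recall. The function names and the precision and recall inputs are hypothetical, since the article reports only the combined scores.

```python
# Figures quoted in the article. VLM_PARAMS is read off the model name
# "Qwen3-VL-235B" and is an assumption about its total parameter count.
VLM_PARAMS = 235e9
MEDIUM_PARAMS = 34.5e6


def param_ratio(large: float, small: float) -> float:
    """How many times more parameters the larger model has."""
    return large / small


def implied_baseline(score: float, gain_points: float) -> float:
    """Baseline score implied by a reported score and a gain in percentage points."""
    return round(score - gain_points, 1)


def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed definition of Hmean)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


ratio = param_ratio(VLM_PARAMS, MEDIUM_PARAMS)

# PP-OCRv5_server scores implied by the reported gains of 4.6 and 5.1 points.
det_baseline = implied_baseline(86.2, 4.6)
rec_baseline = implied_baseline(83.2, 5.1)

# Hypothetical precision/recall pair, only to show how Hmean combines them.
example_hmean = hmean(0.90, 0.80)

if __name__ == "__main__":
    print(f"parameter ratio: about {ratio:,.0f}x")
    print(f"implied PP-OCRv5_server detection Hmean: {det_baseline}%")
    print(f"implied PP-OCRv5_server recognition accuracy: {rec_baseline}%")
    print(f"Hmean for P=0.90, R=0.80: {example_hmean:.3f}")
```

The parameter ratio comes out close to the "roughly 6,800×" quoted above. The implied baselines are simple subtraction from the reported gains and are not figures taken from the paper.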
The speed claims push in the same direction. PaddlePaddle reports the medium model running 5.2× faster on Intel Xeon with OpenVINO, the tiny model 6.1× faster than PP-OCRv5_mobile on Apple M4, and roughly 1.1× faster on an A100. For practitioners running invoice, form, or archive pipelines at volume, the relevant comparison is not GPT-5.5 quality on a benchmark; it is "can I delete the API call and run this on the CPUs I already have?" On these numbers, yes.
The honest caveats are familiar ones. The headline accuracy belongs to the medium tier; the tiny model lands at 80.6% detection and 73.5% recognition, which leaves a much thinner margin for the edge-deployment story. The VLM comparison runs on PaddleOCR's own benchmark, and scoring a generalist model on a single specialist task tends to make the generalist look worse than a broader eval would. What the writeup does not give you is independent third-party verification, behavior on handwriting or degraded scans, or coverage of scripts outside the 50 supported languages.
The direction is still the part worth watching. Document OCR has been quietly sliding into the "just use a VLM" bucket, and a 34.5M-parameter open-weight model that beats those VLMs at it, at least on PaddlePaddle's own numbers, is the strongest argument yet that the small-specialist lane still wins where the task is narrow and the volume is high.
Shared on Bluesky by 2 AI experts
- PP-OCRv6 just released by Baidu huggingface.co/collections/... ✨ tiny 1.5M / small 7.7M / medium 34.5M ✨ 48+ languages ✨ Supports handwritten/printed/industrial/screen and card text ✨ Edge friendly deployment
Originally reported by huggingface.co
Original headline: PP-OCRv6 - a PaddlePaddle Collection