
Baidu Releases MIT-Licensed 3B OCR Model for Long Documents


TL;DR

  • Baidu's Unlimited-OCR is a 3-billion-parameter MIT-licensed model that processes multi-page PDFs in a single inference pass.
  • The model has a 32,768-token context window and supports vLLM, SGLang, Ollama, llama.cpp, and Hugging Face Transformers.
  • No benchmark results are included in the release; training data and language coverage are also not disclosed.

Baidu has published Unlimited-OCR, a 3-billion-parameter document-parsing model released under the MIT license, on Hugging Face. Its headline capability is what the model card calls "One-shot Long-horizon Parsing": processing multi-page PDFs and image stacks in a single inference pass instead of requiring documents to be pre-sliced page by page. A 32,768-token context window is what makes that approach feasible for longer documents.
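Whether a given document fits in one pass comes down to token arithmetic. The sketch below assumes, as is typical for decoder-only models, that the 32,768-token window must hold the prompt, the image tokens for every page, and the generated text. The per-page costs in the example (256 image tokens, 600 output tokens, a 100-token prompt) are hypothetical placeholders; the model card does not publish them, so measure them on your own documents.

```python
# Context window size as stated on the Unlimited-OCR model card.
CONTEXT_WINDOW = 32_768


def max_pages_in_context(input_tokens_per_page: int,
                         output_tokens_per_page: int,
                         prompt_tokens: int = 0,
                         context_window: int = CONTEXT_WINDOW) -> int:
    """Return how many pages fit in one pass, assuming every page costs the
    same number of input (image) tokens and generated (text) tokens.

    ASSUMPTION: generated tokens count against the same window as the input,
    which is how typical decoder-only models behave.
    """
    per_page = input_tokens_per_page + output_tokens_per_page
    if per_page <= 0:
        raise ValueError("per-page token cost must be positive")
    available = context_window - prompt_tokens
    # Integer division: only whole pages count; never report a negative number.
    return max(available // per_page, 0)


if __name__ == "__main__":
    # Hypothetical per-page costs; replace with measurements from your data.
    print(max_pages_in_context(256, 600, prompt_tokens=100))
```

With those assumed costs, 38 pages fit; text-dense pages that generate more output per page would lower that number, and anything beyond the limit would still need splitting.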

Most open-source OCR pipelines force you to cut input into individual pages, run each through a model separately, then stitch the outputs back together. The architectural bet here is that a sufficiently long context window lets the model keep content that crosses page boundaries coherent on its own, such as a paragraph or table split between two pages, without a separate stitching step. According to the model card, Unlimited-OCR builds on DeepSeek-OCR and DeepSeek-OCR-2. It can be deployed via Hugging Face Transformers, vLLM, SGLang, Docker, Ollama, and llama.cpp, so it should slot into most existing inference setups without significant rework.
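To make the stitching step concrete, here is a hypothetical post-processing function of the kind a page-by-page pipeline needs. It is not part of Unlimited-OCR or any named library, and `stitch_pages` is an illustrative name. It rejoins a word hyphenated across a page break and merges a sentence that runs onto the next page: the cross-page bookkeeping a single-pass parse is meant to absorb.

```python
import re

# Hypothetical post-processing for a page-by-page OCR pipeline. This is NOT
# part of Unlimited-OCR; it illustrates the stitching heuristics that a
# one-shot, multi-page parse is meant to make unnecessary.

# A letter immediately followed by a hyphen at the very end of the text.
_TRAILING_HYPHENATED_WORD = re.compile(r"[A-Za-z]-$")
_SENTENCE_END = (".", "!", "?", ":")


def stitch_pages(page_texts: list[str]) -> str:
    """Join per-page OCR output into one document string."""
    merged = ""
    for text in page_texts:
        text = text.strip()
        if not text:
            continue  # skip blank pages
        if not merged:
            merged = text
        elif _TRAILING_HYPHENATED_WORD.search(merged):
            # "docu-" + "ments ..." -> "documents ...": drop the hyphen, no space.
            merged = merged[:-1] + text
        elif merged.endswith(_SENTENCE_END):
            # Previous page ended cleanly: treat the next page as a new paragraph.
            merged += "\n\n" + text
        else:
            # Sentence runs onto the next page: join with a single space.
            merged += " " + text
    return merged


if __name__ == "__main__":
    pages = ["The model handles long docu-", "ments in one pass. It", "also parses tables."]
    print(stitch_pages(pages))
```

The heuristic is brittle: a genuine compound split at a page break, such as `open-` / `source`, is wrongly joined as `opensource`, and nothing here handles a table that spans pages. Those failure modes are the practical argument for letting the model see all pages at once.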

The MIT license matters for adoption speed. Commercial teams in document-heavy sectors can integrate the model without negotiating licensing terms, and support for Ollama and llama.cpp makes fully local inference practical for organizations with data-sensitivity requirements. At 3 billion parameters in BF16, the weights alone take roughly 6 GB, which puts the model within reach of standard GPU hardware rather than specialized infrastructure.
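The hardware claim rests on simple arithmetic: BF16 stores each parameter in 2 bytes. A minimal sketch, assuming the 3-billion figure covers all of the model's weights; it deliberately ignores the KV cache and activations, which grow with context length and add to this figure when running near the full 32,768-token window.

```python
GIB = 1024 ** 3  # bytes per gibibyte


def weight_memory_bytes(num_params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights.

    Excludes KV cache, activations, and framework overhead (an assumption
    that understates real inference memory, especially at long context).
    """
    return num_params * bytes_per_param


if __name__ == "__main__":
    bf16 = weight_memory_bytes(3_000_000_000, 2)  # BF16: 2 bytes per parameter
    print(f"{bf16 / 1e9:.1f} GB ({bf16 / GIB:.2f} GiB)")  # prints "6.0 GB (5.59 GiB)"
```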

The honest caveat is that the model card provides no benchmark results. There is no comparison against established OCR datasets, cloud services, or other open models to anchor the performance claims. Training data and language coverage are also not disclosed, which matters considerably if your documents include non-Latin scripts or lower-resource languages. Citation information on the model card is listed as coming soon.

For teams currently dependent on multi-step chunking pipelines or cloud OCR APIs, this is worth a test run against your own document distribution before drawing conclusions about production fitness. Community evals will likely fill the benchmark gap quickly given the permissive license and broad deployment support.
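One simple way to run that test is to hand-transcribe a sample of your own pages and score the model's output with character error rate (CER), a standard OCR metric: the edit distance between the model output and the reference, divided by the reference length. The sketch below is a dependency-free Python implementation; the sample strings are made up for illustration and do not come from any Unlimited-OCR run.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn `a` into `b` (symmetric in a and b)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of characters in the reference."""
    if not reference:
        raise ValueError("reference text must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Made-up example: the OCR output misreads "l" as "1".
    reference = "Invoice total: 1,250.00 EUR"
    hypothesis = "Invoice tota1: 1,250.00 EUR"
    print(f"CER = {character_error_rate(reference, hypothesis):.3f}")
```

Lower is better, and 0.0 means an exact match. If the model emits markup around the text, normalize or strip it before scoring unless layout fidelity is what you want to measure.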