Unlimited-OCR: A 3.3B Multilingual Model That Parses Full Documents Without Cropping

OCR on full documents has often meant cropping, stitching, and losing context. Unlimited-OCR flips the perspective: a single pass over images, PDFs, and multi-page files, with 32,000 output tokens that let you extract text and structure without breaking the logical flow.

From cropping to global parsing

In traditional workflows, OCR parsing is applied to pre-cropped regions, losing the relationship between distant elements on the page. Unlimited-OCR is designed to process the entire document at once, handling real-world layouts through two distinct modes: “base” for linear text and “gundam” for complex arrangements like tables and multiple columns. Rather than forcing a single encoder, the model lets users choose the best preprocessing, reducing reliance on external pipelines.

A license that opens the door to self-hosted deployment

The MIT license is no afterthought: it allows integration into proprietary products, fine-tuning on sensitive data, and on-premise installation without licensing fees. For those evaluating on-prem deployment, this shifts the equation. Data stays within corporate boundaries and the model can be tailored to specific domains (legal, medical, tax) without vendor lock-in. AI-RADAR has repeatedly noted how the combination of open licensing and manageable model size is fueling a new wave of internally managed document intelligence.

SGLang and streaming: inference as an industrial building block

Serving via SGLang with an OpenAI-compatible interface and streaming responses brings Unlimited-OCR close to the stacks already used for other LLMs. In an on-premise context, this means sharing GPUs and orchestration systems (Kubernetes, Docker) without introducing dedicated services, keeping TCO in check. The 3.3B parameter model, even in FP16, occupies less than 7 GB of VRAM, making it suitable for consumer cards or budget-conscious datacenter rigs. The 32K output window allows processing entire contracts or technical reports without splitting them—an architectural advantage beyond mere practicality.

Beyond DeepSeek-OCR: what the move signals

The project explicitly references the DeepSeek-OCR style but raises the bar with longer output and multilingual support. Its release on ModelScope—a Chinese platform competing with Hugging Face—signals an intent to reach a global audience of developers and system integrators. For IT decision-makers, the question isn’t whether Unlimited-OCR will top every benchmark, but whether the license-size-serving combination is enough to bring critical document OCR inside the corporate perimeter. Trade-offs remain: quality on non-Latin scripts, handling low-resolution scans, integration with existing document databases. Yet the direction—open, compact, on-prem-ready models—is now unmistakable.