NuExtract3: An Open-Weight VLM for Advanced Document Extraction

Numind, a company specializing in AI solution development, recently released NuExtract3, a 4-billion-parameter vision-language model (VLM). Built on the Qwen3.5-4B architecture and distributed under the Apache-2.0 license, this open-weight model is designed to extract information from complex, visually structured documents. NuExtract3 is presented as the successor to NuMarkdown and expands on its predecessor's capabilities.

NuExtract3's primary goal is to make data extraction more practical and efficient across a wide range of inputs, including PDFs, screenshots, forms, tables, receipts, invoices, and multi-page documents. This matters for companies that manage large volumes of documents and need to automate information capture and analysis while maintaining control over their sensitive data.

Technical Details and Key Features

NuExtract3 has been optimized for several core document-processing tasks: converting document images into Markdown, extracting structured data using predefined JSON templates, and handling tables, forms, and pages with complex layouts. The model accepts both textual and visual inputs.
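To make template-based extraction concrete, here is a minimal Python sketch of checking that a model's JSON output has the same shape as a template. The template syntax, field names, and the matches_template helper are illustrative assumptions, not NuExtract3's documented format; consult Numind's documentation for the real template conventions.

```python
import json

# Hypothetical template: keys name the fields to extract, values are type hints.
# This syntax is an assumption for illustration, not NuExtract3's documented format.
TEMPLATE = {
    "invoice_number": "string",
    "total": "number",
    "line_items": [{"description": "string", "amount": "number"}],
}


def matches_template(value, template):
    """Return True if `value` has the same nested shape as `template`."""
    if isinstance(template, dict):
        return (
            isinstance(value, dict)
            and set(value) == set(template)
            and all(matches_template(value[k], template[k]) for k in template)
        )
    if isinstance(template, list):
        # A one-element list template describes every item of the output list.
        return isinstance(value, list) and all(
            matches_template(item, template[0]) for item in value
        )
    if value is None:
        return True  # a leaf field that was not found may be null
    if template == "number":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if template == "string":
        return isinstance(value, str)
    return False


# Example model output (hand-written here, not produced by the model).
raw_output = (
    '{"invoice_number": "INV-001", "total": 42.5, '
    '"line_items": [{"description": "Widget", "amount": 42.5}]}'
)
parsed = json.loads(raw_output)
print(matches_template(parsed, TEMPLATE))
```

A check like this is useful as a guard before extracted data enters downstream systems, since any generative model can occasionally return malformed or incomplete JSON.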

The model was trained for three days on a node equipped with 8 H100 GPUs. It can process an extended context, which helps it perform well on long documents. For the best quality and inference speed, particularly for Markdown conversion, Numind suggests processing documents page by page. Because each page then becomes an independent request, pages can be processed in parallel, making better use of the available compute.
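The page-by-page approach can be sketched in Python as follows. This is only an outline under stated assumptions: convert_page is a hypothetical stand-in for a request to a locally served model (it returns a placeholder string here so the example runs on its own), and the worker count is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor


def convert_page(page_image: bytes) -> str:
    """Hypothetical stand-in for one page-to-Markdown request to a local server."""
    # A real implementation would send the image to the serving endpoint here.
    return f"<!-- {len(page_image)} bytes -->"


def convert_document(pages, max_workers=4):
    """Convert pages concurrently and join the results in page order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so pages stay in sequence.
        return "\n\n".join(pool.map(convert_page, pages))


print(convert_document([b"page-one", b"page-two-data"]))
```

Threads suit this pattern because each worker mostly waits on the inference server, and the server itself can batch the concurrent requests.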

On-Premise Deployment and Infrastructure Requirements

One of the most relevant aspects of NuExtract3 for our audience is its strong focus on self-hosted deployment. Numind has provided extensive documentation and offers the model weights in several formats, including Safetensors, GGUF, and MLX. This range of formats makes the model easier to integrate into existing infrastructures, even those with limited hardware resources.

For inference, NuExtract3 requires a minimum of only 4 GB of VRAM, making it accessible even on less powerful hardware or edge devices. Multiple quantization options (such as GPTQ, W8A8, FP8, Q4, and Q6) let operators trade memory usage against execution speed according to their needs and hardware constraints. The model has been tested with frameworks such as vLLM, SGLang, and llama.cpp, which indicates compatibility with widely adopted serving solutions. This focus on local deployment is crucial for organizations prioritizing data sovereignty and control over the Total Cost of Ownership (TCO).
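As a rough back-of-the-envelope illustration of why quantization matters on small GPUs, the Python sketch below estimates the memory taken by the weights alone for a 4-billion-parameter model. The bit widths are nominal assumptions (real quantized formats carry some overhead, and FP16 is included only as an unquantized baseline), and the estimate ignores the KV cache, activations, and any runtime overhead, so actual VRAM use will be higher.

```python
PARAMS = 4_000_000_000  # 4 billion parameters, as stated for NuExtract3

# Nominal bits per weight; assumed values for illustration only.
BITS_PER_WEIGHT = {"FP16": 16, "FP8": 8, "W8A8": 8, "Q6": 6, "Q4": 4}


def weight_memory_gb(params: int, bits: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits / 8 / 1e9


for name, bits in BITS_PER_WEIGHT.items():
    print(f"{name}: ~{weight_memory_gb(PARAMS, bits):.1f} GB for weights")
```

Under these assumptions, 16-bit weights need about 8 GB, 8-bit weights about 4 GB, and 4-bit weights about 2 GB, which shows why lower-precision formats are the practical route to running on constrained hardware.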

Strategic Implications for the Enterprise

The introduction of an open-weight, easily self-hostable VLM like NuExtract3 has significant strategic implications for businesses. Keeping AI workloads within one's own infrastructure perimeter addresses growing needs around regulatory compliance, data security, and privacy management, especially in regulated sectors. Air-gapped environments and those with stringent data residency requirements can benefit greatly from solutions that do not depend on external cloud services.

For those evaluating on-premise deployment, NuExtract3 represents a concrete alternative to proprietary cloud-based solutions, offering greater control over the entire document processing pipeline. While cloud solutions can offer immediate scalability, self-hosted alternatives like NuExtract3 allow for optimizing TCO in the long run and customizing the environment based on specific operational needs. AI-RADAR offers analytical frameworks on /llm-onpremise to evaluate the trade-offs between these different deployment strategies, helping decision-makers choose the most suitable approach for their objectives.