QUEST-35B: The open-source Deep Research agent trained with 32 H100s

Roughly 32 NVIDIA H100 GPUs and about 8,000 synthetic examples were enough for the NLP team at Ohio State University to train QUEST-35B, an autonomous research agent that competes with some frontier "deep research" systems and is now released entirely as open source. The model, the training recipe, the code, and the datasets are all publicly available, a move that redraws the boundaries of what can be built on-premise without depending on cloud APIs or proprietary models.

Architecture and hidden costs

QUEST-35B is a 35-billion-parameter large language model, a size that makes it runnable on hardware many organizations already own or can rent. Training on only about 32 H100s, a relatively modest number, and on synthetic data instead of human annotations lowers the total cost of ownership and simplifies reproducibility. The team documented every step, from fine-tuning to research flow control, making the entire pipeline adaptable to domain-specific scenarios.
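To put the "runnable on hardware many organizations already own" claim in numbers, here is a minimal back-of-the-envelope sketch of the memory needed just to hold 35 billion weights at common numeric precisions. The precisions listed are assumptions for illustration; the article does not say which format the released checkpoint uses, and the estimate ignores the KV cache, activations, and runtime overhead.

```python
# Rough weight-only memory estimate for a 35-billion-parameter model.
# Assumption: the storage formats below are illustrative; the article does
# not state which precision the released checkpoint uses.

PARAMS = 35e9  # parameter count stated in the article

# Bytes per parameter for common storage formats (standard widths).
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "bf16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Return weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


estimates = {fmt: weight_memory_gb(PARAMS, b) for fmt, b in BYTES_PER_PARAM.items()}

for fmt, gb in estimates.items():
    print(f"{fmt}: {gb:.1f} GB")
```

At 16-bit precision the weights alone come to about 70 GB, which leaves little headroom on a single 80 GB accelerator; 8-bit or 4-bit quantization brings the figure down to roughly 35 GB or 17.5 GB, at some cost in accuracy that would need to be measured.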

Data sovereignty and control

For companies operating in regulated sectors, the ability to run an advanced research agent entirely self-hosted means keeping internal documents, logs, and sensitive queries under lock and key. The model itself requires no calls to external endpoints, and compliance with regulations such as GDPR becomes manageable directly on local infrastructure. The open license also allows security audits and customization without vendor lock-in.

The remaining gap

Despite competitive benchmark results, gaps remain with closed deep research systems: the ability to tap into fresh knowledge bases, handle complex multi-turn conversations, and, above all, scale across heterogeneous tool sets. However, QUEST-35B's transparency offers the community a testbed to close these gaps, experimenting with retrieval-augmented generation, memory optimization, and local orchestration.

Toward enterprise deployment

Those evaluating on-premise deployment know the trade-off is not only technical. There is total cost of ownership (TCO): upfront GPU investment versus recurring API costs. But there are also control, latency, and the ability to fine-tune with proprietary data. QUEST-35B shows that with a contained cluster and an open recipe, a university lab has already taken the first step. The next one is up to enterprise teams.
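The TCO comparison above can be framed as a simple break-even calculation. The sketch below is illustrative only: the function name and every figure in it are hypothetical placeholders, not real hardware or API prices, and should be replaced with an organization's own quotes.

```python
from typing import Optional


def break_even_months(
    gpu_capex: float,
    onprem_monthly_opex: float,
    api_monthly_cost: float,
) -> Optional[float]:
    """Months until cumulative API spend matches cumulative on-prem spend.

    Returns None when on-prem running costs are not lower than the API bill,
    in which case the upfront investment is never recovered.
    """
    monthly_saving = api_monthly_cost - onprem_monthly_opex
    if monthly_saving <= 0:
        return None
    return gpu_capex / monthly_saving


# Hypothetical placeholder figures, in any single currency.
months = break_even_months(
    gpu_capex=200_000,          # upfront hardware purchase (assumed)
    onprem_monthly_opex=3_000,  # power, hosting, maintenance (assumed)
    api_monthly_cost=13_000,    # equivalent hosted-API spend (assumed)
)
print(months)
```

With these placeholder inputs the upfront cost is recovered after 20 months. The model deliberately leaves out staffing, hardware depreciation, and changes in usage volume, all of which can shift the result substantially.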