Deep Research with Local LLMs: A Challenge for 2026

A user on the Reddit forum r/LocalLLaMA is trying to replicate the "Deep research" functionality of ChatGPT using large language models (LLMs) running locally. The goal is to avoid the usage limits of ChatGPT's paid plan while maintaining comparable accuracy.

Current Hardware and Software Setup

The user has a system with three RTX 3090 GPUs, enough VRAM to run large models such as GPT-OSS-120B or GLM Air entirely on the GPUs, or 30B models at Q8 quantization for higher precision. Currently, they use OpenWebUI together with a local SearXNG instance to search the web and summarize the results.
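
To make the setup concrete, the sketch below shows one way such a search-and-summarize pipeline can be wired together: it queries a local SearXNG instance over its JSON API and asks a local model, served through an OpenAI-compatible endpoint, to condense the results. The URLs, port numbers, and model name are placeholders for illustration (not the poster's actual configuration), and SearXNG must have the `json` output format enabled in its settings.

```python
import requests

# Hypothetical local endpoints; adjust to your own setup.
SEARXNG_URL = "http://localhost:8080/search"            # local SearXNG instance
LLM_URL = "http://localhost:8000/v1/chat/completions"   # any OpenAI-compatible local server
MODEL = "gpt-oss-120b"                                   # model name as exposed by that server

def web_search(query: str, max_results: int = 5) -> list[dict]:
    """Query SearXNG's JSON API (the 'json' format must be enabled in its settings)."""
    resp = requests.get(SEARXNG_URL, params={"q": query, "format": "json"}, timeout=30)
    resp.raise_for_status()
    return resp.json().get("results", [])[:max_results]

def summarize(question: str, results: list[dict]) -> str:
    """Ask the local model to condense the search snippets into a short answer."""
    snippets = "\n".join(f"- {r.get('title', '')}: {r.get('content', '')}" for r in results)
    resp = requests.post(
        LLM_URL,
        json={
            "model": MODEL,
            "messages": [
                {"role": "system", "content": "Summarize the search results for the user's question."},
                {"role": "user", "content": f"Question: {question}\n\nResults:\n{snippets}"},
            ],
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    question = "What hardware is needed to run a 120B-parameter model locally?"
    print(summarize(question, web_search(question)))
```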

Limitations and Search for Alternatives

Despite the available computing power, the user finds that the accuracy of the local setup falls short of ChatGPT's, particularly in running the iterative search-and-analysis loops that deep research requires. They are therefore asking for suggestions and alternative configurations to improve the performance and accuracy of research with local LLMs.
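
The kind of loop the user is after can be sketched as a plan-search-extract-synthesize cycle: the model proposes the next query, the results are condensed into notes, and the process repeats until the model judges the notes sufficient. The Python sketch below illustrates that loop structure under the same assumptions as the previous example (hypothetical local endpoints and model name); it is an illustration of the idea, not the poster's setup or any particular tool's implementation.

```python
import json
import requests

SEARXNG_URL = "http://localhost:8080/search"            # local SearXNG instance (assumption)
LLM_URL = "http://localhost:8000/v1/chat/completions"   # OpenAI-compatible local server (assumption)
MODEL = "gpt-oss-120b"                                   # placeholder model name

def ask(messages: list[dict]) -> str:
    """Single call to the local model."""
    resp = requests.post(LLM_URL, json={"model": MODEL, "messages": messages}, timeout=300)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

def search(query: str, k: int = 5) -> str:
    """Return the top-k SearXNG snippets as plain text (JSON format must be enabled)."""
    r = requests.get(SEARXNG_URL, params={"q": query, "format": "json"}, timeout=30)
    r.raise_for_status()
    hits = r.json().get("results", [])[:k]
    return "\n".join(f"- {h.get('title', '')}: {h.get('content', '')}" for h in hits)

def deep_research(question: str, max_rounds: int = 4) -> str:
    notes: list[str] = []
    for _ in range(max_rounds):
        # 1. Ask the model what to search for next, given what it already knows.
        plan = ask([
            {"role": "system", "content": "You research questions iteratively. "
             'Reply with a JSON object: {"query": "..."} for the next web search, '
             'or {"done": true} if the notes already answer the question.'},
            {"role": "user", "content": f"Question: {question}\n\nNotes so far:\n" + "\n".join(notes)},
        ])
        try:
            decision = json.loads(plan)
        except json.JSONDecodeError:
            decision = {"query": plan.strip()}   # fall back to treating the reply as a query
        if decision.get("done"):
            break
        # 2. Run the search and let the model extract only what matters.
        results = search(decision.get("query", question))
        notes.append(ask([
            {"role": "system", "content": "Extract only facts relevant to the question."},
            {"role": "user", "content": f"Question: {question}\n\nSearch results:\n{results}"},
        ]))
    # 3. Final synthesis over the accumulated notes.
    return ask([
        {"role": "system", "content": "Write a structured answer based only on the notes."},
        {"role": "user", "content": f"Question: {question}\n\nNotes:\n" + "\n".join(notes)},
    ])

if __name__ == "__main__":
    print(deep_research("What are the trade-offs of running 120B-parameter models on three consumer GPUs?"))
```

Production deep-research agents typically layer page fetching, citation tracking, and a token budget on top of a skeleton like this, which is where much of the accuracy gap the user describes tends to come from.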

For those evaluating on-premise deployments, there are trade-offs to consider; AI-RADAR offers analytical frameworks on /llm-onpremise for weighing them.