DeepRead: An Agent for Advanced Document Search

DeepRead, a new approach to agentic search, aims to significantly improve the ability of large language models (LLMs) to answer complex questions about large documents. Unlike conventional methods that treat a document as a flat set of text chunks, DeepRead exploits the document's intrinsic structure: its hierarchical organization and its sequential discourse structure.

DeepRead uses an LLM-based OCR model to convert PDFs into structured Markdown format, preserving headings and paragraph boundaries. It then indexes documents at the paragraph level, assigning each paragraph a metadata key that encodes its section identity and order within the section. This allows the agent to locate relevant paragraphs and read contiguously within a specific section.
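The indexing idea can be illustrated with a minimal Python sketch. It is not DeepRead's implementation. It assumes the OCR step has already produced Markdown with "#"-style headings, and every name in it (Paragraph, index_markdown, locate, read_section, the "section#order" key format) is hypothetical. The keyword-overlap scoring in locate is only a stand-in for whatever retrieval method the real system uses.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

HEADING = re.compile(r"^(#{1,6})\s+(.+?)\s*$")


@dataclass(frozen=True)
class Paragraph:
    section_id: str  # position in the heading hierarchy, e.g. "1.2"
    order: int       # 0-based position of the paragraph inside its section
    heading: str
    text: str

    @property
    def key(self) -> str:
        # Hypothetical key format: section identity plus order within it.
        return f"{self.section_id}#{self.order}"


def index_markdown(markdown: str) -> list[Paragraph]:
    """Split Markdown into paragraphs tagged with section id and order."""
    paragraphs: list[Paragraph] = []
    counters: list[int] = []  # one running counter per heading depth
    section_id, heading, order = "0", "", 0
    buffer: list[str] = []

    def flush() -> None:
        nonlocal order
        if buffer:
            text = " ".join(buffer)
            paragraphs.append(Paragraph(section_id, order, heading, text))
            order += 1
            buffer.clear()

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            depth = len(match.group(1))
            # Drop counters of deeper levels, pad any skipped levels with 0.
            counters = counters[:depth] + [0] * (depth - len(counters))
            counters[depth - 1] += 1
            section_id = ".".join(str(c) for c in counters)
            heading = match.group(2)
            order = 0
        elif line.strip():
            buffer.append(line.strip())
        else:
            flush()  # a blank line ends the current paragraph
    flush()
    return paragraphs


def locate(paragraphs: list[Paragraph], query: str) -> Paragraph:
    """Stand-in retrieval: return the paragraph sharing most query words."""
    terms = set(re.findall(r"\w+", query.lower()))
    return max(
        paragraphs,
        key=lambda p: len(terms & set(re.findall(r"\w+", p.text.lower()))),
    )


def read_section(paragraphs: list[Paragraph], section_id: str,
                 start: int = 0, count: int = 2) -> list[Paragraph]:
    """Read `count` contiguous paragraphs of one section, from `start`."""
    window = [p for p in paragraphs
              if p.section_id == section_id and start <= p.order < start + count]
    return sorted(window, key=lambda p: p.order)


DOC = """
# Report

Intro paragraph.

## Methods

We sample 100 documents.

Each document is parsed twice.

## Results

Accuracy improves.
"""
```

With this sketch, an agent could call locate(index_markdown(DOC), "parsed twice") to find a paragraph in the Methods section, then pass that paragraph's section_id to read_section to read the section in order from its beginning, rather than receiving isolated chunks.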

In experiments, DeepRead significantly outperforms traditional agentic search approaches on document question answering. Behavioral analysis shows that the agent adopts a human-like "locate then read" pattern: it first locates the relevant paragraphs, then reads contiguously within that section.

Teams evaluating on-premise deployments have trade-offs of their own to weigh. AI-RADAR offers analytical frameworks for evaluating them at /llm-onpremise.