DLLM-Searcher: A New Approach for Search Agents

Diffusion Large Language Models (dLLMs) offer unique efficiency advantages due to their inherently parallel decoding mechanism and flexible generation paradigm. However, the practical deployment of search agents is limited by latency: multi-round reasoning, tool calling, and waiting for tool responses all execute serially, so every stage adds to end-to-end response time.
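
To make the bottleneck concrete, here is a minimal sketch of the standard serial ReAct loop this paragraph describes. All names (`generate`, `execute_tool`) are hypothetical stand-ins, not functions from the DLLM-Searcher codebase; the `sleep` calls simulate decoding and tool latency.

```python
import time

def generate(context: str) -> dict:
    # Stand-in for one dLLM reasoning/decoding round.
    time.sleep(0.5)
    return {"thought": "...", "tool_call": "search('dLLM latency')"}

def execute_tool(call: str) -> str:
    # Stand-in for a search/tool backend with network latency.
    time.sleep(1.0)
    return "observation text"

def serial_react(question: str, max_rounds: int = 3) -> str:
    context = question
    for _ in range(max_rounds):
        step = generate(context)               # model blocks here
        obs = execute_tool(step["tool_call"])  # model idles here
        context += f"\n{step['thought']}\n{obs}"
    return context  # total latency = the sum of every stage, every round
```

Because each stage waits for the previous one, per-round latency is the sum of decoding time and tool time; P-ReAct (described below) targets exactly this overlap opportunity.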

DLLM-Searcher addresses these challenges with an optimization framework for dLLM-based search agents. To overcome limited agentic capability, the framework applies a two-stage post-training pipeline: Agentic Supervised Fine-Tuning (Agentic SFT) followed by Agentic Variance-Reduced Preference Optimization (Agentic VRPO), which together strengthen the dLLM's information-seeking and reasoning capabilities.
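
The section does not spell out the training objective, so the following is only a rough sketch of one plausible reading of stage two: a DPO-style preference loss in which each sequence log-likelihood, intractable for a dLLM, is replaced by an ELBO estimate averaged over several masking draws (the averaging being one interpretation of "variance-reduced"). The `model.elbo` method and all names are hypothetical.

```python
import torch
import torch.nn.functional as F

def elbo_logprob(model, seq, n_samples: int = 4) -> torch.Tensor:
    # Average n Monte Carlo ELBO estimates of log p(seq). Each draw is
    # assumed to mask seq at a random ratio and score the reconstruction;
    # averaging reduces the variance of the resulting gradient estimate.
    return torch.stack([model.elbo(seq) for _ in range(n_samples)]).mean()

def vrpo_loss(policy, reference, chosen, rejected, beta: float = 0.1):
    # DPO-style preference objective over variance-reduced ELBO estimates
    # of the chosen vs. rejected agent trajectories.
    margin = beta * (
        (elbo_logprob(policy, chosen) - elbo_logprob(reference, chosen))
        - (elbo_logprob(policy, rejected) - elbo_logprob(reference, rejected))
    )
    return -F.logsigmoid(margin)
```

Stage one (Agentic SFT) would precede this, fitting the model to curated agent trajectories with a standard supervised loss before preference optimization begins.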

To mitigate latency, DLLM-Searcher introduces P-ReAct, a novel paradigm that guides the model to decode tool_call instructions first, so it can continue reasoning while the tool's response is still pending. Experimental results show that DLLM-Searcher matches the performance of mainstream LLM-based search agents while achieving a 15% inference speedup attributable to P-ReAct.
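
The mechanism can be illustrated with a small asyncio sketch: the tool call, having been decoded first, is dispatched immediately, and the remaining reasoning proceeds concurrently while the tool runs. All names are hypothetical and the `sleep` calls again simulate latency; this is an illustration of the overlap idea, not the actual implementation.

```python
import asyncio

async def call_tool(call: str) -> str:
    await asyncio.sleep(1.0)  # stand-in for tool/network latency
    return "observation"

async def continue_reasoning(context: str) -> str:
    await asyncio.sleep(0.5)  # stand-in for further dLLM decoding
    return context + " ...further reasoning..."

async def p_react_round(context: str, tool_call: str) -> str:
    # Run the tool and the remaining reasoning concurrently; per-round
    # latency becomes max(tool, reasoning) instead of their sum.
    obs, thought = await asyncio.gather(
        call_tool(tool_call),
        continue_reasoning(context),
    )
    return thought + "\n" + obs

print(asyncio.run(p_react_round("context", "search('query')")))
```

Compared with the serial loop sketched earlier, the waiting stages now overlap, which is consistent with the reported inference speedup.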

The project code is available on an anonymous repository.