Legal Case Retrieval: A Self-Evolving LLM Agent Refines Rules Without Training

Legal case retrieval remains difficult for professionals and automated systems alike. Legal language is intricate, and relevant cases must align lexically with the query with great precision. Although dense retrieval models have made significant progress, empirical studies continue to show that BM25, a ranking function based on term frequency, remains a strong baseline in this domain.
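For readers unfamiliar with BM25, the following is a minimal sketch of the standard Okapi BM25 scoring function, written to show why lexical overlap between query and case matters so much. The parameter values `k1=1.5` and `b=0.75` are common defaults chosen here as assumptions, the IDF uses the non-negative "+1" variant, and documents are assumed to be pre-tokenized lists of strings. It is not the implementation used in the study.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against `query_terms`."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue  # only terms shared with the query contribute
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

A document that shares no term with the query scores zero, which is why rewriting the query's wording can change the ranking substantially.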

This observation motivates a new self-evolving framework for rule-driven query rewriting, which aims to improve BM25 without requiring any parameter training for the framework itself. At its core is an agent based on a Large Language Model (LLM), equipped with an automatic evaluation environment. With this setup, the agent iteratively creates rewriting rules, plans validation experiments over rule combinations, and eliminates ineffective rules based on historical feedback.
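To make "rule-driven query rewriting" concrete, here is a minimal sketch that assumes a rule is simply a function from query tokens to query tokens. The rule names (`drop_procedural_terms`, `expand_charge_synonyms`), the word lists, and the `apply_rules` helper are hypothetical illustrations in English, not rules taken from the framework.

```python
from typing import Callable, List

Rule = Callable[[List[str]], List[str]]

# Hypothetical word lists, chosen only for illustration.
PROCEDURAL_TERMS = {"court", "hearing", "appeal"}
CHARGE_SYNONYMS = {"theft": ["larceny"], "fraud": ["deception"]}

def drop_procedural_terms(tokens: List[str]) -> List[str]:
    """Remove terms assumed to carry little case-specific signal."""
    return [t for t in tokens if t not in PROCEDURAL_TERMS]

def expand_charge_synonyms(tokens: List[str]) -> List[str]:
    """Append assumed synonyms so a lexical ranker can match alternative wording."""
    extra = [s for t in tokens for s in CHARGE_SYNONYMS.get(t, [])]
    return tokens + extra

def apply_rules(tokens: List[str], rules: List[Rule]) -> List[str]:
    """Apply a combination of rules in order and return the rewritten query."""
    for rule in rules:
        tokens = rule(tokens)
    return tokens
```

Under these assumptions, `apply_rules(["court", "theft", "vehicle"], [drop_procedural_terms, expand_charge_synonyms])` yields a rewritten query that is then passed to BM25 in place of the original.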

The Self-Evolving Mechanism and the Role of LLMs

The framework operates as a continuous cycle of learning and refinement. The LLM agent does not merely generate rules; it actively tests them in the automatic evaluation environment and collects data on their performance. This feedback is crucial: it lets the agent identify the most effective rules and discard those that do not improve retrieval precision. The LLM's ability to interpret these experimental results, together with its prior knowledge and the elimination of ineffective rules, plays a fundamental role in refining the rule set.
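The cycle described above (create rules, validate combinations, eliminate weak rules from recorded feedback) can be sketched roughly as follows. Everything here is an assumption made for illustration: the `propose` callable stands in for the LLM agent, `evaluate` stands in for the evaluation environment and returns a retrieval metric, experiment planning is replaced by exhaustively trying small combinations, and the elimination criterion (non-positive average marginal gain) is a simple stand-in for the agent's judgment.

```python
from itertools import combinations

def marginal_gain(rule, history, baseline):
    """Average metric change from adding `rule` to combinations already tested."""
    gains = []
    for combo, score in history.items():
        if rule in combo:
            rest = combo - {rule}
            reference = baseline if not rest else history.get(rest)
            if reference is not None:
                gains.append(score - reference)
    return sum(gains) / len(gains) if gains else 0.0

def evolve_rules(propose, evaluate, rounds=3, max_combo=2):
    """Return (best combination, its metric, surviving rules)."""
    pool, eliminated, history = [], set(), {}
    baseline = evaluate(frozenset())  # metric with no rewriting
    for _ in range(rounds):
        # 1. Creation: ask the (stubbed) agent for candidate rules.
        for rule in propose(pool, history):
            if rule not in pool and rule not in eliminated:
                pool.append(rule)
        # 2. Validation: evaluate combinations that were not tried before.
        for size in range(1, max_combo + 1):
            for combo in combinations(pool, size):
                key = frozenset(combo)
                if key not in history:
                    history[key] = evaluate(key)
        # 3. Elimination: drop rules whose recorded feedback shows no gain.
        losers = [r for r in pool if marginal_gain(r, history, baseline) <= 0]
        eliminated.update(losers)
        pool = [r for r in pool if r not in eliminated]
    best = max(history, key=history.get, default=frozenset())
    return best, history.get(best, baseline), pool

# Toy stand-ins (hypothetical): additive per-rule effects on a metric.
TOY_GAINS = {"a": 0.05, "b": -0.02, "c": 0.03}

def toy_propose(pool, history):
    return ["a", "b", "c"]

def toy_evaluate(combo):
    return 0.5 + sum(TOY_GAINS[r] for r in combo)
```

With the toy stand-ins, `evolve_rules(toy_propose, toy_evaluate)` keeps rules `a` and `c` and discards `b`, since every recorded experiment involving `b` lowers the metric. No model parameters are updated at any point; only the rule set changes.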

The framework is reported to be most effective when a "high-capacity core LLM" is employed. This suggests that the breadth of knowledge encoded in such models matters both for generating pertinent rules and for evaluating them intelligently. For organizations in sensitive sectors such as the legal domain, relying on high-capacity LLMs raises important questions about deployment infrastructure, and the answers often favor self-hosted or on-premise solutions to ensure data sovereignty and compliance.

Results and Implications for the Legal Sector

The method was evaluated on LeCaRD-v2, a Chinese legal case retrieval benchmark. Experimental results showed that the proposed framework outperforms non-evolving baselines, including human-designed rules and greedy rule selection. This points to a significant advantage of a dynamic, self-optimizing approach over static ones.
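The article does not specify how the greedy rule selection baseline works. As one common reading of the term, the sketch below shows greedy forward selection: start with no rules and repeatedly add the single rule that most improves the metric, stopping when no addition helps. The `evaluate` callable and rule names are hypothetical.

```python
def greedy_select(rules, evaluate):
    """Greedy forward selection over rule names; returns (selected set, metric)."""
    selected = frozenset()
    best = evaluate(selected)
    remaining = set(rules)
    while remaining:
        # Score every single-rule extension of the current selection.
        scored = [(evaluate(selected | {r}), r) for r in sorted(remaining)]
        score, rule = max(scored)
        if score <= best:
            break  # no single addition improves the metric
        selected, best = selected | {rule}, score
        remaining.discard(rule)
    return selected, best
```

Because this procedure only considers one addition at a time, it cannot discover two rules that help only when used together, which is a general limitation of greedy selection rather than a finding of the study.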

The implications for the legal sector are considerable. More precise retrieval of legal precedents can drastically reduce the time and resources spent on manual research, allowing professionals to focus on more complex aspects of legal analysis. A system that autonomously adapts and improves its own search rules is a step toward more robust AI systems that depend less on constant human intervention for optimization.

Prospects for On-Premise Deployments

The need for a "high-capacity core LLM" to maximize the framework's performance brings significant infrastructural considerations. For companies prioritizing data sovereignty and regulatory compliance, deploying such LLMs in on-premise or air-gapped environments becomes a priority. This entails investment in dedicated hardware, such as GPUs with high VRAM, and the management of a local inference stack. Because the framework itself requires no parameter training, it adds no training workload of its own.
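As a rough guide to what "high VRAM" means, the memory needed just to hold model weights is the parameter count multiplied by the bytes per parameter. The sketch below applies that arithmetic. The model sizes in the usage note are hypothetical, since the article does not name a specific model, and the estimate deliberately excludes the KV cache, activations, and runtime overhead, so real requirements are higher.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_memory_gb(n_params_billion, precision="fp16"):
    """Memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    # (n * 1e9 params) * (bytes per param) / (1e9 bytes per GB)
    return n_params_billion * BYTES_PER_PARAM[precision]
```

For example, a hypothetical 70-billion-parameter model needs about 140 GB for its weights at fp16 and about 35 GB at 4-bit quantization, which determines whether it fits on one GPU or must be split across several.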

While the framework does not need additional training for its own rules, the choice and management of the underlying LLM remain crucial. Evaluating the Total Cost of Ownership (TCO) for an on-premise deployment of a large LLM, which includes hardware acquisition, energy, cooling, and maintenance costs, is a decisive factor. AI-RADAR specifically focuses on these trade-offs, offering analyses and frameworks to help decision-makers navigate the complexities of self-hosted LLM deployments, ensuring data control and security.
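The cost components listed above can be combined into a simple annualized estimate. This is a minimal sketch under stated assumptions: hardware is amortized linearly, cooling is modeled as a fixed fraction of the IT energy cost, and every parameter name is hypothetical. It is not a methodology from AI-RADAR or from the study.

```python
def annual_tco(hardware_cost, amortization_years, avg_power_kw,
               price_per_kwh, cooling_overhead, annual_maintenance):
    """Annualized cost of an on-premise deployment, in the input currency."""
    hardware_per_year = hardware_cost / amortization_years
    # Energy drawn by the servers, assuming continuous operation all year.
    it_energy = avg_power_kw * 24 * 365 * price_per_kwh
    # Cooling modeled as a fraction of IT energy (an assumption).
    cooling = it_energy * cooling_overhead
    return hardware_per_year + it_energy + cooling + annual_maintenance
```

With placeholder inputs such as `annual_tco(100000, 4, 2.0, 0.25, 0.5, 5000)`, the function returns the sum of the four components for one year. The inputs are illustrative only and should be replaced with an organization's own figures.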