PQR: A Framework for Evaluating LLM Agents with Realistic Queries
Evaluating LLM-based agents is difficult, and identifying meaningful failure scenarios often requires significant human effort. PQR is a new framework that aims to overcome the limitations of previous approaches, focusing on automatically generating ...