Large language models (LLMs) are increasingly used for mental-health support, but current evaluation methods often fail to capture the clinically critical dimensions of psychotherapy.

TherapyGym: A New Framework

TherapyGym is a framework for evaluating and improving therapy chatbots along two axes: clinical fidelity and safety. Fidelity is measured with the Cognitive Therapy Rating Scale (CTRS), implemented as an automated pipeline that scores adherence to cognitive behavioral therapy (CBT) techniques over multi-turn sessions. Safety is assessed with a multi-label annotation scheme that covers therapy-specific risks, such as failing to address harm or abuse.
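The two evaluation axes can be pictured as a single per-session record. The sketch below is illustrative only and is not TherapyGym's actual data model: the class name `SessionEvaluation`, the snake_case item keys, and the two safety labels are assumptions made for this example. It relies only on the standard CTRS layout of 11 items, each rated 0 to 6, for a maximum total of 66.

```python
from dataclasses import dataclass, field

# The 11 CTRS items, each rated on a 0-6 scale (maximum total: 66).
# The snake_case keys are shorthand chosen for this sketch.
CTRS_ITEMS = (
    "agenda",
    "feedback",
    "understanding",
    "interpersonal_effectiveness",
    "collaboration",
    "pacing",
    "guided_discovery",
    "focusing_on_key_cognitions",
    "strategy_for_change",
    "application_of_techniques",
    "homework",
)

# Hypothetical safety labels, based only on the two risks named in the text.
# The framework's real label set is not reproduced here.
SAFETY_LABELS = frozenset({"unaddressed_harm", "unaddressed_abuse"})


@dataclass
class SessionEvaluation:
    """Fidelity and safety annotations for one multi-turn session."""

    ctrs: dict[str, int]
    safety_flags: set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        # Every CTRS item must be present exactly once and lie in 0..6.
        if set(self.ctrs) != set(CTRS_ITEMS):
            raise ValueError("ctrs must contain exactly the 11 CTRS items")
        for item, score in self.ctrs.items():
            if not 0 <= score <= 6:
                raise ValueError(f"{item} score {score} is outside 0..6")
        # Multi-label: any subset of the known labels may be set.
        unknown = self.safety_flags - SAFETY_LABELS
        if unknown:
            raise ValueError(f"unknown safety labels: {sorted(unknown)}")

    @property
    def ctrs_total(self) -> int:
        """Sum of the 11 item scores, in the range 0..66."""
        return sum(self.ctrs[item] for item in CTRS_ITEMS)

    @property
    def is_safe(self) -> bool:
        """True when no safety label was raised for the session."""
        return not self.safety_flags
```

Keeping the safety flags as a set rather than a single category mirrors the multi-label nature of the annotation scheme: one session can raise several risks at once, or none.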

Bias Mitigation and Training

To mitigate bias and unreliability in LLM-based judges, TherapyJudgeBench, a validation set of dialogues with expert ratings, has been released. TherapyGym also serves as a training harness: CTRS-based and safety-based rewards drive reinforcement learning against configurable patient simulations. Models trained in TherapyGym show improved clinical fidelity scores under both expert and LLM evaluation.
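One simple way to turn a fidelity score and safety annotations into a scalar training signal is to normalize the CTRS total and subtract a penalty for each safety violation. This is a minimal sketch under stated assumptions, not the reward TherapyGym actually uses: the function name `session_reward`, the linear form, and the default penalty weight of 0.5 are all hypothetical.

```python
def session_reward(
    ctrs_total: int,
    safety_violations: int,
    penalty: float = 0.5,  # assumed weight, chosen only for illustration
    max_ctrs: int = 66,    # 11 CTRS items x maximum score of 6
) -> float:
    """Combine fidelity and safety into one scalar reward for a session."""
    if not 0 <= ctrs_total <= max_ctrs:
        raise ValueError(f"ctrs_total must be in 0..{max_ctrs}")
    if safety_violations < 0:
        raise ValueError("safety_violations must be non-negative")
    # Fidelity term scaled to [0, 1].
    fidelity = ctrs_total / max_ctrs
    # Each flagged safety issue lowers the reward by a fixed amount.
    return fidelity - penalty * safety_violations
```

With these assumed defaults, a session scoring 33 of 66 with one safety violation receives a reward of 0.0, the same as a session with no fidelity at all, so a policy cannot trade safety for technique adherence cheaply. Other designs, such as hard-zeroing the reward on any violation, are equally plausible.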