Large language models (LLMs) are increasingly used for mental-health support, but current evaluation methods often fail to capture the clinically critical dimensions of psychotherapy.
TherapyGym: A New Framework
TherapyGym is a framework for evaluating and improving therapy chatbots along two dimensions: clinical fidelity and safety. Fidelity is measured with the Cognitive Therapy Rating Scale (CTRS), implemented as an automated pipeline that scores adherence to CBT techniques over multi-turn sessions. Safety is assessed with a multi-label annotation scheme covering therapy-specific risks, such as failing to address harm or abuse.
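To make the two evaluation axes concrete, here is a minimal Python sketch of how one session's result might be represented. It is not TherapyGym's actual data model: the class name, field names, and the safety label string are hypothetical. The only external fact it relies on is that the standard CTRS has 11 items, each rated from 0 to 6.

```python
from dataclasses import dataclass, field

CTRS_ITEM_COUNT = 11  # the standard CTRS has 11 items
CTRS_ITEM_MAX = 6     # each item is rated on a 0-6 scale


@dataclass
class SessionEvaluation:
    """Hypothetical container for one multi-turn session's scores."""

    ctrs_items: list[int]                                # one rating per CTRS item
    safety_flags: set[str] = field(default_factory=set)  # multi-label: zero or more risks

    def __post_init__(self) -> None:
        # Reject malformed ratings early so totals stay comparable.
        if len(self.ctrs_items) != CTRS_ITEM_COUNT:
            raise ValueError(f"expected {CTRS_ITEM_COUNT} CTRS item ratings")
        if any(not 0 <= score <= CTRS_ITEM_MAX for score in self.ctrs_items):
            raise ValueError(f"each CTRS item must be in 0..{CTRS_ITEM_MAX}")

    @property
    def ctrs_total(self) -> int:
        # Total fidelity score, ranging from 0 to 66.
        return sum(self.ctrs_items)

    @property
    def is_safe(self) -> bool:
        # A session counts as safe only if no risk label was raised.
        return not self.safety_flags


# Illustrative label name; the real annotation scheme's labels may differ.
example = SessionEvaluation([4] * 11, {"unaddressed_harm"})
```

Keeping the safety labels as a set rather than a single boolean reflects the multi-label nature of the scheme: one session can raise several distinct risks at once.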
Bias Mitigation and Training
TherapyJudgeBench, a validation set of dialogues with expert ratings, has been released to mitigate bias and unreliability in LLM-based judges. TherapyGym also serves as a training harness, using CTRS and safety-based rewards to drive reinforcement learning with configurable patient simulations. Models trained in TherapyGym show improved clinical fidelity scores under both expert and LLM evaluation.
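The sketch below illustrates, under stated assumptions, the two ideas in this section: checking an LLM judge against expert ratings, and turning CTRS and safety signals into a scalar reward. The function names, the linear reward form, and the penalty weight are all hypothetical; the source does not specify how TherapyGym combines these signals or which agreement metric TherapyJudgeBench uses.

```python
CTRS_MAX_TOTAL = 66  # 11 CTRS items, each rated 0-6


def mean_abs_error(judge_totals: list[float], expert_totals: list[float]) -> float:
    """Average absolute gap between LLM-judge and expert CTRS totals.

    One simple agreement check; the actual benchmark may use other metrics.
    """
    if len(judge_totals) != len(expert_totals) or not judge_totals:
        raise ValueError("need two non-empty rating lists of equal length")
    gaps = [abs(j - e) for j, e in zip(judge_totals, expert_totals)]
    return sum(gaps) / len(gaps)


def session_reward(ctrs_total: float, n_safety_violations: int,
                   safety_penalty: float = 0.5) -> float:
    """Hypothetical reward: normalized fidelity minus a per-violation penalty."""
    fidelity = ctrs_total / CTRS_MAX_TOTAL  # scale fidelity to the range 0..1
    return fidelity - safety_penalty * n_safety_violations


# A mid-fidelity session with no safety violations.
baseline_reward = session_reward(33, 0)
```

With this form, a single safety violation at the default penalty cancels half of the maximum fidelity reward, so a policy cannot trade safety for technique adherence cheaply. The weight is a tunable assumption, not a value from the source.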