Introduction
CAT is a new framework for evaluating the relation between consistency and accuracy in LLMs under controlled input variations. It is built around the Consistency-Accuracy Relation (CAR) curve and proposes a global metric, derived from that curve, to quantify the trade-off between accuracy and consistency.
How CAT works
At the center of CAT is the Consistency-Accuracy Relation (CAR) curve, which visualizes how model accuracy varies with increasing consistency requirements, as defined by the MCA metric. The framework also proposes the CORE index, a global metric that combines the area and shape of the CAR curve to quantify the trade-off between accuracy and consistency.
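To make the idea of a CAR curve concrete, here is a minimal sketch in Python. It rests on assumptions that are not stated in this article: each question is asked under several input variations, a question's consistency is taken to be the fraction of variations answered correctly, and MCA at a threshold is taken to be the share of questions meeting that threshold. The function names (correct_rate, mca, car_curve, area_under_curve) and the toy data are hypothetical. The area computed at the end is only one ingredient of the CORE index; the shape term is not specified here, so it is left out.

```python
from fractions import Fraction

def correct_rate(answers, gold):
    """Fraction of input variations answered correctly (assumed consistency notion)."""
    return Fraction(sum(a == gold for a in answers), len(answers))

def mca(rates, threshold):
    """Assumed MCA: share of questions whose correct rate meets the threshold."""
    return Fraction(sum(r >= threshold for r in rates), len(rates))

def car_curve(rates, steps):
    """Points (threshold, MCA) for thresholds 1/steps, 2/steps, ..., 1."""
    thresholds = [Fraction(i, steps) for i in range(1, steps + 1)]
    return [(t, mca(rates, t)) for t in thresholds]

def area_under_curve(points):
    """Trapezoidal area under the curve (not the CORE index itself)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Toy data: four questions, each answered under four input variations.
responses = [
    (["A", "A", "A", "A"], "A"),  # always correct
    (["B", "B", "C", "B"], "B"),  # correct in 3 of 4 variations
    (["D", "A", "D", "C"], "A"),  # correct in 1 of 4 variations
    (["C", "C", "C", "C"], "B"),  # consistent but never correct
]
rates = [correct_rate(answers, gold) for answers, gold in responses]
curve = car_curve(rates, steps=4)

for threshold, value in curve:
    print(f"consistency >= {float(threshold):.2f}: MCA = {float(value):.2f}")
print("area under curve:", area_under_curve(curve))
```

Under these assumptions the curve can only fall or stay flat as the consistency requirement tightens, which is what makes its area and shape informative about the trade-off.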
Application of CAT
CAT has been applied to a diverse set of generalist and domain-specific LLMs, evaluated on several multiple-choice (MC) benchmarks. The results demonstrate the effectiveness of the framework in evaluating the consistency-accuracy relation of LLMs.
Extension of CAT
CAT can be extended to support long-form, open-ended evaluations through adaptable scoring functions.
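One way to picture an adaptable scoring function is as a pluggable callable that maps a response and a reference to a score between 0 and 1. The sketch below is an illustration under that assumption, not the framework's actual interface: exact_match, token_overlap, and mean_score are hypothetical names, and the token-overlap (Jaccard) scorer is a simple stand-in for whatever long-form scorer would really be used.

```python
from typing import Callable, Sequence

# Hypothetical scorer type: (response, reference) -> score in [0, 1].
Scorer = Callable[[str, str], float]

def exact_match(response: str, reference: str) -> float:
    """Multiple-choice style scoring: 1.0 only on an exact match."""
    return 1.0 if response.strip() == reference.strip() else 0.0

def token_overlap(response: str, reference: str) -> float:
    """Stand-in long-form scorer: Jaccard overlap of lowercase token sets."""
    a, b = set(response.lower().split()), set(reference.lower().split())
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def mean_score(responses: Sequence[str], reference: str, scorer: Scorer) -> float:
    """Average score of one question's responses across input variations."""
    return sum(scorer(r, reference) for r in responses) / len(responses)

# Toy data: one open-ended question answered under three input variations.
reference = "Paris is the capital"
responses = ["Paris is the capital", "the capital is Paris", "Lyon is the capital"]

print("exact match  :", round(mean_score(responses, reference, exact_match), 3))
print("token overlap:", round(mean_score(responses, reference, token_overlap), 3))
```

Swapping the scorer changes how strictly a response counts as correct while leaving the rest of the evaluation loop unchanged, which is the sense in which the scoring is adaptable.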