LLMs and Theory of Mind: A Comparative Study

Recent research has explored whether Large Language Models (LLMs) possess a "Theory of Mind" (ToM), that is, the ability to infer the beliefs, intentions, and emotions of others from text. The study asks whether these models genuinely understand mental states, given that they are trained on linguistic data without any direct social interaction.

Methodology and Results

The researchers evaluated five LLMs against a human control group, using an adapted version of a test widely used in ToM research. The test consisted of questions about the beliefs, intentions, and emotions of characters in short stories.
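The evaluation described above boils down to comparing each model's answers against gold-standard answers for the same question set. The sketch below shows one minimal way such scoring could be implemented; the function name, question IDs, model names, and answers are all invented for illustration, not taken from the study.

```python
def score_responses(responses, gold):
    """Compute per-model accuracy over a shared set of ToM questions.

    responses: {model_name: {question_id: answer}}
    gold:      {question_id: correct_answer}
    """
    accuracy = {}
    for model, answers in responses.items():
        # Count answers that match the gold label (case/whitespace insensitive).
        correct = sum(
            1 for qid, ans in answers.items()
            if ans.strip().lower() == gold[qid].strip().lower()
        )
        accuracy[model] = correct / len(gold)
    return accuracy

# Toy example: two hypothetical models, three false-belief questions.
gold = {"q1": "in the basket", "q2": "surprised", "q3": "to find the ball"}
responses = {
    "model_a": {"q1": "in the basket", "q2": "surprised", "q3": "in the box"},
    "model_b": {"q1": "in the basket", "q2": "surprised", "q3": "to find the ball"},
}
print(score_responses(responses, gold))
```

In practice, free-text answers would need a more robust matching step (for example, human rating or an entailment check) than exact string comparison, but the accuracy bookkeeping is the same.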

The results highlighted a performance gap among the models. Smaller and older models proved sensitive to the number of inferential cues available and vulnerable to the presence of irrelevant information. In contrast, GPT-4o showed high accuracy and robustness, achieving performance comparable to that of humans even in the most complex conditions. This result fuels the debate on the cognitive status of LLMs and on the distinction between genuine understanding and statistical approximation.