TruthfulQA (Lin et al., 2022) probes whether language models generate truthful answers or confidently repeat common human misconceptions, pseudoscience, conspiracy theories, and urban legends. It operationalises truthfulness as "only asserting things the model has reason to believe are true."
## Structure
| Property | Detail |
|---|---|
| Questions | 817 (38 categories) |
| Task type | Open-ended generation (scored by a fine-tuned "GPT-judge" model in the original paper; GPT-4 is often used as the judge in modern harnesses) OR multiple choice (MC1/MC2; see the scoring sketch below) |
| Categories | Conspiracies, misconceptions, health, law, finance, fiction, politics |
| Metric | % Truthful and % Informative, with the headline figure being the fraction of answers that are both (True + Informative) |
The "Imitative Falsehood" Problem
Larger models trained on more human text can actually score lower on TruthfulQA, because they more faithfully reproduce popular but false human beliefs. GPT-3 175B scored worse than GPT-3 6.7B on this benchmark at release, a counter-intuitive result now commonly called inverse scaling.
## Scores (MC1, 0-shot)
| Model | MC1 accuracy (%) |
|---|---|
| Human | 94 |
| GPT-4o | 86.8 |
| Claude 3 Opus | 88.5 |
| Llama 3.1 70B | 82.1 |
| Llama 3.1 8B | 69.3 |
## Why It Matters for On-Premise
In enterprise or professional on-premise deployments, confidently stated false answers (hallucinations) carry real business risk. A low TruthfulQA score in your on-premise model should be mitigated with retrieval-augmented generation (RAG) that grounds answers in verified documents, or with output review workflows for high-stakes decisions, as sketched below.