Perplexity is the exponentiated average negative log-likelihood of a text sequence under the model. Intuitively, it measures how "surprised" the model is by the text — a perfect model that always assigns probability 1 to the correct next token has PPL = 1.
## Formula
PPL = exp( -1/N × Σᵢ log P(token_i | context) )

where N is the number of tokens in the sequence and the sum runs over i = 1…N.
A model with PPL = 5 on a dataset is, on average, as uncertain as if it were choosing uniformly among 5 equally likely next tokens. Lower is better.
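The formula above is straightforward to compute if you have the model's log-probability for each actual next token. A minimal sketch (the `perplexity` helper is illustrative, not from any particular library):

```python
import math

def perplexity(log_probs: list[float]) -> float:
    """PPL = exp(-1/N * sum of log-probs), where log_probs[i] is the
    natural-log probability the model assigned to the actual token i."""
    n = len(log_probs)
    return math.exp(-sum(log_probs) / n)

# Sanity check of the intuition above: a model that assigns P = 0.2
# to every correct next token is "choosing among 5 options" -> PPL ≈ 5.
print(perplexity([math.log(0.2)] * 10))
```

Note the exponentiation at the end: the average negative log-likelihood itself (in nats, or bits if you use log base 2) is the quantity models are actually trained to minimize; PPL is just its exponentiated, more interpretable form.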
## PPL as a Quantization Quality Metric
| Format | Llama 3 8B PPL (WikiText-2) | Degradation vs FP16 |
|---|---|---|
| FP16 | 6.14 | — |
| Q8_0 | 6.17 | +0.5% |
| Q6_K | 6.20 | +1.0% |
| Q5_K_M | 6.25 | +1.8% |
| Q4_K_M | 6.35 | +3.4% |
| Q3_K_M | 6.73 | +9.6% |
| Q2_K | 8.10 | +32% |
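The degradation column is just the relative PPL increase over the FP16 baseline. A quick sketch reproducing it from the table's values (the table rounds Q2_K to +32%):

```python
# PPL values copied from the table above (Llama 3 8B, WikiText-2).
fp16 = 6.14
quantized = {"Q8_0": 6.17, "Q6_K": 6.20, "Q5_K_M": 6.25,
             "Q4_K_M": 6.35, "Q3_K_M": 6.73, "Q2_K": 8.10}

for fmt, ppl in quantized.items():
    degradation = (ppl / fp16 - 1) * 100  # percent increase vs FP16
    print(f"{fmt}: +{degradation:.1f}%")
```

Note the non-linearity: each step down from Q8 to Q4 costs only a percent or two, but below 4 bits the degradation accelerates sharply.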
## Limitations of PPL as a Quality Proxy
PPL measures general language fluency, not task competence. A 3% perplexity increase from Q4 quantization may cause no perceptible degradation on most chat or instruction-following tasks, whereas a 30% increase (Q2) visibly hurts coherence. For task-specific evaluation, always measure on your actual use case (benchmark accuracy, ROUGE, human preference) rather than relying solely on PPL.
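As one concrete example of a task-specific metric, ROUGE-1 scores summarization output by unigram overlap with a reference. A minimal, dependency-free sketch (real evaluations typically use an established package rather than hand-rolled scoring):

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: harmonic mean of unigram precision and recall
    between a candidate summary and a reference summary."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # shared unigrams, with counts
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

Running such a metric on FP16 versus quantized outputs of your own workload tells you directly whether a given PPL delta matters in practice.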