LiveCodeBench (Jain et al., 2024) addresses a fundamental flaw in static code benchmarks: training contamination. Because LeetCode and Codeforces problems are widely distributed online, models trained after their release can memorise solutions. LiveCodeBench continuously collects new problems released only after each model's training cutoff, ensuring evaluation reflects genuine coding ability.
## Design Philosophy
| Property | Detail |
|---|---|
| Sources | LeetCode (contest), Codeforces, AtCoder |
| Update frequency | Monthly (new contests added continuously) |
| Contamination prevention | For each model, only problems released after its training cutoff are evaluated |
| Difficulty range | Easy / Medium / Hard (per platform ratings) |
| Metric | pass@1 (execution against hidden test cases) |
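The pass@1 metric in the table above can be computed with the standard unbiased pass@k estimator (Chen et al., 2021); for k = 1 it reduces to the fraction of sampled solutions that pass all hidden tests. A minimal sketch (the function name is illustrative):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021).

    n -- number of solution samples generated for a problem
    c -- number of those samples that pass all hidden test cases
    """
    if n - c < k:
        return 1.0  # too few failing samples to fill a k-subset; guaranteed pass
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 10 samples and 3 passing, pass@1 is simply c/n = 0.3
print(round(pass_at_k(10, 3, 1), 6))
```

Averaging this per-problem value over the benchmark gives the headline pass@1 score.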
## Why Contamination Matters
A model trained on data up to December 2024 may have seen solutions to every LeetCode problem published before that date, making its HumanEval and static LeetCode performance misleadingly high. LiveCodeBench's rolling-window approach makes it one of the most trustworthy coding benchmarks available.
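The rolling-window idea can be sketched as a simple date filter: each problem carries its contest release date, and a model is evaluated only on problems released strictly after its training cutoff. The record fields and problem IDs below are illustrative, not LiveCodeBench's actual schema:

```python
from datetime import date

# Hypothetical problem records; in practice each LiveCodeBench problem is
# tagged with the date of the contest it appeared in.
problems = [
    {"id": "lc-weekly-412", "released": date(2024, 9, 15)},
    {"id": "cf-round-995",  "released": date(2025, 1, 4)},
    {"id": "atcoder-abc390", "released": date(2025, 3, 22)},
]

def eval_window(problems: list[dict], model_cutoff: date) -> list[dict]:
    """Keep only problems released strictly after the model's training cutoff."""
    return [p for p in problems if p["released"] > model_cutoff]

clean = eval_window(problems, model_cutoff=date(2024, 12, 31))
print([p["id"] for p in clean])  # the September 2024 problem is excluded
```

Because the window is per-model, two models with different cutoffs are scored on different (but individually uncontaminated) problem subsets.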
## Scores (pass@1, Hard subset)
| Model | Hard pass@1 |
|---|---|
| o3 (2025) | 69.8% |
| Claude 3.7 Sonnet | 56.1% |
| DeepSeek-R1 | 52.7% |
| GPT-4o | 35.1% |
| Llama 3.1 70B | 22.4% |