The results of the AIME 2026 benchmark have been published, showing that both proprietary and open-source models achieved scores above 90%.

DeepSeek V3.2: cost efficiency

A particularly interesting aspect is the performance of DeepSeek V3.2, which completed the entire AIME 2026 test at a cost of only $0.09. This points to a significant optimization of the computational resources required to run the model.
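To put the reported total in perspective, a back-of-the-envelope cost-per-problem calculation is sketched below. The $0.09 figure comes from the text above; the problem count of 15 is an assumption based on the standard length of an AIME paper, not a detail stated in the source.

```python
# Hypothetical cost-per-problem estimate for the reported benchmark run.
# total_cost_usd comes from the article; num_problems is an assumption
# (a standard AIME paper has 15 problems).
total_cost_usd = 0.09
num_problems = 15  # assumption, not stated in the source

cost_per_problem = total_cost_usd / num_problems
print(f"${cost_per_problem:.3f} per problem")
```

Even if the actual problem count differs, the order of magnitude stays well under a cent per problem.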

Relevance for the LLM world

These results matter to the community that develops and deploys large language models (LLMs), as they demonstrate that high performance can be achieved at low cost. For teams evaluating on-premise deployments, however, there are trade-offs to weigh carefully, as highlighted by the analytical frameworks available on /llm-onpremise.