LLMs and mathematics: a complicated relationship

Today's AI models, LLMs in particular, are built on prediction engines. They tend to produce the most probable answer to a problem, which is not necessarily the correct one, especially in mathematics.
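The gap between "most likely" and "correct" can be sketched with a toy example. This is only an illustration: the candidate answers and their probabilities are invented, the `most_likely` helper is hypothetical, and no real model is queried.

```python
# Toy illustration only: the probabilities below are made up,
# not taken from any real model.

def most_likely(candidates: dict[str, float]) -> str:
    """Return the candidate with the highest probability (a greedy choice)."""
    return max(candidates, key=candidates.get)

# Hypothetical distribution over answers to "17 * 24".
candidates = {"408": 0.34, "418": 0.41, "398": 0.25}

predicted = most_likely(candidates)  # what a pure predictor would pick
exact = str(17 * 24)                 # what actual arithmetic gives

print(f"predicted={predicted} exact={exact} match={predicted == exact}")
```

In this made-up case the most probable candidate is not the exact product: a plausible-looking answer can outrank the right one.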

Progress and limitations

Despite recent progress, the most popular models still fall short. Even Gemini 3 Flash, considered one of the most advanced models, would barely earn a passing grade if judged on its mathematical abilities alone. Calculation remains a weak point for these architectures.

For those evaluating on-premises deployments, there are trade-offs to consider. AI-RADAR offers analytical frameworks at /llm-onpremise for evaluating them.