AI-RADAR.IT · AI-RADAR.NET · AI-RADAR.TECH

News & analysis on local LLMs, stack & on-prem hardware.

📁 LLM AI generated

AI Is Acing Math Exams Faster Than Scientists Write Them

Published on 2026-02-25 19:07 · Source: IEEE Spectrum

AI surpasses humans on PhD-level math tests

Artificial intelligence is advancing rapidly in mathematics, solving complex problems at a surprising pace. Traditional math benchmarks are struggling to keep up.

FrontierMath: a new challenge

Epoch AI has introduced FrontierMath, a rigorous benchmark designed to assess the mathematical reasoning of the latest AI tools. It consists of advanced math problems divided into tiers of increasing difficulty. The most advanced AI models, such as ChatGPT 5.2 Pro and Claude Opus 4.6, solve over 40% of the problems in the first three tiers and over 30% of the problems in the most advanced tier.

Aletheia and PhD-level mathematics

Google DeepMind recently announced that Aletheia, an experimental AI system derived from Gemini Deep Think, has produced publishable, PhD-level research results. Although the specific mathematical problem is niche, the result is significant for AI development: Aletheia operated essentially autonomously, without human guidance, and produced a new result.

The First Proof challenge

To address the need for more challenging benchmarks, a group of mathematicians proposed the First Proof challenge, a series of extremely difficult math problems. No one submitted correct solutions to all of the problems by the deadline. OpenAI's AI, working with limited human supervision, solved five of the ten problems.

A new frontier for AI

Epoch AI has also introduced FrontierMath: Open Problems, a pilot benchmark of open problems from research mathematics that professional mathematicians have tried and failed to solve. No AI has yet solved any of them. These new approaches aim to assess AI capabilities in areas of mathematics that matter to researchers.

AI-Radar Takeaway

Artificial intelligence systems are rapidly getting better at solving complex mathematical problems, surpassing the capabilities of scientists in some areas. Because existing benchmarks quickly become obsolete, new ones are needed to assess what AI can actually do. Google DeepMind announced that Aletheia, an experimental AI system, has achieved publishable PhD-level results.
