A Reddit user shared results from MineBench, a benchmark focused on spatial reasoning, comparing the performance of Qwen 3 Max-Thinking and Qwen 3.5.
Benchmark Results
The results indicate a significant improvement from Qwen 3 Max-Thinking to Qwen 3.5. According to the benchmark author, some builds of Qwen 3.5 proved competitive with high-end models such as Opus 4.6, GPT-5.2, and Gemini 3 Pro.
MineBench
MineBench is a benchmark created to evaluate the spatial reasoning capabilities of language models. The source code and further details about the benchmark are available on GitHub.