A recent test challenged the reasoning abilities of 53 AI models with a seemingly simple question: "I want to wash my car. The car wash is 50 meters away. Should I walk or drive?" The correct answer, of course, is to drive, since the car itself needs to be taken to the car wash.

Surprising results

The results were surprising. Many models, including Llama 3.1 8B, Llama 3.3 70B, Mistral Small/Medium/Large, and DeepSeek v3.1/v3.2, suggested walking. Among the open-weight models, only GLM-5 and Kimi K2.5 provided the correct answer.

Performance analysis

  • Anthropic: 1 correct answer out of 9 (only Opus 4.6)
  • OpenAI: 1 correct answer out of 12 (only GPT-5)
  • Google: 3 correct answers out of 8 (only Gemini 3 models)
  • xAI: 2 correct answers out of 4 (Grok-4)
  • Perplexity: 2 correct answers out of 3 (with incorrect reasoning)
  • Meta (Llama): 0 correct answers out of 4
  • Mistral: 0 correct answers out of 3
  • DeepSeek: 0 correct answers out of 2

Interestingly, the Perplexity models reached the correct answer through flawed reasoning: citing EPA studies, they argued that walking burns calories that must be replenished with food, whose production consumes energy, making walking more polluting than driving for 50 meters. This highlights how some models can arrive at the right answer through unconventional and questionable reasoning.

This test demonstrates that, despite advances in the field of AI, basic commonsense reasoning remains a significant challenge for many models.