A simulation pitted 12 large language models (LLMs) against each other at running a virtual food truck with an initial budget of $2,000. The goal was to assess how well they make autonomous decisions about location selection, menu design, pricing strategy, staff management, and inventory control.
Surprising Results
Of the 12 participating LLMs, only 4 avoided bankruptcy over the 30-day simulation. One model generated a profit of $49,000. Notably, every model that opted for a loan failed, which suggests that the models have difficulty managing debt.
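To make the bankruptcy criterion concrete, here is a minimal sketch of how a 30-day run with a $2,000 starting budget might be scored. It is not the simulation's actual code: the `run_simulation` function, the `RunResult` type, the "cash below zero" bankruptcy rule, and the loan terms are all assumptions made for illustration. Only the starting budget and the 30-day length come from the article.

```python
from dataclasses import dataclass

STARTING_CASH = 2_000.0   # initial budget reported in the article
SIM_DAYS = 30             # simulation length reported in the article


@dataclass
class RunResult:
    days_survived: int
    final_cash: float
    bankrupt: bool


def run_simulation(daily_net, loan=0.0, daily_repayment=0.0):
    """Run up to SIM_DAYS days and stop early if cash drops below zero.

    daily_net: hypothetical callable, day number -> net income for that day
               before any loan repayment.
    loan, daily_repayment: hypothetical loan terms, not taken from the article.
    """
    cash = STARTING_CASH + loan
    for day in range(1, SIM_DAYS + 1):
        cash += daily_net(day) - daily_repayment
        if cash < 0:  # assumed bankruptcy rule
            return RunResult(days_survived=day, final_cash=cash, bankrupt=True)
    return RunResult(days_survived=SIM_DAYS, final_cash=cash, bankrupt=False)


# A truck losing $50 a day survives the month without a loan...
no_loan = run_simulation(lambda day: -50.0)
# ...but the same truck with a loan and a fixed daily repayment does not.
with_loan = run_simulation(lambda day: -50.0, loan=1_000.0, daily_repayment=60.0)
```

In this toy setup the loan adds cash up front but also a fixed daily cost, so an operation that was already losing money runs out of cash before day 30. That is one possible way debt could hurt an agent, not a claim about why the models in the simulation failed.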
Gemini and Decision Loops
Another noteworthy result concerns Gemini 3 Flash Thinking, which repeatedly got stuck in infinite decision loops and could not complete the simulation. This stability issue appeared in 100% of the tests performed with this model.
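A benchmark harness typically has to detect this kind of loop instead of waiting forever. The sketch below shows one common approach, a cap on the number of decisions per simulated day. The `run_day` function, the `"end_day"` action name, and the `max_steps` limit are hypothetical; the article does not describe how the simulation handled looping agents.

```python
class DecisionLoopError(RuntimeError):
    """Raised when an agent exceeds its per-day decision budget."""


def run_day(agent_step, max_steps=50):
    """Call agent_step() until it returns "end_day" or the budget is spent.

    agent_step: hypothetical callable that returns the next action name.
    max_steps: hypothetical cap; the real limit, if any, is not documented.
    """
    actions = []
    for _ in range(max_steps):
        action = agent_step()
        actions.append(action)
        if action == "end_day":
            return actions
    raise DecisionLoopError(f"no 'end_day' after {max_steps} steps")
```

With a guard like this, an agent that keeps issuing the same action without ever ending the day produces a `DecisionLoopError` that the harness can record as a failed run.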
A Testing Ground for AI
The simulation also offers a playable mode, allowing users to try their hand at managing the virtual food truck and compare their performance with that of the LLMs. This type of benchmark can be useful for evaluating the capabilities of AI agents in business contexts and identifying areas for improvement.