A simulation pitted 12 large language models (LLMs) against each other in managing a virtual food truck, each starting with a budget of $2,000. The goal was to assess their ability to make autonomous decisions on crucial aspects such as location selection, menu design, pricing strategy, staff management, and inventory control.
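To make the setup concrete, the daily economics can be sketched as a tiny simulation step. Everything here is an illustrative assumption (the location list, demand multipliers, elasticity formula, and cost figures are invented for the example); the article does not describe the benchmark's actual mechanics.

```python
import random

# Hypothetical demand multipliers per location -- illustrative values only.
LOCATIONS = {"downtown": 1.5, "suburbs": 1.0, "campus": 1.3}

def simulate_day(cash, location, price, unit_cost=3.0, daily_overhead=150.0, rng=None):
    """Return end-of-day cash after one day of sales at the chosen location/price.

    A toy model: base demand scales with the location multiplier, and a
    simple linear price elasticity reduces demand as the price rises.
    """
    rng = rng or random.Random(0)
    base_demand = 100 * LOCATIONS[location]
    # Higher prices cut demand; below zero demand is clamped to no customers.
    demand = max(0, int(base_demand * (2.0 - price / 5.0)))
    customers = rng.randint(int(demand * 0.8), demand) if demand else 0
    revenue = customers * price
    costs = customers * unit_cost + daily_overhead
    return cash + revenue - costs

# A 30-day run with fixed choices, ending early on bankruptcy.
cash = 2000.0
for day in range(30):
    cash = simulate_day(cash, "downtown", price=8.0, rng=random.Random(day))
    if cash < 0:
        break  # bankrupt
```

In the real benchmark the model, not a fixed policy, picks the location and price each day; this sketch only shows the kind of feedback loop (decisions in, cash balance out) the agents operate in.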

Surprising Results

Of the 12 participating LLMs, only 4 avoided bankruptcy over the 30-day simulation. One model in particular generated a profit of $49,000. Notably, every model that took out a loan went bankrupt, suggesting the agents struggle with debt management.

Gemini and Decision Loops

Another noteworthy result concerns Gemini 3 Flash Thinking, which repeatedly got stuck in infinite decision loops and was therefore unable to complete the simulation. This stability issue appeared in 100% of the runs with that model.
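The failure mode described, an agent endlessly repeating the same sequence of actions, can be caught with a simple cycle check over the action history. This is a hypothetical heuristic sketched for illustration; the article does not say how (or whether) the benchmark detects loops, and the function name and thresholds are assumptions.

```python
def detect_decision_loop(actions, window=4, repeats=3):
    """Return True if the last `window` actions repeat `repeats` times in a row.

    `actions` is the agent's action history as a list of strings.
    Illustrative heuristic only; real harnesses might also use step
    budgets or timeouts to break out of stuck agents.
    """
    needed = window * repeats
    if len(actions) < needed:
        return False
    tail = actions[-needed:]
    cycle = tail[:window]
    # The tail is a loop if every consecutive window matches the first one.
    return all(tail[i:i + window] == cycle for i in range(0, needed, window))
```

A harness could call this after every agent step and abort the run (marking it failed) once it returns True, rather than letting the model spin forever.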

A Testing Ground for AI

The simulation also offers a playable mode, allowing users to try their hand at managing the virtual food truck and compare their performance with that of the LLMs. This type of benchmark can be useful for evaluating the capabilities of AI agents in business contexts and identifying areas for improvement.