Mark Zuckerberg admitted to Meta employees that the company's AI agents are not progressing as quickly as expected. The statement comes four months after an internal reorganization intended specifically to accelerate the development of agentic AI. The news, reported by The Next Web, reveals a sensitive phase: even for a giant like Meta, turning Large Language Models into reliable autonomous agents is proving more complex than anticipated.

AI assistants capable of acting proactively—booking appointments, writing code, handling transactions—require far more than a language model trained to complete sentences. They need planning, long-term memory, the ability to interact with external tools, and strict control over the actions taken. Each step of an agent must be generated, validated, and executed, often chaining dozens of calls to the model. This multiplies latency and computational demand compared to traditional inference, where the model produces a one-off response.

For those building or managing on-premise infrastructure, these difficulties have concrete weight. Running agentic pipelines on local hardware means dealing with limited VRAM resources, context windows that explode as dialogue turns accumulate, and the need to optimize inference through aggressive quantization techniques. An agent that must repeatedly call a model for a single task can quickly saturate a consumer GPU or server workstation, complicating proper workload sizing.

Meta’s experience, while operating at a cloud scale that is hard to replicate, signals how premature it is to consider these systems ready for widespread use. Even in self-hosted mode, companies experimenting with frameworks like LangChain or AutoGPT often face unpredictable behaviors, decision loops, and energy consumption costs higher than expected due to the length of chained reasoning.

The Total Cost of Ownership of an on-premise agentic infrastructure can grow faster than a simple token count suggests, because real-world usage leads to inference sequences much longer than lab experiments. Without careful memory management and model interaction, even systems based on Llama or Mistral can demand memory far beyond initial estimates, hitting VRAM limits that force compromises on the number of parallel users served.

Zuckerberg’s statement does not deny the strategic importance of agents, but makes it clear that the road to truly autonomous systems is still long. For the on-premise ecosystem, this means there is time to refine tools and best practices, but also that upcoming model releases will need to show significant progress in handling multi-step tasks to deliver concrete advantages outside hyperscale data centers.