Until yesterday, enterprise AI looked like an all-you-can-eat buffet; today the bill is starting to hurt. Internal recordings from Accenture obtained by 404 Media reveal a phenomenon that is making CFOs tremble: token spending is out of control, and the biggest consumers are not developers but colleagues in marketing, admin and sales.

Token binge eating: the ugly truth

In the crosshairs are seemingly harmless tasks like converting PDFs to presentations or markdown. "Turning PDFs into markdown: is that right?" Stuart Henderson of Accenture asks during an internal meeting, after seeing the data. The answer is an embarrassed yes. Justice Kwak, strategy lead for agentic AI, confirms it: most consumption comes not from engineers but from non-technical staff who use tools like Claude Code, Copilot and Cursor to automate every minor office task.

The most extreme case is Uber: the CTO admitted blowing the entire annual AI budget in just four months. After pushing employees to use AI as much as possible, the company scrambled to impose strict limits. Walmart also hit the brakes after a surge in demand.

The end of the binge: the race to the rescue

The dynamic is triggered by a shift in pricing model: providers like GitHub no longer offer flat subscriptions but charge per token. Without pre-defined budgets or access tiers, any employee can generate thousands of API calls for a trivial task. "Spending is becoming material to the cost structure, and completely unpredictable," says Kwak, noting that CFOs and COOs are starting to ask whether the game is worth the candle.
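To make the idea of pre-defined budgets and access tiers concrete, here is a minimal sketch of a per-employee token allowance. The tier names, the limits and the `TokenBudget` class are all hypothetical, chosen only for illustration; they do not describe any vendor's actual controls.

```python
from dataclasses import dataclass

# Hypothetical monthly token allowances per access tier (illustrative numbers).
TIER_LIMITS = {"basic": 200_000, "power": 2_000_000}


@dataclass
class TokenBudget:
    """Tracks one employee's token usage against their tier's monthly limit."""

    tier: str
    used: int = 0

    @property
    def remaining(self) -> int:
        # Never report a negative allowance.
        return max(TIER_LIMITS[self.tier] - self.used, 0)

    def try_spend(self, tokens: int) -> bool:
        """Record the spend and return True only if it fits the allowance."""
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        if tokens > self.remaining:
            return False  # request would exceed the budget, so it is refused
        self.used += tokens
        return True
```

In this sketch a request that would overrun the allowance is simply refused; a real deployment might instead queue it, downgrade it to a cheaper model, or ask for approval.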

Accenture is rushing to fix this with a product called "Token IQ," designed to attribute token-level cost to real project outcomes. The idea is to give companies a granular lens on AI spend, replacing the aggregate-only visibility that today makes it impossible to tell whether those millions of tokens turned into slides are actually generating value.
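How Token IQ works internally is not described in the recordings, so the following is only a generic sketch of what token-level cost attribution can look like: usage records tagged with a project are priced and summed per project. The record fields, the project names and the per-million-token prices are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical prices in USD per million tokens (not any provider's real rates).
PRICE_PER_MILLION = {"input": 3.0, "output": 15.0}


def cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Price a single request from its input and output token counts."""
    total = (
        input_tokens * PRICE_PER_MILLION["input"]
        + output_tokens * PRICE_PER_MILLION["output"]
    )
    return total / 1_000_000


def cost_by_project(records: list[dict]) -> dict[str, float]:
    """Sum request costs per project tag instead of reporting one aggregate."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        totals[record["project"]] += cost_usd(
            record["input_tokens"], record["output_tokens"]
        )
    return dict(totals)
```

The per-project totals are only as good as the tagging: requests that arrive without a project label still end up in an undifferentiated bucket.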

AI without reins: the control conundrum

The episode marks a watershed: the phase of indiscriminate generative AI adoption is over. After pushing for rapid integration, large consultancies and their clients are discovering that scaling AI is not just about switching on subscriptions. Workflow automation with agents and horizontal use across the organization multiply costs far beyond projections.

For those involved in deployment, the message is clear: without consumption governance, the cloud can become a bottomless pit. This is where the on-premise perspective regains ground. Pay-per-token APIs offer flexibility but make spending opaque and volatile. A self-hosted infrastructure, with LLMs running on dedicated hardware, transforms the cost into a predictable investment, with manageable TCO and full data sovereignty.
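The trade-off between volatile per-token billing and a fixed self-hosted cost can be framed as a break-even calculation. The sketch below is a simplification under stated assumptions: hardware is amortized linearly, all running costs (power, staff, maintenance) are lumped into one monthly figure, and the self-hosted setup is assumed able to serve the full volume. Function and parameter names are hypothetical.

```python
def monthly_api_cost(tokens_per_month: float, usd_per_million_tokens: float) -> float:
    """Pay-per-token bill: scales linearly with usage."""
    return tokens_per_month / 1_000_000 * usd_per_million_tokens


def monthly_self_hosted_cost(
    hardware_usd: float, amortization_months: int, monthly_opex_usd: float
) -> float:
    """Fixed monthly cost: linearly amortized hardware plus running costs."""
    return hardware_usd / amortization_months + monthly_opex_usd


def break_even_tokens(
    hardware_usd: float,
    amortization_months: int,
    monthly_opex_usd: float,
    usd_per_million_tokens: float,
) -> float:
    """Monthly token volume above which self-hosting is the cheaper option."""
    fixed = monthly_self_hosted_cost(hardware_usd, amortization_months, monthly_opex_usd)
    return fixed / usd_per_million_tokens * 1_000_000
```

Below the break-even volume the API remains cheaper, which is why the comparison only favors dedicated hardware for organizations with sustained, high token consumption.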

AI-RADAR: the push toward on-premise

The explosion of per-token costs revives an argument AI-RADAR has long made: when the monthly bill depends on every prompt an employee sends, control slips away. On-premise hosting, whether a GPU cluster or a single server running a quantized LLM, restores financial predictability and granular governance. It does require in-house skills and upfront investment, but it removes the risk of budgets being burned on low-value tasks.

The story is not new: every enterprise technology goes through a Wild West phase before crashing into cost reality. Accenture’s "token ops" is a symptom of necessary maturation. While cloud providers push for unlimited consumption, the more discerning companies are beginning to evaluate hybrid or fully on-premise architectures, where the marginal cost of a single request tends toward zero once the hardware is amortized.

The tokenpocalypse is not just an expense-report problem: it’s the alarm bell pushing companies toward more sober management and, for many, toward self-hosting.