Satya Nadella has issued a warning that sounds almost paradoxical: the CEO of one of the companies dominating the AI market today is calling for the profits generated by this technology not to be absorbed by a handful of players. The statement, reported by AFP, includes no technical details or specific figures, yet it carries weight precisely because it comes from the top of Microsoft, one of the three major cloud providers on which a large share of LLM workloads currently run.

The specter of oligopoly in AI

Nadella’s message touches a raw nerve in the ecosystem: the development and delivery of large language models is consolidating around a few platforms capable of offering compute capacity at planetary scale. Companies like Microsoft (with Azure and the OpenAI partnership), Amazon, and Google control access to the most powerful GPUs, as well as to datasets, orchestration frameworks, and inference pipelines. For enterprises looking to adopt generative AI, the easiest path usually runs through the cloud services of these vendors.

The risk is twofold. On one hand, economic concentration: if the bulk of the value generated by AI flows into the coffers of a very small number of players, it reduces incentives for widespread innovation and raises barriers for startups and independent labs. On the other hand, there is technological dependency: an organization that offloads all its inference and fine-tuning workloads onto a single provider becomes tightly coupled to a proprietary ecosystem, making any future switch complex and expensive.

Self-hosted and on-premise: the path to decoupling

In this context, on-premise deployment (or, more broadly, any self-hosted setup) is not just an architectural choice but a strategic lever. Running LLMs on your own infrastructure, whether in a corporate data center or in an air-gapped environment, allows you to keep control of your data, reduce latency for critical use cases, and, above all, decouple operating costs from the per-token pricing set by cloud providers.

This is not a road without obstacles, of course. The hardware required to serve models with tens of billions of parameters demands GPUs with dozens of gigabytes of VRAM, multi-node configurations with fast interconnects, and an upfront investment (CapEx) that can be daunting. Techniques like quantization (INT8, FP8) and runtime optimization are lowering the entry threshold, but the operational management of update pipelines, security, and scaling remains a specialized skill set.
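To make the hardware arithmetic concrete, here is a rough sizing sketch. It counts only the memory needed for model weights at a given precision and adds a flat overhead margin for KV cache and activations. The 20% margin, the 80 GB GPU, and the function names are illustrative assumptions rather than figures from Nadella's statement or from any vendor, and real deployments depend heavily on context length and batch size.

```python
import math

# Bytes needed to store one parameter at common weight precisions.
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "int8": 1.0}


def weight_memory_gb(params_billions: float, precision: str) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    # (params_billions * 1e9 params) * bytes_per_param / 1e9 bytes per GB
    return params_billions * BYTES_PER_PARAM[precision]


def min_gpus(
    params_billions: float,
    precision: str,
    vram_gb_per_gpu: float,
    overhead_fraction: float = 0.2,  # assumed margin for KV cache and activations
) -> int:
    """Smallest number of GPUs whose combined VRAM fits weights plus overhead."""
    needed_gb = weight_memory_gb(params_billions, precision) * (1 + overhead_fraction)
    return math.ceil(needed_gb / vram_gb_per_gpu)


if __name__ == "__main__":
    # Hypothetical 70-billion-parameter model on hypothetical 80 GB GPUs.
    for precision in ("fp16", "int8"):
        print(
            precision,
            f"{weight_memory_gb(70, precision):.0f} GB of weights,",
            min_gpus(70, precision, vram_gb_per_gpu=80),
            "GPU(s)",
        )
```

Under these assumptions, halving the bytes per parameter roughly halves the weight footprint, which is why quantization lowers the entry threshold. The estimate ignores any accuracy impact and the interconnect requirements of multi-GPU serving.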

AI-RADAR’s read: trade-offs and sovereignty

For those weighing a shift from cloud to self-hosted, the key variable is total cost of ownership (TCO) over a time horizon of at least three years. Service costs for inference on high-end models, multiplied by millions of monthly calls, can quickly surpass the expense of purchasing and maintaining a cluster of GPU nodes. Add to this sovereignty and compliance requirements, which are increasingly pressing in sectors like healthcare, finance, and public administration, where GDPR and local regulations restrict where data may be stored and processed.
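The TCO comparison can be sketched as a simple break-even calculation. Every number below (call volume, tokens per call, price per million tokens, CapEx, monthly operating cost) is a hypothetical placeholder chosen for illustration, not a quoted price, and the model deliberately leaves out factors such as hardware depreciation, staffing ramp-up, and utilization.

```python
from typing import Optional


def cloud_cost(calls_per_month: float, tokens_per_call: float,
               price_per_million_tokens: float, months: int) -> float:
    """Cumulative per-token spend on a cloud API after `months` months."""
    monthly_tokens_millions = calls_per_month * tokens_per_call / 1_000_000
    return monthly_tokens_millions * price_per_million_tokens * months


def self_hosted_cost(capex: float, opex_per_month: float, months: int) -> float:
    """Cumulative self-hosted spend: upfront hardware plus running costs."""
    return capex + opex_per_month * months


def break_even_month(calls_per_month: float, tokens_per_call: float,
                     price_per_million_tokens: float, capex: float,
                     opex_per_month: float,
                     horizon_months: int = 36) -> Optional[int]:
    """First month where self-hosting is no more expensive than cloud, if any."""
    for month in range(1, horizon_months + 1):
        cloud = cloud_cost(calls_per_month, tokens_per_call,
                           price_per_million_tokens, month)
        if self_hosted_cost(capex, opex_per_month, month) <= cloud:
            return month
    return None  # no break-even within the horizon


if __name__ == "__main__":
    # All inputs are hypothetical placeholders.
    month = break_even_month(
        calls_per_month=5_000_000,
        tokens_per_call=1_000,
        price_per_million_tokens=10.0,
        capex=590_000,
        opex_per_month=20_000,
    )
    print("break-even month:", month)
```

With these placeholder inputs the cloud bill is 50,000 per month, so the cluster pays for itself within the three-year horizon. At a tenth of the call volume it never does, and the function returns `None`. The point of the sketch is that the outcome is driven by volume as much as by unit price.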

Nadella’s warning is not just about the distribution of profits; it touches on who will control the cognitive infrastructure of the next decade. One possible answer is precisely a multiplication of local, distributed, and interoperable inference nodes, capable of reducing the gatekeeping power of the large providers. On AI-RADAR you can find analytical tools and decision frameworks to map these trade-offs in the section dedicated to on-premise deployments.

Beyond rhetoric: a structural challenge

The words of a CEO like Nadella should not be dismissed as a rhetorical exercise. They reflect a growing awareness that the current trajectory risks becoming unsustainable even for the very actors who benefit from it today. If the AI ecosystem closes in on itself, aggregate demand could slow, suffocated by excessive prices and a lack of alternatives. The opposite direction, a more fragmented market with hybrid and on-premise deployments, could instead generate more widely distributed value, accelerate vertical innovation, and create the conditions for genuine mass adoption. In this sense, the real proving ground will be the ability of enterprises to run their own AI infrastructure, with all that it entails in terms of skills and investment.