A crack in the “pay for the best” paradigm

For years the message was simple: if you want the most capable model, pay for a closed API. If you want something cheaper, accept a performance hit. Today that paradigm is showing clear cracks. Looking at recent releases, the tradeoff is dissolving, as seen on the model map: the upper-left quadrant – high intelligence at low cost – is increasingly crowded with open-weight models. DeepSeek, Qwen, GLM, Kimi, MiniMax: projects that a year ago were outsiders are now competing for a space that, according to market dynamics, was reserved for model-as-a-service.

The new model geography: cheaper, almost as good

The most interesting aspect isn’t the race to the absolute top. It’s the emergence of a competitive area where the capability gap between a frontier model and an open model is becoming smaller than the cost difference. For the vast majority of real-world workloads, you don’t need the best model on Earth. You need an LLM that’s good enough and cheap enough. And that’s exactly where open models are becoming extremely competitive. The question is no longer “how smart is it?”, but “how much does that extra 5% of capability cost?”.

Self-hosting: the economic return of control

This evolution has direct implications for those evaluating on-premise deployment. If the open model delivers comparable performance at a fraction of the cost, the TCO calculation changes dramatically. Closed APIs offered zero infrastructure and immediate reliability, but couldn’t compete on issues dear to companies handling sensitive data: full control, verifiable privacy, no vendor lock-in. Open models flip the perspective: you can run inference on hardware you own, apply quantization to optimize VRAM, and get predictable costs without per-token pricing surprises. Self-hosted is no longer just an ideological choice, but an economically rational option.

Why the closed API risks becoming a luxury

Of course, closed models retain advantages: fast access to cutting-edge capabilities, no infrastructure management, lower operational overhead. But when the quality gap shrinks, paying 10 times more for a marginal improvement becomes hard to justify. The prediction points to the next 12–18 months: businesses won’t just ask which model is smartest, but why spend a multiple for an often imperceptible advantage. And they’ll closely check comparisons with the open-weight counterpart, perhaps running on consumer GPU clusters or enterprise-grade GPU servers.

In this scenario, data sovereignty and model customization become strategic levers, not afterthoughts. For AI-RADAR readers, accustomed to evaluating local stacks, the signal is clear: the market is shifting under the feet of big API providers, while the open ecosystem gears up to offer – at low cost – what until yesterday was a luxury good.