In a market dominated by Nvidia and its GPUs for artificial intelligence, the news that SambaNova Systems is targeting a $10 billion valuation is more than a financial announcement. It is a gauge of how voracious the appetite for cheaper Large Language Model (LLM) inference has become, and of how eagerly the industry is looking for viable alternatives.

The California-based company, founded by veterans of parallel computing and of Silicon Valley, is known for a radically different approach: instead of adapting GPUs to AI workloads, it builds processors called Reconfigurable Dataflow Units (RDUs) from the ground up, optimized to run large models with what the company describes as significantly lower energy consumption and cost per token than traditional graphics accelerators.

The rumored funding round – expected imminently – represents a turning point. Ten billion dollars is not a random threshold: it signals that investors are betting on a technology capable of challenging Nvidia's dominance in the inference segment, where companies spend the bulk of their AI operational budgets.

Why token cost has become the new battleground

Until recently, the debate centered on training: who had the largest cluster, who could train a model with hundreds of billions of parameters. Today, with the proliferation of open-source models and the widespread deployment of LLM-based assistants, inference – that is, using the model to respond to each individual query – has become the dominant expense. Training is paid for once, while inference is paid for on every request. It is no coincidence that the market looks favorably on solutions that promise to slash token costs.
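A rough way to see why inference comes to dominate is to compare a one-time training bill with a recurring per-token bill. The sketch below is illustrative only: the function name and every parameter are hypothetical, and none of the figures come from SambaNova, Nvidia, or any published price list.

```python
import math


def days_until_inference_exceeds_training(
    training_cost_usd: float,
    queries_per_day: float,
    tokens_per_query: float,
    usd_per_million_tokens: float,
) -> int:
    """Return the first whole day on which cumulative inference spend
    exceeds a one-time training cost.

    All inputs are assumptions supplied by the caller; the model ignores
    growth in traffic, price changes, and retraining.
    """
    daily_inference_usd = (
        queries_per_day * tokens_per_query * usd_per_million_tokens / 1_000_000
    )
    if daily_inference_usd <= 0:
        raise ValueError("daily inference spend must be positive")
    # Smallest integer n such that n * daily spend is strictly greater
    # than the training cost.
    return math.floor(training_cost_usd / daily_inference_usd) + 1


if __name__ == "__main__":
    # Hypothetical workload: 1M queries/day, 1,000 tokens each, $2 per 1M tokens.
    days = days_until_inference_exceeds_training(
        training_cost_usd=1_000_000,
        queries_per_day=1_000_000,
        tokens_per_query=1_000,
        usd_per_million_tokens=2.0,
    )
    print(f"Inference overtakes training after {days} days")
```

Halving the per-token price in this toy model roughly doubles the time before inference overtakes training, which is why a lower cost per token matters so much to operators with steady query volumes.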

SambaNova enters this context with a precise architectural bet: a software-reconfigurable dataflow architecture that configures the chip's compute and memory units to match the model's topology. It is not just a chip. It is an entire system, DataScale, which includes hardware, software, and networking, designed for enterprise deployment. The company has built a solid reputation among research labs and large banks, where the ability to run LLMs without relying on the public cloud is vital for data sovereignty and latency.

Beyond the GPU monopoly: on-premise finds new allies

Those operating in on-premise or air-gapped environments – banks, healthcare facilities, government agencies – know the problem well: high-end Nvidia GPUs are not only expensive, but often difficult to obtain and to cool. A system like DataScale promises to handle inference workloads with a smaller physical footprint and lower energy consumption, reducing the Total Cost of Ownership (TCO) over a multi-year horizon.

TCO covers more than hardware. It encompasses energy, maintenance, rack space, and most importantly, the cost of specialized personnel. In this sense, SambaNova's offering – which proposes an integrated stack and a near-managed experience – aims to simplify operations, making on-premise not just a compliance obligation but an economically sustainable choice.
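The TCO components listed above can be turned into a simple multi-year estimate and then into a cost per million tokens. What follows is a minimal sketch under stated assumptions: the `Deployment` class, its field names, and the example values are hypothetical placeholders, not vendor data, and the model ignores depreciation, financing, and hardware refresh cycles.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 24 * 365


@dataclass
class Deployment:
    """Hypothetical on-premise inference system; all values are assumptions."""

    hardware_usd: float              # upfront purchase price
    avg_power_kw: float              # average draw, cooling overhead included
    usd_per_kwh: float               # local electricity tariff
    maintenance_usd_per_year: float  # support contracts, spare parts
    rack_usd_per_year: float         # data center floor and rack space
    staff_usd_per_year: float        # specialized personnel

    def energy_usd_per_year(self) -> float:
        return self.avg_power_kw * HOURS_PER_YEAR * self.usd_per_kwh

    def tco_usd(self, years: int) -> float:
        """Upfront hardware plus recurring costs over the given horizon."""
        recurring = (
            self.energy_usd_per_year()
            + self.maintenance_usd_per_year
            + self.rack_usd_per_year
            + self.staff_usd_per_year
        )
        return self.hardware_usd + years * recurring


def usd_per_million_tokens(
    deployment: Deployment,
    years: int,
    tokens_per_second: float,
    utilization: float,
) -> float:
    """Spread the multi-year TCO over the tokens actually served.

    `utilization` is the fraction of time (0..1) the system is busy.
    """
    seconds = years * HOURS_PER_YEAR * 3600
    total_tokens = tokens_per_second * utilization * seconds
    return deployment.tco_usd(years) / total_tokens * 1_000_000


if __name__ == "__main__":
    system = Deployment(
        hardware_usd=100_000,
        avg_power_kw=10,
        usd_per_kwh=0.10,
        maintenance_usd_per_year=5_000,
        rack_usd_per_year=3_000,
        staff_usd_per_year=20_000,
    )
    print(f"3-year TCO: ${system.tco_usd(3):,.0f}")
    print(f"Cost per 1M tokens: ${usd_per_million_tokens(system, 3, 1_000, 0.5):.2f}")
```

Running the same calculation for two candidate systems with measured throughput and power draw gives a like-for-like comparison. In this toy example the staff line outweighs the energy line, which is consistent with the point that personnel is often the largest recurring item.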

The pursuit of an eleven-figure valuation suggests that customers are beginning to accept this reasoning. For those tracking local deployment dynamics, it is a sign that GPU alternatives are leaving the experimental phase and approaching large-scale adoption.

The signal for the industry: there is no single silicon for AI

The news also carries a broader message: the era of 'one-size-fits-all' silicon for AI is coming to an end. Just as Nvidia leveraged its CUDA platform for training, new players are carving out niches – and not small ones – in inference. SambaNova is not alone: Graphcore, Cerebras, Groq, and others are all competing to prove that a dedicated architecture can beat general-purpose hardware.

For on-premise AI watchers, this multiplication of options is good news. It means greater bargaining power, lower prices, and the ability to choose the right hardware for the workload, without being locked into a single vendor ecosystem. However, it also introduces complexity: the lack of established standards and the need to thoroughly evaluate each solution are real hurdles. Analytical frameworks, such as those AI-RADAR dedicates to comparing on-premise options, become essential for navigating the transition.

Beyond the valuation: what to watch in the coming months

The ten-billion figure is only a starting point. Investor interest will have to translate into contracts, implementations, and documented use cases. The next thing to watch is whether SambaNova can scale production and meet demand without hiccups. In a sector where announcements of miracle chips have often clashed with the harsh reality of fabrication, the real test is always volume availability.

For the Italian and European ecosystem – increasingly attentive to digital sovereignty and reducing dependencies – the prospect of accessing modern, cost-effective inference hardware could accelerate plans for LLM adoption in sectors such as public administration, manufacturing, and healthcare. It will be interesting to see whether promises of low energy consumption translate into concrete savings on local data center electricity bills.