This isn’t just another mega-round. The $800 million injection into Together AI, led by Aramco Ventures and backed by names like Nvidia, Vista Equity Partners, and General Catalyst, signals that the market for cloud-based inference and training of open-source models has outgrown its niche phase. The company’s valuation, now exceeding $8 billion, rests on growth that Together AI itself quantifies as more than $1 billion in revenue generated so far. Vague as that figure is, it captures enterprises’ appetite for alternatives to the proprietary giants.

The platform provides on-demand access to LLMs such as Llama, Mistral, and Falcon, hosting them on GPUs and handling orchestration, scaling, and APIs. For many teams, this is a shortcut that spares them hardware provisioning and pipeline setup. Yet the European context calls for a closer look.

The data residency knot

Whenever a company bound by GDPR or sector-specific rules (healthcare, finance, defense) picks a cloud provider, it must ask where the model weights and user data physically reside. Together AI operates across distributed data centers, but guaranteeing data residency within EU borders isn’t automatic: it requires contractual agreements, technical audits, and often a level of trust that not all organizations can extend. This explains why many Italian enterprises are exploring on-premise or hybrid deployments, which keep hardware under their control and ensure that sensitive data doesn’t cross uncertain jurisdictional lines.

GPUs, TCO, and the open-source push

The record funding also shows that demand for open-source LLM compute is surging. Yet those evaluating self-hosting know that GPU purchase price isn’t the only variable: VRAM capacity, memory bandwidth, and energy consumption mark the line between a smooth deployment and an unsustainable one. Together AI absorbs this complexity and exposes it as APIs, but in the long run, for predictable workloads and confidential data, the total cost of ownership (TCO) of an on-premise cluster running quantized models can flip the convenience equation.
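To make the trade-off concrete, here is a minimal back-of-the-envelope sketch in Python. It is an illustration, not a sizing tool: the function names are hypothetical, the memory estimate covers model weights only (KV cache, activations, and runtime overhead are ignored), and every cost figure in the usage lines is a placeholder assumption rather than a real quote from Together AI or any hardware vendor.

```python
from typing import Optional


def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate memory needed for model weights alone, in GB (10^9 bytes).

    Assumption: KV cache, activations, and framework overhead are ignored,
    so real VRAM requirements will be higher.
    """
    return params_billion * bits_per_weight / 8


def break_even_months(
    capex: float,
    onprem_monthly_opex: float,
    cloud_monthly_cost: float,
) -> Optional[float]:
    """Months after which an on-premise cluster becomes cheaper than cloud.

    Returns None when on-premise running costs are not lower than the cloud
    bill, in which case the upfront investment is never recovered.
    """
    monthly_saving = cloud_monthly_cost - onprem_monthly_opex
    if monthly_saving <= 0:
        return None
    return capex / monthly_saving


# Illustrative figures only (assumptions, not vendor pricing).
fp16 = weight_memory_gb(70, 16)   # 70B-parameter model at 16 bits per weight
int4 = weight_memory_gb(70, 4)    # same model quantized to 4 bits per weight
months = break_even_months(
    capex=120_000,                # hypothetical hardware purchase
    onprem_monthly_opex=3_000,    # hypothetical power, cooling, staff share
    cloud_monthly_cost=13_000,    # hypothetical API or GPU rental bill
)
print(f"FP16 weights: {fp16:.0f} GB, 4-bit weights: {int4:.0f} GB")
print(f"Break-even after {months:.1f} months")
```

The point of the sketch is the shape of the calculation: quantization shrinks the weight footprint in proportion to bits per weight, which in turn reduces how much hardware the upfront investment must cover, while the break-even horizon depends entirely on how steady the workload is. With spiky or uncertain demand, the monthly cloud figure is hard to pin down and the comparison loses its footing.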

In this picture, Nvidia’s presence among investors is no side note. The chipmaker has every interest in seeing GPU consumption explode, whether via cloud or private data centers. The side effect, however, is a supply scramble that could extend lead times even for those building local infrastructure.

For organizations facing the deployment decision, AI-RADAR has developed analytical frameworks on /llm-onpremise that help map these trade-offs without ideological shortcuts. The rise of platforms like Together AI doesn’t close the chapter: it makes the picture more nuanced, because the real watershed isn’t between cloud and on-premise, but between those who can afford to delegate data sovereignty and those who cannot.