## Inference beats Training: vLLM leads the transition

The announcement of the $150 million seed round for vLLM (Inferact), at an $800 million valuation, marks a turning point in the artificial intelligence landscape. For the past two years, investment focused primarily on training foundation models and building massive computing clusters. Now the bottleneck has shifted to inference, i.e., the efficiency with which these models are actually served. This change validates several trends observed in the open-source community:

* **Software > Hardware:** Buying more H100 GPUs is no longer enough. You need an efficient software stack (PagedAttention, specialized kernels) to take full advantage of them; software optimization for inference has become crucial (see the sketch after this list).
* **The Standardization Race:** vLLM aims to become the "Linux of Inference": the default engine running on NVIDIA, AMD, and Intel architectures. It remains to be seen whether, with these resources, the team will focus on horizontal compatibility (making AMD/Intel usable) or vertical optimization (further reducing latency on CUDA). The main challenge is no longer throughput (batched tokens) but latency, especially cold-start times and time-to-first-token (measured in the last sketch below).
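
To make the PagedAttention point concrete, here is a toy sketch of the underlying idea, block-table indirection for the KV cache: memory is allocated in fixed-size blocks on demand instead of pre-reserved for the maximum context length. Names, block size, and structure are illustrative assumptions, not vLLM's actual implementation:

```python
# Toy sketch of PagedAttention-style KV-cache paging; illustrative only,
# not vLLM's real data structures. Block size and names are assumptions.
BLOCK_SIZE = 16  # tokens stored per KV-cache block

class BlockTable:
    """Maps a sequence's logical token positions to physical cache blocks."""
    def __init__(self, free_blocks: list[int]):
        self.free_blocks = free_blocks   # physical pool shared by all sequences
        self.blocks: list[int] = []      # physical block ids owned by this sequence
        self.num_tokens = 0

    def append_token(self) -> tuple[int, int]:
        """Reserve a slot for one new token; grab a block only when the last one is full."""
        if self.num_tokens % BLOCK_SIZE == 0:
            self.blocks.append(self.free_blocks.pop())  # allocate on demand
        slot = self.num_tokens % BLOCK_SIZE
        self.num_tokens += 1
        return self.blocks[-1], slot     # (physical block, offset) to write the KV pair into

# Two sequences share one pool: no per-sequence pre-reservation, so far less
# memory is wasted and more requests fit in a batch.
pool = list(range(1024))
seq_a, seq_b = BlockTable(pool), BlockTable(pool)
for _ in range(20):
    seq_a.append_token()
print(seq_a.blocks)  # e.g. [1023, 1022]: only two blocks for 20 tokens
```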
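
For reference, this is roughly what using vLLM as an inference engine looks like through its offline Python API (the model name is only an example):

```python
# Minimal vLLM offline-inference example; the model name is an example,
# any Hugging Face-compatible checkpoint works.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.7, max_tokens=64)

outputs = llm.generate(["What is PagedAttention?"], params)
print(outputs[0].outputs[0].text)
```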
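
And since time-to-first-token is the metric to watch, here is a rough way to measure it against vLLM's OpenAI-compatible server. This assumes a server started locally with `vllm serve <model>` on its default port; the endpoint, port, and model name are all assumptions:

```python
# Rough TTFT measurement against a local vLLM server's OpenAI-compatible
# endpoint (assumed to be running via `vllm serve` on the default port 8000).
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

start = time.perf_counter()
stream = client.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # must match the served model
    prompt="Explain KV-cache paging in one sentence.",
    max_tokens=64,
    stream=True,
)
first_token_at = None
for chunk in stream:
    if first_token_at is None:
        first_token_at = time.perf_counter()  # first streamed chunk = TTFT

print(f"TTFT: {first_token_at - start:.3f}s")
```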