New Custom V100 GPUs for On-Premise Deployments

The hardware landscape for artificial intelligence workloads continues to evolve, with increasing focus on solutions that balance performance, cost, and deployment flexibility. A video recently published on the Chinese platform Bilibili showed heavily customized NVIDIA V100 cards built in an unusual form factor: single-slot and half-height. According to the video, these GPUs integrate NVLink and retain the full performance of the original core, which opens new options for on-premise deployments, particularly where space and power are constrained.

The project, attributed to a creator known as “显卡仙人” (literally “graphics card immortal”), is not yet widely available for purchase, but early pre-orders suggest significant interest. Custom solutions like this highlight the demand for specialized hardware that fits existing infrastructure or specific requirements, rather than the standard formats designed for large data centers. For CTOs and infrastructure architects, such innovations are an opportunity to optimize Total Cost of Ownership (TCO) and computational density.

Technical Details and Power Options

The customized V100 cards are compact: 16 centimeters long and 7.5 centimeters high. This single-slot, half-height form factor matters for servers with little PCIe slot space and for edge systems where space is at a premium. Notably, these are not simple adapter boards: the GPU cores are soldered directly onto custom-designed printed circuit boards (PCBs), an approach intended to preserve stable, full performance.

Power and cooling are crucial. The base version is passively cooled and draws power exclusively from the PCIe slot, with a maximum consumption of 75W, which suits low-power systems. A second version with an external power connector is also planned; it supports up to 300W, unlocking the GPU's full potential for more intensive workloads. According to the video, benchmarks on both variants confirm that V100 core performance is retained. The 16GB VRAM version will go on sale first, with a 32GB VRAM variant under development.
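To make the 16GB versus 32GB choice more concrete, here is a minimal sketch that estimates whether a model's weights fit on each card. It assumes weights take parameters × bytes per parameter, adds a flat 20% overhead for KV cache and runtime buffers, and treats the advertised capacity as GiB; all of these are simplifications, and the function names are hypothetical.

```python
# Rough VRAM sizing sketch (assumptions, not measurements):
# weights = parameters x bytes per parameter, scaled by a flat overhead
# factor standing in for KV cache, activations and runtime buffers.

BYTES_PER_PARAM = {
    "fp16": 2.0,  # Volta tensor cores accelerate FP16; there is no native BF16
    "int8": 1.0,
    "int4": 0.5,
}

GIB = 2**30


def estimated_vram_gib(params_billion: float, precision: str,
                       overhead: float = 1.2) -> float:
    """Estimate the VRAM (GiB) needed to serve a model of the given size."""
    weight_bytes = params_billion * 1e9 * BYTES_PER_PARAM[precision]
    return weight_bytes * overhead / GIB


def fits(params_billion: float, precision: str, vram_gib: float,
         overhead: float = 1.2) -> bool:
    """True if the estimate stays within the card's VRAM.

    vram_gib treats the advertised 16GB / 32GB capacity as GiB (a simplification).
    """
    return estimated_vram_gib(params_billion, precision, overhead) <= vram_gib


if __name__ == "__main__":
    for params, prec in [(7, "fp16"), (13, "fp16"), (13, "int4"), (70, "int4")]:
        need = estimated_vram_gib(params, prec)
        print(f"{params}B {prec}: ~{need:.1f} GiB | "
              f"16GB card: {fits(params, prec, 16)} | "
              f"32GB card: {fits(params, prec, 32)}")
```

Under these assumptions a 7B model in FP16 just fits on the 16GB card, while a 13B FP16 model needs the 32GB variant or quantization; real requirements also depend on context length and batch size.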

Implications for On-Premise Deployments and TCO

For companies evaluating on-premise or self-hosted deployments of Large Language Models (LLMs) and other AI workloads, GPUs like these custom V100s could have a significant impact. High performance in such a small form factor, combined with flexible power options, gives more freedom in infrastructure design. This matters most where data sovereignty or air-gapped environments are required, or where latency is critical, all of which make the cloud less attractive.

The estimated price for the 16GB version, approximately ¥1500 (about $220), is a decisive factor. Such a low cost for a data center-class GPU, albeit a previous-generation one, can drastically reduce the initial CapEx of inference clusters or prototypes. This translates into a potentially lower TCO than buying newer-generation cards or using cloud services, especially for workloads with consistent, predictable usage. The 16GB and 32GB VRAM options also let buyers size capacity to the LLMs they plan to run, balancing cost against model size and batch size.
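The break-even reasoning behind the TCO argument can be sketched with simple arithmetic. This minimal sketch counts only the card's purchase price (the ~$220 estimate for the 16GB version) and the card's own electricity draw at 75W or 300W. The electricity price and cloud hourly rate are hypothetical placeholders, not quotes, and host server, cooling, staff time and resale value are ignored.

```python
"""Break-even sketch: buying a custom V100 card vs renting a cloud GPU.

Counts only the card price and the card's own power draw. The electricity
price and cloud rate used below are hypothetical placeholders.
"""

import math


def energy_cost_per_hour(watts: float, usd_per_kwh: float) -> float:
    """Electricity cost of running the card at `watts` for one hour."""
    return watts / 1000.0 * usd_per_kwh


def break_even_hours(card_usd: float, watts: float, usd_per_kwh: float,
                     cloud_usd_per_hour: float) -> float:
    """Hours of use after which owning becomes cheaper than renting.

    Returns math.inf when running the card costs as much as (or more than)
    renting, i.e. owning never pays off under these assumptions.
    """
    saving_per_hour = cloud_usd_per_hour - energy_cost_per_hour(watts, usd_per_kwh)
    if saving_per_hour <= 0:
        return math.inf
    return card_usd / saving_per_hour


if __name__ == "__main__":
    CARD_USD = 220.0            # article's estimate for the 16GB version
    USD_PER_KWH = 0.15          # hypothetical electricity price
    CLOUD_USD_PER_HOUR = 0.50   # hypothetical rate for a comparable cloud GPU

    for label, watts in [("75W slot-powered", 75), ("300W external power", 300)]:
        hours = break_even_hours(CARD_USD, watts, USD_PER_KWH, CLOUD_USD_PER_HOUR)
        print(f"{label}: break-even after ~{hours:.0f} h "
              f"(~{hours / 24:.0f} days at 24/7 use)")
```

Break-even is measured in hours of actual use, so it arrives quickly in calendar time only when utilization is high, which is why the TCO case is strongest for consistent, predictable workloads.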

Future Prospects and Final Considerations

Although the product is not yet widely available, its existence and the interest generated by pre-orders indicate a clear market direction: the search for more accessible and adaptable hardware solutions. Innovation in custom GPUs, which reuse existing cores on optimized PCBs, can democratize access to advanced computational capabilities, making them available to a wider audience of developers and companies with more limited budgets.

For technical decision-makers, it is crucial to evaluate the trade-offs carefully. While cost and form factor are highly advantageous, software support, warranty, and the long-term availability of unofficial products must also be considered. Still, the emergence of these solutions underscores the ingenuity that can arise outside traditional channels, offering concrete alternatives for those seeking to build robust and cost-effective AI infrastructure. AI-RADAR continues to monitor these trends, providing analysis of frameworks and strategies to optimize on-premise deployments, as discussed in our /llm-onpremise section.