Optimizing Energy for AI: A New Deployment Paradigm

The rapid increase in power demand from data centers, driven by the expansion of large language models (LLMs) and other artificial intelligence applications, is forcing the industry to rethink its deployment strategies. Finding efficient and flexible energy solutions has become a top priority. In this context, Nvidia and its collaborators are launching a pilot project to build micro data centers near utility substations, operated in concert to make the best use of the power that is already available.

The initiative involves building approximately 25 small data centers, each with a capacity of 5 to 20 megawatts, distributed across five different utilities in the United States. The core idea is to shift computational workloads dynamically between sites based on power availability: if a substation becomes overloaded or experiences an outage, the compute load can be rerouted to a data center adjacent to a substation with spare capacity, preserving both continuity and efficiency. Nvidia is collaborating with InfraPartners for construction, Prologis for real estate services, and the non-profit EPRI (Electric Power Research Institute) for research and development.

Energy Flexibility and Distributed Infrastructure

The approach proposed by Nvidia and its partners addresses a growing need: the ability to quickly secure power from the grid, an increasingly precious commodity. Ben Sooter, director of Agentic AI Initiatives and Distributed AI Architecture at EPRI, highlights that the average nominally available power at individual substations is about 5 MW, with a maximum of 20 MW. While these figures are too small for most traditional large data center operators, building a fleet of such facilities, operated as if they were one larger entity, offers significant advantages. This strategy can double the overall available power by shifting loads from overburdened substations to those with more headroom.
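To make the fleet argument concrete, here is a toy calculation in Python. All of the numbers are hypothetical (they do not come from EPRI); the point is only to contrast what each site could guarantee on its own with what a coordinated fleet can use when compute moves to wherever headroom currently exists.

```python
# Toy calculation with hypothetical numbers: why a fleet of substation-adjacent
# sites, operated as one entity, can use more power than each site could
# guarantee alone. Headroom varies over the day and peaks are not simultaneous.
headroom_by_period_mw = {
    "sub_A": [2, 8, 10],   # spare MW in three representative periods of the day
    "sub_B": [9, 3, 8],
    "sub_C": [10, 9, 2],
}

# A standalone data center has to size itself to its substation's worst period.
firm_standalone_mw = sum(min(h) for h in headroom_by_period_mw.values())

# A coordinated fleet shifts compute toward whichever sites currently have
# headroom, so it can plan around the fleet-wide worst period instead.
fleet_total_by_period = [sum(p) for p in zip(*headroom_by_period_mw.values())]
firm_fleet_mw = min(fleet_total_by_period)

print(firm_standalone_mw, firm_fleet_mw)   # 7 MW standalone vs 20 MW as a fleet
```

The exact gain depends on how correlated the sites' peaks are; the made-up figures above exaggerate it, but they show the mechanism behind the "roughly double" effect the partners describe.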

Marc Spieler, senior director of energy at Nvidia, emphasizes the large-scale potential: with 55,000 substations in the U.S., even just 5, 10, or 20 MW of spare capacity at each adds up quickly. This energy flexibility not only allows for better utilization of existing infrastructure but can also accelerate grid connection times for new data centers, avoiding the long waits (up to a decade) often required for new connection approvals or new power plant construction. Furthermore, proximity to substations reduces the need for new power lines and grid infrastructure, leveraging existing fiber-optic lines for high-speed internet connectivity.
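As a rough sanity check on Spieler's point (purely illustrative; actual spare capacity varies widely from one substation to the next), multiplying out the figures he cites shows how quickly the headroom aggregates at national scale:

```python
# Back-of-the-envelope arithmetic: spare substation capacity adds up quickly.
# Illustrative only; no individual substation is guaranteed to have this headroom.
substations = 55_000
for spare_mw in (5, 10, 20):
    total_gw = substations * spare_mw / 1_000
    print(f"{spare_mw} MW spare at each of {substations:,} substations -> {total_gw:,.0f} GW")
# 5 MW spare at each of 55,000 substations -> 275 GW
# 10 MW spare at each of 55,000 substations -> 550 GW
# 20 MW spare at each of 55,000 substations -> 1,100 GW
```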

The Inference Advantage

The feasibility of this distributed deployment strategy largely depends on the type of AI workload. Training AI models such as large language models requires massive data centers with GPUs tightly interconnected via technologies like Nvidia's NVLink and InfiniBand. For example, Meta's Llama 3.1 405B model took about two and a half months to train on roughly 16,000 GPUs. Spreading a training workload across a fleet of mini data centers would not be practical because of those high-speed interconnection requirements. Training jobs can, however, be paused for short periods to curtail energy use during peak demand.
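A minimal sketch of what "pausing for short periods" could look like inside a training loop, assuming a curtailment signal from the utility and whatever checkpointing the training stack already uses. The names grid_signal, save_checkpoint, and train_step are hypothetical placeholders, not a real API.

```python
import time

def grid_signal() -> bool:
    """Hypothetical placeholder: True while the utility asks this site to shed load."""
    return False

def save_checkpoint(model, step):
    """Hypothetical placeholder for the training stack's own checkpointing."""
    pass

def train_step(model, batch):
    """Hypothetical placeholder for one optimizer step."""
    pass

def training_loop(model, batches):
    for step, batch in enumerate(batches):
        if grid_signal():                  # utility requests curtailment
            save_checkpoint(model, step)   # persist progress before idling
            while grid_signal():
                time.sleep(60)             # GPUs sit idle or power-capped during the event
        train_step(model, batch)           # normal training resumes afterwards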

In contrast, inference, the use of a trained model to generate responses or images, is much better suited to smaller, distributed data centers. Inference does not require the same number of GPUs or the same networking complexity as training: each user query can be served independently, without large-scale coordination across the cluster. Valerie Crafton, senior vice president of strategy and operations at Mod42, points out that inference is one of the few workloads that can be dynamically routed, allowing computation to follow power availability. Nvidia and EPRI estimate that compute workloads will need to be moved between substations only about 0.1 percent of the time. This “second compute wave” of smaller data centers for inference is expected to see significant demand by 2027.
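As an illustration of what power-aware routing could look like, a front end could simply send each inference request to the site whose substation currently has the most headroom. This is a sketch under assumed data, not Nvidia's or EPRI's actual scheduler; the Site fields and the example values are invented for the purpose.

```python
# Illustrative power-aware inference routing: each request goes to the
# substation-adjacent site with the most spare power, so compute follows
# grid headroom. Site data below is hypothetical.
from dataclasses import dataclass

@dataclass
class Site:
    name: str
    capacity_mw: float   # power nominally available at the substation
    load_mw: float       # power currently drawn by inference traffic

    @property
    def headroom_mw(self) -> float:
        return self.capacity_mw - self.load_mw

def route(request, sites):
    """Pick the site with the most headroom; fail loudly if every site is saturated."""
    candidates = [s for s in sites if s.headroom_mw > 0]
    if not candidates:
        raise RuntimeError("no site has spare power; queue or shed the request")
    return max(candidates, key=lambda s: s.headroom_mw).name

sites = [
    Site("sub_A", capacity_mw=5.0,  load_mw=4.9),
    Site("sub_B", capacity_mw=20.0, load_mw=7.5),
    Site("sub_C", capacity_mw=10.0, load_mw=9.8),
]
print(route({"prompt": "hello"}, sites))   # -> "sub_B" (12.5 MW of headroom)
```

In practice the decision would also weigh latency, data residency, and local energy prices, which is consistent with the estimate that workloads only need to move between substations a fraction of a percent of the time.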

Implications for On-Premise Deployments and Data Sovereignty

The distributed inference approach and micro data centers have significant implications for organizations considering on-premise or hybrid deployments. The ability to tap unused grid capacity and distribute workloads can reduce the Total Cost of Ownership (TCO) and improve operational resilience. For companies that must maintain data sovereignty or operate in air-gapped environments, being able to physically control where their data centers sit and to optimize the local energy infrastructure becomes a critical factor. This model offers an alternative to large, centralized data centers, which are often located in regions whose energy infrastructure is already saturated.

The growing energy demand from data centers, which EPRI estimates could account for 9-17% of U.S. electricity generation by 2030, makes these solutions all the more urgent. For those evaluating on-premise deployments, weighing the trade-offs between upfront cost, operational flexibility, and access to power is essential. AI-RADAR offers analytical frameworks on /llm-onpremise to evaluate these aspects, providing tools to understand how strategies like distributed inference can shape deployment decisions and resource management. The shift toward smaller, more flexible data centers is an important step toward AI infrastructure that is more sustainable and better able to adapt to future needs.