Optimizing AI Inference: The NVIDIA and Google Cloud Collaboration
At the Google Cloud Next conference, Google and NVIDIA outlined a joint hardware and software roadmap aimed at reducing the economic burden of large-scale AI inference. The primary goal is to make deploying large language models (LLMs) and other AI workloads more accessible and efficient for businesses, in terms of both cost and performance.
This strategic partnership aims to provide an integrated infrastructure capable of supporting the growing needs of enterprises adopting artificial intelligence, from the training phase through production deployment. The solutions presented center on a co-designed architecture that maximizes energy efficiency and processing speed, both crucial for modern accelerated computing.
Technical Details: A5X, Rubin, and Blackwell for Performance and Security
The companies introduced the new A5X bare-metal instances, which run on NVIDIA Vera Rubin NVL72 rack-scale systems. This architecture, the product of hardware and software co-design, is engineered to deliver up to ten times lower inference cost per token than previous generations while achieving ten times higher token throughput per megawatt. For teams weighing these instances against on-premises deployments, these figures are central to an accurate TCO (Total Cost of Ownership) analysis covering both operational and energy costs.
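To see what those two factors mean in a TCO model, the sketch below works through the arithmetic. Only the two tenfold improvements come from the announcement; every baseline number (costs, token rates, workload size) is a placeholder assumption.

```python
# Illustrative TCO arithmetic for inference capacity. The 10x factors are the
# only values taken from the announcement; all baselines are assumptions.

baseline_cost_per_million_tokens = 2.00  # USD, assumed prior-generation cost
baseline_tokens_per_mw_s = 1.0e6         # tokens/sec per megawatt, assumed

improvement = 10  # claimed: 10x lower cost per token, 10x throughput per MW

new_cost_per_million_tokens = baseline_cost_per_million_tokens / improvement
new_tokens_per_mw_s = baseline_tokens_per_mw_s * improvement

# Operational side: monthly spend for an assumed 500B-token workload.
monthly_tokens = 500e9
old_spend = monthly_tokens / 1e6 * baseline_cost_per_million_tokens
new_spend = monthly_tokens / 1e6 * new_cost_per_million_tokens
print(f"Monthly inference spend: ${old_spend:,.0f} -> ${new_spend:,.0f}")

# Energy side: megawatts needed to sustain an assumed target token rate.
target_rate = 2.0e6  # tokens/sec
print(f"Power required: {target_rate / baseline_tokens_per_mw_s:.2f} MW -> "
      f"{target_rate / new_tokens_per_mw_s:.2f} MW")
```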
To manage connectivity across thousands of processors and prevent processing delays, the A5X instances pair NVIDIA ConnectX-9 SuperNICs with Google Virgo networking technology. This configuration scales to 80,000 NVIDIA Rubin GPUs within a single-site cluster, and up to 960,000 GPUs across a multi-site deployment. Sophisticated workload management is essential at this scale, since routing data across nearly a million parallel processors demands precise synchronization to avoid idle compute time.
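The cost of imperfect synchronization compounds with scale: in a synchronous step, every GPU waits on the slowest one. The toy Monte Carlo below, built entirely on illustrative assumptions rather than vendor data, shows how even small per-step timing jitter erodes effective utilization as GPU counts grow.

```python
# Toy model (illustrative assumptions only): in a synchronous step, effective
# utilization is roughly mean step time divided by the slowest GPU's step time.

import random

def effective_utilization(num_gpus: int, mean_step_s: float, jitter_s: float,
                          trials: int = 100) -> float:
    """Monte Carlo estimate of mean/max step-time ratio across GPUs."""
    ratios = []
    for _ in range(trials):
        steps = [random.gauss(mean_step_s, jitter_s) for _ in range(num_gpus)]
        ratios.append(mean_step_s / max(steps))
    return sum(ratios) / trials

for n in (72, 8_000, 80_000):
    util = effective_utilization(n, mean_step_s=1.0, jitter_s=0.02)
    print(f"{n:>6} GPUs -> ~{util:.1%} of peak (assuming 2% per-step jitter)")
```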
Data Sovereignty and Operational Overhead for Agentic AI
Beyond raw processing capabilities, data governance remains a primary concern for enterprise deployments. Highly regulated sectors, including finance and healthcare, often stall machine learning initiatives over data sovereignty requirements and the risk of exposing proprietary information. To address these compliance mandates, Google Gemini models running on NVIDIA Blackwell and Blackwell Ultra GPUs are entering preview on Google Distributed Cloud. This deployment model lets organizations keep models entirely within their controlled environments, alongside their most sensitive data stores.

The architecture incorporates NVIDIA Confidential Computing, a hardware-level security mechanism that keeps prompts and fine-tuning data encrypted within a protected execution environment, preventing unauthorized parties, including the cloud infrastructure operators themselves, from viewing or altering the underlying data. For multi-tenant public cloud environments, a preview of Confidential G4 VMs equipped with NVIDIA RTX PRO 6000 Blackwell GPUs brings the same cryptographic protections, giving regulated industries access to high-performance hardware without violating data privacy standards. This release is the first cloud-based confidential computing offering for NVIDIA Blackwell GPUs.
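To make the trust model concrete, the sketch below mirrors the general attest-then-release pattern that confidential computing relies on: sensitive data is handed over only after the hardware proves it is running an expected, unmodified stack. Every name and measurement here is a hypothetical illustration, not an actual Google Distributed Cloud or NVIDIA API.

```python
# Conceptual sketch of the attest-then-release pattern. All identifiers and
# measurement values below are hypothetical placeholders.

import hashlib
from dataclasses import dataclass

@dataclass
class AttestationReport:
    gpu_measurement: str  # hash of the GPU firmware/driver state
    vm_measurement: str   # hash of the confidential VM image

EXPECTED_GPU = hashlib.sha256(b"trusted-gpu-stack").hexdigest()
EXPECTED_VM = hashlib.sha256(b"trusted-vm-image").hexdigest()

def verify(report: AttestationReport) -> bool:
    # Release sensitive data only if both measurements match known-good values.
    return (report.gpu_measurement == EXPECTED_GPU
            and report.vm_measurement == EXPECTED_VM)

def send_prompt(report: AttestationReport, prompt: str) -> None:
    if not verify(report):
        raise PermissionError("attestation failed; refusing to send plaintext")
    # In a real deployment the prompt would be encrypted to a key bound to the
    # attested environment; here we simply gate on the verification result.
    print("attestation ok, prompt released to protected environment")

send_prompt(AttestationReport(EXPECTED_GPU, EXPECTED_VM), "confidential query")
```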
Building multi-step agentic systems requires connecting large language models to complex application programming interfaces, keeping vector databases continuously synchronized, and actively mitigating hallucinations during execution; a minimal sketch of this plumbing follows below. To reduce this engineering burden, NVIDIA Nemotron 3 Super is now available on the Gemini Enterprise Agent Platform, which gives developers tools to customize and deploy reasoning and multimodal models designed specifically for agentic tasks. Training these models at scale brings heavy operational overhead of its own, particularly around cluster sizing and hardware failures during long reinforcement learning cycles. To address this, Google Cloud and NVIDIA introduced Managed Training Clusters on the Gemini Enterprise Agent Platform, including a managed reinforcement learning API built with NVIDIA NeMo RL. The system automates cluster sizing, failure recovery, and job execution, letting data science teams concentrate on model quality rather than low-level infrastructure management.
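The sketch below illustrates, independently of any specific platform, two pieces of that plumbing: routing a model's tool request to a typed API, and rejecting answers that cite documents absent from the retrieval store, a cheap guard against hallucinated references. Function and tool names are hypothetical.

```python
# Minimal agentic plumbing sketch: tool routing plus a citation-grounding
# check. All tool and document names are hypothetical.

from typing import Callable

TOOLS: dict[str, Callable[[str], str]] = {
    "lookup_order": lambda order_id: f"order {order_id}: shipped",
}

KNOWN_DOC_IDS = {"doc-17", "doc-42"}  # stand-in for a synced vector DB index

def run_tool(name: str, argument: str) -> str:
    # Reject tool calls the application never registered.
    if name not in TOOLS:
        raise ValueError(f"model requested unknown tool: {name}")
    return TOOLS[name](argument)

def check_citations(answer: str, cited: list[str]) -> str:
    # Flag citations to documents not in the store, a common symptom of
    # hallucinated references.
    missing = [c for c in cited if c not in KNOWN_DOC_IDS]
    if missing:
        raise ValueError(f"answer cites unknown documents: {missing}")
    return answer

print(run_tool("lookup_order", "A1001"))
print(check_citations("Shipped per policy.", ["doc-17"]))
```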
Impacts Across the Accelerated Compute Ecosystem and Future Outlook
The integration of machine learning into heavy industry and manufacturing presents a different class of engineering challenges. Connecting digital models to physical factory floors requires accurate physics simulation, massive compute power, and standardization across legacy data formats. NVIDIA's AI infrastructure and physical AI libraries are now available on Google Cloud, giving organizations a foundation for simulating and automating real-world manufacturing workflows. Using NVIDIA Omniverse libraries and the open-source NVIDIA Isaac Sim framework via the Google Cloud Marketplace, developers can bypass some of these translation issues to build physically accurate digital twins and train robotics simulation pipelines before physical deployment. Deploying NVIDIA NIM microservices, such as the Cosmos Reason 2 model, to Google Vertex AI and Google Kubernetes Engine enables vision-based agents and robots to interpret and navigate their physical surroundings.
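As one way to picture the Kubernetes Engine path, the sketch below submits a NIM-style inference container to a GKE cluster with the standard Kubernetes Python client. The image reference, namespace, and resource requests are placeholders, not the actual Cosmos Reason 2 artifact or its published deployment recipe.

```python
# Hedged sketch: deploying an inference container to a GKE cluster via the
# standard Kubernetes Python client. Image name and settings are placeholders.

from kubernetes import client, config

config.load_kube_config()  # assumes kubectl already points at the GKE cluster

container = client.V1Container(
    name="nim-inference",
    image="example-registry/nim-model:latest",  # hypothetical image reference
    ports=[client.V1ContainerPort(container_port=8000)],
    resources=client.V1ResourceRequirements(
        limits={"nvidia.com/gpu": "1"},  # request one GPU from the node pool
    ),
)

deployment = client.V1Deployment(
    api_version="apps/v1",
    kind="Deployment",
    metadata=client.V1ObjectMeta(name="nim-inference"),
    spec=client.V1DeploymentSpec(
        replicas=1,
        selector=client.V1LabelSelector(match_labels={"app": "nim-inference"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "nim-inference"}),
            spec=client.V1PodSpec(containers=[container]),
        ),
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="default",
                                                body=deployment)
print("deployment submitted")
```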
Translating these hardware specifications into quantifiable financial returns requires looking at how early adopters use the infrastructure. The portfolio scales from full NVL72 racks down to fractional G4 VMs offering as little as one-eighth of a GPU, letting customers provision acceleration precisely for reasoning and data processing tasks. OpenAI, for instance, runs large-scale inference on NVIDIA GB300 and GB200 NVL72 systems on Google Cloud to handle demanding workloads, including ChatGPT operations. The collaboration between NVIDIA and Google Cloud aims to provide a computing foundation that advances experimental agents and simulations into production systems that secure fleets and optimize factories in the physical world, offering the flexibility and control technical decision-makers need.
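The sizing arithmetic below shows why that one-eighth granularity matters for small workloads. The per-GPU throughput figure is an assumption made for the example, not a published benchmark; only the one-eighth slice size comes from the announcement.

```python
# Illustrative sizing arithmetic for fractional vs. whole-GPU provisioning.
# Throughput is an assumed placeholder, not a published benchmark.

FRACTION = 1 / 8           # smallest G4 slice mentioned in the announcement
full_gpu_req_per_s = 40.0  # assumed requests/sec one full GPU sustains

def gpus_needed(target_req_per_s: float, granularity: float) -> float:
    """GPU capacity to buy when allocation comes in `granularity`-sized slices."""
    per_slice = full_gpu_req_per_s * granularity
    slices = -(-target_req_per_s // per_slice)  # ceiling division
    return slices * granularity

for target in (3, 35, 400):
    print(f"{target:>4} req/s -> {gpus_needed(target, FRACTION):.3f} GPU "
          f"fractional (vs {gpus_needed(target, 1.0):.0f} whole GPUs)")
```

For the low-rate cases, fractional slices avoid paying for a mostly idle full GPU; at high rates the two schemes converge.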