India is closing in on a new record for the Unified Payments Interface (UPI): 750 million daily transactions, with the declared goal of reaching one billion. According to Dilip Asbe, MD and CEO of the National Payments Corporation of India (NPCI), artificial intelligence is the key to getting there. Speaking to TechCrunch at Mumbai Tech Week, Asbe said AI could drive the next stage of that growth, the leap from 750 million to a billion daily transactions.

A multiplier for an already hyper-scalable system

UPI is already a real-time payment infrastructure handling unparalleled volumes worldwide, integrating hundreds of banks and services. Why does AI become the decisive factor now? It’s not just about automation: at a billion transactions a day, the number of anomalies, fraud attempts, and routing bottlenecks grows non-linearly. Machine learning models trained on historical data can anticipate failures, detect suspicious patterns, and dynamically allocate network resources. AI inference, in this context, is not a luxury but an enabler of resilience.
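To make "detecting suspicious patterns" concrete, here is a minimal sketch of the simplest possible baseline: flagging transaction amounts that sit far from the historical mean. It is not how NPCI or any UPI participant does fraud detection (the source does not describe their models); the function name, the threshold of 3 standard deviations, and the sample amounts are all hypothetical, and a production system would use trained models over many more features.

```python
from statistics import mean, stdev

def zscore_outliers(history, candidates, threshold=3.0):
    """Return the candidate amounts whose z-score against `history`
    exceeds `threshold`. Illustrative baseline only."""
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation, needs >= 2 points
    if sigma == 0:
        # All historical values identical: anything different is unusual.
        return [x for x in candidates if x != mu]
    return [x for x in candidates if abs(x - mu) / sigma > threshold]

# Hypothetical historical amounts for one account, and three new payments.
history = [120.0, 95.0, 110.0, 105.0, 98.0, 102.0, 115.0, 90.0]
new_payments = [101.0, 5000.0, 108.0]

flagged = zscore_outliers(history, new_payments)
print(flagged)
```

With these made-up numbers only the 5000.0 payment is flagged; the other two sit within one standard deviation of the historical mean.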

Latency, volumes, and the infrastructure choice

Processing a billion payments a day means handling over 11,500 transactions per second on average, with peaks well above that figure. Every millisecond counts, and the latency introduced by cloud-based processing – round trips to remote data centers, network queues, virtualization – can become a bottleneck. That’s why such scenarios often push towards on-premise or edge deployment: inference runs on dedicated hardware (often GPUs or custom accelerators) as close to the data as possible. Moreover, payments involve sensitive information that, due to sovereignty regulations, many countries prefer to keep within precise geographic boundaries. Self-hosting AI is not just about performance: it’s a matter of control and compliance.
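The throughput figure comes from dividing the daily volume by the 86,400 seconds in a day, which gives a daily average rather than a peak. The short sketch below reproduces that arithmetic; the peak-to-average ratio of 2.0 is purely an assumption for illustration, since the source gives no peak data.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def average_tps(daily_transactions: int) -> float:
    """Average transactions per second, assuming a perfectly flat load."""
    return daily_transactions / SECONDS_PER_DAY

def peak_tps(daily_transactions: int, peak_to_average: float) -> float:
    """Estimated peak rate for an assumed peak-to-average ratio."""
    return average_tps(daily_transactions) * peak_to_average

current = average_tps(750_000_000)    # today's volume cited in the article
target = average_tps(1_000_000_000)   # the one-billion goal

print(f"750M/day -> {current:,.0f} TPS on average")
print(f"1B/day   -> {target:,.0f} TPS on average")

# Hypothetical ratio: real payment traffic is bursty, but the actual
# UPI peak factor is not given in the source.
print(f"1B/day at 2x peak -> {peak_tps(1_000_000_000, 2.0):,.0f} TPS")
```

Real traffic concentrates in daytime hours and around salary or festival dates, so capacity planning has to target the peak, not the average computed here.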

The TCO knot and the control factor

At these scales, total cost of ownership becomes a decisive factor. Under the cloud’s pay-as-you-go model, with per-API-call or per-token pricing, costs can balloon to the point of making daily operations unsustainable. A self-hosted infrastructure, built on servers purchased as CapEx, brings financial predictability and allows models to be optimized for the underlying hardware, for example via INT8 or FP16 quantization to reduce VRAM requirements. AI-RADAR has repeatedly pointed out that for high-frequency inference workloads, the trade-off between CapEx and OpEx must be carefully analyzed, especially when the continuity of a national service is at stake.
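Two back-of-the-envelope calculations sit behind this paragraph, sketched below under stated assumptions. The first estimates the memory taken by model weights alone at different precisions (4, 2, and 1 bytes per parameter for FP32, FP16, and INT8); it ignores activations, caches, and runtime overhead. The second computes a simple CapEx break-even point. The 7-billion-parameter model size and every cost figure are hypothetical placeholders, not data from the article.

```python
from typing import Optional

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

def weight_memory_gib(num_params: int, precision: str) -> float:
    """Memory for model weights only, in GiB (no activations or caches)."""
    return num_params * BYTES_PER_PARAM[precision] / 2**30

def breakeven_months(capex: float,
                     monthly_opex_self_hosted: float,
                     monthly_cloud_cost: float) -> Optional[float]:
    """Months until the upfront purchase is repaid by monthly savings.
    Returns None when self-hosting is not cheaper per month."""
    monthly_saving = monthly_cloud_cost - monthly_opex_self_hosted
    if monthly_saving <= 0:
        return None
    return capex / monthly_saving

# Hypothetical 7-billion-parameter model.
for precision in ("fp32", "fp16", "int8"):
    gib = weight_memory_gib(7_000_000_000, precision)
    print(f"{precision}: {gib:.1f} GiB of weights")

# Hypothetical costs: 1.2M upfront, 20k/month to run, versus 70k/month cloud.
months = breakeven_months(1_200_000, 20_000, 70_000)
print(f"break-even after {months:.0f} months")
```

The break-even model is deliberately naive: it leaves out hardware depreciation, staffing, utilization, and cloud discounts, all of which a real TCO analysis would need to include.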

A story that looks to the future of global payments

India is setting a benchmark for the entire industry. The combination of AI, extreme volumes, and latency requirements is reshaping payment system architectures, and this is no isolated case. For other central banks, interbank networks, and large fintech platforms, the UPI case shows that national-scale AI demands a deep rethink, from chip selection to data geography to the pipelines for training and updating models. For those facing these decisions, AI-RADAR offers analytical tools and comparison frameworks for on-premise deployment, available at /llm-onpremise, helping to map trade-offs without ideological shortcuts.