Citigroup CEO Jane Fraser told the South China Morning Post that the financial sector is running not one but two AI races. The first is offensive: applying AI to business models to drive revenue, shorten product development cycles, and improve customer service. The second is defensive, and Fraser summed up Citi’s role in it with a simple phrase: protect the bank.
The offensive race grabs the headlines and the investor presentations, but the defensive one may matter more for the system’s stability. Protecting an institution today means facing threats amplified by AI itself: increasingly sophisticated fraud, deepfakes impersonating executives, automated cyberattacks. This battle plays out away from the spotlight, in data centers where infrastructure choices carry more weight than many business strategies.
Here, the question of where to run the models becomes central. Banks hold extremely sensitive data: transactions, assets, behavioral profiles. Sending that data to a public cloud for inference with a large language model (LLM) can violate regulations such as GDPR or central bank supervisory rules. That’s why many organizations are eyeing on-premises deployment, or at most internally managed private clouds.
Bringing an LLM in-house is no trivial feat. It requires GPUs with tens of gigabytes of VRAM, fast storage, and the ability to orchestrate inference pipelines without bottlenecks. Techniques like quantization become essential: reducing a model’s weights from FP16 to INT8 or even 4-bit cuts its memory footprint, which allows running larger models on less exotic hardware, but it trades away some response quality in exchange. For defensive tasks such as anomaly detection, compliance monitoring, and threat intelligence, a slight drop in accuracy may be acceptable, as long as the data remains under control.
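To make the quantization arithmetic concrete, here is a rough sketch of how weight precision changes the memory a model needs. The 70-billion-parameter model, the 48 GB GPU, and the 20% reserve for runtime overhead are illustrative assumptions, not figures from Fraser’s remarks, and the estimate covers weights only.

```python
# Bytes needed to store a single weight at each precision.
BYTES_PER_WEIGHT = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes)."""
    return num_params * BYTES_PER_WEIGHT[precision] / 1e9


def fits_on_gpu(num_params: float, precision: str, vram_gb: float,
                overhead_fraction: float = 0.2) -> bool:
    """Check whether the weights fit in VRAM after reserving headroom.

    overhead_fraction is an assumed reserve for activations and cache;
    real overhead depends on batch size and context length.
    """
    usable_gb = vram_gb * (1.0 - overhead_fraction)
    return weight_memory_gb(num_params, precision) <= usable_gb


if __name__ == "__main__":
    params = 70e9   # hypothetical 70B-parameter model
    vram = 48.0     # hypothetical single GPU with 48 GB of VRAM
    for precision in BYTES_PER_WEIGHT:
        size = weight_memory_gb(params, precision)
        verdict = "fits" if fits_on_gpu(params, precision, vram) else "does not fit"
        print(f"{precision}: {size:.0f} GB of weights, {verdict} in {vram:.0f} GB")
```

Under these assumptions, only the 4-bit version fits on a single card, which is the kind of hardware saving the trade-off against response quality buys.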
The total cost of ownership (TCO) of a self-hosted stack isn’t limited to hardware purchases. It includes in-house expertise for maintenance, continuous updates, and energy costs. Compared to the cloud, on-premises deployment offers predictable spending and real data sovereignty, but demands significant upfront investment. The defensive race is pushing a rethink of IT architecture: it’s no longer just about compliance, but about withstanding an opponent that uses AI offensively.
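The TCO comparison can be sketched as a simple break-even calculation. Every figure below is a placeholder chosen for round numbers rather than data from any bank or vendor, and the model deliberately ignores discounting, hardware refresh cycles, and utilization.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OnPremCosts:
    """Hypothetical cost breakdown for a self-hosted stack."""
    hardware_upfront: float      # one-off purchase of GPUs, servers, storage
    staff_per_year: float        # in-house expertise for operations
    energy_per_year: float       # power and cooling
    maintenance_per_year: float  # updates, support contracts, spares

    def annual_running(self) -> float:
        return self.staff_per_year + self.energy_per_year + self.maintenance_per_year

    def total(self, years: float) -> float:
        return self.hardware_upfront + years * self.annual_running()


def cloud_total(monthly_spend: float, years: float) -> float:
    """Cumulative cloud spend, assuming a flat monthly bill."""
    return monthly_spend * 12 * years


def break_even_years(on_prem: OnPremCosts, cloud_monthly: float) -> Optional[float]:
    """Years until on-prem becomes cheaper than cloud, or None if it never does."""
    annual_saving = cloud_monthly * 12 - on_prem.annual_running()
    if annual_saving <= 0:
        return None
    return on_prem.hardware_upfront / annual_saving


if __name__ == "__main__":
    # Placeholder figures, for illustration only.
    on_prem = OnPremCosts(1_200_000, 300_000, 60_000, 40_000)
    print("Break-even (years):", break_even_years(on_prem, cloud_monthly=60_000))
    print("5-year on-prem total:", on_prem.total(5))
    print("5-year cloud total:", cloud_total(60_000, 5))
```

The point of the sketch is the shape of the comparison: a large fixed cost up front against a recurring bill, with the crossover driven by how much cheaper the annual running costs are than the cloud spend.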
Even the offensive race, with LLMs applied to personalize products or accelerate development, can benefit from fine-tuning on proprietary data in a secure, controlled environment. The two races, though distinct, converge on the same problem: which stack to choose to balance performance, cost, and data sovereignty. Fraser’s remarks come at a time when the industry is under pressure to invest heavily. The question is no longer whether to adopt AI, but how to do it without undermining customer trust and infrastructure resilience. It is a game increasingly played on the servers in one’s own machine room.