The idea that a software company builds its own hardware is no longer exceptional, but OpenAI fielding a custom inference chip, developed with Broadcom and manufactured by TSMC, adds one more piece to an already crowded picture. Custom silicon is no longer the privilege of cloud providers: model makers are now joining in, and the move matters directly to anyone running real workloads, especially where cost, latency, and data control are at stake.

A chip born for inference

No official datasheet exists yet, but the positioning is clear: the new processor is designed with Broadcom, a company with decades of ASIC expertise for data centers, and manufactured by TSMC, likely on an advanced node. The target is inference, the step repeated millions of times in which an LLM generates token after token. There, energy efficiency and memory bandwidth matter more than peak floating-point performance. Like GPUs, a dedicated ASIC can run at reduced precision (INT8, FP8), formats already standard on NVIDIA chips, but potentially with lower overhead and a better tokens-per-joule ratio. The goal: serve massive loads with fewer watts and, over time, at a lower cost per query.
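To make the tokens-per-joule idea concrete, here is a minimal Python sketch. It rests only on the fact that one watt is one joule per second. The function names and every figure in the usage example (throughput, power draw, electricity price) are hypothetical placeholders, not specifications of the OpenAI chip or of any real accelerator.

```python
def tokens_per_joule(tokens_per_second: float, power_watts: float) -> float:
    """Tokens generated per joule of energy (1 W = 1 J/s)."""
    if power_watts <= 0:
        raise ValueError("power_watts must be positive")
    return tokens_per_second / power_watts


def energy_cost_per_query(
    tokens_per_query: float,
    tokens_per_second: float,
    power_watts: float,
    price_per_kwh: float,
) -> float:
    """Electricity cost of one query, ignoring cooling and idle power."""
    joules = tokens_per_query / tokens_per_joule(tokens_per_second, power_watts)
    kwh = joules / 3.6e6  # 1 kWh = 3.6 million joules
    return kwh * price_per_kwh


if __name__ == "__main__":
    # Hypothetical devices with equal throughput but different power draw.
    for name, watts in [("general-purpose GPU", 700.0), ("inference ASIC", 400.0)]:
        tpj = tokens_per_joule(2000.0, watts)
        cost = energy_cost_per_query(500.0, 2000.0, watts, price_per_kwh=0.20)
        print(f"{name}: {tpj:.2f} tokens/J, {cost:.2e} per 500-token query")
```

At equal throughput, the device that draws less power wins on both metrics; multiplied by hundreds of millions of queries, that gap is what the design is chasing.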

Why OpenAI moved now

Dependence on NVIDIA has cost every hyperscaler dearly: H100 GPUs were in short supply for long stretches, and the hourly price of compute remains high. For a service like ChatGPT handling hundreds of millions of requests, even a small reduction in inference cost translates into massive savings. Google paved the way with TPUs, Amazon followed with Trainium and Inferentia, Microsoft has announced Maia, and Meta is working on MTIA. OpenAI, so far focused on software, has decided to enter silicon to optimize the execution of its own models, perhaps with accelerators that better match the characteristics of GPT architectures. The chip may include custom logic for sparse attention or specific quantization techniques, but without official details any guess is premature.

What it changes for on-premise deployment

For AI-RADAR readers, the real question isn’t whether the chip will run in OpenAI’s data centers, but whether this trend will influence the hardware available to those wanting to self-host their LLMs. Today, the TCO of an on-premise infrastructure is dominated by GPU cost and power consumption. If the market for custom inference chips takes off, cheaper alternatives to general-purpose GPUs are plausible. Broadcom has a long history of ASICs sold to third parties (think network or storage chips), so it’s not unrealistic to imagine that these designs might one day be offered to enterprise customers.
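The claim that TCO is dominated by hardware price and power consumption can be turned into a rough cost-per-token estimate. The sketch below is a simplified model, not an established methodology: the `Accelerator` class, its fields, and all numbers are hypothetical, and it deliberately ignores cooling, networking, staff, and idle power.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 24 * 365


@dataclass
class Accelerator:
    name: str
    purchase_price: float      # up-front hardware cost (hypothetical)
    power_watts: float         # average draw while serving
    tokens_per_second: float   # sustained inference throughput
    lifetime_years: float = 4.0
    utilization: float = 0.6   # fraction of the lifetime spent serving


def cost_per_million_tokens(acc: Accelerator, price_per_kwh: float) -> float:
    """Hardware plus electricity cost, spread over all tokens served."""
    busy_hours = HOURS_PER_YEAR * acc.lifetime_years * acc.utilization
    tokens = acc.tokens_per_second * 3600 * busy_hours
    energy_kwh = acc.power_watts / 1000 * busy_hours
    total_cost = acc.purchase_price + energy_kwh * price_per_kwh
    return total_cost / tokens * 1e6


if __name__ == "__main__":
    # Placeholder figures for comparison only; not real products or prices.
    candidates = [
        Accelerator("general-purpose GPU", 30000.0, 700.0, 2000.0),
        Accelerator("custom inference ASIC", 20000.0, 400.0, 2000.0),
    ]
    for acc in candidates:
        cost = cost_per_million_tokens(acc, price_per_kwh=0.20)
        print(f"{acc.name}: {cost:.4f} per million tokens")
```

Even a model this crude shows why list price alone misleads: utilization and lifetime change how much each token carries of the purchase cost, while power draw sets the running cost.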

For companies bound by GDPR or handling sensitive data, an energy-efficient inference chip available for purchase would mean running state-of-the-art models without relying on the cloud, reducing operational costs and audit complexity. Still, hurdles remain around programmability and software support: an ASIC without a mature framework risks being inaccessible to most users. And NVIDIA, with CUDA, has a huge moat. The road to an on-premise ecosystem based on custom chips is long, but the fact that a pioneer like OpenAI is investing in silicon signals the direction.

An ecosystem fragmenting, but with a common thread

The proliferation of custom chips is reshaping the supply chain. TSMC and Broadcom are becoming strategic nodes for anyone wanting to bypass NVIDIA. But fragmentation risks complicating life for software developers, who today can rely on a unified platform. Europe, and Italy in particular, watches these developments from a delicate position: strong sensitivity to data sovereignty but little advanced semiconductor manufacturing capacity. In this landscape, understanding how chips affect TCO and latency becomes crucial. AI-RADAR has always warned against considering only list prices: any assessment must weigh cost per token, watts dissipated, management complexity, and above all the freedom to move workloads over time.

OpenAI’s move isn’t an immediate revolution for on-premise deployments, but a thermometer measuring the fever of an industry in turmoil. When model makers start designing their own silicon, the message is clear: hardware is too important to be entirely delegated to third parties.