OpenAI cooks up Jalapeño: custom chip with Broadcom as race against Nvidia dependency heats up

The announcement of a chip named Jalapeño

OpenAI has revealed plans for Jalapeño, an inference chip developed in partnership with Broadcom. It’s not just a product announcement – it’s a sign of structural change in the AI hardware market. For years Nvidia has held a near-monopoly on the GPUs used to train and serve LLMs, but now the companies that consume the most compute are designing their own silicon. Google has been doing it with TPUs for almost a decade, Apple has integrated its Neural Engine into its SoCs, SpaceX is working on custom chips for its constellations, and now OpenAI joins the club.

Beyond dependence: custom silicon as a lever for independence

This move is not purely technical; it’s a response to single-supplier risk. When the cost of Nvidia GPU clusters becomes a massive budget line and availability can fluctuate, building an in-house accelerator optimized for inference reduces two critical variables: cost per token and procurement latency. Moreover, an internally designed chip can be tuned exactly to model sizes and the company’s traffic patterns, lowering energy consumption and improving overall efficiency.
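
To make "cost per token" concrete, here is a minimal back-of-the-envelope sketch. It is not OpenAI's or anyone's actual cost model: the function name, the parameters, and every number in the usage example are hypothetical placeholders, and the model deliberately ignores networking, cooling, staff, and software costs.

```python
def cost_per_million_tokens(
    hardware_cost_usd: float,
    lifetime_years: float,
    power_kw: float,
    electricity_usd_per_kwh: float,
    tokens_per_second: float,
    utilization: float,
) -> float:
    """Rough amortized cost of serving one million tokens on one accelerator.

    Simplifying assumptions (all hypothetical): the hardware is written off
    linearly over its lifetime, power draw is constant whether or not the
    chip is busy, and throughput is constant while it is busy.
    """
    hours = lifetime_years * 365 * 24
    energy_cost = power_kw * hours * electricity_usd_per_kwh
    total_cost = hardware_cost_usd + energy_cost
    # Tokens are only produced during the utilized fraction of the lifetime.
    total_tokens = tokens_per_second * utilization * hours * 3600
    return total_cost / total_tokens * 1_000_000


if __name__ == "__main__":
    # Placeholder inputs for illustration only, not measured figures.
    example = cost_per_million_tokens(
        hardware_cost_usd=30_000,
        lifetime_years=4,
        power_kw=1.0,
        electricity_usd_per_kwh=0.10,
        tokens_per_second=2_000,
        utilization=0.6,
    )
    print(f"~${example:.4f} per million tokens (hypothetical inputs)")
```

The point of the sketch is the shape of the formula: a purpose-built chip can lower the result by costing less up front, drawing less power, or delivering more tokens per second on the workloads it was tuned for.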

Implications for on-premise deployments

This trend matters directly for those running local infrastructure. So far, the on-premise market for LLMs has been shaped by Nvidia GPUs (with a few AMD or Intel alternatives). If the move toward ASICs or specialized chips gains momentum, new third-party accelerator vendors could emerge, targeting specific inference workloads. This isn’t just about performance: data sovereignty and regulatory compliance (GDPR, sector-specific rules) would find an ally in hardware that can be kept entirely under one’s control, without depending on silicon whose supply chain is concentrated in a few hands.

The challenges: you can’t “bake” a chip overnight

Taking a custom chip from design to production is a multi-hundred-million-dollar, multi-year process. It requires microarchitecture expertise, EDA toolchains, and relationships with foundries – all barriers that explain why only companies with deep pockets and stable workloads can afford the investment. Moreover, LLM architectures keep evolving, so a chip designed around today’s inference workloads risks becoming obsolete unless some flexibility is built in. It’s the classic trade-off between extreme efficiency and longevity.
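
Why "deep pockets and stable workloads" matter can be shown with a simple break-even estimate. The sketch below is an illustration under stated assumptions, not a real financial model: the function name and all inputs are hypothetical, the 300 million dollar figure is just one value in the "multi-hundred-million" range mentioned above, and the per-token costs are invented placeholders.

```python
from typing import Optional


def break_even_tokens(
    design_cost_usd: float,
    gpu_cost_per_mtok: float,
    custom_cost_per_mtok: float,
) -> Optional[float]:
    """Tokens that must be served before a custom chip repays its design cost.

    Costs are in USD per million tokens. Returns None when the custom chip
    is not cheaper per token, because the investment then never pays back.
    """
    saving_per_mtok = gpu_cost_per_mtok - custom_cost_per_mtok
    if saving_per_mtok <= 0:
        return None
    return design_cost_usd / saving_per_mtok * 1_000_000


if __name__ == "__main__":
    # Hypothetical inputs: 300M USD design cost, 2.00 vs 1.00 USD per million tokens.
    tokens = break_even_tokens(300e6, 2.00, 1.00)
    print(f"Break-even after {tokens:.2e} tokens (hypothetical inputs)")
```

Under these made-up inputs the break-even point sits in the hundreds of trillions of tokens, which is why the calculation only works for operators with very large and predictable inference volumes. If a new model architecture erodes the per-token saving before that volume is reached, the payback point moves further out or disappears.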

AI-RADAR’s perspective

Anyone dealing with on-premise AI should watch this evolution closely. If the custom accelerator market opens up – for example through the RISC-V ecosystem or foundry services like those from Intel and TSMC – the next generation of local datacenters might no longer depend exclusively on GPUs. AI-RADAR tracks specialist silicon developments and offers analytical frameworks to weigh the trade-offs between commodity GPUs, dedicated ASICs, and hybrid solutions. For now, Jalapeño is just one spicy ingredient in a landscape that is heating up, but the message is clear: the era of total dependency on a single hardware supplier is fading.