OpenAI and Broadcom unveil Jalapeño, a chip for LLM inference at scale

It doesn’t arrive with the fanfare of a gaming GPU or a training accelerator, yet the new chip from OpenAI and Broadcom could shift the landscape for those managing massive inference workloads. Named Jalapeño, it has been designed from the silicon up to serve language models at data-center scale. The announcement, short on specifics, marks the start of a long-term collaboration that promises increasingly refined future iterations.

A chip built for inference, not training

Most commercially available GPUs were designed for general-purpose workloads such as rendering and scientific simulation, and only later adapted for LLM inference. Jalapeño flips that logic: as an ASIC (application-specific integrated circuit), it strips away unnecessary overhead and concentrates transistors and memory bandwidth on the one task that matters when responding to prompts at massive scale: low-latency matrix-vector multiplication. No graphics units, no display outputs, just what is needed to turn input tokens into output tokens as quickly and with as little energy as possible.
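A rough sketch helps show why memory bandwidth carries so much weight here. During token-by-token generation, each weight in a layer is read once and used for a single multiply-add, so the ratio of arithmetic to memory traffic is very low. The function name and the 4096 x 4096 layer size below are illustrative assumptions, not Jalapeño specifications, and the model ignores activations, caches, and batching.

```python
def matvec_arithmetic_intensity(rows: int, cols: int, bytes_per_weight: float) -> float:
    """FLOPs performed per byte of weights read for one matrix-vector product."""
    flops = 2 * rows * cols                      # one multiply and one add per weight
    bytes_read = rows * cols * bytes_per_weight  # weight reads dominate memory traffic
    return flops / bytes_read


# Hypothetical 4096 x 4096 layer, chosen only for illustration.
for label, width in [("FP16", 2.0), ("INT8", 1.0)]:
    intensity = matvec_arithmetic_intensity(4096, 4096, width)
    print(f"{label}: {intensity:.1f} FLOPs per byte")
```

At roughly one or two operations per byte, the arithmetic units spend most of their time waiting on memory, which is why an inference-focused design would plausibly spend its budget on bandwidth rather than raw compute.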

This kind of specialization is a classic semiconductor story. When a workload becomes stable and predictable enough, abandoning the flexibility of a general-purpose CPU or GPU for a dedicated design can, in favorable cases, reduce cost per query by as much as an order of magnitude. Google’s TPUs and AWS’s Trainium and Inferentia have already paved the way. Jalapeño brings this approach to OpenAI, a company historically reliant on NVIDIA GPUs to run its models.

What it means for data centers and TCO

For a data-center operator, adopting inference-specific chips directly affects total cost of ownership (TCO). Fewer watts per token mean lower electricity bills, less heat to dissipate, and denser racks. In a world where inference demand grows by double-digit percentages quarter after quarter, even a modest improvement in energy efficiency translates into substantial savings at facility scale.
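The link between watts per token and the electricity bill can be sketched in a few lines. Every figure below (power draw, throughput, electricity price) is a hypothetical placeholder, since the announcement gives no numbers; only the unit conversion is fixed.

```python
JOULES_PER_KWH = 3_600_000  # 1 kWh = 3.6 million joules


def energy_cost_per_million_tokens(
    power_watts: float, tokens_per_second: float, price_per_kwh: float
) -> float:
    """Electricity cost of generating one million tokens on a single accelerator."""
    joules_per_token = power_watts / tokens_per_second
    kwh_per_million = joules_per_token * 1_000_000 / JOULES_PER_KWH
    return kwh_per_million * price_per_kwh


# Hypothetical comparison at the same throughput and the same electricity price.
baseline = energy_cost_per_million_tokens(700, 2000, 0.10)
efficient = energy_cost_per_million_tokens(500, 2000, 0.10)
print(f"baseline:  ${baseline:.4f} per million tokens")
print(f"efficient: ${efficient:.4f} per million tokens")
print(f"saving:    {100 * (1 - efficient / baseline):.0f}%")
```

The per-token amounts look tiny, but they scale linearly with volume, and the sketch leaves out cooling overhead, which would widen the gap further.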

Broadcom contributes its experience in manufacturing custom chips for large enterprise and cloud clients. The most likely scenario is that Jalapeño will initially be offered through OpenAI’s cloud infrastructure, but the press release does not rule out future direct access to the OEM market. Should that happen, organizations running on-premise infrastructure, from banks to government agencies with strict data-sovereignty requirements, could consider integrating these accelerators into their own racks, provided they can purchase them and operate them with their existing software stacks.

The missing piece for on-premise?

Organizations evaluating self-hosted LLM deployments today face a harsh reality: GPUs with sufficient VRAM are expensive, power-hungry, and often in short supply. An inference-optimized chip, if made available outside proprietary cloud circles, would open up new room for maneuver. For predictable workloads—internal virtual assistants, document analysis, process automation—Jalapeño’s claimed efficiency could lower the barrier to entry, reducing CapEx and simplifying thermal management in an enterprise data center.

Of course, unknowns remain: software support, compatibility with serving frameworks like vLLM or TGI, and the ability to handle different model sizes and quantization levels. A GPU can adapt to FP16, INT8, and optimized attention techniques with relative ease; an ASIC requires all of this to be planned at design time. The long-term roadmap announced by OpenAI and Broadcom suggests that subsequent generations will fill any gaps, but the first iteration deserves careful scrutiny.
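To make the quantization question concrete, here is a minimal sketch of how precision changes the memory needed just to hold a model's weights. The 70-billion-parameter model and the INT4 entry are illustrative assumptions rather than details from the announcement, and the estimate ignores the KV cache, activations, and runtime overhead.

```python
# Bytes needed to store one weight at each precision.
BYTES_PER_WEIGHT = {"FP16": 2.0, "INT8": 1.0, "INT4": 0.5}


def weight_footprint_gb(num_parameters: float, precision: str) -> float:
    """Memory needed for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_parameters * BYTES_PER_WEIGHT[precision] / 1e9


# Hypothetical 70-billion-parameter model.
for precision in BYTES_PER_WEIGHT:
    print(f"{precision}: {weight_footprint_gb(70e9, precision):.0f} GB")
```

Halving the precision halves the footprint, but only if the hardware supports that number format natively, which is exactly the kind of choice an ASIC has to settle at design time.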

The bigger picture: the fragmentation of AI hardware

Jalapeño’s announcement fits into a well-established trend: the proliferation of specialized silicon for AI. NVIDIA maintains dominance with its GPUs and the recent Blackwell line, but startups like Groq and Cerebras push radically different architectures, while hyperscalers keep churning out proprietary chips. In this landscape, hardware choice increasingly becomes a long-term architectural decision that ties you to an ecosystem of tooling, drivers, and deployment pipelines.

For organizations that prioritize data sovereignty, the direction is clear: diversify suppliers and evaluate trade-offs with robust analytical tools. On AI-RADAR, the framework dedicated to on-premise deployments helps weigh exactly these variables, from per-token efficiency to software maturity.

In the meantime, the name Jalapeño—a chili pepper—suggests that OpenAI and Broadcom have no intention of going unnoticed. The challenge is serving planetary-scale inference without burning through costs. And for those following the self-hosting path, it is a signal not to be ignored.