OpenAI and Broadcom unveil Jalapeño chip for LLM inference


26 Jun

The first OpenAI Intelligence Processor is running laboratory workloads and is scheduled for initial data center deployment by the end of 2026

OpenAI Chief Executive Officer Sam Altman, left, and a Broadcom executive present the Jalapeño Intelligence Processor, developed for large language model inference.

OpenAI and Broadcom have unveiled Jalapeño, OpenAI’s first processor designed specifically for large language model inference, as the AI developer expands into the hardware used to operate ChatGPT, Codex, its application programming interface, and future agent-based products.

The accelerator was developed with Broadcom and manufacturing partner Celestica. OpenAI says engineering samples are running machine learning workloads, including GPT-5.3-Codex-Spark, at the intended production frequency and power.

Initial deployment is planned by the end of 2026 through data center partners including Microsoft. OpenAI and Broadcom intend to expand the system across several chip generations and deploy it at gigawatt scale.

OpenAI says early testing indicates Jalapeño will deliver substantially better performance per watt than current leading accelerators. Final measurements have not been published; a detailed technical report is due in the coming months.

The launch adds chip design to OpenAI’s existing work across models, products, kernels, memory systems, networking, scheduling, and deployment infrastructure.

Chip is designed around LLM inference workloads

Jalapeño has been built for inference, the process through which a trained AI model responds to prompts and performs tasks for users.

OpenAI says the architecture is informed by the workloads running across ChatGPT, Codex, the OpenAI application programming interface, and products expected to use longer, multi-step agent workflows.

Rather than adapting an accelerator originally created for broader AI workloads, OpenAI designed Jalapeño around the compute, memory movement, networking, and serving patterns used by large language models.

The architecture is intended to reduce unnecessary data movement and balance computing, memory, and networking capacity. OpenAI says this should allow more of the chip’s theoretical computing capacity to be used in practice.
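The idea of balancing compute against memory movement can be illustrated with the widely used roofline model, in which attainable throughput is the lesser of peak compute and memory bandwidth multiplied by arithmetic intensity. The sketch below is illustrative only: the peak compute and bandwidth figures are hypothetical placeholders, not Jalapeño specifications, and OpenAI has not said that it uses this model.

```python
def attainable_flops(peak_flops: float, mem_bandwidth: float, intensity: float) -> float:
    """Roofline bound: throughput is capped by compute or by memory traffic.

    peak_flops     -- peak compute in FLOP/s (hypothetical value, not a real spec)
    mem_bandwidth  -- memory bandwidth in bytes/s (hypothetical value)
    intensity      -- arithmetic intensity of the kernel in FLOP per byte moved
    """
    return min(peak_flops, mem_bandwidth * intensity)


def utilization(peak_flops: float, mem_bandwidth: float, intensity: float) -> float:
    """Fraction of theoretical peak compute that the kernel can reach."""
    return attainable_flops(peak_flops, mem_bandwidth, intensity) / peak_flops


if __name__ == "__main__":
    # Placeholder hardware numbers chosen only to make the arithmetic easy to follow.
    PEAK = 1000e12       # 1,000 TFLOP/s (assumption)
    BANDWIDTH = 4e12     # 4 TB/s (assumption)

    # Below this intensity a kernel is memory-bound; above it, compute-bound.
    ridge_point = PEAK / BANDWIDTH
    print(f"ridge point: {ridge_point:.0f} FLOP/byte")

    for intensity in (2, 50, 250, 1000):
        share = utilization(PEAK, BANDWIDTH, intensity)
        print(f"intensity {intensity:>4} FLOP/byte -> {share:.1%} of peak")
```

Under these assumed numbers, a kernel that performs few operations per byte moved reaches only a small share of peak compute, which is why reducing data movement can raise the usable fraction of a chip's theoretical capacity.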

Richard Ho, who leads OpenAI’s hardware program, says: "Jalapeño was designed from the ground up for LLM inference using detailed insights from our close collaboration with OpenAI researchers. We optimized the architecture around the kernels, memory movement, networking, and serving patterns that matter most for frontier AI models. Based on early testing, Jalapeño will efficiently execute our most important workloads close to the hardware’s theoretical limits."

OpenAI says Jalapeño has been designed with enough flexibility to support current and future large language models across the wider AI industry, rather than operating solely with OpenAI models.

The expected benefits include reduced latency for interactive AI products, increased throughput, and lower energy requirements for each inference workload. Those claims remain subject to the final performance data due in OpenAI’s technical report.

Broadcom and Celestica support production systems

OpenAI developed the processor architecture, while Broadcom provided silicon implementation, networking, and connectivity technology.

Broadcom’s contribution includes Tomahawk networking silicon, which is intended to connect large numbers of accelerators inside data center systems.

Celestica is supporting the board design, rack integration, system assembly, and production infrastructure required to move Jalapeño from individual chips into deployable computing systems.

The accelerator progressed from initial design to manufacturing tape-out in nine months. Tape-out is the stage at which a completed chip design is sent for manufacturing.

The partners describe this as an unusually short development cycle for a high-performance application-specific integrated circuit. OpenAI says its own models were used during parts of the chip design and optimization process, although it has not provided details about the tasks they completed or the amount of development time saved.

The first processor was presented to OpenAI Chief Executive Officer Sam Altman and President and Co-Founder Greg Brockman by Broadcom President and Chief Executive Officer Hock Tan and President Charlie Kawwas.

The project gives OpenAI more direct control over the infrastructure used to operate its products. It also reduces reliance on general-purpose accelerator designs for at least part of OpenAI’s future inference capacity.

Microsoft data center deployment starts in 2026

Jalapeño is the first processor in a planned series of OpenAI and Broadcom accelerators.

The partners expect initial deployment by the end of 2026, followed by further generations in subsequent years. The rollout will combine OpenAI-designed chips with Broadcom networking and connectivity systems and Celestica’s board and rack infrastructure.

Microsoft is named as one of the data center partners involved in the gigawatt-scale deployment. OpenAI has not disclosed the first deployment locations, the number of accelerators involved, or how capacity will be divided between ChatGPT, Codex, application programming interface customers, and other workloads.

Hock Tan, President and Chief Executive Officer at Broadcom, says: "Our collaboration with OpenAI represents a fundamental commitment to scaling the physical infrastructure required for the next decade of AI. This is just the beginning of a multi-generation roadmap. By co-developing our industry-leading silicon directly with OpenAI, we are enabling the deployment of gigawatt scale data centers with Microsoft and other partners beginning in 2026."

OpenAI says greater inference efficiency could support faster responses, longer Codex tasks, more reliable service during periods of high demand, and lower costs for developers building with its models.

Engineering samples are now running in the laboratory, with the technical performance report due in the coming months. Initial deployment with Microsoft and other data center partners remains scheduled for the end of 2026.
