Today at Google Cloud Next ’25, we’re introducing Ironwood, our seventh-generation Tensor Processing Unit (TPU). It’s our most powerful, scalable, and energy-efficient custom AI accelerator yet, and the first designed specifically for the age of inference.
For over a decade, TPUs have powered Google’s most demanding AI training and serving workloads, enabling our Cloud customers to do the same. Ironwood marks a major evolution: it’s purpose-built to fuel the next generation of thinking, inferential AI models at scale.
Ironwood signals a significant shift — not just in AI infrastructure, but in how AI itself advances. We’re moving from responsive models that deliver real-time information for human interpretation, to proactive models that generate insights and understanding on their own. We call this the “age of inference,” where AI agents don’t just serve up data — they anticipate, interpret, and collaborate.
To meet the massive computational and communication demands of this new generation, Ironwood was engineered for scale. Up to 9,216 liquid-cooled chips are connected by breakthrough Inter-Chip Interconnect (ICI) networking, spanning nearly 10 MW of power. Ironwood is also a key component of our new Google Cloud AI Hypercomputer architecture, which tightly integrates hardware and software to optimize performance for the most intensive AI workloads.
With Ironwood, developers can easily tap into the immense computing power of tens of thousands of TPUs through Google’s Pathways software stack — making massive-scale AI training and serving both more accessible and more efficient.
Here’s a deeper look at how these innovations combine to deliver unmatched performance, cost, and power efficiency for the world’s most ambitious AI workloads.
Powering the Age of Inference with Ironwood
Ironwood is purpose-built to meet the complex computational and communication demands of “thinking models,” including Large Language Models (LLMs), Mixture of Experts (MoEs), and advanced reasoning tasks. These models require massive parallelism and highly efficient memory access. Ironwood minimizes data movement and on-chip latency while executing large-scale tensor operations.
At the frontier of AI, the computational needs of thinking models far exceed the capabilities of a single chip. To address this, Ironwood TPUs are connected through a low-latency, high-bandwidth Inter-Chip Interconnect (ICI) network, enabling synchronized communication across full TPU pod scale.
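To make that concrete at the programming level, here is a minimal, generic JAX sketch of the kind of collective the ICI carries: an all-reduce in which every chip contributes a value and receives the global result. Nothing here is Ironwood-specific; it runs on whatever accelerators (or CPU devices) JAX can see.

```python
import jax
import jax.numpy as jnp

# One program instance runs per attached chip; lax.psum is the
# all-reduce collective that travels over the inter-chip links.
all_reduce = jax.pmap(lambda v: jax.lax.psum(v, axis_name="i"),
                      axis_name="i")

n = jax.device_count()
x = jnp.arange(n, dtype=jnp.float32)   # one scalar per device
print(all_reduce(x))                   # every device holds the global sum
```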
For Google Cloud customers, Ironwood is available in two configurations, tailored to workload demands: a 256-chip system and a 9,216-chip system.
When scaled to 9,216 chips, Ironwood delivers an astounding 42.5 exaflops of compute power, more than 24 times the 1.7 exaflops of El Capitan, the world’s largest supercomputer. Each Ironwood chip reaches a peak performance of 4,614 teraflops, enabling training and inference for the most advanced dense LLMs and MoE models with thinking capabilities. Ironwood’s memory and networking architecture ensures critical data remains readily available to sustain performance at these massive scales.
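The pod-level figure follows directly from the per-chip one: 9,216 chips × 4,614 teraflops per chip ≈ 42.5 million teraflops, i.e., 42.5 exaflops; and 42.5 ÷ 1.7 ≈ 25 gives the “more than 24x” comparison with El Capitan.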
Ironwood also introduces an enhanced SparseCore — a specialized accelerator designed for ultra-large embeddings typical in advanced ranking, recommendation, financial modeling, and scientific computing workloads. Expanded SparseCore capabilities allow Ironwood to accelerate an even broader range of applications beyond traditional AI.
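For a feel of the access pattern SparseCore accelerates, here is a hedged JAX sketch of a plain embedding lookup-and-reduce. This is the generic dense-gather formulation, not a SparseCore API; the table size and the `segment_sum` pooling are illustrative:

```python
import jax
import jax.numpy as jnp

# Toy embedding table; production ranking tables can run to billions
# of rows, which is the regime a sparse accelerator is built for.
vocab, dim = 10_000, 64
table = jax.random.normal(jax.random.PRNGKey(0), (vocab, dim))

# A ragged batch of lookups: which row, and which example ("bag") it feeds.
row_ids = jnp.array([3, 17, 17, 9_999, 42])
bag_ids = jnp.array([0, 0, 1, 1, 1])

def embed_bags(table, row_ids, bag_ids, num_bags):
    gathered = table[row_ids]  # sparse gather of embedding rows
    # Sum each example's vectors: the lookup-and-reduce at the heart
    # of ranking and recommendation models.
    return jax.ops.segment_sum(gathered, bag_ids, num_segments=num_bags)

pooled = jax.jit(embed_bags, static_argnums=3)(table, row_ids, bag_ids, 2)
print(pooled.shape)  # (2, 64): one pooled embedding per example
```

The pattern is irregular and memory-bound: a handful of rows gathered from a huge table, then a small reduction. That profile benefits far more from a dedicated sparse unit than from the dense matrix units that dominate LLM math.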
Pathways, Google’s ML runtime developed by DeepMind, orchestrates efficient distributed computing across thousands of TPU chips. With Pathways on Google Cloud, scaling AI workloads across hundreds of thousands of Ironwood chips becomes seamless, driving forward the next frontier of generative AI computation.
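Pathways itself is Google-internal infrastructure, but the developer-facing surface is ordinary JAX: you describe a device mesh and shardings, and the runtime handles placement and communication. A minimal sketch, with an illustrative 1-D mesh (a real pod slice would expose multi-dimensional torus axes):

```python
import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Arrange whatever chips are attached into a single "data" mesh axis.
mesh = Mesh(np.array(jax.devices()), axis_names=("data",))

# Shard the batch dimension across chips; replicate the weights.
x = jax.device_put(jnp.ones((8 * jax.device_count(), 512)),
                   NamedSharding(mesh, P("data", None)))
w = jax.device_put(jnp.ones((512, 256)),
                   NamedSharding(mesh, P(None, None)))

# jit partitions the matmul; the runtime inserts any needed communication.
y = jax.jit(lambda x, w: x @ w)(x, w)
print(y.sharding)  # batch stays sharded across the "data" axis
```

The same program shape scales from 8 chips to a full pod by changing only the mesh, which is the point of the Pathways model.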
Figures
- Figure 1: A green bar chart showing progressive improvements in total FP8 peak FLOPS performance relative to TPU v2, Google’s first external Cloud TPU.
- Figure 2: A side-by-side technical comparison of recent TPU generations, highlighting peak FLOPS per chip and key innovations, with native FP8 support introduced in Ironwood.
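On the FP8 point in Figure 2, the software side in JAX looks like the sketch below: operands stored in an 8-bit float format, with accumulation requested in float32. This is a generic JAX pattern, not Ironwood-specific code; real FP8 pipelines also track per-tensor scale factors (omitted here), and backends without native FP8 units fall back to emulation.

```python
import jax
import jax.numpy as jnp

key_a, key_b = jax.random.split(jax.random.PRNGKey(0))
a = jax.random.normal(key_a, (128, 256))
b = jax.random.normal(key_b, (256, 128))

# Quantize operands to FP8 (e4m3) for storage and compute;
# request float32 accumulation for the matmul output.
a8 = a.astype(jnp.float8_e4m3fn)
b8 = b.astype(jnp.float8_e4m3fn)
out = jax.lax.dot(a8, b8, preferred_element_type=jnp.float32)

print(out.dtype)  # float32 accumulation over FP8 inputs
```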


Ironwood’s Key Features
Google Cloud brings over a decade of AI compute expertise — honed through delivering planetary-scale services like Gmail and Search — into every aspect of Ironwood’s design. Highlights include:
- Dramatic performance and energy efficiency gains: Ironwood offers twice the performance per watt of Trillium, our sixth-generation TPU. With power availability becoming a key constraint on AI expansion, that translates directly into more compute capacity per watt. Advanced liquid cooling and an optimized chip design let Ironwood reliably sustain up to 2x the performance of standard air-cooled systems, even under continuous, intensive AI workloads. Compared to our first Cloud TPU from 2018, Ironwood delivers nearly 30x better power efficiency.
- Massive memory upgrades: Each Ironwood chip features 192 GB of High Bandwidth Memory (HBM), a 6x increase over Trillium, enabling larger models and datasets while reducing data transfers and boosting overall efficiency.
- Breakthrough memory bandwidth: Ironwood achieves 7.37 TB/s of HBM bandwidth per chip, 4.5x higher than Trillium, ensuring rapid memory access for data-intensive AI tasks (a quick back-of-envelope on what this implies follows this list).
- Enhanced interconnect speeds: Ironwood’s ICI bandwidth rises to 1.2 TBps bidirectional, a 1.5x improvement over Trillium, enabling faster, more efficient chip-to-chip communication for highly scalable distributed training and inference.
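A rough roofline check from those two per-chip figures: 4,614 teraflops ÷ 7.37 TB/s ≈ 626 FLOPs per byte of HBM traffic. Kernels with lower arithmetic intensity than that, which includes much of inference-time attention and embedding work, are bandwidth-bound rather than compute-bound, so the 4.5x bandwidth jump can matter as much as the headline FLOPS.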

Ironwood: Built for the AI Demands of Tomorrow
Ironwood marks a breakthrough for the age of inference, delivering major advancements in compute power, memory capacity, interconnect networking, and system reliability. Combined with nearly 2x greater power efficiency, Ironwood empowers our most demanding customers to tackle training and serving workloads with unmatched performance and ultra-low latency — all while keeping pace with the exponential growth in AI computing needs.
Today, leading thinking models like Gemini 2.5 and the Nobel Prize-winning AlphaFold already run on TPUs. With Ironwood, we’re excited to see what new AI breakthroughs our developers and Google Cloud customers will achieve when it becomes available later this year.
This content is sourced from https://blog.google/products/google-cloud/ironwood-tpu-age-of-inference/