(Image credit: Google)

- Google's Ironwood TPU scales to 9,216 chips with a record 1.77PB of shared memory
- Dual-die architecture delivers 4,614 TFLOPs of FP8 and 192GB of HBM3e per chip
- Enhanced reliability, cooling, and AI-assisted design features enable efficient inference workloads at scale

Google closed out the machine learning sessions at the recent Hot Chips 2025 event with a detailed look at its newest tensor processing unit, Ironwood.

The chip, which was first revealed at Google Cloud Next 25 back in April 2025, is the company's first TPU designed primarily for large-scale inference workloads rather than training, and arrives as its seventh generation of TPU hardware.

Each Ironwood chip integrates two compute dies, delivering 4,614 TFLOPs of FP8 performance, while eight stacks of HBM3e provide 192GB of memory capacity per chip, paired with 7.3TB/s of bandwidth.

1.77PB of HBM

Google has built in 1.2TB/s of IO bandwidth, allowing a system to scale up to 9,216 chips per pod without glue logic. That configuration reaches a whopping 42.5 exaflops of performance.

Memory capacity also scales impressively.
Across a pod, Ironwood offers 1.77PB of directly addressable HBM. That level sets a new record for shared-memory supercomputers, and is enabled by optical circuit switches linking racks together. The hardware can reconfigure around failed nodes, restoring workloads from checkpoints.

The chip integrates multiple features aimed at stability and resilience.
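The pod-level figures follow directly from the per-chip specs quoted above. A quick sanity check, assuming the article's per-chip numbers and decimal (SI) unit prefixes:

```python
# Sanity-check Ironwood pod-level figures from the per-chip specs in the article.
chips_per_pod = 9216
hbm_per_chip_gb = 192        # GB of HBM3e per chip
fp8_per_chip_tflops = 4614   # FP8 TFLOPs per chip

# GB -> PB and TFLOPs -> exaflops, using decimal (1 PB = 1,000,000 GB) prefixes
pod_hbm_pb = chips_per_pod * hbm_per_chip_gb / 1_000_000
pod_fp8_eflops = chips_per_pod * fp8_per_chip_tflops / 1_000_000

print(f"Pod HBM: {pod_hbm_pb:.2f} PB")          # ≈ 1.77 PB
print(f"Pod FP8: {pod_fp8_eflops:.1f} EFLOPs")  # ≈ 42.5 exaflops
```

Both headline numbers check out: 9,216 × 192GB ≈ 1.77PB of HBM and 9,216 × 4,614 TFLOPs ≈ 42.5 exaflops.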
These include an on-chip root of trust, built-in self-test functions, and measures to mitigate silent data corruption. Logic repair functions are included to improve manufacturing yield. An emphasis on RAS (reliability, availability, and serviceability) is visible throughout the architecture.

Cooling is handled by a cold-plate solution supported by Google's third generation of liquid-cooling infrastructure. Google claims a twofold improvement in performance per watt compared with Trillium.
Dynamic voltage and frequency scaling further improves efficiency across varied workloads.

Ironwood also incorporates AI techniques within its own design: machine learning was used to help optimize the ALU circuits and floor plan. A fourth-generation SparseCore has been added to accelerate embeddings and collective operations, supporting workloads such as recommendation engines.

Deployment is already underway at hyperscale in Google Cloud data centers, although the TPU remains an internal platform not available directly to customers.

Commenting on the session at Hot Chips 2025, ServeTheHome's Ryan Smith said, "This was an awesome presentation."
*Reporting by Techradar.*