Google has unveiled the 8th generation of its Tensor Processing Units (TPUs), introducing two specialized chips: TPU 8t for large-scale model training and TPU 8i for low-latency inference. The TPU 8t delivers nearly 3x the compute performance of the previous generation, scales to 9,600 chips in a single superpod with 2 petabytes of shared HBM, and can grow to a million chips in a single cluster. The TPU 8i targets agentic workloads with 288GB of memory, doubled ICI bandwidth at 19.2 Tb/s, and 80% better performance per dollar. Google's design philosophy of co-designing silicon with software and networking remains central to these gains.
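The superpod figures above imply a per-chip share of the pooled HBM. A quick sanity check, assuming decimal units (1 PB = 10^15 bytes); the per-chip number is an inference from the stated totals, not a figure from the announcement:

```python
# Back-of-the-envelope check of the TPU 8t superpod figures.
# Assumption: 2 PB means 2e15 bytes (decimal), divided evenly across chips.
superpod_chips = 9_600
shared_hbm_bytes = 2e15  # 2 petabytes of shared HBM across the superpod

hbm_per_chip_gb = shared_hbm_bytes / superpod_chips / 1e9
print(f"~{hbm_per_chip_gb:.0f} GB of HBM per chip")
```

This works out to roughly 208 GB per training chip, in the same ballpark as the 288GB quoted for the inference-oriented TPU 8i.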

From infoq.com