AI workloads are pushing infrastructure beyond its limits, requiring a fundamental rebuild rather than incremental optimization. Key areas covered include the shift to 102.4Tbps networking silicon as the new baseline for AI clusters, the rise of Linear-drive Pluggable Optics (LPO) and Co-Packaged Optics (CPO) to address power and bandwidth constraints, and the emerging 'scale-across' paradigm that treats compute across multiple geographic locations as a single pool. Storage is identified as an underappreciated bottleneck, with KV caches, checkpoint writes, and data ingestion all capable of idling expensive GPUs. Security challenges unique to AI—including model weight theft, adversarial inputs, and training data poisoning—demand hardware-rooted trust, confidential computing, and DPU-based network enforcement. Organizations that invest in all these layers together, not just GPUs, will be positioned to train and deploy next-generation AI systems.
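The claim that checkpoint writes can idle expensive GPUs is easy to see with back-of-envelope arithmetic. The sketch below is illustrative only: the model size, bytes-per-parameter, and storage bandwidth are assumptions, not figures from the article.

```python
# Back-of-envelope: how long a synchronous checkpoint write stalls training.
# All numeric values here are illustrative assumptions, not article figures.

def checkpoint_stall_seconds(params_billions: float,
                             bytes_per_param: float,
                             storage_gb_per_s: float) -> float:
    """Seconds GPUs sit idle while a full checkpoint drains to storage."""
    checkpoint_gb = params_billions * bytes_per_param  # GB of state to write
    return checkpoint_gb / storage_gb_per_s            # GB / (GB/s) = seconds

# A hypothetical 70B-parameter model with fp16 weights plus optimizer state
# (~12 bytes/param), written over an assumed 25 GB/s aggregate storage path:
stall = checkpoint_stall_seconds(70, 12, 25)
print(f"{stall:.1f} s of idle GPUs per checkpoint")  # 840 GB / 25 GB/s = 33.6 s
```

At that rate, checkpointing every 30 minutes costs roughly 2% of cluster time, which is why asynchronous or tiered checkpointing is a common mitigation.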

From blogs.cisco.com (7-minute read)
Table of contents
The bandwidth wall and the rise of co-packaged optics
Scale-across: Beyond the single cluster
Storage: The forgotten bottleneck
Security in an era of valuable weights

Dive deeper into the announcements we made this week at GTC.
