Asana's platform engineering team tackled total load scalability and operational complexity in their invalidation pipeline. They addressed invalidation fanout issues through multiple approaches: attempting domain-isolated invalidations (which failed due to batching overhead), implementing invalidator replication per database as a stopgap, and ultimately migrating to a cell-based architecture for 10-20x fanout reduction. To reduce operational complexity from cache thrashing and binlog tailer issues, they're splitting the invalidator into separate generation and distribution components connected by an external datastore like Redis Streams or Kafka. Key lessons include biasing toward simplicity, validating core principles early, and recognizing the silent costs of maintaining complex generic systems.
Table of contents
Problem 2: Total load scalabilityProblem 3: Operational complexityInvalidator separationLessons learnedSort: