Atlassian's Jira Migrations team describes how it scaled its DC-to-Cloud migration pipeline from 20K-user to 50K-user environments. The original API-driven (push-based) architecture suffered from timeouts, rate limiting, and poor throughput. A new V4 architecture, backed by Kafka and an in-house ETL tool called Lithium, introduced a pull-based model. The first results were regressions (34% slower, with a 60% throughput drop), but systematic profiling and a series of fixes yielded a 6x median throughput improvement. Those fixes included correcting polling timeouts, eliminating N+1 patterns, using Kafka partition keys to reduce lock contention, and t-shirt sizing for project concurrency. At 50K scale, further challenges (slow startup, Kafka partition exhaustion, long-tail small-project overhead, and database replication lag) were each addressed with a targeted solution. The final result: 6,000+ projects and ~7.5M work items migrated within 24 hours, with infrastructure costs reduced by up to $65K/month.
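
Of the fixes listed, the N+1 pattern is the most general: one query fetches a list of parents, then one further query runs per parent. A minimal sketch of the pattern and its batched replacement follows, using an in-memory SQLite database. The `project` and `work_item` tables and both function names are hypothetical illustrations, not Atlassian's schema or code.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database (hypothetical schema, for illustration)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE project (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE work_item (
            id INTEGER PRIMARY KEY, project_id INTEGER, title TEXT
        );
        INSERT INTO project VALUES (1, 'ALPHA'), (2, 'BETA'), (3, 'EMPTY');
        INSERT INTO work_item VALUES (1, 1, 'a-1'), (2, 1, 'a-2'), (3, 2, 'b-1');
        """
    )
    return conn


def load_n_plus_one(conn):
    """N+1 pattern: one query for the projects, then one more per project."""
    queries = 1
    projects = conn.execute("SELECT id FROM project ORDER BY id").fetchall()
    result = {}
    for (project_id,) in projects:
        rows = conn.execute(
            "SELECT title FROM work_item WHERE project_id = ? ORDER BY id",
            (project_id,),
        ).fetchall()
        queries += 1
        result[project_id] = [title for (title,) in rows]
    return result, queries


def load_batched(conn):
    """Batched alternative: a single LEFT JOIN returns the same data."""
    rows = conn.execute(
        """
        SELECT p.id, w.title
        FROM project AS p
        LEFT JOIN work_item AS w ON w.project_id = p.id
        ORDER BY p.id, w.id
        """
    ).fetchall()
    result = {}
    for project_id, title in rows:
        items = result.setdefault(project_id, [])
        if title is not None:  # projects without work items yield a NULL title
            items.append(title)
    return result, 1


if __name__ == "__main__":
    conn = build_demo_db()
    slow, slow_queries = load_n_plus_one(conn)
    fast, fast_queries = load_batched(conn)
    print(slow == fast, slow_queries, fast_queries)
```

Both functions return the same mapping, but the query count of the first grows with the number of projects while the second stays constant. With thousands of projects, each paying a network round trip, that difference tends to dominate.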

14m read time. From atlassian.com
Table of contents
Prologue – Migrations and relevant background
The V4 Architecture
The pipeline
Act I – The 20K Scale: When “Modern” Meant “Slower”
Finding the Bottlenecks
First Wave of Fixes for 20K Scale
Act II – Raising the Bar: The 50K Scale Challenge
Our 4-Part Strategy for 50K Scale
Act III – The 50K Journey: Problems We Had to Solve
