Scaling Jira Cloud Migrations, One Bottleneck at a Time

Atlassian's Jira Migrations team tells the engineering story of scaling its Data Center (DC) to Cloud migration pipeline from 20K-user to 50K-user environments. The original API-driven, push-based architecture suffered from timeouts, rate limiting, and poor throughput. A new V4 architecture, backed by Kafka and an in-house ETL tool called Lithium, introduced a pull-based model. The first results were regressions (34% slower, with a 60% throughput drop), but systematic profiling and fixes yielded a 6x median throughput improvement. Those fixes included correcting polling timeouts, eliminating N+1 patterns, reducing lock contention through Kafka partition keys, and t-shirt sizing for project concurrency. At 50K scale, additional challenges (slow startup, Kafka partition exhaustion, long-tail overhead from small projects, and database replication lag) were each addressed with targeted solutions. The final result: 6,000+ projects and ~7.5M work items migrated within 24 hours, with infrastructure costs reduced by up to $65K/month.
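To make "t-shirt sizing for project concurrency" concrete, here is a minimal Python sketch of the general idea: classify each project by its work-item count, give each size a concurrency weight, and admit projects only while their combined weight fits a budget. The thresholds, weights, and the names `Project`, `tshirt_size`, and `admit` are all hypothetical illustrations; the summary does not describe how Atlassian's pipeline actually implements this.

```python
from dataclasses import dataclass

# Hypothetical size thresholds (upper bounds on work items) and weights.
# The source does not state the real values; these are assumptions.
SIZE_LIMITS = [("S", 1_000), ("M", 50_000), ("L", 500_000)]
SIZE_WEIGHT = {"S": 1, "M": 2, "L": 4, "XL": 8}


@dataclass(frozen=True)
class Project:
    key: str
    work_items: int


def tshirt_size(project: Project) -> str:
    """Return the first size whose upper bound exceeds the work-item count."""
    for size, limit in SIZE_LIMITS:
        if project.work_items < limit:
            return size
    return "XL"


def admit(pending: list[Project], budget: int) -> list[Project]:
    """Greedily admit projects, in order, while their total weight fits the budget."""
    admitted: list[Project] = []
    used = 0
    for project in pending:
        weight = SIZE_WEIGHT[tshirt_size(project)]
        if used + weight <= budget:
            admitted.append(project)
            used += weight
    return admitted
```

With a budget of 12, one XL project (weight 8) leaves room for a few small and medium projects but not for an additional L project, so a single large project cannot be crowded out by, or crowd out, the rest of the queue.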

#kafka

#distributed-systems

#jira

Mar 06 • 14m read time • From atlassian.com

Table of contents

Prologue – Migrations and relevant background
The V4 Architecture
The pipeline
Act I – The 20K Scale: When “Modern” Meant “Slower”
Finding the Bottlenecks
First Wave of Fixes for 20K Scale
Act II – Raising the Bar: The 50K Scale Challenge
Our 4-Part Strategy for 50K Scale
Act III – The 50K Journey: Problems We Had to Solve
