monday.com redesigned its authorization system to handle scale by replacing real-time data enrichment with an in-memory, replication-based approach. The old system queried MySQL and multiple microservices on every request, adding ~200ms of latency and creating reliability risks. The new architecture uses WAL-based CDC pipelines to replicate data into DynamoDB and S3 snapshots, then loads tenant data into pod memory with a stickiness layer for consistent routing. The migration was done incrementally with continuous integrity checks. Results: P50 latency dropped from 240ms to 6ms, P99 from 720ms to 80ms, with full independence from tier 2–3 services and automated horizontal scaling.

6m read timeFrom engineering.monday.com
Post cover image
Table of contents
Where We Started and the PainDesigning the New BackboneMigration StrategyImpact and ResultsClosing Thoughts

Sort: