Notion's data has grown 10-fold since 2021, reaching over 200 billion blocks stored in its Postgres database by 2024. This growth led Notion to build a new data lake infrastructure to handle the load and improve scalability, performance, and cost-efficiency. The new setup uses S3 for storage, Kafka for data ingestion, and Apache Hudi for managing updates. The overhaul has produced significant cost savings, reduced ingestion times, and enabled new features such as Notion AI.
Table of contents
What is a Block?
The Initial Data Warehouse Architecture
Notion’s New Data Lake
Solving Scaling Challenges of the New Data Lake
Conclusion