Best of Data Management: October 2024

  1. 1
    Article
    ByteByteGo · 2y

    1.8 Trillion Events Per Day with Kafka: How Agoda Handles it

    Agoda processes 1.8 trillion events per day through Apache Kafka using a two-step logging architecture, separate Kafka clusters per use case, a robust auditing system, and dynamic load balancing. The approach keeps the platform resilient and flexible and uses resources efficiently despite heterogeneous hardware and uneven message workloads. Key solutions include lag-aware producers and consumers that adapt to real-time data, which reduces over-provisioning and keeps workloads balanced.

  2. 2
    Article
    Towards Data Science · 2y

    Scaling RAG from POC to Production

    Retrieval Augmented Generation (RAG) is becoming a key architecture for large-scale applications of AI, balancing the capabilities of large language models with the accuracy of indexed data. Scaling from a proof of concept (POC) to production presents multiple challenges, including performance, data management, and risk mitigation. Addressing these challenges involves architectural components such as scalable vector databases, caching mechanisms, advanced search techniques, and a Responsible AI layer. Strategic planning and integration into existing workflows are crucial for successful scaling.

  3. 3
    Article
    Decube · 2y

    Understanding Data Products and Data Contracts: Building Trust in Modern Data Management

    Data products and data contracts turn raw data into reliable assets, helping organizations manage data quality and access control. Data products are curated, cleaned datasets designed to solve specific business problems. Data contracts are formal agreements that ensure data meets specified quality and update standards, which fosters trust. Domain management organizes data by business function, improving order and security.
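
To make the data contract idea in the last entry concrete, here is a minimal sketch of a contract that covers the two guarantees the summary mentions: quality and update standards. It is not taken from the Decube article; the names `DataContract` and `check_contract`, the field names, and the 24-hour freshness limit are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class DataContract:
    """Hypothetical contract: required fields with their types, plus a freshness limit."""
    required_fields: dict[str, type]
    max_staleness: timedelta


def check_contract(contract, rows, last_updated, now=None):
    """Return a list of violations; an empty list means the data conforms."""
    now = now or datetime.now(timezone.utc)
    violations = []

    # Update standard: the dataset must have been refreshed recently enough.
    if now - last_updated > contract.max_staleness:
        violations.append(f"stale: last updated {last_updated.isoformat()}")

    # Quality standard: every row carries each required field with the right type.
    for i, row in enumerate(rows):
        for name, expected in contract.required_fields.items():
            if name not in row or row[name] is None:
                violations.append(f"row {i}: missing field '{name}'")
            elif not isinstance(row[name], expected):
                violations.append(f"row {i}: field '{name}' is not {expected.__name__}")
    return violations


# Illustrative usage with made-up field names and values.
orders_contract = DataContract({"order_id": int, "amount": float}, timedelta(hours=24))
checked_at = datetime(2024, 10, 2, tzinfo=timezone.utc)
problems = check_contract(
    orders_contract,
    [{"order_id": 1, "amount": 9.5}],
    last_updated=checked_at - timedelta(hours=1),
    now=checked_at,
)
```

A producer could run a check like this before publishing a data product, and a consumer could run it on arrival, so both sides test the same explicit agreement rather than relying on informal expectations.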