Best of Data Management, November 2024

  1. Article
    detlife · Data Engineer Things · 2y

    I spent 3 hours learning how Uber manages data quality.

    Uber runs a data quality platform that automatically detects and manages quality issues across more than 2,000 datasets. Its components include a Test Execution Engine, a Test Generator, and an Alert Generator, which together support operational excellence. The platform automates tasks such as generating tests and alerts and rerunning failed tests to verify incidents. Uber also integrates its data quality tools with other platforms to give its internal teams a seamless experience.
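The idea of rerunning failed tests before raising an incident can be illustrated with a minimal sketch. This is not Uber's implementation; `QualityTest`, `is_confirmed_failure`, and `generate_alerts` are hypothetical names chosen for illustration, and each check is assumed to be a simple callable that returns `True` when the dataset passes.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class QualityTest:
    # Hypothetical test record: a name plus a check returning True on pass.
    name: str
    check: Callable[[], bool]


def is_confirmed_failure(test: QualityTest, reruns: int = 2) -> bool:
    """Run the check once; on failure, rerun it to rule out transient errors."""
    if test.check():
        return False
    for _ in range(reruns):
        if test.check():
            return False  # passed on a rerun, so treat the failure as transient
    return True  # failed on every attempt


def generate_alerts(tests: List[QualityTest], reruns: int = 2) -> List[str]:
    """Produce one alert message per test whose failure was confirmed."""
    return [
        f"ALERT: {t.name} failed after {reruns + 1} attempts"
        for t in tests
        if is_confirmed_failure(t, reruns)
    ]
```

A check that fails once and then passes produces no alert here, which is the point of verifying an incident before notifying anyone.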

  2. Article
    swizec · swizec.com · 2y

    Why software only moves forward

    Software systems, especially at scale, cannot afford rollbacks or cut-overs and must always move forward due to the permanent nature of data. Data, once saved, must be managed forever, requiring updates to be additive and systems to be distributed. Challenges arise as different parts of the system need to operate on shared definitions of business logic, leading to complexities during updates. Key strategies include making additive changes, being permissive about inputs, and managing updates to both databases and code to ensure systems remain in sync.
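As a rough illustration of "additive changes" and "being permissive about inputs", the sketch below reads a record that may or may not carry a field added in a later release. The record shape, the `currency` field, and its `"USD"` default are assumptions made up for this example, not details from the article.

```python
from typing import Any, Dict


def read_order(record: Dict[str, Any]) -> Dict[str, Any]:
    """Tolerant reader: accepts both old and new record shapes."""
    return {
        "id": record["id"],
        # Accept ints or numeric strings, since older writers may differ.
        "amount_cents": int(record["amount_cents"]),
        # Field added later; old rows lack it, so fall back to a default
        # instead of failing.
        "currency": record.get("currency", "USD"),
    }
    # Unknown extra keys are ignored, so newer writers do not break this reader.
```

Because the new field is optional on read, old data stays valid and code deployed before or after the change can process the same rows.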

  3. Article
    cerbos · Cerbos · 2y

    How to address decentralized data management in microservices

    Moving from a monolithic to a microservices architecture brings both benefits and challenges for decentralized data management. The post covers advantages such as scalability, flexibility, performance, and fault isolation, alongside challenges such as complex data integration, increased development complexity, latency, and security risks. It details patterns and techniques that mitigate these challenges: eventual consistency, the Saga pattern, event sourcing, domain-driven design (DDD), and command query responsibility segregation (CQRS). A case study of Uber shows how these methods are applied in practice to maintain data integrity and system reliability.
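Of the patterns listed, the Saga pattern is the easiest to show in a few lines. The following is a minimal in-process sketch, assuming each step is a pair of plain callables (an action and its compensation). A real saga would coordinate separate services through messages or an orchestrator, and `run_saga` is a hypothetical helper, not an API from the post.

```python
from typing import Callable, List, Tuple

# Each step: (name, action, compensation). The compensation undoes the action.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[Callable[[], None]] = []
    for _name, action, compensate in steps:
        try:
            action()
        except Exception:
            # Undo everything that already succeeded, most recent first.
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensate)
    return True
```

Instead of one distributed transaction, each service commits locally and the saga relies on compensating actions to restore consistency when a later step fails.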

  4. Article
    sspdata · Data Engineering · 1y

    Medallion Architecture Hype or Useful?

    Medallion Architecture is a term coined by Databricks that aims to simplify data architecture for business and domain experts. However, it may be confusing for data professionals who are accustomed to classical data architecture models such as stage, cleansing, core, and mart, where marts are typically persisted in cubes for faster responses.

  5. Article
    medium_js · Medium · 2y

    What is Data Cleaning?

    Data cleaning involves finding and correcting errors, incomplete data, and inconsistencies in datasets to ensure accuracy and reliability for analysis. Clean data is crucial for businesses to avoid dysfunction and financial losses, support GDPR compliance, boost customer support and trust, and enhance efficiency. Tools like Excel, R, and Python libraries can facilitate more efficient data cleaning processes.
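A small sketch of what such cleaning can look like in plain Python: normalizing values, dropping incomplete rows, and removing duplicates. The `name` and `email` fields and the rule that an email must contain `@` are assumptions chosen for the example; real pipelines would typically use a library such as pandas and stricter validation.

```python
from typing import Dict, List, Optional


def clean_rows(rows: List[Dict[str, Optional[str]]]) -> List[Dict[str, str]]:
    """Normalize names and emails, drop invalid rows, and de-duplicate by email."""
    seen = set()
    cleaned = []
    for row in rows:
        # Fix inconsistencies: trim whitespace, lowercase emails,
        # collapse repeated spaces in names.
        email = (row.get("email") or "").strip().lower()
        name = " ".join((row.get("name") or "").split())
        # Drop incomplete or obviously invalid rows.
        if not email or "@" not in email:
            continue
        # Drop duplicates, keeping the first occurrence.
        if email in seen:
            continue
        seen.add(email)
        cleaned.append({"name": name, "email": email})
    return cleaned
```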