Best of Materialized View 2024

  1. Article
    Materialized View · 2y

    SlateDB: An Embedded Storage Engine Built on Object Storage

    SlateDB is a newly open-sourced, cloud-native embedded storage engine built as a log-structured merge-tree (LSM tree) on object storage such as S3 and GCS. It targets use cases such as stateful stream processing and serverless functions, offering bottomless storage capacity and high durability in exchange for higher latency and API costs. The project has been well received, with significant community contributions, and is licensed under Apache 2.0. Future plans include on-disk and in-memory caches, snapshots, and range queries.

  2. Article
    Materialized View · 1y

    S3 Is the New SFTP

    Fintech companies handle diverse data processing tasks, including shuffling files between vendors and partners, often over SFTP. Moving to modern data lakehouses built on S3, Apache Iceberg, and Apache Parquet can centralize and streamline this process. The approach simplifies access and management while retaining advantages such as fast transfers and central access control. Although challenges like schema evolution remain, adopting data lakehouses can benefit companies seeking efficient and scalable data solutions. The trend is supported by rising customer demand and by startups offering data export platforms.

  3. Article
    Materialized View · 2y

    Modular Monoliths Are a Good Idea, Actually

    The post argues in favor of modular monoliths as an alternative to microservices for achieving high cohesion and low coupling in software architecture. It highlights the challenges associated with scaling monoliths and migrating to microservices, such as increased complexity and the need for extensive tooling. The author suggests that modular monolithic applications, which include tooling for incremental build systems, testing, branch management, and database isolation, offer a balanced solution. This approach can extend the life of a monolith and make transitioning to a service-based architecture more manageable.

  4. Article
    Materialized View · 2y

    It's Time to Merge Analytics and Data Engineering (Again)

    The post argues for merging analytics and data engineering roles, citing the commoditization of data pipelines and the limited value provided by distinct analytics engineers. With advancements like LLMs, data integration tools, and data pipeline vendors, there's a push for a consolidated data team handling extraction, transformation, and loading (ETL) processes. The author notes emerging tools that facilitate this transition and predicts a convergence of these roles in the coming years.

  5. Article
    Materialized View · 2y

    DuckDB Is Not a Data Warehouse

    DuckDB is a highly portable and fast tool for handling columnar data, often used by analytics and data engineers for various creative purposes. However, it is not considered a viable solution for large enterprise data warehousing due to its deployment model and limited scalability. MotherDuck aims to address these issues by building a centralized deployment model but faces tough competition from established cloud data warehouses like Snowflake and BigQuery, as well as PostgreSQL extensions.