Best of Data Management — 2024

1
Article
KDnuggets·2y
10 GitHub Repositories to Master SQL
This post lists 10 GitHub repositories that can help readers master SQL and database management. The repositories include tutorials, practice exercises, comprehensive courses, and tools for SQL-related tasks.
199
4
2
Article
The Polymathic Engineer·2y
How to design a system for scale
Scalability is essential for software engineers as applications grow. Three key techniques for scaling systems are adding server clones, functional partitioning, and data partitioning. Adding server clones involves creating interchangeable copies of existing servers to distribute loads. Functional partitioning breaks down the system into smaller, independent components each handling specific tasks. Data partitioning divides datasets across multiple machines to speed up processing and storage. Each technique has its pros and cons and requires careful consideration for effective implementation.
161
1
3
Article
Data Engineer Things·2y
I spent 3 hours learning how Uber manages data quality.
Uber runs a comprehensive data quality platform that uses automatic detection and management to maintain high data standards across more than 2,000 datasets. The platform includes components such as a Test Execution Engine, Test Generator, and Alert Generator to ensure operational excellence. It automates tasks such as generating tests and alerts and rerunning failed tests to verify incidents. Uber also integrates its data quality tools with other platforms to provide a seamless experience for its internal teams.
131
2
4
Article
Community Picks·2y
10 Microservices Architecture Challenges for System Design Interviews
This post discusses the challenges faced in Microservices architecture and provides strategies to overcome them. It covers topics such as service communication, data management, distributed tracing, service orchestration, deployment and DevOps, testing, security and access control, scalability and resource allocation, versioning and compatibility, and organizational complexity and communication.
131
2
5
Article
swizec.com·2y
Why SQL is Forever
SQL and relational databases remain fundamental for transactional data, despite the advances and popularity of NoSQL technologies over the past decades. Many NoSQL systems have either been removed, been adapted to include SQL or natively support transactions, or are now mainly used for caching and analytics. This demonstrates the enduring flexibility and utility of SQL, including new features like JSON support and vector databases, which relational databases have successfully integrated while maintaining ACID properties.
121
7
6
Article
builder.io·2y
Understanding and Implementing Structured Data
Structured data helps organizations organize and standardize their information for easier access and updates, improving decision-making and efficiency. Unstructured data, which constitutes around 80-90% of data, requires specialized tools to analyze. Implementing structured content models and using a CMS can significantly enhance SEO, personalization, and dynamic content delivery across multiple platforms. Headless CMS and modular UI components further support flexible, sustainable digital experiences.
113
7
Article
Towards Dev·1y
Mastering Data Modeling: A Step-by-Step Guide
Database modeling involves three main phases: conceptual, logical, and physical. The conceptual phase defines high-level business requirements and uses tools like entity-relationship diagrams (ERDs) to represent entities and relationships. The logical phase focuses on normalizing data to eliminate redundancies and improve integrity, while the physical phase implements the design in a specific database system, considering storage structures and indexing strategies. Effective data modeling ensures a well-organized and efficient database structure.
83
8
Article
Hacker News·2y
mayneyao/eidos: Offline alternative to Notion. Eidos is an extensible framework for managing your personal data throughout your lifetime in one place.
Eidos is an offline, extensible framework designed to manage your personal data throughout your lifetime. It operates entirely within your browser with PWA support, offering local data storage for high performance without an internet connection. The platform integrates AI features, accessible even in offline mode, and allows extensive customization via JavaScript and TypeScript, among other tools. Eidos supports developer-friendly features such as API & SDK and SQLite standardization. The project leverages various open-source components and is licensed under AGPL.
78
2
9
Article
Bits and Pieces·2y
10 Challenges In Implementing Microservices
Implementing microservices can be challenging, but there are solutions to the common problems. Domain-Driven Design (DDD) and Event-Driven Architecture (EDA) can help manage complexity. Proper service discovery and communication mechanisms are important for large-scale applications. Data management and consistency can be addressed through strategies like CQRS and the Saga pattern. Deployment and DevOps automation can streamline the process. Monitoring and observability are essential for performance insights. Service resilience and fault tolerance can be achieved through circuit breakers and health checks. Security measures like authentication, secure communication, input validation, and data encryption are crucial. Effective team organization and communication are necessary for collaboration. Versioning and compatibility can be managed using semantic versioning and API versioning. Scalability can be achieved through horizontal scaling and container orchestration.
75
1
10
Article
ByteByteGo·2y
1.8 Trillion Events Per Day with Kafka: How Agoda Handles it
Agoda manages 1.8 trillion daily events through Apache Kafka with strategies like 2-step logging architecture, splitting Kafka clusters by use case, developing robust auditing systems, and dynamic load balancing solutions. Their approach ensures resiliency, flexibility, and efficient resource utilization despite hardware heterogeneity and inconsistent message workloads. Key solutions include lag-aware producers and consumers that adapt based on real-time data, mitigating over-provisioning issues and ensuring balanced workloads.
64
11
Article
swizec.com·2y
Why software only moves forward
Software systems, especially at scale, cannot afford rollbacks or cut-overs and must always move forward due to the permanent nature of data. Data, once saved, must be managed forever, requiring updates to be additive and systems to be distributed. Challenges arise as different parts of the system need to operate on shared definitions of business logic, leading to complexities during updates. Key strategies include making additive changes, being permissive about inputs, and managing updates to both databases and code to ensure systems remain in sync.
49
1
12
Article
Community Picks·2y
Bring Postgres relationships to light
Entity-relationship diagrams (ERDs) are invaluable for visualizing and managing complex Postgres databases. ERDs showcase entities, attributes, and relationships, making database structures more accessible, especially for non-technical team members. Tools like Outerbase simplify the creation and maintenance of ERDs by automatically generating and updating diagrams for Neon databases. This democratization of data aids developers in understanding, communicating, and scaling database schemas efficiently.
49
13
Article
The Knights of Unity·2y
Database System in Unity using Resources and ScriptableObjects – The Knights of Unity
Explore an efficient method to store and manage data in Unity using Resources and ScriptableObjects. This approach bridges the gap between developers and designers by allowing runtime data reading and easy data manipulation without additional plugins. It is particularly useful for RPG and multiplayer games, offering robust and simple data handling with dynamic loading from the Resources folder.
48
14
Article
PlanetScale·2y
Sharding strategies: directory-based, range-based, and hash-based
Discover the different types of sharding strategies—directory-based, range-based, and hash-based—along with their pros and cons. Understand how solutions like Vitess and PlanetScale are making sharding more approachable, even though it remains a complex task. Learn how to choose the right sharding strategy based on your database needs while considering the potential challenges like uneven data distribution and added query complexity.
37
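The three strategies named in this summary can be sketched in a few lines. This is a minimal illustration, not code from the PlanetScale article or from Vitess: the shard names, range bounds, and tenant directory are all hypothetical.

```python
import bisect
import hashlib

# Hypothetical shard names; a real system would map these to connections.
SHARDS = ["shard_0", "shard_1", "shard_2", "shard_3"]

def hash_shard(key: str) -> str:
    """Hash-based: a stable hash of the key selects the shard."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]

# Range-based: exclusive upper bounds in ascending order (assumed values).
# Ids at or above the last bound fall into the final shard.
RANGE_BOUNDS = [1000, 2000, 3000]

def range_shard(user_id: int) -> str:
    """Range-based: the shard is chosen by which interval contains the id."""
    return SHARDS[bisect.bisect_right(RANGE_BOUNDS, user_id)]

# Directory-based: an explicit lookup table (hypothetical tenants).
DIRECTORY = {"tenant_a": "shard_2", "tenant_b": "shard_0"}

def directory_shard(tenant: str) -> str:
    """Directory-based: the mapping is stored, not computed."""
    return DIRECTORY[tenant]
```

The trade-offs the summary mentions show up even here: the range router can produce uneven shards if ids cluster, the hash router spreads keys but loses range locality, and the directory needs to be stored and kept consistent.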
15
Article
Cerbos·2y
How to address decentralized data management in microservices
Moving from a monolithic to a microservices architecture brings both benefits and challenges for decentralized data management. The post discusses advantages such as scalability, flexibility, performance, and fault isolation, alongside challenges such as complex data integration, increased development complexity, latency issues, and security risks. It details patterns and techniques such as eventual consistency, the Saga pattern, event sourcing, domain-driven design (DDD), and command query responsibility segregation (CQRS) to mitigate these challenges. A case study of Uber highlights the practical implementation of these methods to maintain data integrity and ensure system reliability.
30
11
16
Article
Data Engineering·1y
Medallion Architecture Hype or Useful?
Medallion Architecture is a term coined by Databricks that aims to simplify data architecture for business and domain experts. However, it may be confusing for data professionals who are accustomed to classical data architecture models such as stage, cleansing, core, and mart, where marts are typically persisted in cubes for faster responses.
27
2
17
Article
Community Picks·1y
DELETEs are difficult
DELETE operations in databases, particularly PostgreSQL, can pose significant challenges and are often overlooked compared to SELECT and INSERT operations. DELETE commands involve several steps, including row identification, lock acquisition, trigger execution, marking rows for deletion, index updates, and more. This process may lead to bloat, necessitating autovacuum processes to reclaim space. Strategies such as batching DELETE operations, using partitioning, and managing autovacuum settings are essential for maintaining database performance and efficiency.
26
4
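Batching, one of the strategies this summary lists, can be sketched as a loop of small deletes, each committed on its own. The example below uses Python's built-in sqlite3 module as a stand-in so it runs anywhere; the `events` table and its columns are hypothetical, and the linked article targets PostgreSQL, where lock and vacuum behavior differs.

```python
import sqlite3

def delete_in_batches(conn, cutoff, batch_size=1000):
    """Delete rows older than cutoff in small transactions; return rows deleted."""
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM events WHERE id IN ("
            "SELECT id FROM events WHERE created_at < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()  # each batch is its own short transaction
        if cur.rowcount == 0:
            break
        total += cur.rowcount
    return total

# Demo on an in-memory database with 2,500 hypothetical rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, created_at INTEGER)")
conn.executemany(
    "INSERT INTO events (created_at) VALUES (?)", [(i,) for i in range(2500)]
)
conn.commit()

deleted = delete_in_batches(conn, cutoff=2000, batch_size=500)
remaining = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

Keeping each batch small limits how long locks are held and gives background cleanup a chance to keep up between batches; the right batch size depends on the workload.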
18
Article
asayer·2y
Processing CSV files with Papaparse
This post explores CSV file processing using the powerful JavaScript library PapaParse. It covers what CSV files are, how they are used, and delimiters. It provides examples of importing, parsing, and displaying CSV files, as well as generating and customizing CSV exports.
24
1
19
Article
Towards Data Science·2y
Scaling RAG from POC to Production
Retrieval Augmented Generation (RAG) is becoming a key architecture for large-scale applications of AI, balancing the capabilities of large language models with the accuracy of indexed data. Scaling from a proof of concept (POC) to production presents multiple challenges, including performance, data management, and risk mitigation. Addressing these challenges involves architectural components such as scalable vector databases, caching mechanisms, advanced search techniques, and a Responsible AI layer. Strategic planning and integration into existing workflows are crucial for successful scaling.
21
20
Article
TigerData (Creators of TimescaleDB)·2y
PostgreSQL vs MySQL: Which to Choose and When
A comparison of PostgreSQL and MySQL as relational databases. It explores their similarities, strengths, and weaknesses, and helps you choose the right database based on your project requirements, scale, and data operations.
20
21
Article
C# Corner·2y
Soft Deletes with EF Core
Soft deletes mark records as inactive without physically removing them from the database, allowing for data recovery, auditability, and logical deletion. This guide explains how to implement soft deletes in an EF Core application by defining a soft delete flag, updating the DbContext to apply global filters, handling soft delete operations, restoring soft-deleted records, and including soft-deleted records in queries when necessary.
19
22
Video
ThePrimeTime·2y
I Will Dropkick You If You Use A Spreadsheet
Spreadsheets, while often convenient, can lead to serious technical debt and inefficiencies when used in automated processes. Although they empower non-technical staff and provide quick fixes, their use in larger, scalable systems is highly discouraged. Alternatives like SQLite or more robust databases are recommended for lasting solutions. The video reflects on corporate anecdotes where the misuse of spreadsheets led to chaos, emphasizing the importance of proper data management tools.
19
7
23
Article
KDnuggets·2y
3 Courses You Should Consider If You Want to Become a Data Analyst
This post discusses three different courses that individuals can consider taking if they want to become a data analyst. It highlights courses offered by DataCamp, Meta, and Google, providing information on the skills and knowledge that can be gained from each.
19
24
Article
strongdm·1y
How to Create a Database in PostgreSQL
Managing large datasets efficiently can be challenging. PostgreSQL is an advanced database system known for its reliability and performance. This guide covers installing PostgreSQL, connecting via psql or pgAdmin, creating databases, and managing permissions. Common issues such as permission errors and encoding conflicts are also addressed. Additionally, StrongDM enhances database security and management through centralized dashboards, audit logging, tool integration, and role-based access control.
18
1
25
Article
Crunchy Data·1y
Postgres Partitioning with a Default Partition
Effective partitioning in PostgreSQL can be crucial for maintaining a database with growing application data. Default partitions serve as a catch-all for data that doesn't fit existing partitions and help manage unexpected or incorrect data entries. It's essential not to leave data in default partitions; monitor them regularly and move valid data to the appropriate child partitions. Tools like pg_partman can assist in managing this process, automatically creating child partitions and providing functions to check and handle data in default partitions.
17
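The catch-all behavior described in this summary can be modeled in memory to show why the default partition needs monitoring. This is a sketch of the idea only, not PostgreSQL or pg_partman code: the partition names, date ranges, and helper functions are hypothetical.

```python
from datetime import date

# Hypothetical monthly partitions: name -> (inclusive start, exclusive end).
PARTITIONS = {
    "events_2024_01": (date(2024, 1, 1), date(2024, 2, 1)),
    "events_2024_02": (date(2024, 2, 1), date(2024, 3, 1)),
}
DEFAULT = "events_default"

def route(day, partitions):
    """Return the partition whose range contains day, else the default."""
    for name, (start, end) in partitions.items():
        if start <= day < end:
            return name
    return DEFAULT

def rows_for_planned_partition(default_rows, start, end):
    """Rows in the default partition that belong in a planned [start, end) child."""
    return [d for d in default_rows if start <= d < end]

rows = [date(2024, 1, 15), date(2024, 3, 5), date(1970, 1, 1)]
default_rows = [d for d in rows if route(d, PARTITIONS) == DEFAULT]

# Planning a March partition: find the valid rows that should move into it.
to_move = rows_for_planned_partition(
    default_rows, date(2024, 3, 1), date(2024, 4, 1)
)
```

Here the March row is valid data that arrived before its partition existed, while the 1970 row is likely bad input; a regular check of the default partition separates the two cases.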
