Best of Database: December 2025

  1. Article
    Datadog · 21w

    How microservice architectures have shaped the usage of database technologies

    Microservices have transformed database usage from monolithic, single-database architectures to distributed systems where organizations run multiple database technologies simultaneously. Analysis of 2.5 million services shows over half of organizations now use both SQL and NoSQL databases side by side, with many adopting 3+ different database technologies. This shift enables teams to choose the right tool for each service but introduces new challenges: fragmented schemas require data integration layers like GraphQL, analytics demands OLAP systems like Snowflake, and service communication relies heavily on message queues like Kafka and RabbitMQ for asynchronous decoupling.

  2. Article
    roadmap.sh · 24w

    NEW ROADMAP: Elasticsearch

    A new learning roadmap for Elasticsearch has been released on roadmap.sh, providing a structured guide for developers working with search and database applications to learn and master Elasticsearch.

  3. Article
    MotherDuck · 21w

    Stop Paying the Complexity Tax

    Most organizations don't need massive distributed data systems. The industry has over-engineered solutions for edge cases, forcing everyone to pay a complexity tax for scale they'll never require. Modern single-machine databases can handle what previously required distributed systems, with machines now offering 192 cores and 1.5TB of memory. By separating storage (cheap, infinite object storage) from compute (ephemeral, cloneable instances), and designing for the common case of small data with occasional big compute needs, teams can achieve better performance with dramatically simpler architecture. DuckDB exemplifies this approach by focusing on the complete user experience, not just query performance, while MotherDuck extends it with cloud durability and per-user isolation through individual database instances that spin up in under 100ms.

  4. Article
    Hacker News · 22w

    Avoid UUID Version 4 Primary Keys

    UUID Version 4 primary keys cause significant performance problems in PostgreSQL due to their random nature. Random values trigger excessive index page splits during inserts, create fragmented indexes with poor density (~79% vs ~98% for integers), and require accessing 31,000% more buffer pages for queries. The randomness prevents efficient B-Tree index operations and degrades cache hit ratios. Time-ordered alternatives like UUID Version 7 perform better by including timestamps in the first 48 bits. For most applications, integer or bigint primary keys backed by sequences remain the optimal choice, offering better performance, smaller storage footprint (4-8 bytes vs 16 bytes), and natural ordering. When obfuscation is needed, pseudo-random codes can be generated from integers using XOR operations and base62 encoding.
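
    As a rough illustration of the two ideas in this summary, the sketch below builds a UUIDv7-style value with a 48-bit millisecond timestamp in the leading bits, and turns an integer key into a short public code using XOR and base62. The function names and the SECRET_MASK constant are hypothetical and are not taken from the article. An XOR mask only hides the sequence from casual inspection; it is not encryption.

```python
import os
import time
import uuid
from typing import Optional

BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
SECRET_MASK = 0x5DEECE66D  # hypothetical mask; any fixed non-negative integer works


def uuid7_like(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Build a UUIDv7-style value: 48-bit millisecond timestamp, then random bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    value = ((unix_ms & ((1 << 48) - 1)) << 80) | rand
    # Overwrite the version nibble (7) and the two variant bits (binary 10).
    value = (value & ~(0xF << 76)) | (0x7 << 76)
    value = (value & ~(0x3 << 62)) | (0x2 << 62)
    return uuid.UUID(int=value)


def base62_encode(n: int) -> str:
    """Encode a non-negative integer using the 62-character alphabet above."""
    if n == 0:
        return BASE62[0]
    digits = []
    while n:
        n, remainder = divmod(n, 62)
        digits.append(BASE62[remainder])
    return "".join(reversed(digits))


def base62_decode(code: str) -> int:
    """Inverse of base62_encode."""
    n = 0
    for ch in code:
        n = n * 62 + BASE62.index(ch)
    return n


def obfuscate(row_id: int) -> str:
    """Turn a sequential integer key into a public-facing code."""
    return base62_encode(row_id ^ SECRET_MASK)


def reveal(code: str) -> int:
    """Recover the integer key from a public-facing code."""
    return base62_decode(code) ^ SECRET_MASK
```

    Because the timestamp occupies the most significant bits, values generated in later milliseconds compare greater than earlier ones, so new keys land near the right-hand edge of a B-Tree index instead of on random pages.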

  5. Video
    Coding with Lewis · 23w

    The Database Query That Cost $1,000,000

    Shopify nearly incurred $1 million in monthly BigQuery costs due to inefficient queries scanning 75 GB per request. By implementing database clustering to organize data by date, geography, and timestamp, they reduced query size to 508 MB, cutting costs to under $1,400 monthly. The case demonstrates how proper data warehouse optimization and partitioning strategies can prevent massive cloud infrastructure expenses.

  6. Article
    Lobsters · 22w

    Go ahead, self-host Postgres

    Self-hosting Postgres is more practical than cloud providers suggest. The author shares two years of experience running self-hosted Postgres serving millions of daily queries with minimal operational overhead (about 30 minutes per month). Managed services like AWS RDS run standard Postgres with operational tooling, but at a significant markup. Self-hosting offers better performance tunability, lower costs (dedicated servers cost less than equivalent RDS instances), and comparable reliability. The article provides specific configuration guidance for memory, connections, storage, and WAL settings, plus realistic time estimates for maintenance tasks. Self-hosting makes sense for most teams that sit between complete beginners and enterprise-scale operations that need dedicated database engineers.

  7. Article
    GitLab · 24w

    Deploying the world's largest GitLab instance 12 times daily

    GitLab deploys code to GitLab.com up to 12 times daily using their own CI/CD platform, handling millions of developers without downtime. The deployment pipeline uses progressive rollouts through staging and production Canary environments (5% traffic), followed by full staging and production deployments. Key technical challenges include managing hybrid infrastructure (Helm charts for containers, Omnibus packages for Gitaly), handling database migrations with backward compatibility, and maintaining multi-version compatibility during deployments. The expand-migrate-contract pattern ensures safe schema changes, while post-deploy migrations run only after multiple successful deployments to minimize rollback risks. This approach validates GitLab's deployment features at massive scale before customers use them.
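
    The expand-migrate-contract pattern mentioned above can be sketched with an in-memory SQLite database. This is a minimal illustration under stated assumptions, not GitLab's migration tooling: the users table and its columns are hypothetical, and the contract step is written as a table rebuild so it also runs on older SQLite versions.

```python
import sqlite3


def expand(conn: sqlite3.Connection) -> None:
    # Expand: add the new column next to the old ones, so code that
    # still reads first_name/last_name keeps working.
    conn.execute("ALTER TABLE users ADD COLUMN full_name TEXT")


def migrate(conn: sqlite3.Connection) -> None:
    # Migrate: backfill the new column from the old data.
    conn.execute(
        "UPDATE users SET full_name = first_name || ' ' || last_name "
        "WHERE full_name IS NULL"
    )


def contract(conn: sqlite3.Connection) -> None:
    # Contract: once no deployed version reads the old columns, drop them.
    conn.executescript(
        """
        CREATE TABLE users_new (id INTEGER PRIMARY KEY, full_name TEXT NOT NULL);
        INSERT INTO users_new (id, full_name) SELECT id, full_name FROM users;
        DROP TABLE users;
        ALTER TABLE users_new RENAME TO users;
        """
    )


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
    )
    conn.execute("INSERT INTO users (first_name, last_name) VALUES ('Ada', 'Lovelace')")

    expand(conn)
    # A query written for the old schema still succeeds after the expand step.
    old_read = conn.execute("SELECT first_name FROM users").fetchone()[0]

    migrate(conn)
    contract(conn)
    columns = [row[1] for row in conn.execute("PRAGMA table_info(users)")]
    new_read = conn.execute("SELECT full_name FROM users").fetchone()[0]
    return old_read, columns, new_read
```

    Keeping the three steps in separate functions mirrors the idea that they ship in separate deployments: the schema stays readable by both the old and new application versions until the contract step runs.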

  8. Article
    Convex · 23w

    Why ctx.db is changing, and what you should do about it

    Convex 1.31.0 changes the ctx.db API: db.get, db.patch, db.replace, and db.delete now take the table name as the first argument. This change improves API consistency, enhances security by preventing cross-table ID vulnerabilities, and paves the way for custom document IDs. Existing code remains functional but should be migrated using the provided ESLint plugin or standalone codemod tool, both of which automatically infer table names from TypeScript types.

  9. Article
    Neon · 20w

    Stop Mocking Auth (It’s Breaking Your Tests)

    Mocking authentication in tests creates false confidence by skipping critical failure points like password verification, database constraints, and session management. Real auth testing is traditionally difficult due to shared state and slow database provisioning. Database branching offers a solution by creating isolated, copy-on-write database instances with separate auth endpoints for each test run, enabling fast, isolated testing against real authentication flows without test collisions or production data pollution.

  10. Article
    Alexey Zerkalenkov · 24w

    The Frontend Database API Gateway

    A frontend-first API gateway that enables developers to build applications without waiting for backend implementation. It allows rapid prototyping with runtime data and seamless integration with various backend services like Supabase, Firebase, GraphQL, and REST APIs without requiring frontend code rewrites.

  11. Article
    CrateDB · 22w

    Distributed Search Engines and Real Time Analytics at Scale

    Distributed search engines partition data across multiple nodes to handle massive datasets with low latency, but struggle with complex aggregations, analytical queries, and joins. Modern workloads increasingly require both search and real-time analytics capabilities in a single platform. The article explores how distributed search architectures work, their limitations, and the convergence toward unified analytics databases that treat search as one capability among many, rather than a standalone engine requiring separate infrastructure.

  12. Article
    JetBrains · 22w

    Query Consoles Are Coming Back

    JetBrains is reverting a controversial workflow change in DataGrip 2025.3 that replaced query consoles with query files. The redesign caused issues with global data sources and disrupted user workflows. Version 2025.3.1, releasing this week, will restore the original query console behavior. Users who created query files during the migration can delete them, convert them to consoles, or keep them for a planned improved workflow early next year. The team acknowledges failing their zero-regression standard and commits to more careful, flexible updates going forward.

  13. Article
    Crunchy Data · 23w

    Postgres 18 New Default for Data Checksums and...

    Postgres 18 now enables data checksums by default during database initialization, providing automatic protection against silent data corruption. Data checksums work by calculating and storing a digital fingerprint for each 8KB data page, then verifying it on read to detect corruption. While this improves data integrity out of the box, it creates a compatibility challenge for pg_upgrade users: the old and new clusters must have matching checksum settings. For existing databases without checksums, users can either pass the new --no-data-checksums flag to initdb when initializing the new cluster, or preferably enable checksums beforehand using the pg_checksums utility (though this requires downtime, because the cluster must be shut down while it runs).
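
    The checksum-on-write, verify-on-read cycle described above can be sketched as follows. This is an illustration only: it uses CRC-32 from Python's zlib module as a stand-in, which is not the algorithm Postgres uses, and the dict-backed storage and function names are hypothetical.

```python
import zlib

PAGE_SIZE = 8192  # 8KB pages, as in the summary


class ChecksumError(Exception):
    """Raised when a page's stored checksum does not match its contents."""


def write_page(storage: dict, page_no: int, data: bytes) -> None:
    """Store a page together with a checksum computed at write time."""
    if len(data) != PAGE_SIZE:
        raise ValueError("page must be exactly PAGE_SIZE bytes")
    storage[page_no] = (data, zlib.crc32(data))


def read_page(storage: dict, page_no: int) -> bytes:
    """Return the page, recomputing the checksum to detect silent corruption."""
    data, stored = storage[page_no]
    if zlib.crc32(data) != stored:
        raise ChecksumError(f"checksum mismatch on page {page_no}")
    return data
```

    If a byte of the stored page changes after the write, the recomputed value no longer matches and read_page raises instead of returning corrupted data.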

  14. Article
    Web Performance Calendar · 22w

    The Old Ways Are the Best: 100 Lighthouse, 0ms TBT, 32ms Queries

    A developer achieves exceptional performance metrics (100 Lighthouse score, 0ms Total Blocking Time, 32ms queries) by rejecting modern frameworks in favor of older techniques. The approach uses DATAOS (DOM As The Authority On State), treating the DOM itself as the state container instead of maintaining separate state objects, eliminating reconciliation overhead. On the backend, 1972-era bitmap indexing with RoaringBitmaps enables constant-time queries regardless of dataset size. The resulting application uses 32KB of vanilla JavaScript (15% of React's size) with a total payload under 100KB, demonstrating that native browser APIs and decades-old database techniques can outperform contemporary frameworks for most web applications.
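
    To make the bitmap indexing idea concrete, here is a minimal sketch that keeps one bitmap per (column, value) pair and answers a multi-condition filter with bitwise AND. It uses plain Python integers as uncompressed bitsets rather than RoaringBitmaps, and the class and method names are hypothetical, so it says nothing about the performance of the article's implementation.

```python
from collections import defaultdict


class BitmapIndex:
    """One bitmap per (column, value); bit i is set when row i has that value."""

    def __init__(self):
        self.bitmaps = defaultdict(int)  # (column, value) -> int used as a bitset
        self.row_count = 0

    def add(self, row: dict) -> int:
        """Index a row and return its row id."""
        row_id = self.row_count
        for column, value in row.items():
            self.bitmaps[(column, value)] |= 1 << row_id
        self.row_count += 1
        return row_id

    def query(self, **conditions) -> list:
        """Return ids of rows matching every column=value condition."""
        result = (1 << self.row_count) - 1  # start with all rows selected
        for column, value in conditions.items():
            result &= self.bitmaps.get((column, value), 0)
        return [i for i in range(self.row_count) if (result >> i) & 1]
```

    Each extra filter condition costs one AND over the bitmaps rather than a scan of the rows, which is the property compressed formats such as Roaring bitmaps are designed to preserve at larger scale.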

  15. Article
    CYBERTEC PostgreSQL · 22w

    Comparing stats! PostgreSQL 18 against 17

    PostgreSQL 18 introduces several new statistics columns for performance monitoring. The pg_stat_all_tables view adds four time-tracking columns for vacuum and analyze operations. VACUUM/ANALYZE now reports WAL, CPU, and read statistics. The pg_stat_io view gains three new byte-level I/O columns (read_bytes, write_bytes, extend_bytes) while removing the generic op_bytes column. Additionally, pg_stat_statements now tracks parallel worker activity with two new columns for launched and planned parallel workers.

  16. Article
    DuckDB · 23w

    Announcing DuckDB 1.4.3 LTS

    DuckDB 1.4.3 LTS is now available with important bugfixes addressing correctness issues in HAVING clauses, JOIN operations, and indexed table updates. The release introduces beta support for Windows ARM64, including native extension distribution and Python wheels via PyPI. Benchmarks on TPC-H SF100 show 24% performance improvement for native ARM64 compared to emulated AMD64 on Snapdragon-based systems. Additional fixes include race condition crashes, memory management improvements during WAL replay, and various edge cases in Unicode handling and Parquet exports.

  17. Article
    ClickHouse · 23w

    Introducing pg_clickhouse: A Postgres extension for querying ClickHouse

    ClickHouse released pg_clickhouse v0.1.0, an Apache 2-licensed PostgreSQL extension that enables transparent execution of analytics queries on ClickHouse directly from PostgreSQL. Built on the foundation of clickhouse_fdw, the extension addresses the challenge of migrating analytical queries when moving workloads from PostgreSQL to ClickHouse. Key features include advanced query pushdown capabilities, support for ordered-set aggregates like percentile_cont(), SEMI JOIN pushdown, and transparent conversion of PostgreSQL aggregate FILTER expressions to ClickHouse combinators. Testing with TPC-H benchmarks shows 21 of 22 queries execute efficiently with 12 achieving full pushdown. The roadmap includes completing pushdown coverage for all analytic workloads, supporting all ClickHouse data types, and adding DML features.

  18. Article
    Ars Technica · 24w

    In comedy of errors, men accused of wiping gov databases turned to an AI tool

    Two federal contractors were arrested for allegedly deleting 96 government databases and sensitive records minutes after being fired. The defendants, previously convicted of similar crimes in 2015, attempted to cover their tracks by using an AI chatbot to learn how to clear SQL server logs and Windows event logs. Despite their efforts to destroy evidence, including wiping their laptops three days later, prosecutors obtained sufficient records to charge them with conspiracy to destroy government databases.

  19. Article
    It's Foss · 22w

    Watch Out Elasticsearch! Tiger Data's PostgreSQL BM25 Search Extension Goes Open Source

    Tiger Data has open-sourced pg_textsearch, a PostgreSQL extension that enables BM25 relevance-ranked keyword searches directly within PostgreSQL. Previously available only on Tiger Cloud, the extension is now released under The PostgreSQL License on GitHub. It supports 29+ languages, works with partitioned tables, and uses a memtable architecture for efficient indexing. The extension allows developers to run modern search capabilities without setting up external systems like Elasticsearch, and can be combined with pgvector for hybrid keyword and semantic search within a single database.
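
    For readers unfamiliar with BM25, the sketch below scores documents with the commonly cited textbook form of the formula. It is not pg_textsearch's implementation: the whitespace tokenizer, the function name, and the default parameters k1=1.2 and b=0.75 are assumptions chosen for illustration, and the function expects a non-empty list of documents.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list, k1: float = 1.2, b: float = 0.75) -> list:
    """Return one BM25 relevance score per document for the given query."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Rare terms get a higher inverse document frequency weight.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates with k1 and is normalized by document length via b.
            tf = term_freq[term]
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores.append(score)
    return scores
```

    A document matching more of the query terms, or matching rarer terms, scores higher, and documents with no matching terms score zero.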

  20. Article
    LogRocket · 24w

    TanStack DB 0.5 Query-Driven Sync: Loading data will never be the same

    TanStack DB 0.5 introduces Query-Driven Sync, a feature that eliminates API sprawl by transforming client-side queries into precise network requests. Instead of creating multiple backend endpoints, developers define queries directly in components, and TanStack DB automatically generates the appropriate API calls. The feature offers three sync modes: Eager (loads entire dataset upfront), On-demand (fetches only requested data using predicate mapping), and Progressive (loads initial batch immediately while syncing remaining data in background). Query-Driven Sync optimizes performance through request deduplication, delta fetching, and intelligent joins, making it particularly effective when paired with sync engines like Electric or PowerSync for real-time data synchronization.

  21. Article
    Metadata · 20w

    Rethinking the Cost of Distributed Caches for Datacenter Services

    Distributed caching in datacenters provides 3-4x better cost efficiency primarily by reducing CPU usage rather than just improving latency. Application-level caches that store fully materialized objects deliver far better cost savings than storage-layer caches by eliminating query amplification and coordination overhead. The approach works best for rich-object workloads but struggles with strong consistency requirements, as freshness checks traverse most of the database stack and erase cost benefits. Cache placement matters more than cache size for cost optimization.

  22. Article
    System Design Codex · 22w

    Database Scaling and Performance Tips

    Database performance and scalability depend on factors like item size, dataset size, and throughput requirements. Seven key strategies can optimize databases: indexing speeds up queries by creating data shortcuts; materialized views pre-calculate complex query results; denormalization duplicates data to reduce joins; vertical scaling upgrades server hardware; caching stores frequently accessed data in fast storage; replication creates multiple data copies across servers for read distribution; and sharding partitions databases into smaller units for horizontal scaling. Each strategy offers specific benefits but comes with trade-offs like increased complexity, storage overhead, or potential data inconsistency.
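
    Of the seven strategies listed, sharding is the one that most changes application code, so here is a minimal sketch of hash-based shard routing. The ShardedStore class, the in-memory dict shards, and the choice of SHA-256 are assumptions made for illustration and do not come from the article.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard number using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedStore:
    """Key-value store that routes each key to one of several shards."""

    def __init__(self, shard_count: int):
        # Each dict stands in for a separate database server.
        self.shards = [{} for _ in range(shard_count)]

    def put(self, key: str, value) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str):
        return self.shards[shard_for(key, len(self.shards))].get(key)
```

    One trade-off this sketch makes visible: with simple modulo routing, changing the shard count remaps most keys, which is why schemes such as consistent hashing are often used when shards are added or removed.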

  23. Article
    Ayende @ Rahien · 22w

    RavenDB Kubernetes Operator

    RavenDB has released an official Kubernetes Operator that simplifies deploying and managing RavenDB clusters in Kubernetes environments. The Operator automates certificate management, handles safe rolling upgrades with health checks, provides flexible external access options for major cloud providers and ingress controllers, and offers declarative storage orchestration. It eliminates the manual complexity of configuring StatefulSets, Services, and TLS certificates by using a single RavenDBCluster custom resource. The Operator is available via Helm and supports EKS, AKS, Kind, Minikube, and Kubeadm clusters.

  24. Article
    PostgreSQL · 23w

    pg_ai_query v0.1.0 — First stable release with multi-model AI for PostgreSQL

    pg_ai_query v0.1.0 is now stable, bringing AI-powered query development directly into PostgreSQL. The extension generates SQL from natural language, provides AI-interpreted EXPLAIN ANALYZE results, and offers automated index and rewrite recommendations. It supports multiple AI providers including OpenAI, Anthropic, Google Gemini, OpenAI-compatible APIs like OpenRouter, and local models through Ollama. The extension works with PostgreSQL 14+ on Linux and macOS, enabling developers to choose between cloud models or fully local inference while staying within the Postgres environment.

  25. Article
    Hacker News · 23w

    stoolap/stoolap: A Modern Embedded SQL Database written in Rust

    Stoolap is an embedded SQL database written in Rust that supports both in-memory and persistent storage with ACID compliance. It features MVCC transactions with two isolation levels, time-travel queries for historical data access, multiple index types (B-tree, Hash, Bitmap), window functions, CTEs including recursive queries, and a cost-based query optimizer. The database includes 100+ built-in functions across string, math, date/time, JSON, aggregate, and window categories. It uses write-ahead logging with periodic snapshots for durability and can be used as a library or via command-line interface.