Best of Google Cloud Platform: September 2024

  1.
    Article
    Data Engineer Things · 2y

    I spent 5 hours learning how Google manages terabytes of metadata for BigQuery.

    Google BigQuery manages massive amounts of metadata efficiently by treating it as being as crucial as the data itself. BigQuery's architecture includes Colossus for storage, Dremel for querying, and a dedicated shuffle service, all coordinated by Borg. Metadata is handled in a distributed manner using a columnar storage format called CMETA, which improves efficiency and performance. Real-time data lets physical query plans adapt dynamically for optimized results, while integrated metadata scans enhance query processing.

  2.
    Article
    Simple Thread · 2y

    Migrating a small web application from SQL using DuckDB

    Greg Kontos reduced data storage costs by over 99% by migrating his hobby recipe tracking site from a GCP Cloud SQL instance to DuckDB. Initially faced with a $68/month bill, $67 of which was for the SQL database, Greg explored alternatives including serverless databases, dataframes, and local databases. He ultimately chose DuckDB for its SQL-like interface, simplicity, and low cost. Despite some minor issues, the transition was successful, lowering monthly costs to just $0.25 while maintaining functionality.

  3.
    Article
    Hacker News · 2y

    It is hard to recommend Google Cloud

    The author shares their difficult experience with Google's service changes, specifically the shutdown of Google Domains and Google Container Registry. They had to migrate their domain and projects, encountering significant challenges with little benefit. Despite recognizing Google Cloud's superior engineering and user experience, the frequent need to adapt to changes has made it tough to recommend.

  4.
    Article
    Medium · 2y

    Graph RAG into Production — Step-by-Step

    This guide explores how to productionize Graph RAG using a Google Cloud-native, fully serverless implementation. It introduces Graphrag-lite for deploying an end-to-end Graph RAG pipeline, covering steps from graph extraction and storage to community detection and query processing. The article also discusses optimizing throughput and latency in LLM applications via parallelized and serverless architectures. Graph2nosql, a lightweight Python interface, is highlighted for managing knowledge graphs in NoSQL databases like Firestore.