Best of Google Cloud Platform — September 2024

1
Article
Data Engineer Things·2y
I spent 5 hours learning how Google manages terabytes of metadata for BigQuery.
Google BigQuery uses innovative techniques to manage massive amounts of metadata efficiently, treating the metadata as being as crucial as the data itself. BigQuery's architecture includes Colossus for storage, Dremel for querying, and a dedicated shuffle service, all coordinated by Borg. Metadata is handled in a distributed manner using a unique columnar storage format called CMETA, which improves efficiency and performance. Real-time data ensures physical query plans adapt dynamically for optimized results, while integrated metadata scans enhance query processing.
51
2
Article
Simple Thread·2y
Migrating a small web application from SQL using DuckDB
Greg Kontos reduced data storage costs by over 99% by migrating his hobby recipe-tracking site from a GCP Cloud SQL instance to DuckDB. Initially faced with a $68/month bill, $67 of which was for the SQL database, Greg explored various alternatives, including serverless databases, dataframes, and local databases. He ultimately chose DuckDB for its SQL-like interface, simplicity, and low cost. Despite some minor issues, the transition was successful, lowering monthly costs to just $0.25 while maintaining functionality.
20
3
Article
Hacker News·2y
It is hard to recommend Google Cloud
The author shares their difficult experience with Google's service changes, specifically the shutdown of Google Domains and Google Container Registry. They had to migrate their domain and projects, encountering significant challenges with little benefit. Despite recognizing Google Cloud's superior engineering and user experience, the frequent need to adapt to changes has made it tough to recommend.
19
4
Article
Medium·2y
Graph RAG into Production — Step-by-Step
This guide explores how to productionize Graph RAG using a Google Cloud-native, fully serverless implementation. It introduces Graphrag-lite for deploying an end-to-end Graph RAG pipeline, covering steps from graph extraction and storage to community detection and query processing. The article also discusses optimizing throughput and latency in LLM applications via parallelized and serverless architectures. Graph2nosql, a lightweight Python interface, is highlighted for managing knowledge graphs in NoSQL databases like Firestore.
13
