Best of ByteByteGo: February 2026

  1. How OpenAI Scaled to 800 Million Users With Postgres (ByteByteGo · 9w)

    OpenAI scaled PostgreSQL to handle millions of queries per second for 800 million ChatGPT users on a single-primary architecture with read replicas. Their approach rested on three pillars: minimizing primary database load through read offloading and write optimization, tuning queries and connections with PgBouncer for connection pooling, and preventing cascading failures with cache locking and rate limiting. To work within PostgreSQL's MVCC constraints, they migrated write-heavy workloads to separate sharded systems and enforced strict schema-change rules. The core deployment achieves five-nines availability with low double-digit-millisecond p99 latency through systematic optimization rather than sharding the primary database.
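The cache-locking idea above can be sketched in a few lines: when a hot key expires, only the first requester recomputes it while concurrent requesters wait, so the primary database sees one query instead of a stampede. This is a minimal illustration of the general technique, not OpenAI's actual code; `Cache` and `load_from_db` are hypothetical names.

```python
import threading

class Cache:
    """Per-key locking so a cache miss triggers exactly one backend load."""

    def __init__(self):
        self._data = {}
        self._locks = {}
        self._guard = threading.Lock()

    def _key_lock(self, key):
        # Create/fetch the lock for this key under a short global guard.
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())

    def get_or_load(self, key, loader):
        if key in self._data:                  # fast path: cache hit
            return self._data[key]
        with self._key_lock(key):              # one loader per key
            if key not in self._data:          # re-check after acquiring
                self._data[key] = loader(key)  # single DB round trip
            return self._data[key]

calls = []

def load_from_db(key):
    calls.append(key)  # stand-in for an expensive primary-DB query
    return f"value-for-{key}"

cache = Cache()
threads = [
    threading.Thread(target=cache.get_or_load, args=("user:42", load_from_db))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(calls))  # prints 1: the loader ran once despite 8 concurrent requests
```

The double-check inside the lock is what turns eight concurrent misses into one database query; without it, every waiter would reload the key after acquiring the lock.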

  2. How Uber Reinvented Access Control for Microservices (ByteByteGo · 8w)

    Uber built Charter, an attribute-based access control (ABAC) system that handles authorization across thousands of microservices with microsecond-level evaluation latency. Traditional role-based policies couldn't express complex conditions like region-matching or ownership relationships. Charter distributes policies to services, which evaluate them locally using an embedded authfx library. Conditions are written in Google's Common Expression Language (CEL) and evaluated against attributes fetched at runtime from typed attribute stores (actor, resource, action, environment). A real-world example shows how a single ABAC policy replaced thousands of individual Kafka topic policies by dynamically checking ownership data from Uber's uOwn service. Since rollout, 70 Uber services have adopted attribute-based policies, gaining fine-grained, dynamic, and scalable authorization without code deployments.
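The shape of attribute-based evaluation can be shown in a short sketch. Charter's conditions are written in CEL and attributes come from typed stores at runtime; here a plain Python predicate over actor/resource/environment dicts stands in for a CEL expression, and all names (`Policy`, `is_allowed`, the attribute keys) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Attrs = Dict[str, str]

@dataclass
class Policy:
    action: str
    # Predicate over (actor, resource, environment) attributes,
    # playing the role a CEL expression plays in Charter.
    condition: Callable[[Attrs, Attrs, Attrs], bool]

def is_allowed(policy: Policy, action: str,
               actor: Attrs, resource: Attrs, env: Attrs) -> bool:
    if action != policy.action:
        return False
    return policy.condition(actor, resource, env)

# One ownership-based policy replaces per-topic rules: allow "consume"
# when the actor's team owns the topic (in Charter, ownership data
# would be fetched from a service like uOwn) and regions match.
policy = Policy(
    action="consume",
    condition=lambda actor, res, env: (
        actor["team"] == res["owner_team"]
        and actor["region"] == res["region"]
    ),
)

actor = {"team": "payments", "region": "us-east"}
topic = {"owner_team": "payments", "region": "us-east"}
print(is_allowed(policy, "consume", actor, topic, {}))  # True
print(is_allowed(policy, "publish", actor, topic, {}))  # False
```

Because the condition reads live attributes rather than enumerating principals, one policy covers every topic and team without redeploying code, which is the scaling property the article highlights.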

  3. How LinkedIn Built a Next-Gen Service Discovery for 1000s of Services (ByteByteGo · 10w)

    LinkedIn replaced its decade-old Zookeeper-based service discovery system with a next-generation architecture using Kafka for writes and gRPC/xDS for reads. The new system handles hundreds of thousands of service instances with 10x better median latency (P50 < 1s vs 10s) and 6x better P99 latency. Key improvements include horizontal scalability through Go-based Observer components, eventual consistency over strong consistency, multi-language support via xDS protocol, and cross-fabric capabilities. The migration used a dual-mode strategy where applications ran both systems simultaneously, with automated dependency analysis to safely transition thousands of services without downtime.
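The write/read split above can be sketched minimally: registrations are appended to a log (standing in for Kafka) and an observer applies them to a local, read-optimized view, so reads never touch the write path and the view is eventually consistent. Class names here are illustrative, not LinkedIn's actual components.

```python
from collections import defaultdict

class WriteLog:
    """Append-only event log, standing in for a Kafka topic."""
    def __init__(self):
        self.events = []

    def append(self, event):
        self.events.append(event)

class Observer:
    """Applies log events to a local registry serving reads."""
    def __init__(self, log):
        self.log = log
        self.offset = 0
        self.registry = defaultdict(set)  # service -> live instances

    def sync(self):
        # Consume only the events appended since the last sync.
        for op, service, instance in self.log.events[self.offset:]:
            if op == "up":
                self.registry[service].add(instance)
            else:
                self.registry[service].discard(instance)
        self.offset = len(self.log.events)

    def resolve(self, service):
        # Local read: no traffic to the write path.
        return sorted(self.registry[service])

log = WriteLog()
observer = Observer(log)
log.append(("up", "profile-service", "host-a:7000"))
log.append(("up", "profile-service", "host-b:7000"))
observer.sync()
log.append(("down", "profile-service", "host-a:7000"))
print(observer.resolve("profile-service"))  # still shows host-a: stale until next sync
observer.sync()
print(observer.resolve("profile-service"))  # ['host-b:7000']
```

The stale read between the `down` event and the next `sync` is the eventual-consistency trade-off the article describes: accepting a briefly out-of-date view in exchange for horizontally scalable, low-latency reads.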

  4. How Grab Built a Vision LLM to Scan Images (ByteByteGo · 11w)

    Grab built a custom 1B-parameter Vision LLM to extract information from Southeast Asian documents for eKYC verification. Starting with Qwen2-VL 2B, they progressed from LoRA fine-tuning to full parameter training, then built a lightweight model from scratch combining Qwen2-VL's vision encoder with Qwen2.5's compact language decoder. The four-stage training process included projector alignment, vision enhancement, language-specific visual training on synthetic OCR data, and task-specific fine-tuning. The final model achieved comparable accuracy to the 2B version while delivering 48-56% faster latency, addressing challenges with non-Latin scripts and diverse document formats across the region.
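The four-stage curriculum can be expressed as a small training plan: each stage names which model components are unfrozen and what data it consumes. The stage names follow the summary, but the specific trainable-module sets and the `TrainingStage` structure are assumptions for illustration, not Grab's actual configuration.

```python
from dataclasses import dataclass

ALL_MODULES = frozenset({"vision_encoder", "projector", "decoder"})

@dataclass(frozen=True)
class TrainingStage:
    name: str
    trainable: frozenset  # modules unfrozen in this stage (assumed)
    data: str             # kind of training data used

stages = [
    TrainingStage("projector_alignment",
                  frozenset({"projector"}),
                  "image-text pairs to align vision and language spaces"),
    TrainingStage("vision_enhancement",
                  frozenset({"vision_encoder", "projector"}),
                  "general visual data"),
    TrainingStage("language_specific_visual_training",
                  frozenset({"projector", "decoder"}),
                  "synthetic OCR data for non-Latin regional scripts"),
    TrainingStage("task_specific_finetuning",
                  ALL_MODULES,
                  "document-extraction tasks for eKYC"),
]

def frozen_modules(stage: TrainingStage) -> list:
    # Everything not listed as trainable stays frozen in that stage.
    return sorted(ALL_MODULES - stage.trainable)

for s in stages:
    print(f"{s.name}: frozen={frozen_modules(s)}")
```

Structuring the plan as data makes the curriculum auditable: early stages touch only the cheap projector to align the pretrained encoder and decoder, and later stages progressively unfreeze more of the model.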