Wise's engineering organization runs 1,000+ microservices across 850+ autonomous squads, held together by a deliberate internal platform strategy. Key elements include a versioned microservice chassis framework for Java services, a custom Next.js abstraction (CRAB) for frontend applications, and Gradle plugins that standardize CI/CD across 700+ repositories. Deployments shifted from a simple in-house tool to Spinnaker with automated canary analysis: the pipeline routes 5% of traffic to the new version, monitors technical and business metrics for 30 minutes, and automatically rolls back on anomalies, a safeguard that prevented hundreds of incidents in 2024. Infrastructure runs on a rebuilt Kubernetes platform (CRP) built on RKE2, Rancher, Helm, and ArgoCD, which has scaled from 6 to 20+ clusters. The data stack combines Kafka, Snowflake, Apache Iceberg, Trino, and Amazon RDS/Atlas, with ML workloads served on SageMaker and Ray Serve. Observability is unified on the Grafana LGTM stack (Loki, Grafana, Tempo, Mimir), ingesting roughly 6 million metric samples per second. The overarching pattern is treating internal infrastructure as a product for the engineering organization itself.
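The canary-analysis flow described above can be sketched as a simple comparison between baseline and canary metric samples over the analysis window. This is an illustrative Python sketch, not Wise's actual Spinnaker/Kayenta configuration; the traffic share, window length, degradation threshold, and metric values are all assumed for the example.

```python
from statistics import mean

# Illustrative constants matching the pattern described in the article;
# the actual thresholds Wise uses are not public.
CANARY_TRAFFIC_SHARE = 0.05      # route 5% of traffic to the canary
ANALYSIS_WINDOW_MINUTES = 30     # monitor metrics for 30 minutes
MAX_DEGRADATION = 0.10           # hypothetical: tolerate 10% worse than baseline

def should_rollback(baseline_samples, canary_samples, higher_is_worse=True):
    """Compare mean metric values over the window; flag the canary
    for automatic rollback if it degrades beyond the threshold."""
    base, canary = mean(baseline_samples), mean(canary_samples)
    if higher_is_worse:          # e.g. error rate, latency
        return canary > base * (1 + MAX_DEGRADATION)
    return canary < base * (1 - MAX_DEGRADATION)   # e.g. success rate

# Error rate spikes well above baseline -> anomaly, roll back.
assert should_rollback([0.01, 0.012, 0.011], [0.05, 0.06, 0.055]) is True
# Latency stays within tolerance -> keep promoting the canary.
assert should_rollback([120, 125, 118], [122, 126, 121]) is False
```

In a real Spinnaker setup this decision is made by the canary-analysis stage against metrics queried from the monitoring backend, but the promote-or-rollback logic reduces to comparisons of this shape.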
Table of contents
Standardizing the Starting Point
Shipping Code Safely
Connecting to Payment Rails
Data, ML, and AI
Unified Observability
Conclusion