This article shares lessons learned from running Kubernetes in production for 8 years. The lessons cover the complexity of Kubernetes itself, the importance of managing Kubernetes certificates, keeping Kubernetes and Helm up to date, maintaining centralized Helm charts, planning for disaster recovery, backing up secrets, weighing a vendor-agnostic approach against "going all in" on one provider, and optimizing node types and cost with reserved instances. Observability through monitoring, alerting, and logging is crucial, and security measures such as access control and container vulnerability scanning are necessary. The company experienced two major cluster crashes due to certificate expirations. Migrating from a self-managed cluster on AWS to a managed cluster on Azure (AKS) improved ease of use, brought integration with Azure services, and reduced costs. Overall, Kubernetes has been a game-changer for the company, providing scalability, cost optimization, a better developer experience, and faster time-to-market for new products and services.
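Since both crashes are attributed to expired certificates, a small expiry check shows the kind of guard that could catch the problem early. The following is only a sketch and not the tooling the authors used: the function names `days_until_expiry` and `needs_renewal` and the 30-day warning window are assumptions made for illustration. It relies on Python's standard `ssl.cert_time_to_seconds`, which parses the `notAfter` string format found in certificate details.

```python
import ssl
import time
from typing import Optional

SECONDS_PER_DAY = 86_400


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Return the number of days left before a certificate expires.

    `not_after` is the certificate's notAfter timestamp in the format
    used by Python's ssl module, e.g. "Jan  5 09:34:43 2018 GMT".
    `now` is a Unix timestamp; it defaults to the current time.
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / SECONDS_PER_DAY


def needs_renewal(not_after: str, warn_days: int = 30,
                  now: Optional[float] = None) -> bool:
    """Return True when the certificate expires within `warn_days` days.

    The 30-day default is an illustrative assumption, not a recommendation
    taken from the article.
    """
    return days_until_expiry(not_after, now) <= warn_days
```

A check like this could run on a schedule and feed an alert, so that an approaching expiry is noticed well before the cluster is affected. A negative result from `days_until_expiry` means the certificate has already expired.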
Table of contents
Lessons From Our 8 Years Of Kubernetes In Production: Two Major Cluster Crashes, Ditching Self-Managed, Cutting Cluster Costs, Tooling, And More
Early Decision
8 Years In Production
Migrating From Self-Managed On AWS To Managed On Azure (AKS)
Cluster Crash #1
Cluster Crash #2
Learnings
Observability
Security
Our Setup Over The Years
Game Changer
Final Words