Groww built an internal chaos engineering platform to validate system resilience before high-traffic events such as IPO listings. The platform has four parts: a dedicated load-test Kubernetes cluster isolated from production, a traffic replayer that mirrors real production traffic, Argo Workflows as the orchestration control plane, and Chaos Mesh as the fault injection plane.
Table of contents
Introduction
Goals: What we wanted to achieve
Why We Chose Chaos Mesh
The Architecture
  1. Multi-Cluster Setup
  2. Traffic-Replayer: The Foundation of Realism
  3. Argo Workflows: The Control Plane
  4. Chaos Mesh: The Fault Injection Plane
  5. Observability with Olly
Validation: IPO Readiness
Future Roadmap
Conclusion: Reliability as a Culture