This post builds on our earlier work modernising Slack’s Chef infrastructure. Instead of a disruptive migration to Policyfiles, we focused on practical improvements to our existing EC2 and Chef frameworks, delivering safer, more reliable deploys with minimal change for our service owners.

Slack engineering

Slack improved the safety of its Chef infrastructure by splitting a single production environment into six isolated buckets (prod-1 through prod-6) mapped to availability zones, and by adopting a release train model that staggers rollouts across them. The team also built Chef Summoner, a service that triggers Chef runs in response to S3 signals rather than on a fixed cron schedule. Together, these changes reduce the blast radius of a deployment while avoiding a disruptive migration to Policyfiles. Changes now take longer to propagate, but the staggered rollout gives time to catch issues before they reach every bucket. A fallback cron job runs Chef every 12 hours even if Summoner fails, which maintains compliance.
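
To make the trigger model concrete, here is a minimal Python sketch of the decision a node might make: run Chef when a new signal appears, or when 12 hours have passed without a run. It is an illustration under stated assumptions, not Chef Summoner's actual code. The names `NodeState`, `should_run_chef`, and `signal_version` are hypothetical, the signal is treated as an opaque version string rather than a real S3 object, and the fallback is modeled as a time check even though the summary above describes it as a separate cron job.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# From the summary above: six isolated buckets and a 12-hour fallback.
BUCKETS = [f"prod-{i}" for i in range(1, 7)]
FALLBACK_INTERVAL = timedelta(hours=12)


@dataclass
class NodeState:
    """Hypothetical record of what a node last applied and when."""
    last_applied_version: Optional[str]
    last_run_at: Optional[datetime]


def should_run_chef(
    state: NodeState,
    signal_version: Optional[str],
    now: datetime,
) -> Optional[str]:
    """Return why Chef should run ("signal" or "fallback"), or None.

    signal_version stands in for whatever the S3 signal carries for this
    node's bucket (an assumption); None means no signal could be read.
    """
    # A new signal for this bucket takes priority over the timer.
    if signal_version is not None and signal_version != state.last_applied_version:
        return "signal"
    # Fallback: never let a node go more than 12 hours without a run.
    if state.last_run_at is None or now - state.last_run_at >= FALLBACK_INTERVAL:
        return "fallback"
    return None


if __name__ == "__main__":
    state = NodeState(last_applied_version="v1", last_run_at=datetime(2024, 1, 1, 0, 0))
    print(should_run_chef(state, "v2", datetime(2024, 1, 1, 1, 0)))
```

Keeping the decision a pure function of the node state, the signal, and the clock makes it easy to test without touching S3 or Chef. A node in any of the six buckets would evaluate it only against its own bucket's signal, which is what lets a staggered rollout stop before reaching later buckets.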

Advancing Our Chef Infrastructure: Safety Without Disruption