John Allspaw's LinkedIn comment prompts a discussion of Erik Hollnagel's Safety-II model and its implications for software reliability. Safety-II reframes reliability not as the absence of failures but as the result of active, everyday work that continuously prevents incidents. Rather than focusing solely on what went wrong during incidents (Safety-I), Safety-II asks organizations to study how normal work consistently goes right and to amplify those practices. The author argues this is a radical shift that cuts against industry intuitions, notes that almost no tech organizations currently practice it, and acknowledges that the resilience-in-software community is trying to push incident analysis in this direction but still has a long way to go.