Marc Brooker, AWS Distinguished Engineer, shares insights from reading 3,000+ cloud system postmortems, covering what makes a great postmortem, why on-call is a powerful learning tool, and how AWS's weekly COE (Correction of Error) review has been central to the company's success. He explains why caches can be dangerous in distributed systems due to metastable failures, and how Aurora DSQL was designed to avoid common relational database outage patterns using MVCC and optimistic locking. He also gives his perspective on how AI will reshape software engineering careers, advising junior engineers to focus on understanding customers and problems, and senior engineers to stay hands-on with modern agentic tools. Finally, he advocates for writing as a tool for both scaling expertise and sharpening thinking.