Operability on alexos.dev

Recent content in Operability on alexos.dev
https://alexos.dev/tags/operability/

Developer-Friendly Runbooks: A Guide
Fri, 24 Sep 2021 19:28:00 -0100
https://alexos.dev/2021/09/24/developer-friendly-runbooks-a-guide/

When things go wrong - and yes, they will go wrong - it’s extremely helpful to have easy access to a set of runbooks to guide the unfortunate engineer through the steps needed to mitigate the problem as swiftly as possible. In this post I’m going to describe the approach we use for this where I work, which we’ve found to work very well.

Team Nimbus and the Agents of Chaos
Sun, 29 Mar 2020 18:00:00 -0100
https://alexos.dev/2020/03/29/team-nimbus-and-the-agents-of-chaos/

This blog post is about my team’s first ever Chaos Day, where we ran a series of experiments designed to test how our platform performed when we tried to disrupt it, or the workloads that run on it. This entry was originally posted on Medium under my employer’s publication. It was January 2020, and we had just gone through another Peak trading period - significant for a larger retailer.