<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>SRE on alexos.dev</title>
    <link>https://alexos.dev/tags/sre/</link>
    <description>Recent content in SRE on alexos.dev</description>
    <generator>Hugo</generator>
    <language>en-gb</language>
    <lastBuildDate>Sun, 29 Mar 2020 18:00:00 -0100</lastBuildDate>
    <atom:link href="https://alexos.dev/tags/sre/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Team Nimbus and the Agents of Chaos</title>
      <link>https://alexos.dev/2020/03/29/team-nimbus-and-the-agents-of-chaos/</link>
      <pubDate>Sun, 29 Mar 2020 18:00:00 -0100</pubDate>
      <guid>https://alexos.dev/2020/03/29/team-nimbus-and-the-agents-of-chaos/</guid>
      <description>&lt;p&gt;This blog post is about my team&amp;rsquo;s first ever Chaos Day, where we ran a series of experiments designed to test how our platform performed when we tried to disrupt it, or the workloads that run on it.&lt;/p&gt;&#xA;&lt;blockquote&gt;&#xA;&lt;p&gt;This entry was originally &lt;a href=&#34;https://medium.com/john-lewis-software-engineering/team-nimbus-and-the-agents-of-chaos-ab257e41fe36&#34;&gt;posted on Medium&lt;/a&gt; under my employer&amp;rsquo;s publication.&lt;/p&gt;&#xA;&lt;/blockquote&gt;&#xA;&lt;p&gt;It was January 2020, and we had just gone through another Peak trading period, a significant time for a large retailer. The Digital Platform had performed extremely well. There were no incidents, no last-minute panic scaling, and no fall-backs enabled, even though the number of services and the overall complexity of the platform were significantly higher than this time last year. Perhaps not the backdrop for making a compelling case to run a series of complex operational test scenarios? Well, we are not the sort to rest on our laurels &amp;hellip;&lt;/p&gt;</description>
    </item>
  </channel>
</rss>
