Netflix and their “Chaos Monkey”

Netflix has consistently led the pack when it comes to running a massive application in the cloud. Being out in front, they’ve run into issues most people hadn’t really considered yet. One effort that got a lot of attention was their push to get every developer thinking about the things that change when you start building something big on a cloud platform. To keep their engineers on their toes, they created (and blogged about) the “Chaos Monkey”. “It was at its heart a script with admin privileges that would intentionally and randomly break their system (by randomly killing off members of auto-scaling groups)”. Knowing that the Chaos Monkey might kill off a key component of their application at any time, their architects knew they had to build for it by adding smart redundancies. It paid off through numerous significant AWS outages that left other public cloud users counting their lost dollars while manically refreshing the AWS status page.
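
To make the idea concrete, here is a minimal sketch of the selection step in Python. It is not Netflix’s code: the group names, instance IDs, `pick_victims` function and `kill_probability` parameter are all made up for illustration, and nothing here talks to AWS. It only decides which instance a chaos run would terminate.

```python
import random


def pick_victims(groups, kill_probability=0.5, rng=None):
    """Pick at most one instance per group to terminate.

    groups maps a (hypothetical) auto-scaling group name to a list of
    instance IDs. Returns {group_name: instance_id} for the unlucky ones.
    """
    rng = rng or random.Random()
    victims = {}
    for name, instances in groups.items():
        # Skip empty groups, and only strike a given group some of the time.
        if instances and rng.random() < kill_probability:
            victims[name] = rng.choice(instances)
    return victims


# Made-up inventory standing in for real auto-scaling groups.
groups = {
    "api": ["i-api-1", "i-api-2", "i-api-3"],
    "web": ["i-web-1", "i-web-2"],
    "batch": [],
}

victims = pick_victims(groups, kill_probability=1.0, rng=random.Random(42))
for name, instance_id in victims.items():
    # A real tool would call the cloud provider's terminate API here.
    print(f"would terminate {instance_id} in group {name}")
```

The point of the exercise is that a group with healthy redundancy should shrug off the loss of any single member, so picking one at random is a fair test.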

One of the greatest things about moving your application into the cloud is how much friction you remove by giving developers direct access to the resources. Unfortunately, you can end up with a whole lot of those resources spoken for but no longer in active use. Netflix came up with a great solution: the “Janitor Monkey”. But the best part is that they’re blogging about it, and sharing the

It’s nice to see these engineers coming up with really smart ways to
 keep a massive cloud deployment efficient by cleaning up the detritus 
that can build up over time. It’s especially great to see how much 
they’re giving back to the community at large – thanks Netflix!
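
As a rough illustration of the cleanup idea, the sketch below flags resources that have sat idle for too long. It is an assumption about how such a rule could look, not a description of how Janitor Monkey actually works: the `find_stale` function, the `last_used` field, the resource IDs and the 30-day threshold are all hypothetical.

```python
from datetime import datetime, timedelta


def find_stale(resources, now, max_idle=timedelta(days=30)):
    """Return the IDs of resources idle for longer than max_idle."""
    return [r["id"] for r in resources if now - r["last_used"] > max_idle]


# Made-up inventory; a real tool would pull this from the cloud provider.
now = datetime(2013, 1, 31)
resources = [
    {"id": "vol-1", "last_used": datetime(2013, 1, 30)},    # used yesterday
    {"id": "vol-2", "last_used": datetime(2012, 11, 1)},    # idle for months
    {"id": "snap-3", "last_used": datetime(2012, 12, 25)},  # idle 37 days
]

stale = find_stale(resources, now)
for resource_id in stale:
    # Only report candidates; deleting them is left to a human or a later step.
    print(f"cleanup candidate: {resource_id}")
```

Reporting candidates rather than deleting them outright is a deliberately cautious choice for a sketch like this, since a resource can be idle and still needed.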
