Chaos engineering is the discipline of experimenting on a distributed system in order to build confidence in the system’s capability to withstand turbulent conditions in production.


Keep the following steps in mind when designing a chaos experiment:

1. Pick a Hypothesis

This step involves selecting the hypothesis you want to test.

For example:
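One way to keep a hypothesis testable is to write it down as a fault plus a measurable steady-state bound. The sketch below is only an illustration: the service name, the metric name (`p99_latency_ms`), and the 300 ms threshold are hypothetical, and the `Hypothesis` class does not come from any particular chaos tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """A falsifiable claim: while `fault` is active, `metric` stays at or below `max_value`."""

    fault: str        # the turbulent condition to introduce (illustrative text)
    metric: str       # name of the steady-state metric to watch (hypothetical)
    max_value: float  # upper bound the metric must respect for the hypothesis to hold

    def holds(self, observed: float) -> bool:
        """Return True if an observed metric value is consistent with the hypothesis."""
        return observed <= self.max_value


# Hypothetical example: losing one of three instances should not push p99 latency past 300 ms.
hypothesis = Hypothesis(
    fault="terminate 1 of 3 checkout-service instances",
    metric="p99_latency_ms",
    max_value=300.0,
)

print(hypothesis.holds(250.0))  # True
print(hypothesis.holds(420.0))  # False
```

Writing the bound down before the experiment starts makes it harder to reinterpret the result afterwards.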

2. Identify the metrics to monitor for the experiment

This step identifies the metrics that will let you evaluate the outcome of the experiment.

For example:
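Error rate and tail latency are common steady-state metrics. The sketch below computes both from a window of request samples. The `Request` record and the sample values are hypothetical, and the percentile uses the nearest-rank method, which is one of several percentile definitions in use.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    """One observed request (hypothetical record shape)."""
    latency_ms: float
    ok: bool


def error_rate(requests: List[Request]) -> float:
    """Fraction of requests that failed; 0.0 for an empty window."""
    if not requests:
        return 0.0
    return sum(1 for r in requests if not r.ok) / len(requests)


def percentile(values: List[float], p: float) -> float:
    """Nearest-rank percentile for 0 < p <= 100."""
    if not values:
        raise ValueError("percentile of an empty list is undefined")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank into the sorted list
    return ordered[rank - 1]


samples = [
    Request(latency_ms=100, ok=True),
    Request(latency_ms=120, ok=True),
    Request(latency_ms=110, ok=False),
    Request(latency_ms=400, ok=True),
]
latencies = [r.latency_ms for r in samples]

print(error_rate(samples))        # 0.25
print(percentile(latencies, 50))  # 110
print(percentile(latencies, 99))  # 400
```

Measuring the same metrics before the experiment gives you the baseline that the hypothesis is checked against.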

3. Notify the involved Business Units

This is an important step: notify the service’s business unit so that all the teams around that service are aware of the following:

4. Run the experiment

In this step, you run the chaos experiment and observe the metrics.

If you’re running the experiment in production, the ability to abort or stop it can prevent unnecessary harm when the experiment doesn’t go according to plan.
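The ability to abort can be sketched as a loop that injects a fault, polls a metric, and rolls back as soon as the metric crosses an abort threshold. The callables `inject_fault`, `rollback`, and `read_metric` are hypothetical hooks you would wire to your own tooling, and the threshold and polling values are illustrative assumptions.

```python
import time
from typing import Callable


def run_with_abort(
    inject_fault: Callable[[], None],
    rollback: Callable[[], None],
    read_metric: Callable[[], float],
    abort_threshold: float,
    checks: int = 10,
    interval_s: float = 1.0,
) -> bool:
    """Inject a fault, poll a metric, and abort early if it exceeds the threshold.

    Returns True if every check stayed within the threshold, False if the run was aborted.
    """
    inject_fault()
    try:
        for _ in range(checks):
            if read_metric() > abort_threshold:
                return False  # abort: the experiment is causing more harm than planned
            time.sleep(interval_s)
        return True
    finally:
        rollback()  # always undo the fault: on completion, on abort, or on an exception


# Usage with fake hooks: the error rate spikes on the third check, so the run aborts.
events = []
readings = iter([0.01, 0.02, 0.30, 0.01])
completed = run_with_abort(
    inject_fault=lambda: events.append("inject"),
    rollback=lambda: events.append("rollback"),
    read_metric=lambda: next(readings),
    abort_threshold=0.05,
    checks=4,
    interval_s=0.0,
)
print(completed, events)  # False ['inject', 'rollback']
```

Putting `rollback()` in a `finally` clause means the fault is undone even if reading the metric raises, which is usually what you want when running against production.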

5. Analyze the results

In this step, you gather the metrics to answer the following question:
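The Principles of Chaos Engineering suggest trying to disprove the hypothesis by looking for a difference in steady state between a control group and an experimental group. A minimal sketch, assuming you have per-interval error-rate samples for both groups and an absolute tolerance chosen in advance (all values here are hypothetical), is shown below. It compares means only and is not a statistical significance test.

```python
from statistics import mean
from typing import Sequence


def steady_state_held(
    control: Sequence[float],
    experiment: Sequence[float],
    tolerance: float,
) -> bool:
    """Return True if the experimental group's mean stays within `tolerance` of the control mean.

    A False result is evidence against the hypothesis that the system tolerates the fault.
    """
    return abs(mean(experiment) - mean(control)) <= tolerance


# Hypothetical per-minute error rates for each group.
control = [0.010, 0.012, 0.011]
with_fault_ok = [0.013, 0.012, 0.014]
with_fault_bad = [0.05, 0.06, 0.04]

print(steady_state_held(control, with_fault_ok, tolerance=0.005))   # True
print(steady_state_held(control, with_fault_bad, tolerance=0.005))  # False
```

With small or noisy samples, a proper statistical test is a safer basis for the conclusion than a fixed tolerance.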

6. Automate the process

Once you’re confident running your chaos experiments manually, automating them with scripts and a workflow engine lets you run them regularly and automatically.
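Once the individual experiments are scripted, a small runner can execute them as a suite and stop after the first failure so that a system already off its steady state isn’t hit with further faults. The sketch below is a hypothetical runner, not the API of any specific workflow engine; a scheduler such as cron or a CI pipeline could invoke it on a regular cadence.

```python
from typing import Callable, Dict, List, Tuple

# A named experiment returns True when its hypothesis held.
Experiment = Tuple[str, Callable[[], bool]]


def run_suite(experiments: List[Experiment], stop_on_failure: bool = True) -> Dict[str, str]:
    """Run experiments in order and report 'passed', 'failed', 'error: ...', or 'skipped'."""
    results: Dict[str, str] = {}
    halted = False
    for name, run in experiments:
        if halted:
            results[name] = "skipped"  # don't inject more faults after a failure
            continue
        try:
            results[name] = "passed" if run() else "failed"
        except Exception as exc:  # treat a crashing experiment as a failure, not a pass
            results[name] = f"error: {exc}"
        if results[name] != "passed" and stop_on_failure:
            halted = True
    return results


# Hypothetical suite; in practice each callable would wrap a real experiment.
suite: List[Experiment] = [
    ("kill-one-instance", lambda: True),
    ("add-200ms-latency", lambda: False),
    ("dns-outage", lambda: True),
]
print(run_suite(suite))
# {'kill-one-instance': 'passed', 'add-200ms-latency': 'failed', 'dns-outage': 'skipped'}
```

Returning a plain report dictionary keeps the runner easy to hook into whatever alerting or dashboard you already use.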

Famous Chaos Engineering Tools
