Make infrastructure easier to rebuild than to repair.
The average age of a Netflix AWS instance is twenty-four days.
Interrupting technology workers is easy, because the consequences are invisible to almost everyone.
In complex systems, adding more inspection steps and approval processes actually increases the likelihood of future failures.
Over the following year, they eliminated testing as a separate phase of work, instead integrating it into everyone's daily work. They doubled the features being delivered per month and halved the number of defects.
Bureaucracies are incredibly resilient and are designed to survive adverse conditions - one can remove half the bureaucrats, and the process will still survive.
When we have a tightly coupled architecture, small changes can result in large scale failure.
Our deployment pipeline infrastructure becomes as foundational for our development processes as our version control infrastructure.
If we find that unit or acceptance tests are too difficult and expensive to write and maintain, it's likely that we have an architecture that is too tightly coupled.
Any successful product or organization will necessarily evolve over its life cycle... eBay and Google are each on their fifth entire rewrite of their architecture from top to bottom.
... which can lead to the unfortunate metric of mean time until declared innocent.
The principle of small batch sizes also applies to code reviews.
80% of MTTR (mean time to recovery) is spent trying to determine what changed.
High performing DevOps organizations will fail and make mistakes more often... If high performers are performing thirty times more frequently but with only half the change failure rate, they're obviously having more failures. [Roy Rapoport, Netflix]
Spiders repair rips and tears in the web as they occur, not waiting for the failures to accumulate. [Dr Steven Spear]