A Summary of the Book “Infrastructure As Code” by Kief Morris
I love reading technical books, but I always find it hard to decide whether to read a book or not. And when I do read a technical book, it’s hard for me to retain a lot of what I’ve read. So I’m experimenting with writing summaries of technical books — as a way to help others decide if they want to read them, and as a way to force myself to understand/retain the concepts within.
My first book summary will be of Kief Morris’s Infrastructure As Code.
This is a great “definitive” book about the concept of Infrastructure As Code (IaC). IaC is a set of principles and patterns for managing infrastructure in the cloud. This book is focused more on general principles and ideas than tools, but it does have some illustrative examples.
The book is split into three parts: Foundations, Patterns, and Practices. To give you an overview, I'll spend the most time on Part I, and in particular the first chapter, since that's where the high-level principles are covered. If you find it interesting, you should really buy the book!
In Part I (Foundations), Morris focuses on the core principles behind IaC. IaC grew out of the DevOps movement and the shift to the cloud and X-as-a-Service. The argument is that the barrier to provisioning new infrastructure has dropped dramatically, but the principles for managing infrastructure haven't kept up. IaC addresses this by applying software development practices to infrastructure automation. In other words, infrastructure should be treated as if it were software and data, using practices like version control, automated testing, and continuous integration and delivery.
The premise is that modern tooling can treat infrastructure as if it were software and data.
Chapter 1 starts by exploring problems that organizations run into when they don't apply these principles. These include Server Sprawl (a large number of servers that aren't well maintained), Configuration Drift (unknown inconsistencies across servers caused by manual fixes and tweaks), and Snowflake Servers (“special” servers that can't easily be replicated). The result is fragile infrastructure and automation fear, which feed each other in a cycle: fragile infrastructure makes people afraid to automate, and avoiding automation leads to even more fragile infrastructure.
Morris then introduces the basic principles behind Infrastructure As Code. First, systems should be easily reproducible. Second, systems should be disposable (“servers should be like cattle, not pets”). Third, they should be consistent (in particular, systems providing a similar service should be nearly identical). Finally, any action carried out on infrastructure should be easy, cheap, and repeatable, so the infrastructure can keep up with constantly changing designs.
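Configuration drift is easy to picture with a toy example. The Python sketch below is my own illustration, not something from the book: it compares each server's reported settings against a shared baseline and lists any mismatches. The server names, settings, and the `find_drift` helper are all hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical baseline: the settings every web server is supposed to have.
BASELINE: Dict[str, str] = {
    "nginx_version": "1.24",
    "tls_min_version": "1.2",
    "log_level": "warn",
}

# Hypothetical actual state, e.g. as reported by each server.
ACTUAL: Dict[str, Dict[str, str]] = {
    "web-1": {"nginx_version": "1.24", "tls_min_version": "1.2", "log_level": "warn"},
    # Someone "temporarily" tweaked web-2 by hand and never reverted it.
    "web-2": {"nginx_version": "1.24", "tls_min_version": "1.0", "log_level": "debug"},
}


def find_drift(
    baseline: Dict[str, str], actual: Dict[str, Dict[str, str]]
) -> Dict[str, Dict[str, Tuple[str, Optional[str]]]]:
    """Return, per server, each setting that differs from the baseline as
    (expected, found). A missing setting is reported with found=None."""
    drift = {}
    for server, settings in actual.items():
        diffs = {
            key: (expected, settings.get(key))
            for key, expected in baseline.items()
            if settings.get(key) != expected
        }
        if diffs:  # only report servers that actually drifted
            drift[server] = diffs
    return drift


for server, diffs in find_drift(BASELINE, ACTUAL).items():
    for key, (expected, found) in diffs.items():
        print(f"{server}: {key} expected {expected!r}, found {found!r}")
```

Nothing here fixes the drift; it only makes it visible, which is the first step before the automated, repeatable processes the book argues for.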
It should be possible to effortlessly and reliably rebuild any element of an infrastructure.
The final section of the chapter introduces high-level practices for achieving these principles. The first is using definition files to specify infrastructure and configuration; this is where the name “Infrastructure As Code” comes from. Many of the other practices are only possible because of these declarative files.
The cornerstone practice of infrastructure as code is the use of definition files.
For instance, the definition files are self-documenting, which reduces the burden of maintaining separate documentation. They can also be checked into a version control system, which brings benefits like traceability, rollbacks, visibility, and actionability (a committed change can trigger actions such as automated testing and deployment). Finally, changes should be made in small increments rather than large batches, and services should remain continuously available while those changes are made.
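To make the “declarative definition file” idea more concrete, here's a minimal Python sketch. It's my own illustration, not from the book, and not how any specific tool works internally. The definition is plain data describing the desired resources, and the sketch diffs it against the current state to produce a plan of create/update/delete actions, roughly the idea behind a preview step like `terraform plan`. The resource names, attributes, and the `plan` function are hypothetical.

```python
from typing import Dict, List, Tuple

# Each resource is identified by name and described by a dict of attributes.
Resources = Dict[str, Dict[str, str]]


def plan(desired: Resources, current: Resources) -> List[Tuple[str, str]]:
    """Compare the desired state (from a definition file) with the current
    state and return an ordered list of (action, resource_name) pairs."""
    actions = []
    for name in sorted(desired):
        if name not in current:
            actions.append(("create", name))
        elif desired[name] != current[name]:
            actions.append(("update", name))
    # Anything that exists but is no longer declared gets removed.
    for name in sorted(current):
        if name not in desired:
            actions.append(("delete", name))
    return actions


# Hypothetical definition file contents, e.g. parsed from a file kept in version control.
desired = {
    "web-1": {"size": "small", "image": "web-v2"},
    "web-2": {"size": "small", "image": "web-v2"},
}
# Hypothetical current state of the environment.
current = {
    "web-1": {"size": "small", "image": "web-v1"},
    "db-old": {"size": "large", "image": "db-v1"},
}

print(plan(desired, current))
```

Because the definition is just data, the same file can be code-reviewed, versioned, and fed into automated testing before anything is applied.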
The remaining four chapters of Part I break the landscape of infrastructure tools into several areas (though the areas often overlap, and some tools are useful in more than one). Chapter 2 discusses Dynamic Infrastructure Platforms: services that provide compute, storage, and/or networking. Dynamic infrastructure must be programmable (rather than only offering a UI) and provisioned on demand through self-service (rather than requiring tickets that take time to fulfill). The chapter also discusses different types of dynamic infrastructure platforms (public, private, community, and bare-metal clouds) and the trade-offs in choosing between them, such as security and legal constraints, variable capacity, cost, and portability.
Chapter 3 walks through infrastructure definition tools, which manage dynamic infrastructure in a way that's consistent with the principles described in Chapter 1. For example, such tools should be able to run unattended, without interactive prompts, and should be idempotent: applying the same definition multiple times should produce the same result. One key takeaway is that some tools are designed with IaC principles in mind (e.g. Terraform, which uses declarative configuration files). Other tools are more procedural by default; they can still be used in an IaC way, but doing so requires more discipline.
Chapter 4 discusses server configuration tools. So what's the difference between defining infrastructure and configuring it? To quote the book: “An infrastructure definition tool will create servers but isn’t responsible for what’s on the server itself. So the definition tool declares that there are two web servers, but does not install the web server software or configuration files onto them.” The latter is the job of configuration tools like Ansible, Chef, or Puppet. The chapter discusses the trade-offs between two approaches these tools can take: pull (agent model) vs. push (SSH model). There's also a discussion of using containers (Docker) to set up runtimes.
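Idempotency is the property that's easiest to see in code. In this hypothetical Python sketch (not from the book), `append_line` is a procedural step that adds a config line every time it runs, while `ensure_line` only adds the line if it's missing, so running it repeatedly converges on the same result. The config contents and both function names are made up for illustration.

```python
from typing import List


def append_line(lines: List[str], line: str) -> List[str]:
    """Procedural: blindly appends, so every run adds another copy."""
    return lines + [line]


def ensure_line(lines: List[str], line: str) -> List[str]:
    """Idempotent: only adds the line if it's missing, so repeated runs
    leave the result unchanged after the first one."""
    return lines if line in lines else lines + [line]


config = ["user nginx;"]
setting = "worker_processes 4;"  # hypothetical setting we want present

# Running the procedural step twice duplicates the setting.
appended_twice = append_line(append_line(config, setting), setting)
print(appended_twice)

# Running the idempotent step twice gives the same result as running it once.
ensured_once = ensure_line(config, setting)
ensured_twice = ensure_line(ensured_once, setting)
print(ensured_twice == ensured_once)
```

This is why idempotent tools are safe to run unattended and on a schedule: re-applying a definition that's already in effect is a no-op.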
Finally, Chapter 5 includes a great discussion of tools that are useful in a dynamic infrastructure environment, and ways to choose the right tool. This includes monitoring (alerting, metrics, logging), service discovery tools, distributed process management (also known as orchestration), and deployment.
Part II (Patterns) discusses best practices and anti-patterns for infrastructure in much more depth. Chapter 6 discusses patterns for provisioning servers, along with some interesting background on the “anatomy” and “lifecycle” of a typical server. Chapter 7 covers server templates and images, including a discussion of the trade-offs between “baking” elements into a template and provisioning them at server creation time. Chapter 8 discusses ways of keeping servers up to date and avoiding inconsistencies, with patterns like continuous synchronization, immutable servers, and phoenix servers. Finally, Chapter 9 covers more advanced patterns for reducing complexity when managing larger configurations with various types of infrastructure.
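Here's a toy Python sketch of the immutable server idea from Chapter 8. It's my own illustration, and the `Server` type and the `build` and `roll_out` functions are hypothetical stand-ins for real provisioning calls. Servers are frozen once built, so a change is rolled out by building fresh servers from a new image and discarding the old ones, rather than patching them in place.

```python
from dataclasses import dataclass
from itertools import count
from typing import List

_server_ids = count(1)  # hypothetical ID generator for newly built servers


@dataclass(frozen=True)  # frozen: a built server can't be modified in place
class Server:
    server_id: int
    role: str
    image: str  # the template/image the server was built from


def build(role: str, image: str) -> Server:
    """Create a brand-new server from an image (stand-in for a provisioning call)."""
    return Server(server_id=next(_server_ids), role=role, image=image)


def roll_out(fleet: List[Server], new_image: str) -> List[Server]:
    """Return replacement servers built from new_image. The old servers are
    never patched; the caller discards them (standing in for destroying them)."""
    return [build(s.role, new_image) for s in fleet]


fleet = [build("web", "web-v1"), build("web", "web-v1")]
fleet = roll_out(fleet, "web-v2")
print(fleet)
```

Trying to change a built server (e.g. `fleet[0].image = "web-v3"`) raises `dataclasses.FrozenInstanceError`, which mirrors the pattern's rule that the only way to change a server is to replace it.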
The final section, Part III (Practices), applies the principles and patterns via more hands-on techniques. For instance, Chapter 10 discusses some software engineering practices that can be applied to infrastructure, like using a version control system and implementing continuous integration and delivery.
Defining a system as infrastructure as code and building it with tools doesn’t make the quality any better… What infrastructure as code does is shift the focus of quality to the definitions and tooling systems. It is essential to structure and manage automation so that it has the virtues of quality code: easy to understand, simple to change, with fast feedback from problems. If the quality of the definitions and tools used to build and change infrastructure is high, then the quality, reliability, and stability of the infrastructure itself should be high.
Chapter 11 covers testing strategies, and Chapter 12 combines the practices from the previous two chapters into a “Change Management Pipeline”, or “Continuous Provisioning”. Chapter 13 discusses “workflows”, which essentially means automating everything and avoiding manual or ad-hoc changes. It also offers tips on organizing codebases. Chapter 14 covers achieving “continuity”, a catch-all term the book uses for reliability (e.g. zero-downtime changes), data availability and consistency, recovery, and security. Finally, Chapter 15 discusses organizational techniques for implementing the principles in the book, such as how to move from an older architecture, how to measure effectiveness, and how best to structure teams and responsibilities.
It’s worth noting that Part III (apart from Chapters 14 and 15) felt a little too “ideological” to me. The rest of the book presents principles and patterns along with their trade-offs, but which practices actually work will vary from team to team. And while Morris intentionally stays tool-agnostic, that also means these chapters offer less practical value. They’re still very informative; I just feel that if I were applying the rest of the book, I’d take the practices in the last section with a grain of salt.
Otherwise, this was a pretty awesome book, whether you’re building, operating, or using infrastructure in the cloud. Since it comes out of ThoughtWorks, a lot of the ideas in here are consistent with other ThoughtWorks ideas and books about architecture, like Building Evolutionary Architectures and Building Microservices.