Since the release of Docker in 2013 and Kubernetes in 2015, containers have become the de facto standard for developing, deploying, and hosting applications. Here at AWeber, we work with containers every day, but do we understand how they work? Before containers, we used virtual machines (VMs), so it’s easy to think of containers as lightweight VMs. However, this is a misconception. In this post, we are going to dig into what containers really are.
Containers are not VMs. In the simplest of terms, VMs are hardware virtualization, and containers are operating system virtualization. Hypervisors create virtual hardware. Container engines create virtual operating systems. Virtual hardware still requires an OS to be installed; removing this layer allows containers to be simpler, lighter, and more portable. So how do container engines create a virtual OS?
In Linux, everything is a file. Everything from I/O devices and network adapters to binaries and libraries exists as a file inside the root filesystem. Since the early days of Linux there has been a feature called chroot, an operation that changes the apparent root directory for a process: that process and all of its children are locked in a “chroot jail,” unable to see the parent filesystem. Long before the concept of containers, developers and administrators used chroot to restrict portions of the filesystem to certain processes, but it had limitations, and hacks emerged to escape the jail and regain wider system access. In 2002, the Linux kernel introduced a new concept called Namespaces, which allowed aspects of the kernel to be partitioned. The first to be released was the Mount (mnt) Namespace, which expanded on chroot by exposing a virtual list of mount points to a given process group while restricting access to the wider system mount list. A set of processes could now receive an isolated set of mount points, but we still needed a better method for grouping processes and restricting them.
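To make this concrete, here is a minimal Python sketch. Calling `os.chroot` requires root, so it appears only in comments, and the `/srv/jail` path is a made-up example; the unprivileged part inspects which mount Namespace the current process lives in via `/proc`, which is how Namespace membership is compared on Linux.

```python
import os

# A chroot jail in one call -- needs root (CAP_SYS_CHROOT), so left as a
# comment; "/srv/jail" is a hypothetical directory prepared in advance:
#   os.chroot("/srv/jail")  # "/srv/jail" becomes "/" for this process
#   os.chdir("/")           # everything above the jail vanishes from view

# Unprivileged, we can still see which mount Namespace we belong to.
# Each Namespace appears as a symlink under /proc/self/ns/; two processes
# share a mount Namespace exactly when their "mnt" links name the same inode.
mnt_ns = os.readlink("/proc/self/ns/mnt")
print(mnt_ns)  # e.g. mnt:[4026531841]
```

A container engine gives each container its own mnt Namespace, so this link points at a different inode inside a container than it does on the host.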
A key feature of containers is that the processes running inside a container are isolated from the processes of other containers on the same host system. So naturally, when the Linux kernel released a new Namespace feature for partitioning kernel resources by process, the evolution of containers took a big step forward. The Process ID (pid) Namespace offered a means of grouping a set of processes and hiding them from the rest of the system. Because each container is assigned its own Namespace, the processes of one container cannot see or reach the processes of another. When a container starts, the host system assigns it a Namespace and a PID for the container’s root process, and every process in that container runs inside this Namespace as a descendant of that PID. Inside the container, however, that same root process has a PID of 1, and all of the container’s processes are children of it. Since this process is PID 1 inside the container, it is unable to see any parent processes that exist on the host system.
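The pid Namespace (and its siblings) can be observed the same way. This hedged sketch simply enumerates the Namespaces the current process belongs to; run on the host and inside a container, the `pid` entries point at different inodes.

```python
import os

# Every kind of Namespace this process is a member of shows up as a
# symlink in /proc/self/ns (pid, mnt, net, uts, ipc, ...).
for name in sorted(os.listdir("/proc/self/ns")):
    print(name, "->", os.readlink(f"/proc/self/ns/{name}"))

# Processes in the same pid Namespace carry matching "pid" links. A
# container's root process gets a different inode here than the host's
# PID 1, even though inside the container it also calls itself PID 1.
pid_ns = os.readlink("/proc/self/ns/pid")
```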
Ok great, we have a way to create a group of isolated processes and a virtual set of mount points for those processes. We are well on our way to creating a virtual operating system, but files and processes need access to hardware resources. How do we control access to resources for this new namespace of processes and files?
If a single container could consume all the resources of a host system, containers would not be living up to their name; containers must be contained. Luckily, the Linux kernel is back to help us out with another feature called Control Groups, or cgroups for short. Using cgroups we can segregate a group of processes (think namespace) and control their access to system resources like CPU, RAM, network, and disk I/O. Cgroups can also prioritize and track access to resources. Leveraging cgroups, a container engine can provide fine-grained control over the amount of resources each container can consume.
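As a small, read-only illustration (assuming a Linux host; the `memory.max` and `cpu.max` filenames are cgroup-v2 conventions), a process can ask which cgroup it has been placed in:

```python
# /proc/self/cgroup lists this process's cgroup membership, one hierarchy
# per line -- e.g. "0::/user.slice/..." on a cgroup-v2 system. A container
# engine creates a dedicated cgroup per container and writes limits into
# files such as memory.max or cpu.max inside that cgroup's directory.
with open("/proc/self/cgroup") as f:
    memberships = f.read().splitlines()
for line in memberships:
    print(line)
```

Each line has the form `hierarchy-ID:controller-list:path`, so the same file works for both cgroup v1 (many lines) and v2 (a single `0::` line).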
Combining cgroups with the pid and mnt Namespaces, we can now piece together a crude platform for a virtual operating system with isolated files, processes, and resource access. We just need something to assemble all these components and integrate them like a well-oiled machine.
Container engines like Docker, LXC, and rkt take the tools we have discussed here and combine them with things like bridged networking and version-controlled filesystem tarballs. They package it all with APIs and command-line tools to offer the fully capable container engines we use every day. We can now see that under the lid of these engines is an array of kernel features and system tools, neatly fitted together in a tightly packed container which, with the lid closed, seamlessly offers virtual operating systems that feel and operate like lightweight virtual machines.