Managing Dependencies

What solutions are there for deploying software, when that software has dependencies on other software?

We'll restrict our attention to software deployment on top of the Linux kernel, because that's the most common case, and the only one that interests me.

1 Traditional binary package managers

apt, yum, dnf, pacman, OSTree, etc.

Express your dependencies as dependencies on packages.

These install packages system-wide, in the root directory, so you can only have one setup on the system at a time. As a result, you need root privileges, because installing new or different software would interfere with others.

OSTree is a more advanced member of this family: combined with RPM (as rpm-ostree), it allows versioning and rollback of the entire system.

1.1 Usage

These are generally used by groups that don't write their own software (beyond configuration scripts), but rather just administer and deploy systems.

2 Source-based package managers

buildroot, portage, brew, etc.

These kinds of package managers let you build software from source at an arbitrary prefix. That's nice because it allows multiple setups to coexist on the same system; you can build arbitrary environments with different things installed into them, and you don't need root.
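As a concrete sketch of what "an arbitrary prefix" means, here is the classic autotools flow (paths are illustrative):

```shell
# Build a package from source and install it under $HOME, so no root
# is needed and several versions can coexist at different prefixes.
./configure --prefix="$HOME/envs/demo-1.2"
make
make install    # populates ~/envs/demo-1.2/{bin,lib,include,share}
```

A source-based package manager automates this recipe, and the dependency resolution around it, for every package in an environment.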

2.1 Usage

Groups that want to modify low-level things, such as the C library, use source-based package managers. This is generally groups building operating systems (CoreOS and Google use portage) or groups doing embedded development (which use buildroot).

3 Relocatable package managers

Conda, etc., as well as an internal tool at my job.

These let you use pre-existing binaries at an arbitrary prefix, which allows you to build arbitrary environments like with source-based package managers, but with the very important efficiency gain of using pre-compiled binaries. But relocation is not well supported by Unix as a whole: binaries tend to bake in absolute paths (RPATHs, shebang lines, paths passed to dlopen), so moving them to a new prefix breaks in endless small ways. So it sadly doesn't really work.

3.1 Usage

If it worked, these package managers would be usable by absolutely everyone. But it doesn't work.

Conda, of course, is used by the Python community, and it works decently for C and Python, so it is used by groups that just write in C and Python.

4 Language-based package managers

pip, maven, cargo, cabal, etc.

These (generally) allow you to build arbitrary environments containing arbitrary versions. They do this by taking advantage of features of their specific language to ensure that library dependencies are looked up based on the environment.
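The mechanism is easy to sketch in Python (the module name mylib and the directory layout are made up for illustration): module lookup follows an environment-specific search path, so two "environments" are just two different path entries.

```python
import os
import sys
import tempfile

def make_env(root, version):
    """Create a fake environment directory containing one module, mylib."""
    env = os.path.join(root, "env-" + version)
    os.makedirs(env)
    with open(os.path.join(env, "mylib.py"), "w") as f:
        f.write("VERSION = %r\n" % version)
    return env

def import_from(env):
    """Import mylib the way a process running inside `env` would see it."""
    sys.modules.pop("mylib", None)   # force a fresh search-path lookup
    sys.path.insert(0, env)
    try:
        import mylib
        return mylib.VERSION
    finally:
        sys.path.remove(env)

with tempfile.TemporaryDirectory() as root:
    one = make_env(root, "1.0")
    two = make_env(root, "2.0")
    print(import_from(one))  # 1.0
    print(import_from(two))  # 2.0
```

Tools like virtualenv are essentially a polished version of this trick: they arrange for the interpreter's search path to point at a per-environment directory tree.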

The problem is that none of these tools have support for cross-language dependencies. So if your Python program depends on a Java program, or your Haskell program depends on a Rust program, or if (far more likely) all your stuff depends on various C libraries, you are basically screwed. (And this makes sense: they're all using language-specific hacks to get arbitrary environments.)

The general solution is to rely on the Linux distro package manager to provide those C libraries, but there's no way to express that dependency - it's completely ad-hoc.

4.1 Usage

Groups that do everything in one language will traditionally use a language-based package manager, on top of a Linux distribution such as Debian (which is managed by apt).

5 Functional package managers

Nix, Guix

You can't install packages at arbitrary prefixes, as you can with relocatable and source-based package managers (every package lives at a fixed path in the store). But you can build arbitrary environments containing arbitrary versions of packages, without requiring privileges, which is what you ultimately want to do anyway.

So this is like a language-based package manager, except it is language-independent; you can use it for every language.
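For illustration, here is roughly what a cross-language environment looks like as a Nix expression (a shell.nix; the package attribute names are from nixpkgs and may differ across releases):

```nix
with import <nixpkgs> {};

# Entering this environment with `nix-shell` puts a Python
# interpreter, a JVM, and a C library on the path together --
# no root required, and the rest of the machine is unaffected.
mkShell {
  buildInputs = [ python3 openjdk zlib ];
}
```

The point is that the Python, Java, and C dependencies are all expressed in the same way, in the same file, by the same tool.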

5.1 Usage

Any group can use Nix, no matter what language they use, or to what depth they modify the system. This is just a consequence of underlying good design.

Today, it's mostly Haskell people, though.

6 Containers

Docker, Flatpak, Snappy, etc.

Typical usage of these tools goes like this: You use a traditional Linux distro package manager such as apt to install various core operating system packages (such as the C library) into the root directory. Then you use one or more language-based package managers to install some language-specific software and libraries into a global language-specific environment.

The dependencies here are all ad-hoc, because there's no way to express dependencies between apt and pip and maven.
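A hypothetical Dockerfile makes that ad-hoc layering visible (the package names are just examples):

```dockerfile
FROM debian:stable
# Layer 1: the distro package manager installs system-wide packages,
# including the C library headers that pip will silently rely on.
RUN apt-get update && apt-get install -y python3 python3-pip gcc libpq-dev
# Layer 2: the language package manager installs on top; nothing
# records that psycopg2 needs libpq-dev from the layer above.
RUN pip3 install psycopg2
COPY app.py /app/app.py
CMD ["python3", "/app/app.py"]
```

If the apt-get line stopped installing libpq-dev, the pip3 line would break, and no dependency solver would tell you why.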

Then Docker (like the other tools) uses process isolation so you can isolate this "root directory" from other "root directories" on the system. You call this root directory a "container image". Note that this is the same as just having one virtual machine per application, just lower-overhead. For more details, see my post on Docker here.

Like one virtual machine per application, this conflates two things: "package management" and "process isolation". It uses process isolation as a hack to get more granular package management, but a tool like Nix can give you that granularity directly, without any isolation overhead or complexity.

Process isolation and package management can be handled independently. There are many flexible tools available to perform various kinds of process isolation; your package management system shouldn't depend on a certain kind of process isolation.

Also note that container-based isolation is (today, on conventional Linux systems) a privileged operation! You need root to do it! And that's good, because tricks with containers and arbitrary images can allow you to escalate back to root if you don't already have it. So a container-based solution regresses back to the Linux distro package manager scenario, where you need root to install new packages/set up new application environments.

6.1 Usage

People are excited about containers (or more specifically, Kubernetes) as a means of scheduling applications on a collection of hosts. That is indeed pretty exciting.

As a requirement for dynamically scheduling applications across hosts, you need a way to deploy those applications to hosts. The Docker approach is as I have already described. But you could just as well use Nix to deploy those applications, while still using the exciting cluster-scheduling and process isolation parts.

So, a lot of people are excitedly using the cluster-scheduling and process isolation aspects of Docker. And, because they're forced to, a lot of people are also using the ad-hoc containerization hack for package management. This is unfortunate, but it shouldn't be taken as a signal that container-based package management is actually a good idea.

Note that you can easily build container images with only the use of Nix. Then your package management will work equally well both inside and outside of a container.
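For instance, nixpkgs ships dockerTools for exactly this purpose (a sketch; attribute names may vary between nixpkgs releases):

```nix
with import <nixpkgs> {};

# Produces a Docker-loadable image tarball whose contents are the
# exact dependency closure of the `hello` package -- no Dockerfile,
# no apt, and the result is reproducible.
dockerTools.buildImage {
  name = "hello";
  config.Cmd = [ "${hello}/bin/hello" ];
}
```

The image contains only what the package's closure requires, rather than an entire distro userland.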

7 Conclusion

As far as I can tell, no one other than the Nix and Guix teams is trying to solve the fundamental problems of dependency management and software deployment.

Large groups have their own bespoke systems and can vendor absolutely everything; small groups just use apt or yum plus virtual machines or containers. More specialized groups use the other tools I mentioned.

I would really like to hear about other alternative solutions that try to do better than Docker and the systems like it. But at the moment, it seems like Nix (or Guix) is the only option.

I recommend reading the first chapter of the PhD thesis of Eelco Dolstra (the creator of Nix). It has a good analysis of some of the issues of software deployment.

Created: 2017-06-27 Tue 10:47

Emacs 25.1.1 (Org mode 8.2.10)
