Beyond process supervisors

When running processes on Unix systems, it is important that every process be waited on by its parent process.

Processes (such as "daemons") should not be run in "the background", orphaned and reparented to init, such that nothing will be notified if they exit.

Instead, they should be run under some other program, perhaps a process supervisor, which will wait on them; much has been written about this before.

This is absolutely correct. Any system that starts processes in "the background" and just lets them leak is fundamentally flawed and must be fixed, and a dedicated process supervisor, running as a daemon, is a good way to approach fixing such problems.

But ultimately, the dedicated process supervisor daemon isn't necessary. Starting and waiting on child processes is something any Unix program can do. There's no need for a specialized process supervision daemon.

If I have a collection of processes which together provide some functionality or application, it's natural for me to start those processes as my own child processes, and then wait on them and implement my own failure-handling policies based on the domain-specific information that I have about their behavior.

With this approach, I start the child processes that are relevant to me and then keep running, waiting on those child processes and performing any maintenance tasks necessary (such as restarting failed processes).
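A minimal sketch of this pattern in Python follows. It assumes Python 3.9+ (for `os.waitstatus_to_exitcode`) and a deliberately simple policy: a child that exits non-zero is restarted up to a fixed number of times. The names `spawn` and `supervise` and the retry policy are illustrative, not taken from any particular library.

```python
import os


def spawn(argv):
    """Start argv as a direct child of this process and return its pid."""
    return os.posix_spawnp(argv[0], argv, os.environ)


def supervise(programs, max_restarts=3):
    """Run each program as our own child and wait on all of them.

    `programs` maps a name to an argv list. A child that exits with a
    non-zero status is restarted, up to `max_restarts` times per name.
    Returns a dict mapping each name to its final exit code (negative if
    the child was killed by a signal).
    """
    running = {spawn(argv): name for name, argv in programs.items()}
    restarts = dict.fromkeys(programs, 0)
    results = {}
    while running:
        # Block until any child exits; as its parent, we receive its status.
        pid, status = os.wait()
        name = running.pop(pid, None)
        if name is None:
            continue  # not one of the children started here; ignore it
        code = os.waitstatus_to_exitcode(status)
        if code != 0 and restarts[name] < max_restarts:
            # Domain-specific failure handling belongs here; this sketch
            # simply starts the program again.
            restarts[name] += 1
            running[spawn(programs[name])] = name
        else:
            results[name] = code
    return results
```

Because `os.wait()` reaps whichever child exits first, this sketch assumes the calling process owns all of its children, which is exactly the model described above. A real program would replace the retry branch with whatever its knowledge of the children suggests.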

This is a naturally recursive structure, where a given process might spawn more child processes to perform its task, each of which might in turn spawn still more child processes. This recursive structure ultimately bottoms out at the init process, whether that is systemd or something else.

With systemd, a weak form of this recursive structure is always present: the "system instance" of systemd (which is the init process) starts and waits on a "user instance" of systemd for each interactive user; the user instance in turn runs and waits on various user processes, such as a desktop environment.

But systemd only allows for two levels of recursion: system, then user. There's no way to start a systemd instance that just manages a single user application, with a lifecycle tied to that single application.

The best way to achieve further levels of recursion is simply to write a regular Unix program that starts its own child processes and waits on them. Indeed, this is already how most applications that run child processes work. They don't delegate the core Unix primitive of "fork and exec" to some system service; they simply perform it themselves.
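The primitive itself is small. Here is a sketch of fork, exec, and wait in Python, assuming Python 3.9+. The `run` helper is hypothetical; it exits the child with status 127 when exec fails, mirroring the shell's convention for "command not found".

```python
import os


def run(argv):
    """Fork, exec argv in the child, and wait on it in the parent.

    Returns the child's exit code, or a negative signal number if the
    child was killed by a signal.
    """
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the target program.
        try:
            os.execvp(argv[0], argv)
        finally:
            # Reached only if exec failed; never fall back into the parent's code.
            os._exit(127)
    # Parent: waiting on the child also reaps it, so no zombie is left behind.
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)
```

For example, `run(["sh", "-c", "exit 5"])` returns 5, while a command that cannot be found yields 127.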

Many designs are much easier when starting one's own child processes. If I want to connect two processes with a socketpair, the simple and obvious way is to create the socketpair, then start the two processes, so that they inherit the file descriptors. If I want to start some processes in the same set of namespaces, similarly the easiest way is to just enter those namespaces and then start the processes. And, of course, if I want to handle process failure with more intelligence than "restart it 5 times", it's easy to implement that logic in a general purpose programming language.
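The socketpair case can be sketched in Python, assuming Python 3.8+. The two child programs here are hypothetical stand-ins (inline snippets run with `sys.executable -c`). The parent creates the pair, hands one end to each child through `subprocess.Popen`'s `pass_fds`, closes its own copies, and then waits on both children.

```python
import socket
import subprocess
import sys

# Hypothetical child programs: each receives its inherited fd number as argv[1].
WRITER = "import os, sys; os.write(int(sys.argv[1]), b'hello from writer')"
READER = (
    "import os, sys\n"
    "fd = int(sys.argv[1])\n"
    "data = b''\n"
    "while chunk := os.read(fd, 4096):\n"
    "    data += chunk\n"
    "sys.stdout.write(data.decode())\n"
)


def connect_pair():
    """Create a socketpair, start two children that each inherit one end,
    wait on both, and return what the reader child printed."""
    left, right = socket.socketpair()
    with left, right:
        writer = subprocess.Popen(
            [sys.executable, "-c", WRITER, str(left.fileno())],
            pass_fds=[left.fileno()],
        )
        reader = subprocess.Popen(
            [sys.executable, "-c", READER, str(right.fileno())],
            pass_fds=[right.fileno()],
            stdout=subprocess.PIPE,
        )
    # Leaving the with-block closed the parent's copies, so only the children
    # hold the two ends; the reader sees EOF once the writer exits.
    out, _ = reader.communicate()
    writer.wait()
    return out.decode()
```

Because `close_fds` defaults to true, each child receives only the end listed in its `pass_fds`, which is what lets the reader observe EOF as soon as the writer exits.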

Delegating this basic functionality to a separate service like systemd provides little benefit, if one is careful to monitor one's own child processes.

With modern programming languages and libraries, it's easier than ever to write Unix programs which start and wait on collections of processes.
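For instance, Python's `asyncio` can start and wait on a group of children concurrently. The sketch below assumes Python 3.9+. The `Service` type and the policy of terminating every other child when one fails are illustrative assumptions; they are one of many policies a program could encode.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    """Hypothetical description of one child process that we own."""
    name: str
    argv: tuple[str, ...]


async def run_group(services):
    """Start every service as our own child and wait on all of them.

    Example policy: once any child exits non-zero, terminate the rest.
    Returns a dict mapping service name to exit code (negative if the
    child was killed by a signal).
    """
    waits = {}
    for service in services:
        proc = await asyncio.create_subprocess_exec(*service.argv)
        waits[asyncio.create_task(proc.wait())] = (service, proc)

    results = {}
    stopping = False
    pending = set(waits)
    while pending:
        done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            service, _ = waits[task]
            results[service.name] = task.result()
        if not stopping and any(code != 0 for code in results.values()):
            stopping = True
            for task in pending:
                _, proc = waits[task]
                if proc.returncode is None:
                    try:
                        proc.terminate()
                    except ProcessLookupError:
                        pass  # the child exited between the check and the signal
    return results
```

Running `asyncio.run(run_group([Service("slow", ("sleep", "5")), Service("bad", ("false",))]))` terminates `sleep` as soon as `false` fails. A richer program might give `Service` further fields, such as a restart limit or a flag marking a child as critical, and have the loop consult them.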

When building your next multi-process system, consider starting and monitoring processes directly, instead of using a process supervisor. It can make sophisticated designs much easier, and also combines well with richly typed designs.