Linux needs an explicitly defined userspace API
Linux guarantees the stability of its userspace API,
but the API itself is only informally described, primarily with English prose.
We should write an explicit, authoritative definition of the Linux userspace API.
As background, note that the Linux userspace API (which consists of system calls, device files, various protocols, etc)
is distinct from the C functions implemented by a libc, such as
read(2).
Under the hood, in a conventional libc like glibc, read(2) calls the Linux system call sys_read,
manually passing arguments in registers in an architecture-specific way according to the specific details of sys_read.
This is at best documented in manpages, and more usually is defined only by the implementation.
Anyone else who wants to work with a sys_read syscall, in any way, needs to duplicate all those details.
If we created an explicit definition of the Linux userspace API,
there would be numerous benefits:
- Debugging tools which need to understand the format of syscalls and their arguments in great detail,
such as strace,
are currently primarily hand-written with great duplication of effort.
Even a basic description of syscalls would allow much of this code to be generated instead.
- It often takes a long time for newly-added syscalls to be usable in userspace.
With an explicit definition of the Linux userspace API,
it would be easy to automatically generate functions for new syscalls,
which could be deployed quickly either as part of libc or in a separate syscall library.
- Implementers of new languages currently almost always make syscalls by going through libc.
Supporting interoperability with C in this way is a major burden,
and the resulting interfaces are typically highly unidiomatic for the new language.
With an explicit definition of the Linux API,
it would be much easier for new languages to make syscalls directly
(rather than through libc)
by automatically generating syscall functions which are idiomatic to the new language;
for example, functions which preserve memory-safety and type-safety in Rust.
- Reimplementers of the Linux API, such as Linuxulator, WSL1, and gVisor,
would be able to generate stubs for the interfaces they need to implement automatically,
reducing duplicated code and making them conform better to the Linux API.
- Changes to Linux behavior that require a change in the API definition would deserve greater scrutiny by maintainers,
since such a change might break userspace.
This certainly could never catch all possible API breaks,
but it would be one more way to prevent regressions.
- Any other tool which needs to understand the Linux API would benefit,
such as more esoteric projects to
batch syscalls, intercept and rewrite syscalls,
forward syscalls to remote hosts,
or any other syscall manipulations.
A basic API definition would simply describe valid combinations and formats of syscall arguments in a machine-parseable form,
something which is currently primarily described in English-language manpages.
More detailed API definitions could describe behaviors
such as the requirement that a syscall be passed an open file descriptor rather than an arbitrary integer,
which would require a description of how different syscalls affect the file descriptor table.
To write this definition, a new Linux-specific format for the definition might need to be created.
At a minimum, it will need to be able to describe
bit-level data formats, complex pointer-based datastructures, tagged unions, "overloaded" syscalls such as ioctl,
and architecture-specific divergences.
Most existing formats and languages for describing interfaces like this
unfortunately lack these capabilities.
Ultimately, this definition must be upstream,
which means that it needs to be maintainable by existing Linux developers.
One way to achieve that might be to integrate it into the C code,
possibly in combination with existing Linux macros such as SYSCALL_DEFINE.
The API description can then be automatically extracted from the C code into a more-easily-reusable format,
which can be used as input for other tools.
One step in this direction is
Documentation/ABI,
which specifies the stability guarantees for different userspace APIs in a semi-formal way.
But it doesn't specify the actual content of those APIs,
and it doesn't cover individual syscalls at all.
I would love to collaborate with anyone interested in creating an explicit, authoritative definition of the Linux API,
so
contact me
if you're interested.
See also my
LKML post.