Summary of the FlexSC paper

This is a summary of the 2010 paper "FlexSC: Flexible System Call Scheduling with Exception-Less System Calls", which is one of my favorite papers, interspersed with some of my own commentary. (This is adapted from a presentation I gave at work.)

1. Problem

  • A kernel provides safe APIs for operations that require control of the hardware.
  • You talk to the kernel with system calls.
  • System calls are function calls that "mode switch" on call and return, so that the kernel runs with control of the hardware and your program doesn't.

  • Mode switching is slow.
  • This is the traditional understanding of why system calls are slow: the direct cost of making the call is high compared to a plain function call. (A sketch timing just this direct cost follows below.)
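
To make that direct cost concrete, here is a minimal C sketch of my own (not from the paper) that times a trivial system call in a loop; syscall(SYS_getpid) is used only because it forces a real kernel entry while doing almost no work:

    /* A minimal sketch (not from the paper): time the direct cost of a
     * trivial system call. syscall(SYS_getpid) forces a real kernel entry
     * on every iteration; nothing here is cached in userspace or served
     * from the vDSO. */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>
    #include <sys/syscall.h>

    int main(void) {
        const long iterations = 1000000;
        struct timespec start, end;

        clock_gettime(CLOCK_MONOTONIC, &start);
        for (long i = 0; i < iterations; i++)
            syscall(SYS_getpid);              /* one mode switch per iteration */
        clock_gettime(CLOCK_MONOTONIC, &end);

        double ns = (end.tv_sec - start.tv_sec) * 1e9
                  + (end.tv_nsec - start.tv_nsec);
        printf("%.1f ns per syscall (direct cost only)\n", ns / iterations);
        return 0;
    }

On typical x86-64 hardware this lands on the order of tens to a couple of hundred nanoseconds per call, versus a nanosecond or two for a plain function call, and Meltdown-era mitigations push it higher.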

  • This paper's observation: Mode switching is not the only cost!
  • The other cost is worse locality.
  • Locality: a program doing the same thing over and over (e.g. accessing the same memory, executing the same code, taking the same branch…).
  • The existence of locality is why caches speed up execution!

  • System calls execute kernel code, which pulls kernel-specific code and data into the caches, evicting some of your program's; that's slow.
  • When the system call returns, your program has to pull its own code and data back into the caches, which is slow again.
  • Changing what you're doing reduces locality, which makes things slow, because caches and other processor state have to be refilled. (A crude illustration follows this list.)
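
Here is a crude, self-contained illustration of that cache effect (my example, not the paper's methodology; the 32 KiB and 64 MiB sizes are arbitrary): re-summing a small working set is fast while its cache lines are warm, and slower right after other code has walked over a large buffer:

    /* A crude illustration (not the paper's methodology) of the locality
     * cost: summing a small "working set" is fast while it is cached, and
     * slower right after a walk over a large buffer has evicted it. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    enum { HOT = 32 * 1024, COLD = 64 * 1024 * 1024 };   /* bytes */

    static double now_ns(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec * 1e9 + ts.tv_nsec;
    }

    static long sum(volatile char *p, long n) {
        long s = 0;
        for (long i = 0; i < n; i++)
            s += p[i];
        return s;
    }

    int main(void) {
        char *hot  = malloc(HOT);    /* stands in for "your" code and data */
        char *cold = malloc(COLD);   /* stands in for someone else's */
        memset(hot, 1, HOT);
        memset(cold, 1, COLD);       /* fault in real, distinct pages */

        sum(hot, HOT);                                       /* warm the caches */
        double t0 = now_ns(); sum(hot, HOT); double warm    = now_ns() - t0;

        sum(cold, COLD);                                     /* evict them again */
        double t1 = now_ns(); sum(hot, HOT); double evicted = now_ns() - t1;

        printf("warm: %.0f ns, after eviction: %.0f ns\n", warm, evicted);
        free(hot); free(cold);
        return 0;
    }

The exact numbers depend heavily on the machine, but the second timing is reliably worse than the first; a system call has the same effect on your program's working set.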

  • The paper finds that after a system call, the program's instructions-per-cycle (IPC) can drop by up to half.
  • That isn't the cost of the mode switch itself; it's the cost of bad locality!

  • Today's Spectre/Meltdown mitigations make this worse: part of the mitigation is flushing various caches and predictor state when switching into the kernel!

  • This locality cost isn't specific to system calls.
  • Many function calls are into libraries that have lots of internal state (OpenOnload, for example).
  • Making any such function call will have these locality costs, and therefore slow you down.

2. Solution

  • The problem is bad locality.
  • The solution is to increase locality.
  • Don't switch your core between kernel and your program: dedicate a core to your program, another core to the kernel, and send system calls from one to the other!
  • Each core will then have much better locality.

  • The program core isn't executing kernel code, so there's no impact on its caches.
  • The kernel core isn't executing program code, so there's no impact on its caches.
  • Both execute faster!

  • They send system calls between cores using shared memory: "syscall pages" holding small per-call entries that both sides poll.
  • It's similar to how shared-memory, multi-threaded, pipelined software passes work between stages (see the sketch just below).
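
Here is a rough sketch of the mechanism in C. The struct layout, field names, and busy-waiting are my own illustrative choices, not FlexSC's actual ABI; a second pthread stands in for the kernel-side syscall thread that FlexSC runs on a dedicated core (build with -pthread):

    /* A rough sketch of exception-less syscall dispatch over shared memory.
     * The layout and busy-waiting are illustrative; a second pthread stands
     * in for FlexSC's in-kernel syscall thread on another core. */
    #define _GNU_SOURCE
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/syscall.h>

    enum { FREE, SUBMITTED, DONE, SHUTDOWN };

    struct syscall_entry {
        _Atomic int status;    /* FREE -> SUBMITTED -> DONE */
        long        nr;        /* syscall number */
        long        args[6];
        long        ret;       /* filled in by the "kernel" side */
    };

    static struct syscall_entry entry = { FREE };

    /* Program core: post a request, then wait for the result. FlexSC-Threads
     * would run other user-level threads here instead of spinning. */
    static long submit_and_wait(long nr, long a0, long a1, long a2) {
        entry.nr = nr;
        entry.args[0] = a0; entry.args[1] = a1; entry.args[2] = a2;
        atomic_store_explicit(&entry.status, SUBMITTED, memory_order_release);
        while (atomic_load_explicit(&entry.status, memory_order_acquire) != DONE)
            ;                  /* no mode switch happens on this thread */
        atomic_store_explicit(&entry.status, FREE, memory_order_relaxed);
        return entry.ret;
    }

    /* "Kernel" core: poll the shared entry, execute the call, publish the
     * result. In FlexSC this loop runs inside the kernel on another core. */
    static void *kernel_core(void *arg) {
        (void)arg;
        for (;;) {
            int s = atomic_load_explicit(&entry.status, memory_order_acquire);
            if (s == SHUTDOWN)
                break;
            if (s == SUBMITTED) {
                entry.ret = syscall(entry.nr, entry.args[0],
                                    entry.args[1], entry.args[2]);
                atomic_store_explicit(&entry.status, DONE, memory_order_release);
            }
        }
        return NULL;
    }

    int main(void) {
        pthread_t kcore;
        pthread_create(&kcore, NULL, kernel_core, NULL);

        const char msg[] = "hello via a shared-memory syscall entry\n";
        submit_and_wait(SYS_write, STDOUT_FILENO, (long)msg, sizeof msg - 1);

        atomic_store_explicit(&entry.status, SHUTDOWN, memory_order_release);
        pthread_join(kcore, NULL);
        return 0;
    }

The real design is richer: many entries across multiple syscall pages, per-core kernel-side syscall threads, and (if I remember the paper correctly) neither side just spins forever when idle.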

  • They built a "green thread" / "N:1 threading" / "userspace threads" thread library on top of this.
  • When a thread makes a system call, the system call is sent to the kernel core, and other threads execute until the original thread's result comes back.
  • Their library, FlexSC-Threads, is a drop-in replacement for standard Linux pthreads (NPTL); a toy version of the yield-on-syscall trick is sketched below.
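
Here is a self-contained toy (assumed names, nothing like FlexSC-Threads' real internals) showing the shape of that: the "blocking" wrapper just records the request and switches context away, and the scheduler resumes the thread once a result is available. For brevity the scheduler below also plays the kernel-core role by calling syscall(2) itself; in FlexSC that work happens on another core:

    /* A toy user-level thread that "makes a syscall" by posting the request
     * and yielding; the scheduler resumes it once the result is available. */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <ucontext.h>
    #include <unistd.h>
    #include <sys/syscall.h>

    static ucontext_t scheduler_ctx, thread_ctx;

    static struct {
        int  pending;     /* request posted, result not yet available */
        long nr;          /* syscall number */
        long args[3];
        long ret;
    } entry;

    /* What a blocking wrapper (e.g. write()) does in the green-thread
     * library: post the request and yield instead of trapping. */
    static long async_syscall(long nr, long a0, long a1, long a2) {
        entry.nr = nr;
        entry.args[0] = a0; entry.args[1] = a1; entry.args[2] = a2;
        entry.pending = 1;
        swapcontext(&thread_ctx, &scheduler_ctx);  /* run other threads meanwhile */
        return entry.ret;                          /* resumed once the result is in */
    }

    static void user_thread(void) {
        const char msg[] = "hello from a green thread\n";
        async_syscall(SYS_write, STDOUT_FILENO, (long)msg, sizeof msg - 1);
    }

    int main(void) {
        static char stack[64 * 1024];
        getcontext(&thread_ctx);
        thread_ctx.uc_stack.ss_sp   = stack;
        thread_ctx.uc_stack.ss_size = sizeof stack;
        thread_ctx.uc_link          = &scheduler_ctx;   /* return here when done */
        makecontext(&thread_ctx, user_thread, 0);

        swapcontext(&scheduler_ctx, &thread_ctx);       /* start the user thread */
        while (entry.pending) {                         /* act as the kernel core */
            entry.ret = syscall(entry.nr, entry.args[0],
                                entry.args[1], entry.args[2]);
            entry.pending = 0;
            swapcontext(&scheduler_ctx, &thread_ctx);   /* resume the waiting thread */
        }
        return 0;
    }

FlexSC-Threads does this dance behind the ordinary pthread and libc entry points, which is why applications don't need to change.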

3. Result

  • Incredible speedups!

We show how FlexSC improves performance of Apache by up to 116%, MySQL by up to 40%, and BIND by up to 105% while requiring no modifications to the applications.

  • These are from the locality benefits!
  • Basically no cost!

4. Today's implementations and related work

Here are some interesting related works.

4.1. Promise pipelining

  • FlexSC is a generalization of "system call batching": executing multiple operations at once with a single actual system call.
  • A further generalization of FlexSC is promise pipelining.
  • Promise pipelining lets you issue multiple operations at once, with later operations able to depend on the results of earlier ones (a toy sketch follows).
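
Here is a toy sketch of the idea (the encoding and names are invented here; real promise-pipelining systems like E or Cap'n Proto are RPC designs, not syscall batchers): an operation in a batch can name the result of an earlier operation as an argument, so "open, then read from whatever fd open returns" still needs only one submission:

    /* A toy sketch of promise pipelining: an argument may refer to the
     * result of an earlier operation in the same batch, so dependent calls
     * still need only one submission / round trip. */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/syscall.h>

    struct arg { int is_ref; long v; };   /* literal, or index of an earlier result */
    #define LIT(x) ((struct arg){ 0, (long)(x) })
    #define REF(i) ((struct arg){ 1, (i) })

    struct op { long nr; struct arg args[3]; };

    /* Executor: resolve REF() arguments as results become available. In a
     * pipelined design, this loop runs on the far side of the round trip. */
    static void run_batch(const struct op *ops, long *results, int n) {
        for (int i = 0; i < n; i++) {
            long a[3];
            for (int j = 0; j < 3; j++) {
                struct arg x = ops[i].args[j];
                a[j] = x.is_ref ? results[x.v] : x.v;
            }
            results[i] = syscall(ops[i].nr, a[0], a[1], a[2]);
        }
    }

    int main(void) {
        char buf[64];
        struct op batch[] = {
            { SYS_openat, { LIT(AT_FDCWD), LIT("/proc/version"), LIT(O_RDONLY) } },
            { SYS_read,   { REF(0), LIT(buf), LIT(sizeof buf) } },  /* fd from op 0 */
            { SYS_close,  { REF(0) } },
        };
        long results[3];
        run_batch(batch, results, 3);
        printf("read %ld bytes\n", results[1]);
        return 0;
    }

Batching alone gives you "many operations per switch"; pipelining adds "later operations can consume earlier results without waiting for them".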

4.2. Shared-memory data structures

4.3. rsyscall!

  • rsyscall programs run in a Python interpreter thread, and send system calls to dedicated syscall-running processes.
  • The resemblance to FlexSC is just a happy coincidence; rsyscall was written for a completely different purpose.
  • Should be nice and high-performance…

4.4. io_uring

  • Roughly, io_uring is two things: FlexSC-style asynchronous system calls (IORING_SETUP_SQPOLL, sketched below), and a different in-kernel implementation of filesystem IO.
  • They could be separated, perhaps…
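
Here is a minimal liburing sketch of that first half (my example, not from the paper; build with -luring, and note that on kernels before roughly 5.11 SQPOLL also requires elevated privileges and registered files, so it may fail there): with IORING_SETUP_SQPOLL a kernel thread polls the submission queue, so steady-state submission is mostly just writing entries into shared memory, much like FlexSC:

    /* A minimal liburing sketch of the FlexSC-like half of io_uring: with
     * IORING_SETUP_SQPOLL a kernel thread polls the submission queue. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <liburing.h>

    int main(void) {
        struct io_uring ring;
        /* Ask for a kernel-side submission-queue polling thread. */
        int ret = io_uring_queue_init(8, &ring, IORING_SETUP_SQPOLL);
        if (ret < 0) {
            fprintf(stderr, "io_uring_queue_init: %s\n", strerror(-ret));
            return 1;
        }

        const char msg[] = "submitted via the shared SQ ring\n";
        struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
        io_uring_prep_write(sqe, STDOUT_FILENO, msg, sizeof msg - 1, 0);

        /* With SQPOLL this mostly just publishes the entry in shared memory;
         * the kernel's polling thread picks it up and performs the write. */
        io_uring_submit(&ring);

        struct io_uring_cqe *cqe;
        io_uring_wait_cqe(&ring, &cqe);       /* block until the completion arrives */
        printf("write returned %d\n", cqe->res);
        io_uring_cqe_seen(&ring, cqe);

        io_uring_queue_exit(&ring);
        return 0;
    }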

Created: 2022-02-03 Thu 18:17
