- You're really a Microsoft shill, aren't you?
- But Microsoft did pay you, right?
- You hate Linux, though?
- And you think we should all use Windows?
- I am surprised you didn't mention UNIX security. What do you think of capability-oriented systems?
- You criticize UNIX and say that Mach has some features it lacks, but isn't Mach a form of UNIX?
- Putting wildcard expansion in the shell isn't a bug, it's a feature.
- You advocate message passing microkernels, but aren't they really expensive?
- With what do you propose we replace UNIX?
8. You advocate message passing microkernels, but aren’t they really expensive?
Scalable is the new fast. A few years ago, you just had to wait a little while and you could buy a CPU that ran twice as fast. Now if you wait, you will get one that’s about the same speed but does twice as many things at once.
The most recent developments have dramatically changed the costs associated with microkernels. Context switches are traditionally expensive. Anyone who grew up on x86 machines will be familiar with the idea that you have time to go and get a cup of coffee while a context switch takes place. And I’m not talking about instant coffee. For people who are more used to RISC platforms, a context switch has slightly more overhead than a function call.
It used to be that a monolithic kernel needed two context switches for a system call: one into the kernel and one back. A microkernel needed four: one into the kernel, one out of the kernel and into the server handling the call, then two going back the other way. QNX showed that this didn’t have to be the case. If your system calls were asynchronous, then you could batch a few of them up and perform several in a single set of context switches.
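To put rough numbers on that, here is a minimal sketch of the arithmetic, using only the counts given above: two switches per round trip for a monolithic kernel, four for a microkernel, and one round trip per batch when calls are asynchronous. The function name and the `batch_size` parameter are my own illustration, not anything taken from QNX.

```python
def context_switches(calls, model, batch_size=1):
    """Count context switches for `calls` system calls in a toy cost model.

    Assumed costs, taken from the text: a monolithic kernel pays 2 switches
    per round trip, a microkernel pays 4. With asynchronous calls, up to
    `batch_size` calls share a single round trip.
    """
    per_trip = {"monolithic": 2, "microkernel": 4}
    if model not in per_trip:
        raise ValueError("unknown model: " + model)
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    round_trips = -(-calls // batch_size)  # ceiling division
    return per_trip[model] * round_trips


print(context_switches(100, "monolithic"))                 # 200
print(context_switches(100, "microkernel"))                # 400
print(context_switches(100, "microkernel", batch_size=8))  # 52
```

In this model, a microkernel that batches just two calls per round trip already matches the unbatched monolithic kernel. Anything beyond that comes out ahead.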
SMT and multiple cores change the rules slightly further. With SMT (or Hyper-Threading, as Intel calls it), the cost of switching between two processes on virtual processors on the same core becomes almost zero: you don’t need to save the state to memory; you just flip a switch and use the other set of registers for a bit. If the kernel lives in one context and a user-space process in another, then switching between the two is very fast.
The second part of the puzzle is provided by multiple cores. If each server in a microkernel system is already running on its own core, then the cost of switching to it is dramatically reduced. Add in a kernel thread that scans every process’s address space for queued messages and moves them around asynchronously, and you can get away without any context switches.
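The scanning thread is easier to picture with a toy model. The sketch below is a user-space simulation, not kernel code: `Process`, `outbox`, `inbox`, and `route_once` are all hypothetical names, and a real implementation would run the scan continuously on a spare core and copy between address spaces rather than between Python queues.

```python
from collections import deque


class Process:
    """Hypothetical stand-in for a client or server with message queues."""

    def __init__(self, name):
        self.name = name
        self.outbox = deque()  # (destination name, payload) pairs
        self.inbox = deque()   # (sender name, payload) pairs

    def send(self, dest, payload):
        # Queue the message locally; the sender never traps into the kernel.
        self.outbox.append((dest, payload))


def route_once(processes):
    """One scan pass: move every queued message to its destination inbox.

    Returns the number of messages delivered. Messages addressed to an
    unknown process are dropped, which is a simplification of this sketch.
    """
    moved = 0
    for proc in list(processes.values()):
        while proc.outbox:
            dest, payload = proc.outbox.popleft()
            target = processes.get(dest)
            if target is None:
                continue
            target.inbox.append((proc.name, payload))
            moved += 1
    return moved


procs = {name: Process(name) for name in ("app", "fs")}
procs["app"].send("fs", "read /etc/motd")
print(route_once(procs))            # 1
print(procs["fs"].inbox.popleft())  # ('app', 'read /etc/motd')
```

Neither the sender nor the receiver makes a call into the router. Each one only touches its own queues, and the scan does the moving.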
This sounds expensive, because it involves a lot of polling. It can, however, be combined effectively with the power-saving features on modern chips. When a process realizes it is spending a lot of time polling, it can throttle the core it is running on to a lower clock speed.
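Clock throttling happens in hardware and in the kernel, so I can't show it directly, but the same idea has a rough software analogue: poll less often when polling keeps coming up empty. The sketch below swaps in that technique (exponential backoff on the polling interval) purely for illustration. The function name and the floor and ceiling values are assumptions of mine.

```python
def next_poll_interval_us(current_us, found_work, floor_us=100, ceiling_us=50_000):
    """Pick the next polling interval, in microseconds.

    Assumed policy: snap back to the shortest interval as soon as a poll
    finds work, otherwise double the interval up to a fixed ceiling.
    """
    if found_work:
        return floor_us
    return min(current_us * 2, ceiling_us)


interval = 100
for found in (False, False, False, True):
    interval = next_poll_interval_us(interval, found)
    print(interval)  # 200, 400, 800, then back to 100
```

An idle poller settles at the ceiling and costs very little. A busy one stays at the floor and keeps latency low.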
The ideas outlined here are just one potential approach to making microkernels more efficient. For a taste of how scalable asynchronous programming can be, try playing with Erlang and get used to the feeling that you can make your code faster by throwing another few dozen cluster nodes at it. Remember too that a few dozen cores in a laptop will be common in a few years. If you think Erlang, a dynamic, asynchronous message-passing language, is slow, then take a look at the benchmarks that show Yaws (an Erlang web server) handling 20 times as many connections as Apache. Then remember that Yaws scales better to more parallel hardware...