- Obtaining the Kernel Source
- The Kernel Source Tree
- Building the Kernel
- A Beast of a Different Nature
- Conclusion
A Beast of a Different Nature
The Linux kernel has several unique attributes as compared to a normal user-space application. Although these differences do not necessarily make developing kernel code harder than developing user-space code, they certainly make doing so different.
These characteristics make the kernel a beast of a different nature. Some of the usual rules are bent; other rules are entirely new. Although some of the differences are obvious (we all know the kernel can do anything it wants), others are not so obvious. The most important of these differences are
- The kernel has access to neither the C library nor the standard C headers.
- The kernel is coded in GNU C.
- The kernel lacks the memory protection afforded to user-space.
- The kernel cannot easily execute floating-point operations.
- The kernel has a small per-process fixed-size stack.
- Because the kernel has asynchronous interrupts, is preemptive, and supports SMP, synchronization and concurrency are major concerns within the kernel.
- Portability is important.
Let's briefly look at each of these issues because all kernel developers must keep them in mind.
No libc or Standard Headers
Unlike a user-space application, the kernel is not linked against the standard C library—or any other library, for that matter. There are multiple reasons for this, including a chicken-and-the-egg situation, but the primary reason is speed and size. The full C library—or even a decent subset of it—is too large and too inefficient for the kernel.
Do not fret: Many of the usual libc functions are implemented inside the kernel. For example, the common string manipulation functions are in lib/string.c. Just include the header file <linux/string.h> and have at them.
Of the missing functions, the most familiar is printf(). The kernel does not have access to printf(), but it does provide printk(), which works pretty much the same as its more familiar cousin. The printk() function copies the formatted string into the kernel log buffer, which is normally read by the syslog program. Usage is similar to printf():
printk("Hello world! A string '%s' and an integer '%d'\n", str, i);
One notable difference between printf() and printk() is that printk() enables you to specify a priority flag. This flag is used by syslogd to decide where to display kernel messages. Here is an example of these priorities:
printk(KERN_ERR "this is an error!\n");
Note there is no comma between KERN_ERR and the printed message. This is intentional; the priority flag is a preprocessor-define representing a string literal, which is concatenated onto the printed message during compilation. We use printk() throughout this book.
GNU C
Like any self-respecting Unix kernel, the Linux kernel is programmed in C. Perhaps surprisingly, the kernel is not programmed in strict ANSI C. Instead, where applicable, the kernel developers make use of various language extensions available in gcc (the GNU Compiler Collection, which contains the C compiler used to compile the kernel and most everything else written in C on a Linux system).
The kernel developers use both ISO C991 and GNU C extensions to the C language. These changes wed the Linux kernel to gcc, although recently one other compiler, the Intel C compiler, has sufficiently supported enough gcc features that it, too, can compile the Linux kernel. The earliest supported gcc version is 3.2; gcc version 4.4 or later is recommended. The ISO C99 extensions that the kernel uses are nothing special and, because C99 is an official revision of the C language, are slowly cropping up in a lot of other code. The more unfamiliar deviations from standard ANSI C are those provided by GNU C. Let's look at some of the more interesting extensions that you will see in the kernel; these changes differentiate kernel code from other projects with which you might be familiar.
Inline Functions
Both C99 and GNU C support inline functions. An inline function is, as its name suggests, inserted inline into each function call site. This eliminates the overhead of function invocation and return (register saving and restore) and allows for potentially greater optimization as the compiler can optimize both the caller and the called function as one. As a downside (nothing in life is free), code size increases because the contents of the function are copied into all the callers, which increases memory consumption and instruction cache footprint. Kernel developers use inline functions for small time-critical functions. Making large functions inline, especially those used more than once or that are not exceedingly time critical, is frowned upon.
An inline function is declared when the keywords static and inline are used as part of the function definition. For example
static inline void wolf(unsigned long tail_size)
The function declaration must precede any usage, or else the compiler cannot make the function inline. Common practice is to place inline functions in header files. Because they are marked static, an exported function is not created. If an inline function is used by only one file, it can instead be placed toward the top of just that file.
In the kernel, using inline functions is preferred over complicated macros for reasons of type safety and readability.
Inline Assembly
The gcc C compiler enables the embedding of assembly instructions in otherwise normal C functions. This feature, of course, is used in only those parts of the kernel that are unique to a given system architecture.
The asm() compiler directive is used to inline assembly code. For example, this inline assembly directive executes the x86 processor's rdtsc instruction, which returns the value of the timestamp (tsc) register:
unsigned int low, high; asm volatile("rdtsc" : "=a" (low), "=d" (high)); /* low and high now contain the lower and upper 32-bits of the 64-bit tsc */
The Linux kernel is written in a mixture of C and assembly, with assembly relegated to low-level architecture and fast path code. The vast majority of kernel code is programmed in straight C.
Branch Annotation
The gcc C compiler has a built-in directive that optimizes conditional branches as either very likely taken or very unlikely taken. The compiler uses the directive to appropriately optimize the branch. The kernel wraps the directive in easy-to-use macros, likely() and unlikely().
For example, consider an if statement such as the following:
if (error) { /* ... */ }
To mark this branch as very unlikely taken (that is, likely not taken):
/* we predict 'error' is nearly always zero ... */ if (unlikely(error)) { /* ... */ }
Conversely, to mark a branch as very likely taken:
/* we predict 'success' is nearly always nonzero ... */ if (likely(success)) { /* ... */ }
You should only use these directives when the branch direction is overwhelmingly known a priori or when you want to optimize a specific case at the cost of the other case. This is an important point: These directives result in a performance boost when the branch is correctly marked, but a performance loss when the branch is mismarked. A common usage, as shown in these examples, for unlikely() and likely() is error conditions. As you might expect, unlikely() finds much more use in the kernel because if statements tend to indicate a special case.
No Memory Protection
When a user-space application attempts an illegal memory access, the kernel can trap the error, send the SIGSEGV signal, and kill the process. If the kernel attempts an illegal memory access, however, the results are less controlled. (After all, who is going to look after the kernel?) Memory violations in the kernel result in an oops, which is a major kernel error. It should go without saying that you must not illegally access memory, such as dereferencing a NULL pointer—but within the kernel, the stakes are much higher!
Additionally, kernel memory is not pageable. Therefore, every byte of memory you consume is one less byte of available physical memory. Keep that in mind the next time you need to add one more feature to the kernel!
No (Easy) Use of Floating Point
When a user-space process uses floating-point instructions, the kernel manages the transition from integer to floating point mode. What the kernel has to do when using floating-point instructions varies by architecture, but the kernel normally catches a trap and then initiates the transition from integer to floating point mode.
Unlike user-space, the kernel does not have the luxury of seamless support for floating point because it cannot easily trap itself. Using a floating point inside the kernel requires manually saving and restoring the floating point registers, among other possible chores. The short answer is: Don't do it! Except in the rare cases, no floating-point operations are in the kernel.
Small, Fixed-Size Stack
User-space can get away with statically allocating many variables on the stack, including huge structures and thousand-element arrays. This behavior is legal because user-space has a large stack that can dynamically grow. (Developers on older, less advanced operating systems—say, DOS—might recall a time when even user-space had a fixed-sized stack.)
The kernel stack is neither large nor dynamic; it is small and fixed in size. The exact size of the kernel's stack varies by architecture. On x86, the stack size is configurable at compile-time and can be either 4KB or 8KB. Historically, the kernel stack is two pages, which generally implies that it is 8KB on 32-bit architectures and 16KB on 64-bit architectures—this size is fixed and absolute. Each process receives its own stack.
The kernel stack is discussed in much greater detail in later chapters.
Synchronization and Concurrency
The kernel is susceptible to race conditions. Unlike a single-threaded user-space application, a number of properties of the kernel allow for concurrent access of shared resources and thus require synchronization to prevent races. Specifically
- Linux is a preemptive multitasking operating system. Processes are scheduled and rescheduled at the whim of the kernel's process scheduler. The kernel must synchronize between these tasks.
- Linux supports symmetrical multiprocessing (SMP). Therefore, without proper protection, kernel code executing simultaneously on two or more processors can concurrently access the same resource.
- Interrupts occur asynchronously with respect to the currently executing code. Therefore, without proper protection, an interrupt can occur in the midst of accessing a resource, and the interrupt handler can then access the same resource.
- The Linux kernel is preemptive. Therefore, without protection, kernel code can be preempted in favor of different code that then accesses the same resource.
Typical solutions to race conditions include spinlocks and semaphores. Later chapters provide a thorough discussion of synchronization and concurrency.
Importance of Portability
Although user-space applications do not have to aim for portability, Linux is a portable operating system and should remain one. This means that architecture-independent C code must correctly compile and run on a wide range of systems, and that architecture-dependent code must be properly segregated in system-specific directories in the kernel source tree.
A handful of rules—such as remain endian neutral, be 64-bit clean, do not assume the word or page size, and so on—go a long way. Portability is discussed in depth in a later chapter.