Competition Among Open Source Compilers
Since its creation, the C language has been tightly tied to UNIX. C was designed as a portable assembly language for reimplementing UNIX, so that the operating system itself would be easier to port to different platforms.
In 1984, Richard Stallman began the GNU (GNU’s Not UNIX) Project to provide a clone of the UNIX operating system using entirely Free Software. Because a C compiler is a core component of any UNIX-like operating system, he wrote one: the GNU C Compiler (GCC).
Over the years, GCC was rewritten a few times, and support for various languages was added. When it began to support more languages than just C, the name was changed to the GNU Compiler Collection, keeping the same GCC abbreviation. As with other parts of the GNU Project, GCC uses the GNU General Public License, although it has a special exception for any parts of the compiler that are embedded in the compiled output.
The BSD Issue
Although the GNU Project has GCC, and most proprietary UNIX systems have their own compilers, the BSD projects typically have no compiler of their own. GCC is the largest piece of GPL’d code in the base system of any BSD. Over the years, it has periodically looked as though TenDRA might replace it. TenDRA, a BSD-licensed C compiler originally started by the Defence Evaluation and Research Agency (the institution that used to be the UK’s equivalent of DARPA), focuses on correctness and would be a good match for systems like OpenBSD, but progress has been slow.
Recently, another option appeared from an unexpected direction. Back in the 1970s, Stephen Johnson of Bell Labs wrote the Portable C Compiler (PCC). Unlike many earlier compilers, it had a clean separation between the parser and code-generation stages, allowing it to be ported to new architectures easily—a feature present in most newer compilers. This compiler was included with a lot of UNIX variants, including 4.3BSD-Reno.
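To make the idea of a parser/code-generator split concrete, here is a minimal sketch in C. It is not taken from PCC: the names Node, Target, Code, and generate, and both sets of instruction mnemonics, are hypothetical and chosen only for illustration. The front end is assumed to have already produced a small expression tree, and each back end is just a pair of functions that knows how to emit code for one imaginary machine.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Front-end output: a tiny expression tree with no target knowledge. */
typedef enum { NODE_CONST, NODE_ADD, NODE_MUL } NodeKind;

typedef struct Node {
    NodeKind kind;
    int value;                     /* used by NODE_CONST only */
    const struct Node *lhs, *rhs;  /* used by NODE_ADD and NODE_MUL */
} Node;

/* Generated code is collected in a small fixed buffer (sketch only). */
typedef struct {
    char text[256];
    size_t len;
} Code;

static void append(Code *c, const char *line)
{
    size_t n = strlen(line);
    assert(c->len + n < sizeof c->text);    /* no growth in this sketch */
    memcpy(c->text + c->len, line, n + 1);  /* copy including the NUL */
    c->len += n;
}

/* Everything a back end must supply (hypothetical interface). */
typedef struct {
    const char *name;
    void (*emit_const)(Code *c, int value);
    void (*emit_binop)(Code *c, NodeKind op);
} Target;

/* Back end A: an imaginary stack machine with lower-case mnemonics. */
static void a_const(Code *c, int value)
{
    char line[32];
    snprintf(line, sizeof line, "push %d\n", value);
    append(c, line);
}

static void a_binop(Code *c, NodeKind op)
{
    append(c, op == NODE_ADD ? "add\n" : "mul\n");
}

/* Back end B: a second imaginary machine with different mnemonics. */
static void b_const(Code *c, int value)
{
    char line[32];
    snprintf(line, sizeof line, "LDC %d\n", value);
    append(c, line);
}

static void b_binop(Code *c, NodeKind op)
{
    append(c, op == NODE_ADD ? "ADD\n" : "MUL\n");
}

static const Target target_a = { "machine-a", a_const, a_binop };
static const Target target_b = { "machine-b", b_const, b_binop };

/* Target-independent walk: operands first, then the operator. */
static void generate(const Target *t, const Node *n, Code *c)
{
    if (n->kind == NODE_CONST) {
        t->emit_const(c, n->value);
        return;
    }
    generate(t, n->lhs, c);
    generate(t, n->rhs, c);
    t->emit_binop(c, n->kind);
}
```

Calling generate with target_a or target_b on the same tree yields code for either machine without touching the tree or the walk. In this sketch, porting to a third machine means writing two more small functions and one more Target entry, which is the property the text credits PCC with, although the real compiler's interfaces are considerably richer.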
PCC never underwent the same degree of growth as GCC, and remains a very small project. The source code is under 1 megabyte (compressed), and compiling it on a 1 GHz machine takes only a few seconds, whereas compiling GCC on the same machine takes most of an afternoon. PCC does not optimize as aggressively as GCC, but its small size makes it much easier to verify that the output is correct.
In September 2007, PCC was imported into the OpenBSD source tree, with the aim of evaluating it as a GCC replacement for future releases. A simple compiler is very attractive to the BSD communities—particularly one that’s portable. GCC has a habit of changing the interface to the back end, orphaning architectures. This behavior has resulted in some ports of NetBSD, for example, having to stick with old versions of GCC, since no one with the expertise to maintain the compiler port is willing to do so.
It’s difficult to overstate the relative complexity of GCC versus PCC. The codebase for GCC is almost 100 times the size of the PCC codebase. The OpenBSD team hopes that this difference in scale will make getting involved with PCC development a lot less daunting than getting involved with GCC.
PCC began life on the VAX, and didn’t support x86 until very recently. The port took one person less than a week, which makes supporting other architectures seem quite plausible; the total amount of x86-specific code comes to less than 4,000 lines.