Clang!
The part of LLVM that’s been getting a lot of attention recently is the C language family front end, known as "clang." Currently, for compiling C, C++, Objective-C, and a few other languages, LLVM uses code taken from GCC—parsing the code with GCC and then converting GIMPLE (the GCC intermediate representation) into LLVM IR, which is fed into the optimization stages. This strategy is not ideal, for two reasons:
- GIMPLE throws away some of the semantic information that could be useful to optimizers.
- The GCC front end is GPL’d (since GCC is GPL’d), and the rest of the LLVM code is under a BSD-style license.
Yet another problem: Apple seems to have a corporate allergy to GPLv3, and since GCC is now developed under this license, Apple is forced to maintain its fork completely independently of the main version. Even in version 2, the GPL presents other problems. Apple wants to integrate the compiler’s parser closely with its (proprietary) IDE, so that syntax highlighting is done by something that understands macros and behaves exactly as the compiler does. The idea is that warnings can be displayed without going through the whole compile process. But the parser from GCC can’t be used for this without making the IDE GPL’d as well.
Last June, Apple began the clang project, a C-family (C, Objective-C, and C++) front end for LLVM. Like the rest of LLVM, it is highly modular, allowing individual parts to be used easily in other projects. Somewhat unusually for Apple, clang is being developed out in the open, on a Subversion server at the University of Illinois at Urbana-Champaign (UIUC), with public mailing lists for developers (also hosted by UIUC).
In many ways, the clang front end can be seen as a simple compiler in its own right. It takes C source code and compiles it into LLVM "machine code" (the LLVM IR mentioned earlier). Unlike most compilers, it performs no optimizations itself; LLVM does those for clang. This approach makes LLVM very interesting for developers who want to implement their own languages. Writing a compiler that targets LLVM is much easier than producing one that targets a real architecture. You don’t have to worry about register allocation at all, because LLVM IR lets you use as many virtual registers as you like, and you can emit very naive code that still will run fast once LLVM's optimizers have cleaned it up.
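To make that last point concrete, here is a minimal sketch of a toy front end, written in Python purely for illustration. It is not how clang works internally. The function name `compile_expr`, the `%t` register naming, and the restriction to a single `i32` argument called `x` are all assumptions of this example. It borrows Python's own `ast` module as a stand-in parser and prints textual LLVM IR, allocating a fresh virtual register for every intermediate result without ever reusing one.

```python
import ast
import itertools

# Map Python AST operator nodes to LLVM IR integer instructions.
_OPS = {ast.Add: "add", ast.Sub: "sub", ast.Mult: "mul"}


def compile_expr(source, name="f"):
    """Translate an integer expression over the variable x into LLVM IR text.

    Hypothetical helper for illustration: supports only +, -, *, non-negative
    integer literals, and the single variable x.
    """
    tree = ast.parse(source, mode="eval").body
    lines = []
    counter = itertools.count(1)

    def emit(node):
        # Return the IR operand (constant, argument, or virtual register)
        # that holds the value of this sub-expression.
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return str(node.value)
        if isinstance(node, ast.Name) and node.id == "x":
            return "%x"
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            lhs = emit(node.left)
            rhs = emit(node.right)
            # No register allocation: just grab the next unused name.
            reg = f"%t{next(counter)}"
            lines.append(f"  {reg} = {_OPS[type(node.op)]} i32 {lhs}, {rhs}")
            return reg
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    result = emit(tree)
    body = "\n".join(lines + [f"  ret i32 {result}"])
    return f"define i32 @{name}(i32 %x) {{\n{body}\n}}\n"


if __name__ == "__main__":
    print(compile_expr("(x + 1) * (x + 1)"))
```

For the expression `(x + 1) * (x + 1)`, this sketch computes `x + 1` twice, into `%t1` and `%t2`, before multiplying them. A front end written this carelessly would be a poor fit for a real architecture, but redundancy of this kind is exactly what the LLVM optimization stages are designed to remove.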