Optimizers Everywhere
The LLVM infrastructure is designed to be modular. Each optimization pass is a self-contained transform that takes LLVM IR as input and produces LLVM IR as output. Any combination of passes can be run in any order. (Sometimes you might even want to run the same pass more than once.)
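As a concrete illustration, here is a minimal sketch of such a pass, modeled on the "Hello" example from LLVM's pass-writing tutorial and using the legacy pass manager API; the CountBlocks pass and its registration name are illustrative, not part of LLVM:

```cpp
#include "llvm/IR/Function.h"
#include "llvm/Pass.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;

namespace {
// A trivial self-contained pass: it receives each function's IR and
// hands it back; a real transform would rewrite the body and return true.
struct CountBlocks : public FunctionPass {
  static char ID;
  CountBlocks() : FunctionPass(ID) {}

  bool runOnFunction(Function &F) override {
    errs() << F.getName() << ": " << F.size() << " basic blocks\n";
    return false; // The IR was not modified.
  }
};
} // end anonymous namespace

char CountBlocks::ID = 0;
static RegisterPass<CountBlocks> X("countblocks",
                                   "Count basic blocks in each function");
```

Because every pass has this shape, a driver such as opt can chain passes in any order on the command line and repeat them where that helps (running instcombine both before and after gvn, for instance).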
LLVM aims to allow optimizations to be run at any time. When a module is compiled to the IR, the first set of optimizations runs. Then, when it’s linked with other modules, it can be optimized again. This functionality is used by the OpenGL Shading Language (GLSL) implementation on newer versions of Mac OS X.
GLSL is a language for writing shaders for OpenGL programs, and it features a lot of vector operations. When an OpenGL program runs, the shader program is sent to the driver, which just-in-time (JIT) compiles it to the GPU’s instruction set and loads it. For GPUs that can’t run a given shader, the driver needs to provide fallback code that runs it on the CPU. Before adopting LLVM, Apple had two GLSL implementations. One was a simple interpreter, in which every GLSL operation was a simple C function call. The other was a hand-coded JIT that emitted AltiVec instructions.
The new version unifies these implementations. The JIT emits LLVM code that simply calls the functions that the interpreter uses. However, these functions are compiled to LLVM IR, not to native code. At runtime, the LLVM link-time optimization passes run, inlining the operations and performing a number of other optimizations. The final code takes advantage of whatever vector unit the target CPU has (SSE or AltiVec), running about 10% faster than the original hand-coded JIT. Since the same code is used in the interpreter as in the JIT, it’s also much easier to debug.
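What this kind of link-time re-optimization looks like in code: the sketch below, which assumes a recent LLVM and uses placeholder file names, merges two already-optimized modules and then runs the whole-module pipeline a second time, so that passes such as the inliner can work across the old module boundary.

```cpp
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/IRReader/IRReader.h"
#include "llvm/Linker/Linker.h"
#include "llvm/Passes/PassBuilder.h"
#include "llvm/Support/SourceMgr.h"

using namespace llvm;

int main() {
  LLVMContext Ctx;
  SMDiagnostic Err;
  // a.bc and b.bc are placeholders: modules that were each already
  // optimized once when they were compiled to IR.
  std::unique_ptr<Module> Dst = parseIRFile("a.bc", Err, Ctx);
  std::unique_ptr<Module> Src = parseIRFile("b.bc", Err, Ctx);
  if (!Dst || !Src)
    return 1;

  // Merge the two modules into one...
  if (Linker::linkModules(*Dst, std::move(Src)))
    return 1;

  // ...and run the standard -O2 module pipeline over the combined IR,
  // so interprocedural passes now see across the old module boundary.
  LoopAnalysisManager LAM;
  FunctionAnalysisManager FAM;
  CGSCCAnalysisManager CGAM;
  ModuleAnalysisManager MAM;
  PassBuilder PB;
  PB.registerModuleAnalyses(MAM);
  PB.registerCGSCCAnalyses(CGAM);
  PB.registerFunctionAnalyses(FAM);
  PB.registerLoopAnalyses(LAM);
  PB.crossRegisterProxies(LAM, FAM, CGAM, MAM);
  ModulePassManager MPM =
      PB.buildPerModuleDefaultPipeline(OptimizationLevel::O2);
  MPM.run(*Dst, MAM);
  return 0;
}
```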
The IR doesn’t have to be compiled to native code; it can also run in an interpreter. This approach allows optimizations to be performed at runtime, transforming the program into a better-optimized version based on profiling information. Alternatively, the profiling information can be collected at runtime and the optimizations applied between program runs, in what LLVM calls idle-time optimization.
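For the interpreter half of this, a minimal sketch with LLVM's ExecutionEngine (recent versions; program.ll and the compute function are hypothetical) looks something like this:

```cpp
#include "llvm/ExecutionEngine/ExecutionEngine.h"
#include "llvm/ExecutionEngine/GenericValue.h"
#include "llvm/ExecutionEngine/Interpreter.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/IRReader/IRReader.h"
#include "llvm/Support/SourceMgr.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;

int main() {
  LLVMContext Ctx;
  SMDiagnostic Err;
  std::unique_ptr<Module> M = parseIRFile("program.ll", Err, Ctx);
  if (!M)
    return 1;

  // Ask for the interpreter rather than a JIT: the IR is executed
  // directly, one instruction at a time.
  std::string ErrStr;
  ExecutionEngine *EE = EngineBuilder(std::move(M))
                            .setEngineKind(EngineKind::Interpreter)
                            .setErrorStr(&ErrStr)
                            .create();
  if (!EE) {
    errs() << ErrStr << "\n";
    return 1;
  }

  // "compute" is a hypothetical nullary i32 function in program.ll.
  Function *F = EE->FindFunctionNamed("compute");
  GenericValue Result = EE->runFunction(F, {});
  outs() << "result: " << Result.IntVal.getSExtValue() << "\n";
  return 0;
}
```

LLVM's lli tool does essentially this from the command line; lli -force-interpreter runs a module under the interpreter instead of the JIT.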
Once the optimizations have run, the IR is exported. Usually the exported format is machine code for some architecture, but a few other back ends exist, including one that produces C code and one that produces MSIL for the .NET runtime (still under development). The mechanism is quite simple: writing a back end requires you to map each LLVM instruction to a native instruction (or sequence of instructions). This works well for RISC architectures, but for something like x86 it’s not ideal. In addition to the simple mappings, it’s also possible to define more complex mappings that translate a whole sequence of LLVM IR instructions at once. These are tried first, and are lowered to the simpler mappings on architectures that don’t support them. This technique is used in particular for emitting vector instructions. LLVM supports vector types in the IR. Operations on vectors are emitted directly as vector instructions on architectures that support them, or lowered to sequences of scalar operations on those that don’t.
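To see what this means in practice, here is a sketch using IRBuilder (recent LLVM; the function name is illustrative) that builds a function adding two <4 x float> vectors. The addition is a single IR instruction; the back end either selects a matching vector instruction (an SSE or AltiVec add, say) or lowers it to four scalar additions:

```cpp
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;

int main() {
  LLVMContext Ctx;
  Module M("vec_demo", Ctx);

  // <4 x float> is a first-class type in the IR.
  Type *V4F = FixedVectorType::get(Type::getFloatTy(Ctx), 4);
  FunctionType *FT = FunctionType::get(V4F, {V4F, V4F}, false);
  Function *F =
      Function::Create(FT, Function::ExternalLinkage, "vadd", &M);
  BasicBlock *BB = BasicBlock::Create(Ctx, "entry", F);

  IRBuilder<> B(BB);
  // One fadd on the whole vector; the target back end decides whether
  // this becomes a vector instruction or a sequence of scalar ones.
  Value *Sum = B.CreateFAdd(F->getArg(0), F->getArg(1), "sum");
  B.CreateRet(Sum);

  M.print(outs(), nullptr); // emit the textual IR
  return 0;
}
```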