How the LLVM Compiler Infrastructure Works
I talked a bit about the low-level virtual machine (LLVM) when comparing open source compilers. Since then, I’ve become involved with the LLVM project, working on code generation for the Objective-C language. In this article, I’ll give you a more in-depth overview of how LLVM works.
What Is LLVM?
LLVM is a virtual machine infrastructure that provides none of the high-level features you’d find in something like the Java or .NET virtual machines, such as garbage collection or an object model.
The basic design of LLVM is an unlimited register machine (URM), familiar to most computer scientists as a universal model of computation. It differs from most URMs in two ways:
- Registers are single-assignment. Once a value for a register has been set, it can’t be modified. This is a common representation in many compilers, and has been since the idea was developed by researchers at IBM in the 1980s.
- Each register has a type associated with it.
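To make these two properties concrete, here is a minimal sketch in C rather than in LLVM's own notation. It assumes nothing about LLVM's API: the function names and the register names x0, x1, and x2 are hypothetical, `const` stands in for the single-assignment rule, and the C type `int32_t` stands in for the type attached to each register.

```c
#include <stdint.h>

/* Ordinary C: the variable x is assigned three times. */
int32_t mutable_version(int32_t input)
{
    int32_t x = input;
    x = x + 1;
    x = x * 2;
    return x;
}

/* Single-assignment style: every new value gets a fresh, typed name.
 * `const` models the rule that a register is never modified once set.
 * The names x0, x1, and x2 are illustrative only. */
int32_t single_assignment_version(int32_t input)
{
    const int32_t x0 = input;   /* first definition      */
    const int32_t x1 = x0 + 1;  /* replaces x = x + 1    */
    const int32_t x2 = x1 * 2;  /* replaces x = x * 2    */
    return x2;
}
```

Both functions compute the same result. The second simply never reuses a name, which is roughly what a compiler does when it converts code to single-assignment form.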
LLVM programs are assembled from basic blocks. A basic block is a sequence of instructions with no branches except at its end: control enters at the top and leaves through the final instruction. The phi instruction is used to merge values after conditional execution. The name comes from the original work on static single assignment (SSA), so the semantics will be familiar to anyone who has worked on a compiler that uses this form. It allows the value of an LLVM register to be set to one of a group of values, depending on the basic block from which the current block was entered. Consider the following snippet of C:
if(condition) a = 1; else a = 2;
In LLVM, or any other compiler with an SSA intermediate representation, a basic block would be constructed for each of the assignments. A phi instruction in the block that follows them would then select the correct value for a.
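The block structure described here can be sketched in C with labels and goto. This is only a model, not what LLVM emits: the enum, the phi() helper, and the label names are hypothetical, and the came_from variable is bookkeeping that a real SSA representation gets for free from the control-flow graph.

```c
#include <stdbool.h>

/* Identifiers for the two predecessor blocks (hypothetical names). */
enum block { BLOCK_THEN, BLOCK_ELSE };

/* Models a phi instruction: pick the incoming value that belongs to
 * the block from which control actually arrived. */
static int phi(enum block came_from, int from_then, int from_else)
{
    return came_from == BLOCK_THEN ? from_then : from_else;
}

/* if(condition) a = 1; else a = 2; written as explicit basic blocks. */
int select_a(bool condition)
{
    enum block came_from;

    /* entry block: ends with a conditional branch */
    if (condition)
        goto then_block;
    goto else_block;

then_block:
    came_from = BLOCK_THEN;
    goto merge;

else_block:
    came_from = BLOCK_ELSE;
    goto merge;

merge:
    /* In textual LLVM IR this is roughly:
     *   %a = phi i32 [ 1, %then ], [ 2, %else ]   */
    return phi(came_from, 1, 2);
}
```

Note that a is defined exactly once, in the merge block, which is how the single-assignment rule survives a conditional.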
In LLVM, there are two sorts of registers:
- Global registers have names that are valid in the entire module (or possibly the entire program).
- Local registers have names that are valid only in the current function.
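As a rough analogy in C, the two kinds of names correspond to file-scope and function-scope identifiers. The names call_count, next_id, and result below are hypothetical. The comments mention the @ and % prefixes that textual LLVM IR uses for global and local names. The exact IR a compiler produces for this code will differ.

```c
/* Module-level name: visible to every function in this file.
 * In textual LLVM IR a global name carries an @ prefix,
 * for example @call_count. */
int call_count = 0;

int next_id(int base)
{
    /* Function-level name: meaningful only inside next_id.
     * In textual LLVM IR a local name carries a % prefix,
     * for example %result. */
    int result = base + call_count;

    call_count = call_count + 1;
    return result;
}
```

Another function in the same file could declare its own result without any clash, but there can be only one call_count.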