- The Language Barrier
- Compiler Cleverness
- Evolving the Language
- Future Challenges
Compiler Cleverness
This mismatch between the hardware's concrete model and the language's abstract model isn't new; it's faced by every high-level language. Smalltalk-family languages, such as Java, present a model of memory as a set of discrete blocks with dynamic message-passing for flow control. With the possible exception of the Burroughs large systems architecture, no platforms actually look like that.
Trying to map a high-level language to an architecture with different semantics is difficult, but not impossible. It's sometimes possible, for example, to extract parallelism from purely sequential code. Consider something like this loop in C:
for (int i = 0; i < 100; i++) { j[i] = sin(k[i]); }
At first glance, you might think that you could do each loop iteration entirely in parallel. If you were generating code for a vector unit, you might rewrite this as 25 loop iterations, each doing a vector-sine. For a GPU, you could generate a small kernel that did a single sine operation, and then run it on 100 inputs all in parallel. (For such a simple program, the cost of copying data to and from the GPU would make this a silly thing to do.)
But what happens if k and j are both pointers into the same array, with j offset forward by one element? In this case, the loop will fill each entry in the array with the sine of the previous element. Because this behavior is entirely valid C, the compiler can't easily apply the optimization. It could bracket the optimized version in a runtime test that the two ranges don't overlap, generating code for both cases, but this approach gets particularly complex when you consider running code on multiple cores, especially when they have different instruction sets.
There's a very simple solution: Ask the programmer to provide a bit more information. This is the approach that OpenMP uses for concurrency. OpenMP defines a set of annotations that you can add to a program to specify concurrency potential. In the example above, you could specify that j and k are not aliased, and the compiler would then be able to make the loop parallel.
You can do something similar for targeting the GPU. Recently I've been working on a new open standard called HMPP, originally created by a French company called CAPS and now also supported by PathScale. The design is very similar to OpenMP. You can mark up loops in C/C++ and Fortran programs and have them run on the GPU.
The advantage of this approach is that it allows you to take a large existing codebase and immediately accelerate it, without breaking (source) compatibility with systems that don't have a compatible GPU.
The annotations that these systems provide are metadata. They clarify the program's semantics, rather than defining the semantics. They don't alter how the program will behave; they just give the compiler some information that it can't easily determine for itself.
In some cases, these systems can be replaced by static and dynamic analysis of the program. For example, in the simple loop that we had earlier, imagine if the line had been as follows:
float *j = calloc(100, sizeof(float));
The compiler knows that a pointer returned by the standard library function calloc() will not alias any other memory in the system. Therefore, it can determine automatically that j and k are not aliased, performing any relevant optimizations.
If j and k are both function parameters, the scenario is a bit more difficult. You need some inter-procedural analysis to determine whether they're aliased. A compiler might combine this analysis with function specialization and emit two versions of the routine: a safe version, plus a version that assumes the parameters are not aliased. It would then rewrite calls where it can prove that the parameters don't alias to call the faster one.
This option adds considerable complexity to the compiler, and it addresses only a relatively simple optimization opportunity. Compilers will gradually gain more of an ability to infer this kind of information, but it's not going to happen very quickly.