Implementing Feature X in C++, Part 1
In early C++ implementations, the compiler generated C code, which was then compiled with the platform's native C compiler. This approach had a number of problems, but one significant advantage: You could look at the generated code and understand exactly how every feature that you used was implemented and what it cost.
A modern C++ compiler generates assembly or machine code directly, so you need a good understanding of the low-level details of your target architecture to understand the output from the compiler. This is a shame, because there seems to be a lot of confusion about exactly how various features of C++ work and what they cost to use.
The low-level details of C++ are specific to the implementation. Unless otherwise stated, details discussed in this article are specific to the Itanium C++ ABI, which is used by most *NIX systems.
Overloading
One of the simplest features of C++ is the ability to have two (or more) functions with the same name but different parameters. This feature is implemented in a very simple way, using name mangling. If you've ever made a mistake in linking C++ code, you've probably seen missing symbol errors with names that look like a cat walked across the keyboard. This is the result of the name mangling process.
In C, the names of globals (including function names) are the public symbol names that are exported to the symbol table in the resulting binary. This design isn't possible in C++, because names are not unique. You may have two functions with the same name in different namespaces, or two in the same namespace with different parameters. If all those functions were exported with the same symbol name, the linker would become very confused.
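To make the collision concrete, here is a minimal sketch of three distinct functions that all share the unqualified name add. The namespaces (outer, other) and the function bodies are made up for illustration; they do not come from any real library.

```cpp
// Three different functions that a C-style symbol table would all call "add".
// All names here are hypothetical and exist only for illustration.
namespace outer {
// Same namespace, different parameter types: these two are overloads.
int add(int a, int b) { return a + b; }
double add(double a, double b) { return a + b; }
}  // namespace outer

namespace other {
// Same name and same parameters as outer::add(int, int), different namespace.
int add(int a, int b) { return a - b; }
}  // namespace other
```

The compiler picks between the two outer::add overloads from the argument types at each call site, and the namespace qualifier separates outer::add from other::add. None of that information is in the bare name add, so each function needs its own symbol.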
The solution is for the compiler to generate a unique name for each function. You can see this mangling with the nm command on a *NIX system, which shows the symbols in an object file. Here's an example from OS X:
    $ cat mangle.cc
    namespace outer {
    namespace inner {
    int function(int a, int b) { return a+b; }
    };
    };
    $ c++ -c mangle.cc
    $ nm mangle.o
    0000000000000018 s EH_frame1
    0000000000000000 T __ZN5outer5inner8functionEii
    0000000000000038 S __ZN5outer5inner8functionEii.eh
                     U ___gxx_personality_v0
The source file contains a single function and generates four symbols. Three of these symbols are related to exception handling; we'll look at those later. The remaining one is our function:
__ZN5outer5inner8functionEii
The first underscore (_) is added on OS X to all C and C++ symbols. This underscore may be missing on other platforms. _Z indicates that this is a mangled C++ symbol. In C, identifiers that begin with an underscore followed by an uppercase letter are reserved for the implementation, so this prefix can't conflict with names in valid C code.
The next letter, N, indicates the start of a nested (namespace-qualified) name. The outer, inner, and function identifiers are each encoded by giving the length of the identifier and then the identifier itself. Finally, E indicates the end of the nested name, and it's followed by one i (for int) for each of the two parameters.
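The encoding steps described above can be sketched in a few lines of C++. This is a deliberately tiny illustration, not the compiler's algorithm: the helper name mangle_int_function is hypothetical, it only handles functions whose parameters are all int, and it ignores everything else in the real scheme (other types, templates, substitutions, and so on). It also leaves out the extra leading underscore that OS X adds.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: builds the mangled name of a function whose
// parameters are all `int`. `scope` lists the enclosing namespaces followed
// by the function name, for example {"outer", "inner", "function"}.
std::string mangle_int_function(const std::vector<std::string>& scope,
                                std::size_t int_params) {
    assert(!scope.empty());  // at least the function name is required

    std::string out = "_Z";  // prefix marking a mangled C++ symbol
    const bool nested = scope.size() > 1;
    if (nested) out += 'N';  // start of a nested name

    // Each identifier is written as its length followed by its characters.
    for (const std::string& name : scope) {
        out += std::to_string(name.size());
        out += name;
    }
    if (nested) out += 'E';  // end of the nested name

    // One 'i' per int parameter; an empty parameter list is written as 'v'.
    if (int_params == 0) {
        out += 'v';
    } else {
        out.append(int_params, 'i');
    }
    return out;
}
```

Calling mangle_int_function({"outer", "inner", "function"}, 2) reproduces the symbol from the nm output, minus the platform underscore. A function that is not inside any namespace gets no N and E markers, so the same helper returns _Z8functionii for {"function"}.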
You won't find this encoding in the C++ specification; it's implementation-dependent. This version is the encoding defined by the Itanium C++ ABI and used by GCC 3 and later (earlier versions of GCC used a different encoding). Most common *NIX systems use this encoding on x86, although other architectures may have a different scheme specified as part of their ABI. On Windows, different compilers commonly implement their own mangling schemes, which means that you can't easily use C++ libraries compiled with a different compiler.
On *NIX systems, you can use the c++filt tool to translate these encodings back into something a bit more comprehensible:
    $ c++filt __ZN5outer5inner8functionEii
    outer::inner::function(int, int)
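The same translation is available from inside a program. The sketch below assumes a toolchain whose C++ runtime provides abi::__cxa_demangle in <cxxabi.h>, as the GCC and Clang runtimes do. The wrapper name demangle is hypothetical. Note that the function expects the name as the ABI defines it, so the extra leading underscore that OS X adds is left off here.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Hypothetical wrapper around abi::__cxa_demangle. Returns the demangled
// name, or the input unchanged if the runtime reports a failure.
std::string demangle(const char* mangled) {
    assert(mangled != nullptr);

    int status = 0;
    // With a null output buffer, the runtime allocates the result with
    // malloc, so the caller must release it with free.
    char* raw = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || raw == nullptr) {
        return mangled;
    }
    std::string result(raw);
    std::free(raw);
    return result;
}
```

For example, demangle("_ZN5outer5inner8functionEii") yields outer::inner::function(int, int), matching the c++filt output above. This is handy when printing symbol names in logs or diagnostics.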