Tables and Metadata
To get at the metadata, look at the listing again, this time at the MetaData Directory entry. This entry supplies the RVA (Relative Virtual Address) of the beginning of the metadata. An RVA always falls inside one of the sections of the file, so to decode it you need to find which section contains it. For this example, there are three sections: .text, .rsrc, and .reloc. The .text section lies between 2000 and 22E3, the .rsrc section lies between 4000 and 433F, and the .reloc section lies between 6000 and 600B (all values are hexadecimal). The RVA of the metadata is 207C, which falls in the .text section. To find out where the address refers to, subtract the base address of the section (2000 in this case) and take what remains (7C) as the offset into the section. The CIL section contains not only the address but also the size of the metadata. For this sample, the metadata is 0214 bytes long, so the metadata for this sample program lies between 207C and 2290.
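The lookup described above can be sketched in a few lines of Python. The section ranges, the RVA, and the size are the values quoted in the text; the function name `locate_rva` is made up for illustration, and the ranges are treated as inclusive, which is an assumption.

```python
# Section ranges (hexadecimal) as quoted in the text for the sample executable.
SECTIONS = [
    (".text", 0x2000, 0x22E3),
    (".rsrc", 0x4000, 0x433F),
    (".reloc", 0x6000, 0x600B),
]


def locate_rva(rva, sections=SECTIONS):
    """Return (section name, offset into that section) for an RVA."""
    for name, start, end in sections:
        if start <= rva <= end:
            return name, rva - start
    raise ValueError(f"RVA {rva:#x} is not inside any section")


metadata_rva = 0x207C   # from the MetaData Directory entry
metadata_size = 0x0214  # size reported alongside the address

section, offset = locate_rva(metadata_rva)
print(section, hex(offset))               # .text 0x7c
print(hex(metadata_rva + metadata_size))  # 0x2290
```

Turning the section offset into a position in the file on disk would additionally require the section's raw-data pointer from its section header, which is not shown in this excerpt.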
The metadata begins with a four-byte signature, 42 53 4A 42 in hexadecimal. If you convert those four bytes into characters, you get the signature BSJB. (Read as a little-endian 32-bit value, the same bytes are 0x424A5342.) Each letter in the signature stands for one person who worked on building the metadata engine and version one of the CLR.
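As a quick check, the four signature bytes can be decoded with Python's standard library. The variable names are illustrative only. Reading the same bytes as a little-endian 32-bit integer reverses their order, which is why some references quote the signature as 0x424A5342.

```python
import struct

# The four signature bytes in the order they appear in the file.
signature_bytes = bytes([0x42, 0x53, 0x4A, 0x42])

# Interpreted as ASCII characters.
signature_text = signature_bytes.decode("ascii")
print(signature_text)  # BSJB

# Interpreted as one little-endian 32-bit unsigned integer.
(signature_value,) = struct.unpack("<I", signature_bytes)
print(hex(signature_value))  # 0x424a5342
```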
The next piece of interesting data is the version string contained in the metadata. On my machine, the version reads v1.0.3705. Following the version string are the names, sizes, and offsets (addresses) of the main containers for the metadata, known as streams. For this sample, there are five streams: #~, #Strings, #US, #GUID, and #Blob. Using dumpbin, you can see these names in the assembly right after the BSJB signature of the metadata.
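The following is a minimal sketch of reading this header, assuming the metadata root layout described in ECMA-335 (signature, two 16-bit version fields, a reserved field, the padded length of the version string, the string itself, flags, a stream count, and then one header per stream). The sample bytes are synthetic: `build_sample` is a hypothetical helper, and the stream offsets and sizes it writes are made up, not taken from the HelloWorld assembly.

```python
import struct


def align4(n):
    """Round n up to the next multiple of four."""
    return (n + 3) & ~3


def parse_metadata_root(data):
    """Return (version string, [(stream name, offset, size), ...])."""
    signature, _major, _minor, _reserved, length = struct.unpack_from("<IHHII", data, 0)
    if signature != 0x424A5342:  # the bytes "BSJB" as a little-endian integer
        raise ValueError("missing BSJB signature")
    pos = 16
    # The version string is null-terminated and padded out to `length` bytes.
    version = data[pos:pos + length].split(b"\0", 1)[0].decode("utf-8")
    pos += length
    _flags, stream_count = struct.unpack_from("<HH", data, pos)
    pos += 4
    streams = []
    for _ in range(stream_count):
        offset, size = struct.unpack_from("<II", data, pos)
        pos += 8
        end = data.index(b"\0", pos)
        name = data[pos:end].decode("ascii")
        pos += align4(end - pos + 1)  # name plus terminator, padded to 4 bytes
        streams.append((name, offset, size))
    return version, streams


def build_sample():
    """Build a synthetic metadata root with made-up stream offsets and sizes."""
    version = b"v1.0.3705"
    padded = version.ljust(align4(len(version) + 1), b"\0")
    out = struct.pack("<IHHII", 0x424A5342, 1, 1, 0, len(padded)) + padded
    names = ["#~", "#Strings", "#US", "#GUID", "#Blob"]
    out += struct.pack("<HH", 0, len(names))
    for i, name in enumerate(names):
        raw = name.encode("ascii")
        out += struct.pack("<II", 0x100 * (i + 1), 0x10)  # placeholder values
        out += raw.ljust(align4(len(raw) + 1), b"\0")
    return out


version, streams = parse_metadata_root(build_sample())
print(version)
print([name for name, _offset, _size in streams])
```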
The #~ stream is a table of tables. Each table is identified by a single byte from 0x00 to 0x29, which allows for up to 42 tables. Together, these tables describe the methods, the fields, the parameters, the signatures, the assembly, the assembly references, the types, the type references, and so forth.
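For orientation, here is a small, deliberately incomplete lookup of table identifiers. The names and numbers are taken from the table list in ECMA-335 rather than from this text, and the dictionary covers only the kinds of tables mentioned above. Counting both endpoints, the range 0x00 through 0x29 holds 42 identifiers.

```python
# A few table identifiers from the ECMA-335 metadata table list (not exhaustive).
TABLE_IDS = {
    0x00: "Module",
    0x01: "TypeRef",
    0x02: "TypeDef",
    0x04: "Field",
    0x06: "MethodDef",
    0x08: "Param",
    0x20: "Assembly",
    0x23: "AssemblyRef",
}

FIRST_ID, LAST_ID = 0x00, 0x29

# Both endpoints are valid identifiers, so add one to the difference.
table_slots = LAST_ID - FIRST_ID + 1
print(table_slots)  # 42

for table_id, name in sorted(TABLE_IDS.items()):
    print(f"{table_id:#04x} {name}")
```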
The #Strings stream contains a list of strings that identify the program, the methods, the parameter names, and so forth. These are all human-readable strings that are used in referencing addresses or values within the program. In this sample program, the #Strings stream contains strings such as "Main", "HelloWorld.exe", "Console", "WriteLine", and so forth.
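A minimal sketch of how such a stream can be read, assuming the ECMA-335 layout in which #Strings holds null-terminated UTF-8 strings and begins with a single zero byte (the empty string). The heap below is synthetic: it reuses the names quoted above, but the offsets are whatever `build_heap` (a hypothetical helper) happens to produce, not the real offsets in HelloWorld.exe.

```python
def build_heap(names):
    """Build a synthetic #Strings-style heap and record where each name starts."""
    heap = bytearray(b"\0")  # offset 0 is the empty string
    offsets = {}
    for name in names:
        offsets[name] = len(heap)
        heap += name.encode("utf-8") + b"\0"
    return bytes(heap), offsets


def read_string(heap, offset):
    """Read the null-terminated string that starts at the given byte offset."""
    end = heap.index(b"\0", offset)
    return heap[offset:end].decode("utf-8")


strings_heap, offsets = build_heap(["Main", "HelloWorld.exe", "Console", "WriteLine"])

for name, offset in offsets.items():
    print(hex(offset), read_string(strings_heap, offset))
```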
The #US stream, called the User Strings stream, contains all the strings that are defined by the user, that is, the string literals in the program's code. In this simple example, there is only one user-defined string, "Hello world!".
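The sketch below decodes one entry, assuming the ECMA-335 encoding of user strings: a length prefix, the characters in UTF-16 little-endian, and one trailing flag byte. To stay short it handles only the one-byte form of the length prefix, so it works for strings of up to 63 characters. The heap is synthetic and the function name is made up.

```python
def read_short_user_string(heap, offset):
    """Decode a #US entry whose length fits in a single prefix byte."""
    length = heap[offset]
    if length & 0x80:
        raise ValueError("multi-byte length prefix not handled in this sketch")
    if length == 0:
        return ""
    # The length counts the UTF-16 bytes plus one trailing flag byte.
    body = heap[offset + 1:offset + length]
    return body.decode("utf-16-le")


# Synthetic heap: an empty entry at offset 0, then one user string at offset 1.
encoded = "Hello world!".encode("utf-16-le") + b"\0"  # 24 bytes plus flag byte
us_heap = b"\0" + bytes([len(encoded)]) + encoded

print(read_short_user_string(us_heap, 1))  # Hello world!
```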
The #GUID stream, as you would expect, contains GUIDs. It holds a list of all of the 128-bit GUIDs used in the application, including the GUID that uniquely identifies this application. Whenever a program is compiled, it is assigned a new GUID to uniquely identify it. This is reminiscent of COM, which has similar unique identifiers that were registered in the registry of the local machine so that a component could be found.
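Because every GUID is 128 bits, or 16 bytes, an entry can be located by simple arithmetic on a 1-based index. The sketch below assumes the GUIDs are stored in the usual Windows byte layout, which Python's `uuid` module exposes as `bytes_le`. The GUID value is a made-up placeholder, not the one from the sample program.

```python
import uuid

GUID_SIZE = 16  # 128 bits


def read_guid(heap, index):
    """Return the GUID at a 1-based index into a #GUID-style heap."""
    if index < 1:
        raise IndexError("GUID indices start at 1")
    start = (index - 1) * GUID_SIZE
    chunk = heap[start:start + GUID_SIZE]
    if len(chunk) != GUID_SIZE:
        raise IndexError(f"no GUID at index {index}")
    return uuid.UUID(bytes_le=chunk)


# Synthetic heap holding a single placeholder GUID.
sample_guid = uuid.UUID("12345678-1234-5678-1234-567812345678")
guid_heap = sample_guid.bytes_le

print(read_guid(guid_heap, 1))
```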
Finally, the #Blob stream contains sequences of binary data that are not easily represented as a string. For example, one blob could be the public key token associated with this assembly or with an assembly that it references. Blobs are used heavily when defining constants or signatures for methods and parameters.
Both the #GUID stream and the tables within the #~ stream are indexed starting with 1. The #Strings, #US, and #Blob streams are instead indexed by byte offset into the stream, and the first entry in each of these streams is at offset zero. For the #Blob stream associated with the HelloWorld assembly above, valid indices would be 0x1, 0x10, 0x14, 0x18, 0x24, and 0x29. Indices into the #Strings and #US streams would be similar.
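To illustrate why the valid indices are irregular, the sketch below walks a blob heap and collects the offset at which each blob starts. It assumes the ECMA-335 convention that every blob is preceded by a compressed length of one, two, or four bytes. The heap is synthetic: the blob sizes were chosen so that the starting offsets match the indices quoted above, and the contents are zero-filled placeholders rather than the real HelloWorld data.

```python
def read_compressed_length(heap, offset):
    """Return (length, number of prefix bytes) for the blob at offset."""
    b0 = heap[offset]
    if (b0 & 0x80) == 0x00:  # 0xxxxxxx: one-byte length
        return b0, 1
    if (b0 & 0xC0) == 0x80:  # 10xxxxxx: two-byte length
        return ((b0 & 0x3F) << 8) | heap[offset + 1], 2
    if (b0 & 0xE0) == 0xC0:  # 110xxxxx: four-byte length
        return (((b0 & 0x1F) << 24) | (heap[offset + 1] << 16)
                | (heap[offset + 2] << 8) | heap[offset + 3]), 4
    raise ValueError(f"invalid length prefix {b0:#x} at offset {offset:#x}")


def blob_offsets(heap):
    """List the starting offset of every blob in the heap."""
    offsets = []
    pos = 0
    while pos < len(heap):
        offsets.append(pos)
        length, prefix = read_compressed_length(heap, pos)
        pos += prefix + length
    return offsets


# Synthetic heap: the empty blob at offset 0, then placeholder blobs whose
# sizes are chosen to reproduce the offsets mentioned in the text.
blob_heap = b"\0"
for size in (14, 3, 3, 11, 4, 1):
    blob_heap += bytes([size]) + bytes(size)

print([hex(offset) for offset in blob_offsets(blob_heap)])
```

The walk also reports offset 0, which holds the empty blob. A real heap may end with padding bytes, which this simple loop would report as additional empty blobs.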