The #~ Stream
The #~ stream begins with a header in the format shown in Listing 3.
Listing 3 #~ Stream Header
typedef struct _META_COMPOSITE_HEADER
{
    DWORD     Reserved;
    BYTE      MajorVersion;
    BYTE      MinorVersion;
    BYTE      HeapSizes;
    BYTE      Padding;
    ULONGLONG Valid;
    ULONGLONG Sorted;
} META_COMPOSITE_HEADER, *PMETA_COMPOSITE_HEADER;
The HeapSizes field of this header indicates the index width required for the #Strings, #GUID, and #Blob streams. If bit 0 (0x01) is set, the #Strings stream is 65536 (2^16) bytes or larger and all indices into this stream require four bytes. If the bit is not set, all indices into the #Strings stream are two bytes. Similarly, if bit 1 (0x02) is set, #GUID indices are four bytes rather than two. And if bit 2 (0x04) is set, #Blob indices are four bytes.
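As a minimal sketch of the HeapSizes rule, the function below turns the flag byte into an index width for each stream. The names decode_heap_sizes and heap_index_sizes are hypothetical helpers introduced for illustration; they are not part of any Microsoft header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit masks for the HeapSizes byte of the #~ stream header. */
#define HEAP_STRINGS_WIDE 0x01u
#define HEAP_GUID_WIDE    0x02u
#define HEAP_BLOB_WIDE    0x04u

/* Hypothetical result type: index width in bytes for each stream. */
typedef struct {
    unsigned strings_index_size;
    unsigned guid_index_size;
    unsigned blob_index_size;
} heap_index_sizes;

/* Each set bit widens the corresponding index from two bytes to four. */
static void decode_heap_sizes(uint8_t heap_sizes, heap_index_sizes *out)
{
    assert(out != NULL);
    out->strings_index_size = (heap_sizes & HEAP_STRINGS_WIDE) ? 4u : 2u;
    out->guid_index_size    = (heap_sizes & HEAP_GUID_WIDE)    ? 4u : 2u;
    out->blob_index_size    = (heap_sizes & HEAP_BLOB_WIDE)    ? 4u : 2u;
}
```

A small assembly would typically have a HeapSizes value of 0, which yields two-byte indices for all three streams.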
Notice that the #US stream is absent from this list. This is because all access to this stream comes from the code (IL). The ldstr IL instruction references this stream exclusively. The ldstr instruction takes a single four-byte argument with the upper byte set to 0x70, and the remaining three bytes are the index into the stream. So all indices into the #US stream inherently use three bytes.
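The ldstr operand layout can be sketched as follows. The function name us_offset_from_ldstr_operand is a hypothetical helper, and the sketch assumes the four-byte operand has already been read from the IL into a 32-bit integer.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the #US index held in the low three bytes of an ldstr operand. */
static uint32_t us_offset_from_ldstr_operand(uint32_t operand)
{
    /* The upper byte of a user-string operand is always 0x70. */
    assert((operand >> 24) == 0x70u);
    return operand & 0x00FFFFFFu;
}
```

For example, an operand of 0x70000001 refers to index 1 in the #US stream.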
The next field of interest is the Valid field. This field is a 64-bit value (ULONGLONG) in which each set bit indicates that the table with that number is present. For the HelloWorld application above, the Valid field is 0x0000000900001447. This number indicates that tables 0, 1, 2, 6, 10, 12, 32, and 35 are present and valid.
Immediately following this header is an array of four-byte values, one per valid table, giving the number of rows in that table. Again, in the HelloWorld application, the first four-byte value indicates how many rows there are in table 0, the next four-byte value indicates the number of rows in table 1, the next is for table 2, then table 6, table 10, and so forth. Because eight bits are set in the Valid field, eight four-byte values follow the header.
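Walking the Valid mask and the row-count array together might look like the sketch below. The names read_u32_le, read_row_counts, and MAX_TABLES are hypothetical, and the sketch assumes the caller has already located the byte that follows the header. Metadata values are stored little-endian, so the counts are assembled byte by byte rather than cast directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TABLES 64 /* one slot per bit of the 64-bit Valid field */

/* Read a little-endian 32-bit value from a byte buffer. */
static uint32_t read_u32_le(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/*
 * Fills row_counts[t] for every table number t (0 for absent tables) and
 * returns how many tables are present. 'counts' points at the packed
 * array of four-byte values that follows the header.
 */
static unsigned read_row_counts(uint64_t valid, const uint8_t *counts,
                                uint32_t row_counts[MAX_TABLES])
{
    unsigned present = 0;
    unsigned t;

    assert(counts != NULL && row_counts != NULL);
    for (t = 0; t < MAX_TABLES; ++t) {
        if (valid & ((uint64_t)1 << t)) {
            /* The n-th set bit owns the n-th entry of the array. */
            row_counts[t] = read_u32_le(counts + 4u * present);
            ++present;
        } else {
            row_counts[t] = 0;
        }
    }
    return present;
}
```

With the HelloWorld value of 0x0000000900001447, the function consumes eight entries and assigns them to tables 0, 1, 2, 6, 10, 12, 32, and 35 in that order.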
Immediately following the row-count array is the data for the tables themselves.
0x00 Module: A table that must have exactly one row. In this table, there are two columns that are non-zero and not reserved. The name is specified by an index into the #Strings stream. For the HelloWorld application, this name is "HelloWorld.exe". For each assembly, a unique GUID is generated, identifying the assembly. Although the CLR does not use this information, it can be used by debuggers and so forth. There is a field in this table that specifies an index into the #GUID stream so that the GUID associated with this assembly can be accessed. For the HelloWorld application, the row in this table is 10 bytes long.
0x01 TypeRef: This table describes each type referenced in the assembly and where it is referenced from. The first column in this table is a ResolutionScope coded index. Because this is the first time a coded index has been encountered in this article, it deserves some treatment here. A coded index is typically two bytes long, with the low-order bits identifying which table is being referenced. A ResolutionScope coded index can refer to the Module, ModuleRef, AssemblyRef, or TypeRef table. There are four possible tables, so it takes two bits to encode this choice. If the largest row count among those tables, multiplied by 4 (to make room for the two encoding bits), does not fit in 16 bits, the coded index is four bytes long instead of two. The remaining bits (index >> 2) form the row index into the table selected by the low two bits. The remaining two columns in this table are indices into the #Strings stream for the name of the referenced type and its namespace. The HelloWorld application references the System.Object, System.Diagnostics.DebuggableAttribute, and System.Console classes, all from mscorlib.dll. For the HelloWorld application, each row in the TypeRef table is six bytes long.
0x02 TypeDef: This table contains information about types that have been defined within your assembly. The first column holds a four-byte flags value that describes the type. Some of the typical flags are the type's protection level, whether it is static or not, how it is structured, if it is final, and so forth. All of these bits are described by the TypeAttributes enumeration. The next two columns index the #Strings stream for the name of the type and its namespace, respectively. Following the namespace index is a TypeDefOrRef coded index. This coded index is handled in the same way as the ResolutionScope coded index, only now the possible tables that can be referenced are the TypeDef, TypeRef, and TypeSpec tables. This coded index identifies the type, if any, that this type extends. Most user-defined classes extend at least System.Object. For the HelloWorld application, there are two rows in this table; the first row defines a type for "<Module>" that does not extend any class, and the second row defines the Hello class that extends System.Object. Finally, this table has two columns, FieldList and MethodList. The FieldList column is an index into the Field table. The MethodList column is an index into the MethodDef table. Each of these columns represents a list of fields or methods, respectively. The list starts with the index in the current row and ends just before the index stored in the next row, or at the last row of the respective table if there is no next row. For example, there are two rows in the HelloWorld assembly. The first row has a value of 1 for both the FieldList and the MethodList columns. The next row also has a value of 1 for FieldList and MethodList. This means that there are no fields or methods associated with the first row, because going from 1 to 1 represents zero entries. The Field table is not present in this assembly, so there are no fields associated with the second row of this table either. There are two entries in the MethodDef table, so the MethodList indicates that the methods associated with the second type have indices 1 through 2 (a total of two methods).
0x06 MethodDef: This table contains a description of the methods that are defined in this assembly. The first column is a four-byte value that holds the RVA of the code. The RVA is a pointer to the actual IL code that defines the method associated with a given row in this table. The next two-byte value holds the implementation flags (MethodImplAttributes). After that, there is another two-byte value that contains flags that describe the method (MethodAttributes). Next is an index into the #Strings stream containing the name of the method. Following that is an index into the #Blob stream that contains a binary description of the signature (return type, parameter types, and so forth). Finally, there is an index into the Param table that forms the start of a parameter list. The same rules for the beginning and ending of the list, as discussed in the TypeDef bullet, apply here. For the HelloWorld sample, there are only two methods defined: Main and .ctor (constructor).
0x0A MemberRef: This table lists the methods and members that are referenced throughout the assembly. The first column in this table is a MemberRefParent coded index. This coded index indicates where the referenced member is defined, what class it is a part of, and so forth. The next column is an index into the #Strings stream, indicating the name of the member. The last column in this table is an index into the #Blob stream that holds the member's signature. In the HelloWorld application, each row in this table contains six bytes, and there are three rows.
0x0C CustomAttribute: This table describes all the custom attributes that are applied to this assembly.
0x20 Assembly: This table contains a description of the current assembly (name, version, hash algorithm, public key, and so forth).
0x23 AssemblyRef: Similar to the Assembly table, this table contains a description of all the assemblies that this assembly references. As with the Assembly table, a full description is given for the referenced assembly, not just the name.
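Two rules from the table descriptions above lend themselves to a short sketch: how a coded index is sized and split (TypeRef), and how a FieldList or MethodList column defines a run of rows (TypeDef). All function names here are hypothetical helpers introduced for illustration. The sketch assumes the column values have already been read into 32-bit integers and that row indices are 1-based, as in the HelloWorld example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Tables selectable by a ResolutionScope coded index, in tag order. */
enum { RS_MODULE = 0, RS_MODULE_REF = 1, RS_ASSEMBLY_REF = 2, RS_TYPE_REF = 3 };

/*
 * Size in bytes of a coded index that uses 'tag_bits' low-order bits,
 * given the largest row count among the tables it can reference.
 */
static unsigned coded_index_size(uint32_t max_rows, unsigned tag_bits)
{
    assert(tag_bits < 16u);
    return (max_rows < (1u << (16u - tag_bits))) ? 2u : 4u;
}

/* Split a coded index into its table tag and its row index. */
static void decode_coded_index(uint32_t value, unsigned tag_bits,
                               unsigned *tag, uint32_t *row)
{
    assert(tag != NULL && row != NULL && tag_bits < 16u);
    *tag = value & ((1u << tag_bits) - 1u);
    *row = value >> tag_bits;
}

/*
 * Number of target rows owned by parent row 'i' (0-based position in the
 * 'starts' array). 'starts' holds the FieldList or MethodList value of
 * each parent row; 'target_rows' is the row count of the Field or
 * MethodDef table. The last parent row runs to the end of the target table.
 */
static uint32_t list_run_length(const uint32_t *starts, size_t parent_rows,
                                size_t i, uint32_t target_rows)
{
    uint32_t begin, end;

    assert(starts != NULL && i < parent_rows);
    begin = starts[i];
    end = (i + 1 < parent_rows) ? starts[i + 1] : target_rows + 1u;
    return (end > begin) ? end - begin : 0u;
}
```

Applied to HelloWorld, a MethodList column of {1, 1} with two MethodDef rows gives run lengths of 0 for "<Module>" and 2 for Hello, and the same column values with an absent Field table give 0 for both types.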
As indicated earlier, there can be up to 41 tables in an assembly. I have described only eight of them. A more complex assembly will have significantly more tables. After a little bit of work, and implementing the rest of the tables, you should be able to come up with an application that looks like Figure 1.
Figure 1 MetaViewer.
This has been a brief overview of how to dissect an assembly file. Once you have worked through the documentation, it is fairly easy to do. The metadata information is very valuable for debugging, auditing, profiling, and so on. Using the techniques described above, you will be much more comfortable with assemblies, knowing what is and is not contained in them.