MSIL
I mentioned earlier that the two major contents of a .NET assembly are metadata and MSIL code. Now that I have beaten the subject of .NET metadata to death, let's move the discussion along to MSIL. You can think of MSIL as a virtual assembly language. It defines a set of assembly language-like operations that are easily translated into the native instruction set of most modern CPUs.
Common Intermediate Language (CIL)
In the documentation for the .NET Framework SDK, you will see MSIL referred to as CIL. You may also hear the CLR referred to as the CLI. The difference between these names is that the CLI and CIL (as well as the CTS) are part of a specification that Microsoft has submitted to the European Computer Manufacturers Association (ECMA) for ratification as a standardized platform. The CLR and MSIL are Microsoft's proprietary implementations of the CLI and CIL, respectively. Other vendors are now free to implement the CLI or CIL, although it remains to be seen if any will actually do so. You can find out everything you want to know about MSIL in Partition III of the CLI documentation, which you can find in the Tool Developer's Guide documentation.
The CLR uses the VES to compile the MSIL code into native code on the fly as you run a managed code executable. This process of converting MSIL code into native code is called JIT compilation. The VES does not compile the code all at once; it compiles it method by method as the code is used. As soon as the MSIL for a method is compiled into native code, the CLR replaces the MSIL code for that method with the compiled native code so that the CLR can simply run the native code the next time the method is called.
NOTE
It would be very easy to think that MSIL is equivalent to Java bytecodes and that the CLR is equivalent to the Java runtime. Although MSIL is conceptually similar to Java bytecodes, which also define a virtual assembly language, the CLR does not interpret MSIL code as the Java runtime does. MSIL code is always compiled into native code and then executed. Just remember that MSIL is never interpreted, and the CLR is not an interpreter.
MSIL implements a stack-based instruction set. What this means is that you execute an instruction by first loading its arguments into a Last In First Out (LIFO) data structure. This data structure is called a stack, and loading arguments into it is usually referred to as pushing arguments onto the stack. You then execute the instruction, which pops its arguments off the stack and pushes the result of the operation (if there is one) onto the top of the stack. The CIL documentation uses the following stack transition diagram to illustrate this type of operation.
…, value1, value2 → …, result
This diagram indicates that two values (value1 and value2) must be pushed onto the stack prior to executing this operation. The values are pushed left to right in this notation so value2 is at the top of the stack. The instruction will then perform some operation on the stack operands, pop the operands off the stack and leave the result on the top of the stack. An example of an operation that would have a stack transition diagram like this is the multiplication (mul) instruction. Here is the stack transition diagram for a unary instruction (one that only requires a single operand). The neg instruction, which simply negates a number, is an example of such an instruction.
…, value → …, result
Notice that I only push one argument onto the stack; the operation then pops the argument off the stack and leaves the result at the top of the stack. Some instructions require no stack operands at all. The best examples of this are the load instructions that push values onto the stack in the first place. Here is the stack transition diagram for the ldarg instruction, which you will use to push method arguments onto the stack.
… → …, value
This simply indicates that no arguments need to be pushed onto the stack prior to executing the instruction, and the result of the operation is that the specified value will reside on the top of the stack.
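To make the stack transition diagrams a little more concrete, here is a small C# sketch that models the evaluation stack with a generic Stack<int>. The class and method names (StackTransitionSketch, Mul, Neg, Demo) are hypothetical helpers that I made up for illustration; this is not how the CLR itself is implemented, only a picture of what the diagrams describe.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the CLR: models the evaluation stack
// with a Stack<int> to show what the transition diagrams describe.
public static class StackTransitionSketch
{
    // Binary transition: two operands are popped, one result is pushed.
    public static void Mul(Stack<int> stack)
    {
        int value2 = stack.Pop();   // value2 was pushed last, so it is on top
        int value1 = stack.Pop();
        stack.Push(value1 * value2);
    }

    // Unary transition: one operand is popped, one result is pushed.
    public static void Neg(Stack<int> stack)
    {
        int value = stack.Pop();
        stack.Push(-value);
    }

    // Evaluates -(6 * 7) the way a stack machine would.
    public static int Demo()
    {
        Stack<int> stack = new Stack<int>();
        stack.Push(6);      // no operands needed: in the spirit of ldc.i4 6
        stack.Push(7);      // in the spirit of ldc.i4 7
        Mul(stack);         // the stack now holds the single value 42
        Neg(stack);         // the stack now holds the single value -42
        return stack.Pop();
    }
}
```

Calling StackTransitionSketch.Demo() walks through a push, a push, a binary operation, and a unary operation, which are the three kinds of transitions described above.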
There are two major categories of instructions in MSIL: (1) base instructions and (2) object model instructions. The base instructions are the instructions that you use to move data on and off the stack; perform essential arithmetic like add, subtract, multiply, and divide; and perform bitwise operations like AND, OR, NOT, XOR, and left and right shifts. The base instructions also include the branching operations that most programming languages use to implement flow of control, such as branch on false, branch on true, branch if equal, branch if greater than, unconditional branch, and so forth. The base instruction set also includes comparison instructions like compare equal, compare greater than, and compare less than, as well as instructions for copying and initializing memory blocks and for calling and returning from methods. The base instruction set forms a Turing-complete set of operations, meaning that, with these instructions, you can perform any calculation expected of a modern computer. The base instructions closely mirror the instruction set of a modern microprocessor, which makes it simple to implement an MSIL-to-native-code compiler.
Table 3-9 lists some of the instructions in the base instruction set for moving data on and off the stack.
Table 3-9 Instructions for moving data on and off the stack
Instruction | Description
ldarg num | Pushes argument num onto the stack.
ldarg.0, ldarg.1, ldarg.2, ldarg.3 | Short forms for pushing argument 0, 1, 2, or 3 onto the stack.
ldarga num | Pushes the address of argument num onto the stack. The ldarga instruction should only be used for by-ref parameter passing. In most cases, you should use ldarg.
ldloc indx | Pushes the local variable identified by index (indx) onto the stack.
ldloc.0, ldloc.1, ldloc.2, ldloc.3 | Short forms for pushing the local variable with index 0, 1, 2, or 3 onto the stack.
ldloca indx | Pushes the address of the local variable identified by index (indx) onto the stack.
ldc.<type> value | Pushes the numeric constant value of type <type> onto the stack. For example, ldc.i4 5 pushes the value 5 as a 4-byte integer, ldc.i8 3 pushes the value 3 as an 8-byte integer, ldc.r4 3.5 pushes the 4-byte float value 3.5, and ldc.r8 4.8 pushes the 8-byte float (double) value 4.8.
starg num | Pops the item at the top of the stack and stores it into argument num.
stloc indx | Pops the stack into the local variable identified by index (indx).
stloc.0, stloc.1, stloc.2, stloc.3 | Short forms for popping the stack into the local variable with index 0, 1, 2, or 3.
stind.<type> | Stores the value val of type <type> at the address identified by addr. For example, stind.i4 stores a 4-byte integer at the specified address. The stack transition diagram for this instruction is: …, addr, val → …
pop | Removes the top element from the stack.
Table 3-10 lists some of the instructions in the base instruction set for performing arithmetic operations on data.
Table 3-10 Instructions for performing arithmetic operations
Instruction | Description | Stack Transition Diagram
add | Adds value1 and value2. | …, value1, value2 → …, result
sub | Subtracts value2 from value1. | …, value1, value2 → …, result
mul | Multiplies value1 by value2. | …, value1, value2 → …, result
div | Divides value1 by value2 and returns a quotient (for integers) or a floating-point result. | …, value1, value2 → …, result
rem | Calculates the remainder of value1 divided by value2. | …, value1, value2 → …, result
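If you want to see what the div and rem rows mean in practice, the following C# sketch exercises the corresponding operators. I'm assuming here that the C# / and % operators on built-in numeric types map onto div and rem, which is what I would expect from the compiler; the class name ArithmeticSketch is made up for this example.

```csharp
// Hypothetical helper class for illustration only.
public static class ArithmeticSketch
{
    // Integer division yields a quotient that is truncated toward zero.
    public static int DivInt(int value1, int value2)
    {
        return value1 / value2;
    }

    // The remainder takes the sign of the dividend (value1).
    public static int RemInt(int value1, int value2)
    {
        return value1 % value2;
    }

    // Floating-point division yields a floating-point result.
    public static double DivFloat(double value1, double value2)
    {
        return value1 / value2;
    }
}
```

For example, ArithmeticSketch.DivInt(7, 2) yields 3, ArithmeticSketch.RemInt(-7, 2) yields -1, and ArithmeticSketch.DivFloat(7.0, 2.0) yields 3.5.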
Table 3-11 lists some of the branching and flow control instructions in the base instruction set.
Table 3-11 Branching and flow control instructions
Instruction | Description | Stack Transition Diagram
beq target | Branches to instruction (target) if value1 = value2. | …, value1, value2 → …
bne.un target | Branches to instruction (target) if value1 <> value2 (compared as unsigned integers) or if the values are unordered floating-point numbers. | …, value1, value2 → …
bge target | Branches to instruction (target) if value1 >= value2. | …, value1, value2 → …
bgt target | Branches to instruction (target) if value1 > value2. | …, value1, value2 → …
ble target | Branches to instruction (target) if value1 <= value2. | …, value1, value2 → …
brfalse target | Branches to instruction (target) if value is false, null, or zero. | …, value → …
brtrue target | Branches to instruction (target) if value is non-false, non-null, or nonzero. | …, value → …
br target | Unconditional branch. | … → …
call method | Calls the method identified by method. | …, arg1, arg2, …, argN → …, retVal (not always returned)
ret | Returns from a method. The method's stack must be empty except for the return value (if there is one). This return value will be copied from the method's stack to the stack of its caller. | Return value on the method's stack (not always present) → …, return value on the caller's stack (not always present)
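To get a feel for how a compiler can build loops out of nothing but conditional and unconditional branches, here is a C# sketch that sums the numbers from 1 to n using only labels and goto statements. The names (BranchSketch, SumTo, loopTest, done) are made up for this example, and the comments only point at the branch instructions that play a similar role; I'm not claiming this is the exact MSIL that a compiler would emit.

```csharp
// Hypothetical helper class for illustration only.
public static class BranchSketch
{
    // Sums 1..n using only labels and goto, mirroring branch instructions.
    public static int SumTo(int n)
    {
        int sum = 0;
        int i = 1;
    loopTest:
        if (i > n) goto done;   // conditional branch, in the spirit of bgt
        sum = sum + i;
        i = i + 1;
        goto loopTest;          // unconditional branch, in the spirit of br
    done:
        return sum;             // in the spirit of ret
    }
}
```

A while loop in a high-level language is just a friendlier way to write this same pattern of one conditional branch and one unconditional branch.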
Table 3-12 contains some of the bitwise instructions in the base instruction set.
Table 3-12 Bit manipulation instructions
Instruction | Description | Stack Transition Diagram
or | Computes the bitwise OR of value1 and value2. | …, value1, value2 → …, result
and | Computes the bitwise AND of value1 and value2. | …, value1, value2 → …, result
xor | Computes the bitwise XOR of value1 and value2. | …, value1, value2 → …, result
not | Computes the bitwise complement of the value on the top of the stack. | …, value → …, result
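The bitwise instructions behave just like the C# operators of the same name. The following sketch (the class name BitwiseSketch is made up) lets you check the results for a pair of small values; I'm assuming the usual mapping of |, &, ^, and ~ onto or, and, xor, and not.

```csharp
// Hypothetical helper class for illustration only.
public static class BitwiseSketch
{
    public static int Or(int value1, int value2) { return value1 | value2; }
    public static int And(int value1, int value2) { return value1 & value2; }
    public static int Xor(int value1, int value2) { return value1 ^ value2; }

    // Bitwise complement: flips every bit of a 32-bit integer.
    public static int Not(int value) { return ~value; }
}
```

With value1 = 12 (binary 1100) and value2 = 10 (binary 1010), Or yields 14 (1110), And yields 8 (1000), and Xor yields 6 (0110). Not(12) yields -13 because flipping every bit of a two's complement integer n produces -n - 1.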
The object model instructions are built on the base instructions, and they provide a common set of services that simplify the development of high-level, object-oriented programming languages. These services include accessing and updating the fields of an object, making late-bound (virtual) method calls, boxing and unboxing objects, creating arrays and accessing and updating the elements of an array, instantiating and type-casting objects, and throwing exceptions. Table 3-13 contains a partial list of the instructions in the object model instruction set.
Table 3-13 Object model instructions
Instruction | Description | Stack Transition Diagram
newobj ctor | Creates a new object or value type instance and calls its constructor. | …, arg1, …, argN → …, obj
ldfld field | Pushes a field of an object onto the stack. | …, obj → …, value
callvirt method | Calls a late-bound method on an object. | …, obj, arg1, …, argN → …, return value (optional)
stfld field | Updates the value of a field of a specified object with a new value. | …, obj, value → …
box valueTypeToken | Converts a value type object to a reference type object. | …, valueObj → …, refObj
unbox valueTypeToken | Converts the boxed (reference type) representation of a value type back to its value type form. | …, refObj → …, valueObj
castclass class | Casts an object to a specified class. | …, obj → …, obj2
initobj classtoken | Initializes all the fields of the value object to null or a 0 of the specified type. | …, addrOfValueObj → …
cpobj classtoken | Copies a value object of the type indicated by classtoken from srcValueObj to destValueObj. | …, destValueObj, srcValueObj → …
ldobj classtoken | Loads an instance of the value type indicated by classtoken onto the stack. | …, addrOfValueObj → …, valueObj
stobj classtoken | Copies an instance of the type indicated by classtoken from the stack into memory. | …, addr, valueObj → …
newarr etype | Creates a new array of the type indicated by etype. | …, numElems → …, array
ldelem.<type> | Pushes the array element of type <type> onto the stack. ldelem.i4 will push the element as a 4-byte integer, ldelem.i8 will push the element as an 8-byte integer, and so forth. | …, array, index → …, value
stelem.<type> | Stores the value on the stack of type <type> into the array element indicated by index. | …, array, index, value → …
ldlen | Pushes the number of elements of an array onto the stack. | …, array → …, length
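Here is a short C# sketch of the kinds of source statements that these object model instructions serve. The comments name the instruction that plays the corresponding role in the table; the exact instructions that a particular compiler version emits may differ (for example, it may pick a different unboxing form), so treat the mapping as an assumption. The class and method names are made up for this example.

```csharp
using System;

// Hypothetical helper class for illustration only.
public static class ObjectModelSketch
{
    public static int BoxRoundTrip(int value)
    {
        object boxed = value;    // value type to reference type (box)
        return (int)boxed;       // back to the value type (unbox)
    }

    public static int ArrayDemo()
    {
        int[] numbers = new int[3];   // array creation (newarr)
        numbers[0] = 10;              // element store (stelem.i4)
        numbers[2] = 32;
        // element loads (ldelem.i4) and array length (ldlen)
        return numbers[0] + numbers[2] + numbers.Length;
    }

    public static bool CastDemo()
    {
        object o = "hello";
        string s = (string)o;         // checked cast (castclass)
        return s.Length == 5;
    }
}
```

ArrayDemo adds two stored elements and the array length, so it yields 10 + 32 + 3 = 45.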
Now that you know more about MSIL than you probably wanted to know, let's take a look at some MSIL code. I'll start by taking a look at the MSIL code for the GetSalary method in the Manager class. To view the MSIL, perform the following steps:
1. Open a Visual Studio .NET command prompt by making the following selection from the Start menu: Programs | Microsoft Visual Studio .NET | Visual Studio .NET Tools | Visual Studio .NET Command Prompt.
2. Change directories to the location where you built the multifile assembly that you have been using throughout this chapter.
3. Enter the following command at the Visual Studio .NET command prompt:
ildasm Manager.mod
It's important that you run ildasm on the Manager.mod module because this is the file that contains the implementation of the Manager class. In the ildasm main window, find the GetSalary method in the Manager class as shown in Figure 3-6.
Figure 3-6 The Manager class as viewed in ildasm.
Now double-click the GetSalary method, and you should see a window that contains the following code. I did clean up the code slightly to make it a little more readable.
.method public hidebysig virtual instance valuetype [mscorlib]System.Decimal
        GetSalary() cil managed
{
  // Code size 22 (0x16)
  .maxstack 2
  .locals init (valuetype [mscorlib]System.Decimal V_0)
  IL_0000: ldarg.0
  IL_0001: call instance valuetype System.Decimal [.module Employee.mod]Employee::GetSalary()
  IL_0006: ldarg.0
  IL_0007: ldfld valuetype System.Decimal Manager::mBonus
  IL_000c: call valuetype System.Decimal System.Decimal::op_Addition(valuetype System.Decimal, valuetype System.Decimal)
  IL_0011: stloc.0
  IL_0012: br.s IL_0014
  IL_0014: ldloc.0
  IL_0015: ret
} // end of method Manager::GetSalary
The C# source code for this method is as follows:
public override decimal GetSalary()
{
    return base.GetSalary() + mBonus;
}
The first two lines in the MSIL code declare the maximum stack size for the method and initialize a local variable for the method:
.maxstack 2
.locals init (valuetype [mscorlib]System.Decimal V_0)
You don't explicitly declare a local variable in this method, but the compiler has to create one in the generated MSIL to store the return value of this method temporarily before you return it. To understand the next few lines, you have to first know that every nonstatic method of a class has an implied zeroth argument that contains the "this" pointer for the current instance:
IL_0000: ldarg.0
IL_0001: call instance valuetype System.Decimal [.module Employee.mod]Employee::GetSalary()
Therefore, the previous code pushes the "this" pointer on the stack and then calls the GetSalary method of the base class of Manager (Employee). This method returns the Salary (without the Manager's bonus) of the Employee. The return value of the method will be left on the top of the stack. This is the convention that MSIL uses for return values of methods.
On the next three lines, you load the "this" pointer for the current object and then use the ldfld instruction from the Object Model instruction set to load the mBonus field from the Manager instance:
IL_0006: ldarg.0
IL_0007: ldfld valuetype System.Decimal Manager::mBonus
IL_000c: call valuetype System.Decimal System.Decimal::op_Addition(valuetype System.Decimal, valuetype System.Decimal)
The ldfld instruction will pop the "this" pointer off the stack and push the value of the field in its place. Therefore, after the ldfld instruction, the top two items on the stack will be the salary for the current instance (without the Manager's bonus) and the bonus amount (in the mBonus field) for the current Manager instance. The next line of code calls the op_Addition method in the System.Decimal class to add these two values together, yielding the total salary for the Manager. Remember that a Decimal is a value type that is declared in the base class library; the regular add instruction will not work for this type. After the op_Addition method call, the result will be at the top of the stack.
The next four lines store the return value from the op_Addition method in the local variable for this method:
IL_0011: stloc.0
IL_0012: br.s IL_0014
IL_0014: ldloc.0
IL_0015: ret
I'll just have to come clean and admit that I have no idea why the compiler generated the line of code labeled IL_0012. Notice that this line of code is just an unconditional branch to the next line. The next line after this oddity simply loads a local variable on to the stack. This line is pushing the return value on to the stack. You must push the return value of a method on to the stack (if there is one) before you return. The final line is a call of the ret instruction, which will return from the method and put the return value at the top of the stack of the calling method.
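If you want to experiment with the GetSalary method outside of the multifile assembly, the following is a minimal, single-file sketch of the two classes. The GetSalary override is the one shown earlier; everything else (the constructors and the mSalary field in Employee) is an assumption that I made so that the example compiles on its own, and it is not necessarily how the Employee class from this chapter is written.

```csharp
// Minimal stand-in for the Employee base class. The mSalary field and
// the constructor are assumptions made for this sketch.
public class Employee
{
    protected decimal mSalary;

    public Employee(decimal salary)
    {
        mSalary = salary;
    }

    public virtual decimal GetSalary()
    {
        return mSalary;
    }
}

public class Manager : Employee
{
    private decimal mBonus;

    public Manager(decimal salary, decimal bonus) : base(salary)
    {
        mBonus = bonus;
    }

    // Same body as the method whose MSIL is walked through above:
    // call the base class, load mBonus, then add the two decimals.
    public override decimal GetSalary()
    {
        return base.GetSalary() + mBonus;
    }
}
```

For example, new Manager(1000m, 250m).GetSalary() yields 1250. Compiling this sketch and opening the result in ildasm should show you a method body with the same overall shape as the listing above, although the exact instructions can vary with the compiler version and optimization settings.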
Do you see how simple it is to read MSIL after you understand the base and object model instruction sets? Let's try out your newly acquired knowledge on some system code. The base class library in the .NET Framework contains a full-featured set of collection classes. The collection classes include stack, queue, hashtable, arraylist, and sorted list classes. You can learn a lot about how the stack collection class is implemented by examining the implementation of the push method.
To do this, navigate to the latest version of your .NET Framework directory. On my machine, the directory is D:\WINNT\Microsoft.NET\Framework\v1.0.3705. The collection classes are found in an assembly called mscorlib.dll. Enter the following command at a command prompt in this directory:
ildasm mscorlib.dll
Now find the stack class as shown in Figure 3-7 and double-click the push method within this class.
Figure 3-7 The System.Collections.Stack class as viewed in ildasm.
The following code shows the MSIL code for the Push method in the System.Collections.Stack class. Once again, I did clean up the code a little to enhance its readability. Let's break down this code section by section.
.method public hidebysig newslot virtual instance void Push(object obj) cil managed
{
  // Code size 100 (0x64)
  .maxstack 5
  .locals (object[] V_0, int32 V_1)
  IL_0000: ldarg.0
  IL_0001: ldfld int32 Stack::_size
  IL_0006: ldarg.0
  IL_0007: ldfld object[] Stack::_array
  IL_000c: ldlen
  IL_000d: conv.i4
  IL_000e: bne.un.s IL_003c
  IL_0010: ldc.i4.2
  IL_0011: ldarg.0
  IL_0012: ldfld object[] Stack::_array
  IL_0017: ldlen
  IL_0018: conv.i4
  IL_0019: mul
  IL_001a: conv.ovf.u4
  IL_001b: newarr System.Object
  IL_0020: stloc.0
  IL_0021: ldarg.0
  IL_0022: ldfld object[] Stack::_array
  IL_0027: ldc.i4.0
  IL_0028: ldloc.0
  IL_0029: ldc.i4.0
  IL_002a: ldarg.0
  IL_002b: ldfld int32 Stack::_size
  IL_0030: call void System.Array::Copy(class System.Array, int32, class System.Array, int32, int32)
  IL_0035: ldarg.0
  IL_0036: ldloc.0
  IL_0037: stfld object[] Stack::_array
  IL_003c: ldarg.0
  IL_003d: ldfld object[] Stack::_array
  IL_0042: ldarg.0
  IL_0043: dup
  IL_0044: ldfld int32 Stack::_size
  IL_0049: dup
  IL_004a: stloc.1
  IL_004b: ldc.i4.1
  IL_004c: add
  IL_004d: stfld int32 Stack::_size
  IL_0052: ldloc.1
  IL_0053: ldarg.1
  IL_0054: stelem.ref
  IL_0055: ldarg.0
  IL_0056: dup
  IL_0057: ldfld int32 Stack::_version
  IL_005c: ldc.i4.1
  IL_005d: add
  IL_005e: stfld int32 Stack::_version
  IL_0063: ret
} // end of method Stack::Push
The first line of code declares two local variables called V_0 and V_1, which are typed as an object array and a 4-byte integer, respectively.
.locals (object[] V_0, int32 V_1)
I will call these two local variables newArray and currSize. You will see why shortly. The next seven lines of code are simply an "if" statement.
IL_0000: ldarg.0
IL_0001: ldfld int32 Stack::_size
IL_0006: ldarg.0
IL_0007: ldfld object[] Stack::_array
IL_000c: ldlen
IL_000d: conv.i4
IL_000e: bne.un.s IL_003c
If you recall that the ldarg.0 instruction loads the "this" pointer for the current object and that the ldfld instruction loads a field of an object, you can see that lines IL_0000 through IL_0007 are simply loading two private fields of the stack object onto the execution stack: (1) the current number of elements in the stack collection (_size) and (2) an object array (_array) that holds the contents of the stack collection. The next two lines (IL_000c and IL_000d) replace the top item on the execution stack with the declared length of the internal array and convert the length to a 4-byte integer. Line IL_000e branches to line IL_003c if the top two items on the execution stack are not equal to each other, in other words, if the number of elements in the stack collection is not equal to the declared length of the internal array. If I were to convert this MSIL code into C#, it would look as follows.
if (this._size == this._array.Length)
{
    // Lines IL_0010 through IL_0037 go here
}
Therefore, the lines of MSIL code shown previously (IL_0000 thru IL_000e) are testing to see if the internal storage that was allocated for the stack contents is full. Let's look at lines IL_0010 through IL_0037 to see how the stack collection expands its internal storage if there are too many elements to fit within the current storage buffer. In other words, you are looking to see what happens when the "if" statement evaluates to true. These lines are as follows:
IL_0010: ldc.i4.2
IL_0011: ldarg.0
IL_0012: ldfld object[] Stack::_array
IL_0017: ldlen
IL_0018: conv.i4
IL_0019: mul
IL_001a: conv.ovf.u4
IL_001b: newarr System.Object
IL_0020: stloc.0
IL_0021: ldarg.0
IL_0022: ldfld object[] Stack::_array
IL_0027: ldc.i4.0
IL_0028: ldloc.0
IL_0029: ldc.i4.0
IL_002a: ldarg.0
IL_002b: ldfld int32 Stack::_size
IL_0030: call void System.Array::Copy(class System.Array, int32, class System.Array, int32, int32)
IL_0035: ldarg.0
IL_0036: ldloc.0
IL_0037: stfld object[] Stack::_array
Line IL_0010 loads the number 2 onto the stack. Lines IL_0011 through IL_0018 load the current length of the array that contains the stored elements of the stack. Line IL_0019 multiplies this length by 2, and line IL_001a converts the multiplication result to an unsigned integer. Line IL_001b instantiates a new array of System.Object instances whose length is the multiplication result, and line IL_0020 stores this new array in the local variable with index zero. Remember I had called this local variable newArray earlier. Do you see why? This local variable is used to store the new expanded array. So now, if you converted lines IL_0010 through IL_0020 into C#, you would get the following code:
newArray=new System.Object[(uint)(2*this._array.Length)];
In other words, if the internal storage array for a stack collection is full, when you attempt to add a new item to the stack collection, the stack collection will create a new array that is double the size of the existing array. On lines IL_0021 through IL_0030, the MSIL is setting up a call to the Copy method in the Array class to copy the contents of the existing array (_array) into the new larger array (newArray). The copy function takes five parameters in order from left to right: (1) the source array, (2) the starting index in the source array to copy from, (3) the destination array, (4) the starting index in the destination array to copy to, and (5) the number of elements to copy from the source to the destination array. You must push all of these arguments, in order from left to right, onto the execution stack before calling the Copy method. In other words, you push the first argument (the source array) first and the last argument (the number of elements to copy) last. Lines IL_0021 and IL_0022 push the _array field of the stack collection instance onto the execution stack; this is the source array. Line IL_0027 pushes 0 for the starting index in the source array. Line IL_0028 pushes the destination array onto the execution stack; remember this array is stored in the local variable at index 0. Line IL_0029 pushes the starting index in the destination array onto the stack, and lines IL_002a and IL_002b push the current size of the stack collection onto the stack, which is the number of elements that you want to copy from the source to the destination. Line IL_0030 makes the actual call to the copy function.
The next three lines push the "this" pointer and the newArray local variable onto the execution stack and then store newArray into the _array field.
IL_0035: ldarg.0
IL_0036: ldloc.0
IL_0037: stfld object[] Stack::_array
This will remove the only outstanding reference to the old array and thereby free it to be garbage-collected the next time the garbage collector runs.
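The growth step described above (allocate an array twice as long, copy the live elements, and then swap the reference) can be sketched on its own in C#. The Grow helper and the GrowSketch class are names that I made up; the only library call is the five-parameter Array.Copy overload described earlier.

```csharp
using System;

// Hypothetical helper class for illustration only.
public static class GrowSketch
{
    // Returns a new array twice as long as 'source' that holds the
    // first 'count' elements of 'source' in the same positions.
    public static object[] Grow(object[] source, int count)
    {
        object[] newArray = new object[2 * source.Length];
        // Array.Copy(sourceArray, sourceIndex, destinationArray,
        //            destinationIndex, length)
        Array.Copy(source, 0, newArray, 0, count);
        return newArray;
    }
}
```

A caller would write something like items = GrowSketch.Grow(items, size). Once the caller overwrites its reference, nothing in the sketch refers to the old array any longer, which is what makes it eligible for garbage collection.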
So far, the code that you have looked at is the code that will be executed only if the internal storage for the stack needs to be expanded. The next section of code contains the instructions in the Push method of the stack collection class that actually place the new element onto the stack. Let's take a look at this code, which runs from IL_003c to IL_0063:
IL_003c: ldarg.0
IL_003d: ldfld object[] Stack::_array
IL_0042: ldarg.0
IL_0043: dup
IL_0044: ldfld int32 Stack::_size
IL_0049: dup
IL_004a: stloc.1
IL_004b: ldc.i4.1
IL_004c: add
IL_004d: stfld int32 Stack::_size
IL_0052: ldloc.1
IL_0053: ldarg.1
IL_0054: stelem.ref
IL_0055: ldarg.0
IL_0056: dup
IL_0057: ldfld int32 Stack::_version
IL_005c: ldc.i4.1
IL_005d: add
IL_005e: stfld int32 Stack::_version
IL_0063: ret
The lines from IL_003c to IL_004c are difficult to understand unless you understand how the stack works. Just keep in mind that most instructions require one, two, or three arguments to be on the stack. The instruction will use the arguments on the execution stack and then pop them off the stack when it is done and replace the arguments with the result of the operation if there is one. Line IL_003c pushes the "this" pointer for the Stack collection onto the execution stack. Line IL_003d pushes the _array member field of the Stack collection onto the execution stack; it also pops the "this" pointer off the execution stack. Line IL_0042 pushes the "this" pointer onto the execution stack, and the next line, IL_0043, duplicates the top element on the stack. The next two lines load the _size field onto the stack and duplicate it. The ldfld instruction also uses one of the "this" pointers on the stack. Therefore, after line IL_0049, the execution stack will look like Figure 3-8.
Figure 3-8 The execution stack after instruction IL_0049.
Line IL_004a will store the top item on the execution stack into the local variable with index 1. Remember that I called this variable currSize. This instruction will also pop one of the _size values off the execution stack (see Figure 3-8). Line IL_004b pushes the literal value 1 onto the execution stack, and line IL_004c will add the value 1 to the remaining _size value (popping both operands off the execution stack and replacing them with the result of the operation). The top of the execution stack will now contain the value _size + 1. Line IL_004d will store this value into the _size field and pop the remaining "this" pointer off the execution stack. The top of the execution stack will now contain the _array field. Line IL_0052 will push the currSize local variable, which contains the number of elements that were in the stack collection before this call, onto the execution stack. This value will be used as the index position for the new element in the stack collection. Line IL_0053 will load the argument to this method onto the execution stack. In this case, the argument is the object that you are pushing onto the stack collection.
NOTE
Remember argument 0 is the "this" pointer. Argument 1 is the first real argument.
Line IL_0054 will store the object into the internal array at the index that was pushed on line IL_0052. It will also pop the _array field, the currSize value, and the object (argument 1) off the execution stack. Lines IL_0055 and IL_0056 load the "this" pointer onto the stack and duplicate it. Line IL_0057 will push the _version field onto the execution stack (and pop one of the "this" pointers off the stack). Line IL_005c will load the literal value 1 onto the stack, line IL_005d will add 1 to _version, and line IL_005e stores this new value back to the _version field. In other words, lines IL_0055 through IL_005e are equivalent to the following C# code:
this._version = this._version + 1;
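The dup, stloc.1, add, and stfld sequence that runs from IL_0042 through IL_0054 has the shape that I would expect a compiler to produce for a post-increment that is used as an array index. The following sketch compares the compact form with the expanded form that uses a temporary. Keep in mind that I'm only guessing that the original source used something like _array[_size++] = obj; the names PostIncrementSketch, StoreCompact, and StoreExpanded are made up for this example.

```csharp
// Hypothetical helper class for illustration only.
public static class PostIncrementSketch
{
    // Compact form: store at the old index, then bump the counter.
    // Returns the new size.
    public static int StoreCompact(object[] array, int size, object item)
    {
        array[size++] = item;
        return size;
    }

    // Expanded form, mirroring the temporary local (V_1) in the MSIL.
    // Returns the new size.
    public static int StoreExpanded(object[] array, int size, object item)
    {
        int currSize = size;      // remember the old size (dup, stloc.1)
        size = size + 1;          // increment (ldc.i4.1, add, stfld)
        array[currSize] = item;   // store at the old index (ldloc.1, ldarg.1, stelem.ref)
        return size;
    }
}
```

Both methods leave the item at the old index and return the old size plus one, which is why a temporary like currSize appears in the MSIL even if it never appeared in the source.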
It's not clear to me what the _version field is used for. I looked at the code for the Pop method in the stack collection class, and the _version field is also incremented each time you call the Pop method. The _version field appears to track the number of times you perform an operation on the stack collection. It's not clear why this is necessary because I could not find any properties or methods that return or use this information. The final line of MSIL code, IL_0063, is obviously just a return instruction. The Push method in the System.Collections.Stack class has no return value (it's typed as void), so the execution stack is left empty when you return. Therefore, now I can show you the complete, decompiled C# code for the Push method in the System.Collections.Stack class.
public void Push(System.Object arg1)
{
    System.Object[] newArray;
    int currSize;
    if (this._size == this._array.Length)
    {
        newArray = new System.Object[(uint)(2 * this._array.Length)];
        System.Array.Copy(_array, 0, newArray, 0, this._size);
        _array = newArray;
    }
    currSize = this._size;
    this._size = this._size + 1;
    this._array[currSize] = arg1;
    this._version = this._version + 1;
}
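The decompiled method above only compiles inside a class that declares the three fields it uses. Here is a minimal, self-contained sketch of such a class so that you can run the logic yourself. SimpleStack is a hypothetical stand-in, not the real System.Collections.Stack; the constructor, the Count and Capacity properties, and the Peek method are assumptions that I added so that you can observe the growth behavior.

```csharp
using System;

// Hypothetical stand-in for the collection class; only Push and a few
// helpers for observing its behavior are modeled.
public class SimpleStack
{
    private object[] _array;
    private int _size;
    private int _version;

    public SimpleStack(int initialCapacity)
    {
        // Assumption: keep at least one slot so that doubling always
        // produces a larger array.
        _array = new object[Math.Max(1, initialCapacity)];
    }

    public int Count { get { return _size; } }
    public int Capacity { get { return _array.Length; } }

    public void Push(object obj)
    {
        if (_size == _array.Length)
        {
            // Storage is full: double it and copy the live elements.
            object[] newArray = new object[2 * _array.Length];
            Array.Copy(_array, 0, newArray, 0, _size);
            _array = newArray;
        }
        _array[_size++] = obj;
        _version++;
    }

    public object Peek()
    {
        if (_size == 0) throw new InvalidOperationException("Stack is empty.");
        return _array[_size - 1];
    }
}
```

If you create a SimpleStack with an initial capacity of 2 and push three items, the third push finds the storage full, so Capacity goes from 2 to 4 while Count becomes 3.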
I think you can see how easy that was. The key difference between a .NET assembly and a regular Win32 DLL is that the Win32 DLL contains machine code, an almost unintelligible encoding of 0s and 1s. Sure, you can disassemble this code into x86 assembly language code, but native, x86 assembly language still does not contain the kind of high-level instructions (particularly the object model instructions) that make MSIL so easy to decipher. Moreover, it is much easier to read assembly language code if it uses only stack-based instructions like MSIL does. When an instruction set has lots of CPU registers, it is harder to figure out what's going on because different compilers will use these registers in different ways.
The fact that MSIL code is so easy to read is both good news and bad news. The good news is that you can always understand exactly how a class or method is implemented, even if you do not have the source code for the class. I have used my ability to decipher MSIL code a number of times while writing this book to gain insight into how certain aspects of the .NET Framework's class library are implemented.
NOTE
Now that you know how the push method in the stack collection class is implemented, you can be smarter about how you use it. Because you know that the push method will double the size of the internal storage every time it has to grow the stack, you know that it is important to try to initialize the stack to the largest size that you think you might need. One of the constructors for the stack collection class allows you to specify the initial size for the stack's internal storage.
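As a small sketch of that advice, the following code uses the Stack constructor that takes an initial capacity so that the internal storage does not have to grow while the stack is being filled. The PresizedStackSketch class and its Build method are names that I made up for this example.

```csharp
using System.Collections;

// Hypothetical helper class for illustration only.
public static class PresizedStackSketch
{
    // Builds a stack of the integers 0..expectedCount-1 without forcing
    // the internal array to be reallocated along the way.
    public static Stack Build(int expectedCount)
    {
        // Stack(int initialCapacity) reserves the storage up front.
        Stack stack = new Stack(expectedCount);
        for (int i = 0; i < expectedCount; i++)
        {
            stack.Push(i);   // each int is boxed as it is pushed
        }
        return stack;
    }
}
```

If you guess too low, the stack still works; it simply falls back on the doubling behavior that you saw in the MSIL.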
The bad news, of course, is that, if you write your code and ship it as a .NET assembly, other people will be able to decipher your code. Your competitors can easily gain access to your intellectual property, and a savvy programmer can easily steal your proprietary algorithms. At first, when I realized this, I was horrified, and I questioned whether this fact alone would cause people to avoid using the .NET Framework. After thinking about it more, I realized that it's probably not a big deal as long as you know that this issue exists. For a start, obfuscation technology already exists that makes it nearly impossible for someone to decipher the MSIL code in your .NET assemblies.
NOTE
Some of the .NET obfuscation products that are available include Salamander, which is made by a company called Remote Soft (see http://www.remotesoft.com for more information), DotFuscator, from preEmptive Solutions (see http://www.preemptive.com), and Demeanor from Wise Owl (see http://www.wiseowl.com). Unfortunately, these obfuscators cost anywhere from a few hundred to more than a thousand dollars. Desaware makes an open source obfuscator called QND-Obfuscator, which is available for "free" if you purchase an e-book for $39.95. See http://www.desaware.com for more information.
If you're worried that determined intellectual property thieves may someday find a way to subvert these obfuscators (a valid concern), you can still hide your most sensitive intellectual property by implementing key algorithms as an unmanaged COM server using Visual C++ or VB6 and then use COM Interop (which I discuss extensively in this book) to call the methods in this COM server.