Types
The .NET Framework is built around types. A type in .NET is a class, structure, interface, enumeration, or delegate. Every piece of code that you write in .NET, even the main program for your application, must be a member of some type.
The following code shows the simplest "hello world" application that you can write.
class TestClass
{
    static void Main(string[] args)
    {
        System.Console.WriteLine("Hello World");
    }
}
Notice that the Main method in this case is a static member of a class called TestClass. Not only is all of the code that you write contained within some type, but all of the code that you use, both the classes in the .NET Framework base class library and any third-party class libraries, is also implemented as types. There are no global functions in .NET.
What Is a Type?
In the software world, a type is just a set of operations and, optionally, some state that all instances of that type have in common. Most programming languages have intrinsic types like the int, float, char, and long types in C/C++ (and C#) and the Integer, Single, and String types in VB. Each of these types has some state representation associated with it. For instance, an int in C is either a 16- or 32-bit block of memory depending on the platform. A C float is a 32-bit block of memory on most platforms. Each of these types has a set of operators that can be applied to it, and "+", "-", and "/" are just a few of these operators. The main difference between these types is how they represent their state. The bits in a 32-bit int represent a base 2 number, and the bits in a 32-bit float represent a real number that is encoded using one of several formats, such as Institute of Electrical and Electronics Engineers (IEEE) 754.
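To see the difference in state representation concretely, the following sketch prints the raw 32 bits of the int 1 and of the float 1.0f. The names BitPatterns and FloatBits are hypothetical and chosen only for illustration; the sketch assumes a runtime that uses IEEE 754 for float, as the CLR does.

```csharp
using System;

public static class BitPatterns
{
    // Reinterprets the four bytes of a float as a 32-bit integer.
    // No numeric conversion takes place; only the view of the bits changes.
    public static int FloatBits(float value)
    {
        return BitConverter.ToInt32(BitConverter.GetBytes(value), 0);
    }

    public static void Main()
    {
        Console.WriteLine("int   1    -> 0x{0:X8}", 1);               // 0x00000001
        Console.WriteLine("float 1.0f -> 0x{0:X8}", FloatBits(1.0f)); // 0x3F800000
    }
}
```

Both values occupy 32 bits, but the same logical quantity (one) is stored as two very different bit patterns.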
Most programming languages also allow developers to create user-defined types that are conglomerates of the intrinsic types and other user-defined types, such as an Employee structure in C that contains an int ID member, a float salary member, and a string name member that is a null-terminated char array. Object-oriented programming languages introduced the notion of a class type that stores its state as a conglomerate of other types. Classes also support a set of methods that operate on the state; this is sometimes called the behavior of the class.
So what's the advantage of this approach? That's simple. It means that your code is easier to verify. If all of your code is a member of a type, then as long as you can guarantee type safety (in other words, as long as you can guarantee that it is impossible to coerce an object of one type into behaving like another type that it is not assignment compatible with, that is, a type that is not in its inheritance tree), you can go a long way toward guaranteeing that the code is safe. Indeed, one of the most common types of security attacks these days, buffer overflows, can be at least partially avoided through the use of type-safe code. To understand buffer overflows, imagine that you have a 50-element array. If you can create a piece of code that treats that 50-element array as though it were a 250-element array, you can gain access to memory that would not normally be accessed by application code. This memory may contain system-level information, or it might contain a handle that someone can use to write to the file system. If you can enforce type safety, that is, guarantee that a 50-element array can only be accessed as a 50-element array, you can eliminate an entire class of security problems. Type-safe code also has the additional benefit of being less error-prone and easier to debug.
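The 50-element array scenario can be sketched in C# as follows. This is a minimal illustration, not a security test; the names BoundsDemo and TryRead are hypothetical. In verifiable managed code, the runtime checks every array index and raises an IndexOutOfRangeException instead of reading adjacent memory.

```csharp
using System;

public static class BoundsDemo
{
    // Tries to read element `index` of a 50-element array.
    // Returns true if the read succeeded, false if the runtime rejected it.
    public static bool TryRead(int index)
    {
        int[] buffer = new int[50];
        try
        {
            int value = buffer[index];
            return true;
        }
        catch (IndexOutOfRangeException)
        {
            // The access was outside the array's 50 elements.
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(TryRead(49));   // True: last valid element
        Console.WriteLine(TryRead(249));  // False: treated as a 250-element array
    }
}
```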
CTS
In addition to making it easy to write code that is more secure, less error prone, and easier to debug, Microsoft also wants to enable an unprecedented level of cross-language interoperability in the .NET Framework. One of the key enabling technologies in the .NET Framework that makes this possible is the Common Type System (CTS). The CTS provides a common set of types for all CLR-compliant languages. One of the most difficult problems that you tend to encounter when trying to mix code from different programming languages is mapping the types in one language to the types in another language. For instance, strings in Visual Basic (BSTRs) are length-prefixed Unicode character arrays. Strings in C/C++ are null-terminated, single-byte character arrays. One solution to this problem is to create a set of functions that map one type to another. For instance, the MFC CString class contains a method called AllocSysString that converts an MFC string to a BSTR. Another approach is to create a type system that is independent of any programming language and then map language-specific types to this type system. Microsoft's first attempt at a common type system was the so-called Automation types: Variant, BSTR (strings), Currency, Date, Short, and Long. Unfortunately, the Automation types weren't as language independent as they first seemed. All of the Automation types are simply the VB representation of that type. Some Automation types, particularly the integral types, like Short, Long, and so forth, mapped neatly to types in other programming languages. Other types, like BSTR, Variant, and Currency, didn't map so well and were difficult to use in other languages, such as C++, requiring a special set of functions and extra work to manipulate them. In the end, you had a choice. Your interfaces could be easy to use from C++, in which case you eschewed the Automation types. Or you could use the Automation types, and your interfaces would be easy to use from high-level languages like VB but difficult to use for C++ programmers.
The CTS simply takes the idea of Automation types one step further. With the CTS, Microsoft has created a type system that all CLR-compliant programming languages share. Table 3-1 shows a list of the types supported by the CTS. The definitions of all of these types can be found in the System namespace in the Framework libraries, which I describe shortly. Don't worry about the column labeled CLS Type for now. I will explain what the CLS is in the next section.
Table 3-1 CTS types
Type Name | CLS Type | Description
System.Boolean | Yes | True/False Value
System.Char | Yes | 16-bit, Unicode character
System.Object | Yes | Generic object or boxed value type (I explain what a boxed type is later in this chapter). You can think of this type as being the .NET equivalent of a variant.
System.String | Yes | Unicode string
System.Single | Yes | IEEE 32-bit floating point
System.Double | Yes | IEEE 64-bit floating point
System.SByte | No | Signed 8-bit integer
System.Byte | Yes | Unsigned 8-bit integer
System.IntPtr | Yes | Signed integer of native size
System.UIntPtr | No | Unsigned integer of native size
System.Int16 | Yes | Signed 16-bit integer
System.UInt16 | No | Unsigned 16-bit integer
System.Int32 | Yes | Signed 32-bit integer
System.UInt32 | No | Unsigned 32-bit integer
System.Int64 | Yes | Signed 64-bit integer
System.UInt64 | No | Unsigned 64-bit integer
System.TypedReference | No | Pointer plus runtime type
You don't have to worry about converting from your language's native types to the CTS types. In order for a programming language to work with the CLR, the language must use the CTS types as its native types. Internally, most programming languages will have aliases for these types that map to the expected, primitive types of the language. Table 3-2 shows a mapping of C# types (which were designed to closely resemble the C/C++ types) to CTS types.
Table 3-2 A mapping of native C# types to CTS types
C# Type | Equivalent CTS Type
bool | System.Boolean
byte | System.Byte
char | System.Char
System.DateTime | System.DateTime
decimal | System.Decimal
double | System.Double
int | System.Int32
long | System.Int64
object | System.Object
short | System.Int16
float | System.Single
string | System.String
Table 3-3 shows the mapping of VB types to CTS types.
Table 3-3 A mapping of native VB types to CTS types
VB Type | Equivalent CTS Type
Boolean | System.Boolean
Byte | System.Byte
Char | System.Char
Date | System.DateTime
Decimal | System.Decimal
Double | System.Double
Integer | System.Int32
Long | System.Int64
Object | System.Object
Short | System.Int16
Single | System.Single
String | System.String
User-Defined Type | System.ValueType
C/C++ and C# programmers are used to dealing with a type called int that is usually a 32-bit integer. VB programmers have an equivalent type called Integer. Each type maps to a CTS type called System.Int32, and you can declare a variable of type System.Int32 in each programming language if you really want to. However, each programming language also contains an alias for the CTS type that has the name that developers who use that language are used to seeing. For C# programmers, the System.Int32 can also be called an int, and, for VB programmers, the System.Int32 can also be called an Integer.
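A short sketch can confirm that the C# keyword and the CTS name refer to one and the same type. The class name AliasDemo and its helper method are hypothetical and used only for illustration.

```csharp
using System;

public static class AliasDemo
{
    // True when the C# alias and the CTS name denote the same runtime type.
    public static bool IntIsInt32()
    {
        return typeof(int) == typeof(System.Int32);
    }

    public static void Main()
    {
        int a = 42;
        System.Int32 b = a;   // same type, so no conversion is involved

        Console.WriteLine(IntIsInt32());                            // True
        Console.WriteLine(b.GetType().FullName);                    // System.Int32
        Console.WriteLine(typeof(string) == typeof(System.String)); // True
    }
}
```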
The beauty of the CTS is that, if you are writing a VB client that calls a component written in C#, there is no need to do type conversions. The VB representation of a string, for instance, is exactly the same as the C# representation of a string. They both use the System.String class from the CTS to represent a string.
Microsoft bashers (and there are lots of them) will probably say that the CTS (and the CLR) are just another example of Microsoft forcing other vendors to do things their way. This time, Microsoft is using its control of the operating system platform to coerce compiler vendors into building their compilers with a type system that they (Microsoft) have specified. Actually, the opposite is true. Most compiler vendors have eagerly embraced the CLR and hence the CTS. By targeting the CLR and using the CTS, a programming language instantly becomes a first-class player in the .NET world. Classes written in any programming language can inherit from classes in the .NET Framework, and they can serve as base classes to classes written using other CLR-compliant languages. Moreover, any CLR-compliant programming language can use the exception handling, deployment, versioning, debugging, and profiling features built into the CLR; it also instantly gains support for garbage collection. The CLR and the CTS are actually a tremendous win for compiler vendors. The CLR does a lot of work that they had to do themselves in the past. That's why, far from viewing this as Microsoft once again asserting its hegemony, dozens of third-party compiler vendors have rushed to support the CLR.
CLS
Even though the CLR attempts to make all languages equal, the reality is that they are not. For instance, some languages have the notion of an unsigned integer, and some do not. The CTS (and hence the CLR) does support unsigned integers. Microsoft did not want to require all CLR-compliant programming languages to support all the features and types in the CLR, because this would require compiler vendors to change the expected behavior of some programming languages. Instead, Microsoft has defined a subset of the CTS and features supported by the CLR that all languages must support as a minimum. This subset is known as the Common Language Specification (CLS). For compiler vendors, supporting the CLS means that your language can use any CLS-compliant class library or framework. Moreover, the types created by your programming language can be used and extended by any other programming language that is compliant with the CLS. For class library and framework developers, making sure that your library or framework is CLS compliant means that your software will be usable by the greatest number of .NET programming languages.
The distinction between the CTS and CLS is a little confusing. Think of it this way: the CTS defines the full set of types supported by the CLR and available internally to any .NET programming language; the CLS defines the subset of the CTS that you must restrict yourself to, plus a set of rules that compiler and framework developers must adhere to, in order to ensure that their software is usable by all CLR-compliant programming languages. The second column of Table 3-1 shows which of the CTS types are also CLS compliant. Some examples of the rules in the CLS are as follows:
A type is CLS compliant if its public interfaces, methods, fields, properties, and events contain only CLS-compliant types or are marked explicitly as not CLS compliant.
A CLS Consumer can completely use any CLS-compliant type.
A CLS Extender is a CLS consumer tool, and it can also extend (inherit from) any CLS-compliant base class, implement any CLS-compliant interface, and use any CLS-compliant custom attribute on any type, method, field, parameter, property, or event.
You can find a complete description of the CLS in Partition 1 of the Tool Developers Guide.
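In C#, the first rule above is typically expressed with the System.CLSCompliantAttribute. The following is a minimal sketch; the Counter class and its members are hypothetical. It marks the assembly as CLS compliant and then explicitly marks one public member that exposes System.UInt32 (a non-CLS type) as not compliant. Without that marking, I would expect the C# compiler to warn about the uint parameter.

```csharp
using System;

// Ask the compiler to check the public surface of this assembly against the CLS.
[assembly: CLSCompliant(true)]

public class Counter
{
    private int mCount;

    // CLS compliant: exposes only System.Int32.
    public int Count { get { return mCount; } }

    public void Add(int amount) { mCount += amount; }

    // System.UInt32 is not a CLS type, so this public member is
    // marked explicitly as not CLS compliant.
    [CLSCompliant(false)]
    public void Add(uint amount) { mCount += (int)amount; }
}

public static class ClsDemo
{
    public static void Main()
    {
        Counter c = new Counter();
        c.Add(2);    // binds to Add(int)
        c.Add(3u);   // binds to Add(uint)
        Console.WriteLine(c.Count); // 5
    }
}
```

A language without unsigned integers can still use Counter through the int overload, which is the point of keeping the compliant surface complete on its own.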
VOS
The Virtual Object System (VOS) is the object model used by all CLR-compliant programming languages. The VOS specifies how classes are defined; it enables the unprecedented level of cross-language interoperability that the CLR provides. For instance, with the .NET Framework, you can create a class in one programming language and then derive a subclass from it in a different programming language. You could not do this unless both programming languages shared a common representation of what a class is. The VOS provides that common representation.
The VOS is essentially just a set of rules that describe how classes are represented in the CLR. Some of the rules in the VOS are as follows:
A class may contain zero or more members.
The members of a class can be one of the following: Field, Method, Property, or Event.
Members of a type may have one of the following visibilities:
Public: The member can be called or accessed by code in any assembly.
Private: The member can be called or accessed only by methods in the same type.
Family: The member can be called or accessed only by methods in the same type or derived types in any assembly.
Assembly: The member can be called or accessed only by methods in the same assembly.
Family and Assembly: The member can be called or accessed by methods in the same type or derived types only if the type is in the same assembly.
Family or Assembly: The member can be called or accessed by methods in the same type or derived types in any assembly. It can also be accessed by any type in the same assembly.
A class supports only single inheritance.
Classes must inherit (directly or indirectly) from a class called System.Object.
Just as you don't have to be an MSIL expert to use the .NET Framework, you don't have to be a VOS expert to create and use classes and objects in the .NET Framework. Your programming language will be integrated with the VOS, and you will use the native object constructs of your programming language. These constructs will map to the VOS. For instance, even though C++ supports multiple inheritance, if you use C++ with Managed Extensions (which is just a variant of C++ that uses the CLR), you are limited to single inheritance.
The key point here is that, whether you create your classes using C#, VB, or C++ with Managed Extensions, once the code is compiled into MSIL these classes are equivalent, thanks to the VOS.
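As an illustration of how a language's native constructs map to the VOS, the following sketch shows the C# access modifiers that correspond to the six visibilities listed earlier. The Account and SavingsAccount classes are hypothetical. Note that the private protected modifier (Family and Assembly) assumes C# 7.2 or later; earlier versions of C# had no keyword for that visibility.

```csharp
using System;

public class Account
{
    public int Id = 1;                            // Public
    private decimal mBalance = 100m;              // Private
    protected string Owner = "owner";             // Family
    internal string Branch = "branch";            // Assembly
    protected internal string Notes = "notes";    // Family or Assembly
    private protected string AuditTag = "audit";  // Family and Assembly (C# 7.2+)

    public decimal Balance { get { return mBalance; } }
}

public class SavingsAccount : Account
{
    // A derived type in the same assembly can reach every member
    // except the private field mBalance.
    public string Describe()
    {
        return Id + " " + Owner + " " + Branch + " " + Notes + " " + AuditTag;
    }
}

public static class VisibilityDemo
{
    public static void Main()
    {
        Console.WriteLine(new SavingsAccount().Describe()); // 1 owner branch notes audit
    }
}
```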
Reference Types
The .NET Framework supports the notion of reference types and value types. Reference types are always allocated on the managed heap, and they can only be accessed through a reference (a reference is the managed code equivalent of a pointer). Arrays, pointer types, interfaces, and delegates are all reference types. (If you are not sure what delegates are, you'll learn in Chapter 4). The garbage collector tracks instances of reference types, and they are automatically freed when your program is no longer using them. Because the garbage collector does not run immediately, an instance of a reference type will likely remain in memory after the method in which it was created has returned. In C#, reference types are created using the class keyword, so the following code defines an Employee reference type:
public class Employee
{
    public Employee(int id, string name, decimal salary)
    {
        this.mName = name;
        this.mID = id;
        this.mSalary = salary;
    }
    public int ID
    {
        get { return mID; }
        set { mID = value; }
    }
    public string Name
    {
        get { return mName; }
        set { mName = value; }
    }
    public virtual decimal GetSalary()
    {
        return mSalary;
    }
    private int mID;
    private string mName;
    private decimal mSalary;
}
All reference types must derive from System.Object either directly or indirectly. Even though I have not specified a base class for the Employee class, it still derives from System.Object.
When you assign a reference to some other reference, you are performing a shallow copy. In other words, you are making a copy of the reference, not a copy of the object that the reference points to. The following code should make this clear:
static void Main(string[] args)
{
    Employee emp1, emp2;
    emp1 = new Employee(1, "Alan Gordon", 500);
    emp2 = emp1; // copy the reference in emp1 into emp2 (shallow copy)
    emp1.Name = "Tamber Gordon";
    System.Console.WriteLine("Emp1 name = {0}", emp1.Name);
    System.Console.WriteLine("Emp2 name = {0}", emp2.Name);
}
This program will write the following output to the console:
Emp1 name = Tamber Gordon
Emp2 name = Tamber Gordon
In this program, I have defined two Employee references: emp1 and emp2.
Employee emp1, emp2;
I create a single Employee object using the C# new operator. Because the Employee class is a reference type, the new operator creates the object on the managed heap:
emp1=new Employee(1,"Alan Gordon",500);
The next line is the key line in this example:
emp2=emp1;
In this line, I assign the reference emp1 to emp2. Emp2 now points to the same object that emp1 points to. Emp2 does not contain a copy of that object. Therefore, if I change the Name property of the object pointed to by emp1, emp2 will reflect the change also. There is only one object; any change made through one reference will immediately be reflected in the other reference. Figure 3-1 should make this clear.
Figure 3-1 Two instances of a reference type pointing to the same object.
Value Types
Value types in .NET are typically allocated on the stack: a local variable of a value type lives on the stack, while a value type that is a field of a reference type is stored inline within that object. The built-in types, such as int, float, and so forth, and enumerations are value types, and you can create user-defined value types. Stack-allocated value types are not garbage collected. Like all stack variables, a local instance of a value type lives, at most, as long as the method in which it was created lives. Value types are always accessed directly; you cannot create a reference to a value type.
When you assign to an instance of a value type, you create a deep copy of the variable that is being assigned. An example will make this clear. I change the definition of the Employee type to look as follows:
public struct Employee
{
    public Employee(int id, string name, decimal salary)
    {
        this.mName = name;
        this.mID = id;
        this.mSalary = salary;
    }
    public int ID
    {
        get { return mID; }
        set { mID = value; }
    }
    public string Name
    {
        get { return mName; }
        set { mName = value; }
    }
    public decimal GetSalary()
    {
        return mSalary;
    }
    private int mID;
    private string mName;
    private decimal mSalary;
}
I have now changed Employee to be a value type. Notice that I use the keyword struct instead of class to define this type. You use the struct keyword to declare user-defined value types in C#. Notice also that I removed the virtual keyword from the GetSalary method. Value types are sealed, which means you cannot inherit from them. It therefore makes no sense to have virtual functions, so the compiler rejects a virtual function in a value type. If you now run the following code, which is exactly the same code that I ran for the reference version of Employee, you'll notice that, after I assign emp1 to emp2 and then change one of the fields of emp1, emp2 still reflects the original value:
static void Main(string[] args)
{
    Employee emp1, emp2;
    emp1 = new Employee(1, "Alan Gordon", 500);
    emp2 = emp1;
    emp1.Name = "Tamber Gordon";
    System.Console.WriteLine("Emp1 name = {0}", emp1.Name);
    System.Console.WriteLine("Emp2 name = {0}", emp2.Name);
}
The code shown previously will write the following output to the console:
Emp1 name = Tamber Gordon
Emp2 name = Alan Gordon
This indicates that emp1 and emp2 are different objects. Therefore, the runtime made a deep copy of emp1 when you assigned it to emp2. Figure 3-2 should make this clear.
All value types derive either directly or indirectly from a class called ValueType. ValueType derives from System.Object and overrides the methods in System.Object to provide an appropriate implementation for a value type.
Figure 3-2 Two instances of a value type.
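The inheritance relationships described for reference types and value types can be checked with reflection. The following is a minimal sketch; EmployeeClass and EmployeeStruct are hypothetical, empty stand-ins for the two versions of Employee.

```csharp
using System;

public class EmployeeClass { }    // stand-in for the reference-type Employee
public struct EmployeeStruct { }  // stand-in for the value-type Employee

public static class BaseTypeDemo
{
    public static void Main()
    {
        // A class with no declared base still derives from System.Object.
        Console.WriteLine(typeof(EmployeeClass).BaseType);     // System.Object

        // A struct derives from System.ValueType, which derives from System.Object.
        Console.WriteLine(typeof(EmployeeStruct).BaseType);    // System.ValueType
        Console.WriteLine(typeof(ValueType).BaseType);         // System.Object

        Console.WriteLine(typeof(int).IsValueType);            // True
        Console.WriteLine(typeof(EmployeeClass).IsValueType);  // False
    }
}
```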
Boxing and Unboxing
There will be times that you will want a value type object to behave like a reference type object and vice versa. For example, all of the collection classes in the System.Collections namespace store instances of System.Object. But there will be times when you will need to store value types in a collection. Consider the following code that pushes two Employee value type objects onto a Stack and then pops them off:
using System;
using System.Collections;
namespace TestApp
{
    // Main must be a member of a type, so it is wrapped in a class here.
    class TestClass
    {
        static void Main(string[] args)
        {
            Employee emp1, emp2;
            emp1 = new Employee(1, "Alan Gordon", 500);
            emp2 = new Employee(2, "Tamber Gordon", 600);
            Stack myStack;
            myStack = new Stack(10);
            myStack.Push(emp1); // boxes emp1
            myStack.Push(emp2); // boxes emp2
            emp1 = (Employee)myStack.Pop(); // unboxes
            emp2 = (Employee)myStack.Pop(); // unboxes
        }
    }
}
The Push and Pop methods in the System.Collections.Stack class are declared as follows:
public virtual void Push(System.Object obj);
public virtual System.Object Pop();
Therefore, when you push the two Employee objects onto the Stack, the CLR has to perform some "magic" to convert the two value type objects to System.Object reference type objects. Similarly, when you pop the two Employees off the Stack, the CLR has to convert the reference type objects on the Stack back to value type objects. The process of converting value type instances to reference type instances (the Push case) is called boxing. When the CLR needs to box a value type instance, it allocates a reference object on the heap and then copies the value type's data into it. The wrapper is marked as a boxed object so that the system knows that it contains a boxed representation of a value type. The reverse process is called unboxing. When the CLR unboxes a reference type object, it first checks that the object is in fact a boxed value type, and then it copies the data in the reference type object into the value type object you are assigning to. Although we are jumping ahead of ourselves a little bit, take a look at a portion of the MSIL code that the C# compiler will generate for the preceding example code. Notice that there is a "box" call prior to each "Push" on to the Stack and an "unbox" call after each "Pop" off the Stack.
IL_0038: box TestApp.Employee
IL_003d: callvirt instance void Stack::Push(object)
IL_0042: ldloc.2
IL_0043: ldloc.1
IL_0044: box TestApp.Employee
IL_0049: callvirt instance void Stack::Push(object)
IL_004e: ldloc.2
IL_004f: callvirt instance object Stack::Pop()
IL_0054: unbox TestApp.Employee
IL_0059: ldobj TestApp.Employee
IL_005e: stloc.0
IL_005f: ldloc.2
IL_0060: callvirt instance object Stack::Pop()
IL_0065: unbox TestApp.Employee
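Because boxing copies the value type's data into a new heap object, the boxed copy is independent of the original variable. The following minimal sketch uses an int rather than Employee to keep it self-contained; the names BoxingDemo and BoxThenModify are hypothetical.

```csharp
using System;

public static class BoxingDemo
{
    // Boxes a value, changes the original variable, then unboxes.
    // The result shows that the box holds its own copy of the data.
    public static int BoxThenModify()
    {
        int original = 1;
        object boxed = original;  // box: the value 1 is copied to the heap
        original = 2;             // changes only the stack variable
        return (int)boxed;        // unbox: the copy in the box is still 1
    }

    public static void Main()
    {
        Console.WriteLine(BoxThenModify()); // 1
    }
}
```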
You must unbox a reference object back to its pre-boxed type. Therefore, the following code, where Manager is also a value type, will cause a runtime error because I boxed an Employee object and then I am trying to unbox it as a Manager:
static void Main(string[] args)
{
    Employee emp;
    Manager mgr;
    emp = new Employee(1, "Alan Gordon", 500);
    Stack myStack;
    myStack = new Stack(10);
    myStack.Push(emp);
    mgr = (Manager)myStack.Pop(); // Runtime error here!
}
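The same rule applies to the built-in value types, which makes for a compact, self-contained illustration. In the sketch below (the names UnboxDemo, UnboxAsLongFails, and UnboxThenConvert are hypothetical), a boxed int cannot be unboxed directly as a long, even though an ordinary int converts to long implicitly. The runtime reports the mismatch as an InvalidCastException.

```csharp
using System;

public static class UnboxDemo
{
    // Returns true if unboxing a boxed int as a long is rejected at run time.
    public static bool UnboxAsLongFails()
    {
        object boxed = 42;  // boxed System.Int32
        try
        {
            long value = (long)boxed;  // wrong pre-boxed type
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    // Unboxes to the pre-boxed type first, then converts the unboxed value.
    public static long UnboxThenConvert()
    {
        object boxed = 42;
        return (long)(int)boxed;
    }

    public static void Main()
    {
        Console.WriteLine(UnboxAsLongFails());  // True
        Console.WriteLine(UnboxThenConvert());  // 42
    }
}
```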
The semantics of value types and reference types are a fundamental part of the behavior of the CLR, not the C# language. Therefore, another programming language may not use the keyword "class" to declare a reference type or the keyword "struct" to declare a value type, but the behavior of these different categories of types as well as boxing and unboxing will be the same.