Reference Types
Reference types combine a location and a sequence of bits. The location provides identity by designating an area in memory where values can be stored and the type of values that can be stored there. A location is "type safe" in that only assignment-compatible types can be stored in it. (The section "Assignment Compatibility" gives an example of assignment compatibility.)
Because all reference types are allocated on the garbage collected heap and the garbage collector is free to move objects during execution, reference types are always accessed through a strongly typed reference rather than directly. As the garbage collector moves the object, the reference can be updated as part of the relocation process. As shown in Figure 2.2 on page 24, three categories of reference types exist: object types, interface types, and pointer types.
Object Types
Object types represent types that are known as classes in many languages, such as SmallTalk and Java. The built-in object types include Object and String. The CLR uses the term object to refer to values of an object type; the set of all exact types for all objects is known as the object types. Because String is an object type, all instances of the String type are therefore objects. Object types are always allocated on the garbage collected heap. Table 2.2 lists the relevant information for the CLR's built-in object types.
A number of reference types are supplied with the Base Framework. They are not considered to be built-in types because the CLR provides no inherent support for them; instead, these reference types are viewed as being under the user-defined object types that are subtypes of Object.
Object
All object types inherit, either directly or indirectly, from the CLR type System.Object class. A major facility of the Object class is its ability to enforce a singular rooted inheritance hierarchy on all types in the CLR. Although you may think that this kind of inheritance does not apply to value types, value types can be treated as a subtype of Object through boxing. The libraries make extensive use of the Object type as the parameters to, and the return type of, many functions.
Table 2.2 CLR Built-in Reference Types: Object Types
CIL Name |
Library Name |
Description |
CLS Support |
Object |
System.Object |
Base class for all object types |
Y |
String |
System.String |
Unicode string |
Y |
The Object class provides a number of methods that can be called on all objects:
Equals returns true if the two objects are equal. Subtypes may override this method to provide either identity or equality comparison. Equals is available as both a virtual method that takes a single parameter consisting of the other object with which this object is being compared and a static method that, naturally, requires two parameters.
-
Finalize is invoked by the garbage collector before an object's memory is reclaimed. Because the garbage collector is not guaranteed to run during a program's execution, this method may not be invoked.5 In C#, if a developer defines a destructor, then it is renamed to be the type's Finalize method.
GetHashCode returns a hash code for an object. It can be used when inserting objects into containers that require a key to be associated with each such object.
GetType returns the type object for this object. This method gives access to the metadata for the object. A static method on the Type class can be used for the same purpose; it does not require that an instance of the class be created first.
MemberwiseClone returns a shallow copy of the object. This method has protected accessibility and, therefore, can be accessed only by subtypes. It cannot be overridden. If a deep copy is required, then the developer should implement the ICloneable interface.
ReferenceEquals returns true if both object references passed to the method refer to the same object. It also returns true if both references are null.
ToString returns a string that represents the object. As defined in Object, this method returns the name of the type of the objectthat is, its exact type. Subtypes may override this method to have the return string represent the object as the developer sees fit. For example, the String class returns the value of the string and not the name of the String class.
Most of the methods defined on Object are public. MemberwiseClone and Finalize, however, have protected access; that is, only subtypes can access them. The following program output shows the assembly qualified name and the publicly available methods on the Object class.(Assemblies are covered in Chapter 5.)
System.Object, mscorlib, Version=1.0.2411.0, Culture=neutral, PublicKeyToken=b77a5c561934e089 Int32 GetHashCode() Boolean Equals(System.Object) System.String ToString() Boolean Equals(System.Object, System.Object) Boolean ReferenceEquals(System.Object, System.Object) System.Type GetType() Void .ctor()
A simple program using the metadata facilities of the CLR generated this output. Chapter 3 describes how to build this approximately 10-line program and explains the significance of the assembly qualified name shown as the first line of output. In brief, the program retrieves the Object class's Type object and then displays its public method's prototypes. The two protected methods are not shown. The preceding output also shows the use of built-in types, such as Boolean, Int32, and String.
Listing 2.5 demonstrates how many classes override the ToString method. The default behavior is to print a string representing the type of the object on which it is invokedthat is the action of the Object class. String and Int32 have both overridden this behavior to provide a more intuitive string representation of the object on which it is invoked.
Listing 2.5 Overriding the ToString method
using System; namespace Override { class Sample { static void Print(params Object[] objects) { foreach(Object o in objects) Console.WriteLine(o.ToString()); } static void Main(string[] args) { Object o = new Object(); String s = "Mark"; int i = 42; Print(o, s, i); } } }
One interesting feature of Listing 2.5 relates to the use of the C# params keyword. You can call the Print method with any number of arguments, and these arguments will be passed to the method in an array that holds Object references. Within the Print method, each member of the array will have its virtual ToString method called, so the correct method for each argument will be invoked. You may wonder how the value i of type int is passed given that it is a value type: It is boxed. Listing 2.5 produces the following output:
System.Object Mark 42
The constructor for Object should be called whenever an object is created; it must be called if the goal is to produce verifiable code. Chapter 4 discusses verifiable code in more detail, but for now consider "verifiable code" to mean proven type-safe code. As a simplification of the rules, compiler writers must ensure that a call to the constructor for the base class occurs during construction of all derived classes. This call can occur at any time during the construction of derived classes; it need not be the first instruction executed in the constructor of subtype. Developers should be aware that construction of the base class may not have occurred when a user-defined constructor starts running.
String
Like Object, String is a built-in type in the CLR. This class is sealed, which means that no type can be subtyped from it. The sealed nature of String allows for much greater optimization by the execution system if it is known that subtypes cannot be passed where a type, such as String, is expected. Strings are also immutable, such that calling any method to modify a string creates a new string. The fact that the String class is sealed and immutable allows for an extremely efficient implementation. Developers can also use a StringBuilder class if creating temporary strings while doing string manipulations with the String class proves too expensive.
The String class contains far too many members to describe them all here. A nonexhaustive list of the general functionality of the class would include the following abilities:
Constructors: No default constructor exists. Constructors take arguments of type char, arrays of char, and pointers to char.
Compare: These methods take two strings (or parts thereof) and compare them. The comparison can be case sensitive or case insensitive.
Concatenate: This method returns a new string that represents the concatenation of a number of individual strings.
Conversion: This method returns a new string in which the characters within the string have been converted to uppercase or lowercase.
Format: This method takes a format string and an array of objects and returns a new string. It replaces each placeholder in the format string with the string representation of an object in the array.
IndexOf: This method returns an integer index that identifies the location of a character or string within another string.
Insert: This method returns a new string representing the insertion of one string into another string.
Length: This property represents the length of the string. It supplies only a get methodthat is, it is a read-only value.
Pad and Trim Strings: These methods return a new string that represents the original string with padding characters inserted or removed, respectively.
Listing 2.6 is a simple example demonstrating the use of the built-in String reference type in C#. This program creates objects of type string, converts them between uppercase and lowercase, concatenates them, and then checks for a substring within a string.
Listing 2.6 Using the String reference type
using System; namespace StringSample { class Sample { static void Main(string[] args) { String s = "Mark"; Console.WriteLine("Length is: " + s.Length); Console.WriteLine(s.ToLower()); String[] strings = {"Damien", "Mark", "Brad"}; String authors = String.Concat(strings); if(authors.IndexOf(s) > 0) Console.WriteLine("{0} is an author", s.ToUpper()); } } }
Main begins by defining an instance of the String class called s with the contents "Mark". Normally, you allocate an instance of a reference type by using a language construct such as new. In the CLR and in many languages, such as C#, a string is a special case because it is a built-in type and, therefore, can be allocated using special instructions or different syntax. The CLR has an instruction for just this purpose. Likewise, as shown in Listing 2.6, C# has a special syntax for allocating a string.
The next line accesses a property of the string object called Length, returns an integer representing the number of characters in a string. You have already seen how a property is defined, but now you may wonder how an integer is "added" to a string. Of course, the answer is that it is not. While an int is a value type, a reference type is always created for all value types, known as the boxed type. In this case, code is inserted to box the integer value and place it on the garbage collected heap.
How, then, are a string and a boxed integer concatenated? Again, the answer is that they are not. One of the overloaded concatenation methods on the String class takes parameters of type Object. The compiler calls this method, and the resulting string is added to the first string; this value is then written to the screen. Developers need to be aware that compilers may silently insert a number of calls to provide conversion and coercion methods, such as boxing. The exact behavior is a function of the particular compiler; some compilers will insert these method calls, whereas others will require the developer to call them explicitly. Of course, Listing 2.6 could be rewritten to avoid boxing altogether; again, this choice requires the developer to know and understand the code generated by the particular compiler.
Finally, the program in Listing 2.6 makes a single string containing all the names of the authors of this book and then searches to see whether "Mark" is one of the authors.
Your attention is also drawn to the generation of temporary string objects in Listing 2.6. Note that calls to methods such as ToLower and ToUpper generate temporary string objects that may produce performance problems.
Listing 2.6 produces the following output:
Length is: 4 mark MARK is an author
Interface Types
Programming with interface types is an extremely powerful concept; in fact, with COM and CORBA, interface-based programming is the predominate paradigm. Object-oriented programmers will already be familiar with the concept of substituting a derived type for a base type. Sometimes, however, two classes are not related by implementation inheritance but do share a common contract. For example, many classes contain methods to save their state to and from persistent storage. For this purpose, classes not related by inheritance may support common interfaces, allowing programmers to code for their shared behavior based on their shared interface type rather than their exact types.
An interface type is a partial specification of a type. This contract binds implementers to providing implementations of the members contained in the interface. Object types may support many interface types, and many different object types would normally support an interface type. By definition, an interface type can never be an object type or an exact type. Interface types may extend other interface types; that is, an interface type can inherit from other interface types.
An interface type may define the following:
Methods (static and instance)
Fields (static)
Properties
Events
Of course, properties and events equate to methods. By definition, all instance methods in an interface are public, abstract, and virtual. This is the opposite of the situation with value types, where user-defined methods on the value type, as opposed to the boxed type, are always nonvirtual because no inheritance from value types is possible. The CLR does not include any built-in interface types, although the Base Framework provides a number of interface types.
Listing 2.7 demonstrates the use of the IEnumerator interface supported by array objects. The array of String objects allows clients to enumerate over the array by requesting an IEnumerator interface. (Developers never need to define an array class even for their own types, because the CLR generates one automatically if needed. The automatically defined array type implements the IEnumerator interface and provides the GetLength method so they can be used in exactly the same manner as arrays for built-in types.) The IEnumerator interface defines three methods:
Current returns the current object.
MoveNext moves the enumerator on to the next object.
Reset resets the enumerator to its initial position.
Listing 2.7 Using the IEnumerator interface
using System; using System.Collections; namespace StringArray { class EntryPoint { static void Main(string[] args) { String [] names = {"Brad", "Damien", "Mark"}; IEnumerator i = names.GetEnumerator(); while(i.MoveNext()) Console.WriteLine(i.Current); } } }
Of course, you could have written the while loop in Listing 2.7 as a foreach loop, but the former technique seems slightly more explicit for demonstration purposes. The program produces the following output:
Brad Damien Mark
Array objects support many other useful methods, such as Clear, GetLength, and Sort. The Sort method makes use of another Base Framework interface, IComparable, which must be implemented by the type that the array holds. Array objects also provide support for synchronization, an ability that is provided by methods such as IsSynchronized and SyncRoot.
Pointer Types
Pointer types provide a means of specifying the location of either code or a value. The CLR supports three pointer types:
-
Unmanaged function pointers refer to code and are similar to function pointers in C++.6
Managed Pointers are known to the garbage collector and are updated if they refer to an item that is moved on the garbage collected heap.
Unmanaged pointers are similar to unmanaged function pointers but refer to values. Unmanaged pointers are not CLS compliant, and many languages do not even expose syntax to define or use them. By comparison, managed pointers can point at items located on the garbage collected heap and are CLS compliant.
The semantics for these pointer types vary greatly. For example, pointers to managed objects must be registered with the garbage collector so that as objects are moved on the heap, the pointers can be updated. Pointers to local variables have different lifetime issues. When using unmanaged pointers, objects will often need to be pinned in memory so that the garbage collector will not move the object while it is possible to access the object via an unmanaged pointer.
This book will not cover pointer types in detail for two reasons. First, a greater understanding of the .NET platform architecture is needed to understand the semantic issues relating to pointers. Second, most programming languages will abstract the existence of pointers to such a degree that they will be invisible to programmers.