Value Types
Value types represent types that are known as simple or primitive types in many languages. They include types such as int and float in C++ and Java. Value types are often allocated on the stack, which means that they can be local variables, parameters, or return values from functions. By default, they are passed by value. Unlike in some programming languages, CLR value types are not limited to built-in data types; developers may define their own value types if necessary.
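The pass-by-value behavior described above can be illustrated with a minimal C# sketch. The Counter type and the PassByValueDemo class are hypothetical names introduced only for this illustration.

```csharp
using System;

// Hypothetical user-defined value type, used only for this illustration.
public struct Counter
{
    public int Count;
}

public static class PassByValueDemo
{
    // Receives a copy of the caller's value; the caller's variable is unaffected.
    public static void Increment(Counter c)
    {
        c.Count++;
    }

    // Returns the count seen by the caller after Increment has run on a copy.
    public static int CountAfterIncrement()
    {
        Counter original = new Counter();
        original.Count = 1;
        Increment(original);
        return original.Count;
    }

    public static void Main()
    {
        Counter a = new Counter();
        a.Count = 1;
        Counter b = a;      // assignment copies the value
        b.Count = 99;       // changes only the copy
        Console.WriteLine(a.Count);                 // prints 1
        Console.WriteLine(CountAfterIncrement());   // prints 1
    }
}
```

Had Counter been declared as a class (a reference type), both variables would refer to the same object and both lines would report the modified count.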
Built-in Value Types
Table 2.1 lists the CLR's built-in value types. In the table, the "CIL Name" column gives the type's name as used in Common Intermediate Language (CIL), which could best be described as the assembly language for the CLR. CIL is described in more detail in Chapter 4, which covers the execution system. The next column, "Base Framework Name," gives the name for the type in the Base Framework. The Base Framework is often referred to as the Framework Class Library (FCL). As the library contains more than just classes, this name is somewhat inaccurate. Chapter 7 covers the Base Framework in more detail.
Table 2.1 CLR Built-in Value Types
CIL Name | Base Framework Name | Description | CLS Support
bool | System.Boolean | Boolean, true or false | Y
char | System.Char | Unicode character | Y
int8 | System.SByte | Signed 8-bit integer | N
int16 | System.Int16 | Signed 16-bit integer | Y
int32 | System.Int32 | Signed 32-bit integer | Y
int64 | System.Int64 | Signed 64-bit integer | Y
unsigned int8 | System.Byte | Unsigned 8-bit integer | Y
unsigned int16 | System.UInt16 | Unsigned 16-bit integer | N
unsigned int32 | System.UInt32 | Unsigned 32-bit integer | N
unsigned int64 | System.UInt64 | Unsigned 64-bit integer | N
float32 | System.Single | IEEE 32-bit floating-point number | Y
float64 | System.Double | IEEE 64-bit floating-point number | Y
native int | System.IntPtr | Signed native integer, equivalent to the machine word size (32 bits on a 32-bit machine, 64 bits on a 64-bit machine) | Y
native unsigned int | System.UIntPtr | Unsigned native integer | N
Note that the 8-bit integers appear to be named inconsistently when compared to the other integral types. Normally, the signed integers are known as System.IntX and the unsigned integers as System.UIntX, where X is the size of the integer in bits. With 8-bit integers, however, the unsigned version is known as System.Byte and the signed version as System.SByte, where S means signed. This nomenclature is preferred because unsigned bytes are used more frequently than signed bytes are, so the unsigned byte gets the simpler name.
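As a minimal sketch of how these names surface in one language, the following C# program prints the Base Framework names behind a few C# keywords. The keyword-to-type mapping shown is C#'s own; other languages may expose the same types under different names, and the class name BuiltInNamesDemo is invented for this illustration.

```csharp
using System;

public static class BuiltInNamesDemo
{
    public static void Main()
    {
        // Each C# keyword is an alias for a Base Framework type.
        Console.WriteLine(typeof(sbyte).FullName);    // prints System.SByte
        Console.WriteLine(typeof(byte).FullName);     // prints System.Byte
        Console.WriteLine(typeof(ushort).FullName);   // prints System.UInt16
        Console.WriteLine(typeof(int).FullName);      // prints System.Int32

        // The signed and unsigned 8-bit types cover different ranges.
        Console.WriteLine(sbyte.MinValue + " to " + sbyte.MaxValue);   // prints -128 to 127
        Console.WriteLine(byte.MinValue + " to " + byte.MaxValue);     // prints 0 to 255
    }
}
```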
Boolean Values
The bool type is used to represent true and false values. Unlike some languages that use an integer for this type (so that a value such as 0 represents false and all other values represent true), the CLR designates a specific type for this purpose. This choice eliminates errors that could potentially arise when integer values are taken to signify Boolean values but that interpretation was not the programmer's intention.
Characters
All characters in the CLR are 16-bit Unicode code points [2]. The UTF-16 encoding uses 16-bit units to represent characters; by comparison, the ASCII character set normally uses 8 bits for this purpose. This point is important for a component model such as the .NET Framework, for which distributed programming over the Internet was a prime design goal of the architecture. Many newer languages and systems for Internet programming, such as Java, have also decided to support Unicode.
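The 16-bit character size can be observed directly from C#. The sketch below is only an illustration (the class name CharDemo is invented); it also shows that a character outside the 16-bit range, such as U+1F600, occupies two char values (a surrogate pair) in a string.

```csharp
using System;

public static class CharDemo
{
    public static void Main()
    {
        Console.WriteLine(sizeof(char));          // prints 2 (bytes, i.e., 16 bits)

        string letter = "A";
        Console.WriteLine(letter.Length);         // prints 1

        // U+1F600 lies outside the 16-bit range, so it is stored as two chars.
        string supplementary = "\U0001F600";
        Console.WriteLine(supplementary.Length);                     // prints 2
        Console.WriteLine(char.IsSurrogatePair(supplementary, 0));   // prints True
        Console.WriteLine(char.ConvertToUtf32(supplementary, 0).ToString("X"));   // prints 1F600
    }
}
```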
Integers
The CLR supports a range of built-in integer representations. Integers vary in three ways:
Their size can be 8, 16, 32, or 64 bits. This range covers the size of integers in many common languages and machine architectures.
Integers can be signed or unsigned, designating whether the values they hold may be negative or are restricted to zero and positive values.
Native integers are used to represent the most natural size integer on the execution architecture. Because the CLR is designed to run on a number of different platforms, it needs a mechanism to inform the execution engine that it is free to choose the most efficient representation on the platform that the code executes on; hence the native integer type.
The last point highlights a recurring theme in the design of the CLR: namely, that many issues are left to the execution system to resolve at runtime. While this flexibility does incur some overhead, the execution system can make decisions about runtime values to ensure more efficient execution. Another example of this facility, which is covered in more detail later, involves the layout of objects in memory. Developers may explicitly specify how objects are laid out or they can defer this decision to the execution engine. The execution engine can take aspects of the machine's architecture, such as word size, into account to ensure that the layout of fields aligns with the machine's word boundaries.
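The difference between fixed-size and native integers can be sketched in C# as follows. This is an illustration only: the class name NativeIntDemo is invented, and the value printed for IntPtr.Size depends on the process in which the code runs.

```csharp
using System;

public static class NativeIntDemo
{
    // Width, in bits, that the execution engine chose for native integers.
    public static int NativeIntBits()
    {
        return IntPtr.Size * 8;
    }

    public static void Main()
    {
        // Fixed-size integers have the same width on every platform.
        Console.WriteLine(sizeof(int));    // prints 4
        Console.WriteLine(sizeof(long));   // prints 8

        // The native integer's width is decided at runtime: 4 or 8 bytes.
        Console.WriteLine(IntPtr.Size);
        Console.WriteLine(NativeIntBits());
        Console.WriteLine(Environment.Is64BitProcess);
    }
}
```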
Floating-Point Types
CLR floating-point types vary between 32- and 64-bit representations and adhere to the IEEE floating-point standard [3]. Rather than providing a detailed overview of this standard here, readers are referred to the IEEE documentation. Native floating-point representations are used when values are manipulated on a machine, as they may be the natural size for floating-point arithmetic as supported by the hardware of the underlying platform. These values, however, will be converted to float32 or float64 when they are stored as values in the CLR. Providing internal representations for floating-point numbers that match the natural size for floating-point values on the machine on which the code executes allows the runtime environment to operate on a number of different platforms where intermediate results may be larger than these types. A native floating-point value will be truncated, if necessary, when the value is stored into a 32- or 64-bit location in the CLR.
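The loss of precision on storage can be sketched in C# by narrowing a 64-bit value into a 32-bit location and widening it again. The class and method names are hypothetical; the sketch shows only the narrowing between float64 and float32, not the platform-specific native representation.

```csharp
using System;

public static class FloatNarrowingDemo
{
    // Stores a 64-bit value in a 32-bit location and reports whether
    // the original value can be recovered exactly.
    public static bool SurvivesRoundTrip(double value)
    {
        float stored = (float) value;      // narrowed to float32
        return (double) stored == value;   // widened back for comparison
    }

    public static void Main()
    {
        Console.WriteLine(SurvivesRoundTrip(0.5));         // prints True
        Console.WriteLine(SurvivesRoundTrip(1.0 / 3.0));   // prints False
    }
}
```

The value 0.5 is exactly representable in both formats, whereas one-third requires more significant bits than a 32-bit location can hold.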
Special Issues with Built-in Value Types
In Table 2.1, notice which of the built-in value types are CLS compliant. As stated previously, languages must adhere to the CLS subset of the CLR to achieve maximum interoperability between languages. It is not surprising that types such as int32 are listed in the CLS whereas types such as native unsigned int are not. Note, however, that most unsigned integers are not included in the CLS.
Also, note that not all programming languages will expose all of these value types to developers. Types such as native int, for instance, may not have a natural mapping into a language's type system. Also, language designers may choose not to expose a type, instead exposing only CLS-compliant types, for example.
A number of value types are defined in the Framework Class Library. Technically speaking, they are really user-defined types; that is, they have been defined by the developers of the Base Framework rather than being integral CLR value types. Developers using the CLR often do not recognize this distinction, however, so these types are mentioned here. Of course, such a blurry distinction is precisely what the designers of the CLR type system were hoping to achieve. Examples of such types include System.DateTime, which represents time; System.Decimal, which represents decimal values in the approximate range from positive to negative 79,228,162,514,264,337,593,543,950,335; System.TimeSpan, which represents time spans; and System.Guid, which represents globally unique identifiers (GUIDs).
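One way to observe this distinction from C# is through reflection, as in the following sketch (the class name FrameworkValueTypesDemo is invented). It relies on Type.IsPrimitive, which reports true only for the types the runtime treats as primitive.

```csharp
using System;

public static class FrameworkValueTypesDemo
{
    public static void Main()
    {
        // All four library types are value types...
        Console.WriteLine(typeof(DateTime).IsValueType);   // prints True
        Console.WriteLine(typeof(decimal).IsValueType);    // prints True
        Console.WriteLine(typeof(TimeSpan).IsValueType);   // prints True
        Console.WriteLine(typeof(Guid).IsValueType);       // prints True

        // ...but, unlike int32, they are not primitive types.
        Console.WriteLine(typeof(int).IsPrimitive);        // prints True
        Console.WriteLine(typeof(decimal).IsPrimitive);    // prints False
        Console.WriteLine(typeof(DateTime).IsPrimitive);   // prints False

        // The upper bound of System.Decimal quoted in the text.
        Console.WriteLine(decimal.MaxValue == 79228162514264337593543950335m);   // prints True
    }
}
```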
User-Defined Value Types
In addition to providing the built-in value types described previously, the CLR allows developers to define their own value types. Like the built-in value types, these types will have copy semantics and will normally be allocated on the stack. A default constructor is not defined for a value type. User-defined value types may be enumerations or structures.
Enumerations
Listing 2.1 gives an example of the declaration and use of a user-defined enumeration [4]. This program defines a simple enumeration representing the months in the year and then prints a string to the console window matching a month name with its number.
Listing 2.1 User-defined value type: EnumerationSample
using System;

namespace Enumeration
{
    struct EnumerationSample
    {
        enum Month
        {
            January = 1, February, March, April, May, June,
            July, August, September, October, November, December
        }

        static int Main(string[] args)
        {
            Console.WriteLine("{0} is month {1}",
                Month.September, (int) Month.September);
            return 0;
        }
    }
}
For the first few examples in the book, the C# code used is described so that you will become familiar with how to read C#. Later, such detailed descriptions of the example code are omitted.
The first line in Listing 2.1 is the using System directive, which is included so that types whose names start with "System.", such as System.Console, can be referenced without fully qualifying those names. While this tactic reduces the amount of typing developers need to do, its overuse can eliminate the advantages gained by using namespaces, so employ this technique judiciously.
Next in Listing 2.1 comes the definition of a namespace called Enumeration. This name has no programmatic significance; we could have used any name for the namespace or even not used a namespace at all. Nevertheless, because components developed within the CLR are designed to be reused in many scenarios, including being downloaded from the Internet, the use of namespaces is strongly encouraged to avoid collisions between the names of components developed by different developers.
Listing 2.1 continues with the definition of the user-defined value type EnumerationSample. This value type will hold the program's entry point, the method with which execution will commence. The C# keyword enum is used to define an enumeration. In Listing 2.1, this enumeration is called Month and contains constants representing each month of the year. The declaration of the enumeration is reasonably straightforward, except for the fact that the enumeration starts the constants with a value of 1; the default value would be 0.
Main is the entry point for the program. It prints out a single line of output that informs the user that September is the ninth month (9) of the year.
Listing 2.1 produces the following output:
September is month 9
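The numbering rules mentioned above (a default starting value of 0 unless a constant is given explicitly) can be checked with a short sketch. The Weekday and Quarter enumerations are hypothetical and exist only for this illustration.

```csharp
using System;

// Hypothetical enumerations, used only for this illustration.
public enum Weekday { Monday, Tuesday, Wednesday }         // numbering starts at 0
public enum Quarter { First = 1, Second, Third, Fourth }   // numbering starts at 1

public static class EnumValuesDemo
{
    public static void Main()
    {
        Console.WriteLine((int) Weekday.Monday);   // prints 0
        Console.WriteLine((int) Quarter.Third);    // prints 3

        // An integer can be converted back to the enumeration.
        Console.WriteLine((Quarter) 2);            // prints Second

        // In C#, an enumeration is backed by int32 unless stated otherwise.
        Console.WriteLine(Enum.GetUnderlyingType(typeof(Quarter)));   // prints System.Int32
    }
}
```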
Structures
In this book, user-defined value types are called structures. Some languages, such as Managed C++, allow users to use a keyword such as class when defining either value or reference types. Other languages, such as C#, use a keyword such as struct to indicate a user-defined value type and class to indicate a user-defined reference type. This choice is largely a language-specific issue, but readers should be aware of these differences and understand the behavior of the language they are using.
Structures can contain any of the following:
Methods (both static and instance)
Fields (both static and instance)
Properties (both static and instance)
Events (both static and instance)
Methods
Methods specify a contract that must be honored by both the caller and the callee. Methods have a name, a parameter list (which may be empty), and a return type. Clients that need to call a method must satisfy the contract when calling the method.
Methods on value types can be either static methods or instance methods:
Static methods are invoked on the type itself and are callable at any time, even if no values of the type exist.
Instance methods are always invoked on values of a type.
One limitation on value types is that they cannot define a constructor that takes no parameters, known as a default constructor in some languages.
Listing 2.2 demonstrates the definition and use of both static and instance methods on a value type in C#. Both methods write a greeting to the console window. The code starts with a using directive, which allows types whose names start with "System.", such as System.Console, to be referenced more simply (here, as Console). Next comes the declaration of the user-defined namespace, ValueTypeMethods. Note that the CLR does not intrinsically support namespaces; instead, a type T declared in namespace N is known to the CLR as the type N.T. This type, N.T, will reside in an assembly; the CLR uses such assemblies to uniquely identify types, rather than namespaces, as in some languages. (Assemblies are covered in Chapter 5.)
Listing 2.2 Use of static and instance methods with a user-defined value type
using System;

namespace ValueTypeMethods
{
    struct Sample
    {
        public static void SayHelloType()
        {
            Console.WriteLine("Hello world from Type");
        }

        public void SayHelloInstance()
        {
            Console.WriteLine("Hello world from instance");
        }

        static void Main(string[] args)
        {
            SayHelloType();
            Sample s = new Sample();
            s.SayHelloInstance();
        }
    }
}
The C# keyword struct is used to create a value type called Sample. This struct has three methods, two of which are static methods: SayHelloType and Main. The method SayHelloInstance is an instance method and is, therefore, always invoked on values of the type.
The static method Main is also the entry point for the program. As far as the CLR is concerned, the entry point for a program need not be called Main, although in C# it always has that name. The entry point must be a static method and can be a member of either a value type or a reference type. In Listing 2.2, Main calls both the static and instance methods. The entry point function can return a 32-bit value indicating its success or failure; in Listing 2.2, however, Main returns void (i.e., nothing). (Methods also have visibility and accessibility, topics covered later in this chapter.)
Listing 2.2 produces the following output:
Hello world from Type
Hello world from instance
Fields
A type may contain zero or more fields, each of which has a type. Like methods, fields can be either static or instance. Used to store values, they represent the state of a type or value. Every field has a type and a name. For example, a Point class may have two fields to represent its x and y coordinates. These values may exist in every instance of the type, thereby allowing the state to be different within each value. If these fields have private accessibility, which is often the desired situation, then the state of an instance remains hidden and only its other members may access it. (Visibility and accessibility are covered later in this chapter.)
The next section, on properties, gives an example of defining and using fields.
Properties
Languages that target the CLR are provided with support for properties by the CLR; that is, they are not just a naming convention to be followed by developers. This relationship proves particularly useful when the goal is to provide expressive class libraries. The CLR implements properties through the use of some special metadata annotations and methods. To a certain degree, properties are "syntactic sugar": They represent set and get methods defined on logical fields of a type.
What is a logical field of a type? As an example, a Person type may have properties that represent the Person's Birth Date, Star Sign, and Age. Clearly, storing the actual date of birth is sufficient to allow all the other values to be computed and supplied at runtime. Therefore, Age can be represented as a property; that is, a logical field of a type where an actual member is not used. Properties have a name, a type, and a number of accessor methods. A type, such as the Person type, would be free to implement all of these logical members as properties.
Although client code may appear to be accessing public fields when it reads and writes these properties, compilers insert code to call the property's methods. These methods may compute the needed values and provide all the data validation required to ensure the integrity of the member's values. Properties exist in COM and CORBA as well. In CORBA, they are known as attributes (the IDL keyword used to describe them).
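The Person example can be sketched in C# as follows. This is one possible shape for such a type, not a prescribed design: the member names (birthDate, AgeOn, and so on) are assumptions made for this illustration, and only the date of birth is actually stored.

```csharp
using System;

// Hypothetical Person type: only the date of birth is stored.
public struct Person
{
    private DateTime birthDate;

    public Person(DateTime birthDate)
    {
        this.birthDate = birthDate;
    }

    public DateTime BirthDate
    {
        get { return birthDate; }
    }

    // Helper that computes the age on a given date, so the result is repeatable.
    public int AgeOn(DateTime asOf)
    {
        int age = asOf.Year - birthDate.Year;
        if (asOf < birthDate.AddYears(age))
        {
            age--;   // this year's birthday has not yet occurred
        }
        return age;
    }

    // Logical field: computed on every read, backed by no field of its own.
    public int Age
    {
        get { return AgeOn(DateTime.Today); }
    }
}

public static class PersonDemo
{
    public static void Main()
    {
        Person p = new Person(new DateTime(1990, 6, 15));
        Console.WriteLine(p.AgeOn(new DateTime(2020, 6, 14)));   // prints 29
        Console.WriteLine(p.AgeOn(new DateTime(2020, 6, 15)));   // prints 30
        Console.WriteLine(p.Age);   // depends on the current date
    }
}
```

To client code, p.Age reads like a field access even though a method runs on every read.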
Listing 2.3 demonstrates the definition and use of a value type with properties and fields in C#. This program first defines a value type called Point with properties representing its x and y coordinates, and then writes and reads values to these properties. The value type is a C# struct that has two integers as its data members. Because they are passed by value by default, value types should generally be lightweight; this struct is an example of this requirement.
Listing 2.3 Use of properties and fields with a user-defined value type
using System;

namespace ValueType
{
    struct Point
    {
        private int xPosition, yPosition;

        public int X
        {
            get { return xPosition; }
            set { xPosition = value; }
        }

        public int Y
        {
            get { return yPosition; }
            set { yPosition = value; }
        }
    }

    class EntryPoint
    {
        static void Main(string[] args)
        {
            Point p = new Point();
            p.X = 42;
            p.Y = 42;
            Console.WriteLine("X: {0}", p.X);
            Console.WriteLine("Y: {0}", p.Y);
        }
    }
}
The first item in Listing 2.3 is the using System directive, which ensures that types whose names start with "System." can be referenced without the need to fully qualify these names. Next comes the definition of a namespace called ValueType; this name has no particular significance, as we could have used any name for this namespace or even no namespace at all. The definition of the user-defined value type Point follows. This type has two fields, both of type int, which represent the x and y coordinates of a point.
The fields are followed by the definition of two more members, both properties. The definition of the properties looks a little awkward initially. The first part of the definition gives the accessibility, type, and name of the property; this information looks identical to the description of any field. The subsequent lines of the definitions provide the set and get methods for these properties. In fact, using the metadata facilities to look at this struct (as is done in Chapter 3), it becomes apparent that two methods are generated for each property, named by prefixing set_ and get_ to the name of the property; for example, set_X and get_X.
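The generated accessor names can be inspected with reflection, as in the following sketch. It redeclares a cut-down Point with a single property so that the program is self-contained; the AccessorNamesDemo class and its helper methods are hypothetical.

```csharp
using System;
using System.Reflection;

// Cut-down version of a Point value type with one property.
public struct Point
{
    private int xPosition;

    public int X
    {
        get { return xPosition; }
        set { xPosition = value; }
    }
}

public static class AccessorNamesDemo
{
    // Name of the method that the compiler generated for the get accessor.
    public static string GetterName(Type type, string propertyName)
    {
        PropertyInfo property = type.GetProperty(propertyName);
        return property.GetGetMethod().Name;
    }

    // Name of the method that the compiler generated for the set accessor.
    public static string SetterName(Type type, string propertyName)
    {
        PropertyInfo property = type.GetProperty(propertyName);
        return property.GetSetMethod().Name;
    }

    public static void Main()
    {
        Console.WriteLine(GetterName(typeof(Point), "X"));   // prints get_X
        Console.WriteLine(SetterName(typeof(Point), "X"));   // prints set_X
    }
}
```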
Note two points relating to Listing 2.3:
The properties map to fields within the type, although such mapping is not strictly necessary.
Properties are not limited to types such as int; they can be of any type.
The class EntryPoint provides the entry point for this program. This class could have been given any name, but EntryPoint was chosen because it describes the class's purpose (rather than for any syntactical reason). Within Main, the first line appears to allocate a new instance of the Point type on the heap; in reality, this is not the case. Developers familiar with other programming languages may find this idea counterintuitive, but because value types are allocated on the stack, the local variable p is, in fact, allocated on the stack. Next, the use of the properties is highlighted. Notice how access to the properties appears similar to access to a public field, although the compiler generates code to call the get_ and set_ methods as required. Properties also offer "hints" to the just-in-time (JIT) compiler, which may choose to inline the method calls. (The JIT compiler is discussed in Chapter 4.) For simple properties such as the ones defined in Listing 2.3, little (if any) performance overhead is incurred, and many reasons exist to prefer properties over publicly exposing instance fields (e.g., the elimination of versioning and data integrity issues).
Listing 2.3 produces the following output:
X: 42
Y: 42
Events
As with properties, languages that target the CLR are provided with support for events by the CLR; like properties, events are not just a naming convention to be followed by developers. Events are used to expose asynchronous changes in an observed object. At a fundamental level, they are "syntactic sugar" that generates methods and associated metadata.
An event has both a name and a type. The type specifies the method signature that clients must provide for the event's callback method. When types define an event, methods to add and remove listeners are created automatically, named add_EventName and remove_EventName. Clients register to listen for events. When an event is raised, a callback method is invoked on the affected clients. When a client is no longer interested in receiving notification of events, it can remove itself from the list of listeners on an event source.
Both COM and CORBA support events, albeit somewhat differently. In COM, an interface can be marked as a source interface, which means that the methods in the interface need to be implemented by the client and the component will call back to the client through these methods. CORBA uses a similar method: an interface is passed from a client to a server and callbacks are made through the same interface. The CORBA interface is not specifically marked as a callback interface, however, as it is in COM. The approaches employed in COM and CORBA are similar to events in the CLR, except that the CLR registers individual methods to be called back rather than interfaces (which contain a number of methods). CORBA also provides an Event Service that gives full control over events, providing, for example, both push and pull functionality. Unfortunately, a functional equivalent to the CORBA Event Service does not exist in the CLR.
Listing 2.4 demonstrates the definition and use of events in a value type in C#. This program defines a value type called EventClass that exposes an event, creates a value of this value type, attaches listeners, and then invokes the event. Events are tied to the concept of delegates in the CLR. A delegate is best described as a type-safe function pointer. With events, delegates are used to specify the signature of the method that the event will call when it is raised. For example, the definition of ADelegate in Listing 2.4 states that ADelegate is a delegate (function pointer) that can point at functions that take no parameters and return nothing. The value type EventClass defines an event called AnEvent of type ADelegate; that is, it can register and call back methods whose signature matches that of ADelegate.
Listing 2.4 Use of events with a user-defined value type
using System;

namespace EventSample
{
    public delegate void ADelegate();

    struct EventClass
    {
        public event ADelegate AnEvent;

        public void InvokeEvent()
        {
            if (AnEvent != null)
                AnEvent();
        }

        static void CallMe()
        {
            Console.WriteLine("I got called!");
        }

        static void Main(string[] args)
        {
            EventClass e = new EventClass();
            e.AnEvent += new ADelegate(CallMe);
            e.AnEvent += new ADelegate(CallMe);
            e.AnEvent += new ADelegate(CallMe);
            e.InvokeEvent();
        }
    }
}
Within the value type EventClass, the event can be raised by calling the event's name, such as AnEvent() in the method InvokeEvent. When this event is called, all delegates currently listening on the event are called. The sample program attaches three delegates to this event on the value of the type called e. Thus, whenever e raises the event, the static method CallMe is called three times. Note that the called method does not always have to be a static method as it is in Listing 2.4.
Listing 2.4 produces the following output:
I got called!
I got called!
I got called!
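Two points from the preceding discussion are not exercised by Listing 2.4: a listener can be an instance method, and a client can remove itself from an event. The following sketch illustrates both and uses reflection to show the generated add_ and remove_ method names. The Listener, EventSource, and EventDemo types are hypothetical names introduced for this illustration.

```csharp
using System;

public delegate void ADelegate();

// Hypothetical listener whose instance method counts its invocations.
public class Listener
{
    public int Calls;

    public void OnEvent()
    {
        Calls++;
    }
}

// Hypothetical value type that exposes an event.
public struct EventSource
{
    public event ADelegate AnEvent;

    public void Raise()
    {
        if (AnEvent != null)
            AnEvent();
    }
}

public static class EventDemo
{
    public static int CountCalls()
    {
        Listener listener = new Listener();
        EventSource source = new EventSource();
        ADelegate handler = new ADelegate(listener.OnEvent);

        source.AnEvent += handler;   // compiles to a call to add_AnEvent
        source.AnEvent += handler;
        source.Raise();              // listener is called twice

        source.AnEvent -= handler;   // compiles to a call to remove_AnEvent
        source.Raise();              // listener is called once more

        return listener.Calls;
    }

    public static void Main()
    {
        Console.WriteLine(CountCalls());   // prints 3
        Console.WriteLine(typeof(EventSource).GetEvent("AnEvent").GetAddMethod().Name);      // prints add_AnEvent
        Console.WriteLine(typeof(EventSource).GetEvent("AnEvent").GetRemoveMethod().Name);   // prints remove_AnEvent
    }
}
```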
Sealed Value Types
As mentioned previously, all value types inherit from specific classes: enumerations from System.Enum and structures from System.ValueType. It is not possible to build an inheritance hierarchy with value types; that is, a value type cannot inherit from another value type. In CLR terminology, a value type is said to be sealed. Sealing explains why instance methods are not declared as virtual in value types, because it is not possible to subtype them and, therefore, the definitions of methods cannot be overridden.
By contrast, reference types in the CLR can optionally be declared as sealed, which prohibits subtyping of these types. An example of a reference type that is sealed in the CLR is the String class.
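These relationships can be confirmed through reflection. In the sketch below, the Point and Month types are hypothetical stand-ins for a user-defined structure and enumeration.

```csharp
using System;

// Hypothetical types, used only for this illustration.
public struct Point { public int X; public int Y; }
public enum Month { January = 1, February }

public static class SealedDemo
{
    public static void Main()
    {
        // Structures and enumerations inherit from specific classes.
        Console.WriteLine(typeof(Point).BaseType);    // prints System.ValueType
        Console.WriteLine(typeof(Month).BaseType);    // prints System.Enum

        // Value types are always sealed.
        Console.WriteLine(typeof(Point).IsSealed);    // prints True
        Console.WriteLine(typeof(int).IsSealed);      // prints True

        // Reference types are sealed only if declared so.
        Console.WriteLine(typeof(string).IsSealed);   // prints True
        Console.WriteLine(typeof(object).IsSealed);   // prints False
    }
}
```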
Boxed Types
For every value type, including user-defined value types, there exists a corresponding object type, known as its boxed type. The CLR automatically generates boxed types for user-defined value types, which means that values of any value type can be boxed and unboxed:
Boxing a value type copies the data from the value into an object of its boxed type allocated on the garbage collected heap.
Unboxing a value type returns a pointer to the actual data (that is, the sequence of bits) held in a boxed object. (In some programming languages, unboxing not only facilitates obtaining the pointer to the data members of a boxed object but also copies the data from the boxed object into a value of the value type on the stack.)
The fact that all value types can be converted to their corresponding object types allows all values in the type system to be treated as objects if required. This situation has the effect of unifying the two fundamentally different types in the CLR, because everything can be treated as a subtype of Object. This approach is somewhat similar to that created by the use of COM's IUnknown and CORBA's Object interfaces, which also act as the base interface in the IDLs. Because a boxed type is an object type, it may support interface types, thereby providing additional functionality to its unboxed representation. Object types, reference types, and interface types are described later in this chapter.
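In C#, boxing happens implicitly when a value is assigned to a variable of type object, and unboxing is written as a cast that also copies the data back into a value. The following sketch illustrates this behavior; the class and method names are invented for this illustration.

```csharp
using System;

public static class BoxingDemo
{
    // Boxes a value, changes the original variable, and returns the unboxed copy.
    public static int BoxThenUnbox(int value)
    {
        object boxed = value;   // boxing: the data is copied into a heap object
        value = value + 1;      // changes the local value, not the boxed copy
        return (int) boxed;     // unboxing: the data is copied back out
    }

    public static void Main()
    {
        Console.WriteLine(BoxThenUnbox(42));       // prints 42

        object boxed = 42;
        Console.WriteLine(boxed.GetType());        // prints System.Int32

        // The boxed value can be used through an interface type.
        Console.WriteLine(boxed is IComparable);   // prints True
    }
}
```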