Assemblies Defined
In order to deploy a CLR module, developers must first affiliate it with an assembly. An assembly is a logical collection of one or more modules. As just described, modules are physical constructs that exist as byte streams, typically in the file system. Assemblies are logical constructs and are referenced by location-independent names that must be translated to physical paths either in the file system or on the Internet. Those physical paths ultimately point to one or more modules that contain the type definitions, code, and resources that make up the assembly.
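A running program can see the split between an assembly's logical name and its physical modules through reflection. The following is a minimal C# sketch that uses the standard System.Reflection API. The class name AssemblyInspector is only illustrative. Compiled as an ordinary single-module assembly, it should report one module, and the printed location depends on where the loader found the file.

```csharp
using System;
using System.Reflection;

public static class AssemblyInspector
{
    // The logical, location-independent simple name of the assembly.
    public static string LogicalName(Assembly asm) => asm.GetName().Name;

    // The file names of the physical modules that make up the assembly.
    public static string[] ModuleFiles(Assembly asm)
    {
        Module[] modules = asm.GetModules();
        var names = new string[modules.Length];
        for (int i = 0; i < modules.Length; i++)
        {
            names[i] = modules[i].Name; // file name only, no directory
        }
        return names;
    }

    public static void Main()
    {
        Assembly asm = typeof(AssemblyInspector).Assembly;
        Console.WriteLine("Logical name: " + LogicalName(asm));
        Console.WriteLine("Full name:    " + asm.FullName);
        foreach (string file in ModuleFiles(asm))
        {
            Console.WriteLine("  module: " + file);
        }
        // The physical path the loader resolved the logical name to.
        Console.WriteLine("Loaded from:  " + asm.Location);
    }
}
```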
The CLR allows developers to compose assemblies from more than one module primarily to support deferred loading of infrequently accessed code without forming separate encapsulation boundaries. This feature is especially useful when developers are using code download because they can download the initial module first and download secondary modules only on an as-needed basis. The ability to build multimodule assemblies also enables mixed-language assemblies. This allows developers to work in a high-productivity language (e.g., Logo.NET) for the majority of their work but to write low-level grunge code in a more flexible language (e.g., C++). By conjoining the two modules into a single assembly, developers reference, deploy, and version the C++ and Logo.NET code as an atomic unit.
Parenthetically, though an assembly may consist of more than one module, a module is generally affiliated with only one assembly. As a point of interest, if two assemblies happen to reference a common module, the CLR will treat this as if there are two distinct modules, something that results in two distinct copies of every type in the common module. For that reason, the remainder of this chapter assumes that a module is affiliated with exactly one assembly.
Assemblies are the "atom" of deployment in the CLR and are used to package, load, distribute, and version CLR modules. Although an assembly may consist of multiple modules and auxiliary files, the assembly is named and versioned as an atomic unit. If one of the modules in an assembly must be versioned, then the entire assembly must be redeployed because the version number is part of the assembly name and not the underlying module name.
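Because the version number lives in the assembly name, two builds that differ only in version are distinct identities. A minimal C# sketch, assuming a hypothetical assembly named component, parses two display names with System.Reflection.AssemblyName and compares them. Note that nothing in either name refers to the module files underneath.

```csharp
using System;
using System.Reflection;

public static class AssemblyNameDemo
{
    // Parses an assembly display name into its identity components.
    public static AssemblyName Parse(string displayName) => new AssemblyName(displayName);

    public static void Main()
    {
        // "component" and both version numbers are hypothetical values.
        AssemblyName v1 = Parse("component, Version=1.2.0.0, Culture=neutral, PublicKeyToken=null");
        AssemblyName v2 = Parse("component, Version=1.3.0.0, Culture=neutral, PublicKeyToken=null");

        Console.WriteLine(v1.Name);    // component
        Console.WriteLine(v1.Version); // 1.2.0.0

        // Same simple name, different version: two different assemblies.
        Console.WriteLine(v1.FullName == v2.FullName); // False
    }
}
```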
Modules typically rely on types from other assemblies. At the very least, every module relies on the types defined in the mscorlib assembly, which is where types such as System.Object and System.String are defined. Every CLR module contains a list of assembly names that identifies which assemblies are used by this module. These external assembly references use the logical name of the assembly, which contains no remnants of the underlying module names or locations. It is the job of the CLR to convert these logical assembly names into module pathnames at runtime, as is discussed later in this chapter.
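The list of external assembly references can be read back through reflection. In this sketch, the class name ReferenceLister is hypothetical. It prints each logical name recorded in its own assembly's reference list. The exact entries vary by compiler and runtime, so no output is shown.

```csharp
using System;
using System.Reflection;

public static class ReferenceLister
{
    // Returns the logical names in the assembly's external reference list.
    // No file paths appear here; the loader resolves each name when needed.
    public static string[] ReferencedNames(Assembly asm)
    {
        AssemblyName[] refs = asm.GetReferencedAssemblies();
        var names = new string[refs.Length];
        for (int i = 0; i < refs.Length; i++)
        {
            names[i] = refs[i].FullName;
        }
        return names;
    }

    public static void Main()
    {
        foreach (string name in ReferencedNames(typeof(ReferenceLister).Assembly))
        {
            Console.WriteLine(name);
        }
        // The assembly that defines System.Object: mscorlib on the .NET
        // Framework; newer .NET runtimes use a different core library name.
        Console.WriteLine(typeof(object).Assembly.GetName().Name);
    }
}
```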
To assist the CLR in finding the various pieces of an assembly, every assembly has exactly one module whose metadata contains the assembly manifest. The assembly manifest is an additional chunk of CLR metadata that acts as a directory of adjunct files that contain additional type definitions and code. The CLR can directly load modules that contain an assembly manifest. For modules that lack an assembly manifest, the CLR can load them only indirectly, by first loading a module whose assembly manifest refers to the manifest-less module. Figure 2.2 shows two modules: one with an assembly manifest and one without one. Note that of the four /t compiler options, only /t:module produces a module with no assembly manifest.
Figure 2.2: Modules and Assemblies
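Reflection exposes the module that carries the assembly manifest as Assembly.ManifestModule. The sketch below labels each module of its own assembly. ManifestDemo is an illustrative name. A single-module build shows only the manifest module. On the .NET Framework, an assembly built with /addmodule should also list its manifest-less modules.

```csharp
using System;
using System.Reflection;

public static class ManifestDemo
{
    // Every assembly has exactly one module holding the manifest.
    public static Module ManifestOf(Assembly asm) => asm.ManifestModule;

    public static void Main()
    {
        Assembly asm = typeof(ManifestDemo).Assembly;
        Console.WriteLine("Manifest module: " + ManifestOf(asm).Name);

        // Module file names are unique within an assembly, so comparing
        // names identifies the manifest module among all modules.
        foreach (Module m in asm.GetModules())
        {
            bool hasManifest = m.Name == asm.ManifestModule.Name;
            Console.WriteLine(m.Name + (hasManifest ? " (manifest)" : " (no manifest)"));
        }
    }
}
```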
Figure 2.3 shows an application that uses a multimodule assembly, and Listing 2.1 shows the MAKEFILE that would produce it. In this example, code.netmodule is a module that does not contain an assembly manifest. To make it useful, one needs a second module (in this case, component.dll) that provides an assembly manifest that references code.netmodule as a subordinate module. One achieves this using the /addmodule switch when compiling the containing assembly. After this assembly is produced, all the types defined in component.dll and code.netmodule are scoped by the name of the assembly (component). Programs such as application.exe use the /r compiler switch to reference the module containing the assembly manifest. This makes the types in both modules available to the referencing program.
Figure 2.3: Multimodule Assemblies Using CSC.EXE
Listing 2.1: Multimodule Assemblies Using CSC.EXE and NMAKE
```makefile
# build everything by default (NMAKE builds only the first
# target when none is named on the command line)
all : application.exe

# code.netmodule cannot be loaded as is until an assembly
# is created
code.netmodule : code.cs
	csc /t:module code.cs

# types in component.cs can see internal and public members
# and types defined in code.cs
component.dll : component.cs code.netmodule
	csc /t:library /addmodule:code.netmodule component.cs

# types in application.cs cannot see internal members and
# types defined in code.cs (or component.cs)
application.exe : application.cs component.dll
	csc /t:exe /r:component.dll application.cs
```
The assembly manifest resides in exactly one module and contains all of the information needed to locate types and resources defined as part of the assembly. Figure 2.4 shows a set of modules composed into a single assembly, as well as the CSC.EXE switches required to build them. Notice that in this example, the assembly manifest contains a list of file references to the subordinate modules pete.netmodule and george.netmodule. In addition to these file references, each of the public types in these subordinate modules is listed using the .class extern directive, which allows the complete list of public types to be discovered without traversing the metadata for each of the modules in the assembly. Each entry in this list specifies both the name of the file that contains the type and the numeric metadata token that uniquely identifies the type within its module. Finally, the module containing the assembly manifest will contain the master list of externally referenced assemblies. This list consists of the dependencies of every module in the assembly, not just the dependencies of the current module. This allows all of the assembly's dependencies to be discovered by loading a single file.
Figure 2.4: A Multimodule Assembly
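Reflection does not surface the .class extern entries themselves. However, Assembly.GetExportedTypes returns roughly the same directory of public types, and each Type reports the module file that holds it and its metadata token. In this sketch, Invoice and InvoiceCache are hypothetical types, and only the public one should be exported. A type-definition token carries 0x02 in its high byte.

```csharp
using System;
using System.Reflection;

// Public type: appears in the assembly's list of exported types.
public class Invoice { }

// Internal type: invisible outside the assembly, so it is not exported.
internal class InvoiceCache { }

public static class ExportedTypesDemo
{
    public static void Main()
    {
        Assembly asm = typeof(ExportedTypesDemo).Assembly;
        foreach (Type t in asm.GetExportedTypes())
        {
            // Module.Name is the file holding the type; MetadataToken
            // identifies the type within that module.
            Console.WriteLine($"{t.FullName}  file={t.Module.Name}  token=0x{t.MetadataToken:X8}");
        }
    }
}
```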
Assemblies form an encapsulation boundary to protect internal implementation details from interassembly access. Programmers can apply this protection to members of a type (e.g., fields, methods, constructors) or to a type as a whole. Marking a member or type as internal causes it to be available only to modules that are part of the same assembly. Marking a type or member as public causes it to be available to all code (both inside and outside the current assembly). Individual members of a type (e.g., methods, fields, constructors) can also be marked as private, which restricts access to only methods and constructors of the declaring type. This supports classic C++-style programming, in which intracomponent encapsulation is desired. In a similar vein, programmers can mark members of a type as protected, which broadens the access allowed by private to include methods and constructors of derived types. The protected and internal access modifiers can be combined, which grants access to types that either derive from the current type or reside in the same assembly as the current type. Table 2.2 shows the language-specific modifiers as they apply both to types and to individual members. Note that this combination is a union: a member marked protected internal in C# requires only that the accessor be in the same assembly or in a derived type. The CLR also supports an access modifier that requires the accessor to be both in the same assembly and in a derived type (marked famandassem in the metadata). The original releases of VB.NET and C# did not allow programmers to specify this access modifier; C# 7.2 later added it as private protected, and Visual Basic 15.5 as Private Protected.
Table 2.2 Access Modifiers
| | C# | VB.NET | Meaning |
|---|---|---|---|
| Type | public | Public | Type is visible everywhere. |
| | internal | Friend | Type is visible only inside the assembly. |
| Member | public | Public* | Member is visible everywhere. |
| | internal | Friend | Member is visible only inside the assembly. |
| | protected | Protected | Member is visible only inside the declaring type and its subtypes. |
| | protected internal | Protected Friend | Member is visible only inside the declaring type, its subtypes, or other types inside the assembly. |
| | private | Private* | Member is visible only inside the declaring type. |
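The following C# sketch pairs each member modifier in Table 2.2 with the metadata accessibility the compiler emits for it. Account, SavingsAccount, and the field names are illustrative only. The private protected field requires C# 7.2 or later, which maps that modifier to famandassem. Remove that field, and the reference to it in SavingsAccount, to compile with an older compiler.

```csharp
using System;
using System.Reflection;

public class Account
{
    public int Id;                 // visible everywhere
    internal int Branch;           // visible only inside this assembly
    protected int Balance;         // declaring type and its subtypes
    protected internal int Limit;  // subtypes OR same assembly (famorassem)
    private protected int Audit;   // subtypes AND same assembly (famandassem), C# 7.2+
    private int pin;               // declaring type only
}

public class SavingsAccount : Account
{
    // A subtype in the same assembly can reach all three of these.
    public int Total() => Balance + Limit + Audit;
}

public static class AccessDemo
{
    const BindingFlags All = BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic;

    public static FieldInfo Field(string name) => typeof(Account).GetField(name, All);

    public static void Main()
    {
        foreach (string name in new[] { "Id", "Branch", "Balance", "Limit", "Audit", "pin" })
        {
            // FieldAccessMask isolates the accessibility bits stored in metadata.
            FieldAttributes access = Field(name).Attributes & FieldAttributes.FieldAccessMask;
            Console.WriteLine($"{name,-8} {access}");
        }
    }
}
```

Printing the masked attributes shows the CLR-level names (Public, Assembly, Family, FamORAssem, FamANDAssem, Private). This makes it easy to confirm that protected internal is the union of the two conditions and private protected is their intersection.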
Assemblies scope the type definitions of a component. CLR types are uniquely identified by their assembly name/type name pair. This allows two definitions of the type Customer to coexist inside the runtime without ambiguity, provided that each one is affiliated with a different assembly. Although it is possible for multiple assemblies to define the type Customer without confusing the runtime, it does not help the programmer who wants to use two or more definitions of the same type name in a single program because the symbolic type name is always Customer no matter which assembly defines it. To address this limitation of most programming languages, CLR type names can have a namespace prefix. This prefix is a string that typically begins with either the organization name of the developer (e.g., Microsoft, AcmeCorp) or System if the type is part of the .NET Framework. An emerging convention is to name the assembly based on the namespace prefix. For example, the .NET XML stack is deployed in the System.Xml assembly, and all of the contained types use the System.Xml namespace prefix. This is simply a convention and not a rule. For example, the type System.Object resides in an assembly called mscorlib and not in the assembly called System, even though there actually is an assembly called System.
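Namespace prefixes let a single source file use two types whose simple name is Customer. The namespaces AcmeCorp.Billing and AcmeCorp.Shipping below are hypothetical. For brevity, both types live in one assembly rather than two. The using alias directives give each type a distinct local name.

```csharp
using System;
// Alias directives give each Customer an unambiguous name in this file.
using BillingCustomer = AcmeCorp.Billing.Customer;
using ShippingCustomer = AcmeCorp.Shipping.Customer;

namespace AcmeCorp.Billing
{
    public class Customer { public decimal Credit; }
}

namespace AcmeCorp.Shipping
{
    public class Customer { public string Address = ""; }
}

public static class NamespaceDemo
{
    public static void Main()
    {
        var b = new BillingCustomer();
        var s = new ShippingCustomer();
        Console.WriteLine(b.GetType().FullName); // AcmeCorp.Billing.Customer
        Console.WriteLine(s.GetType().FullName); // AcmeCorp.Shipping.Customer
        // The runtime identity also includes the defining assembly's name.
        Console.WriteLine(b.GetType().AssemblyQualifiedName);
    }
}
```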