Java Type Information and Reflection
- The Binary Class Format
- Reflection
- Reflective Invocation
- Dynamic Proxies
- Reflection Performance
- Package Reflection
- Custom Metadata
- Onward
- Resources
Java classes preserve a wealth of information about programmer intent. Rather than just containing a jumble of executable instructions, binary classes1 also contain large amounts of metadatadata that describes the structure of the binary class. Most of this metadata is type information enumerating the base class, superinterfaces, fields, and methods of the class. Type information is used to make the dynamic linking of code more reliable by verifying at runtime that clients and servers share a common view of the classes they use to communicate.
The presence of type information also enables dynamic styles of programming. You can introspect against a binary class to discover its fields and methods at runtime. Using this information, you can write generic services to add capabilities to classes that have not even been written yet.
The binary class format is a simple data structure that you could parse to perform introspection yourself. Rather than going to this trouble, you can use the Java Reflection API instead. Reflection provides programmatic access to most of the metadata in the binary class format. It also provides not only the ability to introspect classes for metadata, but also the ability to dynamically access fields and methods. Reflective invocation is critical for writing generic object services. As of SDK version 1.3, reflection also includes the ability to manufacture classes called dynamic proxies at runtime. This chapter introduces the binary class format, the uses of metadata, the Reflection API, dynamic proxies, and custom metadata.
3.1 The Binary Class Format
The binary class format means different things to different people. To an application developer, the binary class is the compiled output of a Java class. Most of the time, you can treat the class format as a black boxa detail that is thankfully hidden by the compiler. The binary class is also the unit of executable code recognized by the virtual machine. Virtual machine developers see the binary class as a data structure that can be loaded, interpreted, and manipulated by virtual machines and by Java development tools. The binary class is also the unit of granularity for dynamic class loading. Authors of custom class loaders take this view and may use their knowledge of the binary class format to generate custom classes at runtime. But most importantly, the binary class is a well-defined format for conveying class code and class metadata.
Most of the existing literature on the binary class format targets compiler and virtual machine developers. For example, the virtual machine specification provides a wealth of detail about the exact format of a binary class, plus a specific explanation of extensions that can legally be added to that format. For a Java developer, such detail is overkill. However, hidden in that detail is information that the virtual machine uses to provide valuable services, such as security, versioning, type-safe runtime linkage, and runtime type information. The availability and quality of these services is of great concern to all Java developers. The remainder of Section 3.1 will describe the information in the binary class format, and how that information is used by the virtual machine. Subsequent sections show you how you can use this information from your own programs.
3.1.1 Binary Compatibility
A clear example of the power of class metadata is Java's enforcement of binary compatibility at runtime. Consider the MadScientist class and its client class BMovie, shown in Listing 31. If you compile the two classes and then execute the BMovie class, you will see that the threaten method executes as expected. Now, imagine that you decide to ship a modified version of MadScientist with the threaten method removed. What happens if an old version of BMovie tries to use this new version of MadScientist?
In a language that does not use metadata to link methods at runtime, the outcome is poorly defined. In this particular case, the old version of BMovie probably would link to the first method in the object. Since threaten is no longer part of the class, blowUpWorld is now the first method. This program error would literally be devastating to the caller.
Listing 31 The MadScientist Class
public class MadScientist { public void threaten() { System.out.println("I plan to blow up the world"); } public void blowUpWorld() { throw new Error("The world is destroyed. Bwa ha ha ha!"); } } public class BMovie { public static void main(String [] args) { MadScientist ms = new MadScientist(); ms.threaten(); } }
As bad as this looks, an obvious failure is actually one of the best possible outcomes for version mismatches in a language without adequate metadata. Consider what might happen in a systems programming language, such as C++, that encodes assumptions about other modules as numeric locations or offsets. If these assumptions turn out to be incorrect at runtime, the resulting behavior is undefined. Instead of the desired behavior, some random method may be called, or some random class may be loaded. If the random method does not cause an immediate failure, the symptoms of this problem can be incredibly difficult to track down. Another possibility is that the code execution will transfer to some location in memory that is not a method at all. Hackers may exploit this situation to inject their own malicious code into a process.
Compare all the potential problems above with the actual behavior of the Java language. If you remove the threaten method, and recompile only the MadScientist class, you will see the following result:
>java BMovie java.lang.NoSuchMethodError at BMovie.main(BMovie.java:4)
If a class makes a reference to a nonexistent or invalid entity in some other class, that reference will trigger some subclass of IncompatibleClassChangeError, such as the NoSuchMethodError shown above. All of these exception types indirectly extend Error, so they do not have to be checked and may occur at any time. Java assumes fallible programmers, incomplete compile-time knowledge, and partial installations of code that change over time. As a result, the language makes runtime metadata checks to ensure that references are resolved correctly. Systems languages, on the other hand, tend to assume expert programmers, complete compile-time knowledge, and full control of the installation processes. The code that results from these may load a little faster than Java code, but it will be unacceptably fragile in a distributed environment.
In the earlier example, the missing method threaten caused the new version of MadScientist to be incompatible with the original version of BMovie. This is an obvious example of incompatibility, but some other incompatibilities are a little less obvious. The exact rules for binary class compatibility are enumerated in [LY99], but you will rarely need to consult the rules at this level. The rules all support a single, common-sense objective: no mysterious failures. A reference either resolves to the exact thing the caller expects, or an error is thrown; "exactness" is limited by what the caller is looking for. Consider these examples:
You cannot reference a class, method, or field that does not exist. For fields and methods, both names and types must match.
You cannot reference a class, method, or field that is invisible to you, for example, a private method of some other class.
Because private members are invisible to other classes anyway, changes to private members will not cause incompatibilities with other classes. A similar argument holds for package-private members if you always update the entire package as a unit.
You cannot instantiate an abstract class, invoke an abstract method, subclass a final class, or override a final method.
Compatibility is in the eye of the beholder. If some class adds or removes methods that you never call anyway, you will not perceive any incompatibility when loading different versions of that class.
Another way to view all these rules is to remember that changes to invisible implementation details will never break binary compatibility, but changes to visible relationships between classes will.
3.1.1.1 Declared Exceptions and Binary Compatibility
One of the few oddities of binary compatibility is that you can refer to a method or constructor that declares checked exceptions that you do not expect. This is less strict than the corresponding compile-time rule, which states that the caller must handle all checked exceptions. Consider the versions of Rocket and Client shown in Listing 32. You can only compile Client against version 1 of the Rocket since the client does not handle the exception thrown by version 2. At runtime, a Client could successfully reference and use either version because exception types are not checked for binary compatibility.
This loophole in the binary compatibility rules may be surprising, but it does not compromise the primary objective of preventing inexplicable failures. Consider what happens if your Client encounters the second version of Rocket. If and when the InadequateNationalInfrastructure exception is thrown, your code will not be expecting it, and the thread will probably terminate. Even though this may be highly irritating, the behavior is clearly defined, and the stack trace makes it easy to detect the problem and add an appropriate handler.
Listing 32 Checked Exceptions Are Not Enforced by the VM.
public class Client { Rocket r = new Rocket(); } public class Rocket { //version 1 public Rocket() { _ } } public class Rocket { //version 2 public Rocket() throws InadequateNationalInfrastructure { _ } }
3.1.1.2 Some Incompatible Changes Cannot Be Detected
The Java compiler enforces the rules of binary compatibility at compile time, and the virtual machine enforces them again at runtime. The runtime enforcement of these rules goes a long way toward preventing the accidental use of the wrong class. However, these rules do not protect you from bad decisions when you are shipping a new version of a class. You can still find clever ways to write new versions of classes that explode when called by old clients.
Listing 33 shows an unsafe change to a class that Java cannot prevent. Clients of the original version of Rocket expect to simply call launch. The second version of Rocket changes the rules by adding a mandatory preLaunchSafetyCheck. This does not create any structural incompatibilities with the version 1 clients, who can still find all the methods that they expect to call. As a result, old versions of the client might launch new rockets without the necessary safety check. If you want to rely on the virtual machine to protect the new version of Rocket from old clients, then you must deliberately introduce an incompatibility that will break the linkage. For example, your new version could implement a new and different Rocket2 interface.2
Listing 33 Some Legal Changes to a Class May Still Be Dangerous.
public interface Rocket { //version 1 public void launch(); } public interface Rocket { //version 2 public void mandatoryPreLaunchSafetyCheck(); public void launch(); }
3.1.2 Binary Class Metadata
[LY99] documents the exact format of a binary class. My purpose here is not to reproduce this information but to show what kinds of metadata the binary class includes. Figure 31 shows the relevant data structures that you can traverse in the binary class format. The constant pool is a shared data structure that contains elements, such as class constants, method names, and field names, that are referenced by index elsewhere in the class file. The other structures in the class file do not hold their own data; instead, they hold indexes into the constant pool. This keeps the overall size of the class file small by avoiding the repetition of similar data structures.
Figure 31 Metadata in the binary class format
The -superclass and -interfaces references contain indices into the constant pool. After a few levels of indirection, these indices eventually lead to the actual string names of the class's base class and superinterfaces. The use of actual string names makes it possible to verify at runtime that the class meets the contractual expectations of its clients.
Note that the class name format used by the virtual machine is different from the dotted notation used in Java code. The VM uses the "/" character as a package delimiter. Also, it often uses the "L" and ";" characters to delimit class names if the class name appears inside a stream where other types of data might also appear. So, the class java.lang.String will appear as either java/lang/String or Ljava/lang/String; in the class file's constant pool.
The fields and methods arrays also contain indices into the constant pool. Again, these constant pool entries lead to the actual string names of the referenced types, plus the string names of the methods and fields. If the referenced type is a primitive, the VM uses a special single-character string encoding for the type, as shown in Table 31. A method also contains a reference to the Java bytecodes that implement the method. Whenever these bytecodes refer to another class, they do so through a constant pool index that resolves to the string name of the referenced class. Throughout the virtual machine, types are referred to by their full, package qualified string names. Fields and methods are also referenced by their string names.
Table 31 Virtual Machine Type Names
Java Type |
Virtual Machine Name |
int |
I |
float |
F |
long |
J |
double |
D |
byte |
B |
boolean |
Z |
short |
S |
char |
C |
type[ ] |
[type |
package.SomeClass |
Lpackage.SomeClass; |
3.1.2.1 Analyzing Classes with javap
The details of binary class data structures are of interest to VM writers, and they are covered in detail in the virtual machine specification [LY99]. Fortunately, there are a large number of tools that will display information from the binary class format in a human-friendly form. The javap tool that ships with the SDK is a simple class decompiler. Consider the simple Echo1 class:
public class Echo1 { private static final String prefix = "You said: "; public static void main(String [] args) { System.out.println(prefix + args[0]); } }
If you run javap on the compiled Echo1 class, you will see output similar to Listing 34. As you can see, the class format contains the class names, the method names, and the parameter type names. The javap utility has a variety of more verbose options as well, including the c flag to display the actual bytecodes that implement each method, shown in Listing 35. Without worrying about what specific bytecodes do, you can easily see that the bytecode instructions refer to classes, fields, and members by name. The #10, #5, #1, and #8 in the output are the indices into the constant pool; javap helpfully resolves these indices so that you can see the actual strings being referenced.
Listing 34 Standard javap Output
>javap Echo Compiled from Echo1.java public class Echo1 extends java.lang.Object { public Echo1(); public static void main(java.lang.String[]); }
Listing 35 Javap Output with Bytecodes Included
>javap -c Echo1 {output clipped for brevity} Method void main(java.lang.String[]) 0 getstatic #10 <Field java.io.PrintStream out> 3 new #5 <Class java.lang.StringBuffer> 6 dup 7 ldc #1 <String "You said: "> 9 invokespecial #8 <Method java.lang.StringBuffer(java.lang.String)> etc_
3.1.3 From Binary Classes to Reflection
Java class binaries always contain metadata, including the string names for classes, fields, field types, methods, and method parameter types. This metadata is used implicitly to verify that cross-class references are compatible. Both metadata and the notion of class compatibility are built into the bones of the Java language, so there is no subterranean level where you can avoid their presence. By themselves, the binary compatibility checks provided by the virtual machine would be sufficient to justify the cost of creating, storing, and processing class metadata. In reality, these uses only scratch the surface. You can access the same metadata directly from within your Java programs using the Reflection API.