- Object Equality and Identity
- Object Hash Codes
- Object Cloning
Object Hash Codes
The designers of the FCL decided that it would be incredibly useful if any instance of any object could be placed into a hash table collection. To this end, System.Object provides a virtual GetHashCode method so that an Int32 hash code can be obtained for any and all objects.
If you define a type and override the Equals method, you should also override the GetHashCode method. In fact, Microsoft's C# compiler emits a warning if you define a type that overrides just one of these methods. For example, compiling the following type yields this warning: "warning CS0659: 'App' overrides Object.Equals(object o) but does not override Object.GetHashCode()."
class App { public override Boolean Equals(Object obj) { ... } }
The reason that a type must define both Equals and GetHashCode is that the implementation of the System.Collections.Hashtable type requires any two objects that are equal to have the same hash code value. So, if you override Equals, you should override GetHashCode to ensure that the algorithm you use for calculating equality corresponds to the algorithm you use for calculating the object's hash code.
Basically, when you add a key/value pair to a Hashtable object, a hash code for the key object is obtained first. This hash code indicates what "bucket" the key/value pair should be stored in. When the Hashtable object needs to look up a key, it gets the hash code for the specified key object. This code identifies the "bucket" that is now searched, looking for a stored key object that is equal to the specified key object. Using this algorithm of storing and looking up keys means that if you change a key object that is in a Hashtable, the Hashtable will no longer be able to find the object. If you intend to change a key object in a hash table, you should first remove the original object/value pair, next modify the key object, and then add the new key object/value pair back into the hash table.
Defining a GetHashCode method can be easy and straightforward. But, depending on your data types and the distribution of data, it can be tricky to come up with a hashing algorithm that returns a well-distributed range of values. Here's a simple example that will probably work just fine for Point objects:
class Point { Int32 x, y; public override Int32 GetHashCode() { return x ^ y; // x XOR'd with y } § }
When selecting an algorithm for calculating hash codes for instances of your type, try to follow these guidelines:
Use an algorithm that gives a good random distribution for the best performance of the hash table.
Your algorithm can also call the base type's GetHashCode method, including its return value in your own algorithm. However, you don't generally want to call Object or ValueType's GetHashCode method because the implementation in either method doesn't lend itself to high-performance hashing algorithms.
Your algorithm should use at least one instance field.
Ideally, the fields you use in your algorithm should be immutable; that is, the fields should be initialized when the object is constructed, and they should never again change during the object's lifetime.
Your algorithm should execute as quickly as possible.
Objects with the same value should return the same code. For example, two String objects with the same text should return the same hash code value.
System.Object's implementation of the GetHashCode method doesn't know anything about its derived type and any fields that are in the type. For this reason, Object's GetHashCode method returns a number that is guaranteed to uniquely identify the object within the AppDomain; this number is guaranteed not to change for the lifetime of the object. After the object is garbage collected, however, its unique number can be reused as the hash code for a new object.
System.ValueType's implementation of GetHashCode uses reflection and returns the hash code of the first instance field defined in the type. This is a nai[um]ve implementation that might be good for some value types, but I still recommend that you implement GetHashCode yourself. Even if your hash code algorithm returns the hash code for the first instance field, your implementation will be faster than ValueType's implementation. Here's what ValueType's implementation of GetHashCode looks like:
class ValueType { public override Int32 GetHashCode() { // Get this type's public/private instance fields. FieldInfo[] fields = this.GetType().GetFields( BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic); if (fields.Length > 0) { // Return the hash code for the first non-null field. for (Int32 i = 0; i < fields.Length; i++) { Object obj = field[i].GetValue(this); if (obj != null) return obj.GetHashCode(); } } // No non-null fields exist; return a unique value for the type. // NOTE: GetMethodTablePtrAsInt is internal, undocumented return GetMethodTablePtrAsInt(this); } }
If you're implementing your own hash table collection for some reason or you're implementing any piece of code in which you'll be calling GetHashCode, you should never persist hash code values. The reason is that hash code values are subject to change. For example, a future version of a type might use a different algorithm for calculating the object's hash code.