12.6 unordered_map
The cost of a map lookup is O(log(n)) where n is the number of elements in the map. That’s pretty good. For example, for a map with 1,000,000 elements, we perform only about 20 comparisons and indirections to find an element. However, in many cases, we can do better by using a hashed lookup rather than a comparison using an ordering function, such as <. The standard-library hashed containers are referred to as “unordered” because they don’t require an ordering function:
For example, we can use an unordered_map from <unordered_map> for our phone book:
unordered_map<string,int> phone_book { {"David Hume",123456}, {"Karl Popper",234567}, {"Bertrand Arthur William Russell",345678} };
Like for a map, we can subscript an unordered_map:
int get_number(const string& s) { return phone_book[s]; }
The standard library provides a default hash function for strings as well as for other built-in and standard-library types. If necessary, we can provide our own. Possibly, the most common need for a custom hash function comes when we want an unordered container of one of our own types. A hash function is often implemented as a function object (§7.3.2). For example:
struct Record { string name; int product_code; // ... }; struct Rhash { // a hash function for Record size_t operator()(const Record& r) const { return hash<string>()(r.name) ^ hash<int>()(r.product_code); } }; unordered_set<Record,Rhash> my_set; // set of Records using Rhash for lookup
Designing good hash functions is an art and often requires knowledge of the data to which it will be applied. Creating a new hash function by combining existing hash functions using exclusive-or (^) is simple and often very effective. However, be careful to ensure that every value that takes part in the hash really helps to distinguish the values. For example, unless you can have several names for the same product code (or several product codes for the same name), combining the two hashes provides no benefits.
We can avoid explicitly passing the hash operation by defining it as a specialization of the standard-library hash:
namespace std { // make a hash function for Record template<> struct hash<Record> { using argument_type = Record; using result_type = size_t; result_type operator()(const Record& r) const { return hash<string>()(r.name) ^ hash<int>()(r.product_code); } }; }
Note the differences between a map and an unordered_map:
A map requires an ordering function (the default is <) and yields an ordered sequence.
A unordered_map requires an equality function (the default is ==); it does not maintain an order among its elements.
Given a good hash function, an unordered_map is much faster than a map for large containers. However, the worst-case behavior of an unordered_map with a poor hash function is far worse than that of a map.