Fast Lookup
In a dynamic-language program, much of the slowness comes from the extra layer of indirection involved in method calls. When you call a function in a C program, the function's address is known at link time or, at worst, loaded from a single known location. In a JavaScript program, the mapping is a lot more complicated: when you call a method on a JavaScript object, you need to look up the method with that name in the dictionary associated with that object, and this lookup can be very slow.
In a naïve implementation, the mapping from method names to implementations might be performed in something like a tree, using string comparisons to find the correct entry. To my knowledge, no one does this (except students implementing a crude object model as a programming assignment). Lisp introduced the notion of atoms, and the same concept has been picked up by other languages; in Smalltalk they're called symbols, and when they're used for identifying methods, they're commonly called selectors. An atom is an integer value representing a string. Often this is just the pointer to the address at which the string is stored (for fast lookup of the string value), but it doesn't have to be that way. The only requirement is that two atoms with the same name have the same number (and two atoms with different names have different numbers). You can implement this arrangement quite easily with a simple array, adding new atoms at the end of the array and using their indexes as their numerical values.
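To make the idea concrete, here's a minimal sketch of selector interning in C. The table layout and the `intern_selector` name are invented for illustration, and a real runtime would hash the strings rather than scanning them linearly:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical selector table: the index of a name in this array is its
 * selector number.  Interning the same name twice returns the same number. */
static const char **selector_names;
static uint32_t     selector_count;
static uint32_t     selector_capacity;

uint32_t intern_selector(const char *name)
{
    /* Linear scan for simplicity; a real runtime would hash the string. */
    for (uint32_t i = 0; i < selector_count; i++) {
        if (strcmp(selector_names[i], name) == 0)
            return i;               /* same name, same number */
    }
    if (selector_count == selector_capacity) {
        selector_capacity = selector_capacity ? selector_capacity * 2 : 16;
        selector_names = realloc(selector_names,
                                 selector_capacity * sizeof(*selector_names));
    }
    selector_names[selector_count] = strdup(name);
    return selector_count++;        /* new name, new number */
}
```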
This setup simplifies lookup slightly: now you're mapping a number, rather than a string, to a method. At the very least, comparisons between selectors are now very fast, so you can navigate a binary tree to reach the correct value, or use some other sparse-array structure.
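One simple realization of this is a per-class method table kept sorted by selector number and searched with a binary search, so the dispatch path needs only integer comparisons. The structures below are a hypothetical sketch that builds on the selector numbers from the previous example, not any particular VM's layout:

```c
#include <stdint.h>
#include <stddef.h>

typedef void (*method_imp)(void *self /*, args... */);

/* One selector-to-implementation mapping. */
struct method_entry {
    uint32_t   selector;   /* interned selector number */
    method_imp imp;
};

/* A class's dispatch table, kept sorted by selector number. */
struct method_table {
    struct method_entry *entries;
    size_t               count;
};

/* Binary search over selector numbers: integer comparisons only,
 * no string comparisons on the dispatch path. */
method_imp lookup_method(const struct method_table *table, uint32_t selector)
{
    size_t lo = 0, hi = table->count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (table->entries[mid].selector < selector)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo < table->count && table->entries[lo].selector == selector)
        return table->entries[lo].imp;
    return NULL;   /* not found; fall back to the superclass or an error */
}
```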
Even with a good sparse array, a method lookup costs something like 3–4 times as much as a function call. How do we make this faster? The same way we make anything go faster in computing: we add a cache.
At each call site, you're typically invoking a method with the same name, but potentially on different objects. Similarly, a single object may have multiple methods called on it. In each case, you're performing a mapping from an object and a method name to a method, and one of the two inputs is fixed. This observation gives two potential places for caching a method lookup:
- The first place to put the cache is on the object (or, more commonly, on the class). For a lot of classes in a typical program, only a small number of methods are invoked frequently. For example, the insert and lookup methods of an array class will be accessed a lot more frequently than most of its other methods. If you can speed up a small number of frequently used methods, you can make the entire program run faster.
- The other place to put the cache is at each call site. Remember that each call site is looking up a single method name, so this cache works the other way around: rather than storing a set of selector-to-method mappings, it stores a set of object-to-method mappings. The simplest approach is to keep a single-entry cache of the last method called, as sketched in the code after this list. Instead of performing a full lookup, you just test whether this is the same object as last time, by comparing its pointer to the cached one.
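Here's a minimal sketch of that single-entry, per-call-site cache, reusing the hypothetical `method_table` and `lookup_method` from the earlier sketch. A real VM would patch this logic directly into the generated code at the call site rather than keeping it in a separate structure:

```c
/* Hypothetical single-entry inline cache, one per call site.
 * It remembers the last receiver and the method found for it. */
struct inline_cache {
    void      *last_receiver;   /* compared as a pointer, never dereferenced */
    method_imp cached_imp;
};

method_imp call_site_lookup(struct inline_cache *ic,
                            void *receiver,
                            const struct method_table *table,
                            uint32_t selector)
{
    /* Fast path: same receiver as last time, so reuse the cached method. */
    if (ic->last_receiver == receiver && ic->cached_imp != NULL)
        return ic->cached_imp;

    /* Slow path: full lookup, then refill the cache. */
    method_imp imp = lookup_method(table, selector);
    ic->last_receiver = receiver;
    ic->cached_imp = imp;
    return imp;
}
```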
A simple option is to have a very small hash table, where the key is some bits from the selector (the number representing the method name). If your selectors are 32-bit string pointers, you can select four bits from somewhere in the middle and get a reasonable hash, which is then used as an index into a 16-element array of pointers to structures containing method names and pointers to the methods. This technique lets you perform lookups for recently used methods very quickly. If the method is not found in the cache, you can add it after doing the slower lookup by just setting the pointer; because only a single word is being written, no locking is needed. If two threads write to the same cache entry at the same time, only one write will survive, but that doesn't matter: the loser simply takes the slow path again the next time.
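A sketch of that arrangement might look like the following, assuming the cache hangs off the class and reusing the method-table structures from earlier; the shift amount and the helper names are illustrative choices, not a description of any specific runtime:

```c
/* Hypothetical 16-entry per-class method cache.  Each slot holds a pointer
 * to an existing {selector, imp} entry, so a refill writes only a single
 * word and needs no locking. */
struct method_cache {
    const struct method_entry *slots[16];
};

/* Full lookup returning the table entry itself (a variant of lookup_method
 * above), so the cache can store a single pointer per slot. */
static const struct method_entry *
find_entry(const struct method_table *table, uint32_t selector)
{
    for (size_t i = 0; i < table->count; i++)
        if (table->entries[i].selector == selector)
            return &table->entries[i];
    return NULL;
}

method_imp cached_lookup(struct method_cache *cache,
                         const struct method_table *table,
                         uint32_t selector)
{
    /* Hash: take four bits from somewhere in the middle of the selector. */
    unsigned index = (selector >> 6) & 0xF;
    const struct method_entry *e = cache->slots[index];

    if (e != NULL && e->selector == selector)
        return e->imp;                  /* hit: skip the full lookup */

    /* Miss: do the slow lookup, then refill with a single pointer store. */
    const struct method_entry *found = find_entry(table, selector);
    if (found != NULL)
        cache->slots[index] = found;    /* racy, but losing the race is harmless */
    return found ? found->imp : NULL;
}
```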
You can reuse the same cache entry for any pair of objects that have the same selector-to-method mapping, which is what makes the hidden class transform so useful: if two objects have the same class, they can share a cache entry, which makes the cache much more effective. A particular call site is far more likely to be reached twice in a row with instances of the same class than with the same object.
This kind of inline caching works fairly well, but the Self team took it one step further with polymorphic inline caching, which stores a small number of class-to-method mappings at each call site. The ideal number varies from one call site to another. The larger the cache, the more expensive it is to search, and beyond a certain size it's cheaper to perform a complete lookup than to test the cache. For some pathological cases, it may be better to have no cache at all: if the cache misses most of the time, you're still paying to search it but getting nothing in return. In a tight loop calling the same method on the same object, a small cache may give the best results. You can find the optimal size by profiling the code as it runs. A virtual machine can do this much more easily than statically compiled code can: run it with one set of guesses, and recompile the bits where those guesses turned out to be wrong.
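A polymorphic inline cache along these lines might be modeled as follows. In Self and its descendants, the cache is really a stub of generated machine code that's rewritten as new classes are seen; this C sketch, which assumes each object carries a pointer to its (hidden) class and reuses the earlier method-table structures, is only a data-structure model of the idea:

```c
#define PIC_MAX 4   /* assumed small bound; the best size varies per call site */

/* Hypothetical class and object headers: every object points at its class. */
struct class_info {
    struct method_table table;
};

struct object {
    const struct class_info *isa;
};

/* One class-to-method mapping observed at this call site. */
struct pic_entry {
    const struct class_info *klass;
    method_imp               imp;
};

struct polymorphic_cache {
    struct pic_entry entries[PIC_MAX];
    unsigned         used;
};

method_imp pic_lookup(struct polymorphic_cache *pic,
                      struct object *receiver, uint32_t selector)
{
    const struct class_info *klass = receiver->isa;

    /* Check the classes this call site has already seen. */
    for (unsigned i = 0; i < pic->used; i++)
        if (pic->entries[i].klass == klass)
            return pic->entries[i].imp;

    /* Miss: full lookup, and record the new class if there's room. */
    method_imp imp = lookup_method(&klass->table, selector);
    if (pic->used < PIC_MAX) {
        pic->entries[pic->used].klass = klass;
        pic->entries[pic->used].imp = imp;
        pic->used++;
    }
    return imp;
}
```

Once the cache fills up (or a call site turns out to be megamorphic), a VM can fall back to the full lookup or switch that call site to a different strategy, which is exactly the kind of decision that runtime profiling can inform.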
In the conclusion of this series, we'll look at some techniques for speeding up the code generated when JavaScript is compiled.