Making JavaScript Fast, Part 1
Over a decade ago, Netscape changed the face of the Web by providing a scripting language that ran in the browser. Suddenly, it became possible to run programs in web pages. The JavaScript language is heavily influenced by Self, but while Self is one of the fastest dynamic languages around, early JavaScript implementations were very slow. Most of the time, people did nothing more with JavaScript than animate menus and make things go "Ding."
More recently, this situation has changed. The concept of a web application has gained traction, and this new breed of applications does a lot more on the client side, using JavaScript, now all grown up and standardized as ECMAScript.
When a web page was just using JavaScript for animating menus, most of the CPU time was spent in the browser's rendering engine, even with a very slow JavaScript implementation. Now, things like Google Spreadsheets are running large numbers of calculations, and people are starting to notice when the script is slow. For this reason, the latest browsers are all boasting about how fast their JavaScript implementations are. A lot of the new-and-exciting improvements are implementations of research done in the 1980s on languages like Smalltalk and Self. In this series, we'll take a look at exactly how they work.
Interpreting or Compiling?
Anyone who has worked on a big software project can tell you that compiling a program takes time. A compiler first has to take a long stream of text and split it into tokens: identifiers, special symbols, keywords, and so on. It then takes this token stream and turns it into a parse tree, which is a concrete representation of the syntactic structure of the program. For example, the for statement in C-like languages is a simple structure containing pointers to the initializer, test, increment, and loop body.
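For concreteness, a parser might represent that for statement with something like the following node. The shapes and field names here are hypothetical, not taken from any real engine:

```javascript
// Hypothetical parse-tree node for: for (var i = 0; i < 10; i++) { total += i; }
// The node shapes and field names are invented for illustration.
const forNode = {
  type: "ForStatement",
  init: { type: "VariableDeclaration", name: "i",
          value: { type: "Literal", value: 0 } },
  test: { type: "BinaryExpression", operator: "<",
          left: { type: "Identifier", name: "i" },
          right: { type: "Literal", value: 10 } },
  update: { type: "UpdateExpression", operator: "++",
            argument: { type: "Identifier", name: "i" } },
  body: { type: "BlockStatement", statements: [ /* ... */ ] }
};
```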
Next, the compiler turns the parse tree into an abstract syntax tree (AST). This is similar to the parse tree, but removes syntactic sugar. Typically, an AST might restrict itself to a single type of loop, for example. Things like the add-and-assign operation (+=) might be represented by a single node in the parse tree, pointing to the expression and the target, but in the AST would be represented by separate addition and assignment nodes. The line is blurred somewhat in languages such as Lisp, Self, Smalltalk, or Io, where the concrete syntax of the language closely matches the abstract syntax, but a language like JavaScript often has several ways of expressing the same thing, so the translation from concrete syntax to abstract syntax can be quite time-consuming.
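As a sketch of the difference, here is how total += i might look in each form, again with invented node shapes:

```javascript
// Parse tree: one node that mirrors the source syntax directly.
const parseNode = {
  type: "AddAssignExpression",                     // +=
  target: { type: "Identifier", name: "total" },
  expression: { type: "Identifier", name: "i" }
};

// AST: the sugar is expanded into separate assignment and addition nodes.
const astNode = {
  type: "Assignment",
  target: { type: "Identifier", name: "total" },
  value: {
    type: "Add",
    left: { type: "Identifier", name: "total" },   // the original value
    right: { type: "Identifier", name: "i" }       // the expression to add
  }
};
```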
The simplest kind of interpreter runs the AST directly. Each AST node might be represented by an object that implements something like a "run in context" method. To run a program, you perform a traversal of the tree, with each AST node recursively evaluating its children. This process is very slow. For example, to run the add-and-assign operation, you would first call the method on the assign node. This would then call it on the add node, which has two children: one holding the original value, the other the expression to add. You end up making all of these method calls for a simple statement that, in compiled code, would be a small handful of instructions.
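A minimal sketch of such a tree-walking interpreter, assuming node shapes like the ones above, might look like this:

```javascript
// A minimal tree-walking interpreter. "Context" is just an object
// mapping variable names to values.
class Identifier {
  constructor(name) { this.name = name; }
  run(context) { return context[this.name]; }
}

class Add {
  constructor(left, right) { this.left = left; this.right = right; }
  run(context) {
    // Recursively evaluate both children: two more method calls.
    return this.left.run(context) + this.right.run(context);
  }
}

class Assignment {
  constructor(target, value) { this.target = target; this.value = value; }
  run(context) {
    return context[this.target.name] = this.value.run(context);
  }
}

// total += i, desugared into assignment and addition nodes:
const stmt = new Assignment(
  new Identifier("total"),
  new Add(new Identifier("total"), new Identifier("i")));
console.log(stmt.run({ total: 40, i: 2 })); // 42
```

Evaluating even this one-line statement costs four dynamic method dispatches; compiled code would do the same work in a couple of machine instructions.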
Why do people implement languages this way? This approach has two advantages:
- It's easy. Code that's easy to write is easy to make bug-free. It's also very small, which was very important on older browsers that were expected to run on computers with 4MB (or less) of RAM.
- It gives very quick load times.
Load times are very important on a web page. If a desktop application takes 30 seconds to compile, no one cares. The author or packager will do the compiling, or maybe the end user when installing, but then it won't need to be done again. If compiling makes the program run faster, spending a bit of time at the start is a huge overall gain, which is why most desktop software is still compiled (although this situation is changing).
In a web browser, the perception of those 30 seconds is very different. Imagine waiting 30 seconds after a web page has downloaded before its scripts start working. On many sites, the user would give up and abandon the page before compiling finished. Worse, the scripts may only run for a small fraction of a second of CPU time. Even if compiling them takes just one second, that may still be more time spent in the compiler than an AST interpreter would need to run the script.
Although dynamic languages have been compiled for a while, many still use a halfway step, somewhere between interpreting an AST and compiling. In this technique, the AST is compiled into something called bytecode. This term is often used fairly generally now to mean machine code for any virtual machine, but it technically means machine code for an instruction set where every operation code (opcode) is one byte.
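As a sketch, assuming an invented stack-machine instruction set (the opcode names and encodings below are made up for illustration), total += i might compile to something like this:

```javascript
// Hypothetical one-byte opcodes for a stack machine.
const OP_LOAD  = 0x01;  // push the named variable onto the stack
const OP_ADD   = 0x02;  // pop two values, push their sum
const OP_STORE = 0x03;  // pop a value into the named variable

// Each opcode is one byte; operands (here, indexes into a table of
// variable names) follow it in the instruction stream.
const names = ["total", "i"];
const code = [
  OP_LOAD, 0,   // push total
  OP_LOAD, 1,   // push i
  OP_ADD,       // total + i
  OP_STORE, 0,  // total = ...
];
```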
The advantage of one-byte opcodes is obvious: It's easy to implement them with a simple jump table. A bytecode interpreter looks like a massive switch statement. Each instruction is run by loading it, adding its value to a fixed base address, and jumping to the resulting location. The jump target contains code to run the instruction and then jump back to the top of the switch statement for the next instruction.
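Here is a minimal dispatch loop for the invented instruction set above. JavaScript has no computed goto, so the jump table becomes a literal switch, but the structure is the same:

```javascript
// Opcodes repeated from the previous sketch so this runs on its own.
const OP_LOAD = 0x01, OP_ADD = 0x02, OP_STORE = 0x03;

function run(code, names, context) {
  const stack = [];
  let pc = 0;                       // program counter
  while (pc < code.length) {
    switch (code[pc++]) {           // load the instruction, then dispatch
      case OP_LOAD:                 // push the named variable
        stack.push(context[names[code[pc++]]]);
        break;
      case OP_ADD:                  // pop two values, push their sum
        stack.push(stack.pop() + stack.pop());
        break;
      case OP_STORE:                // pop a value into the named variable
        context[names[code[pc++]]] = stack.pop();
        break;
      default:
        throw new Error("unknown opcode");
    }
    // Each case falls back to the top of the loop for the next instruction.
  }
  return context;
}

// total += i, with total = 40 and i = 2:
const result = run([OP_LOAD, 0, OP_LOAD, 1, OP_ADD, OP_STORE, 0],
                   ["total", "i"], { total: 40, i: 2 });
console.log(result.total); // 42
```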
Bytecode interpreters are (usually) slower than compiled code, but not by much. A bytecode instruction often maps to a short sequence of machine instructions, so the overhead from the jump table is relatively small. A well-designed bytecode can come close to half the speed of compiled code, although 1/10 the speed is more common in real-world use. Still a lot slower, but generating bytecode from an AST is usually very fast, so the startup time is low and performance is "good enough."