- Polymorphic Instructions and Typed Memory
- User Mode Traps
- Hardware Sparse Arrays
- Mondrian Memory Protection
- Message Passing Primitives
- Conclusions
Message Passing Primitives
It's not unique to dynamic languages, strictly speaking, but one thing I'd love to see is better hardware support for message passing. In Erlang (a dynamic language based on the communicating sequential processes formalism), it's common to have a few thousand—or more—parallel processes. The only way of communicating between these processes is by message passing. This situation may seem familiar to Smalltalk programmers, who have a few thousand objects that can communicate only via message passing.
The main difference between Smalltalk and Erlang is that Smalltalk delivers messages synchronously, but Erlang delivers them asynchronously. A synchronous send blocks the sender until the receiver has finished handling the message, which is exactly what a function call does, so Smalltalk messages map nicely to function calls. An Erlang send returns as soon as the message has been queued for the receiver, so Erlang messages don't. By the way, there have been a few extensions to Smalltalk based on the actor model to allow asynchronous message sending (I wrote one for Smalltalk and Objective-C, which is part of the EtoileFoundation framework), so this isn't solely an Erlang problem.
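To make the contrast concrete, here is a minimal sketch in Python (neither Smalltalk nor Erlang, so treat it as an analogy only). The class names `SyncCounter` and `AsyncCounter`, the `("add", n)` message shape, and the `"stop"` sentinel are all invented for illustration. The synchronous receiver is an ordinary object whose "message" is a method call; the asynchronous one owns a mailbox and handles messages in its own thread, so the sender never waits for a result.

```python
import queue
import threading


class SyncCounter:
    """Smalltalk-style receiver: sending a message is just a method call."""

    def __init__(self):
        self.total = 0

    def add(self, n):
        self.total += n
        return self.total  # the sender is blocked until this returns


class AsyncCounter:
    """Erlang-style process (hypothetical): a mailbox plus its own thread."""

    def __init__(self):
        self.total = 0
        self.mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def send(self, message):
        # Enqueue and return immediately; there is no reply value.
        self.mailbox.put(message)

    def _run(self):
        # Handle messages one at a time, in arrival order.
        while True:
            message = self.mailbox.get()
            if message == "stop":
                break
            tag, n = message
            if tag == "add":
                self.total += n

    def join(self):
        self._thread.join()


def demo_sync_vs_async():
    sync = SyncCounter()
    sync_result = sync.add(5)  # result is available as soon as the call returns

    proc = AsyncCounter()
    proc.send(("add", 5))  # both sends return before the work is done
    proc.send(("add", 7))
    proc.send("stop")
    proc.join()  # wait for the mailbox to drain before reading the state
    return sync_result, proc.total
```

The asynchronous version needs a queue between sender and receiver, and that queue is the part that would benefit from hardware support.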
The VMS PALcode for the Alpha had a set of atomic add-to-queue instructions. PALcode on the Alpha was a really great invention, and one I'd love to see on other chips. A PALcode instruction was partway between a privileged instruction and a system call. New PALcode could be loaded only by privileged processes, and it defined a set of new instructions. When you invoked one of these, a short snippet of code would run in a privileged mode with access to a few hidden registers. Most of the Alpha's privileged instruction set was implemented in terms of PALcode. This setup allowed the privileged set to look like x86 when running Windows NT, like a VAX when running VMS, and like a PDP-11 when running UNIX.
The Alpha's add-to-queue instructions were simple atomic memory operations. The instruction would lock a memory word containing an address, write a value at this address, increment the address, and then release the lock. In a modern multicore architecture, cache concerns make an efficient implementation of this arrangement much trickier. If you're writing a massively parallel program of the sort Erlang encourages, it's quite likely that the process receiving the message will be on the same CPU (possibly even on the same core) as the process sending it. You really don't want to send the message via main memory.
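The four steps of the add-to-queue operation (lock the word, write through it, increment it, unlock) can be simulated in software. The sketch below is not PALcode and says nothing about how the Alpha implemented the instructions. It models memory as a Python list, uses a `threading.Lock` in place of the hardware lock, and invents the names `QueueMemory`, `tail_word`, and `add_to_queue` for illustration. Word 0 is assumed to hold the address of the next free slot.

```python
import threading


class QueueMemory:
    """Toy word-addressed memory with one queue-tail word (illustrative only)."""

    def __init__(self, size, tail_word=0, first_slot=1):
        self.words = [0] * size
        self.tail_word = tail_word
        self.words[tail_word] = first_slot  # address of the next free slot
        self._lock = threading.Lock()  # stands in for the hardware lock

    def add_to_queue(self, value):
        with self._lock:  # 1. lock the word holding the address
            address = self.words[self.tail_word]
            if address >= len(self.words):
                raise OverflowError("queue region is full")
            self.words[address] = value  # 2. write the value at that address
            self.words[self.tail_word] = address + 1  # 3. increment the address
        # 4. the lock is released when the with-block exits


def demo_add_to_queue(senders=4, per_sender=250):
    memory = QueueMemory(size=1 + senders * per_sender)

    def sender(sender_id):
        for i in range(per_sender):
            # Each sender enqueues values that no other sender uses.
            memory.add_to_queue(sender_id * per_sender + i)

    threads = [threading.Thread(target=sender, args=(s,)) for s in range(senders)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return memory
```

Because the read, write, and increment happen under one lock, concurrent senders never overwrite each other's slots. In this simulation every sender contends for the same tail word, which hints at why a multicore implementation has to think hard about where that word lives in the cache hierarchy.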
With Mondrian memory protection and a good caching strategy, this technique might not give any benefits, although Intel has implemented a simplified version of it. The new message-passing interrupt mechanisms allow an interrupt to name a virtual CPU ID as its recipient, and they integrate with the new virtualization functions to determine where the interrupt should end up.