Emulation to the Rescue
The solution to the problem of running binaries intended for other architectures is emulation. The Alpha version of Windows NT shipped with an emulator called FX!32, from DEC. This had two components. The first was a simple emulator that read each instruction and emulated it as it was encountered. It also recorded profiling traces of the application and any of its linked libraries. The second part was an offline binary translator, which ran in the background and produced an Alpha binary that could be run the next time, meaning that the second time that a user ran an x86 program it was a lot faster.
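The first component described above, an interpreter that also records a profile, can be illustrated with a toy example. This is only a sketch: the instruction set, the `PROGRAM` listing, and the `hot_addresses` helper are invented for illustration and have nothing to do with real x86 decoding or with FX!32's actual profile format.

```python
from collections import Counter

# A toy "guest" program for a hypothetical accumulator machine.
# Each entry is (opcode, operand); list indices act as addresses.
PROGRAM = [
    ("load", 0),     # acc = 0
    ("add", 5),      # acc += 5
    ("dec_jnz", 1),  # counter -= 1; jump to address 1 while counter != 0
    ("halt", 0),
]


def interpret(program, counter, profile):
    """Execute one instruction at a time, counting visits to each address."""
    acc, pc = 0, 0
    while True:
        op, arg = program[pc]
        profile[pc] += 1  # the profiling trace a later translator would read
        if op == "load":
            acc, pc = arg, pc + 1
        elif op == "add":
            acc, pc = acc + arg, pc + 1
        elif op == "dec_jnz":
            counter -= 1
            pc = arg if counter != 0 else pc + 1
        elif op == "halt":
            return acc
        else:
            raise ValueError(f"unknown opcode: {op}")


def hot_addresses(profile, threshold):
    """Addresses executed at least `threshold` times: candidates for translation."""
    return sorted(pc for pc, count in profile.items() if count >= threshold)
```

Calling `interpret(PROGRAM, 3, profile)` with a fresh `Counter()` as `profile` runs the loop body three times, and `hot_addresses(profile, 3)` then picks out the two loop instructions. An offline translator would work from a record like this rather than re-analysing the whole binary.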
Offline binary translation was chosen for the Alpha because it meant that the translator was not competing for CPU time and memory with the emulator. Modern emulators have different constraints. A modern mobile phone has more RAM than an expensive Alpha workstation running NT 4. Most of the Cortex-A9 devices have two cores (some will have more), so running a binary translator in the background doesn't take CPU away from single-threaded x86 apps.
The most important feature of FX!32 was its ability to mix native and emulated code. This is the reason that emulators went from giving around 10 percent of native performance to giving 50-75 percent with very little additional effort. Whenever you called a function in one of the standard Windows DLLs, the emulator called a stub version in jacket.dll, which collected the x86 versions of the parameters from the emulator and constructed an Alpha call frame.
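The jacket idea can be sketched in a few lines. Everything here is an assumption made for illustration: the `GuestState` class, the stack layout (return address at the stack pointer, 32-bit arguments above it), the `TextWidth` import name, and the `native_text_width` routine are all hypothetical, and this is not how jacket.dll itself was written.

```python
import struct


class GuestState:
    """Hypothetical emulated x86 state: flat memory, a stack pointer, one result register."""

    def __init__(self, memory, esp):
        self.memory = bytearray(memory)
        self.esp = esp
        self.eax = 0

    def read_u32(self, addr):
        # x86 is little-endian, so guest values are unpacked as "<I".
        return struct.unpack_from("<I", self.memory, addr)[0]


def native_text_width(char_count, glyph_width):
    """Stands in for a system library routine compiled for the host CPU."""
    return char_count * glyph_width


def make_jacket(native_fn, arg_count):
    """Build a stub that moves guest arguments into a native call."""

    def jacket(state):
        # Assumed layout: [esp] holds the return address, arguments follow it.
        args = [state.read_u32(state.esp + 4 + 4 * i) for i in range(arg_count)]
        # Call the native routine, then hand the result back in the guest's eax.
        state.eax = native_fn(*args) & 0xFFFFFFFF

    return jacket


# The emulator would consult a table like this when guest code calls an import.
IMPORT_TABLE = {"TextWidth": make_jacket(native_text_width, 2)}
```

With a guest stack built by `struct.pack("<III", 0xDEADBEEF, 12, 7)` and `esp=0`, calling `IMPORT_TABLE["TextWidth"](state)` leaves 84 in `state.eax`. Only the argument marshalling runs through the emulator; the library routine itself runs at native speed.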
This same trick is used by Transitive Technology's emulator, which was licensed to Apple and distributed under the Rosetta brand name. Things like laying out text and rendering are all done by system libraries. When PowerPC code on an Intel Mac calls any of these, it calls the native version. The cost of the function call is slightly greater, but the cost of the drawing, which is one of the most CPU-intensive things a typical desktop app does, is exactly the same as if a native application had done it.
Modern emulators do binary translation in the background, so a hot path through the code is turned into native code for the next time that it's run. This is exactly the same technique used by things like the HotSpot JVM, which interprets Java bytecodes for much of the program, but compiles frequently run bytecode sequences to native code.
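A minimal sketch of the hot-path approach follows. The `TieredRunner` class, the two-operation block format, and the `HOT_THRESHOLD` value are assumptions chosen for illustration. The sketch also translates synchronously, where a real emulator would hand the work to a background thread, and it "compiles" to a Python function rather than to machine code.

```python
HOT_THRESHOLD = 3  # assumed value; real systems tune this empirically


class TieredRunner:
    """Interpret cold blocks; translate a block once it has run often enough."""

    SYMBOLS = {"add": "+", "mul": "*"}

    def __init__(self, blocks):
        self.blocks = blocks    # block id -> list of (op, integer operand)
        self.counts = {}        # block id -> number of interpreted runs
        self.translated = {}    # block id -> compiled callable

    def _interpret(self, ops, value):
        # Slow path: decode every operation on every run.
        for op, arg in ops:
            if op == "add":
                value += arg
            elif op == "mul":
                value *= arg
            else:
                raise ValueError(f"unknown op: {op}")
        return value

    def _translate(self, ops):
        # Fast path: build one expression for the whole block and compile it once.
        expr = "v"
        for op, arg in ops:
            expr = f"({expr} {self.SYMBOLS[op]} {int(arg)})"
        return eval(f"lambda v: {expr}")

    def run(self, block_id, value):
        fast = self.translated.get(block_id)
        if fast is not None:
            return fast(value)
        self.counts[block_id] = self.counts.get(block_id, 0) + 1
        if self.counts[block_id] >= HOT_THRESHOLD:
            self.translated[block_id] = self._translate(self.blocks[block_id])
        return self._interpret(self.blocks[block_id], value)
```

The first three calls to `run` for a given block go through the interpreter; later calls use the compiled version and skip the per-operation decoding, while producing the same result.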
In theory, the output from the binary translation can be the same speed as native code. In some experiments, it's even faster. A project by SGI in the '90s ran a MIPS dynamic translator in MIPS, and got about a 10 percent speedup over running the code directly, because the dynamic translator could do profile-driven optimizations.
In practice, it's usually slightly slower, and made worse by the fact that emulators don't usually bother using dynamic translation for all of the code.
The Alpha could easily get by with emulation performance at around 50 percent of native, because the CPU was about twice the speed of the fastest x86 chip you could buy, so even poorly performing emulated code ran as fast as it would on a midrange Intel system.
The situation is a bit different for ARM. Typically, ARM chips are slower than x86 CPUs. They're competitive with Intel's Atom line, but not with anything else in terms of pure performance. Much of the performance of a typical ARM system comes from offloading CPU-intensive tasks like video and audio encoding or decoding to an on-die DSP or GPU. Applications that use the Windows Media Player APIs for decoding video, for example, should perform very well under ARM because they'll use the native codecs, which will be implemented on the DSP.
Other CPU-bound tasks may be less fortunate. Even if they run in an emulator that gives 100 percent native performance, they won't exactly be speed demons. That said, while I've been writing this, on my four-year-old laptop, I checked my CPU load: I have 108 processes currently running, and my CPU load has not gone over 20 percent. Even if an emulator only got 25 percent native performance, the only way I'd notice would be if I actually checked the graph.
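The arithmetic behind the Alpha and laptop figures quoted above can be written out directly. This is a back-of-the-envelope model, not a measurement: it assumes that throughput scales linearly with emulation efficiency and, as a worst case, that all of the observed CPU load comes from emulated code. The function names are invented for this sketch.

```python
def emulated_speed(host_speed, efficiency):
    """Effective speed of emulated code, relative to the reference x86 chip."""
    return host_speed * efficiency


def emulated_load(native_load, efficiency):
    """CPU load needed to do the same work under emulation (worst case)."""
    return native_load / efficiency


# Alpha: a host about twice as fast as the fastest x86, emulating at 50 percent.
alpha_effective = emulated_speed(2.0, 0.5)

# Laptop: a 20 percent peak load, with an emulator at 25 percent of native speed.
laptop_load = emulated_load(0.20, 0.25)
```

Under these assumptions the Alpha comes out level with the x86 chip it is emulating, and the laptop's peak load rises to about 80 percent, which still leaves headroom.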
Microsoft actually has a head start when it comes to producing an x86 emulator. A few years ago, they bought the Virtual PC product line from a company called Connectix. Connectix produced Virtual PC for Windows, which Microsoft wanted for things like server virtualization and the XP Mode in Windows 7. Connectix also produced Virtual PC for Mac. This was a full x86 emulator for PowerPC systems, complete with dynamic translation. Porting this to ARM would be a nontrivial exercise, but it would be a lot less effort than writing an x86 emulator from scratch.