CPU Wars, Part 2: POWER to the People
Part 1 of this series took a broad look at the state of the CPU industry and what general trends existed. This week, we’ll examine the two survivors of the workstation-targeted RISC families: PowerPC and SPARC.
A Separation of POWERs
IBM’s POWER line began as a very high-performance multi-chip product. The POWER2 was the first in the series to be a microprocessor, in 1996. Back in 1991, IBM began a collaboration with Apple and Motorola to create the PowerPC architecture. This was very similar to the POWER instruction set, although neither was a subset of the other. Some compilers, including GCC, can generate code that uses only the common subset, however, and AIX running on PowerPC chips would trap POWER-specific instructions and emulate them in software.
Starting with the POWER3, IBM blurred the line further. The POWER3 implemented the 64-bit PowerPC instruction set, as well as the POWER2 instruction set. Thus, all recent POWER chips have also been PowerPC chips (although the converse is not true).
The PowerPC was intended to replace both Motorola’s aging 68000 series and Intel’s 8086 series. It included a number of features to make emulating both of these architectures easier, including special instructions for supporting both big-endian and little-endian operating systems.
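Byte order is easiest to see with a concrete value. The Python sketch below stores 0x12345678 in both orders and models a byte-reversed load, one of the mechanisms PowerPC offers for dealing with data in the other byte order (lwbrx, "Load Word Byte-Reverse Indexed", is a real example). The helper functions are illustrative stand-ins, not a description of how any particular OS or emulator used these instructions.

```python
import struct

# The same 32-bit value laid out in memory in each byte order.
VALUE = 0x12345678
big = struct.pack(">I", VALUE)     # most significant byte first
little = struct.pack("<I", VALUE)  # least significant byte first


def load_word(mem: bytes, addr: int) -> int:
    """Ordinary 32-bit load on a big-endian machine."""
    return int.from_bytes(mem[addr:addr + 4], "big")


def load_word_byte_reversed(mem: bytes, addr: int) -> int:
    """Hypothetical model of a byte-reversed load (in the spirit of lwbrx):
    read the same four bytes but interpret them in little-endian order."""
    return int.from_bytes(mem[addr:addr + 4], "little")


if __name__ == "__main__":
    print(big.hex(), little.hex())
    # Little-endian data read with a plain big-endian load comes out scrambled...
    print(hex(load_word(little, 0)))
    # ...but a byte-reversed load recovers the original value.
    print(hex(load_word_byte_reversed(little, 0)))
```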
PowerPC never made the inroads into the desktop market that IBM and Motorola had hoped for, but it has been hugely successful in the embedded market. Microsoft released Windows NT 4.0 for PowerPC, but few machines running it were sold. IBM’s OS/2 port took too long to arrive and never sold in meaningful numbers. Only Apple achieved significant desktop sales with PowerPC, and it has now abandoned the architecture in favor of x86 chips.
IBM sells PowerPC 970 and POWER5+ workstations, but these are considerably more expensive than similar Opteron offerings.
This year sees IBM moving to the POWER6, replacing its current high-end CPUs. The main market for the POWER-branded chips is high-performance computing—the POWER5 series has been very popular among supercomputer builders—although many of the technologies featured are likely to trickle down into PowerPC-branded systems eventually. One thing, however, has migrated the other way: VMX.
Motorola introduced vector extensions into its PowerPC line with the 7400 series, which IBM collaborated in designing but didn’t manufacture. These were added to the PowerPC specification with version 2.03. IBM has implemented these extensions in the PowerPC 970 series of chips, but they’ve been absent from the POWER line. This situation will change with the POWER6, which incorporates the vector instruction set (dubbed VMX by IBM, AltiVec by Motorola, and Velocity Engine by Apple).
Another interesting feature is the addition of a decimal floating-point unit. This change is important for financial institutions, which often require a fixed number of decimal places of accuracy. Many common decimal fractions cannot be represented as terminating binary fractions: 0.2 is 0.001100110011… in binary, repeating the pattern 0011 forever, and 0.1 is 0.0001100110011…, the same pattern shifted one place. Any fixed-width binary format has to cut these expansions off somewhere, which leads to rounding errors, and those are a big problem when dealing with money. Languages such as COBOL and recent versions of Java provide support for decimal arithmetic to avoid this problem, but emulating this arithmetic on a machine that doesn’t natively support it can be very slow. Native decimal support is thought to be part of IBM’s initiative to consolidate all of its computing products, from workstations to mainframes, on a single architecture.
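The recurring binary expansions and their effect on money-style arithmetic can be checked with a few lines of Python. This is a sketch: `binary_expansion` is a hypothetical helper written for this illustration (it finds the repeating cycle the same way long division does, by watching for a remainder that repeats), and Python's standard `decimal` module stands in for a hardware decimal unit. It is a software implementation, which is exactly the slow path native support is meant to avoid.

```python
from decimal import Decimal, ROUND_HALF_UP


def binary_expansion(num: int, den: int) -> tuple[list[int], list[int]]:
    """Binary digits of num/den (0 <= num/den < 1), split into a
    non-repeating prefix and a repeating cycle (empty if it terminates).

    Each step doubles the remainder, like long division in base 2;
    once a remainder repeats, every digit after it repeats as well."""
    seen: dict[int, int] = {}  # remainder -> index of the digit it produces
    digits: list[int] = []
    r = num % den
    while r != 0 and r not in seen:
        seen[r] = len(digits)
        r *= 2
        digits.append(r // den)
        r %= den
    if r == 0:
        return digits, []  # terminating binary fraction
    start = seen[r]
    return digits[:start], digits[start:]


if __name__ == "__main__":
    print(binary_expansion(1, 5))   # 0.2
    print(binary_expansion(1, 10))  # 0.1
    print(binary_expansion(1, 4))   # 0.25 terminates

    # Binary floats carry the truncation error into results...
    print(0.1 + 0.2 == 0.3)
    print(round(2.675, 2))  # 2.675 is stored slightly below 2.675
    # ...while decimal arithmetic keeps exact decimal places.
    print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))
    print(Decimal("2.675").quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))
```

Run as a script, it reports that 0.2 is the cycle `[0, 0, 1, 1]` with no prefix, that 0.1 is the same cycle after a single leading 0, and that the float comparison fails while the decimal one succeeds.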
The thing that makes the POWER6 most remarkable is the clock speed. In an era when everyone else is moving to more, slower cores, IBM plans to release a chip running at 5 GHz. This is likely to give IBM the best single-thread performance for a while (and, consequently, the best overall performance, if you use enough chips), which should be an advantage in some markets.
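One standard way to make the trade-off between a few fast cores and many slower ones concrete is Amdahl's law: only the parallel fraction of a workload benefits from extra cores. The sketch below assumes equal work per clock on every core, performance proportional to clock speed, and two made-up chip configurations; none of these numbers describe the POWER6 or any real product.

```python
def amdahl_throughput(clock_ghz: float, cores: int, parallel_fraction: float) -> float:
    """Relative performance under Amdahl's law, assuming every core does
    the same work per clock and speed scales linearly with clock rate.
    The result is in arbitrary units ("single-core GHz equivalents")."""
    serial = 1.0 - parallel_fraction
    speedup = 1.0 / (serial + parallel_fraction / cores)
    return clock_ghz * speedup


if __name__ == "__main__":
    # Hypothetical chips, chosen only to illustrate the shape of the curve.
    fast_few = (5.0, 2)   # 2 cores at 5 GHz
    slow_many = (2.0, 8)  # 8 cores at 2 GHz
    for p in (0.0, 0.5, 0.9, 0.99):
        a = amdahl_throughput(*fast_few, p)
        b = amdahl_throughput(*slow_many, p)
        print(f"parallel={p:.2f}  fast/few={a:5.2f}  slow/many={b:5.2f}")
```

With these invented figures the fast chip wins until a little under 90% of the work is parallel; beyond that, the many-core chip pulls ahead. Workloads with a large serial component are where a high clock speed pays off.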
It should come as no surprise that the POWER6 supports SMT, since IBM was the first to bring this feature to market in a general-purpose CPU, with earlier POWER processors (the first to market overall was Sun, with a CPU aimed at Java applications). Another IBM first was virtualization: System/360 derivatives were the first machines to support it, and IBM has included hardware support for virtualization in all recent products. The POWER6 is expected to support 1024 virtual partitions, although it’s unlikely that quite this many will be needed for a while.
Although the POWER6 is an incremental improvement on the POWER line, IBM has several other CPU products, the most interesting of which is the Cell, co-developed with Sony and Toshiba. The most unusual feature of the Cell is that it’s a heterogeneous multicore design. Most other multicore processors have two or more copies of the same core, while the Cell has one core of one kind and eight of another.
The first core is a fairly simple PowerPC. This is a 64-bit design, similar to the PowerPC 970 in capabilities but with two-way SMT and without out-of-order execution. This core can run existing PowerPC code, but when the CPU is busy it’s mainly responsible for coordinating the other eight, known in IBM buzzword lingo as Synergistic Processing Units (abbreviated to SPU by everyone who is too embarrassed to say "Synergistic" in polite company). I’m not going to talk much about the Cell, because it has been covered in exhaustive detail everywhere else already, but I will cover a few key points.
The SPU is basically an extended VMX unit with a few other instructions. Its instruction set is most notable for having 128 registers, rather than the standard 32. Since each of these registers is 128 bits (16 bytes) wide, this gives 2KB of register space, four times the size of a standard VMX register file.
The most interesting feature is not the instruction set, however, but the memory architecture. Rather than having a cache that’s transparent to the programmer, the 256KB of RAM that’s local to the SPU is directly exposed, along with instructions to perform bulk DMA transfers between it and main memory (or memory belonging to other SPUs).
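The difference from a cache is that software decides what moves and when. The Python sketch below shows that programming model under stated assumptions: `dma_get` and `dma_put` are hypothetical stand-ins for the SPU's DMA commands (the real transfers are asynchronous), the 16KB chunk size is an arbitrary choice, and the "work" is a trivial byte increment.

```python
LOCAL_STORE_SIZE = 256 * 1024  # bytes of RAM local to the SPU
CHUNK = 16 * 1024              # hypothetical size of each bulk transfer


def dma_get(local: bytearray, main: bytearray, offset: int, size: int) -> None:
    """Hypothetical DMA read: copy main[offset:offset+size] into the
    start of the local store."""
    local[:size] = main[offset:offset + size]


def dma_put(main: bytearray, local: bytearray, offset: int, size: int) -> None:
    """Hypothetical DMA write: copy the start of the local store back
    to main[offset:offset+size]."""
    main[offset:offset + size] = local[:size]


def process_in_place(main: bytearray) -> None:
    """Add 1 (mod 256) to every byte of main memory, touching it only
    through explicit bulk transfers into a fixed-size local buffer."""
    local = bytearray(LOCAL_STORE_SIZE)
    for offset in range(0, len(main), CHUNK):
        size = min(CHUNK, len(main) - offset)
        dma_get(local, main, offset, size)  # explicit fetch, no cache involved
        for i in range(size):               # compute on local data only
            local[i] = (local[i] + 1) & 0xFF
        dma_put(main, local, offset, size)  # explicit write-back


if __name__ == "__main__":
    data = bytearray(range(256)) * 200  # 51,200 bytes, so several chunks
    process_in_place(data)
    print(data[:4], data[255:257])
```

Real SPU code typically double-buffers, starting the transfer of the next chunk while computing on the current one so that DMA latency overlaps with computation; this sketch uses a single buffer to keep the control flow obvious.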
In recent years, IBM has put a lot of its weight behind Free Software, and Linux in particular. This makes sense from the perspective of IBM as a whole, since IBM is a services company that tries to give customers the product they want, whatever it is, along with a big support contract. This approach may well have a knock-on effect for IBM’s CPU arm, since Linux runs just as well (and in some cases better) on POWER/PowerPC as on x86. For customers looking at thin clients or simple workstations, PowerPC designs originally aimed at the embedded market might be a good choice. For those looking at high-end servers, the POWER6 may be cost-effective. Without the need to run Windows, there’s a great deal more flexibility in the possible architectures.
Freescale
Freescale, formerly Motorola’s CPU division, has been shipping PowerPC chips since the beginning. Freescale was still providing Apple with laptop chips until the Intel switch, but the design hadn’t been updated much in recent years. Freescale still ships a large number of PowerPC chips, but they are mainly for the embedded market (thirty or so in every new BMW, for example).
The Freescale PowerPC 74xx series (the G4, to Apple users) was seriously limited by its slow front-side bus, which made it almost impossible to keep the vector unit fed with data. The successor to this design, the e600, is due out early in 2007 and is expected to feature a dual-core design running at up to 1.5 GHz, with the e700 taking this speed up to 3 GHz and adding 64-bit support. When these chips were first announced in 2004, they seemed obvious contenders for a PowerBook upgrade. Now, their place seems a little more uncertain.
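A back-of-the-envelope calculation shows why a slow bus starves a vector unit. The figures below are illustrative assumptions, not measured numbers for any specific G4 model: a 1.5 GHz core, one 16-byte vector load per cycle, and a 64-bit front-side bus clocked at 167 MHz.

```python
def bytes_per_second(width_bytes: int, clock_hz: float) -> float:
    """Peak bandwidth of a path that moves width_bytes on every clock."""
    return width_bytes * clock_hz


# Illustrative assumptions, not measured figures for a specific chip:
CORE_HZ = 1.5e9   # core clock
VECTOR_BYTES = 16  # one 128-bit vector operand
BUS_BYTES = 8      # 64-bit front-side bus
BUS_HZ = 167e6     # front-side bus clock

demand = bytes_per_second(VECTOR_BYTES, CORE_HZ)  # one vector load per cycle
supply = bytes_per_second(BUS_BYTES, BUS_HZ)

if __name__ == "__main__":
    print(f"vector unit could consume {demand / 1e9:.1f} GB/s")
    print(f"bus can deliver          {supply / 1e9:.2f} GB/s")
    print(f"shortfall: about {demand / supply:.0f}x")
```

Caches hide much of this gap for data that is reused, but streaming workloads, the kind vector units handle best, run into something close to the raw ratio between the two numbers.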
PA Semi
After the death of the Alpha, some of the brightest CPU designers in the industry were scattered. Many ended up at AMD and worked on the Opteron. A few went to Intel and worked on XScale. Two have now formed their own company, PA Semi, and are working on PowerPC designs.
PowerPC grew out of a collaboration between three companies, and IBM has always been keen to encourage others to use the architecture, hoping to displace x86. This setup has worked quite well in the embedded sphere, with the PowerPC 4xx series being very popular with ASIC designers who want to add a little bit of custom logic to an existing design. It’s also found at the core of a number of higher-end FPGAs, allowing an existing operating system to run on the PowerPC core and offload application-specific workloads to custom logic in the FPGA. Most PowerPC designs tend to focus on the bottom end of the market, however.
PA Semi is more ambitious. Its recently unveiled PWRficient design is squarely aimed at the performance-per-watt target. Featuring two 2 GHz 64-bit cores, it’s an impressive design. The chip features two DDR2 memory controllers and 2MB of level-2 cache. These aren’t attached to the cores directly; instead, they’re connected via a crossbar, giving either CPU direct access to the cache, and the cache controller direct access to both memory controllers.
The same crossbar is used to attach a number of other dedicated processing units. These include 10-gigabit Ethernet controllers, DMA engines, and a TCP offload engine. The DMA controller, being connected to the crossbar, is capable of moving data between any of the I/O components and memory (including I/O-to-I/O and memory-to-memory transfers) in a way somewhat reminiscent of the Cell. Perhaps the most interesting feature is that the chip implements a significant portion of iSCSI in hardware, as well as IPSec/SSL and common RAID functions. This makes these chips an ideal choice for low-power NAS controllers.
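The text does not say exactly which RAID functions the chip accelerates; a common candidate for hardware offload is the XOR parity used by RAID 5, which must be recomputed as data is written and is also used to rebuild a failed disk. The helpers below are hypothetical names for a minimal software version of that computation.

```python
def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    length = len(blocks[0])
    assert all(len(b) == length for b in blocks), "blocks must be equal length"
    out = bytearray(length)
    for blk in blocks:
        for i, byte in enumerate(blk):
            out[i] ^= byte
    return bytes(out)


def parity(data_blocks: list[bytes]) -> bytes:
    """RAID 5-style parity block for one stripe of data blocks."""
    return xor_blocks(data_blocks)


def rebuild(surviving: list[bytes], parity_block: bytes) -> bytes:
    """Recover a single lost data block: XOR the parity with the survivors."""
    return xor_blocks(surviving + [parity_block])


if __name__ == "__main__":
    stripe = [b"NAS-", b"data", b"1234"]  # three data disks
    p = parity(stripe)                     # stored on a fourth disk
    lost = stripe.pop(1)                   # pretend the second disk failed
    print(rebuild(stripe, p) == lost)
```

Because the same XOR both produces the parity and reconstructs a lost block, an engine that can XOR buffers at memory speed helps with normal writes and with recovery alike, without tying up the general-purpose cores.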
In spite of expected performance close to that of the PowerPC 970MP, the chip’s power dissipation is claimed to peak at only 25 W at 2 GHz, dropping to 10 W for the 1 GHz variant. That is roughly a third of the 80 W or so drawn by a 2 GHz dual-core 970MP, which also requires additional chips for its memory, PCIe, and other controllers.
The PWRficient line looks like the ideal part for a non-x86 laptop, but we don’t see a lot of people lining up to manufacture those now that Apple has left the market. IBM might have been interested in a ThinkPad that could run AIX, but has now sold its laptop arm to Lenovo. I’ll be very interested to see what products do end up using the chips; the fact that QNX and Wind River are both partners of PA Semi indicates that they’re aiming hard at the embedded/real-time market, and I expect that a low-power, high-performance chip of this nature is going to give rise to a lot of exciting products in the next few years.