3.2 The G5: Lineage and Roadmap
As we saw earlier, the G5 is a derivative of IBM's POWER4 processor. In this section, we will briefly look at how the G5 is similar to and different from the POWER4 and some of the POWER4's successors. This will help us understand the position of the G5 in the POWER/PowerPC roadmap. Table 3–2 provides a high-level summary of some key features of the POWER4 and POWER5 lines.
Table 3–2. POWER4 and Newer Processors
| | POWER4 | POWER4+ | POWER5 | POWER5+ |
|---|---|---|---|---|
| Year introduced | 2001 | 2002 | 2004 | 2005 |
| Lithography | 180 nm | 130 nm | 130 nm | 90 nm |
| Cores/chip | 2 | 2 | 2 | 2 |
| Transistors | 174 million | 184 million | 276 million/chip [a] | 276 million/chip |
| Die size | 415 mm² | 267 mm² | 389 mm²/chip | 243 mm²/chip |
| LPAR [b] | Yes | Yes | Yes | Yes |
| SMT [c] | No | No | Yes | Yes |
| Memory controller | Off-chip | Off-chip | On-chip | On-chip |
| Fast Path | No | No | Yes | Yes |
| L1 I-cache | 2x64KB | 2x64KB | 2x64KB | 2x64KB |
| L1 D-cache | 2x32KB | 2x32KB | 2x32KB | 2x32KB |
| L2 cache | 1.41MB | 1.5MB | 1.875MB | 1.875MB |
| L3 cache | 32MB+ | 32MB+ | 36MB+ | 36MB+ |
3.2.1 Fundamental Aspects of the G5
All POWER processors listed in Table 3–2, as well as the G5 derivatives, share some fundamental architectural features. They are all 64-bit and superscalar, and they perform speculative, out-of-order execution. Let us briefly discuss each of these terms.
3.2.1.1 64-bit Processor
Although there is no formal definition of what constitutes a 64-bit processor, the following attributes are typically shared by 64-bit processors:
- 64-bit-wide general-purpose registers
- Support for 64-bit virtual addressing, although the physical or virtual address spaces may not use all 64 bits
- Integer arithmetic and logical operations performed on all 64 bits of a 64-bit operand—without being broken down into, say, two operations on two 32-bit quantities
The PowerPC architecture was designed to support both 32-bit and 64-bit computation modes—an implementation is free to implement only the 32-bit subset. The G5 supports both computation modes. In fact, the POWER4 supports multiple processor architectures: the 32-bit and 64-bit POWER; the 32-bit and 64-bit PowerPC; and the 64-bit Amazon architecture. We will use the term PowerPC to refer to both the processor and the processor architecture. We will discuss the 64-bit capabilities of the 970FX in Section 3.3.12.1.
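To make the third attribute concrete, the following is a minimal sketch (not taken from any PowerPC documentation) that contrasts a single 64-bit addition with the same addition carried out as two 32-bit additions plus a carry. The function names and masks are hypothetical, and operands are assumed to be unsigned values that fit in 64 bits.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def add64_native(a, b):
    """One 64-bit add: the result wraps to 64 bits, as in a 64-bit ALU."""
    return (a + b) & MASK64


def add64_via_32bit_halves(a, b):
    """The same add done as two 32-bit adds with explicit carry propagation."""
    lo = (a & MASK32) + (b & MASK32)      # add the low words
    carry = lo >> 32                      # carry out of the low word (0 or 1)
    hi = (a >> 32) + (b >> 32) + carry    # add the high words plus the carry
    return ((hi & MASK32) << 32) | (lo & MASK32)
```

Both functions return the same value for any pair of 64-bit operands; the second simply needs more steps, which is the extra work a processor with only 32-bit integer operations would have to perform.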
3.2.1.2 Superscalar
If we define scalar to be a processor design in which one instruction is issued per clock cycle, then a superscalar processor would be one that issues a variable number of instructions per clock cycle, allowing a cycles-per-instruction (CPI) ratio of less than 1. It is important to note that even though a superscalar processor can issue multiple instructions in a clock cycle, it can do so only subject to several constraints, such as whether the instructions depend on each other and which specific functional units they use. Superscalar processors typically have multiple functional units, including multiple units of the same type.
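The effect of issue width and dependencies on CPI can be illustrated with a toy model. This is only a sketch under simplifying assumptions that are ours, not the G5's: every instruction takes one cycle, instructions issue strictly in program order, functional units are never a bottleneck, and an instruction cannot issue in the same cycle as an earlier instruction that produces one of its source registers. The register names and the functions are hypothetical.

```python
def cycles_in_order(instructions, issue_width):
    """Count cycles needed to issue all instructions, in program order.

    instructions: list of (dest_register, source_registers) tuples.
    Up to issue_width consecutive instructions issue per cycle; a group
    ends early when an instruction reads a register written in the group.
    """
    cycles = 0
    i = 0
    n = len(instructions)
    while i < n:
        written = set()   # registers written by this cycle's issue group
        issued = 0
        while i < n and issued < issue_width:
            dest, sources = instructions[i]
            if any(s in written for s in sources):
                break     # depends on an instruction issued this cycle
            written.add(dest)
            issued += 1
            i += 1
        cycles += 1
    return cycles


def cpi(instructions, issue_width):
    """Cycles per instruction for the toy model above."""
    return cycles_in_order(instructions, issue_width) / len(instructions)


# Two independent instructions, one that depends on both, one independent.
program = [
    ("r1", ("r8", "r9")),
    ("r2", ("r10", "r11")),
    ("r3", ("r1", "r2")),
    ("r4", ("r12", "r13")),
]

# A chain in which each instruction depends on the previous one.
chain = [
    ("r1", ("r0",)),
    ("r2", ("r1",)),
    ("r3", ("r2",)),
]
```

In this model, `cpi(program, 1)` is 1.0 and `cpi(program, 2)` is 0.5, whereas `cpi(chain, 4)` stays at 1.0: a wider issue width does not help when every instruction depends on its predecessor.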
3.2.1.3 Speculative Execution
A speculative processor can execute instructions before it is determined whether those instructions will need to be executed (instructions may not need to be executed because of a branch that bypasses them, for example). Therefore, instruction execution does not wait for control dependencies to resolve—it waits only for the instruction's operands (data) to become available. Such speculation can be done by the compiler, the processor, or both. The processors in Table 3–2 employ in-hardware dynamic branch prediction (with multiple branches "in flight"), speculation, and dynamic scheduling of instruction groups to achieve substantial instruction-level parallelism.
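As an illustration of dynamic branch prediction in general, the following sketch simulates a 2-bit saturating counter, a classic textbook scheme. It is an assumption chosen for simplicity and is not a description of the predictor used in the POWER4, the POWER5, or the G5. The function name and its parameters are hypothetical.

```python
def count_correct_predictions(outcomes, initial_state=2):
    """Simulate one 2-bit saturating counter on a sequence of branch outcomes.

    outcomes: iterable of booleans (True means the branch was taken).
    The counter ranges over 0..3; states 2 and 3 predict "taken".
    Returns the number of correct predictions.
    """
    state = initial_state
    correct = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken == taken:
            correct += 1
        # Move toward 3 on a taken branch and toward 0 otherwise.
        state = min(3, state + 1) if taken else max(0, state - 1)
    return correct


# A loop branch: taken nine times, then not taken when the loop exits.
loop_branch = [True] * 9 + [False]
```

For `loop_branch`, the counter predicts 9 of the 10 outcomes correctly, mispredicting only the loop exit. A strictly alternating pattern fares much worse under this scheme, which is one reason real processors use more elaborate predictors.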
3.2.1.4 Out-of-Order Execution
A processor that performs out-of-order execution includes additional hardware that can bypass instructions whose operands are not available—say, due to a cache miss that occurred during register loading. Thus, rather than always executing instructions in the order they appear in the programs being run, the processor may execute instructions whose operands are ready, deferring the bypassed instructions for execution at a more appropriate time.
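The benefit of bypassing a stalled instruction can be seen in a small simulation. This is a deliberately simplified sketch: it assumes a single execution unit that executes one instruction per cycle, and it assumes that the cycle at which each instruction's operands become available is known up front and does not depend on the other instructions. The function names and the `ready_at` representation are hypothetical.

```python
def run_in_order(ready_at):
    """Total cycles when instructions must execute in program order.

    ready_at[i] is the first cycle in which instruction i's operands
    are available; a stalled instruction blocks everything behind it.
    """
    cycle = 0
    for ready in ready_at:
        cycle = max(cycle, ready) + 1
    return cycle


def run_out_of_order(ready_at):
    """Total cycles and execution order when stalled instructions are bypassed.

    Each cycle, the oldest instruction whose operands are ready executes.
    """
    pending = list(range(len(ready_at)))
    order = []
    cycle = 0
    while pending:
        ready = [i for i in pending if ready_at[i] <= cycle]
        if ready:
            chosen = ready[0]          # oldest ready instruction
            pending.remove(chosen)
            order.append(chosen)
        cycle += 1
    return cycle, order


# Instruction 1 waits until cycle 5 (say, on a cache miss); the rest are ready.
ready_at = [0, 5, 0, 0, 0]
```

With this input, in-order execution needs 9 cycles, whereas the out-of-order model finishes in 6 cycles with execution order 0, 2, 3, 4, 1: the instruction waiting on the cache miss is deferred while the ready instructions proceed.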
3.2.2 New POWER Generations
The POWER4 contains two processor cores in a single chip. Moreover, the POWER4 architecture has features that help in virtualization. Examples include a special hypervisor mode in the processor, the ability to include an address offset when using nonvirtual memory addressing, and support for multiple global interrupt queues in the interrupt controller. IBM's Logical Partitioning (LPAR) allows multiple independent operating system images (such as AIX and Linux) to be run on a single POWER4-based system simultaneously. Dynamic LPAR (DLPAR), introduced in AIX 5L Version 5.2, allows dynamic addition and removal of resources from active partitions.
The POWER4+ improves upon the POWER4 by reducing its size, consuming less power, providing a larger L2 cache, and allowing more DLPAR partitions.
The POWER5 introduces simultaneous multithreading (SMT), wherein a single processor supports multiple instruction streams—in this case, two—simultaneously.
The POWER5 supports other important features such as the following:
- 64-way multiprocessing.
- Subprocessor partitioning (or micropartitioning), wherein multiple LPAR partitions can share a single processor. [19] Micropartitioned LPARs support automatic CPU load balancing.
- Virtual Inter-partition Ethernet, which enables a VLAN connection between LPARs—at gigabit or even higher speeds—without requiring physical network interface cards. Virtual Ethernet devices can be defined through the management console. Multiple virtual adapters are supported per partition, depending on the operating system.
- Virtual I/O Server Partition, [20] which provides virtual disk storage and Ethernet adapter sharing. Ethernet sharing connects virtual Ethernet to external networks.
- An on-chip memory controller.
- Dynamic firmware updates.
- Detection and correction of data-transmission errors by specialized circuitry.
- Fast Path, the ability to execute some common software operations directly within the processor. For example, certain parts of TCP/IP processing that are traditionally handled within the operating system using a sequence of processor instructions could be performed via a single instruction. Such silicon acceleration could be applied to other operating system areas such as message passing and virtual memory.
Besides using 90-nm technology, the POWER5+ adds several features to the POWER5's feature set, for example: 16GB page sizes, 1TB segments, multiple page sizes per segment, a larger (2048-entry) translation lookaside buffer (TLB), and a larger number of memory controller read queues.
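To see why large pages matter for a TLB of a given size, we can compute the TLB's reach, that is, the amount of memory it can map at once. The arithmetic below is a rough sketch: it assumes that every one of the 2048 entries can hold a page of the given size, which a real implementation may not allow, and the 4KB page size is our assumption for comparison rather than a figure from the text above.

```python
KB = 1024
GB = 1024 ** 3


def tlb_reach(entries, page_size_bytes):
    """Bytes of memory covered when every TLB entry maps one page."""
    return entries * page_size_bytes


reach_small_pages = tlb_reach(2048, 4 * KB)    # assumed 4KB pages
reach_large_pages = tlb_reach(2048, 16 * GB)   # 16GB pages
```

Under these assumptions, 2048 entries cover 8MB with 4KB pages but 32TB with 16GB pages.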
The POWER6 is expected to add evolutionary improvements and to extend the Fast Path concept even further, allowing functions of higher-level software—for example, databases and application servers—to be performed in silicon. [21] It is likely to be based on a 65-nm process and is expected to have multiple ultra-high-frequency cores and multiple L2 caches.
3.2.3 The PowerPC 970, 970FX, and 970MP
The PowerPC 970 was introduced in October 2002 as a 64-bit high-performance processor for desktops, entry-level servers, and embedded systems. The 970 can be thought of as a stripped-down POWER4+. Apple used the 970—followed by the 970FX and the 970MP—in its G5-based systems. Table 3–3 contains a brief comparison of the specifications of these processors. Figure 3–3 shows a pictorial comparison. Note that unlike the POWER4+, whose L2 cache is shared between cores, each core in the 970MP has its own L2 cache, which is twice as large as the L2 cache in the 970 or the 970FX.
Table 3–3. POWER4+ and the PowerPC 9xx
| | POWER4+ | PowerPC 970 | PowerPC 970FX | PowerPC 970MP |
|---|---|---|---|---|
| Year introduced | 2002 | 2002 | 2004 | 2005 |
| Lithography | 130 nm | 130 nm | 90 nm [a] | 90 nm |
| Cores/chip | 2 | 1 | 1 | 2 |
| Transistors | 184 million | 55 million | 58 million | 183 million |
| Die size | 267 mm² | 121 mm² | 66 mm² | 154 mm² |
| LPAR | Yes | No | No | No |
| SMT | No | No | No | No |
| Memory controller | Off-chip | Off-chip | Off-chip | Off-chip |
| Fast Path | No | No | No | No |
| L1 I-cache | 2x64KB | 64KB | 64KB | 2x64KB |
| L1 D-cache | 2x32KB | 32KB | 32KB | 2x32KB |
| L2 cache | 1.5MB shared [b] | 512KB | 512KB | 2x1MB |
| L3 cache | 32MB+ | None | None | None |
| VMX (AltiVec [c]) | No | Yes | Yes | Yes |
| PowerTune [d] | No | No | Yes | Yes |
Another noteworthy point about the 970MP is that both its cores share the same input and output busses. In particular, the output bus is shared "fairly" between cores using a simple round-robin algorithm.
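Round-robin sharing of a bus can be sketched as follows. This is a generic illustration of the algorithm, not a model of the 970MP's actual bus logic: it assumes one grant per bus slot and that a core with nothing to send is simply skipped. The function name and the transaction labels are hypothetical.

```python
from collections import deque


def round_robin_bus(requests_per_core):
    """Return the order in which queued transactions are granted the bus.

    requests_per_core: one list of transaction labels per core.
    After core i is granted a slot, the search for the next grant
    starts at core i + 1, so no core is served twice in a row while
    another core has a pending transaction.
    """
    queues = [deque(q) for q in requests_per_core]
    granted = []
    next_core = 0
    n = len(queues)
    while any(queues):
        for offset in range(n):
            core = (next_core + offset) % n
            if queues[core]:
                granted.append((core, queues[core].popleft()))
                next_core = (core + 1) % n
                break
    return granted
```

For example, `round_robin_bus([["a0", "a1", "a2"], ["b0"]])` grants the bus in the order a0, b0, a1, a2: the two cores alternate while both have pending transactions, after which core 0 uses the remaining slots.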
Figure 3–3. The PowerPC 9xx family and the POWER4+
3.2.4 The Intel Core Duo
In contrast, the Intel Core Duo processor line used in the first x86-based Macintosh computers (the iMac and the MacBook Pro) has the following key characteristics:
- Two cores per chip
- Manufactured using 65-nm process technology
- 90.3 mm2 die size
- 151.6 million transistors
- Up to 2.16GHz frequency (along with a 667MHz processor system bus)
- 32KB on-die I-cache and 32KB on-die D-cache (write-back)
- 2MB on-die L2 cache (shared between the two cores)
- Data prefetch logic
- Streaming SIMD [22] Extensions 2 (SSE2) and Streaming SIMD Extensions 3 (SSE3)
- Sophisticated power and thermal management features