Introducing the Cell Processor
- 1.1 Background of the Cell Processor
- 1.2 The Cell Architecture: An Overview
- 1.3 The Cell Broadband Engine Software Development Kit (SDK)
- 1.4 Conclusion
In September 2007, the Guinness Book of World Records announced the new record holder for the world’s most powerful distributed computing system. It wasn’t a traditional cluster of high-performance computers, but a worldwide network composed of regular PCs and PlayStation 3 consoles (PS3s). Called Folding@Home, this distributed computing system simulates protein folding to analyze how diseases originate.
Before the PS3s joined in, the network was capable of only .25 petaflops (250,000,000,000,000 floating-point operations per second). But when the Folding@Home client started running on PS3s, the computation speed quadrupled in six months, making Folding@Home the first distributed computing system to break the 1 petaflop barrier.
Table 1.1 clarifies the significance of the PS3s in the Folding@Home network. There aren’t nearly as many consoles as PCs, but they provide more computational power than the Windows/Mac/Linux computers combined.
Table 1.1. Folding@Home Performance Statistics (Recorded on April 16, 2008)
| OS Type | Current Pflops | Active CPUs | Total CPUs |
|---|---|---|---|
| Windows | .182 | 190,892 | 1,986,517 |
| Mac OS X/PowerPC | .007 | 8,478 | 114,326 |
| Mac OS X/Intel | .023 | 7,428 | 45,480 |
| Linux | .047 | 27,796 | 286,172 |
| PlayStation 3 | 1.235 | 40,880 | 492,491 |
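A quick back-of-the-envelope check, using the figures from Table 1.1, confirms the claim: the PS3 clients alone outperform all the conventional computers combined, despite representing far fewer active CPUs. (The script below is just a sketch for illustration; the variable names are mine, and the figures are copied from the table.)

```python
# Figures from Table 1.1: (current petaflops, active CPUs) per OS type.
stats = {
    "Windows":          (0.182, 190_892),
    "Mac OS X/PowerPC": (0.007, 8_478),
    "Mac OS X/Intel":   (0.023, 7_428),
    "Linux":            (0.047, 27_796),
    "PlayStation 3":    (1.235, 40_880),
}

# Total throughput of the Windows/Mac/Linux machines versus the PS3s.
pc_pflops = sum(p for os, (p, _) in stats.items() if os != "PlayStation 3")
ps3_pflops, ps3_cpus = stats["PlayStation 3"]

print(f"PCs combined: {pc_pflops:.3f} Pflops")   # 0.259 Pflops
print(f"PS3s alone:   {ps3_pflops:.3f} Pflops")  # 1.235 Pflops

# Average throughput per console (1 Pflop = 1,000,000 Gflops):
print(f"Per PS3:      {ps3_pflops * 1e6 / ps3_cpus:.1f} Gflops")  # 30.2 Gflops
```

Each PS3 contributes roughly 30 Gflops to the network, an order of magnitude more than a typical PC client of the era.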
The PS3’s tremendous computational power is provided by the Cell Broadband Engine, commonly called the Cell processor or just the Cell. Developed by the STI Alliance (Sony, Toshiba, and IBM), the Cell combines the general-purpose capability of IBM’s PowerPC architecture with enough number crunching to satisfy even the most demanding gamers and graphic developers.
What does this mean for you? It means you can enjoy the best of both worlds: computational flexibility and power. On one hand, you can install a common operating system, such as Linux, on the Cell and execute applications as conveniently as if they were running on a PC. On the other hand, you can implement computationally intense algorithms at speeds that far exceed regular CPUs and even compete with supercomputing devices.1 More incredibly still, you can do both at the same time.
The Cell makes this possible through an on-chip division of labor: The operating system runs on a single PowerPC Processor Element (PPE), and the high-speed calculation is performed by a series of Synergistic Processor Elements (SPEs). These two types of cores are specifically designed for their tasks, and each supports a different set of instructions.
Taken individually, these processing elements are easy to understand and straightforward to program. The hard part is coordinating their operation to take the fullest advantage of their strengths. To accomplish this, a coder must know the Cell-specific programming commands and have a solid knowledge of the device’s architecture: its processing elements, interconnections, and memory structure.
The purpose of this book is to cover these subjects in enough depth to enable you to create applications that maximize the Cell’s capabilities. Much of this treatment delves into the processor’s architecture, but only the aspects that you can use and configure in code. Some topics may seem overwhelming for those not used to thinking like computer architects, but don’t be concerned: Everything will be explained as the need arises. And the goal of this book is always software.
The goal of this chapter is to explain, at a basic level, what the Cell processor is and how it works. The discussion begins with a description of the Cell’s background, including its history and capabilities, and proceeds to introduce the processor’s basic architecture.
1.1 Background of the Cell Processor
The Cell is so unlike its predecessors that it helps to know why it was created and which corporate forces shaped its design. When you see why the STI Alliance invested so much time and effort in the Cell, you'll have a better idea of why learning about it is worth your own.
History of the Cell
Sony finished development of the PlayStation 2 in 1999 and released it the following year. Despite its tremendous success, then-CEO Nobuyuki Idei was nervous: How could Sony’s next-generation console top the PS2? What more could they accomplish? The answer, he decided, was twofold: The next offering had to integrate broadband multimedia capability and provide dramatic improvements in graphical processing. These lofty goals required entirely new hardware, and to make this possible, he conferred with IBM’s then-CEO, Louis Gerstner. Together they shaped the concept that would ultimately lead to the Cell Processor.
The chief of Sony Computer Entertainment, Ken Kutaragi, fleshed out the hardware requirements and made demands that went far beyond the state of the art. Envisioning each processor as a building block of a larger, networked entity, Kutaragi called the device the Cell. In keeping with Idei's original intention, the project came to be called the Cell Broadband Engine (CBE). This remains the official name of the Cell processor.
Toshiba expressed an interest in using the Cell in their consumer electronics, and in 2001, Sony, Toshiba, and IBM announced the formation of the STI Alliance. Their stated intention was to research, develop, and manufacture a groundbreaking processor architecture. They formed the STI Design Center in Austin, Texas, to turn the CBE’s requirements into reality.
As the Cell’s chief architect, Jim Kahle saw that the CBE’s requirements couldn’t be met with a traditional single-core processor—the demand for power would be too great. Instead, he chose a more power-efficient design that incorporated multiple processing units into a single chip. The final architecture consisted of nine cores: one central processing element and eight dedicated elements for high-speed computation.
At the time of this writing, the STI Design Center has grown to more than 400 engineers. Dr. H. Peter Hofstee, one of the Cell’s founding designers, holds the positions of chief scientist and chief architect of the SPE. In a recent presentation, he listed the main goals that drove the Cell’s design:
- Outstanding performance on gaming and multimedia applications
- Real-time responsiveness to the user and the network
- Applicability to a wide range of platforms
In 2004, IBM’s semiconductor manufacturing plant in East Fishkill produced the first Cell prototype. The STI engineers installed Linux and tested the processor at clock speeds beyond the commonly cited range of 3 to 4 GHz; the prototype passed. Over the next year, Sony and IBM worked feverishly to integrate the device within Sony’s next-generation console, and expectant gamers caught their first glimpse of the PlayStation 3 at the 2005 Electronic Entertainment Expo (E3).
November 2006 marked the full commercial release of the PS3, and the tales of long lines and barely sane consumers will amuse retail personnel for years to come. In addition to its powerful Cell processor brain, the new console provided resolution up to 1080p and a Blu-ray drive for high-definition video.
That same year, IBM released its first CBE Software Development Kit (SDK) to enable developers to build applications for the Cell. The SDK provides compilers for both types of processing elements, a combined simulator/debugger, numerous code libraries, and an Eclipse-based development environment. A great deal of this book is concerned with the SDK and how you can use it to build applications.
In mid-2008, the first Cell-based supercomputer, called the IBM Roadrunner, was tested in the Los Alamos National Laboratory. Containing 12,960 Cell processors and 12,960 Opterons, the Roadrunner reached a processing speed of 1.026 petaflops and has become the fastest of the supercomputers on the TOP500 list. Its speed more than doubles that of the second-place supercomputer, BlueGene/L, at .478 petaflops.
Potential of the Cell Processor for Scientific Computing
In 2005, the Lawrence Berkeley National Laboratory studied the Cell’s computational performance and recorded their findings in the report The Potential of the Cell Processor for Scientific Computing. They simulated a number of different algorithms and compared the Cell’s processing speed to that of similar processors: the AMD Opteron, Intel’s Itanium2, and Cray’s X1E. Table 1.2 summarizes their results.
Table 1.2. Results of the Lawrence Berkeley National Laboratory Study (All Values in Gflops)
| Algorithm | Cell Processor | Cray X1E | AMD Opteron | Intel Itanium2 |
|---|---|---|---|---|
| Dense matrix multiply (single precision) | 204.7 | 29.5 | 7.8 | 3.0 |
| Dense matrix multiply (double precision) | 14.6 | 16.9 | 4.0 | 5.4 |
| Symmetric sparse matrix vector multiply (single precision)1 | 7.68 | — | .80 | .83 |
| Symmetric sparse matrix vector multiply (double precision)1 | 4.00 | 2.64 | .60 | .67 |
| Nonsymmetric sparse matrix vector multiply (single precision)1 | 4.08 | — | .53 | .41 |
| Nonsymmetric sparse matrix vector multiply (double precision)1 | 2.34 | 1.14 | .36 | .36 |
| 2-D fast Fourier transform (single precision)2 | 40.5 | 8.27 | .34 | .15 |
| 2-D fast Fourier transform (double precision)2 | 6.7 | 7.10 | .19 | .11 |
There are two points to keep in mind. First, the results refer to computation speed in billions of flops (floating-point operations per second), not to the time needed to complete each algorithm. Second, the first-generation Cell’s hardware multipliers operate only on single-precision values, so it performs much better with single-precision than with double-precision arithmetic; the second-generation Cell adds hardware multiplication of double-precision values.
To an engineer interested in signal processing and computational mathematics (like myself), the results are nothing short of astounding. The study justifies the outrageous marketing claims: The Cell really provides supercomputer-like capability for nearly the cost and power (approximately 50 to 60 W) of a regular CPU.
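To put those figures in perspective, a short calculation of the speedup ratios for dense matrix multiplication (numbers taken directly from Table 1.2) shows both the Cell's dominance in single precision and how the picture changes in double precision, where the first-generation Cell actually falls behind the Cray X1E. This is only an illustrative sketch of the table's data, not part of the original study.

```python
# Gflops for dense matrix multiply, copied from Table 1.2.
single = {"Cell": 204.7, "Cray X1E": 29.5, "AMD Opteron": 7.8, "Intel Itanium2": 3.0}
double = {"Cell": 14.6, "Cray X1E": 16.9, "AMD Opteron": 4.0, "Intel Itanium2": 5.4}

# Compute how many times faster the Cell is than each competitor.
for name in ("Cray X1E", "AMD Opteron", "Intel Itanium2"):
    print(f"Cell vs {name}: "
          f"{single['Cell'] / single[name]:.1f}x (single precision), "
          f"{double['Cell'] / double[name]:.1f}x (double precision)")
```

The single-precision speedups range from roughly 7x (versus the Cray X1E) to nearly 70x (versus the Itanium2), while in double precision the ratio against the Cray drops below 1.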