- The Processor Evolution
- The Memory Subsystem
- The I/O Subsystem
- Intel Microarchitectures
- Chipset Virtualization Support
The Memory Subsystem
The electronics industry has put significant effort into manufacturing memory subsystems capable of keeping up with the low access times required by modern processors and the high capacity required by today's applications.
Before proceeding with the explanation of current memory subsystems, it is important to introduce a glossary of the most commonly used terms:
- RAM (Random Access Memory)
- SRAM (Static RAM)
- DRAM (Dynamic RAM)
- SDRAM (Synchronous DRAM)
- SIMM (Single Inline Memory Module)
- DIMM (Dual Inline Memory Module)
- UDIMM (Unbuffered DIMM)
- RDIMM (Registered DIMM)
- DDR (Double Data Rate SDRAM)
- DDR2 (Second Generation DDR)
- DDR3 (Third Generation DDR)
In particular, the Joint Electron Device Engineering Council (JEDEC) is the semiconductor engineering standardization body that has been active in this field. JEDEC Standard 21 [21], [22] specifies semiconductor memories from the 256-bit SRAM to the latest DDR3 modules.
The memory subsystem of modern servers is composed of RAMs (Random Access Memories), i.e., integrated circuits (aka ICs or chips) that allow data to be accessed in any order, in constant time, regardless of its physical location. RAMs can be static or dynamic [27], [28], [29], [30].
SRAMs
SRAMs (Static RAMs) are generally very fast, but have a smaller capacity (a few megabytes) than DRAMs (see next section); their chip structure maintains the information as long as power is applied. They are not large enough to be used for the main memory of a server.
DRAMs
DRAMs (Dynamic RAMs) are the only choice for servers. The term "dynamic" indicates that the information is stored on capacitors within an integrated circuit. Since capacitors discharge over time due to leakage currents, they need to be recharged ("refreshed") periodically to avoid data loss. The memory controller is normally in charge of the refresh operations.
SDRAMs
SDRAMs (Synchronous DRAMs) are the most commonly used DRAM. SDRAMs have a synchronous interface, meaning that their operation is synchronized with a clock signal. The clock is used to drive an internal finite state machine that pipelines memory accesses. Pipelining means that the chip can accept a new memory access before it has finished processing the previous one. This greatly improves the performance of SDRAMs compared to classical DRAMs.
DDR2 and DDR3 are the two most commonly used SDRAMs (see "DDR2 and DDR3" in Chapter 2, page 41 [23]).
Figure 2-13 shows the internal architecture of a DRAM chip.
Figure 2-13 Internal architecture of a DRAM chip
The memory array is composed of memory cells organized in a matrix. Each cell has a row and a column address. Each bit is stored in a capacitor (i.e., storage element).
To improve performance and to reduce power consumption, the memory array is split into multiple "banks." Figure 2-14 shows a 4-bank and an 8-bank organization.
Figure 2-14 Memory banks
DDR2 chips have four internal memory banks and DDR3 chips have eight internal memory banks.
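To make the row/column/bank organization more concrete, the following sketch shows one way a memory controller might split a linear word address into bank, row, and column fields. The field widths and the bit ordering (banks interleaved on the low-order bits) are illustrative assumptions, not values taken from the JEDEC standard.

```python
# Hypothetical address decomposition: field widths and ordering are
# illustrative assumptions, not fixed by the DDR2/DDR3 standards.
BANK_BITS, ROW_BITS, COL_BITS = 3, 15, 10   # 8 banks, 32K rows, 1K columns

def split_address(addr):
    """Split a linear word address into (bank, row, column) fields."""
    column = addr & ((1 << COL_BITS) - 1)
    addr >>= COL_BITS
    bank = addr & ((1 << BANK_BITS) - 1)     # banks interleaved on low bits
    addr >>= BANK_BITS
    row = addr & ((1 << ROW_BITS) - 1)
    return bank, row, column

bank, row, column = split_address(0x1234567)
print(f"bank={bank} row={row} column={column}")
```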
DIMMs
Multiple memory chips need to be assembled together to build a memory subsystem. They are organized in small boards known as DIMMs (Dual Inline Memory Modules).
Figure 2-15 shows the classical organization of a memory subsystem [24]. In this example, a memory controller connects to four DIMMs, each composed of multiple DRAM chips. The memory controller (which may also integrate the clock driver) has an address bus, a data bus, and a command (aka control) bus. It is in charge of reading, writing, and refreshing the information stored in the DIMMs.
Figure 2-15 Example of a memory subsystem
Figure 2-16 is an example of the connection between a memory controller and a DDR3 DIMM. The DIMM is composed of eight DRAM chips, each capable of storing eight bits of data, for a total of 64 bits per memory word (the width of the memory data bus). The address bus has 15 bits and carries, at different times, the "row address" or the "column address," for a total of 30 address bits. In addition, three bits of bank address allow accessing the eight banks inside each DDR3 chip. They can be considered equivalent to address bits, raising the total addressing capability of the controller to eight Giga words (i.e., 512 Gbits, or 64 GB). Even though the memory controller has this addressing capability, the DDR3 chips available on the market are significantly smaller. Finally, RAS (Row Address Strobe), CAS (Column Address Strobe), WE (Write Enable), etc. are the command bus wires.
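The addressing capability quoted above follows directly from the bit counts used in this example. A minimal sketch of the arithmetic (the bit widths are the ones of this example, not fixed by the standard):

```python
# Addressing capability of the controller in the example above.
row_bits    = 15            # row address, sent on the 15-bit address bus
column_bits = 15            # column address, sent later on the same bus
bank_bits   = 3             # 2**3 = 8 internal banks per DDR3 chip
word_bytes  = 8             # 64-bit data bus = 8 bytes per memory word

words = 2 ** (row_bits + column_bits + bank_bits)    # 2**33 = 8 Giga words
print(f"{words} words = {words * word_bytes / 2**30:.0f} GB")
# Output: 8589934592 words = 64 GB (i.e., 512 Gbits)
```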
Figure 2-16 Example of a DDR3 memory controller
Figure 2-17 shows a schematic depiction of a DIMM.
Figure 2-17 A DIMM
The front view shows the eight DDR3 chips each providing eight bits of information (normally indicated by "x8"). The side view shows that the chips are on one side of the board for a total of eight chips (i.e., 64 bits).
ECC and Chipkill®
Data integrity is a major concern in server architecture. Very often, extra memory chips are installed on the DIMM to detect and recover from memory errors. The most common arrangement is to add 8 bits of ECC (Error Correcting Code) to expand the memory word from 64 to 72 bits. This enables the implementation of codes, like the Hamming code, that correct single-bit errors and detect double-bit errors. These codes are also known as SEC/DED (Single Error Correction / Double Error Detection).
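As a rough check of why eight extra bits are enough: an extended Hamming code needs r check bits satisfying 2^r ≥ m + r + 1 to correct a single error in m data bits, plus one overall parity bit to detect double errors. A minimal sketch:

```python
# Why 8 ECC bits give SEC/DED on a 64-bit memory word.
def sec_check_bits(m):
    """Smallest r with 2**r >= m + r + 1 (single-error correction)."""
    r = 1
    while 2 ** r < m + r + 1:
        r += 1
    return r

m = 64                       # width of the memory word
r = sec_check_bits(m)        # 7 bits suffice for single-error correction
total = r + 1                # +1 overall parity bit for double-error detection
print(f"{m} data bits -> {r} SEC bits + 1 parity bit = {total} ECC bits")
# Output: 64 data bits -> 7 SEC bits + 1 parity bit = 8 ECC bits
```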
With a careful organization of how the memory words are written into the memory chips, ECC can be used to protect against the failure of any single memory chip, as well as against any number of multi-bit errors in any portion of a single memory chip. This feature has several different names [24], [25], [26]:
- Chipkill® is the IBM® trademark.
- Oracle® calls it Extended ECC.
- HP® calls it Chipspare®.
- A similar feature from Intel is called Intel® x4 Single Device Data Correction (Intel® x4 SDDC).
Chipkill® performs this function by bit-scattering the bits of an ECC word across multiple memory chips, such that the failure of any single memory chip will affect only one ECC bit. This allows memory contents to be reconstructed despite the complete failure of one chip.
While a complete discussion of this technology is beyond the scope of this book, an example can give an idea of how it works. Figure 2-18 shows a memory controller that reads and writes 128 bits of useful data at each memory access, and 144 bits when ECC is added. The 144 bits can be divided into four memory words of 36 bits each, and each memory word is SEC/DED protected. By using two DIMMs, each with 18 four-bit chips, it is possible to reshuffle the bits as shown in Figure 2-18. If a chip fails, there will be one error in each of the four words, but since the words are SEC/DED, each of the four words can correct an error and therefore all four errors will be corrected.
Figure 2-18 A Chipkill example
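A minimal sketch of the bit-scattering idea in this example, assuming 36 x4 chips (two DIMMs of 18 chips each) holding the 144 bits as four 36-bit SEC/DED words; the names and the exact layout are purely illustrative:

```python
# Bit-scattering sketch: each chip stores exactly one bit of each word,
# so the failure of a whole chip corrupts only one (correctable) bit
# per SEC/DED word.
CHIPS, BITS_PER_CHIP, WORDS = 36, 4, 4

def scatter(words_bits):
    """Map bit i of word w to bit w of chip i (one bit per chip per word)."""
    chips = [[0] * BITS_PER_CHIP for _ in range(CHIPS)]
    for w in range(WORDS):
        for i in range(CHIPS):
            chips[i][w] = words_bits[w][i]
    return chips

words = [[(w + i) & 1 for i in range(CHIPS)] for w in range(WORDS)]
chips = scatter(words)
chips[7] = [1 - b for b in chips[7]]   # chip 7 fails completely
# Each of the four words now has a single flipped bit (bit 7), which the
# per-word SEC/DED code can correct.
```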
Memory Ranks
Going back to how the DIMMs are organized, an arrangement of chips that produces 64 bits of useful data (not counting the ECC) is called a "rank". To store more data on a DIMM, multiple ranks can be installed. There are single-, dual-, and quad-rank DIMMs. Figure 2-19 shows three possible organizations.
Figure 2-19 DIMMs and memory ranks
In the first drawing, a rank of ECC RAM is built using nine eight-bit chips, a configuration that is also indicated as 1Rx8. The second drawing shows a 1Rx4 arrangement in which 18 four-bit chips are used to build one rank. Finally, the third drawing shows a 2Rx8 arrangement in which 18 eight-bit chips are used to build two ranks.
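A quick sanity check of this arithmetic, assuming 72-bit ECC ranks (the helper name is illustrative):

```python
# Chips per rank: an ECC rank is 72 bits wide (64 data + 8 ECC).
def chips_per_rank(chip_width_bits, ecc=True):
    rank_width = 72 if ecc else 64
    return rank_width // chip_width_bits

print(chips_per_rank(8))        # 1Rx8:  9 chips form one rank
print(chips_per_rank(4))        # 1Rx4: 18 chips form one rank
print(2 * chips_per_rank(8))    # 2Rx8: 18 chips form two ranks
```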
Memory ranks are not selected using address bits, but rather through "chip selects". Modern memory controllers have up to eight separate chip selects and are therefore capable of supporting up to eight ranks.
UDIMMs and RDIMMs
SDRAM DIMMs are further subdivided into UDIMMs (Unbuffered DIMMs) and RDIMMs (Registered DIMMs). In UDIMMs, the memory chips are directly connected to the address and control buses, without any intermediate component.
RDIMMs have additional components (registers) placed between the incoming address and control buses and the SDRAM components. These registers add one clock cycle of delay, but they reduce the electrical load on the memory controller and allow more DIMMs to be installed per memory controller.
RDIMMs are typically more expensive because of the additional components, and they are usually found in servers where the need for scalability and stability outweighs the need for a low price.
Although any combination of Registered/Unbuffered and ECC/non-ECC is theoretically possible, most server-grade memory modules are both ECC and registered.
Figure 2-20 shows an ECC RDIMM. The registers are the chips indicated by the arrows; the nine memory chips indicate the presence of ECC.
Figure 2-20 ECC RDIMM
DDR2 and DDR3
The first SDRAM technology was called SDR (Single Data Rate) to indicate that a single unit of data is transferred per clock cycle. It was followed by the DDR (Double Data Rate) standard, which achieves nearly twice the bandwidth of SDR by transferring data on both the rising and falling edges of the clock signal, without increasing the clock frequency. DDR evolved into the two currently used standards: DDR2 and DDR3.
DDR2 SDRAMs (double-data-rate two synchronous dynamic random access memories) operate at 1.8 volts and are packaged in 240-pin DIMM modules. They are capable of operating the external data bus at twice the data rate of DDR thanks to improved bus signaling.
The rules are as follows (see the sketch after this list):
- Two data transfers per DRAM clock
- Eight bytes (64 bits) per data transfer
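Combining these two rules gives the peak transfer rates listed in Table 2-2. A minimal sketch of the arithmetic (the function name is illustrative):

```python
# Peak rate = DRAM clock x 2 transfers per clock x 8 bytes per transfer.
def peak_transfer_rate_gb_s(dram_clock_mhz):
    transfers_per_second = dram_clock_mhz * 1e6 * 2   # double data rate
    return transfers_per_second * 8 / 1e9             # 8 bytes per transfer

print(peak_transfer_rate_gb_s(200))   # DDR2-400 / PC2-3200 -> 3.2 GB/s
print(peak_transfer_rate_gb_s(400))   # DDR2-800 / PC2-6400 -> 6.4 GB/s
```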
Table 2-2 shows the DDR2 standards.
Table 2-2. DDR2 DIMMs
| Standard name | DRAM clock | Million data transfers per second | Module name | Peak transfer rate (GB/s) |
|---|---|---|---|---|
| DDR2-400 | 200 MHz | 400 | PC2-3200 | 3.200 |
| DDR2-533 | 266 MHz | 533 | PC2-4200 | 4.266 |
| DDR2-667 | 333 MHz | 667 | PC2-5300 | 5.333 |
| DDR2-800 | 400 MHz | 800 | PC2-6400 | 6.400 |
| DDR2-1066 | 533 MHz | 1,066 | PC2-8500 | 8.533 |
DDR3 SDRAMs (double-data-rate three synchronous dynamic random access memories) improve over DDR2 in the following areas:
- Reduced power consumption obtained by reducing the operating voltage to 1.5 volts.
- Increased memory density by introducing support for chips from 0.5 to 8 Gigabits; i.e., rank capacity up to 16 GB.
- Increased memory bandwidth by supporting a burst length of 8 words, compared to the burst length of 4 words of DDR2 (see the sketch after this list). The reason for the increase in burst length is to better match the increased external data transfer rate with the relatively constant internal access time. As the transfer rate increases, the burst length (the size of the transfer) must increase so as not to exceed the access rate of the DRAM core.
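A back-of-the-envelope view of the burst-length argument, assuming the standard prefetch depths (4n for DDR2, 8n for DDR3); the function name is illustrative:

```python
# The DRAM core cycle time stays roughly constant, so a faster external
# interface needs a longer burst (deeper prefetch) to keep one core
# access per burst.
def core_rate_mhz(data_transfers_mt_s, prefetch_depth):
    return data_transfers_mt_s / prefetch_depth   # core accesses per microsecond

print(core_rate_mhz(800, 4))     # DDR2-800,  burst 4 -> 200 MHz core rate
print(core_rate_mhz(1600, 8))    # DDR3-1600, burst 8 -> 200 MHz core rate
```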
DDR3 DIMMs have 240 pins, the same number as DDR2, and are the same size, but they are electrically incompatible and have a different key notch location. In the future, DDR3 will also operate at faster clock rates. At the time of publishing, only DDR3-800, 1066, and 1333 are in production.
Table 2-3 summarizes the different DDR3 DIMM modules.
Table 2-3. DDR3 DIMMs
| Standard name | DRAM clock | Million data transfers per second | Module name | Peak transfer rate (GB/s) |
|---|---|---|---|---|
| DDR3-800 | 400 MHz | 800 | PC3-6400 | 6.400 |
| DDR3-1066 | 533 MHz | 1,066 | PC3-8500 | 8.533 |
| DDR3-1333 | 667 MHz | 1,333 | PC3-10600 | 10.667 |
| DDR3-1600 | 800 MHz | 1,600 | PC3-12800 | 12.800 |
| DDR3-1866 | 933 MHz | 1,866 | PC3-14900 | 14.933 |