- Cache Hierarchy
- Cache Functionality and Organization
- Hiding Latency With Prefetch
- Cache Organization and Replacement Policies
- UltraSPARC III Cu Memory Hierarchy
- References
Cache Functionality and Organization
A modern microprocessor contains several caches. They differ not only in size and functionality, but typically also in their internal organization. This section discusses the most important caches, as well as some popular cache organizations.
Instruction Cache
The instruction cache stores instructions, which reduces the cost of going to memory to fetch them.
The instruction cache often holds additional information as well, such as branch prediction information. In certain cases, this cache can even perform a limited operation. The instruction cache on UltraSPARC, for example, also pre-decodes the incoming instruction.
Data Cache
A data cache is a fast buffer that contains the application data. Before the processor can operate on the data, the cache line containing it must be loaded from memory into the data cache. The element needed is then loaded from the cache line into a register, and the instruction using this value can operate on it. The result of the instruction is also stored in a register. The register contents are then stored back into the data cache. Eventually, the cache line that this element is part of is copied back into main memory.
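The data flow described above can be sketched with a toy model. This is a minimal illustration, not a description of any real processor. The class name `ToyDataCache`, the line size of four words, the unlimited capacity, and the explicit `write_back` call are all assumptions made for the example.

```python
LINE_SIZE = 4  # words per cache line (hypothetical value, for illustration only)


class ToyDataCache:
    """Toy write-back data cache with unlimited capacity; models only the data flow."""

    def __init__(self, memory):
        self.memory = memory  # list of words standing in for main memory
        self.lines = {}       # line number -> copy of the words in that line
        self.dirty = set()    # line numbers modified since they were loaded

    def _fill(self, addr):
        # Bring the whole cache line containing addr into the cache if it is absent.
        line_no = addr // LINE_SIZE
        if line_no not in self.lines:
            base = line_no * LINE_SIZE
            self.lines[line_no] = self.memory[base:base + LINE_SIZE]
        return line_no, addr % LINE_SIZE

    def load(self, addr):
        # Cache line -> register: return the single element that was requested.
        line_no, offset = self._fill(addr)
        return self.lines[line_no][offset]

    def store(self, addr, value):
        # Register -> cache line: main memory is not updated yet.
        line_no, offset = self._fill(addr)
        self.lines[line_no][offset] = value
        self.dirty.add(line_no)

    def write_back(self, line_no):
        # Copy the whole line back into main memory and drop it from the cache.
        base = line_no * LINE_SIZE
        self.memory[base:base + LINE_SIZE] = self.lines.pop(line_no)
        self.dirty.discard(line_no)


memory = list(range(16))
cache = ToyDataCache(memory)
reg = cache.load(5)               # memory -> cache line -> register
reg = reg + 100                   # the instruction operates on the register
cache.store(5, reg)               # register -> data cache
stale = memory[5]                 # main memory still holds the old value here
cache.write_back(5 // LINE_SIZE)  # eventually the line is copied back
```

In this sketch, `stale` still holds the original value because the store only updates the cached line. Main memory sees the new value after `write_back`. A real cache would trigger the write-back itself, for example when the line is evicted.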
TLB Cache
Translating a virtual page address to a valid physical address is rather costly. The TLB is a cache to store these translated addresses.
Each entry in the TLB maps to an entire virtual memory page. The CPU can only operate on data and instructions that are mapped into the TLB. If this mapping is not present, the system has to re-create it, which is a relatively costly operation.
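As a rough sketch of this behavior, the toy model below caches page translations and counts how often a mapping has to be re-created. The names `ToyTLB` and `page_table`, the 8-KB page size, the two-entry capacity, and the first-in, first-out replacement are assumptions chosen for illustration. They do not describe a particular processor.

```python
PAGE_SIZE = 8192  # bytes per page (assumed value, for illustration only)


class ToyTLB:
    """Toy TLB: a small cache of virtual-to-physical page translations."""

    def __init__(self, page_table, capacity):
        self.page_table = page_table  # virtual page number -> physical page number
        self.capacity = capacity      # number of translations the TLB can hold
        self.entries = {}             # cached translations, oldest first
        self.misses = 0               # how often a mapping had to be re-created

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn not in self.entries:
            # TLB miss: the mapping must be re-created (the costly path).
            self.misses += 1
            if len(self.entries) >= self.capacity:
                oldest = next(iter(self.entries))  # FIFO replacement (assumption)
                del self.entries[oldest]
            self.entries[vpn] = self.page_table[vpn]
        # One entry covers the whole page, so only the offset varies within it.
        return self.entries[vpn] * PAGE_SIZE + offset


page_table = {0: 7, 1: 3, 2: 9}  # hypothetical mappings
tlb = ToyTLB(page_table, capacity=2)
physical = [tlb.translate(a) for a in (0, 100, 8192, 8200, 16384, 0)]
```

Accesses that fall within a page already in the TLB reuse the existing entry. Touching a third page with only two entries available forces an older mapping out, so the final access to page 0 misses again.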
The larger a page, the more effective capacity the TLB has. If an application does not make good use of the TLB (for example, because it accesses memory randomly), increasing the page size can be beneficial for performance, because a bigger part of the address space can then be mapped into the TLB.
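The arithmetic behind this is simple: the amount of memory a TLB can map at once is the number of entries multiplied by the page size. The sketch below works this out for a hypothetical TLB. The entry count of 512 and the page sizes listed are illustrative assumptions, not the specification of a particular processor.

```python
def tlb_reach(entries, page_size):
    """Bytes of address space that can be mapped by the TLB at one time."""
    return entries * page_size


KB = 1024
MB = 1024 * KB

ENTRIES = 512  # hypothetical number of TLB entries

# Illustrative page sizes; the reach grows linearly with the page size.
reach_by_page_size = {
    page_size: tlb_reach(ENTRIES, page_size)
    for page_size in (8 * KB, 64 * KB, 512 * KB, 4 * MB)
}

for page_size, reach in reach_by_page_size.items():
    print(f"page size {page_size // KB:5d} KB -> TLB reach {reach // MB:5d} MB")
```

With these assumed numbers, 8-KB pages let the TLB cover 4 MB of address space, while 4-MB pages let the same number of entries cover 2 GB.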
Some microprocessors, including UltraSPARC, implement two TLBs: one for pages containing instructions (I-TLB) and one for data pages (D-TLB).
Putting it All Together
Now, all of the ingredients needed to build a generic cache-based system (FIGURE 3) have been discussed.
FIGURE 3 Generic System Architecture
FIGURE 3 shows a unified cache at level 2. Both instructions and data are stored in this type of cache. It is shown outside of the microprocessor and is therefore called an external cache. This situation is quite typical; the cache at the highest level is often unified and external to the microprocessor.
Note that the cache architecture shown in FIGURE 3 is rather generic. Often, you will find other types of caches in a modern microprocessor. The UltraSPARC III Cu microprocessor is a good example of this. As you will see, it has two additional caches that have not been discussed yet.
FIGURE 3 also shows that the same cache line can be present in multiple caches at the same time. With a containing cache philosophy, the levels of the cache hierarchy that are farther away from the CPU always contain all the data present in the levels closer to the CPU. The opposite of this design is called non-containing.
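One way to picture the containing property is with a toy two-level model. In the sketch below, a line evicted from the level-2 cache is also removed from the level-1 cache, so that level 2 always holds everything level 1 holds. The class name `ContainingHierarchy`, the least-recently-used replacement, the tiny capacities, and this removal step are assumptions made for illustration. The text above does not say how a particular processor enforces containment.

```python
from collections import OrderedDict


class ContainingHierarchy:
    """Toy two-level cache in which L2 always contains every line held in L1."""

    def __init__(self, l1_lines, l2_lines):
        self.l1 = OrderedDict()  # line number -> None, least recently used first
        self.l2 = OrderedDict()
        self.l1_cap = l1_lines
        self.l2_cap = l2_lines
        self.back_invalidations = 0  # L1 lines removed because L2 evicted them

    def access(self, line):
        if line in self.l1:
            # L1 hit: the request never reaches L2.
            self.l1.move_to_end(line)
            return
        if line in self.l2:
            self.l2.move_to_end(line)
        else:
            if len(self.l2) >= self.l2_cap:
                victim, _ = self.l2.popitem(last=False)
                if victim in self.l1:
                    # Keep the containing property: L1 may not hold a line L2 lost.
                    del self.l1[victim]
                    self.back_invalidations += 1
            self.l2[line] = None
        if len(self.l1) >= self.l1_cap:
            self.l1.popitem(last=False)  # evict the least recently used L1 line
        self.l1[line] = None

    def is_containing(self):
        return set(self.l1) <= set(self.l2)


hierarchy = ContainingHierarchy(l1_lines=2, l2_lines=3)
for line in (0, 1, 0, 2, 3):
    hierarchy.access(line)
```

In this run, line 0 stays recently used in L1 but becomes the oldest line in L2, because L1 hits never reach L2. When line 3 forces an L2 eviction, line 0 is removed from both levels. A non-containing design would be free to keep it in L1.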