4.2 Address Space of a Linux Process
The virtual address space of any Linux process is divided into two subspaces: kernel space and user space. As illustrated on the left-hand side of Figure 4.4, user space occupies the lower portion of the address space, starting from address 0 and extending up to the platform-specific task size limit (TASK_SIZE in file include/asm/processor.h). The remainder is occupied by kernel space. Most platforms use a task size limit that is large enough so that at least half of the available address space is occupied by the user address space.
Figure 4.4. Structure of Linux address space.
User space is private to the process, meaning that it is mapped by the process's own page table. In contrast, kernel space is shared across all processes. There are two ways to think about kernel space: We can either think of it as being mapped into the top part of each process, or we can think of it as a single space that occupies the top part of the CPU's virtual address space. Interestingly, depending on the specifics of the CPU on which Linux is running, kernel space can be implemented in one or the other way.
During execution at the user level, only user space is accessible. Attempting to read, write, or execute kernel space would cause a protection violation fault. This prevents a faulty or malicious user process from corrupting the kernel. In contrast, during execution in the kernel, both user and kernel spaces are accessible.
Before continuing our discussion, we need to say a few words about the page size used by the Linux kernel. Because different platforms have different constraints on what page sizes they can support, Linux never assumes a particular page size and instead uses the platform-specific page size constant (PAGE_SIZE in file include/asm/page.h) where necessary. Although Linux can accommodate arbitrary page sizes, throughout the rest of this chapter we assume a page size of 8 Kbytes, unless stated otherwise. This assumption helps to make the discussion more concrete and avoids excessive complexity in the following examples and figures.
4.2.1 User address space
Let us now take a closer look at how Linux implements the user address space. Each address space is represented in the kernel by an object called the mm structure (struct mm_struct in file include/linux/sched.h). As we have seen in Chapter 3, Processes, Tasks, and Threads, multiple tasks can share the same address space, so the mm structure is a reference-counted object that exists as long as at least one task is using the address space it represents. Each task structure has a pointer to the mm structure that defines the address space of the task. This pointer is known as the mm pointer. As a special case, tasks that are known to access kernel space only (such as kswapd) are said to have an anonymous address space, and the mm pointer of such tasks is set to NULL. When switching execution to such a task, Linux does not switch the address space (because there is none to switch to) and instead leaves the old one in place. A separate pointer in the task structure tracks which address space has been borrowed in this fashion. This pointer is known as the active mm pointer of the task. For a task that is currently running, this pointer is guaranteed not to be NULL. If the task has its own address space, the active mm pointer has the same value as the mm pointer; otherwise, the active mm pointer refers to the mm structure of the borrowed address space.
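The relationship between these two pointers can be sketched as follows. This is a hedged illustration in C, not the actual kernel code: the structure layout is heavily abridged and sketch_switch_mm() is a hypothetical helper, but the mm and active_mm field names match those used by the kernel (the real definitions are in include/linux/sched.h).

struct mm_struct;                        /* reference-counted address-space object */

struct task_struct {
        struct mm_struct *mm;            /* NULL for kernel-only tasks such as kswapd */
        struct mm_struct *active_mm;     /* address space currently in use; never NULL
                                            while the task is running */
        /* ... many other fields omitted ... */
};

/* Hypothetical helper illustrating the address-space borrowing performed
   on a context switch: */
static void sketch_switch_mm(struct task_struct *prev, struct task_struct *next)
{
        if (next->mm == NULL) {
                /* anonymous address space: leave the old address space in place */
                next->active_mm = prev->active_mm;
        } else {
                next->active_mm = next->mm;
                /* a real implementation would switch to next->mm's page table here */
        }
}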
Perhaps somewhat surprisingly, the mm structure itself is not a terribly interesting object. However, it is a central hub in the sense that it contains the pointers to the two data structures that are at the core of the virtual memory system: the page table and the list of virtual memory areas, which we describe next. Apart from these two pointers, the mm structure contains miscellaneous information, such as the mm context, which we describe in more detail in Section 4.4.3, a count of the number of virtual pages currently in use (the resident set size, or RSS), the start and end address of the text, data, and stack segments as well as housekeeping information that kswapd uses when looking for virtual memory to page out.
Virtual memory areas
In theory, a page table is all the kernel needs to implement virtual memory. However, page tables are not effective in representing huge address spaces, especially when they are sparse. To see this, let us assume that a process uses 1 Gbyte of its address space for a hash table and then enters 128 Kbytes of data in it. If we assume that the page size is 8 Kbytes and that each entry in the page table takes up 8 bytes, then the page table itself would take up 1 Gbyte/8 Kbytes · 8 bytes = 1 Mbyte of space, an order of magnitude more than the actual data stored in the hash table!
To avoid this kind of inefficiency, Linux does not represent address spaces with page tables. Instead, it uses lists of vm-area structures (struct vm_area_struct in file include/linux/mm.h). The idea is to divide an address space into contiguous ranges of pages that can be handled in the same fashion. Each range can then be represented by a single vm-area structure. If a process accesses a page for which there is no translation in the page table, the vm-area covering that page has all the information needed to install the missing page. For our hash table example, this means that a single vm-area would suffice to map the entire hash table and that page-table memory would be needed only for recently accessed pages.
To get a better sense of how the kernel uses vm-areas, let us consider the example in Figure 4.5. It shows a process that maps the first 32 Kbytes (four pages) of the file /etc/termcap at virtual address 0x2000. At the top-left of the figure, we find the task structure of the process and the mm pointer that leads to the mm structure representing the address space of the process. From there, the mmap pointer leads to the first element in the vm-area list. For simplicity, we assume that the vm-area for the mapped file is the only one in this process, so this list contains just one entry. The mm structure also has a pointer to the page table, which is initially empty. Apart from these kernel data structures, the process's virtual memory is shown in the middle of the figure, the filesystem containing /etc/termcap is represented by the disk-shaped form, and the physical memory is shown on the right-hand side of the figure.
Figure 4.5. Example: vm-area mapping a file.
Now, suppose the process attempts to read the word at address 0x6008, as shown by the arrow labeled (1). Because the page table is empty, this attempt results in a page fault. In response to this fault, Linux searches the vm-area list of the current process for a vm-area that covers the faulting address. In our case, it finds that the one and only vm-area on the list maps the address range from 0x2000 to 0xa000 and hence covers the faulting address. By calculating the distance from the start of the mapped area, Linux finds that the process attempted to access page 2 (⌊(0x6008 - 0x2000)/8192⌋ = 2). Because the vm-area maps a file, Linux initiates the disk read illustrated by the arrow labeled (2). We assumed that the vm-area maps the first 32 Kbytes of the file, so the data for page 2 can be found at file offsets 0x4000 through 0x5fff. When this data arrives, Linux copies it to an available page frame as illustrated by the arrow labeled (3). In the last step, Linux updates the page table with an entry that maps the virtual page at 0x6000 to the physical page frame that now contains the file data. At this point, the process can resume execution. The read access will be restarted and will now complete successfully, returning the desired file data.
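The fault-handling steps just described can be summarized in the following C sketch. All names here (sketch_handle_fault, read_page_from_file, install_pte, and the pared-down structures) are hypothetical stand-ins; the real work is done by the kernel's page-fault handler and the vm-area callbacks discussed later in this section.

struct file;                                    /* opaque for this sketch */

struct vm_area {
        unsigned long start, end;               /* covered range; end is the first byte not covered */
        struct file *file;                      /* mapped file, if any */
        unsigned long file_offset;              /* file offset corresponding to start */
        struct vm_area *next;                   /* next entry on the vm-area list */
};

struct mm {
        struct vm_area *vma_list;               /* head of the vm-area list */
};

/* Hypothetical helpers standing in for the real file and page-table code: */
extern unsigned long read_page_from_file(struct file *file, unsigned long offset);
extern void install_pte(struct mm *mm, unsigned long vaddr, unsigned long frame);

#define SKETCH_PAGE_SIZE 8192UL                 /* 8-Kbyte pages, as assumed in this chapter */

static int sketch_handle_fault(struct mm *mm, unsigned long addr)
{
        struct vm_area *vma;
        unsigned long page_index, frame;

        /* (1) find the vm-area that covers the faulting address */
        for (vma = mm->vma_list; vma != NULL; vma = vma->next)
                if (addr >= vma->start && addr < vma->end)
                        break;
        if (vma == NULL)
                return -1;                      /* no mapping: segmentation violation */

        /* which page of the mapping was touched? e.g., (0x6008 - 0x2000)/8192 = 2 */
        page_index = (addr - vma->start) / SKETCH_PAGE_SIZE;

        /* (2) and (3): read the file data into an available page frame */
        frame = read_page_from_file(vma->file,
                                    vma->file_offset + page_index * SKETCH_PAGE_SIZE);

        /* (4) install the missing translation; the faulting access then restarts */
        install_pte(mm, addr & ~(SKETCH_PAGE_SIZE - 1), frame);
        return 0;
}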
As this example illustrates, the vm-area list provides Linux with the ability to (re-)create the page-table entry for any address that is mapped in the address space of a process. This implies that the page table can be treated almost like a cache: If the translation for a particular page is present, the kernel can go ahead and use it, and if it is missing, it can be created from the matching vm-area. Treating the page table in this fashion provides a tremendous amount of flexibility because translations for clean pages can be removed at will. Translations for dirty pages can be removed only if they are backed by a file (not by swap space). Before removal, they have to be cleaned by writing the page content back to the file. As we see later, the cache-like behavior of page tables provides the foundation for the copy-on-write algorithm that Linux uses.
AVL trees
As we have seen so far, the vm-area list helps Linux avoid many of the inefficiencies of a system that is based entirely on page tables. However, there is still a problem. If a process maps many different files into its address space, it may end up with a vm-area list that is hundreds or perhaps even thousands of entries long. As this list grows longer, the kernel executes more and more slowly as each page fault requires the kernel to traverse this list. To ameliorate this problem, the kernel tracks the number of vm-areas on the list, and if there are too many, it creates a secondary data structure that organizes the vm-areas as an AVL tree [42, 62]. An AVL tree is a normal binary search tree, except that it has the special property that for each node in the tree, the height of the two subtrees differs by at most 1. Using the standard tree-search algorithm, this property ensures that, given a virtual address, the matching vm-area structure can be found in a number of steps that grows only with the logarithm of the number of vm-areas in the address space.

Let us consider a concrete example. Figure 4.6 shows the AVL tree for an Emacs process as it existed right after it was started up on a Linux/ia64 machine. For space reasons, the figure represents each node with a rectangle that contains just the starting and ending address of the address range covered by the vm-area. As customary for a search tree, the vm-area nodes appear in the order of increasing starting address. Given a node with a starting address of x, the vm-areas with a lower starting address can be found in the lower ("left") subtree and the vm-areas with a higher starting address can be found in the higher ("right") subtree. The root of the tree is at the left end of the figure, and, as indicated by the arrows, the tree grows toward the right side. While it is somewhat unusual for a tree to grow from left to right, this representation has the advantage that the higher a node in the figure, the higher its starting address.
Figure 4.6. AVL tree of vm-area structures for a process running Emacs.
First, observe that this tree is not perfectly balanced: It has a height of six, yet there is a missing node at the fifth level as illustrated by the dashed rectangle. Despite this imperfection, the tree does have the AVL property that the height of the subtrees at any node never differs by more than one. Second, note that the tree contains 47 vm-areas. If we were to use a linear search to find the vm-area for a given address, we would have to visit 23.5 vm-area structures on average and, in the worst case, we might have to visit all 47 of them. In contrast, when the AVL tree is searched, at most six vm-areas have to be visited, as given by the height of the tree. Clearly, using an AVL tree is a big win for complex address spaces. However, for simple address spaces, the overhead of creating the AVL tree and keeping it balanced is too much compared to the cost of searching a short linear list. For this reason, Linux does not create the AVL tree until the address space contains at least 32 vm-areas. Let us emphasize that even when the AVL tree is being maintained, the linear list continues to be maintained as well; this provides an efficient means to visit all vm-area structures.
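The search itself is an ordinary binary-tree descent, as the following sketch shows. The node layout is hypothetical and carries only the fields needed here; in the kernel, the same information is part of struct vm_area_struct.

struct vm_area_node {
        unsigned long start, end;               /* end is the first byte not covered */
        struct vm_area_node *left, *right;      /* AVL subtrees */
};

static struct vm_area_node *
sketch_find_vma(struct vm_area_node *root, unsigned long addr)
{
        while (root != NULL) {
                if (addr < root->start)
                        root = root->left;      /* continue among lower starting addresses */
                else if (addr >= root->end)
                        root = root->right;     /* continue among higher starting addresses */
                else
                        return root;            /* this vm-area covers addr */
        }
        return NULL;                            /* no vm-area covers addr */
}

Because the AVL property bounds the height of the tree, the loop above executes at most a logarithmic number of iterations (six for the tree in Figure 4.6).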
Anatomy of the vm-area structure
So far, we have discussed the purpose of the vm-area structure and how the Linux kernel uses it, but not what it looks like. The list below rectifies this situation by describing the major components of the vm-area:
Address range: Describes the address range covered by the vm-area in the form of a start and end address. It is noteworthy that the end address is the address of the first byte that is not covered by the vm-area.
VM flags: Consist of a single word that contains various flag bits. The most important among them are the access right flags VM_READ, VM_WRITE, and VM_EXEC, which control whether the process can, respectively, read, write, or execute the virtual memory mapped by the vm-area. Two other important flags are VM_GROWSDOWN and VM_GROWSUP, which control whether the address range covered by the vm-area can be extended toward lower or higher addresses, respectively. As we see later, this provides the means to grow user stacks dynamically.
Linkage info: Contains various linkage information, including the pointer needed for the mm structure's vm-area list, pointers to the left and right subtrees of the AVL tree, and a pointer that leads back to the mm structure to which the vm-area belongs.
VM operations and private data: Contain the VM operations pointer, which is a pointer to a set of callback functions that define how various virtual-memory-related events, such as page faults, are to be handled. The component also contains a private data pointer that can be used by the callback functions as a hook to maintain information that is vm-area-specific.
Mapped file info: If a vm-area maps a portion of a file, this component stores the file pointer and the file offset needed to locate the file data.
Note that the vm-area structure is not reference-counted. There is no need to do that because each structure belongs to one and only one mm structure, which is already reference-counted. In other words, when the reference-count of an mm structure reaches 0, it is clear that the vm-area structures owned by it are also no longer needed.
A second point worth making is that the VM operations pointer gives the vm-area characteristics that are object-like because different types of vm-areas can have different handlers for responding to virtual-memory-related events. Indeed, Linux allows each filesystem, character device, and, more generally, any object that can be mapped into user space by mmap() to provide its own set of VM operations. The operations that can be provided in this fashion are open(), close(), and nopage(). The open() and close() callbacks are invoked whenever a vm-area is created or destroyed, respectively, and are used primarily to keep track of the number of vm-areas that are currently using the underlying object. The nopage() callback is invoked when a page fault occurs for an address for which there is no page-table entry. The Linux kernel provides default implementations for each of these callbacks. These default versions are used if either the VM operations pointer or a particular callback pointer is NULL. For example, if the nopage() callback is NULL, Linux handles the page fault by creating an anonymous page, which is a process-private page whose content is initially cleared to 0.
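Putting these pieces together, a condensed sketch of the structure looks roughly as follows. The field names follow the 2.4-era kernel sources, but the layout is abridged and the nopage() signature is approximate; the authoritative definition is struct vm_area_struct in include/linux/mm.h.

struct page;                                    /* page frame descriptor (opaque here) */
struct file;
struct mm_struct;
struct vm_area_struct;

struct vm_operations_struct {
        void (*open)(struct vm_area_struct *vma);       /* a vm-area using the object is created */
        void (*close)(struct vm_area_struct *vma);      /* ... or destroyed */
        struct page *(*nopage)(struct vm_area_struct *vma,
                               unsigned long addr, int write_access);
};

struct vm_area_struct {
        /* address range: [vm_start, vm_end) */
        unsigned long vm_start, vm_end;

        /* VM flags: VM_READ, VM_WRITE, VM_EXEC, VM_GROWSDOWN, VM_GROWSUP, ... */
        unsigned long vm_flags;

        /* linkage info: vm-area list, AVL tree, owning mm structure */
        struct vm_area_struct *vm_next;
        struct vm_area_struct *vm_avl_left, *vm_avl_right;
        struct mm_struct *vm_mm;

        /* VM operations and private data; a NULL pointer selects the default handlers */
        struct vm_operations_struct *vm_ops;
        void *vm_private_data;

        /* mapped file info */
        struct file *vm_file;
        unsigned long vm_pgoff;                 /* file offset, in page-size units */
};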
4.2.2 Page-table-mapped kernel segment
Let us now return to Figure 4.4 and take a closer look at the kernel address space. The right-hand side of this figure is an enlargement of the kernel space and shows that it contains two segments: the identity-mapped segment and the page-table-mapped segment. The latter is mapped by a kernel-private page table and is used primarily to implement the kernel vmalloc arena (file include/linux/vmalloc.h). The kernel uses this arena to allocate large blocks of memory that must be contiguous in virtual space. For example, the memory required to load a kernel module is allocated from this arena. The address range occupied by the vmalloc arena is defined by the platform-specific constants VMALLOC_START and VMALLOC_END. As indicated in the figure, the vmalloc arena does not necessarily occupy the entire page-table-mapped segment. This makes it possible to use part of the segment for platform-specific purposes.
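As a brief illustration, a kernel component that needs a large, virtually contiguous buffer (the module loader is the classic example) might use the arena as sketched below. sketch_load_blob() is a hypothetical function; vmalloc() and vfree() are the real interface declared in include/linux/vmalloc.h.

#include <linux/vmalloc.h>

static void *sketch_load_blob(unsigned long size)
{
        /* virtually contiguous, but possibly scattered in physical memory;
           the mapping is placed between VMALLOC_START and VMALLOC_END */
        void *buf = vmalloc(size);

        if (buf == NULL)
                return NULL;
        /* ... copy the module image or other large object into buf ... */
        return buf;
}

/* The memory is later released with vfree(buf). */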
4.2.3 Identity-mapped kernel segment
The identity-mapped segment starts at the address defined by the platform-specific constant PAGE_OFFSET. This segment contains the Linux kernel image, including its text, data, and stack segments. In other words, this is the segment that the kernel is executing in when in kernel mode (except when executing in a module).
The identity-mapped segment is special because there is a direct mapping between a virtual address in this segment and the physical address that it translates to. The exact formula for this mapping is platform specific, but it is often as simple as vaddr - PAGE_OFFSET. This one-to-one (identity) relationship between virtual and physical addresses is what gives the segment its name.
The segment could be implemented with a normal page table. However, because there is a direct relationship between virtual and physical addresses, many platforms can optimize this case and avoid the overhead of a page table. How this is done on IA-64 is described in Section 4.5.3.
Because the actual formula to translate between a physical address and the equivalent virtual address is platform specific, the kernel uses the interface in Figure 4.7 to perform such translations. The interface provides two routines: __pa() expects a single argument, vaddr, and returns the physical address that corresponds to vaddr. The return value is undefined if vaddr does not point inside the kernel's identity-mapped segment. Routine __va() provides the reverse mapping: it takes a physical address paddr and returns the corresponding virtual address. Usually the Linux kernel expects virtual addresses to have a pointer type (such as void *) and physical addresses to have a type of unsigned long. However, the __pa() and __va() macros are polymorphic and accept arguments of either type.
Figure 4.7. Kernel interface to convert between physical and virtual addresses.
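On a platform where the mapping really is a simple offset, the two routines reduce to one-line macros. The following is a sketch of that common case, not a definition taken from any particular port; each platform provides its own versions in its asm/page.h.

#define __pa(vaddr)     ((unsigned long)(vaddr) - PAGE_OFFSET)
#define __va(paddr)     ((void *)((unsigned long)(paddr) + PAGE_OFFSET))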
A platform is free to employ an arbitrary mapping between physical and virtual addresses provided that the following relationships are true:
__va(__pa(vaddr)) = vaddr   for all vaddr inside the identity-mapped segment
paddr1 < paddr2 ⇒ __va(paddr1) < __va(paddr2)
That is, mapping any virtual address inside the identity-mapped segment to a physical address and back must return the original virtual address. The second condition is that the mapping must be monotonic, i.e., the relative order of a pair of physical addresses is preserved when they are mapped to virtual addresses.
We might wonder why the constant that marks the beginning of the identity-mapped segment is called PAGE_OFFSET. The reason is that the page frame number pfn for an address addr in this segment can be calculated as:
pfn = (addr - PAGE_OFFSET)/PAGE_SIZE
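Expressed as code, with a hypothetical macro name, the same calculation reads:

#define sketch_addr_to_pfn(addr) \
        (((unsigned long)(addr) - PAGE_OFFSET) / PAGE_SIZE)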
As we will see next, even though the page frame number is easy to calculate, the Linux kernel does not use it very often.
Page frame map
Linux uses a table called the page frame map to keep track of the status of the physical page frames in a machine. For each page frame, this table contains exactly one page frame descriptor (struct page in file include/linux/mm.h). This descriptor contains various housekeeping information, such as a count of the number of address spaces that are using the page frame, various flags that indicate whether the frame can be paged out to disk, whether it has been accessed recently, or whether it is dirty (has been written to), and so on.
While the exact content of the page frame descriptor is of no concern for this chapter, we do need to understand that Linux often uses page frame descriptor pointers in lieu of page frame numbers. The Linux kernel leaves it to platform-specific code to define how virtual addresses in the identity-mapped segment are translated to page frame descriptor pointers, and vice versa. It uses the interface shown in Figure 4.8 for this purpose.
Figure 4.8. Kernel interface to convert between pages and virtual addresses.
Because we are not concerned with the internals of the page frame descriptor, Figure 4.8 lists its type (struct page) simply as an opaque structure. The virt_to_page() routine can be used to obtain the page frame descriptor pointer for a given virtual address. It expects one argument, vaddr, which must be an address inside the identity-mapped segment, and returns a pointer to the corresponding page frame descriptor. The page_address() routine provides the reverse mapping: It expects the page argument to be a pointer to a page frame descriptor and returns the virtual address inside the identity-mapped segment that maps the corresponding page frame.
Historically, the page frame map was implemented with a single array of page frame descriptors. This array was called mem_map and was indexed by the page frame number. In other words, the value returned by virt_to_page() could be calculated as:
&mem_map[(addr-PAGE_OFFSET)/PAGE_SIZE]
However, on machines with a physical address space that is either fragmented or has huge holes, using a single array can be problematic. In such cases, it is better to implement the page frame map by using multiple partial maps (e.g., one map for each set of physically contiguous page frames). The interface in Figure 4.8 provides the flexibility necessary for platform-specific code to implement such solutions, and for this reason the Linux kernel no longer uses the above formula directly.
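For reference, the historical single-array scheme can be sketched as shown below. The sketch_-prefixed macro names are hypothetical; mem_map, PAGE_OFFSET, and PAGE_SIZE are the real kernel names.

extern struct page *mem_map;            /* one page frame descriptor per page frame */

#define sketch_virt_to_page(addr) \
        (&mem_map[((unsigned long)(addr) - PAGE_OFFSET) / PAGE_SIZE])

#define sketch_page_address(page) \
        ((void *)(PAGE_OFFSET + (unsigned long)((page) - mem_map) * PAGE_SIZE))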
High memory support
The size of the physical address space has no direct relationship to the size of the virtual address space. It could be smaller than, the same size as, or even larger than the virtual space. On a new architecture, the virtual address space is usually designed to be much larger than the largest anticipated physical address space. Not surprisingly, this is the case for which Linux is designed and optimized.
However, the size of the physical address space tends to increase roughly in line with Moore's Law, which predicts a doubling of chip capacity every 18 months [57]. Because the virtual address space is part of an architecture, its size cannot be changed easily (e.g., changing it would at the very least require recompilation of all applications). Thus, over the course of many years, the size of the physical address space tends to encroach on the size of the virtual address space until, eventually, it becomes as large as or larger than the virtual space.
This is a problem for Linux because once the physical memory has a size similar to that of the virtual space, the identity-mapped segment may no longer be large enough to map the entire physical space. For example, the IA-32 architecture defines an extension that supports a 36-bit physical address space even though the virtual address space has only 32 bits. Clearly, the physical address space cannot fit inside the virtual address space.
The Linux kernel alleviates this problem through the highmem interface (file include/linux/highmem.h). High memory is physical memory that cannot be addressed through the identity-mapped segment. The highmem interface provides indirect access to this memory by dynamically mapping high memory pages into a small portion of the kernel address space that is reserved for this purpose. This part of the kernel address space is known as the kmap segment.
Figure 4.9 shows the two primary routines provided by the highmem interface: kmap() maps the page frame specified by argument page into the kmap segment. The argument must be a pointer to the page frame descriptor of the page to be mapped. The routine returns the virtual address at which the page was mapped. If the kmap segment is full at the time this routine is called, it will block until space becomes available. This implies that high memory cannot be used in interrupt handlers or any other code that cannot block execution for an indefinite amount of time. Both high and normal memory pages can be mapped with this routine, though in the latter case kmap() simply returns the appropriate address in the identity-mapped segment.
Figure 4.9. Primary routines for the highmem interface.
When the kernel has finished using a high memory page, it unmaps the page by a call to kunmap(). The page argument passed to this routine is a pointer to the page frame descriptor of the page that is to be unmapped. Unmapping a page frees up the virtual address space that the page occupied in the kmap segment. This space then becomes available for use by other mappings. To reduce the amount of blocking resulting from a full kmap segment, Linux attempts to minimize the amount of time that high memory pages are mapped.
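A typical usage pattern, sketched under the assumption that the caller is allowed to block, looks as follows. sketch_clear_page_frame() is a hypothetical function; kmap(), kunmap(), and PAGE_SIZE are the real interfaces.

#include <linux/highmem.h>
#include <linux/string.h>

static void sketch_clear_page_frame(struct page *page)
{
        void *vaddr = kmap(page);       /* may block until kmap space becomes available */

        memset(vaddr, 0, PAGE_SIZE);    /* operate on the temporarily mapped page */
        kunmap(page);                   /* release the kmap-segment slot promptly */
}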
Clearly, supporting high memory incurs extra overhead and limitations in the kernel and should be avoided where possible. For this reason, high memory support is an optional component of the Linux kernel. Because IA-64 affords a vastly larger virtual address space than that provided by 32-bit architectures, high memory support is not needed and therefore disabled in Linux/ia64. However, it should be noted that the highmem interface is available even on platforms that do not provide high memory support. On those platforms, kmap() is equivalent to page_address() and kunmap() performs no operation. These dummy implementations greatly simplify writing platform-independent kernel code. Indeed, it is good kernel programming practice to use the kmap() and kunmap() routines whenever possible. Doing so results in more efficient memory use on platforms that need high memory support (such as IA-32) without impacting the platforms that do not need it (such as IA-64).
Summary
Figure 4.10 summarizes the relationship between physical memory and kernel virtual space for a hypothetical machine that has high memory support enabled. In this machine, the identity-mapped segment can map only the first seven page frames of the physical address space; the remaining memory, consisting of page frames 7 through 12, is high memory and can be accessed through the kmap segment only. The figure illustrates the case in which page frames 8 and 10 have been mapped into this segment. Because our hypothetical machine has a kmap segment that consists of only two pages, the two mappings use up all available space. Trying to map an additional high memory page frame by calling kmap() would block the caller until page frame 8 or 10 is unmapped by a call to kunmap().
Figure 4.10. Summary of identity-mapped segment and high memory support.
Let us now turn our attention to the arrow labeled vaddr. It points to the middle of the second-to-last page mapped by the identity-mapped segment. We can find the physical address of vaddr with the __pa() routine. As the arrow labeled __pa(vaddr) illustrates, this physical address not surprisingly points to the middle of page frame 5 (the second-to-last page frame in normal memory).
The figure illustrates the page frame map as the diagonally shaded area inside the identity-mapped segment (we assume that our hypothetical machine uses a single contiguous table for this purpose). Note that this table contains page frame descriptors for all page frames in the machine, including the high memory page frames. To get more information on the status of page frame 5, we can use virt_to_page(vaddr) to get the page pointer for the page frame descriptor of that page. This is illustrated in the figure by the arrow labeled page. Conversely, we can use the page pointer to calculate page_address(page) to obtain the starting address of the virtual page that contains vaddr.
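The conversions traced in this figure can also be written down directly. The function below is a hypothetical wrapper around the real __pa(), virt_to_page(), and page_address() interfaces; vaddr is assumed to lie inside the identity-mapped segment.

#include <linux/mm.h>

static void sketch_figure_walkthrough(void *vaddr)
{
        unsigned long paddr = __pa(vaddr);              /* points into page frame 5 */
        struct page *page = virt_to_page(vaddr);        /* page frame descriptor of frame 5 */
        void *page_start = page_address(page);          /* start of the virtual page containing vaddr */

        (void)paddr;                                    /* silence unused-variable warnings */
        (void)page_start;
}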
4.2.4 Structure of IA-64 address space
The IA-64 architecture provides a full 64-bit virtual address space. As illustrated in Figure 4.11, the address space is divided into eight regions of equal size. Each region covers 2^61 bytes or 2048 Pbytes. Regions are numbered from 0 to 7 according to the top three bits of the address range they cover. The IA-64 architecture has no a priori restrictions on how these regions can be used. However, Linux/ia64 uses regions 0 through 4 as the user address space and regions 5 through 7 as the kernel address space.
Figure 4.11. Structure of Linux/ia64 address space.
There are also no restrictions on how a process can use the five regions that map the user space, but the usage illustrated in the figure is typical: Region 1 is used for shared memory segments and shared libraries, region 2 maps the text segment, region 3 the data segment, and region 4 the memory and register stacks of a process. Region 0 normally remains unused by 64-bit applications but is available for emulating a 32-bit operating system such as IA-32 Linux.
In the kernel space, the figure shows that the identity-mapped segment is implemented in region 7 and that region 5 is used for the page-table mapped segment. Region 6 is identity-mapped like region 7, but the difference is that accesses through region 6 are not cached. As we discuss in Chapter 7, Device I/O, this provides a simple and efficient means for memory-mapped I/O.
The right half of Figure 4.11 provides additional detail on the anatomy of region 5. As illustrated there, the first page is the guard page. It is guaranteed not to be mapped so that any access is guaranteed to result in a page fault. As we see in Chapter 5, Kernel Entry and Exit, this page is used to accelerate the permission checks required when data is copied across the user/kernel boundary. The second page in this region serves as the gate page. It assists in transitioning from the user to the kernel level, and vice versa. For instance, as we also see in Chapter 5, this page is used when a signal is delivered and could also be used for certain system calls. The third page is called the per-CPU page. It provides one page of CPU-local data, which is useful on MP machines. We discuss this page in more detail in Chapter 8, Symmetric Multiprocessing. The remainder of region 5 is used as the vmalloc arena and spans the address range from VMALLOC_START to VMALLOC_END. The exact values of these platform-specific constants depend on the page size. As customary in this chapter, the figure illustrates the case in which a page size of 8 Kbytes is in effect.
Virtual address format
Even though IA-64 defines a 64-bit address space, implementations are not required to fully support each address bit. Specifically, the virtual address format mandated by the architecture is illustrated in Figure 4.12. As shown in the figure, bits 61 through 63 must be implemented because they are used to select the virtual region number (vrn).
Figure 4.12. Format of IA-64 virtual address.
The lower portion of the virtual address consists of a CPU-model-specific number of bits. The most significant bit is identified by constant IMPL_VA_MSB. This value must be in the range of 50 to 60. For example, on Itanium this constant has a value of 50, meaning that the lower portion of the virtual address consists of 51 bits.
The unimplemented portion of the virtual address consists of bits IMPL_VA_MSB + 1 through 60. Even though they are marked as unimplemented, the architecture requires that the value in these bits match the value in bit IMPL_VA_MSB. In other words, the unimplemented bits must correspond to the sign-extended value of the lower portion of the virtual address. This restriction has been put in place to ensure that software does not abuse unimplemented bits for purposes such as type tag bits. Otherwise, such software might break when running on a machine that implements a larger number of virtual address bits.
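The two checks implied by this format can be sketched as follows. The names are hypothetical, and IMPL_VA_MSB is hard-wired to 50 (the Itanium value) for concreteness; Linux/ia64 has equivalent helpers in its platform headers.

/* bits 61 through 63 select the virtual region number (vrn) */
#define SKETCH_VRN(addr)        ((unsigned long)(addr) >> 61)

/* bits 51..60 must be a sign extension of bit 50, i.e., bits 50..60 must all be equal */
static int sketch_is_implemented_va(unsigned long addr)
{
        unsigned long bits_50_to_60 = (addr << 3) >> 53;        /* isolate the 11 bits 50..60 */

        return bits_50_to_60 == 0 || bits_50_to_60 == 0x7ff;
}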
On implementations where IMPL_VA_MSB is less than 60, this sign extension has the effect of dividing the virtual address space within a region into two disjoint areas. Figure 4.13 illustrates this for the case in which IMPL_VA_MSB = 50: The sign extension creates the unimplemented area in the middle of the region. Any access to that area will cause the CPU to take a fault. For a user-level access, such a fault is normally translated into an illegal instruction signal (SIGILL). At the kernel level, such an access would cause a kernel panic.
Figure 4.13. Address-space hole within a region with IMPL_VA_MSB = 50.
Although an address-space hole in the middle of a region may seem problematic, it really poses no particular problem and in fact provides an elegant way to leave room for future growth without impacting existing application-level software. To see this, consider an application that requires a huge data heap. If the heap is placed in the lower portion of the region, it can grow toward higher addresses. On a CPU with IMPL_VA_MSB = 50, the heap could grow to at most 1024 Tbytes. However, when the same application is run on a CPU with IMPL_VA_MSB = 51, the heap could now grow up to 2048 Tbytes without changing its starting address. Similarly, data structures that grow toward lower addresses (such as the memory stack) can be placed in the upper portion of the region and can then grow toward the CPU-model-specific lower bound of the implemented address space. Again, the application can run on different implementations and take advantage of the available address space without moving the starting point of the data structure.
Of course, an address-space hole in the middle of a region does imply that an application must not, e.g., attempt to sequentially access all possible virtual addresses in a region. Given how large a region is, this operation would not be a good idea at any rate and so is not a problem in practice.
Physical address space
The physical address format used by IA-64 is illustrated in Figure 4.14. Like virtual addresses, physical addresses are 64 bits wide. However, bit 63 is the uc bit and serves a special purpose: If 0, it indicates a cacheable memory access; if 1, it indicates an uncacheable access. The remaining bits in a physical address are split into two portions: implemented and unimplemented bits. As the figure shows, the lower portion must be implemented and covers bits 0 up to a CPU-model-specific bit number called IMPL_PA_MSB. The architecture requires this constant to be in the range of 32 to 62. For example, Itanium implements 44 address bits and therefore IMPL_PA_MSB is 43. The unimplemented portion of a physical address extends from bit IMPL_PA_MSB + 1 to 62. Unlike a virtual address, a valid physical address must have all unimplemented bits cleared to 0 (i.e., the unimplemented portion is the zero-extended instead of the sign-extended value of the implemented portion).
Figure 4.14. Format of IA-64 physical address.
The physical address format gives rise to the physical address space illustrated in Figure 4.15. As determined by the uc bit, it is divided into two halves: The lower half is the cached physical address space and the upper half is the uncached space. Note that physical addresses x and 2^63 + x correspond to the same memory location; the only difference is that an access to the latter address will bypass all caches. In other words, the two halves alias each other.
Figure 4.15. Physical address space with IMPL_PA_MSB = 43.
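In code, switching between the two views of a memory location is just a matter of setting or clearing bit 63. The macro names below are hypothetical:

#define SKETCH_PHYS_UNCACHED(paddr)     ((unsigned long)(paddr) | (1UL << 63))
#define SKETCH_PHYS_CACHED(paddr)       ((unsigned long)(paddr) & ~(1UL << 63))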
If IMPL_PA_MSB is smaller than 62, the upper portion of each half is unimplemented. Any attempt to access memory in this portion of the physical address space causes the CPU to take an UNIMPLEMENTED DATA ADDRESS FAULT.
Recall from Figure 4.11 on page 149 that Linux/ia64 employs a single region for the identity-mapped segment. Because a region spans 61 address bits, Linux can handle IMPL_PA_MSB values of up to 60 before the region fills up and high memory support needs to be enabled. To get a back-of-the-envelope estimate of how quickly this could happen, let us assume that at the inception of IA-64 the maximum practical physical memory size was 1 Tbyte (2^40 bytes). Furthermore, let us assume that memory capacity doubles roughly every 18 months. Both assumptions are somewhat on the aggressive side. Even so, growing from 2^40 bytes to the 2^61 bytes that a single region can map takes 21 doublings, so at 18 months per doubling more than three decades would have to pass before high memory support would have to be enabled. In other words, it is likely that high memory support will not be necessary during most of the life span, or even the entire life span, of the IA-64 architecture.