Hewlett-Packard's PA-RISC architecture is one of the most mature Reduced Instruction Set Computer designs in the industry. This book is the first publicly available, detailed description of the next revision of the PA-RISC architecture. KEY TOPICS: Covers the RISC characteristics of PA-RISC, PA-RISC processing resources, addressing and access control, control flow, interruptions, and an overview of the instruction set and floating-point coprocessor. MARKET: For system designers and analysts, system software programmers, application developers, and technical managers.
1. Overview.
Traditional RISC Characteristics of PA-RISC. PA-RISC-The Genius is in the Details. A Critical Calculus: Instruction Pathlength. Multimedia Support: The Precision Process Illustrated. Integrated CPU. Extensibility and Longevity. System Organization.
Non-Privileged Software-Accessible Registers. Privileged Software-Accessible Registers. Unused Registers and Bits. Data Types. Byte Ordering (Big Endian/Little Endian).
Physical and Absolute Addressing. Virtual Addressing. Pointers and Address Specification. Address Resolution and the TLB. Access Control.
Caches. Control Flow. Branching. Nullification. Instruction Execution. Instruction Pipelining.
Interrupt Classes. Interruption Handling. Instruction Recoverability. Masking and Nesting of Interruptions. Interruption Priorities. Return from Interruption. Interruption Descriptions.
Computation Instructions. Multimedia Instructions. Memory Reference Instructions. Long Immediate Instructions. Branch Instructions. System Control Instructions. Assist Instructions. Conditions and Control Flow. Additional Notes on the Instruction Set.
The IEEE Standard. The Instruction Set. Coprocessor Registers. Data Registers. Data Formats. Floating-Point Status Register. Floating-Point Instruction Set.
Exception Registers. Interruptions and Exceptions. Saving and Restoring State.
Performance Monitor Instructions. Performance Monitor Interruptions. Monitor Units.
Preface
Hewlett-Packard's PA-RISC architecture was first introduced in 1986. Although there have been interim improvements in the intervening years, the PA-RISC 2.0 architecture described in this book is the most significant step in the evolution of the PA-RISC architecture. While the primary motivation for PA-RISC 2.0 was to add support for 64-bit integers, 64-bit virtual address space offsets, and greater than 4 GB of physical memory, many other more subtle enhancements have been added to increase the performance and functionality of the architecture.
Compatibility with PA-RISC 1.
From an unprivileged software perspective, PA-RISC 2.0 is forward compatible with the earlier PA-RISC 1.0 and PA-RISC 1.1 architectures: all unprivileged software written to the PA-RISC 1.0 or PA-RISC 1.1 specifications will run unchanged on processors conforming to the PA-RISC 2.0 specification.
However, unprivileged software written to the PA-RISC 2.0 specification will not run on processors conforming to the PA-RISC 1.0 or PA-RISC 1.1 specifications.
PA-RISC 2.0 Enhancements.
PA-RISC 2.0 contains 64-bit extensions, instructions to accelerate processing of multimedia data, features to reduce cache miss and branch penalties, and a number of other changes to facilitate high performance implementations. The 64-bit extensions have the highest profile and the greatest impact on the programming model for both applications and system programs. The paragraphs that follow provide thumbnail sketches of some of the more significant features of PA-RISC 2.0.
64-bit Extensions.
PA-RISC has always supported a style of 64-bit addressing known as “segmented” addressing. In this style, many of the benefits of 64-bit addressing were obtained without requiring the integer datapath to be larger than 32 bits. While this approach was cost-effective, it did not easily provide the simplest programming model for single data objects (mapped files or arrays) larger than 4 billion bytes (4GB).
Support of such objects calls for larger-than-32-bit “flat” addressing, that is, pointers longer than 32 bits which can be the subject of larger-than-32-bit indexing operations. Since nature prefers powers of two, the next step for an integer data path width greater than 32 bits is 64 bits. PA-RISC 2.0 provides full 64-bit support with 64-bit registers and data paths. Most operations use 64-bit data operands and the architecture provides a flat 64-bit virtual address space.
Multimedia Extensions.
Since multimedia capabilities are rapidly becoming universal in desktop and notebook machines, and since general purpose processors are becoming faster than specialized digital signal processors, it was seen as critical that PA-RISC 2.0 support these multimedia data manipulation operations as a standard feature, thus eliminating the need for external hardware.
PA-RISC 2.0 contains a number of features which extend the arithmetic and logical capabilities of PA-RISC to support parallel operations on multiple 16-bit subunits of a 64-bit word. These operations are especially useful for manipulating video data, color pixels, and audio samples, particularly for data compression and decompression.
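To make the idea of parallel operations on 16-bit subunits concrete, the following is a minimal sketch in portable C that adds four 16-bit lanes packed into a 64-bit word. It is a software emulation for illustration only: the function name add_halfwords is hypothetical, the wrap-around (modulo 2^16) behavior is an assumption, and nothing here is taken from the PA-RISC 2.0 instruction definitions.

```c
#include <stdint.h>

/* Add four independent 16-bit lanes packed into 64-bit words.
 * Each lane wraps modulo 2^16 (an assumption made for this sketch);
 * carries never cross a lane boundary. */
uint64_t add_halfwords(uint64_t a, uint64_t b)
{
    /* Mask selecting the top bit of every 16-bit lane. */
    const uint64_t high = UINT64_C(0x8000800080008000);

    /* Add the low 15 bits of each lane. The largest per-lane sum is
     * 0x7FFF + 0x7FFF = 0xFFFE, so a carry can reach bit 15 of its own
     * lane but never the neighboring lane. */
    uint64_t low = (a & ~high) + (b & ~high);

    /* Fold in the top bits: sum bit = a15 XOR b15 XOR carry into bit 15.
     * The carry is already sitting in the top bit of each lane of low. */
    return low ^ ((a ^ b) & high);
}
```

A processor with hardware support performs the equivalent work in a single operation; the masking above is only what it costs to imitate that with ordinary 64-bit arithmetic.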
Cache Prefetching.
Because processor clock rates are increasing faster than main memory speeds, modern pipelined processors become more and more dependent upon caches to reduce the average latency of memory accesses.
However, caches are effective only to the extent that they are able to anticipate the data and instructions that are required by the processor. Unanticipated surprises result in a cache miss and a consequent processor stall while waiting for the required data or instruction to be obtained from the much slower main memory.
The key to reducing such effects is to allow optimizing compilers to communicate what they know (or suspect) about a program's future behavior far enough in advance to eliminate or reduce the “surprise” penalties. PA-RISC 2.0 integrates a mechanism that supports encoding of cache prefetching opportunities in the instruction stream to permit significant reduction of these penalties.
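As a source-level analogue of a compiler placing prefetch hints in the instruction stream, the sketch below requests data a fixed distance ahead of a summing loop. It assumes a GCC- or Clang-compatible compiler for the __builtin_prefetch intrinsic and falls back to a no-op elsewhere; the macro PREFETCH_READ, the function sum_with_prefetch, and the distance of 16 elements are all hypothetical choices for illustration. It does not show how PA-RISC 2.0 itself encodes prefetches.

```c
#include <stddef.h>

/* Use the compiler's prefetch intrinsic where available; otherwise the
 * hint compiles to nothing and the loop is still correct. */
#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH_READ(p) __builtin_prefetch((p), 0, 3)
#else
#define PREFETCH_READ(p) ((void)0)
#endif

/* How far ahead to prefetch, in elements. An arbitrary value chosen
 * for this sketch; a real value would be tuned to the memory system. */
enum { PREFETCH_DISTANCE = 16 };

long sum_with_prefetch(const long *data, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        /* Hint that data[i + PREFETCH_DISTANCE] will be read soon. */
        if (i + PREFETCH_DISTANCE < n)
            PREFETCH_READ(&data[i + PREFETCH_DISTANCE]);
        total += data[i];
    }
    return total;
}
```

The hint changes only timing, never results: removing it leaves the sum unchanged, which is what allows such hints to be added freely.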
Branch Prediction.
A “surprise” also occurs when a conditional branch is mispredicted. In this case, even if the branch target is already in the cache, the falsely predicted instructions already in the pipeline must be discarded. In a typical high-speed superscalar processor, this might result in a lost opportunity to execute more than a dozen instructions. This is known as the mispredicted branch penalty.
PA-RISC 2.0 contains several features that help compilers signal future data and likely instruction needs to the hardware. An implementation may use this information to anticipate data needs or to predict branches more successfully, thus avoiding the penalties associated with surprises.
Some of these signals are in the nature of “hints” which are encoded in “don't care” bits of existing instructions. These hints are examples of retroactive additions to PA-RISC 1.1, since all existing code will run on newer machines, and newly annotated code will run correctly (but without advantage) on all existing machines. The benefit of making such retroactive changes is that compilers are thereby permitted to implement the anticipatory hints at will, without “synchronizing” to any particular hardware release.
Memory Ordering.
When cache misses cannot be avoided, it is important to reduce the resultant latencies. The PA-RISC 1 architecture specified that all loads and stores are observed to be performed “in order,” a characteristic known as “strong ordering.”
Future processors are expected to support multiple outstanding cache misses while simultaneously performing loads and stores to lines already in the cache. In most cases this effective reordering of loads and stores causes no inconsistency, and permits faster execution. The latter model is known as “weak ordering,” and it is intended to become the default model in future machines. Of course, strongly ordered variants of loads and stores must be defined to handle contexts in which ordering must be preserved — mainly related to synchronization among processors or with I/O activities.
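The need for ordered variants in synchronization contexts can be illustrated at the language level with C11 atomics, used here as an analogy and not as a description of PA-RISC instructions. In this sketch one side writes ordinary data and then sets a flag with a release store; the other side reads the flag with an acquire load before touching the data. The names payload, ready, publish, and try_consume are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;        /* ordinary data, may be reordered freely */
static atomic_bool ready;  /* flag; static storage starts out false  */

/* Producer: the release store keeps the payload write from being
 * observed after the flag becomes visible. */
void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Consumer: the acquire load keeps the payload read from being
 * performed before the flag is seen as set. Returns false if nothing
 * has been published yet. */
bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```

Only the flag accesses carry ordering constraints; every other load and store remains free to be reordered, which is the performance benefit that weak ordering is meant to deliver.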
Coherent I/O.
As the popularity and pervasiveness of multiprocessor systems increase, the traditional PA-RISC model of I/O transfers to and from memory without cache coherence checks has become less advantageous. Multiprocessor systems require that processors support cache coherence protocols. By adding similar support to the I/O subsystem, the need to flush caches before and/or after each I/O transfer can be eliminated. As disk and network bandwidths increase, there is increasing motivation to move to such a cache coherent I/O model. The incremental impact on the processor is small, and is supported in PA-RISC 2.0.
How This Book is Organized.
The audience for this book might be divided into the following broad categories (listed in decreasing order of probable size, though, one hastens to add, not in any presumed order of importance):
application programmers
operating system programmers
compiler programmers
hardware/system designers.
The book has been organized to make information easily accessible to each of these audience categories based on the assumption that each category requires an additional level of detail. For example, application programmers are primarily concerned with such things as data types, addressing capabilities, and the instruction set. Operating system programmers need all of that information and also must concern themselves with such things as page table structures and cache operations, topics that application programmers do not usually need to worry about. Accordingly, chapters are generally structured so that the information that is of interest to the broadest audience is presented at the beginning, and details that have a more limited audience come later. Similarly, the book contains a rather large number of appendices: they are used to provide specialized information which, if included in the main body of the book, might add unneeded complexity to topics that are otherwise of broad interest.
Conventions Used in This Book.
Several typographical and notation conventions are used throughout this book to simplify, emphasize, and standardize presentation of information.
Fonts.
In this book, fonts are used as follows:
Italic is used for instruction fields and arguments. For example: “The completer, cmplt, encoded in the u and m fields of the instruction,...”.
Italic is also used for references to other parts of this and other books or manuals. For example: “As described in Chapter 4, Flow Control and ...”.
Bold is used for emphasis and the first time a word is defined. For example: “Implementations provide seven registers called shadow registers ...”.
UPPER CASE is used for instruction names, instruction mnemonics, short (three characters or less) register and register field names, and acronyms. For example: “The PL field in the IIAOQ register ...”. Underbar (_) characters join words in register, variable, and function names. For example: “The boolean variable cond_satisfied in the Operation section ...”.
Numbers.
The standard notation in this book for addresses and data is hexadecimal (base 16). Memory addresses and fields within instructions are written in hexadecimal. Where numbers could be confused with decimal notation, hexadecimal numbers are preceded with 0x. For example, 0x2C is equivalent to decimal 44.
Instruction Notations.
Instruction operation is described in a C-like algorithmic language. This language is the same as the C programming language with a few exceptions. These are:
The characters “{}” are used to denote bit fields.
The assignment operator used is “←” instead of “=”.
The functions “cat” (concatenation), and “xor” (logical exclusive OR) take a variable number of arguments, for which there is no provision in C.
The switch statement usage is improper because we do not use constant expressions for all the cases.
The keyword “parallel” may appear before loop control statements such as “for” and “while” and indicates that the loop iterations are independent and may execute in parallel.
Bit Ranges.
A range of bits within a larger unit is denoted by “unit{range}”, where unit is the notation for memory, a register, a temporary, or a constant; range is a single integer to denote one bit, or two integers separated by “..” to denote a range of bits.
For example, “GR[1]{0}” denotes the leftmost bit of general register 1, “CR[24]{59..63}” denotes the rightmost five bits of control register 24, and “5{0..6}” denotes a 7-bit field containing the number 5. If m > n, then {m..n} denotes the null range.
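Because bit 0 is the leftmost bit in this notation, extracting a range takes a little care in a language that numbers bits from the right. The following is a minimal sketch, assuming a 64-bit unit; the function name bit_range is hypothetical, and returning 0 for the null range or for an index beyond bit 63 is a choice made for this sketch, not something the notation defines.

```c
#include <stdint.h>

/* Return unit{m..n} for a 64-bit unit, where bit 0 is the leftmost
 * (most significant) bit and bit 63 is the rightmost. */
uint64_t bit_range(uint64_t unit, unsigned m, unsigned n)
{
    /* Null range (m > n) or out-of-range index: yield 0 (assumption). */
    if (m > n || n > 63)
        return 0;

    unsigned width = n - m + 1;

    /* Move bit n down to the rightmost position. */
    uint64_t shifted = unit >> (63 - n);

    /* A full-width range needs no mask, and shifting by 64 would be
     * undefined behavior in C, so handle it separately. */
    if (width == 64)
        return shifted;

    return shifted & ((UINT64_C(1) << width) - 1);
}
```

With this helper, bit_range(gr1, 0, 0) corresponds to the leftmost bit of general register 1, and bit_range(cr24, 59, 63) to the rightmost five bits of control register 24, where gr1 and cr24 are hypothetical variables holding the register contents.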
Registers.
In general, a register name consists of two or three uppercase letters. The name of a member of a register array consists of a register name followed by an index in square brackets. For example, “GR[1]” denotes general register 1.
The named registers and register arrays are:
Register       Range           Description
GR[t]          t = 0..31       General registers
SHR[t]         t = 0..6        Shadow registers
SR[t]          t = 0..7        Space registers
CR[t]          t = 0, 8..31    Control registers
CPR[uid][t]    t = 0..31       Coprocessor “uid” registers
FPR[t]         t = 0..31       Floating-point coprocessor registers
The Processor Status Word and the Interruption Processor Status Word, denoted by “PSW” and “IPSW”, are treated as a series of 1-bit and multiple-bit fields. A field of either is denoted by the register name followed by a field name in square brackets, and bit ranges within such fields are denoted by the usual notation. For example, PSW[C/B] denotes the 16-bit carry/borrow field of the PSW and PSW[C/B]{0} denotes bit 0 of that field.
Temporaries.
A temporary name comprises three or more lowercase letters and denotes a quantity which requires naming, either for clarity, or because of limitations imposed by the sequential nature of the operational notation. It may or may not represent an actual processing resource in the hardware. The length of the quantity denoted by a temporary is implicitly determined and is equal to that of the quantity first assigned to it in an operational description.
Operators.
The operators used and their meanings are as follows:
Operator    Meaning
←           assignment
+           addition
-           subtraction
*           multiplication
~           bitwise complement
&&          logical and
&           bitwise and
||          logical or
|           bitwise or
==          equal to
<           less than
>           greater than
!=          not equal to
<=          less than or equal to
>=          greater than or equal to
All operators are binary, except that “~” is unary and “-” is both binary and unary, depending on the context.
Control Structures and Functions.
The control structures used in the instruction notation are relatively standard and are described in Appendix E, “Instruction Notation Control Structures”.