Optimizing code for the new IA-64 architecture.
"...a timely and valuable book. It will appeal to those interested separately or jointly in IA-64 and the elementary math functions."
William S. Worley, Jr., Distinguished Contributor, Hewlett-Packard Laboratories
In IA-64 and Elementary Functions: Speed and Precision, leading HP computer architect Dr. Peter Markstein introduces the IA-64 architecture and its breakthrough elementary math functions. This information, essential to the development of optimized IA-64 server applications and operating systems, was formerly available only in specialized journals, or not available at all.
Markstein first introduces the IA-64 architecture, the objectives that motivated its design, and the unique architectural features that can be exploited by developers of high-performance elementary function libraries, including software pipelining, instruction grouping, prefetching, predication, speculative execution, and explicit parallelism. He then introduces several techniques that lend themselves to software pipelining, which is exceptionally well supported by the IA-64 architecture and can lead to dramatic performance gains.
The book covers all major elementary functions, demonstrating how they can be implemented to deliver optimal IA-64 performance and accuracy. Among the functions covered: square root and division, which must be performed in software on the IA-64.
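To give a flavor of what performing division in software involves, here is a minimal sketch of the classic Newton-Raphson approach: start from a crude reciprocal estimate, refine it until it is accurate to working precision, then multiply. It is an illustration only and is not taken from the book. The function names `reciprocal` and `divide`, the linear seed, and the step count are all assumptions made for this example, and plain Python floats stand in for the fused multiply-add arithmetic that an IA-64 implementation would rely on.

```python
import math

def reciprocal(b: float, steps: int = 5) -> float:
    """Approximate 1/b for finite, nonzero b by Newton-Raphson iteration."""
    # Scale |b| into [0.5, 1) so that one crude seed works for every input.
    m, e = math.frexp(abs(b))            # abs(b) == m * 2**e
    # Linear seed; its relative error on [0.5, 1) is at most 1/17.
    y = 48.0 / 17.0 - (32.0 / 17.0) * m
    for _ in range(steps):
        err = 1.0 - m * y                # residual of the current estimate
        y = y + y * err                  # relative error is roughly squared
    # Undo the scaling and restore the sign of b.
    return math.copysign(math.ldexp(y, -e), b)

def divide(a: float, b: float) -> float:
    """Approximate a/b using the reciprocal plus one correction step."""
    y = reciprocal(b)
    q = a * y                            # first quotient estimate
    r = a - b * q                        # remainder (rounded here, unlike with FMA)
    return q + r * y                     # corrected quotient

print(divide(1.0, 3.0))
print(divide(-7.5, 2.5))
```

Because the remainder is computed with an ordinary rounded multiply, this sketch lands within a few units in the last place of the true quotient but does not guarantee a correctly rounded result. It also ignores zeros, infinities, NaNs, and operands near the overflow and underflow limits.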
For professional computer scientists, system software developers, mathematicians, and anyone building high-performance IA-64 software, IA-64 and Elementary Functions: Speed and Precision will be absolutely indispensable.
I. IA-64 Architecture
1. NEW ARCHITECTURE OBJECTIVES. VLIW. Memory Enhancements. Software Pipelining. Floating Point Enhancements. Summary.
2. IA-64 INSTRUCTIONS AND REGISTERS. Instructions. Register Sets. Accessing Memory. Assembly Language. Problems.
3. INCREASING INSTRUCTION LEVEL PARALLELISM. Branching. Speculation. Problems.
4. FLOATING POINT ARCHITECTURE. Floating Point Status Register. Precision. Fused Multiply-Add. Division and Square Root Assists. Floating Comparisons. Communication between Floating Point and General Purpose Registers. Fixed Point Multiplication. SIMD Arithmetic. Problems.
5. PROGRAMMING FOR IA-64. Compiler Options. Pragmas. Floating Point Data Types. In-Line Assembly. The fenv.h Header. Extended Examples. Quad Precision. Problems.
II. Computation of Elementary Functions.
6. MATHEMATICAL PRELIMINARIES. Floating Point. Approximation and Error Analysis. The Exclusion Theorem. Ulps. Problems.
7. APPROXIMATION OF FUNCTIONS. Taylor Series. Lagrangian Interpolation. Chebychev Approximation. Remez Approximation. Practical Considerations. Function Evaluation. Table Construction. Problems.
8. DIVISION. Approximations for the Reciprocal. Computing the Quotient. Division Using Only Final Precision Results. Fast Variants of Division. Remainder. Integer Division. An Implementation of Division. Problems.
9. SQUARE ROOT. Approximations. Rounding the Square Root. Computing the Square Root. Calculating the Reciprocal Square Root. An Implementation of Square Root. Problems.
10. EXPONENTIAL FUNCTIONS. Definitions and Formulas. Argument Reduction. Error Containment. Computing the Exponential. The Function expm. Problems.
11. LOGARITHMIC FUNCTIONS. General Relations. Argument Reductions. Error Analysis. The Function log1p. Computing the Logarithm. Problems.
12. THE POWER FUNCTION. Definition. Single Precision. Double Precision. Double-Extended Precision. Quad Precision. Computing the Power Function. Problems.
13. TRIGONOMETRIC FUNCTIONS. Formulas and Identities. Argument Reduction. Error Analysis. Computing the Trigonometric Functions. Problems.
14. INVERSE SINE AND COSINE. Definitions and Formulas. Argument Reduction. Error Analysis. Computing the arcsin. Problems.
15. INVERSE TANGENT FUNCTIONS. Definitions and Formulas. Argument Reduction. Error Analysis. Computing the arctan. Problems.
16. HYPERBOLIC FUNCTIONS. Definitions and Formulas. Argument Reduction. Error Analysis. Computing the Hyperbolic Functions. Problems.
17. INVERSE HYPERBOLIC FUNCTIONS. Definitions and Formulas. arcsinh. arccosh. arctanh. Problems.
18. ODDS AND ENDS. Correctly Rounded Functions. Monotonicity. Alternative Algorithms. Testing. New Architectural Directions. Problems.
A. IN-LINE ASSEMBLY.
This book puts under one cover the details of an elementary function library, covering the underlying mathematics as well as providing implementation details, directed toward IA-64 architecture. Some of the material is difficult to find elsewhere, and some of it is scattered over a variety of conference proceedings and journals. The material should appeal to readers with interest in elementary functions, as well as readers interested in using IA-64 effectively. Part I discusses IA-64 architecture in detail, including motivation for the architecture. The description of IA-64 is illustrated with extended examples chosen from numerical calculation. Part II shows how to exploit IA-64 architecture in the domain of elementary functions. While the text emphasizes accurate computation, it also points to shortcuts in division and square root that may be of interest in graphics and other applications which heavily use short floating point types. Most of the mathematical arguments are relatively elementary and should be readable by anyone with an elementary calculus background.
This work is an outgrowth of the Precision Architecture Wide Word (PAWW) project at Hewlett-Packard Laboratories. The architecture drew from prior experiences with very long instruction set architectures, particularly those at Cydrome and Multiflow, as well as PA-RISC (Precision Architecture - Reduced Instruction Set Computer). By the time I joined the project in 1992, much of the architecture had already been solidified. My architectural contributions mainly dealt with floating point arithmetic, and I was also active in producing a prototype compiler for Wide Word, which allowed many of the architectural ideas to be tested. PAWW later developed into IA-64.
One of my colleagues, Clemens Roothaan, had produced a library of elementary function routines which exploited the software pipelining capabilities of the architecture. He was able to demonstrate routines which ran at speeds associated with vector processors, but which did not sacrifice numerical accuracy for performance. Over time, some of these algorithms were strengthened to run faster, or produce even higher precision. We refer to the software pipelined implementation as the vector library for the elementary functions.
My plan was to use the same algorithmic ideas to construct a very robust scalar elementary function library. My hope was that the fundamental algorithms could be implemented in the C language in such a manner that they would yield closed subroutines, but would also be amenable to in-lining, after which they could then be software pipelined by the compiler. Eventually, Roothaan's handcrafted functions could be matched by the compiler, which could also customize an elementary function to the particular settings where it was invoked. This notion led to the in-line assembly capability, which enables much finer control to be exercised over floating point computation than is normally present in a compiler.
Eventually, I undertook to document these algorithms, indicating clearly the methods we had used, as well as the error characteristics of our algorithms. A fascinating by-product developed almost immediately: the act of writing clarified some of the fundamental processes that we were employing. New, faster algorithms were suggested by the text, and they replaced some of our old techniques. This was especially true in the operations of division and square root, for which almost none of our 1992 algorithms survive in this text. Logarithm also was markedly improved, and, as a by-product, the precision of the power routine was improved. The trigonometric routines were enhanced with "accurate A " argument reduction, and an improved implementation of the A and A addition formulas which preserve additional precision.
This book describes a work in progress. Even now, new algorithms have come to my attention from colleagues at Intel, and, as the greater programming community comes to use IA-64, I expect new, innovative developments to blossom.
Peter Markstein
February 2000
Woodside, CA