Instruction Set Features
The original ARM architecture was heavily influenced by the Berkeley RISC architecture, as is apparent from the RISC in its name. ARM has a number of RISC features, such as a large register set, fixed-length instructions, and a purely load-store architecture. In comparison, 32-bit x86 has six registers that are nominally general-purpose (eight if you count the stack and frame pointers), although many instructions require the use of specific registers. It uses a variable-length instruction encoding, and a number of instructions take a memory address as one of their operands.
The ARM instruction set also has some decidedly non-RISC features. The best known is predicated instructions, which execute conditionally based on the state of the condition flags. In a typical implementation, they execute anyway, but the results are stored only if the condition codes match.
Consider a typical pipelined CPU. The first ARM chips were very simple, using a three-stage pipeline with the fetch/decode/execute sequence that you'll find in an introductory CPU architecture book. In a typical bit of code, you might have a conditional branch, followed by a short sequence of instructions for each case. For example:
if (a != 0) b /= a;
Because the processor can't resolve this branch until it knows the value of a, the entire pipeline stalls until that calculation finishes. MIPS reduces the cost with a branch delay slot: the instruction immediately after a branch always executes, whether or not the branch is taken. This means that the compiler tries to move the branch one instruction earlier than it logically belongs and fill the slot with useful work, inserting a no-op if it can't do that (which increases code size). CISC chips typically devote a lot of die area to branch prediction, to try to guess which branch will be taken, follow that one, and then retire the results only if the guess was correct.
The ARM approach, also adopted by Itanium, is more elegant. Instructions can execute conditionally on the value of one or more condition flags. In this example, the division operation will always execute, but its result will be retired only if the condition held. By the time the division has been computed, the result of the comparison is known, so there's no delay and no need for branch prediction. This approach helps to increase code density and reduces the need for accurate branch prediction.
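The "always execute, commit conditionally" idea can be sketched in portable C. This is only an illustration of the principle, not how any ARM core is implemented: the helper name predicated_div is hypothetical, and because dividing by zero is undefined in C, the sketch substitutes a divisor of 1 when the predicate is false so that the division really can run unconditionally.

```c
#include <stdint.h>

/* Hypothetical helper (not an ARM API): models the statement
 *     if (a != 0) b /= a;
 * as "compute always, commit only if the predicate holds".
 * Assumption: b / a does not overflow (i.e. not INT32_MIN / -1). */
static int32_t predicated_div(int32_t b, int32_t a)
{
    uint32_t pred = (a != 0);                 /* comparison sets the "flag": 1 or 0 */
    int32_t  safe = a | (int32_t)(pred ^ 1u); /* divisor forced to 1 when a == 0 */
    int32_t  q    = b / safe;                 /* the division executes either way */
    uint32_t mask = 0u - pred;                /* all ones if pred is set, else zero */

    /* Commit step: keep the quotient if the predicate held, else keep old b. */
    return (int32_t)(((uint32_t)q & mask) | ((uint32_t)b & ~mask));
}
```

Calling predicated_div(10, 2) yields the quotient, while predicated_div(10, 0) returns b unchanged. There is no branch anywhere in the function, which is the property that predication gives the hardware.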
Another well-known feature of the ARM instruction set is the barrel shifter in the pipeline, which operates on the second operand of most instructions, optionally shifting or rotating it by an arbitrary amount. Effectively, this gives you a free multiplication or division by a power of two. This technique has a lot of uses, the most obvious being in address computations. If you're loading the nth element of an array, you need to multiply the index by the size of one element and then add the result to the address of the start. The element size is a power of two for all primitive types, so the multiplication can be folded into one of the other instructions in the sequence.
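As a rough sketch of that address computation, the C function below spells out the shift and the add as a single expression. The name element_address and its parameters are assumptions made for illustration. On ARM, an expression of this shape can typically be covered by one load with a shifted register offset, such as LDR r0, [r1, r2, LSL #2] for 4-byte elements, although what a given compiler actually emits will vary.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: address of element `index` in an array starting at
 * `base`, where each element is 2^log2_size bytes (2 for a 4-byte int). */
static uintptr_t element_address(uintptr_t base, size_t index, unsigned log2_size)
{
    /* The shift replaces the multiplication by the element size. */
    return base + ((uintptr_t)index << log2_size);
}
```

For example, element_address(0x1000, 3, 2) gives 0x100C, the address of the fourth 4-byte element.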
Unlike x86, ARM guarantees backward compatibility only for the nonprivileged instruction set. When you initialize an x86 chip (including a modern 64-bit chip), it starts in 16-bit real mode; you need to run some 16-bit code to bring it into protected mode. In each of these modes, the privileged x86 instruction set still supports all of the weird and wonderful extensions that the 286 and 386 introduced.
In contrast, ARM chips break backward compatibility in the privileged instruction set every generation or two. Therefore, operating systems typically need porting to the new chip, although userspace applications can continue to run unmodified.