Virtual Memory Design
The address space, in common with most "64-bit" platforms, is actually not quite 64 bits. Pointers are 64 bits wide, but not all of those bits are used to select a page. As with x86-64 (and SPARC64, and so on) there's a big hole in the middle of the address space, with the bottom (addresses up to 0x0000FFFFFFFFFFFF) reserved for userspace, and an equal-sized reservation at the top (from 0xFFFF000000000000 upward) for the kernel.
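A minimal C sketch of that split, assuming 48-bit virtual addresses and pointers that carry no tag in the top byte. The names `classify`, `USER_TOP`, `KERNEL_BASE`, and `HALF_SIZE` are hypothetical and are not part of any real API.

```c
#include <stdint.h>

/* Hypothetical classifier for the address-space split, assuming 48-bit
 * virtual addresses and an untagged top byte. */
enum region { REGION_USER, REGION_HOLE, REGION_KERNEL };

#define VA_BITS     48
#define USER_TOP    ((UINT64_C(1) << VA_BITS) - 1)  /* 0x0000FFFFFFFFFFFF */
#define KERNEL_BASE (~USER_TOP)                     /* 0xFFFF000000000000 */
#define HALF_SIZE   (USER_TOP + 1)                  /* 2^48 bytes = 256TB */

static inline enum region classify(uint64_t va)
{
    if (va <= USER_TOP)
        return REGION_USER;      /* bottom half: userspace */
    if (va >= KERNEL_BASE)
        return REGION_KERNEL;    /* top half: kernel */
    return REGION_HOLE;          /* mapped by neither half */
}
```

For example, `classify(0x0001000000000000)` lands in the hole, one byte past the top of the user half. On real hardware the MMU raises a translation fault for such addresses; no software check is involved.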
This design gives each half a 256TB (2^48-byte) address space, which should be enough for a while. If memory prices continue to fall so that the cost of memory halves every year (a bit faster than it has been doing historically), handheld computers will come with 256TB in about 20 years. By then, I expect ARMv9 to be released.
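The estimate is easy to check. At a constant price, halving the cost per byte doubles the capacity, so the wait for 256TB ($2^{48}$ bytes) is the base-2 logarithm of its ratio to today's capacity $C_0$. The two starting capacities below are illustrative assumptions, not figures from the text:

$$
t = \log_2\frac{2^{48}\,\mathrm{B}}{C_0}, \qquad
C_0 = 2^{28}\,\mathrm{B}\ (256\,\mathrm{MB}) \Rightarrow t = 20, \qquad
C_0 = 2^{30}\,\mathrm{B}\ (1\,\mathrm{GB}) \Rightarrow t = 18.
$$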
There are several interesting things about how this design is implemented. One is that the top byte of the address can be configured to be completely ignored during virtual-to-physical translation. This means that those eight bits can be used to implement tagged pointers. There are a few really interesting things that you can do with these pointers, especially in object-oriented languages. The typical object layout uses the first word to hold a pointer to the object's class (or vtable, in C++). For small objects, the class pointer can end up being a large fraction of the object's total space.
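The basic operations on such a pointer are just shifts and masks. This C sketch assumes a 64-bit target and userspace addresses whose top byte is zero. `tag_ptr`, `ptr_tag`, and `untag_ptr` are hypothetical helpers. Stripping the tag before dereferencing keeps the code portable to hardware that does not ignore the top byte.

```c
#include <stdint.h>

/* Hypothetical helpers for an 8-bit tag stored in bits 56-63, assuming
 * a 64-bit target and userspace pointers whose top byte is zero. */
#define TAG_SHIFT 56
#define ADDR_MASK ((UINT64_C(1) << TAG_SHIFT) - 1)

static inline void *tag_ptr(void *p, uint8_t tag)
{
    uintptr_t bits = (uintptr_t)p & ADDR_MASK;          /* drop any old tag */
    return (void *)(bits | ((uintptr_t)tag << TAG_SHIFT));
}

static inline uint8_t ptr_tag(const void *p)
{
    return (uint8_t)((uintptr_t)p >> TAG_SHIFT);        /* read the top byte */
}

static inline void *untag_ptr(void *p)
{
    return (void *)((uintptr_t)p & ADDR_MASK);          /* plain address again */
}
```

On hardware that ignores the top byte, the tagged pointer could be dereferenced directly. Elsewhere (x86-64, for instance) it would typically fault unless it is untagged first.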
This is especially true for languages like JavaScript, where even numbers are objects. A naïve implementation would need 128 bits to store a 32-bit integer: a 64-bit class pointer plus the 32-bit value makes 96 bits, which malloc rounds up to 128. Using a tagged pointer lets you store the integer inside the pointer value itself, so no allocation is needed at all.
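A minimal sketch of the idea in C. It assumes a 64-bit value word and a hypothetical tag byte 0x01 meaning "small integer". `value_t`, `box_int`, `is_int`, and `unbox_int` are illustrative names, not any engine's API. Real JavaScript engines choose their own encodings, often using low bits or NaN-boxing rather than the top byte.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding: a value whose top byte equals SMALLINT_TAG holds
 * a 32-bit integer in its low 32 bits instead of pointing at a heap
 * object, so boxing an integer needs no allocation. 0x01 is arbitrary. */
#define TAG_SHIFT    56
#define SMALLINT_TAG UINT64_C(0x01)

typedef uint64_t value_t;

static inline value_t box_int(int32_t i)
{
    return (SMALLINT_TAG << TAG_SHIFT) | (uint32_t)i;   /* zero-extended */
}

static inline bool is_int(value_t v)
{
    return (v >> TAG_SHIFT) == SMALLINT_TAG;
}

static inline int32_t unbox_int(value_t v)
{
    uint32_t u = (uint32_t)v;                           /* low 32 bits */
    if (u <= (uint32_t)INT32_MAX)
        return (int32_t)u;
    /* Portable sign recovery for negative values. */
    return (int32_t)(u - UINT32_C(0x80000000)) + INT32_MIN;
}
```

The integer never leaves the register that holds the "pointer", so arithmetic on small numbers touches no heap memory.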
With 8 bits you have 256 tag values; reserving one (for example, zero) for ordinary pointers leaves 255 classes whose instances can omit the class pointer. This can significantly reduce total memory usage. More importantly, it reduces cache usage, because more objects fit in each cache line.
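To show how a tag can stand in for the class pointer, here is a C sketch of a hypothetical object model. Tag 0 marks an ordinary heap object that still starts with its class pointer. Tags 1 to 255 index a table of classes whose instances carry no class pointer. The names `class_of`, `tagged_classes`, `struct class_info`, and `struct object` are illustrative only.

```c
#include <stdint.h>

/* Hypothetical object model: tag 0 = ordinary object with a class
 * pointer in its first word; tags 1..255 = class implied by the tag. */
#define TAG_SHIFT 56

struct class_info { const char *name; };

struct object {
    const struct class_info *isa;     /* the word a tag makes unnecessary */
};

/* One slot per tag; slot 0 stays unused because tag 0 means "read isa". */
static const struct class_info *tagged_classes[256];

static inline const struct class_info *class_of(uint64_t v)
{
    unsigned tag = (unsigned)(v >> TAG_SHIFT);
    if (tag != 0)
        return tagged_classes[tag];   /* no memory access to the object */
    const struct object *obj = (const struct object *)(uintptr_t)v;
    return obj->isa;                  /* fall back to the class pointer */
}
```

Method dispatch then starts with `class_of(v)` instead of loading the object's first word, so a tagged value never touches memory just to find its class.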
As you might expect from a modern architecture, ARMv8 is designed to support virtualization. Translation from virtual to physical memory addresses can be quite complex: first translating to pseudo-physical addresses (ARM calls these intermediate physical addresses) and then to physical addresses, via two separate sets of page tables. For a very detailed explanation of how this works, see my book The Definitive Guide to the Xen Hypervisor. Put simply, the hypervisor must map what the guest OS thinks is real memory to physical memory, in the same way that the OS maps a process's virtual memory to physical memory.
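A toy C model of how the two stages compose. It assumes single-level tables, 4KB pages, and a 16-page address space purely for illustration. Real ARMv8 tables are multi-level, and the guest's own tables live at pseudo-physical addresses. `page_table`, `walk`, `translate`, and `FAULT` are hypothetical names.

```c
#include <stdint.h>

/* Toy two-stage translation with single-level tables and 4KB pages.
 * Stage 1 (guest OS):   virtual -> pseudo-physical.
 * Stage 2 (hypervisor): pseudo-physical -> physical. */
#define PAGE_SHIFT  12
#define OFFSET_MASK ((UINT64_C(1) << PAGE_SHIFT) - 1)
#define NPAGES      16
#define FAULT       UINT64_MAX

typedef struct { uint64_t frame[NPAGES]; } page_table;

static inline uint64_t walk(const page_table *t, uint64_t addr)
{
    uint64_t page = addr >> PAGE_SHIFT;
    if (page >= NPAGES)
        return FAULT;                               /* outside the toy table */
    return (t->frame[page] << PAGE_SHIFT) | (addr & OFFSET_MASK);
}

static inline uint64_t translate(const page_table *stage1,
                                 const page_table *stage2, uint64_t va)
{
    uint64_t ipa = walk(stage1, va);    /* what the guest thinks is physical */
    return ipa == FAULT ? FAULT : walk(stage2, ipa);
}
```

For example, if the guest maps virtual page 1 to pseudo-physical page 3 and the hypervisor maps page 3 to physical page 9, then virtual address 0x1234 ends up at physical address 0x9234. The guest never sees the second mapping.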
This two-stage mapping can be quite slow, but ARMv8 does a few things to make it cheaper. One is to allow 64KB pages. x86-64 allows 4KB and 2MB pages (and, on newer processors, 1GB pages). PowerPC allows 64MB pages. These large pages are a bit too big to be useful in most cases, but 64KB is probably close to the sweet spot. It allows a malloc() implementation to get a reasonable amount of memory from the kernel cheaply without increasing internal fragmentation too badly, although the more likely use is for the hypervisor's page tables, where larger pages mean fewer levels of table to walk.
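A rough C model of why page size matters so much once two stages are involved. It counts how many page-table levels a 48-bit address needs and how many memory references a worst-case nested walk costs. The assumptions are 8-byte table entries, page-sized tables, 48-bit addresses at both stages, and no TLB or walk caches. `levels` and `nested_walk_refs` are hypothetical helpers, not ARM's exact table formats.

```c
#include <stdint.h>

/* Idealized model: 8-byte entries, page-sized tables, no caching. */
static inline unsigned log2u(uint64_t x)
{
    unsigned n = 0;
    while (x >>= 1)
        n++;
    return n;                                        /* floor(log2(x)), x > 0 */
}

/* Table levels needed to resolve va_bits of address. */
static inline unsigned levels(unsigned va_bits, uint64_t page_size)
{
    unsigned offset_bits = log2u(page_size);         /* 12 for 4KB */
    unsigned bits_per_level = log2u(page_size / 8);  /* 9 for 4KB */
    unsigned to_resolve = va_bits - offset_bits;
    return (to_resolve + bits_per_level - 1) / bits_per_level;  /* ceiling */
}

/* Worst-case nested walk: each of the g guest entries needs an h-level
 * host walk plus the read itself, and the final pseudo-physical address
 * needs one more host walk: g(h+1) + h = (g+1)(h+1) - 1. */
static inline unsigned nested_walk_refs(unsigned g, unsigned h)
{
    return (g + 1) * (h + 1) - 1;
}
```

Under this model, 4KB pages need 4 levels per stage and up to 24 memory references per TLB miss, while 64KB pages need 3 levels and 15. Each 64KB TLB entry also covers 16 times as much memory, so misses are rarer in the first place.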