Advanced Topics
A single book chapter isn’t the right place to go into great detail on all the features packed into Ubuntu Server. There isn’t enough space, and many of the features are quite specialized. But that doesn’t stop us from taking you on a whirlwind tour. Our goal here is to give just enough information to let you know what’s there and interest you in finding out more about those features that may be relevant to how you use Ubuntu.
Virtualization
If there’s been one buzzword dominating the server space for the past couple of years, it’s virtualization. In August 2007, a virtualization company called VMware raised about a billion U.S. dollars in its initial public offering, and the term virtualization finally went supernova, spilling from the technology realm into the financial mainstream and, soon after, onto the desks of CIOs and technology managers everywhere.
Fundamentally, virtualization is a way to turn one computer into many. (Erudite readers will note this is precisely the opposite of the Latin motto on the Seal of the United States, “E Pluribus Unum,” which means “out of many, one.” Some technologies match that description, too, like Single System Image, or SSI, grids. But if we talked about virtualization in Latin, it would be “Ex Uno Plura.”) Why is it useful to turn one computer into many?
Back in the 1960s, servers were huge and extremely expensive, and no one wanted to buy more of them than they absolutely needed. It soon became clear that a single server capable of running different operating systems at once would allow the same hardware to be used by different people with different needs, which meant fewer hardware purchases, which meant happier customers with less devastated budgets. IBM was the first to offer this as a selling point, pioneering virtualization on its experimental M44/44X system (built around a modified IBM 7044) and later in the hardware of its System/360 Model 67 mainframe. Since then, the industry largely moved away from mainframes and toward small, cheap rack servers, and the need to virtualize mostly went away: If you needed to run separate operating systems in parallel, you just bought two servers. But eventually Moore’s law caught up with us, and even small rack machines became so powerful that organizations found many of them underutilized, while buying more servers (though cheap in itself) meant sizable auxiliary costs for cooling and electricity. This set the stage for virtualization to come back into vogue. Maybe you want to run different Linux distributions on the same machine. Maybe you need a Linux server side by side with Windows. Virtualization delivers.
There are four key types of virtualization. From the lowest level to the highest, they are hardware emulation, full virtualization, paravirtualization, and OS virtualization. Hardware emulation runs each guest operating system by emulating all of a computer’s hardware in software; the approach is very flexible but painfully slow. Full virtualization instead uses a privileged piece of software called a hypervisor as a broker between operating systems and the underlying hardware; it offers good performance but requires special processor support on instruction sets like the ubiquitous x86. Paravirtualization also uses a hypervisor but runs only operating systems that have been modified to cooperate with it, offering high performance in return. Finally, OS virtualization is more accurately termed “containerization” or “zoning” and refers to operating systems that support multiple user spaces on top of a single running kernel. Containerization provides near-native performance but isn’t really comparable to the other virtualization approaches because its focus isn’t running multiple operating systems in parallel but carving one up into isolated pieces.
The most widely used hardware emulators on Linux are QEMU and Bochs, available in Ubuntu as packages qemu and bochs respectively. The big players in full virtualization on Linux are the commercial offerings from VMware, IBM’s z/VM, and most recently, a technology called KVM that’s become part of the Linux kernel. In paravirtualization, the key contender is Xen; the Linux OS virtualization space is dominated by the OpenVZ and Linux-VServer projects, though many of the needed interfaces for OS virtualization have gradually made their way into the Linux kernel proper.
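To get a feel for how a hardware emulator like QEMU is driven, here is a minimal sketch of creating a guest disk and booting an installer. The image name, size, and ISO file are placeholders of our choosing, and the block only runs `qemu-img` if QEMU is actually installed:

```shell
# Create a 10 GB disk image in the space-efficient qcow2 format.
# (The actual boot command is shown commented out because it starts
# an interactive installer session.)
if command -v qemu-img >/dev/null 2>&1; then
    qemu-img create -f qcow2 guest.img 10G
    # Boot the guest from an installer CD image with 512 MB of RAM:
    # qemu -hda guest.img -cdrom ubuntu.iso -m 512 -boot d
else
    echo "qemu-img not found; install the qemu package first"
fi
```

Once the installer has finished, dropping the `-cdrom` and `-boot d` options boots the installed system from the disk image instead.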
Now that we’ve laid the groundwork, let’s point you in the right direction depending on what you’re looking for. If you’re a desktop Ubuntu user and want a way to safely run one or more other Linux distributions (including different versions of Ubuntu!) or operating systems (BSD, Windows, Solaris, and so forth) for testing or development, all packaged in a nice interface, the top recommendation is an open source project out of Sun Microsystems called VirtualBox. It’s available in Ubuntu as the package virtualbox-ose, and its home page is www.virtualbox.org.
If you want to virtualize your server, the preferred solution in Ubuntu is KVM, a fast full virtualizer that turns the running kernel into a hypervisor. Due to peculiarities of the x86 instruction set, however, full virtualizers can work only with a little help from the processor, and KVM is no exception. To test whether your processor has the right support, try:
$ egrep '(vmx|svm)' /proc/cpuinfo
If that command produces any output, you’re golden. Head on over to https://help.ubuntu.com/community/KVM for instructions on installing and configuring KVM and its guest operating systems.
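A slightly friendlier version of the same check counts how many logical CPUs advertise the relevant flags (vmx for Intel VT-x, svm for AMD-V). The message strings here are our own, not KVM’s:

```shell
# Count /proc/cpuinfo lines carrying a hardware-virtualization flag.
# egrep -c prints 0 and exits nonzero on no match, so guard with || true.
if [ -r /proc/cpuinfo ]; then
    count=$(egrep -c '(vmx|svm)' /proc/cpuinfo || true)
else
    count=0   # /proc/cpuinfo is Linux-specific
fi
if [ "$count" -gt 0 ]; then
    echo "KVM-capable: $count logical CPU(s) report vmx or svm"
else
    echo "No hardware virtualization support detected"
fi
```

Note that the flags can also be present but disabled in the BIOS, so if this reports zero on a machine you believe should support KVM, a trip through the BIOS setup screens is worth a try.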
If you lack the processor support for KVM, you don’t have great options. Ubuntu releases after Hardy (8.04) no longer offer kernels capable of hosting Xen guests (dom0 kernels aren’t provided, in Xen parlance), which means if you’re desperate to get going with Xen, you’ll have to downgrade to Hardy or get your hands quite dirty in rolling the right kind of kernel yourself, which is usually no small task.
Disk Replication
We’ve discussed the role of RAID in protecting data integrity in the case of disk failures, but we didn’t answer the follow-up question: What happens when a whole machine fails? The answer depends entirely on your use case, and giving a general prescription doesn’t make sense. If you’re Google, for instance, you have automated cluster management tools that notice a machine going down and don’t distribute work to it until a technician has been dispatched to fix the machine. But that’s because Google’s infrastructure makes sure that (except in pathological cases) no machine holds data that isn’t replicated elsewhere, so the failure of any one machine is ultimately irrelevant.
If you don’t have Google’s untold thousands of servers on a deeply redundant infrastructure, you may consider a simpler approach: Replicate an entire hard drive to another computer, propagating changes in real time, just like RAID1 but over the network.
This functionality is called DRBD, or Distributed Replicated Block Device, and it isn’t limited to hard drives: It can replicate any block device you like. Ubuntu 9.04 ships with DRBD version 8.3.0, and the user space utilities you need are in the drbd8-utils package. For the full documentation, see the DRBD Web site at www.drbd.org.
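As a taste of what configuring DRBD looks like, here is a sketch of a minimal resource stanza for /etc/drbd.conf. The host names (alpha, bravo), backing partitions, and addresses are placeholders you would replace with your own:

```
resource r0 {
  protocol C;    # fully synchronous: a write completes only once both nodes have it
  on alpha {
    device    /dev/drbd0;
    disk      /dev/sda7;
    address   192.168.0.1:7788;
    meta-disk internal;
  }
  on bravo {
    device    /dev/drbd0;
    disk      /dev/sda7;
    address   192.168.0.2:7788;
    meta-disk internal;
  }
}
```

With a configuration like this in place on both machines, the drbdadm tool from drbd8-utils brings the resource up, and /dev/drbd0 can then be formatted and mounted on the primary node like any other block device.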
Cloud Computing
Slowly but surely overtaking virtualization as the uncontested hottest topic in IT, cloud computing is just a new term for an old idea: on-demand or “pay-as-you-go” computing. Building and managing IT infrastructure aren’t the core competencies of most organizations, the theory goes, and it’s hard to predict how much computing capacity you’ll need at any given time: If your company’s online store goes from wallowing in relative obscurity to overnight Internet sensation, what do you do? Buy up a truckload of new servers, ship them overnight, and work your IT staff to a pulp to bring all that new infrastructure up in as little time as possible? In the interim, your customers are overwhelming your existing capacity and getting frustrated by the slow response times. In the worst case, by the time you have the new hardware running, customer interest has ebbed away, and you’re stuck having paid for a ton of extra hardware that now does nothing at all. Cloud computing is the promise of a better way. Instead of dealing with IT infrastructure yourself, why not rent only the amount of it you need at any given moment from people whose job it is to deal with IT infrastructure, like Amazon or Google?
Cloud services like Amazon’s S3 and EC2 and Google’s App Engine offer exactly that. And Ubuntu is getting in on the action in two ways. As this book goes to press, Ubuntu is offering a beta program wherein official Ubuntu images can be run as Amazon EC2 instances, allowing you to run Ubuntu servers on Amazon’s infrastructure. It is expected that this functionality will become widely available in the foreseeable future. More interestingly, Ubuntu bundles a set of software called Eucalyptus (http://eucalyptus.cs.ucsb.edu) that allows you to create an EC2-style cloud on your own hardware while remaining interface-compatible with Amazon’s. Such a setup offers savvy larger organizations the ability to manage their own infrastructure in a much more efficient way and makes it possible for even small infrastructure shops to become cloud service providers and compete for business with the big boys.
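Because Eucalyptus speaks Amazon’s interface, the standard EC2 command-line tools can be pointed at a local cloud simply by changing their endpoint. A sketch, assuming a hypothetical cloud controller at cloud.example.com and credential files in ~/.euca (both placeholders):

```shell
# Aim the EC2 tools at the local Eucalyptus cloud instead of Amazon.
export EC2_URL=http://cloud.example.com:8773/services/Eucalyptus
export EC2_PRIVATE_KEY=~/.euca/pk.pem
export EC2_CERT=~/.euca/cert.pem

# From here the calls look just like they would against Amazon proper,
# e.g. (commented out, since they need a live cloud and credentials):
# ec2-describe-images
# ec2-run-instances emi-12345678 -k mykey
```

The emi- prefix in the image identifier is Eucalyptus’s counterpart to Amazon’s ami- prefix; the identifier shown is, of course, made up.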