Gives students powerful additional resources for learning Apache, and updates for staying current.
Helps students rapidly find the solution they are searching for.
Gives students a jumpstart by providing directives they can borrow and adapt.
Teaches students how to perform one of the most crucial tasks in real-world Web site management, a task neglected by many Apache introductory guides.
Essential Apache for Web Professionals is the fastest way for Web professionals to master key skills for configuring, deploying, and managing virtually any site! Through hands-on projects and real configuration directives, you'll learn how to use the world's #1 Web server to handle virtual hosting, database connectivity, and even complex session management and load-balancing tasks. You'll start with simple examples, then work your way up to sophisticated projects, learning practical techniques you'd otherwise need months, or even years, to learn. The companion Web site contains downloadable configuration directives, sample images for the book's projects, and even more in-depth explanations!
You'll master all this, and much more!
Rely on Essential Guides for ALL the Web Skills You Need! All these books share the same great format, and the same dynamic Web site... so once you've used one, they're all a piece of cake!
1. Installation.
2. Basic Apache.
3. Hosting Multiple Sites.
4. Dynamic Content.
5. Advanced Topics.
This book is a discussion of the installation, configuration, and maintenance of the Apache Web server. At this writing, Apache is the most popular Web server in the world. Apache is open-source software; among other things, that means it is available for download at no cost.
The source code also is included with most distributions. If you choose, you can modify Apache to suit your needs. This feature has led to a rich variety of third-party add-ons. Many talented programmers have chosen to make their work available to the general public.
Apache is high-quality software. It is rare to encounter an error in the source code itself. If you do encounter a problem, technical support is available from a variety of outlets on the Internet and in bookstores. Some companies also provide phone support for a fee.
This book will teach you how to use the Apache Web server. The discussion assumes a basic familiarity with computer concepts, but if you use computers at all, you should find the book accessible. In this chapter, I first provide a brief discussion of some general networking concepts. If you've been working with networks for a while, feel free to skip this section. If not, you should review it, as the discussion in later chapters presumes a familiarity with the terms introduced here. The last portion of this chapter lays out the typographical conventions of the rest of the book.
In this section I will introduce several fundamental concepts of networking software in general. Next I'll cover several concepts that are specific to Apache.
Apache is a Web server. A Web server is a piece of software that responds to the requests of Web browsers. When you type a URL in the address window of your Web browser, an intricate question-and-answer sequence is initiated between your browser and various Internet services. In order to understand the material in this book, you need to have some understanding of these processes, so I will explain them first.
If you're even peripherally involved in the computer industry, you're probably familiar with the concept of an IP address. An IP address (more precisely, an IPv4 address, the kind used in this book's examples) is a sequence of four numbers, each ranging in value from 0 to 255, separated by periods. The following is an example of an IP address:
192.168.100.1
You will probably notice that most of the examples in this book use addresses in the range of 192.168.100.1 to 192.168.100.255. These addresses cannot be reached from the Internet. They are part of a range of addresses (192.168.0.0 through 192.168.255.255) that was set aside for private networks, and routers on the public Internet do not forward traffic to them. As such, they are perfect for examples: they don't belong to anyone's public server, and no one on the Internet can connect to them directly.
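To make the four-number format and the idea of a private range concrete, here is a minimal sketch using Python's standard `ipaddress` module. It is an illustration only (Apache does not need it), and the function name `describe_address` is hypothetical. It rejects anything that is not four numbers from 0 to 255 and reports whether a valid address falls in a block reserved for private networks.

```python
import ipaddress

def describe_address(text: str) -> str:
    """Classify a dotted-quad IPv4 address as 'private', 'public', or 'invalid'."""
    try:
        # IPv4Address rejects anything that is not four numbers from 0 to 255.
        addr = ipaddress.IPv4Address(text)
    except ipaddress.AddressValueError:
        return "invalid"
    # is_private is True for reserved blocks such as 192.168.0.0/16,
    # which routers on the public Internet do not forward.
    return "private" if addr.is_private else "public"

if __name__ == "__main__":
    for candidate in ["192.168.100.1", "8.8.8.8", "192.168.100.256"]:
        print(candidate, describe_address(candidate))
```

Every address in the book's example range is reported as private, while an out-of-range number such as 256 makes the whole address invalid.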
As members of the browsing public, we are accustomed to thinking of Web addresses in terms of their domain names. A domain name is an address of the form:
www.stitch.com
You might be surprised to learn that those names are not of much use to your computer. Computers almost never care about English names for things. In order to connect to your Apache server and start downloading information, the Web browser that wants to be your client must know two things about you: the IP address of your server and the port number on which your Web server is listening.
However, in all likelihood, when users try to connect to your Web site, all they have is your domain name. How do we get from the
www.stitch.com
printed on your business card to the IP address and port number the networking software uses?
The first step in the process is name resolution. Name resolution is the process of looking up the IP address associated with a domain name. Name resolution usually occurs without any help from the end user. When you install networking software on your PC (such as the kind provided by your Internet service provider, for example AOL or Earthlink), part of the installation process is to tell your machine where to go when it needs some name resolution done.
Usually, the machines that perform name resolution are large, powerful server machines that are dedicated to that one task. Most of them run software that implements the Domain Name System (DNS). Not every machine that runs DNS contains every single address of the Internet. A DNS server keeps the addresses it is responsible for, plus a cache of addresses its clients have looked up recently. When it is asked to resolve a domain name with which it is not familiar, it passes the question on to another DNS server. The details of the name resolution process aren't really important to you as an administrator. The key point to remember is this:
When you decide to add a new Web site to your server, you must make sure the Internet at large knows that the domain name you are supporting is associated with the IP address of your server. The actual mechanics of this process are probably outside of your control. In practice, DNS registration usually is accomplished by picking up the phone and calling your Internet service provider (ISP). Tell them that you want the domain name you are hosting to be registered in DNS as belonging to your IP address. Generally this process takes a couple of hours on hold and $50 or so. You also should allow a couple of days for news of the change to travel from the DNS software of your ISP out to the world at large.
Once you have found an unclaimed domain name you can live with and have registered it with DNS, the worst is over.
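From a program's point of view, name resolution is usually a single call into the operating system's resolver, which in turn consults the DNS servers it has been configured to use. The sketch below uses Python's standard `socket.getaddrinfo`; the function name `resolve` is hypothetical, and a real lookup of `www.stitch.com` needs working network access and may return different addresses (or none) over time.

```python
import socket

def resolve(hostname: str, port: int = 80) -> list[str]:
    """Return the distinct IPv4 addresses the system resolver reports for hostname."""
    # getaddrinfo asks the operating system's resolver, which may in turn
    # query a DNS server; AF_INET restricts the answers to IPv4 addresses.
    results = socket.getaddrinfo(hostname, port, socket.AF_INET, socket.SOCK_STREAM)
    addresses = []
    # Each result is (family, type, proto, canonname, (address, port)).
    for *_, sockaddr in results:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses

if __name__ == "__main__":
    # A numeric address needs no DNS lookup; it comes straight back.
    print(resolve("192.168.100.1"))
    # A real name requires a reachable DNS server.
    try:
        print(resolve("www.stitch.com"))
    except socket.gaierror as err:
        print("lookup failed:", err)
```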
Let us assume that the example browser has contacted a DNS server and that name resolution has been completed successfully. Now the browser knows the IP address of the machine with which it wants to communicate.
However, you may recall that earlier I said that, in order to make a network connection, the client browser also needs to know what port the Web server will be listening on. The machine associated with the IP address you found may be running multiple network services (ftp, telnet, etc.). Each of these services must respond to different requests in different ways. How does the server keep them separated? The answer is ports.
A port is a secondary number associated with an IP address. Ports come in the range of 1 to 65535. Rather than asking each individual machine which service it associates with which port, it has become customary for all machines connected to the Internet to use the same port for the same services. The term for this custom is well-known port. The well-known port for Web service is number 80. When connecting across the Secure Sockets Layer (SSL), port 443 is used instead.
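Well-known ports are why a URL rarely mentions a port at all: when none is given, the browser falls back on the well-known port for the scheme. A minimal Python sketch of that rule, assuming only the two Web schemes discussed here; `WELL_KNOWN_PORTS` and `host_and_port` are hypothetical names.

```python
from urllib.parse import urlsplit

# Well-known ports for the two Web schemes discussed in the text.
WELL_KNOWN_PORTS = {"http": 80, "https": 443}

def host_and_port(url: str) -> tuple[str, int]:
    """Return the host name and the TCP port a browser would connect to."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"no host in {url!r}")
    if parts.port is not None:
        # An explicit port in the URL (e.g. :8080) overrides the custom.
        port = parts.port
    elif parts.scheme in WELL_KNOWN_PORTS:
        port = WELL_KNOWN_PORTS[parts.scheme]
    else:
        raise ValueError(f"no well-known port for scheme {parts.scheme!r}")
    return parts.hostname, port

if __name__ == "__main__":
    for url in ["http://www.stitch.com/", "https://www.stitch.com/",
                "http://www.stitch.com:8080/index.html"]:
        print(url, host_and_port(url))
```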
A socket is a network programming construct that enables two machines to communicate across a network. A socket connection is defined by the IP address and port of the originating machine together with the IP address and port of the terminating machine. Socket connections are requested by the client browser. If there is a server process (such as Apache) on the machine at the IP address requested by the client, monitoring the well-known port associated with Web connections, that server will accept the connection. At that point, a socket is created. The actual transmission of Web pages occurs across the socket connection.
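The endpoints that identify a connection are easy to observe in a small experiment. This Python sketch plays both roles on the loopback address 127.0.0.1 and listens on a port chosen by the operating system (port 0), because ports below 1024, such as 80, normally require root privileges on UNIX. The function name `connection_endpoints` is hypothetical; a real Web server would listen on its public address and port 80.

```python
import socket

def connection_endpoints():
    """Open a loopback connection and return how each side sees it.

    Each view is ((client address, client port), (server address, server port)).
    """
    # The "server" listens; port 0 lets the OS pick any free port.
    with socket.create_server(("127.0.0.1", 0)) as listener:
        server_port = listener.getsockname()[1]
        # The "client" requests a connection to the listening port...
        with socket.create_connection(("127.0.0.1", server_port)) as client:
            # ...and the server accepts it, producing a new connected socket.
            conn, _ = listener.accept()
            with conn:
                client_view = (client.getsockname(), client.getpeername())
                server_view = (conn.getpeername(), conn.getsockname())
    return client_view, server_view

if __name__ == "__main__":
    client_view, server_view = connection_endpoints()
    print("client sees:", client_view)
    print("server sees:", server_view)
```

Both sides report the same pair of address-and-port endpoints, which is what makes the connection unique.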
The term protocol, as it is used in computer science, is derived from the term as it is used in human interaction. Just as diplomats and debutantes have all sorts of rituals they perform to facilitate a smooth interaction between parties, so do computers. The idea is that computers aren't versatile enough to improvise, so the order and nature of each request, and each response to each request, must be rigidly defined.
To give you just a rough idea of what I'm talking about: once the server has accepted a connection, the client browser speaks first. It sends a request that names the page it wants and the version of the protocol it is using. The server uses that version information to fine-tune its reply, which begins with a status line stating the server's own protocol version and whether the request succeeded, followed by either the Web page or an error message. The client displays the data it received, and the cycle repeats itself with the next request.
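A single exchange can be reproduced with nothing but a socket. In this sketch, Python's standard `http.server` stands in for Apache on a free loopback port; the client sends a request line naming the method, the path, and the protocol version, and the server answers with a status line, headers, and a body. `HelloHandler` and `fetch_once` are hypothetical names, and the toy server handles exactly one request.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    """Toy stand-in for a Web server: answers every GET with one small page."""

    def do_GET(self):
        body = b"<html><body>Hello from the toy server</body></html>"
        self.send_response(200)  # queues the status line, e.g. "HTTP/1.0 200 OK"
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging

def fetch_once(path: str = "/") -> str:
    """Start the toy server, send one raw HTTP/1.0 request, return the raw reply."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    server.timeout = 5  # don't wait forever if no request arrives
    port = server.server_address[1]
    worker = threading.Thread(target=server.handle_request)  # serve one request
    worker.start()
    try:
        request = f"GET {path} HTTP/1.0\r\nHost: 127.0.0.1:{port}\r\n\r\n"
        with socket.create_connection(("127.0.0.1", port)) as sock:
            sock.sendall(request.encode("ascii"))
            chunks = []
            # With HTTP/1.0 the server closes the connection after replying,
            # so read until recv() returns an empty byte string.
            while chunk := sock.recv(4096):
                chunks.append(chunk)
    finally:
        worker.join()
        server.server_close()
    return b"".join(chunks).decode("latin-1")

if __name__ == "__main__":
    print(fetch_once())
```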
All network services use some sort of protocol. Sometimes, as in the case of File Transfer Protocol (ftp) and HyperText Transfer Protocol (http), the names reflect this. The protocol associated with the World Wide Web is http.
It's worth noting that the http protocol is not absolutely ideal. At the time it was created in 1990, something called "the Internet" did exist; it was largely the province of academics and lonely single men. The relentless hype that came to characterize it in the mid-1990s was still years in the future. The most popular Internet applications at the time were newsgroups and bulletin boards, both of which were, for the most part, text only. This was partly a function of bandwidth: modems at the time were glacially slow compared to what's available today. At 300 baud, even text-only messages took an achingly long time to download, and image files were out of the question.
Modem speed improved, of course. At about the same time, a guy at CERN (a European research center) named Tim Berners-Lee developed a piece of software that would exploit both the increasing speed of modems and the graphical user interface (GUI) capabilities of modern operating systems. His http enabled the user to access data, including pictures, across a network using an intuitive, point-and-click interface.
This was truly a brilliant idea, and it took off immediately. However, in retrospect, it may have taken off too quickly. Let me preface these next few sentences with a disclaimer: I am about to indulge in some shameless Monday-morning quarterbacking. I was a computer science student during this period, and I had access to the same sorts of resources Tim Berners-Lee did. The main difference between us is that I was the one who failed to invent the World Wide Web.
Having said that, I will go ahead and point out that http contained no provision for the secure transfer of data, no provision for the execution of scripts on either the client or the server side, and only rudimentary graphic-formatting capabilities. For the last 11 years or so, the computer science community has expended enormous energy trying to find a way to retrofit these capabilities into http. The solutions that have been developed are certainly functional, but no one ever describes them as elegant.
To be fair, I don't think that anybody at the time had any idea just how huge the Web was going to be. If they had, they might have spent a bit more time refining the protocols before releasing them on an unsuspecting public.
Usually, Web servers handle requests from many browsers simultaneously. If a single server process were to handle all of the incoming requests, a great deal of overhead would be incurred in keeping track of who wants what, what stage of the protocol they are in, and so forth. In the UNIX environment, there is a simpler way: each client is assigned its own individual server process.
How does this work? When Apache is started, the first thing it does is check whether it is the first such process on the machine. The first process, called the parent, has rights and responsibilities that the other processes do not have. Specifically, it is responsible for creating copies of itself, called child processes, to handle user requests. It also is responsible for killing the child processes off as necessary. As an Apache server administrator, you have the ability to control the number of these processes.
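The parent/child arrangement rests on the UNIX `fork` system call, which creates a copy of the calling process. The Python sketch below is a toy parent, not Apache's code: it forks a few children, which exit at once instead of serving requests, and then reaps them, as a parent must. `spawn_children` is a hypothetical name. In Apache itself the number of children is governed by configuration directives (for example StartServers and MaxClients in Apache 1.3); the sketch reads none of them. It runs only on UNIX-like systems.

```python
import os

def spawn_children(count: int) -> list[int]:
    """Toy parent: fork `count` child processes, then reap them.

    A real server child would loop accepting connections; these children
    exit at once so the sketch stays short. Returns the children's PIDs.
    """
    children = []
    for _ in range(count):
        pid = os.fork()  # UNIX only; returns 0 in the child
        if pid == 0:
            # Child: a real worker would serve requests here.
            # os._exit skips cleanup handlers that belong to the parent.
            os._exit(0)
        children.append(pid)  # parent: remember each child's PID
    # The parent is responsible for reaping (collecting) its children.
    for pid in children:
        _, status = os.waitpid(pid, 0)
        if not (os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0):
            raise RuntimeError(f"child {pid} did not exit cleanly")
    return children

if __name__ == "__main__":
    pids = spawn_children(3)
    print(f"parent {os.getpid()} created and reaped children {pids}")
```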
Apache on Windows is slightly different. On Windows, Apache relies on multiple threads within a single process to handle all user requests. The Apache program has a lot of assumptions about parent and child processes that were difficult to remove when the Windows port was performed, so there is a parent process as well. Note, however, that the parent/child model is not optimal on Windows.
Apache is a versatile piece of software. It alters its behavior at run time based on the values of hundreds of different variables stored in its configuration file. These variables are called directives. Most of this book is concerned with defining what these directives do, what their possible values are, and how you can best exploit them to suit your needs.
Even the simplest Apache server will need to have dozens of directives set. Rather than type the directives in when the server process is invoked, as is common with UNIX command-line utilities, Apache stores the directives in a configuration file. This configuration file is a plain old text file. You can edit it with your favorite text editor, copy it at will, and generally treat it as you would any other text file.
In order for any changes you make in the configuration file to take effect, you must restart the server process. The details of how to do this are discussed in Chapter 2.
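Because the configuration file is plain text with one directive per line, ordinary tools can read it. The sketch below is a deliberately simplified reader, not Apache's own parser: it assumes one directive per line, skips blank lines and `#` comments, and ignores section containers such as `<VirtualHost>`, continuation lines, and included files. `SAMPLE_CONFIG` and `read_directives` are hypothetical, and the directive values are only examples.

```python
import shlex

SAMPLE_CONFIG = """
# Minimal example; the values are illustrative only.
ServerName www.stitch.com
Listen 80
DocumentRoot "/usr/local/apache/htdocs"
"""

def read_directives(text: str) -> list[tuple[str, list[str]]]:
    """Return (directive, arguments) pairs from simple Apache-style config text."""
    directives = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and (in this sketch) container lines.
        if not line or line.startswith("#") or line.startswith("<"):
            continue
        # shlex.split honours double quotes, so a quoted path stays one argument.
        name, *args = shlex.split(line)
        directives.append((name, args))
    return directives

if __name__ == "__main__":
    for name, args in read_directives(SAMPLE_CONFIG):
        print(name, args)
```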
Apache distributions all come with the same chunk of basic functionality, called the core, enabled by default. This functionality includes the ability to do such basic tasks as read its configuration file, perform rudimentary access control, and find the Web pages it is supposed to be serving.
Each of these (and many other) tasks is handled by its own clearly defined section of code. These sections of code are called modules. Apache is designed so that you can use only the modules you really need and discard the rest.
In order to fully exploit the modular capabilities of Apache, you will need to create an executable program from the source code provided with the distribution. The process of creating an executable program from source code is called compilation. The program that compilation produces is called httpd. The compilation process is discussed in detail in Chapter 1.
It's worth emphasizing here that Apache is httpd. The terms will be used interchangeably throughout this book and all other Apache documentation. Why don't we just call httpd "Apache"? That's a fair question. The code that eventually became the Apache server is descended from a program called the HyperText Transfer Protocol Daemon. The name Apache is one of the weak jokes common among programmers: it refers to the fact that early versions of the server required a lot of software patches in order to run correctly ("a patchy server"). By the time the name Apache was coined, using the label httpd for the running server process was unassailably entrenched in both the source code and documentation.
Perhaps the best way to build a module is as a Dynamic Shared Object (or DSO). A DSO is a module that can be added to or removed from the httpd executable as the server is being started simply by changing a few directives in the configuration file. This is an amazingly handy ability. Compiling a module as a DSO is slightly more complicated than compiling it into a static server process, but it is a smart investment of time. The details of compiling DSOs also are discussed in Chapter 1.
Modules sometimes provide specific handlers, which are methods of processing files or requests in an unusual way. Sometimes handlers are named so that they can be referred to in configuration directives. Named handlers and their associated modules are listed in Table 0.1.
| Handler | Module | Effect |
|---|---|---|
| send-as-is | mod_asis | Serve file and headers as-is |
| cgi-script | mod_cgi | Attempt to execute and serve output |
| imap-file | mod_imap | Imagemap rule file |
| server-info | mod_info | Display server configuration information |
| server-parsed | mod_include | Locate and replace server-side includes |
| server-status | mod_status | Display server status information |
| type-map | mod_negotiation | Parse as type map file |
As I implied in the discussion of modules, you must include a module in the current httpd executable before you can access its handler.
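The rule that a handler is usable only when its module is present can be pictured as a lookup table. This Python sketch is a toy model of that idea, not Apache's internal API: the handler and module names come from Table 0.1, while `HANDLER_MODULES`, `module_for_handler`, `usable_handlers`, and the sample set of loaded modules are hypothetical. In this model, asking for a handler whose module is missing is simply reported as an error.

```python
# Which module supplies each named handler, as listed in Table 0.1.
HANDLER_MODULES = {
    "send-as-is": "mod_asis",
    "cgi-script": "mod_cgi",
    "imap-file": "mod_imap",
    "server-info": "mod_info",
    "server-parsed": "mod_include",
    "server-status": "mod_status",
    "type-map": "mod_negotiation",
}

def module_for_handler(handler: str, loaded_modules: set[str]) -> str:
    """Return the module behind `handler`, or raise if it is unknown or not loaded."""
    module = HANDLER_MODULES.get(handler)
    if module is None:
        raise LookupError(f"unknown handler {handler!r}")
    if module not in loaded_modules:
        # The module must be part of the server before its handler is usable.
        raise LookupError(f"handler {handler!r} needs {module}, which is not loaded")
    return module

def usable_handlers(loaded_modules: set[str]) -> list[str]:
    """List the named handlers available with the given modules, sorted by name."""
    return sorted(h for h, m in HANDLER_MODULES.items() if m in loaded_modules)

if __name__ == "__main__":
    loaded = {"mod_cgi", "mod_include", "mod_status"}  # hypothetical build
    print(usable_handlers(loaded))
    try:
        module_for_handler("server-info", loaded)
    except LookupError as err:
        print(err)
```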
MIME is an acronym for Multipurpose Internet Mail Extensions. A MIME type is a label, such as text/html or image/gif, that identifies what kind of data a file contains. Apache works out a file's MIME type by looking at the file's extension and tells the client in its response, so the browser knows how to handle the data. Apache comes with a default mechanism that enables you to define how MIME types will be presented to the client. Like everything else in Apache, this mechanism is fully configurable.
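A minimal sketch of the extension-to-type lookup, using Python's standard `mimetypes` module in place of Apache's own tables. The function name `content_type_for` and the `text/plain` fallback for unrecognized names are assumptions of this sketch. The resulting type is the value a server sends to the client in the `Content-Type` response header.

```python
import mimetypes

# Use only Python's built-in extension table so the results don't depend on
# system files such as /etc/mime.types (Apache ships its own mime.types file).
_TYPES = mimetypes.MimeTypes()

def content_type_for(filename: str, default: str = "text/plain") -> str:
    """Map a file name's extension to a MIME type, with a fallback default."""
    mime_type, _encoding = _TYPES.guess_type(filename)
    return mime_type or default

if __name__ == "__main__":
    for name in ["index.html", "logo.gif", "README"]:
        print(name, content_type_for(name))
```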
Throughout this book you'll find example commands and configuration directives, always accompanied by at least some explanation and sometimes by example output. In general, I don't provide detailed syntax information for directives and system commands in the regular text. That sort of thing is found in the Appendices, particularly Appendix A (Core Directives) and Appendix B (Other Directives). I hope you'll be able to glean the general nature of any command with which you are unfamiliar from the context.
The success or failure of any given Apache transaction depends on the internal server configuration, the content being transferred, the configuration of the underlying operating system, and the vagaries of the network support services. Given that, it is impossible to say with absolute certainty that the examples presented herein will run on your particular machine. You have my solemn vow that I typed each and every one of them in and they worked for me.
If you have any questions, comments, corrections, or suggestions for improvement, please feel free to contact me at:
s_hawkins@mindspring.com
Additional information about this and other books in Prentice Hall PTR's Essential Web Series can be found at:
www.phptr.com/essential/
A Web server is a piece of software that monitors an IP address and port and uses the http protocol to respond to requests from client browsers. The Web pages are served across a network connection called a socket.
The behavior of the Apache server is controlled by variables called directives stored in a configuration file. On UNIX systems, Apache is not a single process but, rather, a collection of nearly identical child processes that are created and destroyed by a parent.
Apache is composed of modules that may be included in the server process at the discretion of the administrator. Some modules provide handlers, which are methods of processing files or requests in a nonstandard way.