Web Browsers and Servers
As intelligent as web browsers currently are, web servers are smarter still. A single web server can host hundreds of different websites, manage many different types of content, read and write information in databases, and speak multiple languages, both human and artificial. A web server knows who you are (to be precise, it knows the Internet address of your computer and which browser you are using), it keeps track of each request you make, and it logs whether it was able to comply with each request.
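To make the logging behavior concrete, here is a minimal sketch that uses only Python's standard library (http.server and http.client). It is not how Apache or any production server works. The in-memory ACCESS_LOG list, the single page at "/", and the DemoBrowser/1.0 user-agent string are assumptions made for illustration. For each request, the handler records the client's IP address, the browser it claims to be (its User-Agent header), the request itself, and whether the request was satisfied (200) or not (404).

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# In-memory stand-in for a server's access log file (an assumption of this sketch).
ACCESS_LOG = []


class LoggingHandler(BaseHTTPRequestHandler):
    """Serves one page at "/" and records every request it receives."""

    def do_GET(self):
        # Decide whether the request can be satisfied: only "/" exists here.
        found = self.path == "/"
        status = 200 if found else 404
        body = b"Hello from the server\n" if found else b"Not found\n"
        # Record what the server knows about the client: its IP address,
        # the browser it claims to be (User-Agent), the request, and the outcome.
        ACCESS_LOG.append({
            "client_ip": self.client_address[0],
            "user_agent": self.headers.get("User-Agent", "-"),
            "request": f"{self.command} {self.path}",
            "status": status,
        })
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence the default stderr logging; ACCESS_LOG is used instead


def demo():
    # Port 0 asks the operating system for any free port on the loopback interface.
    server = HTTPServer(("127.0.0.1", 0), LoggingHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        for path in ("/", "/missing"):
            conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
            # "DemoBrowser/1.0" is a made-up user-agent string.
            conn.request("GET", path, headers={"User-Agent": "DemoBrowser/1.0"})
            conn.getresponse().read()
            conn.close()
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    demo()
    for entry in ACCESS_LOG:
        print(entry)
```

A real server writes the same kinds of fields to a log file, one line per request, which is how administrators later see who asked for what and which requests failed.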
The Web has a client/server architecture, as illustrated in Figure 1.3. Most Internet services follow the same client/server model, including File Transfer Protocol (FTP), email, and many online games. A web server is a computer that resides on a rack somewhere, or is tucked into a back closet, patiently waiting for a client program to send it a request it can fulfill. As far as the web server is concerned, anything that sends it a request is treated as a client. In Web-speak, client programs are called user agents. Web browsers are the most important user agents; robots, or "bots" as they are sometimes called, are another kind.
Figure 1.3 The Web's client/server architecture
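Any program that can open a network connection and send a well-formed request can act as a user agent. The sketch below assumes plain HTTP/1.1 without encryption on port 80. It shows what a bare-bones bot sends and how it reads the first line of the server's reply. The bot name ExampleBot/0.1 and the helper functions build_request, parse_status_line, and fetch are hypothetical.

```python
import socket


def build_request(host: str, path: str = "/", agent: str = "ExampleBot/0.1") -> bytes:
    # An HTTP/1.1 request: a request line, header lines, then a blank line.
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",         # required in HTTP/1.1
        f"User-Agent: {agent}",  # how the client identifies itself to the server
        "Connection: close",     # ask the server to close the connection when done
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(response: bytes) -> tuple[str, int, str]:
    # The first line of a response looks like: HTTP/1.1 200 OK
    status_line = response.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason


def fetch(host: str, path: str = "/", port: int = 80) -> bytes:
    # Act as a user agent: connect, send the request, read until the server closes.
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


if __name__ == "__main__":
    print(build_request("example.com").decode("ascii"))
    print(parse_status_line(b"HTTP/1.1 404 Not Found\r\nContent-Length: 0\r\n\r\n"))
```

Calling fetch with a reachable host returns the raw response bytes, headers and all. A browser performs the same exchange but then goes on to interpret and render what it receives.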
Widgets can also be user agents. Loosely defined, a widget is a small computer program packaged so that it can be easily installed as an extension of a larger program, such as a web browser or a mobile device's operating system, and run within that program's user interface. In response to a mouse click or other user action, a widget can send requests to web servers just as browsers and robots do. Unlike robots, which run on large servers and organize large masses of information, a widget typically uses the returned information to update the content of a specific page element.
Widgets come in many varieties and are rarely harmful. They run within the browser's security model and are generally isolated from your computer's file system. However, poorly written widgets can cause trouble: they can disrupt the display of a web page, consume too much of the browser's resources, or even crash the browser.
Any stand-alone application that exchanges information over the Web (a Twitter client, for example) is a user agent. So are the automatic software update programs that come with computer operating systems. So is the online Help feature of Microsoft Word and, for that matter, an Xbox, Nintendo, or PlayStation game console. Many of the apps on a modern smartphone are user agents, sending requests to web servers and using the returned information to do something useful or keep you informed.
Every web browser must provide three basic functions: (1) it must provide a control interface for human users; (2) it must exchange information with other computers; and (3) it must interpret HTML and render a web page. We are primarily interested in the last function: how a browser understands HTML and how that determines what is seen on the page. Many browser makers use the same open-source HTML rendering engines and differ mostly in their user interfaces. As a result, four browser types cover most Web surfing: Internet Explorer, Mozilla (Firefox, Flock), WebKit (Safari, Chrome), and everything else (mobile phone browsers, legacy versions of IE, and Internet appliances).
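Real rendering engines such as WebKit are enormous, but their first step, turning a stream of markup into a tree of nested elements, can be illustrated with Python's standard html.parser module. This toy outline builder is a sketch under heavy assumptions: it only tracks nesting depth, it knows a fixed list of void elements (such as br and img) that never have closing tags, and it performs none of the error recovery, styling, or layout a real browser does. The names OutlineParser and outline are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not increase nesting depth.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class OutlineParser(HTMLParser):
    """Toy parser that records each element and text run with its nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.items = []  # list of (depth, description) pairs, in document order

    def handle_starttag(self, tag, attrs):
        self.items.append((self.depth, f"<{tag}>"))
        if tag not in VOID_ELEMENTS:
            self.depth += 1  # children of this element are one level deeper

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS:
            self.depth = max(0, self.depth - 1)

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace-only text between tags
            self.items.append((self.depth, repr(text)))


def outline(html: str) -> list[str]:
    parser = OutlineParser()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return ["  " * depth + item for depth, item in parser.items]


if __name__ == "__main__":
    page = "<html><body><h1>Hello</h1><p>A <em>short</em> page.<br>Bye</p></body></html>"
    print("\n".join(outline(page)))
```

The indentation in the printed outline mirrors the element tree a browser builds before it decides how each piece should look on the page.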
As with browsers, several different web servers are in use today, hosting nearly a quarter billion websites in total. By far the most popular web server, according to a November 2009 survey by Netcraft, is Apache, an open-source product from the Apache Software Foundation; it hosts about half of all sites worldwide. The next most popular is Microsoft's Internet Information Services (IIS), with about one-third of the market. Other notable web servers include Google Web Server (GWS), which Google uses internally to host its massive search engine and user sites; nginx (pronounced "engine X"), a free, lightweight, high-performance server written by Igor Sysoev; and Qzone, a Chinese web server used by QQ.com to host upward of 20 million blogs under its domain.
When a web server receives a request from a user agent, all it has to do is figure out which file to return. Actually, it is a bit more complicated than that. Apache, for example, has a modular structure with "hooks" that allow a systems administrator to add custom components. Apache analyzes the incoming request, applying defaults and rewriting rules. It then decides whether to satisfy the request by returning the contents of a file or by executing a program and returning its output. If the requested resource requires authentication, Apache returns a 401 (Unauthorized) status code, which tells the browser to prompt for a username and password and resubmit the request. The HTTP request also carries additional information, such as the name of the browser or other user agent (the User-Agent header) and the preferred language (the Accept-Language header). This enables Apache to serve a different page to mobile users or to substitute a translation of the requested page if one is available.
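The decision sequence described above can be modeled as a single function. The sketch below is hypothetical throughout: the FILES, REWRITES, PROGRAMS, and USERS tables, the ".fr" and ".mobile" file-name suffixes, and the rule that mobile variants win over translations are all invented for illustration, and request headers are assumed to arrive as a plain dict with canonical capitalization. It is not Apache's implementation, only the same steps in miniature: rewrite the path, check HTTP Basic credentials (answering 401 with a WWW-Authenticate challenge), run a program or pick a file, and choose a variant based on User-Agent and Accept-Language.

```python
import base64

# Hypothetical site content. Translations use a ".<lang>" suffix and mobile
# versions a ".mobile" suffix; this naming scheme is an assumption of the sketch.
FILES = {
    "/index.html": "Welcome",
    "/index.html.fr": "Bienvenue",
    "/index.html.mobile": "Welcome (mobile)",
    "/private/report.html": "Secret report",
}
REWRITES = {"/": "/index.html"}                # stand-in for rewriting rules
PROGRAMS = {"/cgi-bin/time": lambda: "12:00"}  # stand-in for executable programs
USERS = {"alice": "s3cret"}                    # hypothetical; never store plain passwords
PROTECTED_PREFIX = "/private/"


def preferred_languages(accept_language):
    """Turn 'fr-CA,fr;q=0.8,en;q=0.5' into primary language tags, best first."""
    ranked = []
    for part in accept_language.split(","):
        tag, *params = [field.strip() for field in part.split(";")]
        if not tag:
            continue
        quality = 1.0
        for param in params:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        ranked.append((quality, tag.lower().split("-")[0]))
    ranked.sort(key=lambda item: item[0], reverse=True)  # stable: ties keep order
    return [tag for quality, tag in ranked if quality > 0]


def is_authorized(headers):
    """Check an 'Authorization: Basic base64(user:password)' header."""
    scheme, _, encoded = headers.get("Authorization", "").partition(" ")
    if scheme.lower() != "basic":
        return False
    try:
        decoded = base64.b64decode(encoded, validate=True).decode("utf-8")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return False
    user, _, password = decoded.partition(":")
    return user in USERS and USERS[user] == password


def handle(path, headers):
    """Return (status, response_headers, body) for a GET request."""
    path = REWRITES.get(path, path)  # 1. apply rewriting rules and defaults
    if path.startswith(PROTECTED_PREFIX) and not is_authorized(headers):
        # 2. tell the browser to prompt for credentials and resubmit
        return 401, {"WWW-Authenticate": 'Basic realm="Private area"'}, "Unauthorized"
    if path in PROGRAMS:
        # 3a. execute a program and return its output
        return 200, {"Content-Type": "text/plain"}, PROGRAMS[path]()
    if "Mobile" in headers.get("User-Agent", "") and path + ".mobile" in FILES:
        # 3b. serve a variant for mobile user agents
        return 200, {"Content-Type": "text/html"}, FILES[path + ".mobile"]
    for lang in preferred_languages(headers.get("Accept-Language", "")):
        if path + "." + lang in FILES:
            # 3c. substitute a translation when one is available
            return (200, {"Content-Type": "text/html", "Content-Language": lang},
                    FILES[path + "." + lang])
    if path in FILES:
        return 200, {"Content-Type": "text/html"}, FILES[path]  # 3d. plain file
    return 404, {"Content-Type": "text/plain"}, "Not Found"


if __name__ == "__main__":
    print(handle("/", {"Accept-Language": "fr-CA,fr;q=0.8,en;q=0.5"}))
```

In this sketch, a request for "/" with Accept-Language: fr-CA,fr;q=0.8 gets the French page, while the same request from a user agent whose string contains "Mobile" gets the mobile page; the order of those two checks is a design choice, not a rule of HTTP.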
Web browsers and servers speak many Internet protocols besides HTTP. Browsers are, in a sense, the Swiss Army knives of Internet clients. Web servers have plug-in interfaces to email, databases, FTP, streaming video, and other services. Web servers can also make requests to each other and serve as mirrors or proxies for each other.