Introduction to the Hypertext Transfer Protocol
The Hypertext Transfer Protocol (HTTP) is the foundation protocol of the World Wide Web (WWW). The name is somewhat misleading in that HTTP is not a protocol for transferring hypertext; rather, it's a protocol for transmitting information with the efficiency necessary for making hypertext jumps. The data transferred by the protocol can be plain text, hypertext, audio, images, or any type of Internet-accessible information.
HTTP Overview
HTTP is a transaction-oriented client/server protocol. The most typical use of HTTP is between a web browser and a web server. To provide reliability, HTTP makes use of TCP. Nevertheless, HTTP is a "stateless" protocol; each transaction is treated independently. A typical implementation creates a new TCP connection between client and server for each transaction and then terminates the connection as soon as the transaction completes, although the specification doesn't dictate this one-to-one relationship between transaction and connection lifetimes.
The stateless nature of HTTP is well suited to its typical application. A normal session of a user with a web browser involves retrieving a sequence of web pages and documents. Ideally, the sequence is performed rapidly, and the locations of the various pages and documents may include a number of widely distributed servers.
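The one-connection-per-transaction pattern described above can be tried locally. The following is a minimal sketch using Python's standard library, not a production setup: the handler name EchoPathHandler, the helpers fetch and run_demo, and the paths /first and /second are all made up for illustration. Each call to fetch opens its own TCP connection, performs one transaction, and closes the connection; the server keeps no memory of earlier requests.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoPathHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET using only that request."""

    def do_GET(self):
        body = f"you asked for {self.path}".encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch(host, port, path):
    """One transaction: open a TCP connection, send a GET, read, close."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()


def run_demo():
    """Start a throwaway local server and run two independent transactions."""
    server = HTTPServer(("127.0.0.1", 0), EchoPathHandler)  # port 0: any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        host, port = server.server_address
        return [fetch(host, port, "/first"), fetch(host, port, "/second")]
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    for status, body in run_demo():
        print(status, body.decode("ascii"))
```

The second request succeeds without any reference to the first, which is what "stateless" means in practice.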
Another important feature of HTTP is flexibility in the formats that it can handle. When a client issues a request to a server, it may include a prioritized list of formats that it can handle, and the server replies with the appropriate format. For example, the text-based Lynx browser can't handle images, so a web server need not transmit any images on web pages to this browser. This arrangement prevents the transmission of unnecessary information and provides the basis for extending the set of formats with new standardized and proprietary specifications.
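In practice, a client usually expresses its prioritized list of formats in the Accept request header, with optional q values between 0 and 1 ranking the entries. The sketch below shows one simplified way a server might pick a format from such a list. The function names parse_accept and choose_format are hypothetical, and the matching rules are deliberately reduced: real servers also weigh specificity and other media-type parameters.

```python
from typing import List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept header into (media_type, quality) pairs, best first."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0]
        if not media_type:
            continue
        quality = 1.0  # default when the client gives no q parameter
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed q as "not acceptable"
        prefs.append((media_type, quality))
    # The sort is stable, so equal qualities keep the client's original order.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def choose_format(accept_header: str, available: List[str]) -> Optional[str]:
    """Return the first available media type that the client will accept."""
    for media_type, quality in parse_accept(accept_header):
        if quality <= 0:
            continue
        main_type, _, sub_type = media_type.partition("/")
        for candidate in available:
            if media_type in (candidate, "*/*"):
                return candidate
            if sub_type == "*" and candidate.startswith(main_type + "/"):
                return candidate
    return None


if __name__ == "__main__":
    # A text-only client lists no image types, so no image is selected.
    print(choose_format("text/html, text/plain;q=0.5", ["image/png"]))
    print(choose_format("text/html, image/png;q=0.8", ["image/png", "text/html"]))
```

A result of None corresponds to the case in the text where the server has nothing suitable to send to that client.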
Figure 1 illustrates three examples of HTTP operation.
Figure 1 Examples of HTTP operation.
The simplest case is one in which a user agent establishes a direct connection with an origin server. The user agent is the client that initiates the request, such as a web browser being run on behalf of an end user. The origin server is the server on which a resource of interest resides; an example is a web server at which a desired home page resides.
For this case, the client opens a TCP connection that's end-to-end between the client and the server. The client then issues an HTTP request. The request consists of a specific command (referred to as a method), a URL, and a message containing request parameters, information about the client, and perhaps some additional content information.
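To make the parts of a request concrete, here is a small sketch that assembles a request message by hand: a request line carrying the method and the resource, header lines carrying request parameters and client information, a blank line, and an optional body. The helper name build_request, the host www.example.com, and the header values are illustrative assumptions, and the HTTP/1.1 version string is fixed for simplicity.

```python
from typing import Dict, Optional


def build_request(
    method: str,
    path: str,
    host: str,
    headers: Optional[Dict[str, str]] = None,
    body: bytes = b"",
) -> bytes:
    """Assemble a request: request line, header lines, blank line, body."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    if body:
        # Tell the server how many body bytes follow the blank line.
        lines.append(f"Content-Length: {len(body)}")
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


if __name__ == "__main__":
    message = build_request(
        "GET", "/index.html", "www.example.com", {"User-Agent": "sketch/0.1"}
    )
    print(message.decode("ascii"))
```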
When the server receives the request, it attempts to perform the requested action and then returns an HTTP response. The response includes status information, a success/error code, and a message containing information about the server, information about the response itself, and possibly body content. The TCP connection is then closed.
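The response has a matching shape: a status line with the success/error code, header lines describing the server and the response, a blank line, and possibly body content. The following sketch splits a raw response into those parts. The names ParsedResponse and parse_response are hypothetical, and the parser assumes the whole message is already in memory and ignores details such as repeated headers and chunked bodies.

```python
from typing import Dict, NamedTuple


class ParsedResponse(NamedTuple):
    version: str
    status: int
    reason: str
    headers: Dict[str, str]
    body: bytes


def parse_response(raw: bytes) -> ParsedResponse:
    """Split a complete raw response into status line, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    parts = lines[0].split(" ", 2)
    version, status = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""  # the reason phrase is optional
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    return ParsedResponse(version, status, reason, headers, body)


if __name__ == "__main__":
    raw = b"HTTP/1.1 200 OK\r\nServer: sketch\r\nContent-Type: text/plain\r\n\r\nhello"
    print(parse_response(raw))
```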
The middle part of Figure 1 shows a case in which there is no end-to-end TCP connection between the user agent and the origin server. Instead, there are one or more intermediate systems with TCP connections between logically adjacent systems. Each intermediate system acts as a relay, so that a request initiated by the client is relayed through the intermediate systems to the server, and the response from the server is relayed back to the client.
Three forms of intermediate system are defined in the HTTP specification: proxy, gateway, and tunnel, all of which are illustrated in Figure 2.
Figure 2 Intermediate HTTP systems.
Proxy
A proxy acts on behalf of other clients, presenting requests from the other clients to a server. The proxy acts as a server in interacting with a client and as a client in interacting with a server. Several scenarios call for the use of a proxy:
Firewall. The client and server may be separated by a firewall, with the proxy on the client side of the firewall. Typically, the client is part of a network secured by the firewall and the server is external to the secured network. In this case, the server must authenticate itself to the firewall to set up a connection with the proxy. The proxy accepts responses after they have passed through the firewall.
Different versions of HTTP. If the client and server are running different versions of HTTP, the proxy can implement both versions and perform the required mapping.
In summary, a proxy is a forwarding agent: it receives a request for a URL object, modifies the request, and forwards it toward the server identified in the URL.
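The receive, modify, and forward steps can be sketched without any networking. In the example below, the function forward_plan and the proxy name sketch-proxy are hypothetical. The sketch assumes the client sent the full URL in the request line, as clients commonly do when talking to a proxy, and it records the hop in a Via header. It only computes where the request should go and what it should look like; opening the onward connection is left out.

```python
from typing import Dict, Tuple
from urllib.parse import urlsplit


def forward_plan(
    request_line: str,
    headers: Dict[str, str],
    proxy_name: str = "sketch-proxy",  # hypothetical name for this proxy
) -> Tuple[Tuple[str, int], str, Dict[str, str]]:
    """Return (server address, rewritten request line, rewritten headers)."""
    method, url, version = request_line.split(" ", 2)
    parts = urlsplit(url)
    address = (parts.hostname, parts.port or 80)  # server identified in the URL

    # Toward the origin server, send only the path and query.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query

    new_headers = dict(headers)
    new_headers["Host"] = parts.netloc
    # Record this hop, appending to any Via value from earlier intermediaries.
    hop = f"{version.split('/')[1]} {proxy_name}"
    new_headers["Via"] = f"{headers['Via']}, {hop}" if "Via" in headers else hop
    return address, f"{method} {path} {version}", new_headers


if __name__ == "__main__":
    print(forward_plan(
        "GET http://www.example.com/docs/a.html?x=1 HTTP/1.1",
        {"Accept": "text/html"},
    ))
```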
Gateway
A gateway is a server that appears to the client as if it were an origin server. It acts on behalf of other servers that may not be able to communicate directly with a client. There are several scenarios in which gateways can be used:
Firewall. The client and server may be separated by a firewall, with the gateway on the server side of the firewall. Typically, the server is connected to a network protected by a firewall, with the client external to the network. In this case, the client must authenticate itself to the gateway, which can then pass the request to the server.
Non-HTTP server. Web browsers have a built-in capacity to contact servers for protocols other than HTTP, such as FTP and Gopher servers. This capability can also be provided by a gateway. The client makes an HTTP request to a gateway server. The gateway server then contacts the relevant FTP or Gopher server to obtain the desired result. This result is then converted into a form suitable for HTTP and transmitted back to the client.
Tunnel
Unlike the proxy and the gateway, the tunnel performs no operations on HTTP requests and responses. Instead, a tunnel is simply a relay point between two TCP connections, and the HTTP messages are passed unchanged, as if there were a single HTTP connection between user agent and origin server. Tunnels are used when there must be an intermediary system between client and server but it's unnecessary for that system to understand the contents of any messages. An example is a firewall in which a client or server external to a protected network can establish an authenticated connection, and then maintain that connection for purposes of HTTP transactions.
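Because a tunnel never inspects the messages, its core is just a byte copy in each direction. The sketch below illustrates that idea with two hypothetical helpers, pipe and tunnel. It assumes both connections are already established and leaves out authentication, error handling, and cleanup. The demonstration uses socket.socketpair() to stand in for the two TCP connections.

```python
import socket
import threading
from typing import List


def pipe(src: socket.socket, dst: socket.socket) -> None:
    """Copy bytes from src to dst unchanged until src reaches end of stream."""
    while True:
        chunk = src.recv(4096)
        if not chunk:
            break
        dst.sendall(chunk)  # no parsing and no rewriting: bytes pass as-is
    try:
        dst.shutdown(socket.SHUT_WR)  # propagate end of stream to the far side
    except OSError:
        pass  # the far side may already be closed


def tunnel(client_side: socket.socket, server_side: socket.socket) -> List[threading.Thread]:
    """Relay in both directions between two already-connected sockets."""
    threads = [
        threading.Thread(target=pipe, args=(client_side, server_side), daemon=True),
        threading.Thread(target=pipe, args=(server_side, client_side), daemon=True),
    ]
    for thread in threads:
        thread.start()
    return threads


if __name__ == "__main__":
    # Two socket pairs stand in for the client-tunnel and tunnel-server links.
    client, tunnel_a = socket.socketpair()
    tunnel_b, server = socket.socketpair()
    tunnel(tunnel_a, tunnel_b)
    client.sendall(b"GET / HTTP/1.1\r\nHost: example\r\n\r\n")
    print(server.recv(4096))
```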
Cache
Returning to Figure 1, the lowest portion of the figure shows an example of a cache. A cache is a facility that may store previous requests and responses for handling new requests. If a new request arrives that's the same as a stored request, the cache can supply the stored response rather than accessing the resource indicated in the URL. The cache can operate on a client or server or on an intermediate system other than a tunnel. In Figure 1, intermediary B has cached a request/response transaction, so that a corresponding new request from the client need not travel the entire chain to the origin server, but instead is handled by B.
Not all transactions can be cached, and a client or server can dictate that a certain transaction may be cached only for a given time limit.
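The behavior described in this section, where a stored response is reused for a matching request but only within a time limit, can be sketched as a small in-memory structure. The class name ResponseCache and its methods are hypothetical. Treating only GET transactions as cacheable is a simplifying assumption of this sketch, not a rule stated above, and the clock is injectable so that expiry can be shown without waiting.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ResponseCache:
    """Stores responses keyed by (method, URL), each with an expiry time."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[Tuple[str, str], Tuple[bytes, float]] = {}

    def store(self, method: str, url: str, response: bytes, max_age: float) -> None:
        # Assumption of this sketch: only GET transactions are cacheable.
        if method != "GET" or max_age <= 0:
            return
        self._entries[(method, url)] = (response, self._clock() + max_age)

    def lookup(self, method: str, url: str) -> Optional[bytes]:
        """Return the stored response if present and still within its limit."""
        entry = self._entries.get((method, url))
        if entry is None:
            return None
        response, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[(method, url)]  # time limit passed: drop the entry
            return None
        return response


if __name__ == "__main__":
    now = [0.0]  # fake clock so the example runs instantly
    cache = ResponseCache(clock=lambda: now[0])
    cache.store("GET", "http://www.example.com/", b"<html>home</html>", max_age=60)
    print(cache.lookup("GET", "http://www.example.com/"))  # served from the cache
    now[0] = 61.0
    print(cache.lookup("GET", "http://www.example.com/"))  # expired
```

When lookup returns None, the intermediary would forward the request along the chain as usual.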
Request Messages
A request message is sent by an agent to a server to request some action. These are the possible actions, called methods:
Method | Description
OPTIONS | A request for information about the options available.
GET | A request to retrieve information.
HEAD | Like a GET except that the server's response must not include an entity body; all of the header fields in the response are the same as if the entity body were present. This enables a client to get information about a resource without transferring the entity body.
POST | A request to accept the attached entity as a new subordinate to the identified URL.
PUT | A request to accept the attached entity and store it under the supplied URL. This may be a new resource with a new URL, or a replacement of the contents of an existing resource with an existing URL.
DELETE | Requests that the origin server delete a resource.
TRACE | Requests that the server return whatever is received as the entity body of the response. This can be used for testing and diagnostic purposes.
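The relationship between GET and HEAD is easy to see in code. The sketch below is a toy dispatcher over an in-memory resource table. The names RESOURCES and handle are hypothetical, and only GET, HEAD, and OPTIONS are supported. Any other method is answered with status 405 (Method Not Allowed) and an Allow header listing what is supported.

```python
from typing import Dict, Tuple

# Hypothetical in-memory resource store used only for this sketch.
RESOURCES: Dict[str, bytes] = {"/index.html": b"<h1>Home</h1>"}
ALLOWED = "GET, HEAD, OPTIONS"


def handle(method: str, path: str) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status code, headers, body) for a small subset of methods."""
    if method == "OPTIONS":
        return 200, {"Allow": ALLOWED}, b""
    if method not in ("GET", "HEAD"):
        return 405, {"Allow": ALLOWED}, b""
    body = RESOURCES.get(path)
    if body is None:
        return 404, {"Content-Length": "0"}, b""
    headers = {"Content-Type": "text/html", "Content-Length": str(len(body))}
    # HEAD returns the same headers as GET but omits the entity body.
    return 200, headers, (b"" if method == "HEAD" else body)


if __name__ == "__main__":
    print(handle("GET", "/index.html"))
    print(handle("HEAD", "/index.html"))
    print(handle("DELETE", "/index.html"))
```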
Response Messages
A response message is returned by a server to an agent in response to a request message. It may include an entity body containing hypertext-based information. In addition, the response message must specify a status code, which indicates the action taken on the corresponding request. Status codes are organized into the following categories:
Category | Description
Informational | The request has been received and processing continues. No entity body accompanies this response.
Successful | The request was successfully received, understood, and accepted.
Redirection | Further action is required to complete the request.
Client Error | The request contains a syntax error or the request cannot be fulfilled.
Server Error | The server failed to fulfill an apparently valid request.
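Status codes are three-digit numbers, and the first digit selects the category: 1xx is Informational, 2xx Successful, 3xx Redirection, 4xx Client Error, and 5xx Server Error. A minimal sketch of that mapping follows; the function name status_category is hypothetical.

```python
CATEGORIES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def status_category(code: int) -> str:
    """Map a three-digit status code to its category by the first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return CATEGORIES[code // 100]


if __name__ == "__main__":
    for code in (100, 200, 301, 404, 503):
        print(code, status_category(code))
```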