Transporting Information
In many ways, the Internet is very much like the postal service. The postal service delivers letters and postcards using various means of transportation and levels of reliability, such as postal carriers to carry local mail to its destination, trucks for moving mail locally or in bulk over long distances, and airplanes for moving mail over long distances quickly. It also offers different service levels and delivery time guarantees. Similarly, the Internet employs a number of technologies to move packets from place to place at various speeds and levels of reliability. The details of packet transport will be covered in Chapter 4, "Network and Application Protocols: TCP/IP," but it is worth reviewing a few of the general concepts.
The fundamental piece of data on the Internet is the Internet Protocol (IP) packet. IP packets can vary in size from the equivalent of a few dozen characters (such as letters and numbers) to a few thousand characters. Emails, Web pages, music files, and movies are broken into packets of data and sent one at a time across the Internet (through the various ISPs that make up the Internet) from one device to another.
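The idea of breaking a message into packets can be sketched in a few lines of Python. This is only an illustration: the function names, the 1,000-byte payload limit, and the per-packet index are assumptions made for the example, and on the real Internet ordering and reassembly are handled by higher-level protocols such as TCP rather than by a simple counter.

```python
def packetize(data: bytes, max_payload: int = 1000):
    """Split data into (index, payload) pairs of at most max_payload bytes."""
    offsets = range(0, len(data), max_payload)
    return [(i, data[off:off + max_payload]) for i, off in enumerate(offsets)]


def reassemble(packets):
    """Rebuild the original data, even if the packets arrived out of order."""
    return b"".join(payload for _, payload in sorted(packets))


message = b"x" * 2500                 # stand-in for an email or Web page
packets = packetize(message)          # three packets: 1000, 1000, and 500 bytes
restored = reassemble(reversed(packets))
```

Because each packet carries its own index, the receiver can rebuild the message no matter what order the pieces arrive in.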
Some readers might confuse "the Internet" with "the Web." The Internet is simply the very large global IP network that transports many different types of application data, such as email, chat, and Web pages, between users and servers. The collection of Web servers that houses all the different cross-linked Web pages and e-commerce sites is commonly known as the World Wide Web, or simply "the Web." Similarly, the Internet is the foundation for many other application networks, such as the EFNET and Undernet IRC chat networks, and file-sharing networks such as Gnutella.
The Internet moves all of this application data from place to place using some pretty simple techniques. Each computer on the Internet must have an address of some kind, and there must be a way to route the data from place to place until it gets to its final destination.
Addressing
Just like letters and postcards, a packet must have an address so that it can be routed across the network and delivered to the appropriate recipient. Where we use street addresses, cities, states, and ZIP codes to address letters, the Internet uses IP addresses. Every device on the Internet has an IP address, which is a number that takes on the now-famous "dotted-quad" format:
192.168.55.104
This is all that is needed to uniquely identify a computer on the Internet (the more experienced reader might take issue with that statement; we will defer the finer points of IP addressing until Chapter 4). Just like most letters, IP packets include a "return address" that provides the destination computer with a way of answering back. It should be noted that there is a limited supply of IP addresses. A total of about four billion possible addresses are available under the current version of the Internet Protocol. A planned upgrade to IP will dramatically increase this number.
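To see where the "about four billion" figure comes from, it helps to treat a dotted-quad address as a single 32-bit number, with each of the four parts contributing one byte. The short Python sketch below uses the standard ipaddress module; the variable names are chosen only for illustration.

```python
import ipaddress

addr = ipaddress.IPv4Address("192.168.55.104")
as_int = int(addr)  # the same address as one 32-bit integer

# Rebuild that integer by hand: each quad is one byte of the number.
octets = [int(part) for part in str(addr).split(".")]
rebuilt = (octets[0] << 24) | (octets[1] << 16) | (octets[2] << 8) | octets[3]

# Four bytes give 2**32 possible values, or roughly four billion addresses.
total_addresses = 2 ** 32

print(as_int, rebuilt == as_int, total_addresses)
```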
Beyond physical access, the IP address is the most important thing that an ISP provides to its customers. Whenever a device is "on the Internet," it has one of these addresses. The address might not be the same one every time a customer logs onto an ISP, but the IP address is the key to exchanging data with any other system on the Internet.
Networks
The Internet is really a community of networks. The big ISP backbones connect to each other to form the core of the Internet. They also connect their own customers, including smaller ISPs and large enterprise networks, to the Internet. Smaller ISPs provide connections to other companies as well as to each other. At the end of the day, the Internet is a big mesh of both large and small networks. Without going into detail at this point, it is sufficient to say that the aforementioned IP address actually has two parts: a network number and a host number. If you know how to read these numbers, you can fairly rapidly figure out which network the IP address "belongs" to.
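As a rough illustration of the two-part address, the sketch below splits the sample address into a network and a host number using Python's ipaddress module. Where the split falls is an assumption here: the example treats the first 24 bits as the network part (written "/24"), which is purely illustrative; how the boundary is actually determined is left for Chapter 4.

```python
import ipaddress

# "/24" is an assumed boundary: the first 24 bits name the network,
# and the remaining 8 bits name the host on that network.
iface = ipaddress.ip_interface("192.168.55.104/24")

network = iface.network                                  # 192.168.55.0/24
host_number = int(iface.ip) & int(network.hostmask)      # 104

# Any other address can be tested for membership in the same network.
neighbor = ipaddress.ip_address("192.168.55.200")
stranger = ipaddress.ip_address("192.168.56.1")
print(network, host_number, neighbor in network, stranger in network)
```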
The concept of the network can be taken to a very fine-grained level. We have already established that the Internet is a "network"; in fact, by definition, it is a network of networks. Each ISP runs its own "network" and provides Internet access to both home users (who might be running their own "home network") and commercial users, whose internal networks often are called "enterprise networks." Once inside an enterprise network, the local administrators might break their infrastructure into smaller networks associated with different parts of the organization. The HR group might have its own network, and the engineering group might have another. This network of networks is illustrated in Figure 3.3.
Figure 3.3 A network of networks.
Routing
Of course, the job of the Internet (or any other network built on IP) is to allow a computer on one network (for example, a personal computer connected to a small ISP in Cincinnati, Ohio, U.S.A.) to talk to a computer on another network (for example, a Web server at a big electronics company in Tokyo, Japan). This is where routers come into play. Each network is built from a collection of computers and routers; the job of the routers is to look at each IP packet that comes their way, figure out where the packet should go next, and send it on its way. If all goes well, the packet will go through anywhere from one to a few dozen appropriately selected routers, and the packet will end up at the server on the other side of the world.
The fact that the routers are capable of making smart decisions about how to pass on the packets is what makes the Internet work. Suffice it to say that at the same time all of the computers are talking to each other, exchanging emails and Web pages, the routers on the Internet also are talking to each other. They are exchanging information about things such as the speed of their links, what other routers or networks they can access, any congestion on the network, and so on. As a result of all of this router-to-router conversation, each router has a pretty good understanding of how to make decisions about what other routers to hand packets off to.
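A router's per-packet decision can be sketched as a table lookup. The example below is a simplified illustration, not a description of any particular router: the table entries and next-hop names are hypothetical, the table is fixed rather than learned from router-to-router conversation, and the selection rule shown (prefer the most specific matching network, commonly called longest-prefix match) is the usual IP forwarding rule rather than something spelled out in this chapter.

```python
import ipaddress

# Hypothetical forwarding table: (destination network, next hop).
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "upstream-isp"),         # default route
    (ipaddress.ip_network("192.168.0.0/16"), "enterprise-core"),
    (ipaddress.ip_network("192.168.55.0/24"), "engineering-lan"),
]


def next_hop(destination, routes=ROUTES):
    """Return the next hop for a destination, preferring the most specific route."""
    dest = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in routes if dest in net]
    if not matches:
        return None  # no route at all: the packet would be dropped
    _, hop = max(matches, key=lambda match: match[0].prefixlen)
    return hop


print(next_hop("192.168.55.104"))  # matches all three routes; the /24 wins
print(next_hop("192.168.1.1"))     # matches the /16 and the default route
print(next_hop("8.8.8.8"))         # only the default route matches
```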
Overview of TCP/IP
The TCP/IP protocol family is the core technology that brings this all together. The term TCP/IP is used to refer to a number of individual protocols that collectively make the Internet work. Members of this family of protocols include:
Internet Protocol (IP)
Internet Control Message Protocol (ICMP)
Transmission Control Protocol (TCP)
User Datagram Protocol (UDP)
Simple Mail Transfer Protocol (SMTP)
File Transfer Protocol (FTP)
Hypertext Transfer Protocol (HTTP, the protocol that makes the World Wide Web what it is)
Protocols are simply agreed-upon methods of communicating. In this case, they are specifications that tell software and electronic hardware developers how to organize numbers and words so that packets can be built that have meaning to every other computer, router, or switch on the Internet (or any other private network that is based on TCP/IP).
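To make "organizing numbers" concrete, the sketch below builds and reads back the header of a UDP packet, one of the simplest in the family: four 16-bit numbers (source port, destination port, length, and checksum) laid out in a fixed order. The function names and port numbers are made up for the example, and the checksum is simply left at zero rather than computed.

```python
import struct

UDP_HEADER = "!HHHH"  # network byte order, four unsigned 16-bit fields


def build_udp_header(src_port, dst_port, payload_len, checksum=0):
    """Pack the four UDP header fields into 8 bytes."""
    length = 8 + payload_len  # the length field counts header plus payload
    return struct.pack(UDP_HEADER, src_port, dst_port, length, checksum)


def parse_udp_header(data):
    """Unpack the first 8 bytes of a UDP packet into named fields."""
    src_port, dst_port, length, checksum = struct.unpack(UDP_HEADER, data[:8])
    return {"src_port": src_port, "dst_port": dst_port,
            "length": length, "checksum": checksum}


header = build_udp_header(src_port=5353, dst_port=53, payload_len=12)
fields = parse_udp_header(header)
```

Any two programs that agree on this layout can exchange such headers, regardless of who wrote them or what hardware they run on.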
We will reserve our discussion of the details of these protocols for Chapter 4; however, it is important to understand why these standards matter. Communications standards are good because they enable many software and equipment vendors to develop products that interoperate (in this case, on the Internet). However, we will focus much of our attention in this book on the fact that these protocols were not developed with security as a top priority and that they are designed to facilitate communications between computers, not restrict it. This gives anyone who wants to communicate with any computer connected to the Internet a big head start in that task.
Furthermore, we will place a great deal of focus on understanding some of the details of these protocols. The reason for this is that many "hacks" come from the ability to understand and take advantage of the finer points of a protocol or, as is often the case, to understand the details of the implementations of the protocols. Indeed, the title "hacker" used to be accepted proudly by developers who had enough insight into the operating systems and programs they had to work with that they could solve seemingly impossible problems. It was only when this kind of insight and skill was applied to less scrupulous activities that the term "hacker" got the malicious connotation that it has today.
For example, the TCP and IP protocols are described in two fairly lengthy technical documents. The hardware and software developers who build routers and write "protocol stacks" and Web browsers read these documents and design and build boxes or programs that implement the protocol. Unfortunately, they historically have assumed that everyone else who was working in their industry would read the same documents and develop products that would work cooperatively with theirs. Because of this, they do not always check for every twist or turn that someone with a more malicious intent might take to make an unsuspecting box or program do something the original developer never intended.
Consider a Web browser connecting to a Web server. Before it can even request a Web page from the server, it must exchange a few packets so that both sides of the connection are synchronized with each other (more on this in Chapter 4). At the end of the connection, after the Web page has been retrieved, one of the parties sends a packet that effectively says, "I'm done, let's disconnect." This sequence of events is analogous to making a phone call: One person calls another, the two talk, and then both people hang up. TCP/IP connections work the same way; however, with TCP/IP, it is actually possible to send a packet that is the equivalent of a "call" and a "hang-up" at the same time. There is no valid reason for this, but it can be (and is) done.
Because no developer ever expected that this might happen, the different software implementations of TCP/IP handle this differently. A few clever hackers figured this out in the late 1990s and used this knowledge to develop a method for remotely probing a computer with illegal packets and watching how it responded. It turns out that it is actually possible to learn a lot about a computer, such as what operating system it is running, simply by observing how it responds to these packets. Similarly, few developers considered that a packet might have the same source and destination IP addresses (TCP/IP actually provides a special address that allows a computer to "talk to itself"). As it turns out, some TCP/IP programs ended up handling the receipt of such a packet very poorly, and the computer ended up crashing, or "blue-screening," as it is said in the Microsoft world.
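Both oddities described above are easy to test for once a packet's fields are in hand. The sketch below is a minimal illustration under stated assumptions: the function name and return format are hypothetical, and the packet fields are passed in directly rather than captured from a network. The flag values are the standard bit positions of SYN (the "call") and FIN (the "hang-up") in the TCP header.

```python
# Standard TCP flag bits.
FIN = 0x01  # "I'm done, let's disconnect"
SYN = 0x02  # "let's synchronize and start a connection"
ACK = 0x10


def find_anomalies(src_ip, dst_ip, tcp_flags):
    """Return a list of reasons this packet looks deliberately malformed."""
    reasons = []
    if tcp_flags & SYN and tcp_flags & FIN:
        reasons.append("SYN and FIN set together")
    if src_ip == dst_ip:
        reasons.append("source and destination addresses are identical")
    return reasons


print(find_anomalies("10.0.0.5", "192.168.55.104", SYN))        # an ordinary "call"
print(find_anomalies("10.0.0.5", "192.168.55.104", SYN | FIN))  # call and hang-up at once
print(find_anomalies("192.168.55.104", "192.168.55.104", SYN))  # packet "from" its own target
```

A developer who adds checks like these can discard such packets up front instead of leaving their handling to chance.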
The Domain Name System
Although the numeric "dotted-quad" IP addresses that are written in IP packet headers are quite easy for computers to process, humans need something a little friendlier to remember. Fortunately, the Internet provides the equivalent of a phone book in the form of the Domain Name System, or DNS. Suppose that a user would like to visit the online catalog maintained by a company called Acme. DNS allows the user to remember a simple name such as www.acme.com instead of the IP address that Acme's ISP happened to assign this server. This is the domain name of the Web server.
When Acme secures the right to use this domain name on the Internet, it can contract with any ISP that it likes to provide connectivity and an IP address for that server. The portability of the domain name comes from keeping an accurate mapping between the name and the current IP address in a DNS server, which is highly analogous to a phone book. If you want to know how to contact Acme's Web server, you simply "look up" the IP address by sending a request to a local DNS server. The reality is that few users actually ever do this; instead, their Internet applications do it for them. The domain name resolution process is illustrated in Figure 3.4. The user simply types www.acme.com into a Web browser (step 1), and suddenly Acme's home page is displayed. In the second or two that it took for the page to appear, the Web browser sent a query to its DNS server to find out exactly what IP address is currently assigned to the Acme Web server (step 2), the DNS server responded with the answer (step 3), and the browser then contacted the Acme Web server by sending packets to its IP address (step 4). This is analogous to the way that we locate people: A person knows someone's name, finds the phone number that the phone company currently assigns to that person, and uses this phone number to make contact. If the person moves, the caller simply looks up the new phone number.
Figure 3.4 Resolving a domain name.
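The lookup that an application performs on the user's behalf can be sketched with Python's standard socket module. This is a minimal illustration: the function name is hypothetical, and the resolver is passed in as a parameter so that the steps can be followed (and tried out) with a stand-in phone book instead of a live DNS server. By default it calls socket.gethostbyname, which asks the system's configured resolver.

```python
import socket


def plan_connection(hostname, resolver=socket.gethostbyname):
    """Follow the resolution steps: take a name, look it up, pick the address to contact."""
    ip_address = resolver(hostname)                 # query and answer (steps 2 and 3)
    return {"name": hostname, "ip": ip_address}     # the address packets are sent to (step 4)


# A stand-in "DNS server" for demonstration; the mapping is invented.
fake_dns = {"www.acme.com": "192.168.55.104"}
result = plan_connection("www.acme.com", resolver=fake_dns.__getitem__)
print(result)
```

If Acme changes ISPs, only the entry in the phone book changes; the name the user types stays the same.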
Given a high-level understanding of how DNS works, it now should be clear that most Internet applications rely on several things:
The ability for everyone to get an IP address for their computers. This is one of the fundamental services that an ISP provides its customers. Inside an enterprise network, computers get their IP addresses through either manual configuration or an assignment protocol such as the Dynamic Host Configuration Protocol (DHCP).
The ability for individuals or organizations to reserve a unique domain name. Unlike the phone book, there cannot be two acme.coms on the Internet.
The ability to ask a DNS server to translate a domain name into an IP address. Recall that an ISP usually provides its customer (or the customer's computer) with the address of at least one DNS server that can answer these queries.
In theory, it is possible to function on the Internet without a DNS server, although things get a little more difficult. To do this, you would have to remember the IP addresses of every computer that you want to connect to, or you would have to maintain the equivalent of a personal phone list on your computer, in which you could keep a record of the computer names and their respective addresses. Because DNS servers do this for you, they're much more convenient to utilize. A more detailed description of the mechanics of DNS is reserved for the next chapter; for now, it is important to understand the basic roles of domain names, IP addresses, and the DNS servers on the Internet.
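The "personal phone list" can be sketched as a small text file of addresses and names. The format assumed here (one address per line, followed by one or more names, with "#" starting a comment) follows the common hosts-file convention; the parser and the sample entries are illustrative only.

```python
def parse_hosts(text):
    """Build a name-to-address table from hosts-file style text."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            table.setdefault(name.lower(), address)  # first entry for a name wins
    return table


# Sample entries are invented for the example.
sample = """
# my personal phone list
192.168.55.104   www.acme.com  acme
192.168.55.105   ftp.acme.com      # file server
"""
hosts = parse_hosts(sample)
print(hosts["ftp.acme.com"])
```

The drawback is the one noted above: every entry must be kept current by hand, whereas a DNS server does this for everyone at once.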
Top-Level Domains
Domain names are assigned in a hierarchical manner, with several "trees" of domains currently defined. One of the most widely known of the domains is .com; in fact, it is so widely known that a whole industry (the "dot com," may it rest in peace) was named after it. The top-level domains, or TLDs, are fairly self-explanatory:
.com: Commercial entities
.net: Entities associated with Internet networks
.org: Noncommercial organizations
.edu: Educational institutions
.gov: U.S. government institutions
.mil: U.S. military organizations
Given the popularity of the .com domain over the past decade, several new TLDs have been proposed:
.aero: Air-transport industry
.biz: Businesses
.coop: Cooperatives
.info: Information sources
.museum: Museums
.name: Personal or individual sites
.pro: Professional services
Of these, the .biz and .info TLDs began testing in July 2001, and the .name TLD was scheduled to be released in late 2001.
Furthermore, there is a whole series of "country code" TLDs, or ccTLDs, that are used to provide top-level domain trees in individual countries. Examples of the two-letter country domains include:
.uk: United Kingdom
.au: Australia
.ru: Russia
.kr: Korea
When an entity decides which domain(s) it wants to be a part of, it simply registers for the domain. Assuming that the registrant qualifies to join that domain (not just anyone can get a .edu domain name, for example) and no one else has obtained the domain name before it, the registrant simply pays the domain name registrar a small fee to make sure that the entry is inserted into the DNS servers so that others can find the entity's IP address(es). So, the IT staff for the aforementioned Acme company registers for the acme.com domain name. When it gets the second-level name (.com is the top-level domain, and acme is the second-level domain), it is free to add whatever additional members to the domain that it wants. Because Acme wants its customers to get to its Web and file servers easily, the company followed the now firmly entrenched convention of naming these systems www.acme.com and ftp.acme.com.
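The hierarchy in a domain name is read from right to left, which a few lines of Python can make explicit. The function below is a naive sketch with a hypothetical name: it simply splits on the dots, so it assumes the plain layout used in the Acme example and does not account for country-code domains that add an extra level of their own.

```python
def domain_levels(name):
    """Split a domain name into its levels, most general first."""
    labels = name.rstrip(".").lower().split(".")
    labels.reverse()  # rightmost label is the top-level domain
    return {
        "top_level": labels[0],
        "second_level": labels[1] if len(labels) > 1 else None,
        "members": labels[2:],  # names the domain owner adds on its own
    }


print(domain_levels("www.acme.com"))
print(domain_levels("ftp.acme.com"))
```

Both names share the same top-level and second-level parts; only the member that Acme itself chose (www or ftp) differs.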