- 1.1 Opinions, Products
- 1.2 Roadmap to the Book
- 1.3 Terminology
- 1.4 Notation
- 1.5 Cryptographically Protected Sessions
- 1.6 Active and Passive Attacks
- 1.7 Legal Issues
- 1.8 Some Network Basics
- 1.9 Names for Humans
- 1.10 Authentication and Authorization
- 1.11 Malware: Viruses, Worms, Trojan Horses
- 1.12 Security Gateway
- 1.13 Denial-of-Service (DoS) Attacks
- 1.14 NAT (Network Address Translation)
1.8 Some Network Basics
Although I₂ get a bit frustrated with teaching network concepts as if TCP/IP were the only way, or the best way, to design a network, this is what today’s Internet is built with, so we need to understand some of the details. Here is a brief introduction.
1.8.1 Network Layers
A good way of thinking about networking concepts is with layers. The concept of a layer is that inside a node, there are interfaces to adjacent layers (the layer above or the layer below). Between nodes, there are protocols for talking to peer layers. The actual protocol inside a layer can in theory be replaced by a layer that gives similar functionality to the adjacent layers. Although layers are a good way to learn about networks, deployed networks do not cleanly follow a layering model. Layers often use data associated with layers other than peer layers or adjacent layers. Layers are often subdivided into more layers, and an implementation might merge layers. ISO (International Organization for Standardization) defined a model with seven layers. The bottom layers look like this:
Layer 1, physical layer. Defines how to send a stream of bits to a neighbor node (neighbors reside on the same link).
Layer 2, data link layer. Defines how to structure a string of bits (provided by layer 1) into packets between neighbor nodes. This requires using the stream of bits to signal information such as “this is the beginning of a packet”, “this is the end of a packet”, and an integrity check.
Layer 3, network layer. This allows a source node to send a packet of information across many links. The source adds header information to a packet to let the network know where to deliver the packet. This is analogous to putting a postal message inside an envelope, and writing the destination on the envelope. A network will consist of many links. Nodes known as routers or switches forward packets between links. Such nodes are connected to two or more links. They have a table known as a forwarding table that tells them which link to forward on, to get closer to the destination. Usually, network addresses are assigned hierarchically, so that a bunch of addresses can be summarized in one forwarding entry. This is analogous to the post office only needing to look at the destination country, and then once inside that country, forwarding towards the state, and once inside the state, forwarding to the destination city, etc. The usual protocol deployed in the Internet today for layer 3 is IP (Internet Protocol), which basically consists of adding a header to a packet identifying the source and destination, a hop count (so the network can discard packets that are looping), and other information. There are two versions of IP. IPv4 has 32-bit addresses. IPv6 has 128-bit addresses. One extra piece of information in the IP header is the 8-bit “protocol type”, which indicates which layer 4 protocol is sending the data.
Layer 4, transport layer. This is information that is put in by the source, and interpreted at the destination. The service provided by TCP (Transmission Control Protocol, RFC 793) to the layer above it consists of accepting a stream of bytes at the source, and delivering the stream of bytes to the layer above TCP at the destination, without loss or duplication. To accomplish this, TCP at the sender numbers bytes; TCP at the destination uses the sequence numbers to acknowledge receipt of data, reorder data that has arrived out of sequence, and ask for retransmission of lost data. UDP (User Datagram Protocol, RFC 768) is another layer 4 protocol that does not worry about lost or reordered data. Many processes in the layer above TCP or UDP will be reachable at the same IP address, so both UDP and TCP headers include ports (one for source, and one for destination), which tell the destination which process should receive the data.
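The forwarding table described under layer 3 can be illustrated with a small sketch. It assumes the longest-prefix-match rule that IP routers commonly use to pick among hierarchical entries; the prefixes and link names are made up for illustration.

```python
import ipaddress

# Hypothetical forwarding table: (address prefix, outgoing link).
# One entry summarizes every address that falls under its prefix.
FORWARDING_TABLE = [
    (ipaddress.ip_network("0.0.0.0/0"), "link-default"),
    (ipaddress.ip_network("10.0.0.0/8"), "link-A"),
    (ipaddress.ip_network("10.1.0.0/16"), "link-B"),
]


def next_link(destination: str) -> str:
    """Return the link of the most specific prefix containing destination."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, link) for net, link in FORWARDING_TABLE if addr in net]
    # The longest prefix is the most specific summary of the destination.
    _, best_link = max(matches, key=lambda m: m[0].prefixlen)
    return best_link


print(next_link("10.1.2.3"))   # link-B
print(next_link("10.9.9.9"))   # link-A
print(next_link("192.0.2.1"))  # link-default
```

The three entries stand in for what could otherwise be millions of individual addresses, which is the point of hierarchical assignment.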
1.8.2 TCP and UDP Ports
There are two 16-bit fields in TCP and UDP—a source port and a destination port. Typically an application on a server will be reachable at a “well-known port”, meaning that the port is specified in the protocol. If a client wants to reach that application at a server, the protocol type field in the IP header will be either TCP (6) or UDP (17) and the destination port field in the layer 4 header (TCP or UDP in this case) will be the well-known port for that application. For example, HTTP is at port 80, and HTTPS is at port 443. The source port will usually be a dynamically assigned port (49152 through 65535).
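As a rough illustration of the two port fields, the sketch below packs and unpacks a UDP header with Python's struct module. The helper names and the chosen port numbers are illustrative assumptions; the first four bytes of a TCP header carry the two ports in the same order.

```python
import struct

# Range commonly used for dynamically assigned source ports.
DYNAMIC_PORTS = range(49152, 65536)


def build_udp_header(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Pack the four 16-bit UDP header fields in network byte order."""
    length = 8 + len(payload)  # header (8 bytes) plus data
    checksum = 0               # 0 means "no checksum" for UDP over IPv4
    return struct.pack("!HHHH", src_port, dst_port, length, checksum)


def parse_ports(header: bytes) -> tuple:
    """Read the source and destination ports from a TCP or UDP header."""
    src_port, dst_port = struct.unpack("!HH", header[:4])
    return src_port, dst_port


header = build_udp_header(50000, 443, b"hello")
print(parse_ports(header))  # (50000, 443)
```

Because each port is 16 bits, any value above 65535 cannot be packed, which is why struct.pack would reject it.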
1.8.3 DNS (Domain Name System)
Another aspect of Internet networking we will be discussing is DNS. It is basically a distributed directory that maps DNS names (e.g., example.com) to IP addresses. DNS names are hierarchical. A simple way to think of DNS is that for each level in the DNS name (e.g., root, .org, .com, example.com) there is a server that keeps a directory associated with names in that level. The root would have a directory that allows looking up servers for each of the top-level domains (TLDs) (e.g., .org, .com, .gov, .tv). There are currently over a thousand TLDs, so the root would have information associated with each of those TLDs in its database. In general, to find a DNS name, a node starts at the root, finds the server that holds the directory for the next level down, and keeps going until it gets to the server that stores information about the actual name. There are several advantages to DNS being hierarchical.
Someone who wishes to purchase a DNS name has a choice of organizations from which to purchase a name. If a name is purchased from the organization managing names in the TLD .org, the purchased name will be of the form example.org. If you purchase the name example.org, you can then name anything that would be below that name in the DNS hierarchy, such as xyz.example.org or labs.xyz.example.org.
The DNS database will not become unmanageably large, because no organization needs to keep the entire DNS database. In fact, nobody knows how many names are in the DNS database.
It is fine to have the same lower-level name in multiple databases. For instance, there is no problem with there being DNS names example.com and example.org, because each is stored in the directory of a different TLD.
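The walk from the root down to the server that holds the actual name can be sketched with a toy model. Everything here is an assumption made for illustration: the server names, the dictionary layout, and the addresses (taken from a range reserved for documentation). Real DNS uses network queries, caching, and record types that this sketch ignores.

```python
# Toy directories, one per level of the hierarchy. All names are made up.
ROOT = {"org": "org-server", "com": "com-server"}
SERVERS = {
    "org-server": {"example.org": "example-org-server"},
    "com-server": {"example.com": "example-com-server"},
    "example-org-server": {"www.example.org": "192.0.2.10"},
    "example-com-server": {"www.example.com": "192.0.2.20"},
}


def resolve(name: str):
    """Walk from the root to the directory holding name.

    Returns (address, list of servers consulted along the way).
    """
    labels = name.split(".")
    directory = ROOT
    consulted = ["root"]
    # Look up successively longer suffixes: "org", "example.org", ...
    for i in range(len(labels) - 1, -1, -1):
        suffix = ".".join(labels[i:])
        answer = directory[suffix]
        if i == 0:
            return answer, consulted  # reached the actual name
        consulted.append(answer)      # answer names the next server down
        directory = SERVERS[answer]


print(resolve("www.example.org"))
# ('192.0.2.10', ['root', 'org-server', 'example-org-server'])
```

Note that no single dictionary holds every name, and that example.org and example.com coexist without conflict because they live in different directories.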
1.8.4 HTTP and URLs
When we access things on the web, we use a protocol known as HTTP (hypertext transfer protocol). HTTP allows specifying more than a DNS name; it allows specifying a particular web page at the service with a DNS name. The URL (uniform resource locator) is the address of the web page. The URL contains a DNS name of the service, followed by additional information that is interpreted solely by the server that receives the request. The additional information might be, for instance, the directory path at the destination server that finds the information to construct the page being requested.
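The split between the DNS name and the part interpreted only by the server can be seen with Python's standard urllib.parse module. The URL and the helper name below are illustrative assumptions.

```python
from urllib.parse import urlsplit


def split_url(url: str) -> tuple:
    """Return (DNS name of the service, information left for the server)."""
    parts = urlsplit(url)
    rest = parts.path
    if parts.query:
        rest += "?" + parts.query
    return parts.hostname, rest


print(split_url("https://www.example.com/catalog/items?id=42"))
# ('www.example.com', '/catalog/items?id=42')
```

Only the first element is used to find the server; how the second element maps to a page is entirely up to that server.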
Sometimes humans type URLs, but usually URLs are displayed as links in a webpage that can be clicked on. It is common for people to do an Internet search (e.g., using Google or Bing) for something, and then click on choices. URLs can be very long and ugly, and people usually don’t look at the URL they click on. Often the web page that displays a link does not display the actual URL. Mousing over the link will sometimes show the human a URL. Unfortunately, the web page can choose what to display on the page as the link, and what to display the mouse-over link as. These can be different from the actual URL that will be followed if the link is clicked on. For example, on a web page, a clickable link (which is usually displayed in a different color), might display as “click here for information”, and if a suspicious user moused-over the link, it might display “http://www.example.com/information”, but if the user clicks on the link, the malicious webpage could send them to any URL, e.g., http://www.rentahitman.com.
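To make the mismatch concrete, the following sketch uses Python's standard html.parser module to list each link's displayed text next to the URL it actually points to. The class name, helper name, and sample page are hypothetical. It covers only the simplest case, where the visible text differs from the href attribute; it does not model scripts that alter the mouse-over display.

```python
from html.parser import HTMLParser


class LinkAuditor(HTMLParser):
    """Collect (displayed text, actual href) for every link in a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> element currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def audit(html: str) -> list:
    parser = LinkAuditor()
    parser.feed(html)
    parser.close()
    return parser.links


page = '<p><a href="http://www.example.net/other">http://www.example.com/information</a></p>'
for shown, actual in audit(page):
    print("displayed:", shown, "| actual:", actual)
```

Here the text a user reads looks like one URL while a click would go to another.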
The two main HTTP request types are GET and POST. GET is for reading a web page and POST is for sending information to a web server. The response contains information such as the content requested and status information (such as “OK” or “not found” or “unauthorized”). One status that might be included in a response is a redirect. This informs the browser that it should go to a different URL. The browser will then go to the new URL, as if the user had clicked on a link.
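The redirect behavior can be sketched without any network access by treating the web as a lookup table. The URLs, page contents, and function name are made up; the numeric codes are the standard HTTP status codes for OK (200), not found (404), and the common redirect statuses.

```python
# Toy "web": URL -> (status code, body or redirect target). URLs are made up.
PAGES = {
    "http://example.com/old": (301, "http://example.com/new"),
    "http://example.com/new": (200, "<html>hello</html>"),
}

REDIRECT_STATUSES = (301, 302, 303, 307, 308)


def fetch(url: str, max_redirects: int = 5) -> tuple:
    """Follow redirects as a browser would; return (status, final URL, body)."""
    for _ in range(max_redirects + 1):
        status, value = PAGES.get(url, (404, ""))
        if status in REDIRECT_STATUSES:
            url = value  # go to the new URL, as if the user had clicked a link
            continue
        return status, url, value
    raise RuntimeError("too many redirects")


print(fetch("http://example.com/old"))
# (200, 'http://example.com/new', '<html>hello</html>')
```

The limit on the number of redirects is a design choice in this sketch so that two pages redirecting to each other cannot keep the client looping forever.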
1.8.5 Web Cookies
If a client is browsing content that requires authentication and access control, or is accumulating information such as items in a virtual shopping basket to be purchased when the user is finished browsing the on-line catalog, the information for that session needs to be kept somewhere. But HTTP is stateless. Each request/response interaction is allowed to take place over a fresh TCP connection. The cookie mechanism enables the server to maintain context across many request/response interactions. A cookie is a piece of data sent to the client by the server in response to an HTTP request. The cookie need not be interpreted by the client. Instead, the client keeps a list of DNS names and cookies it has received from a server with that DNS name. If the client made a request at example.com, and example.com sent a cookie, the client will remember (example.com: cookie) in its cookie list. When the client next makes an HTTP request to example.com, it searches its cookie database for any cookies received from example.com, and includes those cookies in its HTTP request.
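The client-side bookkeeping described above amounts to a table keyed by DNS name. The sketch below is a deliberately simplified model with hypothetical class and method names; real browsers also apply rules about paths, expiry, and subdomains that are omitted here.

```python
from collections import defaultdict


class CookieList:
    """Client-side list of opaque cookies, keyed by the server's DNS name."""

    def __init__(self):
        self._cookies = defaultdict(list)

    def remember(self, dns_name: str, cookie: str) -> None:
        # The client stores the cookie without interpreting it.
        if cookie not in self._cookies[dns_name]:
            self._cookies[dns_name].append(cookie)

    def cookies_for(self, dns_name: str) -> list:
        # Cookies to include in the next request to dns_name.
        return list(self._cookies.get(dns_name, []))


jar = CookieList()
jar.remember("example.com", "session=abc123")
print(jar.cookies_for("example.com"))  # ['session=abc123']
print(jar.cookies_for("example.org"))  # []
```

Because the cookie travels back with each request, the server can pick up the session where it left off even though each HTTP exchange is otherwise independent.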
The cookie might contain all the relevant information about a user, or the server might keep a database with this information. In that case, the cookie only needs to contain the user’s identity (so the server can locate that user in its database), along with proof that the user has already authenticated to the server. For example, if Alice has authenticated to Bob, Bob could send Alice a cookie consisting of some function of the name “Alice” and a secret that only Bob knows. A cookie is usually cryptographically protected by the server, and this can be done in various ways. It might be encrypted with a key that only the server knows. It might contain information that only allows the cookie to be used from a specific machine. And it is almost always protected when transmitted across the network because the client and server will be communicating over a secure session.
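One possible choice for "some function of the name and a secret that only Bob knows" is a keyed hash (HMAC). The sketch below is an assumption about how such a cookie could be built, not a description of any particular server; the secret value, the cookie layout, and the function names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret known only to the server (Bob).
SERVER_SECRET = b"known-only-to-bob"


def _tag(user: str) -> str:
    return hmac.new(SERVER_SECRET, user.encode(), hashlib.sha256).hexdigest()


def make_cookie(user: str) -> str:
    """Cookie = user name plus a keyed hash only the server can compute."""
    return f"{user}|{_tag(user)}"


def verify_cookie(cookie: str):
    """Return the user name if the cookie is genuine, otherwise None."""
    user, sep, tag = cookie.rpartition("|")
    if not sep:
        return None
    # Constant-time comparison avoids leaking how many characters matched.
    if hmac.compare_digest(tag.encode(), _tag(user).encode()):
        return user
    return None


cookie = make_cookie("Alice")
print(verify_cookie(cookie))           # Alice
print(verify_cookie("Mallory|" + cookie.split("|")[1]))  # None
```

This sketch protects only the integrity of the name. It does not encrypt the contents, expire the cookie, or bind it to a specific machine, all of which a real deployment might also want.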