Networking in Java
- Everything You Need To Know about TCP/IP but Failed to Learn in Kindergarten
- A Client Socket in Java
- Sending Email by Java
- A Server Socket in Java
- HTTP and Web Browsing: Retrieving HTTP Pages
- How to Make an Applet Write a File on the Server
- A Multithreaded HTTP Server
- A Mapped I/O HTTP Server
- Further Reading
- Exercises
- Some Light Relief—Using Java to Stuff an Online Poll
"If a packet hits a pocket on a socket on a port, and the bus is interrupted and the interrupt's not caught, then the socket packet pocket has an error to report."
Programmer's traditional nursery rhyme
Everything You Need To Know about TCP/IP but Failed to Learn in Kindergarten
Networking at heart is all about shifting bits from point A to point B. We bundle the data bits into a packet, and add some more bits to say where they are to go. That, in a nutshell, is the Internet Protocol or IP. If we want to send more bits than will fit into a single packet, we can divide the bits into groups and send them in several successive packets. The units that we send are called "User Datagrams" or "Packets." Packets is the more common term these days.
User Datagrams can be sent across the Internet using the User Datagram Protocol (UDP), which relies on the Internet Protocol for addressing and routing. UDP is like going to the post office, sticking on a stamp, and dropping off the packet. IP is what the mail carrier does to route and deliver the packet. Two common applications that use UDP are: SNMP, the Simple Network Management Protocol, and TFTP, the Trivial File Transfer Protocol. See Figure 17-1.
Figure 17-1 IP and UDP (datagram sockets).
When we send several pieces of postal mail to the same address, the packages might arrive in any order. Some of them might even be delayed, or even on occasion lost altogether. This is true for UDP too; you wave goodbye to the bits as they leave your workstation, and you have no idea when they will arrive where you sent them, or even if they did.
Uncertain delivery is equally undesirable for postal mail and for network bit streams. We deal with the problem in the postal mail world (when the importance warrants the cost) by paying an extra fee to register the mail and have the mail carrier collect and bring back a signature acknowledging delivery. A similar protocol is used in the network world to guarantee reliable delivery in the order in which the packets were sent. This protocol is known as Transmission Control Protocol or "TCP." Two applications that run on top of, or use, TCP are: FTP, the File Transfer Protocol, and Telnet.
What Is Your IP Address?
On Unix workstations, you can run the "ifconfig" (interface configuration) program to find out your IP address.
On Windows 9x, you can run WinIPCfg to get the same information. Type this in a command tool:
c:\> winipcfg
It will pop up a window that lists the host name, IP address, subnet mask, gateway, and even the MAC address of your network card.
The MAC (Media Access Control) address is the address on the network interface card burned in at manufacturing time. It is not used in TCP/IP because, unlike IP addresses, it lacks a hierarchy. To route packets using MAC addresses, each router would need a list of every MAC address in the world.
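The same information can also be read from inside a Java program. The following is a minimal sketch, assuming a JDK recent enough to have java.net.NetworkInterface and generics (newer than some of the tools mentioned above); the class name LocalAddresses is made up for illustration.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

// Hypothetical helper class, not part of the Java library.
class LocalAddresses {
    // Builds one "interface -> address" line per address on this machine.
    static List<String> list() throws SocketException {
        List<String> out = new ArrayList<>();
        Enumeration<NetworkInterface> nifs = NetworkInterface.getNetworkInterfaces();
        if (nifs == null) {
            return out; // some JDKs return null when no interface is found
        }
        for (NetworkInterface nif : Collections.list(nifs)) {
            for (InetAddress addr : Collections.list(nif.getInetAddresses())) {
                out.add(nif.getName() + " -> " + addr.getHostAddress());
            }
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        for (String line : list()) {
            System.out.println(line);
        }
    }
}
```

The output depends entirely on the machine it runs on, so none is shown here.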
TCP uses IP as its underlying protocol (just as UDP does) for routing and delivering the bits to the correct address. The "correct address" means the IP address; every computer on the Internet has an IP address. However, TCP is more like a phone call than a registered mail delivery in that it supports an end-to-end connection for the duration of the transmission session. It takes a while to set up this stream connection, and it costs more to assure reliable sequenced delivery, but the cost is usually justified. See Figure 17-2.
Figure 17-2 TCP/IP (stream sockets).
The access device at each endpoint of a phone conversation is a telephone. The access object at each endpoint of a TCP/IP session is a socket. Sockets started life as a way for two processes on the same Unix system to talk to each other, but some smart programmers realized that they could be generalized into connection endpoints between processes on different machines connected by a TCP/IP network. Today, every operating system has adopted IP and sockets.
IP can deliver the following via socket connections:
Slower reliable delivery using TCP (this is termed a stream socket)
Faster but unguaranteed delivery using UDP (this is a datagram socket)
Fast raw bits using ICMP (Internet Control Message Protocol) datagrams. They are not delivered to an application at all, but ask the remote end to do something or respond in some way.
ICMP is a low-level protocol for message control and error reporting. It uses IP packets, but its messages are directed at the IP software itself and don't come through to the application layer. Java doesn't support ICMP and we won't say anything more about it.
Socket connections have a client end and a server end, and they differ in what you can do with them. Generally, the server end just keeps listening for incoming requests (an "operators are standing by" kind of thing). The client end initiates a connection, and then passes or requests information from the server.
Note that the number of socket writes is not at all synchronized with the number or timing of socket reads. A packet may be broken into smaller packets as it is sent across the network, so your code should never assume that a read will get the same number of bytes that were just written into the socket.
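One common way to cope with this is to loop until the expected number of bytes has arrived. The sketch below assumes the reader knows the message length in advance; the class ReadFully and the trickle() helper (a stand-in for a socket stream that returns at most 3 bytes per read) are invented for illustration. The library class java.io.DataInputStream offers a readFully() method that does the same job.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the Java library.
class ReadFully {
    // Keeps calling read() until exactly buf.length bytes have arrived.
    static void readFully(InputStream in, byte[] buf) throws IOException {
        int got = 0;
        while (got < buf.length) {
            int n = in.read(buf, got, buf.length - got);
            if (n < 0) {
                throw new EOFException("stream ended after " + got + " bytes");
            }
            got += n; // a single read may deliver fewer bytes than requested
        }
    }

    // Stand-in for a socket stream: hands back at most 3 bytes per read call.
    static InputStream trickle(byte[] data) {
        return new ByteArrayInputStream(data) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, 3));
            }
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] msg = "ten bytes!".getBytes(StandardCharsets.US_ASCII);
        byte[] buf = new byte[msg.length];
        readFully(trickle(msg), buf); // takes four reads: 3 + 3 + 3 + 1
        System.out.println(new String(buf, StandardCharsets.US_ASCII)); // prints "ten bytes!"
    }
}
```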
The most widely used version of IP today is Internet Protocol Version 4 (IPv4). However, IP Version 6 (IPv6 or IPng) is also beginning to enter the market. IPv6 uses 128 bit addresses, not 32 bit, and so allows many more Internet users. IPv6 is fully backward compatible with (can process packets sent using) IPv4, but it will take a long time before IPv4 is displaced by v6. IPv4 is supported with hardware-based routing at wire speed on 2.5Gb links. IPv6 currently uses software routing.
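The difference in address size is easy to see from Java. This is a small sketch, assuming a JDK with IPv6 address support; the class name AddressSizes is invented, and the two addresses are reserved documentation addresses chosen for illustration, not hosts mentioned in the text.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper class, not part of the Java library.
class AddressSizes {
    // Returns the width in bits of a literal IP address.
    static int bits(String literal) throws UnknownHostException {
        // Given a literal address, getByName only parses the text; no name lookup is done.
        return InetAddress.getByName(literal).getAddress().length * 8;
    }

    public static void main(String[] args) throws UnknownHostException {
        System.out.println(bits("192.0.2.1"));   // prints 32  (IPv4)
        System.out.println(bits("2001:db8::1")); // prints 128 (IPv6)
    }
}
```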
An IPv4 feature called "Network Address Translation" (NAT) has reduced the pressure to move to v6. A few years ago, it looked like we were going to run out of IP addresses. Today NAT lets your big site have just one assigned address, which you use for the computer with the Internet connection. You use any IP address you like for the computers on your side of the firewall. You may be duplicating numbers that someone else uses behind their firewall, but the two systems don't interfere with each other. When you access the Internet, NAT translates your internal IP address into the externally visible one, and vice versa for incoming packets. From outside, it looks like all your traffic is coming from your computer that runs NAT.
Looking at a Packet Traveling over the Net
Packets are moved along by routers, which are special-purpose computers that connect networks. Every IP packet that leaves your system goes to a nearby router which will move the packet to another router closer to the destination. This transfer continues until finally the packet is brought to a router that is directly connected to the subnet serving the destination computer.
Routers maintain large configuration tables of what addresses are served by what routers, what the priorities are, and what rules they should use for security and load balancing. These tables can be updated dynamically as the network runs.
Windows has a program that lets you trace a packet's movement between routers. Here's the output from a sample run, tracing the route between my PC and java.sun.com. Unix has a similar program, called "traceroute."
c:\> tracert java.sun.com
Tracing route to java.sun.com [192.18.97.71]
over a maximum of 30 hops:
  1    93 ms    95 ms    95 ms  sdn-ar-008carcor001t.dialsprint.net [63.128.147.130]
  2    94 ms   100 ms   100 ms  sdn-hr-008carcor001t.dialsprint.net [63.128.147.129]
  3    99 ms   100 ms    95 ms  sdn-pnc1-stk-4-1.dialsprint.net [207.153.212.49]
  ... and so on to ...
 12   164 ms   170 ms   160 ms  sun-1.border3.den.pnap.net [216.52.42.42]
 13   166 ms   160 ms   161 ms  java.sun.com [192.18.97.71]
Trace complete.
This shows that it takes 13 "hops" for packets to travel from my PC to Sun's Java website. The program sends three test packets and notes the round trip time in milliseconds to reach each successive router. It works by sending out packets with brief time limits, and gradually increasing the limit so that the packets reach the first router, then the next, and so on. As each router replies, objecting to the timed-out packet, traceroute can figure out the hop time for each step. Traceroute is good for determining network connectivity.
Here it tells us that overall packets travel from me to Java-HQ in under a fifth of a second.
There! Now you know everything you need to use the Java networking features.
What's in the Networking Library?
If you browse the network library API, you'll find the following classes (there are a few other classes, but these are the key ones):
Socket | This is the client Socket class. It lets you open a connection to another machine, anywhere on the Internet (that you have permission to reach).
ServerSocket | This is the server Socket class. ServerSocket lets an application accept TCP connections from other systems and exchange I/O with them.
URL | The class represents a Uniform Resource Locator, a reference to an object on the web. You can create a URL reference with this class.
URLConnection | You can open a URL and retrieve the contents, or write to it, using this class.
HttpURLConnection | The class extends URLConnection and supports functions specific to HTTP, like get, post, put, head, trace, and options.
URLEncoder/URLDecoder | These two classes have static methods to allow you to convert a String to and from MIME x-www-form-urlencoded form. This is convenient for posting data to servlets or CGI scripts.
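As a quick taste of two of these classes, the sketch below pulls a URL apart and converts a string to and from x-www-form-urlencoded form. The class name UrlSketch and the example.com URL are made up for illustration. It uses the URL(String) constructor, which very recent JDKs mark as deprecated but still accept.

```java
import java.io.UnsupportedEncodingException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical helper class, not part of the Java library.
class UrlSketch {
    static String encode(String s) throws UnsupportedEncodingException {
        return URLEncoder.encode(s, "UTF-8");
    }

    static String decode(String s) throws UnsupportedEncodingException {
        return URLDecoder.decode(s, "UTF-8");
    }

    static URL parse(String s) throws MalformedURLException {
        return new URL(s); // only parses the text; no connection is opened
    }

    public static void main(String[] args) throws Exception {
        URL url = parse("http://www.example.com:8080/poll/vote.cgi?choice=1");
        System.out.println(url.getProtocol()); // prints http
        System.out.println(url.getHost());     // prints www.example.com
        System.out.println(url.getPort());     // prints 8080
        System.out.println(url.getPath());     // prints /poll/vote.cgi
        System.out.println(url.getQuery());    // prints choice=1

        String form = encode("Java & sockets");
        System.out.println(form);              // prints Java+%26+sockets
        System.out.println(decode(form));      // prints Java & sockets
    }
}
```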
The class DatagramSocket supports the use of UDP packets. We don't deal with UDP here because it is much less widely used than TCP. Most people want the reliability feature that TCP offers. Ironically, the widespread use of subnets using directly connected switches (instead of shared ethernet) has made UDP much more reliable, to the point where people are using it on LANs instead of TCP, and getting both performance and reliability.
Let me try that last sentence again. When we started extensive networking in the late 1970s, ethernet was the medium of choice. You strung a single ethernet cable down a corridor and workstations physically attached to the net by tapping into the cable. That meant that all the network traffic was visible to all the workstations that used that cable. It was electronically noisy and slow. Today, nearly everyone uses 10baseT or 100baseT wiring. The number is the speed in Megabits, and the "T" part means "Twisted pair." There is a twisted pair wire from your workstation directly to the switch that controls your subnet. No other workstation shares your twisted pair wiring. Result: faster performance, less electronic noise, and more reliable subnets, leading to greater confidence using UDP.
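Although UDP is not pursued further here, a tiny sketch shows what DatagramSocket looks like in use. It sends one datagram to itself over the loopback interface, where loss is very unlikely; the class name UdpLoopback, the 512-byte buffer, and the 2-second timeout are arbitrary choices for illustration, and the code assumes a JDK with try-with-resources.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the Java library.
class UdpLoopback {
    // Sends text in one datagram to a local receiver and returns what arrived.
    static String sendAndReceive(String text) throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (DatagramSocket receiver = new DatagramSocket(0, loopback); // port 0: any free port
             DatagramSocket sender = new DatagramSocket()) {
            receiver.setSoTimeout(2000); // UDP gives no delivery guarantee, so don't wait forever

            byte[] out = text.getBytes(StandardCharsets.UTF_8);
            sender.send(new DatagramPacket(out, out.length, loopback, receiver.getLocalPort()));

            byte[] in = new byte[512];
            DatagramPacket packet = new DatagramPacket(in, in.length);
            receiver.receive(packet); // blocks until a datagram arrives or the timeout expires
            return new String(packet.getData(), 0, packet.getLength(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sendAndReceive("hello, datagram"));
    }
}
```

Note that there is no connection step: the destination address travels with each packet, just as the post office analogy suggests.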
TCP/IP Client/Server Model
Before we look at actual Java code, a diagram is in order showing how a client and server typically communicate over a TCP/IP network connection. Figure 17-3 shows that the processes contact each other by knowing the IP address (which identifies a unique computer on the Internet) and a port number (which is a simple software convention the OS maintains, allowing an incoming network connection to be directed to a specific process).
Figure 17-3 Client and server communication using a TCP/IP connection.
What Is a Socket?
A socket is defined as "an IP address plus a port on that computer."
An IP address is like a telephone number, and a port number is like an extension at that number. Together they specify a unique destination. As a matter of fact, a socket is defined as an IP address and a port number.
The client and server must agree on the same port number. The port numbers under 1024 are reserved for system software use and on Unix can only be accessed by the superuser.
For simplicity, network socket connections are made to look like I/O streams. You simply read and write data using the usual stream methods (all socket communication is in 8-bit bytes), and it automagically appears at the other end. Unlike a stream, a socket supports two-way communication. There is a method to get the input stream of a socket, and another method to get the output stream. This allows the client and server to talk back and forth.
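The two-way, stream-like nature of a socket can be seen in a short sketch in which one program plays both roles over the loopback interface. The class name EchoSketch, the "echo: " prefix, and the use of port 0 (meaning "any free port") are choices made for this illustration; the code assumes a JDK with lambdas and try-with-resources.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the Java library.
class EchoSketch {
    // Wraps the socket's input stream so we can read a line at a time.
    static BufferedReader reader(Socket s) throws IOException {
        return new BufferedReader(new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
    }

    // Wraps the socket's output stream; "true" makes println() flush immediately.
    static PrintWriter writer(Socket s) throws IOException {
        return new PrintWriter(new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
    }

    // Sends one line to a local server thread and returns the server's reply.
    static String roundTrip(String line) throws Exception {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread serverSide = new Thread(() -> {
                try (Socket peer = server.accept()) {
                    writer(peer).println("echo: " + reader(peer).readLine());
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
            serverSide.start();

            try (Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
                writer(client).println(line);             // client talks...
                String reply = reader(client).readLine(); // ...and listens on the same socket
                serverSide.join();
                return reply;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("hello")); // prints "echo: hello"
    }
}
```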
Almost all Internet programs work as client/server pairs. The server is on a host system somewhere in cyberspace, and the client is a program running on your local system. When the client wants an Internet service (such as retrieving a web page from an HTTP server), it issues a request, usually to a symbolic address such as http://www.sun.com rather than to an IP address (though that works, too).
There will be a Domain Name Server locally (usually one per subnet, per campus, or per company) that resolves the symbolic name into an Internet address.
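From Java, that name resolution is a single call. The sketch below is an illustration only: the class name Resolve is invented, and it defaults to looking up "localhost" so that it runs without a network connection. Pass a real host name on the command line to query your own name server.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper class, not part of the Java library.
class Resolve {
    // Returns every address the resolver reports for a symbolic host name.
    static String[] addressesOf(String host) throws UnknownHostException {
        InetAddress[] all = InetAddress.getAllByName(host);
        String[] out = new String[all.length];
        for (int i = 0; i < all.length; i++) {
            out[i] = all[i].getHostAddress();
        }
        return out;
    }

    public static void main(String[] args) throws UnknownHostException {
        String host = args.length > 0 ? args[0] : "localhost";
        for (String addr : addressesOf(host)) {
            System.out.println(host + " -> " + addr);
        }
    }
}
```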
The bits forming the request are assembled into a datagram and routed to the server. The server reads the incoming packets, notes what the request is, where it came from, and then tries to respond to it by providing either the service (web page, shell account, file contents, etc.) or a sensible error message. The response is sent back across the Internet to the client.
All the standard Internet utilities (telnet, rdist, FTP, ping, rcp, NFS, and so on) operate in client/server mode connected by a TCP or UDP socket. Programs that send mail don't really know how to send mail; they just know how to take it to the Post Office. In this case, mail has a socket connection and talks to a demon at the other end with a fairly simple protocol. The standard mail demon knows how to accept text and addresses from clients and transmit it for delivery. If you can talk to the mail demon, you can send mail. There is little else to it.
Many of the Internet services are actually quite simple. But often considerable frustration comes in doing the socket programming in C and in learning the correct protocol. The socket programming API presented to C is quite low-level and all too easy to screw up. Needless to say, errors are poorly handled and diagnosed. As a result, many programmers naturally conclude that sockets are brittle and hard to use. Sockets aren't hard to use. The C socket API is hard to use.
The C code to establish a socket connection is:
/* Horrid C Sockets */
int set_up_socket(u_short port) {
    char myname[MAXHOSTNAME+1];
    int s;
    struct sockaddr_in sa;
    struct hostent *he;

    bzero(&sa, sizeof(struct sockaddr_in));          /* clear the address */
    gethostname(myname, MAXHOSTNAME);                /* establish identity */
    he = gethostbyname(myname);                      /* get our address */
    if (he == NULL)                                  /* if addr not found... */
        return(-1);
    sa.sin_family = he->h_addrtype;                  /* host address */
    sa.sin_port = htons(port);                       /* port number */
    if ((s = socket(AF_INET, SOCK_STREAM, 0)) < 0)   /* finally, create socket */
        return(-1);
    if (bind(s, (struct sockaddr *)&sa, sizeof(sa)) < 0) {   /* bind address to socket */
        close(s);
        return(-1);
    }
    listen(s, 3);                                    /* max queued connections */
    return(s);
}
By way of contrast, the equivalent Java code is:
ServerSocket servsock = new ServerSocket(port, 3);
That's it! Just one line of Java code to do all the things the C code does.
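To see what that one line bundles together, the sketch below puts it next to a spelled-out version that creates an unbound ServerSocket and then binds it with the same backlog of 3. The class name ListenSketch and the method names are invented; port 0 is used only so the example can run on any machine without clashing with a port already in use.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical helper class, not part of the Java library.
class ListenSketch {
    // The one-liner from the text: create, bind, and listen in one constructor call.
    static ServerSocket oneStep(int port) throws IOException {
        return new ServerSocket(port, 3);
    }

    // The same result in separate steps, closer to the shape of the C code.
    static ServerSocket twoStep(int port) throws IOException {
        ServerSocket s = new ServerSocket();    // created but not yet bound
        s.bind(new InetSocketAddress(port), 3); // bind to the port, queue up to 3 connections
        return s;
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket a = oneStep(0); ServerSocket b = twoStep(0)) {
            System.out.println("one step: bound=" + a.isBound() + " port=" + a.getLocalPort());
            System.out.println("two step: bound=" + b.isBound() + " port=" + b.getLocalPort());
        }
    }
}
```

Either way, the next move for a server is to call accept() on the ServerSocket and wait for a client.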
Java handles all that socket complexity "under the covers" for you. It doesn't expose the full range of socket possibilities, so Java avoids the novice socketeer choosing contradictory options. On the other hand, a few recondite sockety things cannot be done in Java. You cannot create a raw socket in Java, and hence cannot write a ping program that relies on raw sockets (you can do something just as good though). The benefit is overwhelming: You can open sockets and start writing to another system just as easily as you open a file and start writing to hard disk.
A "ping program," in case you're wondering, is a program that sends ICMP control packets over to another machine anywhere on the Internet. This action is called "pinging" the remote system, rather like the sonar in a ship "pings" for submarines or schools of fish. The control packets aren't passed up to the application layer, but tell the TCP/IP library at the remote end to send back a reply. The reply lets the pinger calculate how quickly data can pass between the two systems.
The Story of Ping
If you want to know how quickly your packets can reach a system, use ping.
c:\> ping java.sun.com
Pinging java.sun.com [192.18.97.71] with 32 bytes of data:
Reply from 192.18.97.71: bytes=32 time=163ms TTL=241
Ping statistics for 192.18.97.71:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 160ms, Maximum = 169ms, Average = 163ms
This confirms that the time for a packet to hustle over from Mountain View to Cupertino is about 0.16 seconds on this particular day and time. "TTL" is "Time to Live." To prevent infinite loops, each router hop decrements this field in a packet, and if it reaches zero, the packet just expires where it is.
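Since Java cannot send ICMP packets itself, one rough substitute is to time how long it takes to open an ordinary TCP connection to a port where something is listening. This is only a sketch of that idea and not necessarily the alternative the text has in mind; the class name TcpPing and its method are invented, and the result includes connection setup cost, so it is not directly comparable to a real ping time.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical helper class, not part of the Java library.
class TcpPing {
    // Returns the milliseconds needed to open a TCP connection,
    // or -1 if the connection failed or timed out.
    static long connectMillis(String host, int port, int timeoutMs) {
        long start = System.nanoTime();
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return (System.nanoTime() - start) / 1_000_000;
        } catch (IOException e) {
            return -1;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 2) {
            // Usage: java TcpPing <host> <port>
            System.out.println(connectMillis(args[0], Integer.parseInt(args[1]), 3000) + " ms");
            return;
        }
        // With no arguments, time a connection to a throwaway local listener.
        try (ServerSocket local = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            String host = local.getInetAddress().getHostAddress();
            System.out.println(connectMillis(host, local.getLocalPort(), 3000) + " ms");
        }
    }
}
```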
The most used methods in the API for the client end of a socket are:
public class Socket extends Object {
    public Socket();
    public Socket(String,int) throws UnknownHostException, java.io.IOException;
    public Socket(InetAddress,int) throws java.io.IOException;
    public java.nio.channels.SocketChannel getChannel();
    public InputStream getInputStream() throws IOException;
    public OutputStream getOutputStream() throws IOException;
    public synchronized void setSoTimeout(int) throws SocketException;
    public synchronized void close() throws IOException;
    public boolean isConnected();
    public boolean isBound();
    public boolean isClosed();
    public boolean isInputShutdown();
    public boolean isOutputShutdown();
    public void shutdownOutput() throws IOException;
    public void shutdownInput() throws IOException;
    public static void setSocketImplFactory(SocketImplFactory fac) throws IOException;
}
The constructor with no arguments creates an unconnected socket which you can later bind() to a host and port you specify. After binding, you will connect() it. It's easier just to do all this by specifying these arguments in the constructor, if you know them at that point.
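The sequence can be traced with the state-query methods from the listing above. In this sketch, bind() fixes the local end of the socket (passing null lets the system pick a free local port) and connect() names the remote end. The class name UnconnectedSketch, the 2-second connect timeout, and the local throwaway server are assumptions made for illustration.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical helper class, not part of the Java library.
class UnconnectedSketch {
    // Walks a socket through create, bind, connect and records its state at each step.
    static String trace(InetSocketAddress server) throws IOException {
        StringBuilder log = new StringBuilder();
        try (Socket s = new Socket()) {              // no host or port yet
            log.append("new: bound=").append(s.isBound());
            s.bind(null);                            // null: let the system choose the local port
            log.append("; after bind: bound=").append(s.isBound())
               .append(" connected=").append(s.isConnected());
            s.connect(server, 2000);                 // remote end, with a 2-second timeout
            log.append("; after connect: connected=").append(s.isConnected());
        }
        return log.toString();
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket local = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            System.out.println(trace(new InetSocketAddress(local.getInetAddress(), local.getLocalPort())));
            // prints "new: bound=false; after bind: bound=true connected=false; after connect: connected=true"
        }
    }
}
```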
The setSoTimeout(int ms) will set a timeout on the socket of ms milliseconds. When this is a non-zero amount, a read call on the input stream will block for only this amount of time. Then it will break out of it by throwing a java.net.SocketTimeoutException, but leaving the socket still valid for further use.
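A short sketch makes the "still valid" part concrete. The client below sets a 200 millisecond timeout, attempts a read while the other end is silent, and then reads again once a byte has been sent. The class name TimeoutSketch, the timeout value, and the single-byte message are arbitrary choices for illustration.

```java
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical helper class, not part of the Java library.
class TimeoutSketch {
    // Returns what the first read did, then the byte obtained by the second read.
    static String demo() throws Exception {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket peer = server.accept()) {

            client.setSoTimeout(200); // reads now block for at most 200 ms
            InputStream in = client.getInputStream();

            String first;
            try {
                in.read(); // nothing has been sent yet, so this should time out
                first = "data";
            } catch (SocketTimeoutException e) {
                first = "timeout";
            }

            // The socket is still usable: send a byte from the other end and read it.
            peer.getOutputStream().write('A');
            peer.getOutputStream().flush();
            int b = in.read();
            return first + "," + (char) b;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo()); // prints "timeout,A"
    }
}
```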
The setSocketImplFactory() method is a hook for those sites that want to provide their own implementation of sockets, usually to deal with firewall or proxy issues. If this is done, it will be done on a site-wide basis, and individual programmers won't have to worry about it.
The socket API has one or two dozen other get/set methods for TCP socket options. Most of the time you don't need these and can ignore them.