Working with URLs
The Network API makes it possible to work with URLs at the source code level by providing class URL (located in package java.net). Each URL object encapsulates a resource's identifier with a protocol handler. As the previous tip indicates, one way to obtain a URL object is to call a URI object's toURL() method. However, that option is not always convenient. (Why should you have to create a URI object whenever you need a URL object?) Instead, you can call a URL constructor to create a URL object. You can also call URL methods to extract URL components, open an input stream to read from the resource, obtain a reference to an object that makes it possible to retrieve the resource's data in a convenient fashion, compare the URLs in two URL objects, and obtain a connection object to the resource. The connection object allows code to learn more about (and write to) the resource.
A close look at class URL reveals six constructors. The simplest constructor is URL(String url). That constructor takes a URL as a String argument, parses the URL into its components, and stores those components in a new URL object. As with the other five constructors, URL(String url) throws a java.net.MalformedURLException object if either the URL contains no protocol component or the URL's protocol is unknown.
The following code fragment demonstrates using URL(String url) to create a URL object. That object encapsulates a simple URL's components with an http protocol handler.
URL url = new URL ("http://www.informit.com");
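The two failure cases mentioned above (a missing protocol and an unknown protocol) can be observed without any network access. The following is a minimal sketch; the class name UrlParseSketch and the helper isAccepted() are hypothetical names introduced only for illustration, and foo is assumed to be a protocol for which no handler is installed. (Recent JDK releases mark the URL constructors as deprecated in favor of URI's toURL() method, but the constructors still behave as described here.)

```java
import java.net.MalformedURLException;
import java.net.URL;

class UrlParseSketch {
    // Hypothetical helper: reports whether URL(String) accepts the text.
    static boolean isAccepted(String text) {
        try {
            new URL(text);
            return true;
        } catch (MalformedURLException e) {
            // Thrown when the protocol is missing or has no handler.
            return false;
        }
    }

    public static void main(String[] args) {
        // Well-formed URL with a known protocol.
        System.out.println(isAccepted("http://www.informit.com"));
        // No protocol component at all.
        System.out.println(isAccepted("www.informit.com"));
        // Protocol present, but (by assumption) no handler named "foo".
        System.out.println(isAccepted("foo://www.informit.com"));
    }
}
```

Only the first call is expected to succeed; the other two should end up in the catch block.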
Once you have a URL object, you can extract various components by calling the methods getAuthority(), getDefaultPort(), getFile(), getHost(), getPath(), getPort(), getProtocol(), getQuery(), getRef(), and getUserInfo(). The getDefaultPort() method returns the default port that the URL object's protocol handler uses (to locate the resource) when a port is not specified as part of the URL. The getFile() method returns a combination of the path and query components. The getProtocol() method returns the name of the protocol (such as http, mailto, ftp, and so on) that determines the type of connection to the resource. The getRef() method returns the fragment (also known as a reference or an anchor) portion of the URL. Finally, the getUserInfo() method returns the user information portion of the authority component. As with URI's component extraction methods, URL's extraction methods return null or -1 if their components do not exist (although getDefaultPort() returns -1 if a default port has not been assigned to the URL object's protocol handler).
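To see how the components relate to one another, it can help to parse a URL that contains every component and compare it with one that contains almost none. The sketch below does that without opening a connection. The class name UrlComponentsSketch, the helper parse(), and the www.example.com URLs are assumptions made for illustration.

```java
import java.net.MalformedURLException;
import java.net.URL;

class UrlComponentsSketch {
    // Hypothetical helper: wraps the checked exception so callers stay short.
    static URL parse(String text) {
        try {
            return new URL(text);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URL full = parse(
            "http://user:pw@www.example.com:8080/docs/index.html?lang=en#top");
        System.out.println(full.getProtocol());    // http
        System.out.println(full.getAuthority());   // user:pw@www.example.com:8080
        System.out.println(full.getUserInfo());    // user:pw
        System.out.println(full.getHost());        // www.example.com
        System.out.println(full.getPort());        // 8080
        System.out.println(full.getDefaultPort()); // 80
        System.out.println(full.getPath());        // /docs/index.html
        System.out.println(full.getQuery());       // lang=en
        System.out.println(full.getFile());        // /docs/index.html?lang=en
        System.out.println(full.getRef());         // top

        // A URL with no port, query, fragment, or user information.
        URL bare = parse("http://www.example.com");
        System.out.println(bare.getPort());        // -1
        System.out.println(bare.getQuery());       // null
        System.out.println(bare.getRef());         // null
        System.out.println(bare.getUserInfo());    // null
    }
}
```

Note that getFile() is simply the path followed by ? and the query, and that getDefaultPort() reports the protocol handler's default even when the URL names a different port.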
In addition to the component-extraction methods, you can call the openStream() method to retrieve a java.io.InputStream reference. Using that reference, you can read from the resource in a byte-oriented fashion.
Listing 4 presents source code to URLDemo1. That program creates a URL object from a command-line argument, calls URL's component-extraction methods to retrieve the URL's components, calls URL's openStream() method to open a connection to the resource (via the protocol handler) and return an InputStream reference for reading bytes from that resource, reads/prints those bytes, and closes the input stream.
Listing 4: URLDemo1.java
// URLDemo1.java

import java.io.*;
import java.net.*;

class URLDemo1
{
   public static void main (String [] args) throws IOException
   {
      if (args.length != 1)
      {
          System.err.println ("usage: java URLDemo1 url");
          return;
      }

      URL url = new URL (args [0]);

      System.out.println ("Authority = " + url.getAuthority ());
      System.out.println ("Default port = " + url.getDefaultPort ());
      System.out.println ("File = " + url.getFile ());
      System.out.println ("Host = " + url.getHost ());
      System.out.println ("Path = " + url.getPath ());
      System.out.println ("Port = " + url.getPort ());
      System.out.println ("Protocol = " + url.getProtocol ());
      System.out.println ("Query = " + url.getQuery ());
      System.out.println ("Ref = " + url.getRef ());
      System.out.println ("User Info = " + url.getUserInfo ());

      System.out.print ('\n');

      InputStream is = url.openStream ();

      int ch;
      while ((ch = is.read ()) != -1)
         System.out.print ((char) ch);

      is.close ();
   }
}
URLDemo1 produces the following (though slightly modified) output from java URLDemo1 http://www.javajeff.com/articles/articles.html:
Authority = www.javajeff.com
Default port = 80
File = /articles/articles.html
Host = www.javajeff.com
Path = /articles/articles.html
Port = -1
Protocol = http
Query = null
Ref = null
User Info = null

<html>
<head>
<title>
Java Jeff - Articles
</title>
<meta http-equiv=Content-Type content="text/html; charset=ISO-8859-1">
<meta name=author content="Jeff Friesen">
<meta name=keywords content="java, virtual machine">
<script language=JavaScript>
if (navigator.appName == "Netscape")
  document.write ("<br>");
</script>
</head>
<body bgcolor=#000000>
<center>
<table border=1 cellpadding=5 cellspacing=0>
<tr>
<td>
<table cellpadding=0 cellspacing=0>
<tr>
<td>
<a href=informit/informit.html>
<img alt=InformIT border=0 src=informit.gif></a>
</td>
</tr>
</table>
</td>
<td align=middle>
<img src=title.gif><br>
<a href=../welcome/welcome.html>
<img alt="Welcome to Java Jeff!" border=0 src=jupiter.jpg>
</a><br>
<img src=../common/clear_dot.gif vspace=5><br>
<a href=../ads/ads.html>
<img alt="Welcome to Java Jeff!" border=0 src=jupiter.jpg>
</td>
<td>
<table cellpadding=0 cellspacing=0>
<tr>
<td>
<a href=javaworld/javaworld.html>
<img alt=JavaWorld border=0 src=javaworld.gif></a>
</td>
</tr>
</table>
</td>
</tr>
</table>
</center>
<br>
<font color=#ffffff>
<center>
Best viewed at a resolution of 1024x768 or higher.<br>
<img src=../common/clear_dot.gif vspace=5><br>
<i>
Copyright © 2001-2002, Jeff Friesen. All rights reserved.
</i>
<p>
<a href=../index.html>
<img alt=Back border=0 src=../common/back.gif></a>
</center>
</font>
</body>
</html>
Among other things, the output identifies 80 as the default port and HTTP as the protocol, and gives the HTML for one of the WWW pages (the resource) that comprise my Web site.
URL's openStream() method always returns a reference to an object created from a concrete subclass of the abstract InputStream class. That implies that you read resource data as a byte sequence, and this is appropriate because you do not know what kind of data is being read. If you know ahead of time that the data is textual, with each line ending with a newline (\n) character, you can read the data as a sequence of lines instead of 1 byte at a time.
The following code fragment demonstrates wrapping the InputStream subclass object in a java.io.InputStreamReader object to bridge from 8-bit bytes to 16-bit characters, wrapping the resulting object in a java.io.BufferedReader object to access BufferedReader's readLine() method, and calling method readLine() to read entire lines of text from the resource.
InputStream is = url.openStream ();
BufferedReader br = new BufferedReader (new InputStreamReader (is));

String line;
while ((line = br.readLine ()) != null)
   System.out.println (line);

is.close ();
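The same idea can be written so that the stream is closed even if reading fails, and so that the byte-to-character conversion does not depend on the platform's default encoding. The following is a sketch under stated assumptions: the class name UrlLinesSketch and the helper readLines() are hypothetical, and UTF-8 is assumed as the character set (a real program might take the character set from the resource's Content-Type header instead).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class UrlLinesSketch {
    // Hypothetical helper: reads every line of text from the resource.
    static List<String> readLines(URL url, Charset charset)
            throws IOException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader (and the underlying
        // input stream) whether or not an exception is thrown.
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(url.openStream(), charset))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java UrlLinesSketch url");
            return;
        }
        // UTF-8 is an assumption; adjust to match the resource.
        for (String line : readLines(new URL(args[0]),
                                     StandardCharsets.UTF_8)) {
            System.out.println(line);
        }
    }
}
```

Because openStream() works with any protocol that has a handler, the sketch can be tried against a file: URL as well as an http: URL.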
Sometimes reading data as a sequence of bytes is not convenient. For example, if the resource is a JPEG file, it is more natural to obtain an image producer and register a consumer with that producer to consume the data. It is then a simple matter to display the image once the image is completely consumed. For that to happen, it is necessary to use URL's getContent() method.
When called, getContent() returns an Object reference to an object whose methods (after casting to the proper type) can be called to retrieve the data in a more convenient fashion. Before calling that method, however, you should use instanceof to verify the object's type, to prevent class cast exceptions.
For JPEG resources, getContent() returns an object whose class implements the java.awt.image.ImageProducer interface. The following code fragment demonstrates using instanceof to verify that the object is an ImageProducer and then making the cast. ImageProducer methods (although not shown) can subsequently be called to register a consumer and initiate the process of consuming the image.
URL url = new URL (args [0]);
Object o = url.getContent ();
if (o instanceof ImageProducer)
{
   ImageProducer ip = (ImageProducer) o;
   // ...
}
TIP
Call URL's equals(Object o) and sameFile(URL other) methods to determine whether two URLs are equal. The first method includes the fragment in the comparison, whereas the second method ignores the fragment. Consult the SDK documentation for more information on those methods.
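The difference between the two comparison methods shows up as soon as two URLs differ only in their fragments. The following sketch illustrates that; the class name UrlCompareSketch, the helper parse(), and the localhost URLs are assumptions chosen so that nothing outside the local machine is contacted.

```java
import java.net.MalformedURLException;
import java.net.URL;

class UrlCompareSketch {
    // Hypothetical helper: wraps the checked exception.
    static URL parse(String text) {
        try {
            return new URL(text);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URL a = parse("http://localhost/index.html#intro");
        URL b = parse("http://localhost/index.html#summary");

        // The fragments differ, so the URLs are not equal ...
        System.out.println(a.equals(b));   // false
        // ... but they still refer to the same file.
        System.out.println(a.sameFile(b)); // true
    }
}
```

Both methods compare hosts, which can involve resolving host names. For that reason, it is probably wise to avoid calling them in performance-sensitive code or using URL objects as keys in hash-based collections.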
Study the getContent() method's source code, and you will discover openConnection().getContent(). Furthermore, study the openStream() method's source code and you will discover openConnection().getInputStream(). Each method first makes a call to URL's openConnection() method. That method returns a reference to an object created from a subclass of the abstract java.net.URLConnection class that describes a connection to some resource. URLConnection's methods reveal resource and connection details, and make it possible for code to write to the resource.
Listing 5's URLDemo2 source code demonstrates openConnection() and calls to some of URLConnection's methods.
Listing 5: URLDemo2.java
// URLDemo2.java

import java.io.*;
import java.net.*;
import java.util.*;

class URLDemo2
{
   public static void main (String [] args) throws IOException
   {
      if (args.length != 1)
      {
          System.err.println ("usage: java URLDemo2 url");
          return;
      }

      URL url = new URL (args [0]);

      // Return a reference to a new protocol-specific object
      // that represents a connection to a resource.

      URLConnection uc = url.openConnection ();

      // Make the connection.

      uc.connect ();

      // Print out the contents of various header fields.

      Map m = uc.getHeaderFields ();
      Iterator i = m.entrySet ().iterator ();
      while (i.hasNext ())
         System.out.println (i.next ());

      // Find out if resource input and output operations are
      // allowed.

      System.out.println ("Input allowed = " + uc.getDoInput ());
      System.out.println ("Output allowed = " + uc.getDoOutput ());
   }
}
After the call to openConnection() returns, a call is made to the connect() method to establish a resource connection. (Although the openConnection() method returns a reference to a connection object, openConnection() does not connect to a resource.) The call to URLConnection's getHeaderFields() method returns a reference to an object whose class implements the java.util.Map interface. That map contains a collection of header names and values. What are headers? Headers are text-based name/value pairs that identify the type of resource data, the length of that data, and so forth.
After you compile URLDemo2, type the command line java URLDemo2 http://www.javajeff.com. You see the following output:
Date=[Sun, 17 Feb 2002 17:49:32 GMT]
Connection=[Keep-Alive]
Content-Type=[text/html; charset=iso-8859-1]
Accept-Ranges=[bytes]
Content-Length=[7214]
null=[HTTP/1.1 200 OK]
ETag=["4470e-1c2e-3bf29d5a"]
Keep-Alive=[timeout=15, max=100]
Server=[Apache/1.3.19 (Unix) Debian/GNU]
Last-Modified=[Wed, 14 Nov 2001 16:35:38 GMT]
Input allowed = true
Output allowed = false
The output identifies a variety of headers (including Date, null, Content-Length, Server, Last-Modified, and so on) and their values. The output also shows that only reading from the resource is allowed.
Have you ever wondered how a program is capable of identifying resource data? Look closely at the preceding output, and you'll come across something called Content-Type. Content-Type is a header that identifies the resource data (content) type as text/html. The text portion is known as the type, and the html portion is known as the subtype. (If the content was ordinary text, Content-Type would probably contain text/plain as its value. The content is still text but is now plain.) The Content-Type header is part of something known as Multipurpose Internet Mail Extensions (MIME).
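To make the type/subtype split concrete, the following sketch pulls the two parts out of a Content-Type value such as the one shown in the preceding output. It is plain string handling and not part of the Network API; the class name ContentTypeSketch and the helper typeAndSubtype() are hypothetical names used only for illustration.

```java
class ContentTypeSketch {
    // Hypothetical helper: returns { type, subtype } for a value such as
    // "text/html; charset=iso-8859-1".
    static String[] typeAndSubtype(String contentType) {
        // Parameters such as charset follow the first semicolon.
        int semi = contentType.indexOf(';');
        String mediaType =
            (semi < 0 ? contentType : contentType.substring(0, semi)).trim();

        // The type and the subtype are separated by a slash.
        int slash = mediaType.indexOf('/');
        if (slash < 0) {
            throw new IllegalArgumentException(
                "No subtype in: " + contentType);
        }
        return new String[] {
            mediaType.substring(0, slash),
            mediaType.substring(slash + 1)
        };
    }

    public static void main(String[] args) {
        String[] parts = typeAndSubtype("text/html; charset=iso-8859-1");
        System.out.println("type = " + parts[0]);    // type = text
        System.out.println("subtype = " + parts[1]); // subtype = html
    }
}
```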
MIME is an extension to the traditional 7-bit ASCII standard for transmitting messages. By introducing various headers, MIME makes it possible to incorporate audio, video, still images, text from different character sets, and so on into 7-bit ASCII text. Along with Content-Type, MIME identifies Content-Length and other standard headers. As you work with the URLConnection class, you will encounter the methods getContentType() and getContentLength(). Those methods return the values of Content-Type and Content-Length headers. To learn more about MIME, I encourage you to read the RFC document identified earlier in this article.
You've probably heard of HTML forms: the <form>, </form>, and other HTML tags. Forms make it possible to GET data from a resource and POST the data from HTML form fields to a resource for subsequent processing. You can simulate an HTML form getting or posting data by using the URLConnection class and MIME. Here is how you accomplish that task.
Suppose that you want to POST form data to a server program. Posting requires manipulation of the form data. First, the form data must be organized into name/value pairs. Second, each pair must be specified in a name=value format. Third, if multiple name/value pairs are being sent, each pair must be separated from other pairs by using an ampersand (&) character. Finally, the contents of name and the contents of value must be encoded using the application/x-www-form-urlencoded MIME type. For example, x=y&a=b represents two name/value pairs: x/y and a/b.
To assist with the encoding, Java supplies a java.net.URLEncoder class that declares a pair of static encode() methods. Each method takes a String argument and returns a reference to a String object that contains the encoded contents of the argument. For example, if encode() discovers a space character in the argument, it replaces that space with a plus sign character in the result.
The following code fragment demonstrates a call to URLEncoder's encode(String s) method, to encode a, the space, and b in the "a b" literal string. a+b is stored in a new String object, which is referenced by result.
String result = URLEncoder.encode ("a b");
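The four formatting rules for form data, combined with the second of the two encode() methods (the one that also takes the name of a character encoding), can be sketched as follows. The class name FormBodySketch and the helper encodeForm() are hypothetical, and UTF-8 is an assumed choice of encoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class FormBodySketch {
    // Hypothetical helper: joins encoded name=value pairs with '&'.
    static String encodeForm(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> field : fields.entrySet()) {
                if (sb.length() > 0) {
                    sb.append('&'); // Separate name/value pairs.
                }
                sb.append(URLEncoder.encode(field.getKey(), "UTF-8"));
                sb.append('=');     // Separate name from value.
                sb.append(URLEncoder.encode(field.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the pairs in insertion order.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("x", "y");
        fields.put("a", "b");
        fields.put("full name", "A & B");
        // Prints x=y&a=b&full+name=A+%26+B
        System.out.println(encodeForm(fields));
    }
}
```

Encoding each name and value separately matters: an ampersand inside a value becomes %26, so it cannot be mistaken for a pair separator.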
In addition to preparing form data, the URLConnection object must be told that data is being posted because URLConnection defaults to getting data. To accomplish that task, you first cast openConnection()'s return value to an HttpURLConnection type, after ensuring that the return value is of that type. Then you call the resulting object's setRequestMethod(String method) method with POST as the value of the object referenced by the method argument.
Another task that must be accomplished is to call URLConnection's setDoOutput(boolean doOutput) method with a true argument value. That task is necessary because the URLConnection object defaults to not supporting output. (The program can then ultimately make a call to URLConnection's getOutputStream() method, to return a reference to the resource's output stream for sending form data.)
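Because openConnection() does not connect to the resource, the two configuration steps just described can be tried out without contacting any server. The sketch below performs only those steps. The class name PostSetupSketch, the helper preparePost(), and the http://localhost/echo URL are hypothetical; the URL is never actually contacted.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;

class PostSetupSketch {
    // Hypothetical helper: returns a connection object configured for POST.
    static HttpURLConnection preparePost(URL url) throws IOException {
        // No network traffic occurs here; the connection is only described.
        URLConnection uc = url.openConnection();

        // Verify the type before casting.
        if (!(uc instanceof HttpURLConnection)) {
            throw new IOException("Not an HTTP connection: " + url);
        }
        HttpURLConnection hc = (HttpURLConnection) uc;

        hc.setRequestMethod("POST"); // Default is GET.
        hc.setDoOutput(true);        // Default is false.
        return hc;
    }

    public static void main(String[] args) throws IOException {
        // Assumed endpoint, used only to obtain a connection object.
        HttpURLConnection hc = preparePost(new URL("http://localhost/echo"));
        System.out.println("Method = " + hc.getRequestMethod());
        System.out.println("Output allowed = " + hc.getDoOutput());
    }
}
```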
To put the aforementioned tasks (and a few other not-mentioned tasks) into perspective, Listing 6's URLDemo3 source code demonstrates posting form data to a resource that "understands" the application/x-www-form-urlencoded content type.
Listing 6: URLDemo3.java
// URLDemo3.java

import java.io.*;
import java.net.*;

class URLDemo3
{
   public static void main (String [] args) throws IOException
   {
      // Check for at least two arguments and also for an even number
      // of arguments.

      if (args.length < 2 || args.length % 2 != 0)
      {
          System.err.println ("usage: java URLDemo3 name value " +
                              "[name value ...]");
          return;
      }

      // Create a URL object that lets the program connect to a server
      // program resource, that echoes back a form's name/value pairs.

      URL url;
      url = new URL ("http://banshee.cs.uow.edu.au:2000/~nabg/echo.cgi");

      // Return a reference to a protocol-specific object that
      // represents a connection to the http resource.

      URLConnection uc = url.openConnection ();

      // Validate the type of connection. Must be HttpURLConnection.

      if (!(uc instanceof HttpURLConnection))
      {
          System.err.println ("Wrong connection type");
          return;
      }

      // Indicate that the program must output name/value pairs to the
      // server program resource.

      uc.setDoOutput (true);

      // Indicate that only "live" information can be returned.

      uc.setUseCaches (false);

      // Set the Content-Type header to indicate the form MIME type that
      // specifies URL encoded data.

      uc.setRequestProperty ("Content-Type",
                             "application/x-www-form-urlencoded");

      // Build the name/value pairs content to send to the server.

      String content = buildContent (args);

      // Set the Content-Length header to indicate the length of the
      // content being sent.

      uc.setRequestProperty ("Content-Length",
                             "" + content.length ());

      // Extract appropriate type of connection.

      HttpURLConnection hc = (HttpURLConnection) uc;

      // Set the HTTP request method to POST. (Default is GET.)

      hc.setRequestMethod ("POST");

      // Output the content.

      OutputStream os = uc.getOutputStream ();
      DataOutputStream dos = new DataOutputStream (os);
      dos.writeBytes (content);
      dos.flush ();
      dos.close ();

      // Input and display the result from the server program.

      InputStream is = uc.getInputStream ();

      int ch;
      while ((ch = is.read ()) != -1)
         System.out.print ((char) ch);

      is.close ();
   }

   static String buildContent (String [] args)
   {
      StringBuffer sb = new StringBuffer ();

      for (int i = 0; i < args.length; i++)
      {
           // Encode each argument for proper transmission.

           String encodedItem = URLEncoder.encode (args [i]);

           sb.append (encodedItem);

           if (i % 2 == 0)
               sb.append ("="); // Separate name from value.
           else
               sb.append ("&"); // Separate name/value pairs.
      }

      // Remove final & separator.

      sb.setLength (sb.length () - 1);

      return sb.toString ();
   }
}
You might be wondering why URLDemo3 does not call URLConnection's connect() method. That method is not explicitly called because other URLConnection methods (such as getContentLength()) implicitly call connect() if the connection to the resource has not been established. Once the connection is made, however, it is illegal to call methods such as setDoOutput(boolean doOutput). Those methods throw IllegalStateException objects after connect() has been (explicitly or implicitly) called.
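The ordering rule is easy to observe. The following sketch connects first and then attempts to change the output setting. To stay independent of any server, it connects to a temporary local file through a file: URL, which is an assumption made purely for the demonstration; the class name ConnectOrderSketch and the helper rejectsSetupAfterConnect() are likewise hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;

class ConnectOrderSketch {
    // Hypothetical helper: returns true if setDoOutput() is refused
    // once the connection has been established.
    static boolean rejectsSetupAfterConnect(URL url) throws IOException {
        URLConnection uc = url.openConnection();
        uc.setUseCaches(false); // Legal: not yet connected.
        uc.connect();           // Explicitly establish the connection.
        try {
            uc.setDoOutput(true); // Too late: already connected.
            return false;
        } catch (IllegalStateException e) {
            return true;
        } finally {
            // Release the stream that the connection opened.
            uc.getInputStream().close();
        }
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for a remote resource.
        File file = File.createTempFile("connect", ".txt");
        file.deleteOnExit();
        URL url = file.toURI().toURL();
        System.out.println("Rejected after connect = " +
                           rejectsSetupAfterConnect(url));
    }
}
```

The practical consequence is that all configuration calls belong before the first method that connects, which is the order URLDemo3 follows.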
After you compile URLDemo3, type the command line java URLDemo3 name1 value1 name2 value2 name3 value3. You see the following output:
<html>
<head>
<title>Echoing your name value pairs</title>
</head>
<body>
<ol>
<li>name1 : value1
<li>name2 : value2
<li>name3 : value3
</ol>
<hr>
Mon Feb 18 08:58:45 2002
</body>
</html>
The server program resource's output consists of HTML that echoes back name1, value1, name2, value2, name3, and value3.
TIP
If you need a string representation of a URL object's URL, call either toExternalForm() or toString(). Both methods are equivalent.