Working with Streams in Java
A Java program uses a stream either to read data items from a source or to write data items to a destination. Think of a stream as a conduit through which a sequence of data items (bytes or characters) flows from a source to specific program code, or from specific program code to a destination. That conduit can be likened to a wire on which an electrical current flows, or to a river on which boats and barrels float. Stream sources include files, memory buffers, network sockets, threads, and other streams. Stream destinations include the same entities as stream sources, as well as other entities (such as printers). When a stream of data items flows from a source, that stream is referred to as an input stream. Similarly, when a stream of data items flows to a destination, that stream is referred to as an output stream. Input and output streams are illustrated in Figure 1.
Figure 1 Data items flow from a source to specific program code over an input stream, and flow from specific program code to a destination over an output stream.
Java divides streams into input and output categories. Java also divides streams into byte-oriented and character-oriented categories. The basic unit of a byte-oriented stream is a byte, and the basic unit of a character-oriented stream is a Unicode character (a 16-bit Java char value).
All byte-oriented input streams are created from objects whose classes derive from the abstract InputStream class, and all character-oriented input streams are created from objects whose classes derive from the abstract Reader class. Those classes have several methods in common, including a close() method and a no-argument read() method. Similarly, all byte-oriented output streams are created from objects whose classes derive from the abstract OutputStream class, and all character-oriented output streams are created from objects whose classes derive from the abstract Writer class. As with the InputStream and Reader classes, OutputStream and Writer have methods in common (such as close() and flush()). All four classes are located in the java.io package.
NOTE
InputStream's and Reader's read() methods are designed to block (wait) for input if data is not available when either of those methods is called. InputStream declares an available() method that returns an estimate of the number of bytes that can be read without blocking. Reader has no such method; the closest it offers is ready(), which returns a boolean indicating whether the next read() is guaranteed not to block.
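The following sketch (AvailableDemo and its methods are illustrative names) calls available() on an in-memory InputStream and ready(), which answers only yes or no, on an in-memory Reader. For in-memory streams the answers are exact; for files, sockets, and other devices, available() is only an estimate, and a return value of 0 does not necessarily mean the end of the stream has been reached.

```java
import java.io.*;

class AvailableDemo {
    // Reads 'consume' bytes from an in-memory stream, then asks how many
    // more bytes can be read without blocking.
    static int availableAfter(byte[] data, int consume) throws IOException {
        try (InputStream in = new ByteArrayInputStream(data)) {
            for (int i = 0; i < consume; i++) {
                in.read();
            }
            return in.available();
        }
    }

    // A Reader can only say whether the next read() will not block.
    static boolean readerReady(String text) throws IOException {
        try (Reader r = new StringReader(text)) {
            return r.ready();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(availableAfter(new byte[] {1, 2, 3, 4, 5}, 2)); // 3
        System.out.println(readerReady("Hi"));                              // true
    }
}
```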
An Inventory of Stream Classes
Java's class library includes many stream classes. Rather than attempt to itemize every last stream class, this section focuses on a representative sample: file stream classes, buffered stream classes, data stream classes, piped stream classes, and Zip stream classes.
File Stream Classes
If you need to work with files in either a sequential-access or a random-access manner, you can use the RandomAccessFile class. However, the intent of the RandomAccessFile class is for its objects to manipulate record-oriented flat-file databases. If you are interested in reading an image's bytes, reading the contents of a text file, writing some configuration information to a file, and so forth, you would not use RandomAccessFile. Instead, you would work with various file stream classes: FileInputStream, FileReader, FileOutputStream, and FileWriter. (Those classes are located in the java.io package.)
TIP
Use the FileInputStream and FileOutputStream classes to read/write binary data from/to image files, sound files, video files, configuration files, and so on. Those classes can also be used to read/write ASCII-based text files. To read/write Unicode-based text files, use FileReader and FileWriter, keeping in mind that they convert between characters and bytes using the platform's default character encoding.
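When a text file's encoding must not depend on the platform, one common alternative is to chain an InputStreamReader or OutputStreamWriter, both of which accept an explicit Charset, to a file stream. This sketch assumes Java 7 or later (for StandardCharsets and try-with-resources); CharsetDemo and roundTrip() are illustrative names.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

class CharsetDemo {
    // Writes text to a file as UTF-8, then reads it back as UTF-8.
    static String roundTrip(File file, String text) throws IOException {
        try (Writer w = new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8)) {
            w.write(text);
        }
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8)) {
            int ch;
            while ((ch = r.read()) != -1) {
                sb.append((char) ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("charset-demo", ".txt");
        tmp.deleteOnExit();
        String text = "caf\u00e9 \u20ac";
        System.out.println(roundTrip(tmp, text).equals(text)); // true
        System.out.println(tmp.length());                      // 9
    }
}
```

The file occupies 9 bytes because UTF-8 encodes the accented e in two bytes and the euro sign in three, while the remaining four ASCII characters take one byte each.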
The file stream classes include constructors for creating byte-oriented or character-oriented input and output streams that are connected to files opened or created by those constructors. If an input stream constructor cannot find a file to open for input, it throws a FileNotFoundException object. Similarly, if an output stream constructor cannot create or open a file (because of bad path information, or for some other reason), it throws an exception: FileOutputStream's constructors throw FileNotFoundException, and FileWriter's constructors throw IOException. (FileNotFoundException is a subclass of IOException.)
Because of the various exceptions thrown by their constructors and methods, the file stream classes might seem difficult to use. However, if you follow a pattern similar to the usage pattern that the Copy source code in Listing 1 demonstrates, you should not have trouble.
Listing 1: Copy.java.
    // Copy.java

    import java.io.*;

    class Copy {
        public static void main (String [] args) {
            if (args.length != 2) {
                System.out.println ("usage: java Copy srcpath dstpath");
                return;
            }

            FileInputStream fis = null;
            FileOutputStream fos = null;

            try {
                fis = new FileInputStream (args [0]);
                fos = new FileOutputStream (args [1]);

                int byte_;
                while ((byte_ = fis.read ()) != -1)
                    fos.write (byte_);
            } catch (FileNotFoundException e) {
                System.out.println ("File not found");
                // Do other stuff related to that exception (if necessary).
            } catch (IOException e) {
                System.out.println ("I/O Problem: " + e.getMessage ());
                // Do other stuff related to that exception (if necessary).
            } finally {
                if (fis != null)
                    try {
                        fis.close ();
                    } catch (IOException e) {
                    }

                if (fos != null)
                    try {
                        fos.close ();
                    } catch (IOException e) {
                    }
            }
        }
    }
As its name suggests, Copy is an application that copies data from one file to another. Copy copies bytes from a file identified by a source path to a file identified by a destination path. For example, to copy all bytes contained in Copy.java to Copy.bak, issue the following command line: java Copy Copy.java Copy.bak.
Notice the pattern that Copy's source code uses when working with files. First, because Copy is designed to copy byte-oriented streams instead of character-oriented streams, Copy declares a pair of FileInputStream and FileOutputStream reference variables and initializes those variables to null. Within a Try statement, Copy attempts to create FileInputStream and FileOutputStream objects. The FileInputStream constructor throws a FileNotFoundException object if it cannot locate the source file, and the FileOutputStream constructor throws a FileNotFoundException object if it cannot create or open the destination file (because of bad path information, for example). Because FileNotFoundException is a subclass of IOException, its Catch clause must appear before the IOException Catch clause. Assuming both constructors succeed, a While loop statement repeatedly calls FileInputStream's read() method to read the next byte and FileOutputStream's write() method to write that byte. The read() method continues to read bytes until end-of-file is encountered. At that time, read() returns -1, and the loop ends. Regardless of whether or not an exception is thrown, the Finally clause executes last. By using If decision statements, it checks whether the FileInputStream and FileOutputStream objects were created, and calls the close() method of each object that was created to close the underlying file. Because close() is declared to throw IOException (the operating system can report an error while closing a file), each close() method call is placed within its own Try statement, so that a failure to close one file does not prevent the other file from being closed. If you follow a pattern similar to what you have just read, you should not experience trouble when working with the file stream classes.
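On Java 7 and later, the try-with-resources statement can replace the null checks and nested close() calls in the Finally clause. The following sketch keeps Copy's byte-at-a-time loop; TryWithResourcesCopy and its copy() helper are illustrative names, not part of the original listing.

```java
import java.io.*;

class TryWithResourcesCopy {
    // Copies src to dst byte by byte. Both streams are closed automatically,
    // in reverse order of creation, even if an exception is thrown.
    static void copy(String src, String dst) throws IOException {
        try (FileInputStream fis = new FileInputStream(src);
             FileOutputStream fos = new FileOutputStream(dst)) {
            int byte_;
            while ((byte_ = fis.read()) != -1) {
                fos.write(byte_);
            }
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.out.println("usage: java TryWithResourcesCopy srcpath dstpath");
            return;
        }
        try {
            copy(args[0], args[1]);
        } catch (FileNotFoundException e) {
            System.out.println("File not found: " + e.getMessage());
        } catch (IOException e) {
            System.out.println("I/O Problem: " + e.getMessage());
        }
    }
}
```

If close() itself throws while another exception is already propagating, the close() exception is attached to the primary exception as a suppressed exception (see getSuppressed()) rather than being silently discarded.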
TIP
The FileOutputStream and FileWriter constructors normally truncate an existing file (discarding its contents) when opening it. However, it is possible to append bytes or characters to an existing file by calling the FileOutputStream(String name, boolean append) or FileWriter(String name, boolean append) constructor, respectively, with true as the value of the append argument.
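A minimal sketch of the append constructor, using FileWriter(String name, boolean append) with a temporary file so that no existing file is disturbed. AppendDemo and writeThenAppend() are illustrative names; the sample text is ASCII, so the platform's default encoding does not affect the result.

```java
import java.io.*;

class AppendDemo {
    // Writes 'first', then reopens the same file in append mode and adds 'second'.
    static String writeThenAppend(String path, String first, String second) throws IOException {
        try (FileWriter fw = new FileWriter(path)) {       // creates or truncates
            fw.write(first);
        }
        try (FileWriter fw = new FileWriter(path, true)) { // append == true
            fw.write(second);
        }
        StringBuilder sb = new StringBuilder();
        try (FileReader fr = new FileReader(path)) {
            int ch;
            while ((ch = fr.read()) != -1) {
                sb.append((char) ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("append-demo", ".txt");
        tmp.deleteOnExit();
        System.out.print(writeThenAppend(tmp.getPath(), "first line\n", "second line\n"));
    }
}
```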
Buffered Stream Classes
Failing to buffer I/O operations is a common cause of poor I/O performance. That is not surprising when you consider that disk drives read and write large aggregates of bytes efficiently but are inefficient at reading and writing small aggregates, and that each unbuffered read() or write() call on a file stream typically results in a separate call into the operating system. Because most of Java's stream classes do not buffer their read and write operations, stream objects are prone to poor I/O performance.
I/O performance can be improved dramatically by buffering: collecting individual bytes (or characters) into a large aggregate before performing a single write operation, or reading a large group of bytes (or characters) in a single operation and then returning them one at a time from a buffer. That is the goal behind Java's BufferedInputStream, BufferedReader, BufferedOutputStream, and BufferedWriter classes. (Those classes are located in the java.io package.)
BufferedInputStream and BufferedReader objects represent buffered input streams that are chained to other input streams so that bytes (or characters) can flow from those other streams into buffered input streams. The following code fragment demonstrates that input stream chaining.
    FileInputStream fis = new FileInputStream (pathname);
    BufferedInputStream bis = new BufferedInputStream (fis);
    System.out.println (bis.read ());
The code fragment creates a FileInputStream object and chains a BufferedInputStream object to it by passing the FileInputStream object's reference to the BufferedInputStream constructor. The resulting BufferedInputStream object's reference is assigned to bis. When bis.read() is called, that read() method checks an internal buffer (associated with the BufferedInputStream object assigned to bis) for at least one byte that can be returned. If a byte exists in that buffer, bis.read() returns it immediately. Otherwise, bis.read() internally calls fis.read(byte [] buffer, int offset, int length) to read a large chunk of bytes into the bis object's internal buffer, and then returns the first of those bytes. As long as bis.read() does not have to call fis.read(byte [] buffer, int offset, int length), performance is fast. When it must make that call, performance slows down somewhat, because fis.read(byte [] buffer, int offset, int length) must access the disk drive. However, reading a large chunk of bytes with a single fis.read(byte [] buffer, int offset, int length) call is faster than performing many individual no-argument fis.read() calls. Therefore, on average, a bis.read() method call is considerably faster than a fis.read() method call.
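The effect of the buffer can be observed without timing anything, by counting how many read calls reach the underlying stream. In this sketch, CountingInputStream is a hypothetical helper built on FilterInputStream, and a ByteArrayInputStream stands in for the file so the example runs without touching the disk.

```java
import java.io.*;

class BufferingDemo {
    // Hypothetical helper: counts every read call that reaches the wrapped stream.
    static class CountingInputStream extends FilterInputStream {
        int calls;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    // Reads 'size' bytes one at a time, optionally through a BufferedInputStream,
    // and returns how many read calls reached the underlying stream.
    static int underlyingCalls(int size, boolean buffered) throws IOException {
        CountingInputStream counter = new CountingInputStream(new ByteArrayInputStream(new byte[size]));
        try (InputStream in = buffered ? new BufferedInputStream(counter) : counter) {
            while (in.read() != -1) {
                // discard each byte; only the call count matters here
            }
        }
        return counter.calls;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("unbuffered: " + underlyingCalls(100_000, false)); // unbuffered: 100001
        System.out.println("buffered:   " + underlyingCalls(100_000, true));
    }
}
```

Without buffering, 100,000 bytes cost 100,001 calls (the last one returns -1). With a BufferedInputStream in between, the count drops to roughly the number of bytes divided by the buffer size (the JDK's default buffer size is 8192 bytes).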
NOTE
To be fair, many platforms buffer data that is to be read from or written to a file. Therefore, the file stream classes do have some sort of buffering at their disposal. However, not all devices that support Java will buffer data at a platform level. Therefore, it is not a good idea to rely on such support. Instead, you should get into the habit of writing code that relies on the buffered stream classes.
BufferedOutputStream and BufferedWriter objects represent buffered output streams that are chained to other output streams so that bytes (or characters) can flow from buffered output streams to those other streams. The following code fragment demonstrates that output stream chaining.
    FileOutputStream fos = new FileOutputStream (pathname);
    BufferedOutputStream bos = new BufferedOutputStream (fos);
    bos.write ('A');
The code fragment creates a FileOutputStream object and chains a BufferedOutputStream object to it by passing the FileOutputStream object's reference to the BufferedOutputStream constructor. The resulting BufferedOutputStream object's reference is assigned to bos. When bos.write ('A'); executes, that method call appends 'A' to the contents of an internal buffer (associated with the BufferedOutputStream object assigned to bos). After that buffer fills, bos.write() calls fos.write() to write the entire buffer to the disk. Because fewer (but larger) writes are made to the disk, performance improves. Bytes that are still in the buffer reach fos only when the buffer fills or when bos.flush() or bos.close() is called, so a buffered output stream must be flushed or closed when you finish writing to it.
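The following sketch makes the buffering visible by chaining a BufferedOutputStream to a ByteArrayOutputStream instead of a file. FlushDemo is an illustrative name; the point is that 'A' does not appear in the underlying stream until flush() is called.

```java
import java.io.*;

class FlushDemo {
    // Returns the number of bytes visible in the underlying stream
    // right after write('A') and again after flush().
    static int[] sizesBeforeAndAfterFlush() throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (BufferedOutputStream bos = new BufferedOutputStream(sink)) {
            bos.write('A');
            int before = sink.size(); // 'A' is still in bos's internal buffer
            bos.flush();              // pushes the buffered byte into sink
            int after = sink.size();
            return new int[] {before, after};
        }
    }

    public static void main(String[] args) throws IOException {
        int[] sizes = sizesBeforeAndAfterFlush();
        // prints: before flush: 0, after flush: 1
        System.out.println("before flush: " + sizes[0] + ", after flush: " + sizes[1]);
    }
}
```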
The Copy application in Listing 1 was not as efficient as it could have been. By adding support for buffering, Copy can become faster. Listing 2 introduces a BufferedCopy application that uses the BufferedInputStream and BufferedOutputStream classes to support buffering.
Listing 2: BufferedCopy.java.
    // BufferedCopy.java

    import java.io.*;

    class BufferedCopy {
        public static void main (String [] args) {
            if (args.length != 2) {
                System.out.println ("usage: java BufferedCopy srcpath dstpath");
                return;
            }

            BufferedInputStream bis = null;
            BufferedOutputStream bos = null;

            try {
                FileInputStream fis = new FileInputStream (args [0]);
                bis = new BufferedInputStream (fis);

                FileOutputStream fos = new FileOutputStream (args [1]);
                bos = new BufferedOutputStream (fos);

                int byte_;
                while ((byte_ = bis.read ()) != -1)
                    bos.write (byte_);
            } catch (FileNotFoundException e) {
                System.out.println ("File not found");
                // Do other stuff related to that exception (if necessary).
            } catch (IOException e) {
                System.out.println ("I/O Problem: " + e.getMessage ());
                // Do other stuff related to that exception (if necessary).
            } finally {
                if (bis != null)
                    try {
                        bis.close ();
                    } catch (IOException e) {
                    }

                if (bos != null)
                    try {
                        bos.close ();
                    } catch (IOException e) {
                    }
            }
        }
    }
One detail in BufferedCopy's source code deserves attention: bis.close() and bos.close() are called instead of fis.close() and fos.close(). Every stream class presented so far has a close() method, so when you chain a buffered stream to a file stream, it may not be obvious which close() method to call. The answer, as BufferedCopy demonstrates, is to call the close() method on the outermost stream, the one that chains itself to another stream. A buffered stream's close() method closes the stream it is chained to; in the case of bos, close() first flushes any bytes still in the buffer to fos. Calling fos.close() instead would close the file without flushing bos's buffer, and any bytes still waiting in that buffer would be lost.
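The following sketch checks that closing only the outer stream both flushes the buffered bytes and closes the stream underneath. TrackingOutputStream is a hypothetical ByteArrayOutputStream subclass that records whether its close() method was called; CloseChainDemo is likewise an illustrative name.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

class CloseChainDemo {
    // Hypothetical stream that records whether close() reached it.
    static class TrackingOutputStream extends ByteArrayOutputStream {
        boolean closed;

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    static TrackingOutputStream writeThroughBuffer(String text) throws IOException {
        TrackingOutputStream sink = new TrackingOutputStream();
        BufferedOutputStream bos = new BufferedOutputStream(sink);
        bos.write(text.getBytes(StandardCharsets.US_ASCII)); // small write: stays in bos's buffer
        bos.close(); // flushes the buffer into sink, then calls sink.close()
        return sink;
    }

    public static void main(String[] args) throws IOException {
        TrackingOutputStream sink = writeThroughBuffer("hello");
        System.out.println(sink.closed);     // true
        System.out.println(sink.toString()); // hello
    }
}
```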
NOTE
The BufferedInputStream and BufferedReader classes support marking a particular point in a stream and returning to that point to reread a sequence of bytes (or characters). Those capabilities are provided by the mark() and reset() methods. Call mark() to "remember" a point in the input stream, passing a readlimit argument that specifies how many bytes (or characters) can be read before the mark becomes invalid. Call reset() to cause all bytes that have been read since the most recent mark operation to be reread before new bytes are read from the stream to which the buffered input stream is chained.
Because the mark() and reset() methods are declared in InputStream and Reader, you might think every class supports those methods. However, that is not the case. Although BufferedInputStream and BufferedReader support mark() and reset(), many other input streams do not. Before calling those methods, find out if an input stream supports mark() and reset(), by calling the markSupported() method. If an input stream supports the mark() and reset() methods, markSupported() returns true.
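The following sketch exercises markSupported(), mark(), and reset() on a BufferedInputStream chained to an in-memory stream. MarkResetDemo is an illustrative name; PushbackInputStream appears only as an example of a standard class whose markSupported() method returns false.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

class MarkResetDemo {
    // Reads A, marks, reads B and C, resets, then reads again.
    static String readWithReset(byte[] data) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(data))) {
            if (!in.markSupported()) {
                throw new IOException("mark/reset not supported");
            }
            sb.append((char) in.read()); // A
            in.mark(16);                 // mark stays valid for up to 16 more bytes
            sb.append((char) in.read()); // B
            sb.append((char) in.read()); // C
            in.reset();                  // return to the marked position
            sb.append((char) in.read()); // B again
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readWithReset("ABCDEF".getBytes(StandardCharsets.US_ASCII))); // ABCB

        InputStream pb = new PushbackInputStream(new ByteArrayInputStream(new byte[0]));
        System.out.println(pb.markSupported()); // false
    }
}
```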
Data Stream Classes
A problem with the FileInputStream and FileOutputStream classes is that they work only at the byte level. What do you do when you need to read integers, write floating-point values, or read or write some other non-byte value from/to a file? The answer is to use Java's DataInputStream and DataOutputStream classes (also located in the java.io package).
As with the buffered stream classes, the data stream classes are designed so that their objects can be chained to other streams. However, you can only chain data stream objects to byte-oriented streams. For example, you can chain a data input stream to a FileInputStream object and call the data input stream's methods to read integer, floating-point, and other data items, but you cannot directly chain a data input stream object to a FileReader object.
For a glimpse of using DataOutputStream and DataInputStream to write and read non-byte-oriented data items to and from underlying FileOutputStream and FileInputStream objects, examine the DOSDISDemo source code in Listing 3.
Listing 3: DOSDISDemo.java.
    // DOSDISDemo.java

    import java.io.*;

    class DOSDISDemo {
        public static void main (String [] args) {
            DataOutputStream dos = null;

            try {
                FileOutputStream fos = new FileOutputStream ("data.dat");
                dos = new DataOutputStream (fos);

                dos.writeInt (256);
                dos.writeDouble (Math.PI);
                dos.writeUTF ("Java");
            } catch (IOException e) {
                System.out.println (e.getMessage ());
                return;
            } finally {
                if (dos != null)
                    try {
                        dos.close ();
                    } catch (IOException e) {
                    }
            }

            DataInputStream dis = null;

            try {
                FileInputStream fis = new FileInputStream ("data.dat");
                dis = new DataInputStream (fis);

                System.out.println (dis.readInt ());
                System.out.println (dis.readDouble ());
                System.out.println (dis.readUTF ());
            } catch (IOException e) {
                System.out.println (e.getMessage ());
                return;
            } finally {
                if (dis != null)
                    try {
                        dis.close ();
                    } catch (IOException e) {
                    }
            }
        }
    }
DOSDISDemo introduces the UTF concept by way of its writeUTF() and readUTF() method calls. UTF stands for Unicode Transformation Format, a family of encodings for efficiently storing and retrieving text characters. The format used by Java, a variant of UTF-8 known as modified UTF-8, works as follows:
All characters whose Unicode values range from \u0001 to \u007f are represented by a single byte, with the most significant bit set to 0.
The null character Unicode value (\u0000) and all characters whose Unicode values range from \u0080 to \u07ff are represented by two bytes, with the most significant three bits of the most significant byte being 1, 1, and 0 (in a left-to-right order), and the most significant two bits of the least significant byte being 1 and 0 (in a left-to-right order).
All characters whose Unicode values range from \u0800 to \uffff are represented by three bytes, with the most significant four bits of the most significant byte being 1, 1, 1 and 0 (in a left-to-right order) and the most significant two bits of each of the remaining two bytes being 1 and 0 (in a left-to-right order).
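To check the rules above, the following sketch encodes a few single-character strings with writeUTF() and prints the resulting bytes in hexadecimal. Note that writeUTF() first writes a two-byte length (the number of encoded bytes that follow), so every result starts with that prefix. ModifiedUtf8Demo is an illustrative name.

```java
import java.io.*;

class ModifiedUtf8Demo {
    // Returns the bytes writeUTF() produces for s, including the 2-byte length prefix.
    static byte[] encode(String s) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(baos)) {
            dos.writeUTF(s);
        }
        return baos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // One-byte, two-byte (null character and U+00E9), and three-byte (U+20AC) cases.
        String[] samples = {"A", "\u0000", "\u00e9", "\u20ac"};
        for (String s : samples) {
            StringBuilder hex = new StringBuilder();
            for (byte b : encode(s)) {
                hex.append(String.format("%02X ", b & 0xFF));
            }
            System.out.printf("U+%04X -> %s%n", (int) s.charAt(0), hex.toString().trim());
        }
    }
}
```

Running it should print 00 01 41 for A, 00 02 C0 80 for the null character (the two-byte form described above), 00 02 C3 A9 for U+00E9, and 00 03 E2 82 AC for U+20AC.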
When run, DOSDISDemo produces the following output:
    256
    3.141592653589793
    Java
NOTE
Objects created from the buffered stream or data stream classes are known as filter streams, because they filter (process) the bytes or characters that flow through them: a buffered input stream or data input stream processes bytes on their way in from the stream it is chained to, and a buffered output stream or data output stream processes bytes on their way out to the stream it is chained to. (Strictly speaking, BufferedInputStream, BufferedOutputStream, DataInputStream, and DataOutputStream are subclasses of FilterInputStream and FilterOutputStream, whereas BufferedReader and BufferedWriter extend Reader and Writer directly.) In addition to the buffered and data stream classes, Java's standard class library includes other classes that perform filtering operations.
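Filter streams can be stacked. A common arrangement chains a data stream to a buffered stream, which is in turn chained to a file stream, so that typed values are written with few underlying writes. This sketch applies that arrangement to the values from DOSDISDemo, with in-memory byte arrays standing in for data.dat; FilterChainDemo is an illustrative name.

```java
import java.io.*;

class FilterChainDemo {
    // Writes through DataOutputStream -> BufferedOutputStream -> ByteArrayOutputStream,
    // then reads back through DataInputStream -> BufferedInputStream -> ByteArrayInputStream.
    static String roundTrip() throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(new BufferedOutputStream(sink))) {
            dos.writeInt(256);
            dos.writeDouble(Math.PI);
            dos.writeUTF("Java");
        } // closing dos flushes the buffered stream and closes the whole chain

        try (DataInputStream dis = new DataInputStream(
                new BufferedInputStream(new ByteArrayInputStream(sink.toByteArray())))) {
            // Operands are evaluated left to right, so the values are read in write order.
            return dis.readInt() + " " + dis.readDouble() + " " + dis.readUTF();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip()); // 256 3.141592653589793 Java
    }
}
```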
Piped Stream Classes
Threads often need to communicate with one another, and piped streams provide one way for them to do so.
The idea behind piped streams is to connect a piped output stream to a piped input stream. Then, one thread writes data to the piped output stream and another thread reads that data by way of the piped input stream. The piped stream classes synchronize access to the pipe internally, but a pipe holds only a limited amount of data (1024 bytes or characters by default). When a writing thread fills the pipe, its write() call blocks until the reading thread removes data; no output is lost, but the writing thread stalls. To keep the writing thread from waiting longer than necessary, the reading thread must be responsive. To support piped streams, Java supplies the PipedInputStream, PipedReader, PipedOutputStream, and PipedWriter classes in its standard class library. (Those classes are located in the java.io package.)
CAUTION
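Deadlock might occur if a single thread uses a piped output stream connected to a piped input stream and performs both the writing and the reading on that pair of streams. For example, once the pipe fills, the thread blocks in write() waiting for a read that it can never perform.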
Creating a piped input stream connected to a piped output stream is not difficult, as the following code fragment attests:
    PipedWriter pw = new PipedWriter ();
    PipedReader pr = new PipedReader (pw);
The code fragment first creates a piped output stream (as represented by the PipedWriter object) and then creates a piped input stream (as represented by a PipedReader object) that binds itself to the piped output stream. When that's done, a writing thread can call pw.write() to output data to the piped output stream, whereas a reading thread can call pr.read() to read that output over its piped input stream.
Listing 4 presents source code to a PipedThreads application that demonstrates one thread piping output to another thread, via piped streams.
Listing 4: PipedThreads.java.
    // PipedThreads.java

    import java.io.*;

    class MyThread extends Thread {
        private PipedReader pr;
        private PipedWriter pw;

        MyThread (String name, PipedReader pr, PipedWriter pw) {
            super (name);
            this.pr = pr;
            this.pw = pw;
        }

        public void run () {
            try {
                if (getName ().equals ("src")) {
                    for (int i = 0; i < 15; i++)
                        pw.write ("src A" + i + "\n"); // src writes
                    pw.close ();
                } else {
                    int item;
                    while ((item = pr.read ()) != -1)
                        System.out.print ((char) item); // dst reads
                    pr.close ();
                }
            } catch (IOException e) {
                System.err.println (e); // report pipe errors instead of hiding them
            }
        }
    }

    class PipedThreads {
        public static void main (String [] args) throws IOException {
            PipedWriter pw = new PipedWriter ();
            PipedReader pr = new PipedReader (pw);

            MyThread mt1 = new MyThread ("src", pr, pw);
            MyThread mt2 = new MyThread ("dst", pr, pw);

            mt1.start ();

            try {
                Thread.sleep (2000);
            } catch (InterruptedException e) {
            }

            mt2.start ();
        }
    }
When you run PipedThreads, you will see the following output:
    src A0
    src A1
    src A2
    src A3
    src A4
    src A5
    src A6
    src A7
    src A8
    src A9
    src A10
    src A11
    src A12
    src A13
    src A14
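Listing 4 writes so little data that the pipe never fills. The following sketch writes 5,000 characters, well over the 1024-character default capacity of a PipedReader, and confirms that all of them arrive, because the writer blocks whenever the pipe is full. It also replaces the Thread.sleep() call with join(), so the calling thread waits exactly as long as the reader needs. PipeDemo and transfer() are illustrative names, and the example assumes Java 8 or later for the lambda.

```java
import java.io.*;
import java.util.concurrent.atomic.AtomicInteger;

class PipeDemo {
    // Sends n characters through a pipe from the calling thread to a reader
    // thread and returns how many characters the reader received.
    static int transfer(int n) throws IOException, InterruptedException {
        PipedWriter pw = new PipedWriter();
        PipedReader pr = new PipedReader(pw); // default capacity: 1024 chars
        AtomicInteger received = new AtomicInteger();

        Thread reader = new Thread(() -> {
            try (PipedReader in = pr) {
                while (in.read() != -1) {     // -1 once the writer has closed the pipe
                    received.incrementAndGet();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        });
        reader.start();

        try (PipedWriter out = pw) {
            for (int i = 0; i < n; i++) {
                out.write('x');               // blocks while the pipe is full
            }
        }                                     // closing signals end of stream to the reader
        reader.join();                        // wait for the reader to drain the pipe
        return received.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transfer(5000)); // 5000
    }
}
```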
TIP
For an additional example of piped streams, check out How to Use Pipe Streams in the Essential Java Classes trail of Sun's online Java Tutorial (http://java.sun.com/docs/books/tutorial/essential/io/pipedstreams.html).
Zip Stream Classes
Did you know that Java makes it easy to read and write Zip files? Zip support manifests itself in the standard class library by way of the ZipInputStream and ZipOutputStream filter stream classes, and other classes that (along with ZipInputStream and ZipOutputStream) are part of the java.util.zip package. By using those classes, it is possible to create a command-line version of the popular WinZip utility.
To give you a taste of working with the Zip stream classes, Listing 5 presents source code to a ZipReader application. That application uses ZipInputStream to retrieve all entries in a Zip file and prints each entry's name.
Listing 5: ZipReader.java.
    // ZipReader.java

    import java.io.*;
    import java.util.zip.*;

    class ZipReader {
        public static void main (String [] args) {
            if (args.length != 1) {
                System.out.println ("usage: java ZipReader pathname");
                return;
            }

            ZipInputStream zis = null;

            try {
                FileInputStream fis = new FileInputStream (args [0]);
                zis = new ZipInputStream (fis);

                ZipEntry ze;
                while ((ze = zis.getNextEntry ()) != null)
                    System.out.println (ze.getName ());
            } catch (IOException e) {
                System.out.println (e.getMessage ());
            } finally {
                // zis is still null if the FileInputStream constructor failed.
                if (zis != null)
                    try {
                        zis.close ();
                    } catch (IOException e) {
                    }
            }
        }
    }
To run ZipReader, you need access to either a Zip file or a Jar file (which is basically a Zip file with a .jar extension). For example, assuming the SDK's tools.jar file (found in the lib directory of JDK releases up to and including version 8) is placed in the same directory as ZipReader.class, issue java ZipReader tools.jar to obtain a list of all entries (such as package directories and class files) contained in that Jar file.
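Because ZipInputStream accepts any InputStream, the same entry-listing loop can be exercised without a file on disk. This sketch builds a small archive in memory with ZipOutputStream and then lists its entries; ZipRoundTrip, buildZip(), and entryNames() are illustrative names.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.*;

class ZipRoundTrip {
    // Builds an in-memory Zip archive containing one small entry per name.
    static byte[] buildZip(String... names) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(baos)) {
            for (String name : names) {
                zos.putNextEntry(new ZipEntry(name));
                zos.write(("contents of " + name).getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
            }
        } // close() writes the central directory that ends the archive
        return baos.toByteArray();
    }

    // Lists entry names the same way ZipReader does, but from a byte array.
    static List<String> entryNames(byte[] zip) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip))) {
            ZipEntry ze;
            while ((ze = zis.getNextEntry()) != null) {
                names.add(ze.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = buildZip("readme.txt", "docs/guide.txt");
        System.out.println(entryNames(zip)); // [readme.txt, docs/guide.txt]
    }
}
```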
TIP
For another example of Zip file extraction, check out Sun's Unpacking Zip Files TechTip (http://developer.java.sun.com/developer/TechTips/1998/tt0421.html).