20.5 A Quick Tour of the Stream Classes
The java.io package defines several types of streams. The stream types usually have input/output pairs, and most have both byte stream and character stream variants. Some of these streams define general behavioral properties. For example:
- Filter streams are abstract classes representing streams with some filtering operation applied as data is read or written by another stream. For example, a FilterReader object gets input from another Reader object, processes (filters) the characters in some manner, and returns the filtered result. You build sequences of filtered streams by chaining various filters into one large filter. Output can be filtered similarly (Section 20.5.2).
- Buffered streams add buffering so that read and write need not, for example, access the file system for every invocation. The character variants of these streams also add the notion of line-oriented text (Section 20.5.3).
- Piped streams are pairs such that, say, characters written to a PipedWriter can be read from a PipedReader (Section 20.5.4).
A group of streams, called in-memory streams, allow you to use in-memory data structures as the source or destination for a stream:
- ByteArray streams use a byte array (Section 20.5.5).
- CharArray streams use a char array (Section 20.5.6).
- String streams use string types (Section 20.5.7).
The I/O package also has input and output streams that have no output or input counterpart:
- The Print streams provide print and println methods for formatting printed data in human-readable text form (Section 20.5.8).
- LineNumberReader is a buffered reader that tracks the line numbers of the input (characters only) (Section 20.5.9).
- SequenceInputStream converts a sequence of InputStream objects into a single InputStream so that a list of concatenated input streams can be treated as a single input stream (bytes only) (Section 20.5.10).
There are also streams that are useful for building parsers:
- Pushback streams add a pushback buffer you can use to put back data when you have read too far (Section 20.5.11).
- The StreamTokenizer class breaks a Reader into a stream of tokens (recognizable "words") that are often needed when parsing user input (characters only) (Section 20.5.12).
These classes can be extended to create new kinds of stream classes for specific applications.
Each of these stream types is described in the following sections. Before looking at these streams in detail, however, you need to learn something about the synchronization behavior of the different streams.
20.5.1 Synchronization and Concurrency
Both the byte streams and the character streams define synchronization policies, though they do this in different ways. The concurrent behavior of the stream classes is not fully specified but can be broadly described as follows.
Each byte stream class synchronizes on the current stream object when performing operations that must be free from interference. This allows multiple threads to use the same streams yet still get well-defined behavior when invoking individual stream methods. For example, if two threads each try to read data from a stream in chunks of n bytes, then the data returned by each read operation will contain up to n bytes that appeared consecutively in the stream. Similarly, if two threads are writing to the same stream then the bytes written in each write operation will be sent consecutively to the stream, not intermixed at random points.
The character streams use a different synchronization strategy from the byte streams. The character streams synchronize on a protected lock field which, by default, is a reference to the stream object itself. However, both Reader and Writer provide a protected constructor that takes an object for lock to refer to. Some subclasses set the lock field to refer to a different object. For example, the StringWriter class, which writes its characters into a StringBuffer object, sets its lock object to be the StringBuffer object. If you are writing a reader or writer, you should set the lock field to a more suitable object when the default is not appropriate. Conversely, if you are extending an existing reader or writer, you should always synchronize on lock and not this.
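As a minimal sketch of the last point, the hypothetical CountingWriter below extends FilterWriter and guards its own counter with the inherited lock field rather than with this. The class and method names are illustrative assumptions, not part of java.io.

```java
import java.io.FilterWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical example class: counts the characters written through it.
class CountingWriter extends FilterWriter {
    private long count;     // guarded by lock

    CountingWriter(Writer out) {
        super(out);         // FilterWriter passes out to Writer(Object lock)
    }

    @Override
    public void write(int c) throws IOException {
        synchronized (lock) {       // lock, not this
            super.write(c);
            count++;
        }
    }

    @Override
    public void write(char[] buf, int off, int len) throws IOException {
        synchronized (lock) {
            super.write(buf, off, len);
            count += len;
        }
    }

    @Override
    public void write(String s, int off, int len) throws IOException {
        synchronized (lock) {
            super.write(s, off, len);
            count += len;
        }
    }

    long getCount() {
        synchronized (lock) {
            return count;
        }
    }
}

class CountingWriterDemo {
    // Writes six characters through the counting writer.
    static long demo() throws IOException {
        CountingWriter w = new CountingWriter(new StringWriter());
        w.write("hello");
        w.write('!');
        return w.getCount();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());     // prints 6
    }
}
```

Because the write and the counter update happen inside one synchronized block on lock, another thread using the same lock cannot observe a count that disagrees with what has been written.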
In many cases, a particular stream object simply wraps another stream instance and delegates the main stream methods to that instance, forming a chain of connected streams, as is the case with Filter streams. In this case, the synchronization behavior of the method will depend on the ultimate stream object being wrapped. This will only become an issue if the wrapping class needs to perform some additional action that must occur atomically with respect to the main stream action. In most cases filter streams simply manipulate data before writing it to, or after reading it from, the wrapped stream, so synchronization is not an issue.
Most input operations will block until data is available, and it is also possible that output stream operations can block trying to write data—the ultimate source or destination could be a stream tied to a network socket. To make the threads performing this blocking I/O more responsive to cancellation requests an implementation may respond to Thread interrupt requests (see page 365) by unblocking the thread and throwing an InterruptedIOException. This exception can report the number of bytes transferred before the interruption occurred—if the code that throws it sets the value.
For single byte transfers, interrupting an I/O operation is quite straightforward. In general, however, the state of a stream after a thread using it is interrupted is problematic. For example, suppose you use a particular stream to read HTTP requests across the network. If a thread reading the next request is interrupted after reading two bytes of the header field in the request packet, the next thread reading from that stream will get invalid data unless the stream takes steps to prevent this. Given the effort involved in writing classes that can deal effectively with these sorts of situations, most implementations do not allow a thread to be interrupted until the main I/O operation has completed, so you cannot rely on blocking I/O being interruptible. The interruptible channels provided in the java.nio package support interruption by closing the stream when any thread using the stream is interrupted. This ensures that there are no issues about what would next be read.
Even when interruption cannot be responded to during an I/O operation many systems will check for interruption at the start and/or end of the operation and throw the InterruptedIOException then. Also, if a thread is blocked on a stream when the stream is closed by another thread, most implementations will unblock the blocked thread and throw an IOException.
20.5.2 Filter Streams
Filter streams—FilterInputStream, FilterOutputStream, FilterReader, and FilterWriter—help you chain streams to produce composite streams of greater utility. Each filter stream is bound to another stream to which it delegates the actual input or output actions. Filter streams get their power from the ability to filter—process—what they read or write, transforming the data in some way.
Filter byte streams add new constructors that accept a stream of the appropriate type (input or output) to which to connect. Filter character streams similarly add a new constructor that accepts a character stream of the appropriate type (reader or writer). However, many character streams already have constructors that take another character stream, so those Reader and Writer classes can act as filters even if they do not extend FilterReader or FilterWriter.
The following shows an input filter that converts characters to uppercase:
public class UppercaseConvertor extends FilterReader {
    public UppercaseConvertor(Reader in) {
        super(in);
    }

    public int read() throws IOException {
        int c = super.read();
        return (c == -1 ? c : Character.toUpperCase((char)c));
    }

    public int read(char[] buf, int offset, int count)
        throws IOException
    {
        int nread = super.read(buf, offset, count);
        int last = offset + nread;
        for (int i = offset; i < last; i++)
            buf[i] = Character.toUpperCase(buf[i]);
        return nread;
    }
}
We override each of the read methods to perform the actual read and then convert the characters to upper case. The actual reading is done by invoking an appropriate superclass method. Note that we don't invoke read on the stream in itself, because this would bypass any filtering performed by our superclass. Note also that we have to watch for the end of the stream. In the case of the no-arg read this means an explicit test, but in the array version of read, a return value of -1 will prevent the for loop from executing. In the array version of read we also have to be careful to convert to uppercase only those characters that we stored in the buffer.
We can use our uppercase convertor as follows:
public static void main(String[] args)
    throws IOException
{
    StringReader src = new StringReader(args[0]);
    FilterReader f = new UppercaseConvertor(src);
    int c;
    while ((c = f.read()) != -1)
        System.out.print((char)c);
    System.out.println();
}
We use a string as our data source by using a StringReader (see Section 20.5.7 on page 523). The StringReader is then wrapped by our UppercaseConvertor. Reading from the filtered stream converts all the characters from the string stream into uppercase. For the input "no lowercase" we get the output:
NO LOWERCASE
You can chain any number of Filter byte or character streams. The original source of input can be a stream that is not a Filter stream. You can use an InputStreamReader to convert a byte input stream to a character input stream.
Filter output streams can be chained similarly so that data written to one stream will filter and write data to the next output stream. All the streams, from the first to the next-to-last, must be Filter output stream objects, but the last stream can be any kind of output stream. You can use an OutputStreamWriter to convert a character output stream to a byte output stream.
Not all classes that are Filter streams actually alter the data. Some classes are behavioral filters, such as the buffered streams you'll learn about next, while others provide a new interface for using the streams, such as the print streams. These classes are Filter streams because they can form part of a filter chain.
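The following sketch shows one way such chains can cross between bytes and characters. It writes text through an OutputStreamWriter into a byte array and reads it back through an InputStreamReader wrapped in a BufferedReader. The class name ChainDemo, the choice of UTF-8, and the use of try-with-resources (Java 7 or later) are assumptions made for the illustration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class ChainDemo {
    // Encodes text to bytes through one chain, then decodes the first
    // line back to characters through another chain.
    static String roundTrip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();

        // Output chain: characters -> OutputStreamWriter -> byte array
        try (Writer out =
                 new OutputStreamWriter(bytes, StandardCharsets.UTF_8)) {
            out.write(text);
        }   // closing the writer flushes the encoded bytes downstream

        // Input chain: byte array -> InputStreamReader -> BufferedReader
        try (BufferedReader in = new BufferedReader(
                 new InputStreamReader(
                     new ByteArrayInputStream(bytes.toByteArray()),
                     StandardCharsets.UTF_8))) {
            return in.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip("h\u00e9llo"));
    }
}
```

Only the innermost stream of each chain knows about the byte array. The outer streams see just another stream, which is what lets the same chain be reused over a file or a socket.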
Exercise 20.2 : Rewrite the TranslateByte class as a filter.
Exercise 20.3 : Create a pair of Filter stream classes that encrypt bytes using any algorithm you choose—such as XORing the bytes with some value—with your DecryptInputStream able to decrypt the bytes that your EncryptOutputStream class creates.
Exercise 20.4 : Create a subclass of FilterReader that will return one line of input at a time via a method that blocks until a full line of input is available.
20.5.3 Buffered Streams
The Buffered stream classes—BufferedInputStream, BufferedOutputStream, BufferedReader, and BufferedWriter—buffer their data to avoid every read or write going directly to the next stream. These classes are often used in conjunction with File streams—accessing a disk file is much slower than using a memory buffer, and buffering helps reduce file accesses.
Each of the Buffered streams supports two constructors: One takes a reference to the wrapped stream and the size of the buffer to use, while the other only takes a reference to the wrapped stream and uses a default buffer size.
When read is invoked on an empty Buffered input stream, it invokes read on its source stream, fills the buffer with as much data as is available—only blocking if it needs the data being waited for—and returns the requested data from that buffer. Future read invocations return data from that buffer until its contents are exhausted, and that causes another read on the source stream. This process continues until the source stream is exhausted.
Buffered output streams behave similarly. When a write fills the buffer, the destination stream's write is invoked to empty the buffer. This buffering can turn many small write requests on the Buffered stream into a single write request on the underlying destination.
Here is how to create a buffered output stream to write bytes to a file:
new BufferedOutputStream(new FileOutputStream(path));
You create a FileOutputStream with the path, put a BufferedOutputStream in front of it, and use the buffered stream object. This scheme enables you to buffer output destined for the file.
You must retain a reference to the FileOutputStream object if you want to invoke methods on it later because there is no way to obtain the downstream object from a Filter stream. However, you should rarely need to work with the downstream object. If you do keep a reference to a downstream object, you must ensure that the first upstream object is flushed before operating on the downstream object because data written to upper streams may not have yet been written all the way downstream. Closing an upstream object also closes all downstream objects, so a retained reference may cease to be usable.
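A small sketch can make the flushing requirement concrete. It uses a ByteArrayOutputStream as the downstream object so that its size can be inspected. The class name FlushDemo and the 64-byte buffer size are arbitrary choices for the illustration.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

class FlushDemo {
    // Returns { downstream size before flush, downstream size after flush }.
    static int[] sizes() throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        BufferedOutputStream out = new BufferedOutputStream(sink, 64);

        out.write(new byte[] { 1, 2, 3 });
        int before = sink.size();   // the bytes are still in out's buffer

        out.flush();                // push them downstream
        int after = sink.size();

        return new int[] { before, after };
    }

    public static void main(String[] args) throws IOException {
        int[] s = sizes();
        System.out.println(s[0] + " then " + s[1]);   // prints 0 then 3
    }
}
```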
The Buffered character streams also understand lines of text. The newLine method of BufferedWriter writes a line separator to the stream. Each system defines what constitutes a line separator in the system String property line.separator, which need not be a single character. You should use newLine to end lines in text files that may be read by humans on the local system (see "System Properties" on page 663).
The method readLine in BufferedReader returns a line of text as a String. The method readLine accepts any of the standard set of line separators: line feed (\n), carriage return (\r), or carriage return followed by line feed (\r\n). This implies that you should never set line.separator to use any other sequence. Otherwise, lines terminated by newLine would not be recognized by readLine. The string returned by readLine does not include the line separator. If the end of stream is encountered before a line separator, then the text read to that point is returned. If only the end of stream is encountered readLine returns null.
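The line-handling rules above can be sketched as follows. The class LineDemo and its two helper methods are hypothetical names introduced for the illustration. The example uses string streams so that it needs no files.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;

class LineDemo {
    // Writes each string followed by the platform line separator.
    static String writeLines(String... lines) throws IOException {
        StringWriter sink = new StringWriter();
        BufferedWriter out = new BufferedWriter(sink);
        for (String line : lines) {
            out.write(line);
            out.newLine();
        }
        out.flush();    // push buffered characters into sink
        return sink.toString();
    }

    // Splits text into lines; readLine returns null at end of stream.
    static List<String> readLines(String text) throws IOException {
        BufferedReader in = new BufferedReader(new StringReader(text));
        List<String> lines = new ArrayList<String>();
        String line;
        while ((line = in.readLine()) != null)
            lines.add(line);
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // All three standard separators are accepted, and the final
        // unterminated text is still returned as a line.
        System.out.println(readLines("a\nb\r\nc\rd"));  // prints [a, b, c, d]
        System.out.println(readLines(writeLines("x", "y")));
    }
}
```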
20.5.4 Piped Streams
Piped streams—PipedInputStream, PipedOutputStream, PipedReader, and PipedWriter—are used as input/output pairs; data written on the output stream of a pair is the data read on the input stream. The pipe maintains an internal buffer with an implementation-defined capacity that allows writing and reading to proceed at different rates—there is no way to control the size of the buffer.
Pipes provide an I/O-based mechanism for communicating data between different threads. The only safe way to use Piped streams is with two threads: one for reading and one for writing. Writing on one end of the pipe blocks the thread when the pipe fills up. If the writer and reader are the same thread, that thread will block permanently. Reading from a pipe blocks the thread if no input is available.
To avoid blocking a thread forever when its counterpart at the other end of the pipe terminates, each pipe keeps track of the identity of the most recent reader and writer threads. The pipe checks to see that the thread at the other end is alive before blocking the current thread. If the thread at the other end has terminated, the current thread will get an IOException.
The following example uses a pipe stream to connect a TextGenerator thread with a thread that wants to read the generated text. First, the text generator:
class TextGenerator extends Thread {
    private Writer out;

    public TextGenerator(Writer out) {
        this.out = out;
    }

    public void run() {
        try {
            try {
                for (char c = 'a'; c <= 'z'; c++)
                    out.write(c);
            } finally {
                out.close();
            }
        } catch (IOException e) {
            getUncaughtExceptionHandler().
                uncaughtException(this, e);
        }
    }
}
The TextGenerator simply writes to the output stream passed to its constructor. In the example that stream will actually be a piped stream to be read by the main thread:
class Pipe {
    public static void main(String[] args)
        throws IOException
    {
        PipedWriter out = new PipedWriter();
        PipedReader in = new PipedReader(out);
        TextGenerator data = new TextGenerator(out);
        data.start();
        int ch;
        while ((ch = in.read()) != -1)
            System.out.print((char) ch);
        System.out.println();
    }
}
We create the Piped streams, making the PipedWriter a parameter to the constructor for the PipedReader. The order is unimportant: The input pipe could be a parameter to the output pipe. What is important is that an input/output pair be attached to each other. We create the new TextGenerator object, with the PipedWriter as the output stream for the generated characters. Then we loop, reading characters from the text generator and writing them to the system output stream. At the end, we make sure that the last line of output is terminated.
Piped streams need not be connected when they are constructed—there is a no-arg constructor—but can be connected at a later stage via the connect method. PipedReader.connect takes a PipedWriter parameter and vice versa. As with the constructor, it does not matter whether you connect x to y, or y to x, the result is the same. Trying to use a Piped stream before it is connected or trying to connect it when it is already connected results in an IOException.
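A minimal sketch of late connection, assuming Java 8 or later for the lambda syntax. The class name ConnectDemo and its methods are hypothetical. The writer side runs in its own thread, as the discussion above requires.

```java
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;

class ConnectDemo {
    // Connects two unconnected pipe ends, then sends text across.
    static String demo() throws IOException, InterruptedException {
        PipedWriter out = new PipedWriter();
        PipedReader in = new PipedReader();
        in.connect(out);    // same effect as out.connect(in)

        Thread producer = new Thread(() -> {
            try {
                try {
                    out.write("piped");
                } finally {
                    out.close();    // lets the reader see end of stream
                }
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        });
        producer.start();

        StringBuilder text = new StringBuilder();
        int ch;
        while ((ch = in.read()) != -1)
            text.append((char) ch);
        producer.join();
        in.close();
        return text.toString();
    }

    // Reports whether connecting an already connected pipe fails.
    static boolean secondConnectFails() {
        try {
            PipedWriter out = new PipedWriter();
            PipedReader in = new PipedReader(out);  // connected here
            in.connect(out);                        // second attempt
            return false;
        } catch (IOException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());                 // prints piped
        System.out.println(secondConnectFails());   // prints true
    }
}
```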
20.5.5 ByteArray Byte Streams
You can use arrays of bytes as the source or destination of byte streams by using ByteArray streams. The ByteArrayInputStream class uses a byte array as its input source, and reading on it can never block. It has two constructors:
- public ByteArrayInputStream(byte[] buf, int offset, int count)
  Creates a ByteArrayInputStream from the specified array of bytes using only the part of buf from buf[offset] to buf[offset+count-1] or the end of the array, whichever is smaller. The input array is used directly, not copied, so you should take care not to modify it while it is being used as an input source.
- public ByteArrayInputStream(byte[] buf)
  Equivalent to ByteArrayInputStream(buf, 0, buf.length).
The ByteArrayOutputStream class provides a dynamically growing byte array to hold output. It adds constructors and methods:
- public ByteArrayOutputStream()
  Creates a new ByteArrayOutputStream with a default initial array size.
- public ByteArrayOutputStream(int size)
  Creates a new ByteArrayOutputStream with the given initial array size.
- public int size()
  Returns the number of bytes generated thus far by output to the stream.
- public byte[] toByteArray()
  Returns a copy of the bytes generated thus far by output to the stream. When you are finished writing into a ByteArrayOutputStream via upstream filter streams, you should flush the upstream objects before using toByteArray.
- public void reset()
  Resets the stream to reuse the current buffer, discarding its contents.
- public String toString()
  Returns the current contents of the buffer as a String, translating bytes into characters according to the default character encoding.
- public String toString(String enc) throws UnsupportedEncodingException
  Returns the current contents of the buffer as a String, translating bytes into characters according to the specified character encoding. If the encoding is not supported an UnsupportedEncodingException is thrown.
- public void writeTo(OutputStream out) throws IOException
  Writes the current contents of the buffer to the stream out.
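The methods above might be exercised as in the following sketch. The class name ByteArrayDemo and the summary string it builds are assumptions chosen only to make the effects visible.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

class ByteArrayDemo {
    static String demo() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        buf.write(new byte[] { 72, 105 }, 0, 2);    // "Hi" in ASCII

        byte[] snapshot = buf.toByteArray();        // a copy: 2 bytes
        buf.write('!');                             // snapshot is unaffected

        ByteArrayOutputStream copy = new ByteArrayOutputStream();
        buf.writeTo(copy);                          // copy now holds 3 bytes
        buf.reset();                                // buf is empty again

        // Read only the last two bytes: offset 1, count 2.
        ByteArrayInputStream in =
            new ByteArrayInputStream(copy.toByteArray(), 1, 2);
        char first = (char) in.read();              // 'i'
        char second = (char) in.read();             // '!'
        int end = in.read();                        // -1 at end of input

        return snapshot.length + ":" + copy.size() + ":" + buf.size()
            + ":" + first + second + ":" + end;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());     // prints 2:3:0:i!:-1
    }
}
```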
20.5.6 CharArray Character Streams
The CharArray character streams are analogous to the ByteArray byte streams—they let you use char arrays as a source or destination without ever blocking. You construct CharArrayReader objects with an array of char:
- public CharArrayReader(char[] buf, int offset, int count)
  Creates a CharArrayReader from the specified array of characters using only the subarray of buf from buf[offset] to buf[offset+count-1] or the end of the array, whichever is smaller. The input array is used directly, not copied, so you should take care not to modify it while it is being used as an input source.
- public CharArrayReader(char[] buf)
  Equivalent to CharArrayReader(buf, 0, buf.length).
The CharArrayWriter class provides a dynamically growing char array to hold output. It adds constructors and methods:
- public CharArrayWriter()
  Creates a new CharArrayWriter with a default initial array size.
- public CharArrayWriter(int size)
  Creates a new CharArrayWriter with the given initial array size.
- public int size()
  Returns the number of characters generated thus far by output to the stream.
- public char[] toCharArray()
  Returns a copy of the characters generated thus far by output to the stream. When you are finished writing into a CharArrayWriter via upstream filter streams, you should flush the upstream objects before using toCharArray.
- public void reset()
  Resets the stream to reuse the current buffer, discarding its contents.
- public String toString()
  Returns the current contents of the buffer as a String.
- public void writeTo(Writer out) throws IOException
  Writes the current contents of the buffer to the stream out.
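The difference between the reader using its array directly and the writer handing out copies can be seen in a short sketch. The class name CharArrayDemo is a hypothetical name for the illustration.

```java
import java.io.CharArrayReader;
import java.io.CharArrayWriter;
import java.io.IOException;

class CharArrayDemo {
    static String demo() throws IOException {
        char[] data = { 'a', 'b', 'c', 'd' };

        // Reads only data[1] and data[2].
        CharArrayReader in = new CharArrayReader(data, 1, 2);

        // The array is not copied, so this change is seen by the reader.
        data[2] = 'X';

        CharArrayWriter out = new CharArrayWriter();
        int ch;
        while ((ch = in.read()) != -1)
            out.write(ch);

        // toCharArray returns a copy, so this change is not seen by out.
        char[] copy = out.toCharArray();
        copy[0] = '?';

        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());     // prints bX
    }
}
```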
20.5.7 String Character Streams
The StringReader reads its characters from a String and will never block. It provides a single constructor that takes the string from which to read. For example, the following program factors numbers read either from the command line or System.in:
class Factor {
    public static void main(String[] args) {
        if (args.length == 0) {
            factorNumbers(new InputStreamReader(System.in));
        } else {
            for (String str : args) {
                StringReader in = new StringReader(str);
                factorNumbers(in);
            }
        }
    }
    // ... definition of factorNumbers ...
}
If the command is invoked without parameters, factorNumbers parses numbers from the standard input stream. When the command line contains some arguments, a StringReader is created for each parameter, and factorNumbers is invoked on each one. The parameter to factorNumbers is a stream of characters containing numbers to be parsed; it does not know whether they come from the command line or from standard input.
StringWriter lets you write results into a buffer that can be retrieved as a String or StringBuffer object. It adds the following constructors and methods:
- public StringWriter()
  Creates a new StringWriter with a default initial buffer size.
- public StringWriter(int size)
  Creates a new StringWriter with the specified initial buffer size. Providing a good initial size estimate for the buffer will improve performance in many cases.
- public StringBuffer getBuffer()
  Returns the actual StringBuffer being used by this stream. Because the actual StringBuffer is returned, you should take care not to modify it while it is being used as an output destination.
- public String toString()
  Returns the current contents of the buffer as a String.
The following code uses a StringWriter to create a string that contains the output of a series of println calls on the contents of an array:
public static String arrayToStr(Object[] objs) {
    StringWriter strOut = new StringWriter();
    PrintWriter out = new PrintWriter(strOut);
    for (int i = 0; i < objs.length; i++)
        out.println(i + ": " + objs[i]);
    return strOut.toString();
}
20.5.8 Print Streams
The Print streams—PrintStream and PrintWriter—provide methods that make it easy to write the values of primitive types and objects to a stream, in a human-readable text format—as you have seen in many examples. The Print streams provide print and println methods for the following types:
char | int | float | Object | boolean
char[] | long | double | String
These methods are much more convenient than the raw stream write methods. For example, given a float variable f and a PrintStream reference out, the call out.print(f) is equivalent to
out.write(String.valueOf(f).getBytes());
The println method appends a line separator after writing its argument to the stream—a simple println with no parameters ends the current line. The line separator string is defined by the system property line.separator and is not necessarily a single newline character (\n).
Each of the Print streams acts as a Filter stream, so you can filter data on its way downstream.
The PrintStream class acts on byte streams while the PrintWriter class acts on character streams. Because printing is clearly character-related output, the PrintWriter class is the class you should use. However, for historical reasons System.out and System.err are PrintStreams that use the default character set encoding—these are the only PrintStream objects you should use. We describe only the PrintWriter class, though PrintStream provides essentially the same interface.
PrintWriter has eight constructors.
- public PrintWriter(Writer out, boolean autoflush)
  Creates a new PrintWriter that will write to the stream out. If autoflush is true, println invokes flush. Otherwise, println invocations are treated like any other method, and flush is not invoked. Autoflush behavior cannot be changed after the stream is constructed.
- public PrintWriter(Writer out)
  Equivalent to PrintWriter(out, false).
- public PrintWriter(OutputStream out, boolean autoflush)
  Equivalent to PrintWriter(new OutputStreamWriter(out), autoflush).
- public PrintWriter(OutputStream out)
  Equivalent to PrintWriter(new OutputStreamWriter(out), false).
- public PrintWriter(File file) throws FileNotFoundException
  Equivalent to PrintWriter(new OutputStreamWriter(fos)), where fos is a FileOutputStream created with the given file.
- public PrintWriter(File file, String enc) throws FileNotFoundException, UnsupportedEncodingException
  Equivalent to PrintWriter(new OutputStreamWriter(fos, enc)), where fos is a FileOutputStream created with the given file.
- public PrintWriter(String filename) throws FileNotFoundException
  Equivalent to PrintWriter(new OutputStreamWriter(fos)), where fos is a FileOutputStream created with the given file name.
- public PrintWriter(String filename, String enc) throws FileNotFoundException, UnsupportedEncodingException
  Equivalent to PrintWriter(new OutputStreamWriter(fos, enc)), where fos is a FileOutputStream created with the given file name.
The Print streams implement the Appendable interface which allows them to be targets for a Formatter. Additionally, the following convenience methods are provided for formatted output—see "Formatter" on page 624 for details:
- public PrintWriter format(String format, Object... args)
  Acts like new Formatter(this).format(format, args), but a new Formatter need not be created for each call. The current PrintWriter is returned.
- public PrintWriter format(Locale l, String format, Object... args)
  Acts like new Formatter(this, l).format(format, args), but a new Formatter need not be created for each call. The current PrintWriter is returned. Locales are described in Chapter 24.
There are two printf methods that behave exactly the same as the format methods—printf stands for "print formatted" and is an old friend from the C programming language.
One important characteristic of the Print streams is that none of the output methods throw IOException. If an error occurs while writing to the underlying stream the methods simply return normally. You should check whether an error occurred by invoking the boolean method checkError—this flushes the stream and checks its error state. Once an error has occurred, there is no way to clear it. If any of the underlying stream operations result in an InterruptedIOException, the error state is not set, but instead the current thread is re-interrupted using Thread.currentThread().interrupt().
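The sketch below illustrates both formatted output and the error-reporting behavior just described. The class PrintDemo and its deliberately failing anonymous Writer are hypothetical, introduced only to provoke an error without touching the file system.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Locale;

class PrintDemo {
    // Formats into a string; Locale.ROOT keeps the decimal point fixed.
    static String formatted() {
        StringWriter sink = new StringWriter();
        PrintWriter out = new PrintWriter(sink);
        out.printf(Locale.ROOT, "%d items cost %.2f", 3, 4.5);
        out.flush();
        return sink.toString();
    }

    // Writes to a stream that always fails and reports the error state.
    static boolean errorAfterFailure() {
        Writer failing = new Writer() {
            @Override
            public void write(char[] buf, int off, int len)
                throws IOException
            {
                throw new IOException("simulated failure");
            }
            @Override public void flush() { }
            @Override public void close() { }
        };
        PrintWriter out = new PrintWriter(failing);
        out.print("lost");          // returns normally despite the failure
        return out.checkError();    // the only way to learn of the error
    }

    public static void main(String[] args) {
        System.out.println(formatted());    // prints 3 items cost 4.50
        System.out.println(errorAfterFailure());    // prints true
    }
}
```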
20.5.9 LineNumberReader
The LineNumberReader stream keeps track of line numbers while reading text. As usual, a line is considered to be terminated by any one of a line feed (\n), a carriage return (\r), or a carriage return followed immediately by a line feed (\r\n).
The following program prints the line number where the first instance of a particular character is found in a file:
import java.io.*;

class FindChar {
    public static void main(String[] args)
        throws IOException
    {
        if (args.length != 2)
            throw new IllegalArgumentException(
                                      "need char and file");

        int match = args[0].charAt(0);
        FileReader fileIn = new FileReader(args[1]);
        LineNumberReader in = new LineNumberReader(fileIn);
        int ch;
        while ((ch = in.read()) != -1) {
            if (ch == match) {
                System.out.println("'" + (char)ch +
                    "' at line " + in.getLineNumber());
                return;
            }
        }
        System.out.println((char)match + " not found");
    }
}
This program creates a FileReader named fileIn to read from the named file and then inserts a LineNumberReader, named in, before it. LineNumberReader objects get their characters from the reader they are attached to, keeping track of line numbers as they read. The getLineNumber method returns the current line number; by default, lines are counted starting from zero. When this program is run on itself looking for the letter 'I', its output is
'I' at line 4
You can set the current line number with setLineNumber. This could be useful, for example, if you have a file that contains several sections of information. You could use setLineNumber to reset the line number to 1 at the start of each section so that problems would be reported to the user based on the line numbers within the section instead of within the file.
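One possible shape for the per-section numbering idea is sketched here. The "%%" separator line, the class name SectionLines, and the locate method are all assumptions made for the example. Because getLineNumber counts the lines read so far, the sketch sets the number to 0 after a separator so that the first line of the next section is reported as line 1.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.StringReader;

class SectionLines {
    // Finds word and reports its position relative to its section.
    // Sections are separated by a line containing only "%%" (an
    // assumption of this sketch, not a LineNumberReader convention).
    static String locate(String text, String word) throws IOException {
        LineNumberReader in =
            new LineNumberReader(new StringReader(text));
        int section = 1;
        String line;
        while ((line = in.readLine()) != null) {
            if (line.equals("%%")) {
                section++;
                in.setLineNumber(0);    // next line read becomes line 1
                continue;
            }
            if (line.contains(word))
                return "section " + section
                    + ", line " + in.getLineNumber();
        }
        return null;    // word not found
    }

    public static void main(String[] args) throws IOException {
        String text = "alpha\nbeta\n%%\ngamma\ndelta\n";
        System.out.println(locate(text, "delta"));
        // prints section 2, line 2
    }
}
```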
LineNumberReader is a BufferedReader that has two constructors: One takes a reference to the wrapped stream and the size of the buffer to use, while the other only takes a reference to the wrapped stream and uses a default buffer size.
Exercise 20.5 : Write a program that reads a specified file and searches for a specified word, printing each line number and line in which the word is found.
20.5.10 SequenceInputStream
The SequenceInputStream class creates a single input stream from reading one or more byte input streams, reading the first stream until its end of input and then reading the next one, and so on through the last one. SequenceInputStream has two constructors: one for the common case of two input streams that are provided as the two parameters to the constructor, and the other for an arbitrary number of input streams using the Enumeration abstraction (described in "Enumeration" on page 617). Enumeration is an interface that provides an ordered iteration through a list of objects. For SequenceInputStream, the enumeration should contain only InputStream objects. If it contains anything else, a ClassCastException will be thrown when the SequenceInputStream tries to get that object from the list.
The following example program concatenates all its input to create a single output. This program is similar to a simple version of the UNIX utility cat—if no files are named, the input is simply forwarded to the output. Otherwise, the program opens all the files and uses a SequenceInputStream to model them as a single stream. Then the program writes its input to its output:
import java.io.*;
import java.util.*;

class Concat {
    public static void main(String[] args)
        throws IOException
    {
        InputStream in; // stream to read characters from
        if (args.length == 0) {
            in = System.in;
        } else {
            InputStream fileIn, bufIn;
            List<InputStream> inputs =
                new ArrayList<InputStream>(args.length);
            for (String arg : args) {
                fileIn = new FileInputStream(arg);
                bufIn = new BufferedInputStream(fileIn);
                inputs.add(bufIn);
            }
            Enumeration<InputStream> files =
                Collections.enumeration(inputs);
            in = new SequenceInputStream(files);
        }
        int ch;
        while ((ch = in.read()) != -1)
            System.out.write(ch);
    }
    // ...
}
If there are no parameters, we use System.in for input. If there are parameters, we create an ArrayList large enough to hold as many BufferedInputStream objects as there are command-line arguments (see "ArrayList" on page 582). Then we create a stream for each named file and add the stream to the inputs list. When the loop is finished, we use the Collections class's enumeration method to get an Enumeration object for the list elements. We use this Enumeration in the constructor for SequenceInputStream to create a single stream that concatenates all the streams for the files into a single InputStream object. A simple loop then reads all the bytes from that stream and writes them on System.out.
You could instead write your own implementation of Enumeration whose nextElement method creates a FileInputStream for each argument on demand, closing the previous stream, if any.
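One possible shape for such an enumeration is sketched below. The class name LazyFileEnumeration is hypothetical. Because nextElement cannot throw a checked exception, this sketch wraps any IOException in an UncheckedIOException, which is one choice among several. The explicit close of the previous stream is a precaution; in the implementations we are aware of, SequenceInputStream already closes each stream when it moves on to the next.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Enumeration;
import java.util.NoSuchElementException;

// Hypothetical helper, not part of java.io: opens each named file
// only when the consumer asks for the next stream.
class LazyFileEnumeration implements Enumeration<InputStream> {
    private final String[] names;   // files still to be opened
    private int next = 0;           // index of the next file name
    private InputStream previous;   // most recently returned stream

    LazyFileEnumeration(String... names) {
        this.names = names.clone();
    }

    public boolean hasMoreElements() {
        return next < names.length;
    }

    public InputStream nextElement() {
        if (!hasMoreElements())
            throw new NoSuchElementException();
        try {
            if (previous != null)
                previous.close();   // harmless if already closed
            previous = new FileInputStream(names[next++]);
            return previous;
        } catch (IOException e) {
            // nextElement cannot declare a checked exception
            throw new UncheckedIOException(e);
        }
    }
}
```

In Concat, the list-building loop could then be replaced by `in = new SequenceInputStream(new LazyFileEnumeration(args));`, so that only one file is open at a time.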
20.5.11 Pushback Streams
A Pushback stream lets you push back, or "unread," characters or bytes when you have read too far. Pushback is typically useful for breaking input into tokens. Lexical scanners, for example, often know that a token (such as an identifier) has ended only when they have read the first character that follows it. Having seen that character, the scanner must push it back onto the input stream so that it is available as the start of the next token. The following example uses PushbackInputStream to report the longest consecutive sequence of any single byte in its input:
    import java.io.*;

    class SequenceCount {
        public static void main(String[] args)
            throws IOException
        {
            PushbackInputStream
                in = new PushbackInputStream(System.in);
            int max = 0;    // longest sequence found
            int maxB = -1;  // the byte in that sequence
            int b;          // current byte in input

            do {
                int cnt;
                int b1 = in.read(); // 1st byte in sequence
                if (b1 == -1)       // empty input: nothing to count
                    break;
                for (cnt = 1; (b = in.read()) == b1; cnt++)
                    continue;
                if (cnt > max) {
                    max = cnt;  // remember length
                    maxB = b1;  // remember which byte value
                }
                if (b != -1)
                    in.unread(b);   // pushback start of next seq
            } while (b != -1);  // until we hit end of input

            System.out.println(max + " bytes of " + maxB);
        }
    }
We know that we have reached the end of one sequence only when we read the first byte of the next sequence. We push this byte back using unread so that it is read again when we repeat the do loop for the next sequence.
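The token-scanning use of pushback described at the start of this section can be sketched with PushbackReader. The class IdentifierScanner and its method readIdentifier are hypothetical names, and "identifier" is simplified here to mean a run of letters and digits.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;

// Hypothetical scanner fragment, for illustration only.
class IdentifierScanner {
    // Reads one run of letters and digits. The character that ends the
    // run is pushed back so it becomes the start of the next token.
    static String readIdentifier(PushbackReader in) throws IOException {
        StringBuilder word = new StringBuilder();
        int c;
        while ((c = in.read()) != -1 && Character.isLetterOrDigit(c))
            word.append((char) c);
        if (c != -1)
            in.unread(c);   // not part of this token: put it back
        return word.toString();
    }

    public static void main(String[] args) throws IOException {
        PushbackReader in =
            new PushbackReader(new StringReader("count+1"));
        System.out.println(readIdentifier(in));   // prints "count"
        System.out.println((char) in.read());     // prints "+"
    }
}
```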
Both PushbackInputStream and PushbackReader support two constructors: One takes a reference to the wrapped stream and the size of the pushback buffer to create, while the other only takes a reference to the wrapped stream and uses a pushback buffer with space for one piece of data (byte or char as appropriate). Attempting to push back more than the specified amount of data will cause an IOException.
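A small experiment shows the buffer limit in action. This sketch, with the hypothetical class PushbackLimit, creates a PushbackReader with a given buffer size and reports whether pushing back a given number of characters overflows it.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;

// Hypothetical demo class, for illustration only.
class PushbackLimit {
    // Returns true if pushing back count characters, one at a time,
    // overflows a pushback buffer of bufferSize characters.
    static boolean overflows(int bufferSize, int count) {
        PushbackReader in =
            new PushbackReader(new StringReader(""), bufferSize);
        try {
            for (int i = 0; i < count; i++)
                in.unread('x');
            return false;
        } catch (IOException e) {
            return true;    // no room left in the pushback buffer
        }
    }

    public static void main(String[] args) {
        System.out.println(overflows(1, 1));    // prints "false"
        System.out.println(overflows(1, 2));    // prints "true"
    }
}
```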
Each Pushback stream has three variants of unread, matching the variants of read. We illustrate the character version of PushbackReader, but the byte equivalents for PushbackInputStream have the same behavior:
- public void unread(int c) throws IOException
  Pushes back the single character c. If there is insufficient room in the pushback buffer, an IOException is thrown.
- public void unread(char[] buf, int offset, int count) throws IOException
  Pushes back the characters in the specified subarray. The first character pushed back is buf[offset] and the last is buf[offset+count-1]. The subarray is prepended to the front of the pushback buffer, such that the next character to be read will be that at buf[offset], then buf[offset+1], and so on. If the pushback buffer does not have room for count characters, an IOException is thrown.
- public void unread(char[] buf) throws IOException
  Equivalent to unread(buf, 0, buf.length).
For example, after two consecutive unread calls on a PushbackReader with the characters '1' and '2', the next two characters read will be '2' and '1' because '2' was pushed back second. Each unread call sets its own list of characters by prepending to the buffer, so the code
    pbr.unread(new char[] {'1', '2'});
    pbr.unread(new char[] {'3', '4'});
    for (int i = 0; i < 4; i++)
        System.out.println(i + ": " + (char)pbr.read());
produces the following lines of output:
    0: 3
    1: 4
    2: 1
    3: 2
Data from the last unread (the one with '3' and '4') is read back first, and within that unread the data comes from the beginning of the array through to the end. When that data is exhausted, the data from the first unread is returned in the same order. The unread method copies data into the pushback buffer, so changes made to an array after it is used with unread do not affect future calls to read.
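The ordering and copying behavior just described can be checked with a complete program. In this sketch the class name UnreadOrder, the underlying text "rest", and the buffer size of 4 are all assumptions made for the demonstration; the buffer must hold at least four characters for the two unread calls to succeed.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;

// Hypothetical demo class, for illustration only.
class UnreadOrder {
    // Pushes back two arrays, then reads five characters.
    static String demo() throws IOException {
        PushbackReader pbr =
            new PushbackReader(new StringReader("rest"), 4);
        char[] first = {'1', '2'};
        pbr.unread(first);
        first[0] = 'X';     // too late: unread already copied the data
        pbr.unread(new char[] {'3', '4'});

        StringBuilder seen = new StringBuilder();
        for (int i = 0; i < 5; i++)
            seen.append((char) pbr.read());
        return seen.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo()); // prints "3412r"
    }
}
```

The fifth character comes from the wrapped reader, which is consulted only after the pushback buffer is empty.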
20.5.12 StreamTokenizer
Tokenizing input text is a common application, and the java.io package provides a StreamTokenizer class for simple tokenization. A more general facility for scanning and converting input text is provided by the java.util.Scanner class—see "Scanner" on page 641.
You can tokenize a stream by creating a StreamTokenizer with a Reader object as its source and then setting parameters for the scan. A scanner loop invokes nextToken, which returns the token type of the next token in the stream. Some token types have associated values that are found in fields in the StreamTokenizer object.
This class is designed primarily to parse programming language-style input; it is not a general tokenizer. However, many configuration files look similar enough to programming languages that they can be parsed by this tokenizer. When designing a new configuration file or other data, you can save work if you make it look enough like a language to be parsed with StreamTokenizer.
When nextToken recognizes a token, it returns the token type as its value and also sets the ttype field to the same value. There are four token types:
- TT_WORD: A word was scanned. The String field sval contains the word that was found.
- TT_NUMBER: A number was scanned. The double field nval contains the value of the number. Only decimal floating-point numbers (with or without a decimal point) are recognized. The tokenizer does not understand 3.4e79 as a floating-point number, nor 0xffff as a hexadecimal number.
- TT_EOL: An end-of-line was found.
- TT_EOF: The end-of-file was reached.
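To see how these token types appear in practice, the following sketch lists the tokens that a default StreamTokenizer reports for a string. The class TokenDump and the labels it prints are invented for this illustration; eolIsSignificant(true) is set only so that TT_EOL shows up. Any token type that is not one of the four constants is printed as the character itself.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, for illustration only.
class TokenDump {
    static List<String> tokens(String text) throws IOException {
        StreamTokenizer in =
            new StreamTokenizer(new StringReader(text));
        in.eolIsSignificant(true);  // make TT_EOL visible
        List<String> out = new ArrayList<String>();
        while (in.nextToken() != StreamTokenizer.TT_EOF) {
            switch (in.ttype) {
              case StreamTokenizer.TT_WORD:
                out.add("WORD(" + in.sval + ")");
                break;
              case StreamTokenizer.TT_NUMBER:
                out.add("NUMBER(" + in.nval + ")");
                break;
              case StreamTokenizer.TT_EOL:
                out.add("EOL");
                break;
              default:  // token type is the character itself
                out.add("CHAR(" + (char) in.ttype + ")");
                break;
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        for (String t : tokens("size = 42 3.4e79\n0xffff"))
            System.out.println(t);
    }
}
```

With this input, 3.4e79 comes back as the number 3.4 followed by the word e79, and 0xffff as the number 0.0 followed by the word xffff, which illustrates the limits on number recognition noted under TT_NUMBER.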
The input text is assumed to consist of bytes in the range \u0000 to \u00FF—Unicode characters outside this range are not handled correctly. Input is composed of both special and ordinary characters. Special characters are those that the tokenizer treats specially—namely whitespace, characters that make up numbers, characters that make up words, and so on. Any other character is considered ordinary. When an ordinary character is the next in the input, its token type is itself. For example, if the character '¿' is encountered in the input and is not special, the token return type (and the ttype field) is the int value of the character '¿'.
As one example, let's look at a method that sums the numeric values in a character stream it is given:
    static double sumStream(Reader source) throws IOException {
        StreamTokenizer in = new StreamTokenizer(source);
        double result = 0.0;
        while (in.nextToken() != StreamTokenizer.TT_EOF) {
            if (in.ttype == StreamTokenizer.TT_NUMBER)
                result += in.nval;
        }
        return result;
    }
We create a StreamTokenizer object from the reader and then loop, reading tokens from the stream, adding all the numbers found into the burgeoning result. When we get to the end of the input, we return the final sum.
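As a usage sketch, the method can be exercised on a StringReader. The wrapper class SumDemo and the sample text are assumptions for illustration; sumStream itself is repeated from the listing so that the example compiles on its own.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;

// Hypothetical wrapper class around the sumStream method.
class SumDemo {
    static double sumStream(Reader source) throws IOException {
        StreamTokenizer in = new StreamTokenizer(source);
        double result = 0.0;
        while (in.nextToken() != StreamTokenizer.TT_EOF) {
            if (in.ttype == StreamTokenizer.TT_NUMBER)
                result += in.nval;  // words and punctuation are skipped
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Reader text =
            new StringReader("3 apples, 2.5 pears, -1.5 adjustments");
        System.out.println(sumStream(text));    // prints "4.0"
    }
}
```

The leading minus sign is treated as part of the number, so -1.5 is subtracted from the total.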
Here is another example that reads an input source, looking for attributes of the form name=value, and stores them as attributes in AttributedImpl objects, described in "Implementing Interfaces" on page 127:
    public static Attributed readAttrs(Reader source)
        throws IOException
    {
        StreamTokenizer in = new StreamTokenizer(source);
        AttributedImpl attrs = new AttributedImpl();
        Attr attr = null;
        in.commentChar('#');    // '#' is ignore-to-end comment
        in.ordinaryChar('/');   // was original comment char
        while (in.nextToken() != StreamTokenizer.TT_EOF) {
            if (in.ttype == StreamTokenizer.TT_WORD) {
                if (attr != null) {
                    attr.setValue(in.sval);
                    attr = null;        // used this one up
                } else {
                    attr = new Attr(in.sval);
                    attrs.add(attr);
                }
            } else if (in.ttype == '=') {
                if (attr == null)
                    throw new IOException("misplaced '='");
            } else {
                if (attr == null)       // expected a word
                    throw new IOException("bad Attr name");
                attr.setValue(Double.valueOf(in.nval));
                attr = null;
            }
        }
        return attrs;
    }
The attribute file uses '#' to mark comments. Ignoring these comments, the stream is searched for a string token followed by an optional '=' followed by a word or number. Each such attribute is put into an Attr object, which is added to a set of attributes in an AttributedImpl object. When the file has been parsed, the set of attributes is returned.
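To try the same parsing logic without the Attr and AttributedImpl classes, here is a stand-in sketch that stores the results in a Map. The class name AttrReader, the use of LinkedHashMap, and the sample input are assumptions for illustration. Unlike readAttrs, this variant rejects any token that is not a word, a number, or '=', rather than treating it as a number.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for readAttrs that uses a Map instead of
// the Attr and AttributedImpl classes.
class AttrReader {
    static Map<String, Object> readAttrs(Reader source)
        throws IOException
    {
        StreamTokenizer in = new StreamTokenizer(source);
        Map<String, Object> attrs = new LinkedHashMap<String, Object>();
        String name = null;     // a name still waiting for its value
        in.commentChar('#');    // '#' is ignore-to-end comment
        in.ordinaryChar('/');   // was original comment char
        while (in.nextToken() != StreamTokenizer.TT_EOF) {
            if (in.ttype == StreamTokenizer.TT_WORD) {
                if (name != null) {
                    attrs.put(name, in.sval);
                    name = null;            // used this one up
                } else {
                    name = in.sval;
                    attrs.put(name, null);  // value may follow
                }
            } else if (in.ttype == '=') {
                if (name == null)
                    throw new IOException("misplaced '='");
            } else if (in.ttype == StreamTokenizer.TT_NUMBER) {
                if (name == null)           // expected a word
                    throw new IOException("bad Attr name");
                attrs.put(name, Double.valueOf(in.nval));
                name = null;
            } else {
                throw new IOException(
                    "unexpected token on line " + in.lineno());
            }
        }
        return attrs;
    }

    public static void main(String[] args) throws IOException {
        String input = "# sample attributes\n"
                     + "color = red\n"
                     + "width = 42\n"
                     + "height 7.5\n";
        // prints "{color=red, width=42.0, height=7.5}"
        System.out.println(readAttrs(new StringReader(input)));
    }
}
```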
Setting the comment character to '#' sets its character class. The tokenizer recognizes several character classes that are set by the following methods:
- public void wordChars(int low, int hi)
  Characters in this range are word characters: They can be part of a TT_WORD token. You can invoke this several times with different ranges. A word consists of one or more characters inside any of the legal ranges.
- public void whitespaceChars(int low, int hi)
  Characters in this range are whitespace. Whitespace is ignored, except to separate tokens such as two consecutive words. As with the wordChars range, you can make several invocations, and the union of the invocations is the set of whitespace characters.
- public void ordinaryChars(int low, int hi)
  Characters in this range are ordinary. An ordinary character is returned as a single-character token whose type is the character itself, not as part of a word, number, or other longer token. This removes any special significance the characters may have had as comment characters, delimiters, word components, whitespace, or number characters. In the above example, we used ordinaryChar to remove the special comment significance of the '/' character.
- public void ordinaryChar(int ch)
  Equivalent to ordinaryChars(ch, ch).
- public void commentChar(int ch)
  The character ch starts a single-line comment: characters after ch up to the next end-of-line are treated as one run of whitespace.
- public void quoteChar(int ch)
  Matching pairs of the character ch delimit String constants. When a String constant is recognized, the character ch is returned as the token, and the field sval contains the body of the string with surrounding ch characters removed. When string constants are read, some of the standard \ processing is applied (for example, \t can be in the string). The string processing in StreamTokenizer is a subset of the language's strings. In particular, you cannot use \uxxxx, \', \", or (unfortunately) \Q, where Q is the quote character ch. You can have more than one quote character at a time on a stream, but strings must start and end with the same quote character. In other words, a string that starts with one quote character ends when the next instance of that same quote character is found. If a different quote character is found in between, it is simply part of the string.
- public void parseNumbers()
  Specifies that numbers should be parsed as double-precision floating-point numbers. When a number is found, the stream returns a type of TT_NUMBER, leaving the value in nval. There is no way to turn off just this feature: to turn it off you must either invoke ordinaryChars for all the number-related characters (don't forget the decimal point and minus sign) or invoke resetSyntax.
- public void resetSyntax()
  Resets the syntax table so that all characters are ordinary. If you do this and then start reading the stream, nextToken always returns the next character in the stream, just as when you invoke InputStream.read.
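Two of these settings are easy to observe directly. The sketch below uses a hypothetical class SyntaxDemo: quoted returns the body of a leading string constant delimited by the default '"' quote character, and firstAfterReset shows that after resetSyntax even a letter comes back as a single ordinary character.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;

// Hypothetical demo class, for illustration only.
class SyntaxDemo {
    // Returns the body of a leading "..." string constant, or null
    // if the first token is not such a constant.
    static String quoted(String text) throws IOException {
        StreamTokenizer in =
            new StreamTokenizer(new StringReader(text));
        in.nextToken();
        return (in.ttype == '"') ? in.sval : null;
    }

    // Returns the type of the first token once every character
    // has been made ordinary.
    static int firstAfterReset(String text) throws IOException {
        StreamTokenizer in =
            new StreamTokenizer(new StringReader(text));
        in.resetSyntax();
        return in.nextToken();
    }

    public static void main(String[] args) throws IOException {
        // The ' characters are a different quote character, so
        // they are simply part of the string.
        System.out.println(quoted("\"a 'b' c\" d")); // prints "a 'b' c"
        System.out.println((char) firstAfterReset("hi")); // prints "h"
    }
}
```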
There are no methods to ask the character class of a given character or to add new classes of characters. Here are the default settings for a newly created StreamTokenizer object:
    wordChars('a', 'z');        // lower case ASCII letters
    wordChars('A', 'Z');        // upper case ASCII letters
    wordChars(128 + 32, 255);   // "high" non-ASCII values
    whitespaceChars(0, ' ');    // ASCII control codes and space
    commentChar('/');
    quoteChar('"');
    quoteChar('\'');
    parseNumbers();
This leaves the ordinary characters consisting of most of the punctuation and arithmetic characters (;, :, [, {, +, =, and so forth).
The changes made to the character classes are cumulative, so, for example, invoking wordChars with two different ranges of characters defines both ranges as word characters. To replace a range you must first mark the old range as ordinary and then add the new range. Resetting the syntax table clears all settings, so if you want to return to the default settings, for example, you must manually make the invocations listed above.
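Because there is no single call that restores the defaults, a small helper can make the listed invocations for you. This is a sketch; the class TokenizerDefaults and the method restoreDefaults are hypothetical names. It restores only the character classes: as far as we know, resetSyntax leaves flags such as eolIsSignificant untouched, so the helper does not change them either.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;

// Hypothetical helper class, not part of java.io.
class TokenizerDefaults {
    // Clears the syntax table, then repeats the default settings
    // listed above for a newly created StreamTokenizer.
    static void restoreDefaults(StreamTokenizer in) {
        in.resetSyntax();
        in.wordChars('a', 'z');
        in.wordChars('A', 'Z');
        in.wordChars(128 + 32, 255);
        in.whitespaceChars(0, ' ');
        in.commentChar('/');
        in.quoteChar('"');
        in.quoteChar('\'');
        in.parseNumbers();
    }

    public static void main(String[] args) throws IOException {
        StreamTokenizer in =
            new StreamTokenizer(new StringReader("ab 12"));
        in.resetSyntax();       // everything is ordinary now
        restoreDefaults(in);    // back to the default classes
        in.nextToken();
        System.out.println(in.sval);    // prints "ab"
        in.nextToken();
        System.out.println(in.nval);    // prints "12.0"
    }
}
```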
Other methods control the basic behavior of the tokenizer:
- public void eolIsSignificant(boolean flag)
  If flag is true, ends of lines are significant and TT_EOL may be returned by nextToken. If false, ends of lines are treated as whitespace and TT_EOL is never returned. The default is false.
- public void slashStarComments(boolean flag)
  If flag is true, the tokenizer recognizes /*...*/ comments. This occurs independently of settings for any comment characters. The default is false.
- public void slashSlashComments(boolean flag)
  If flag is true, the tokenizer recognizes // to end-of-line comments. This occurs independently of the settings for any comment characters. The default is false.
- public void lowerCaseMode(boolean flag)
  If flag is true, all characters in TT_WORD tokens are converted to their lowercase equivalent if they have one (using String.toLowerCase). The default is false. Because of the case issues described in "Character" on page 192, you cannot reliably use this for Unicode string equivalence: two tokens might be equivalent but have different lowercase representations. Use String.equalsIgnoreCase for reliable case-insensitive comparison.
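The following sketch combines several of these flags. The class FlagDemo, its labels, and the sample input are assumptions for illustration. The '/' character is first made ordinary so that a lone slash is returned as a token, while slashSlashComments(true) still lets // start a comment.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, for illustration only.
class FlagDemo {
    static List<String> tokens(String text) throws IOException {
        StreamTokenizer in =
            new StreamTokenizer(new StringReader(text));
        in.ordinaryChar('/');           // a lone '/' is a token
        in.slashSlashComments(true);    // but "//" starts a comment
        in.eolIsSignificant(true);      // report ends of lines
        in.lowerCaseMode(true);         // fold words to lowercase

        List<String> out = new ArrayList<String>();
        while (in.nextToken() != StreamTokenizer.TT_EOF) {
            if (in.ttype == StreamTokenizer.TT_WORD)
                out.add(in.sval);
            else if (in.ttype == StreamTokenizer.TT_NUMBER)
                out.add(String.valueOf(in.nval));
            else if (in.ttype == StreamTokenizer.TT_EOL)
                out.add("EOL");
            else                        // ordinary character
                out.add(String.valueOf((char) in.ttype));
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // prints "[total, =, a, /, b, EOL, next]"
        System.out.println(tokens("Total = A / B // ratio\nNext"));
    }
}
```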
There are three miscellaneous methods:
- public void pushBack()
  Pushes the previously returned token back into the stream. The next invocation of nextToken returns the same token again instead of proceeding to the next token. There is only a one-token pushback; multiple consecutive invocations of pushBack are equivalent to one invocation.
- public int lineno()
  Returns the current line number. Usually used for reporting errors you detect.
- public String toString()
  Returns a String representation of the last returned stream token, including its line number.
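A common use of pushBack is one-token lookahead, and lineno is handy when a parser reports where something went wrong. The sketch below shows both; the class LookaheadDemo and the method peekWord are hypothetical names.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;

// Hypothetical demo class, for illustration only.
class LookaheadDemo {
    // Reports whether the next token is the given word, without
    // consuming it: the token is pushed back after being examined.
    static boolean peekWord(StreamTokenizer in, String expected)
        throws IOException
    {
        int type = in.nextToken();
        boolean match = (type == StreamTokenizer.TT_WORD)
                        && expected.equals(in.sval);
        in.pushBack();  // the next nextToken returns this token again
        return match;
    }

    public static void main(String[] args) throws IOException {
        StreamTokenizer in =
            new StreamTokenizer(new StringReader("first\nsecond"));
        System.out.println(peekWord(in, "first"));  // prints "true"
        in.nextToken();                             // "first" again
        System.out.println(in.sval + " on line " + in.lineno());
        in.nextToken();
        System.out.println(in.sval + " on line " + in.lineno());
    }
}
```

The last two lines of output are "first on line 1" and "second on line 2".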
Exercise 20.6: Write a program that takes input of the form name op value, where name is one of three words of your choosing, op is +, -, or =, and value is a number. Apply each operator to the named value. When input is exhausted, print the three values. For extra credit, use the HashMap class that was used for AttributedImpl so that you can use an arbitrary number of named values.