Processing Input and Output

In this chapter, you will learn how to work with files, directories, and web pages, and how to read and write data in binary and text format. You will also find a discussion of regular expressions, which can be useful for processing input. (I couldn’t think of a better place to handle that topic, and apparently neither could the Java developers—when the regular expression API specification was proposed, it was attached to the specification request for “new I/O” features.) Finally, this chapter shows you the object serialization mechanism that lets you store objects as easily as you can store text or numeric data.

The key points of this chapter are:

  1. An InputStream is a source of bytes, and an OutputStream is a destination for bytes.

  2. A Reader reads characters, and a Writer writes them. Be sure to specify a character encoding.

  3. The Files class has convenience methods for reading all bytes or lines of a file.

  4. The DataInput and DataOutput interfaces have methods for writing numbers in binary format.

  5. Use a RandomAccessFile or a memory-mapped file for random access.

  6. A Path is an absolute or relative sequence of path components in a file system. Paths can be combined (or “resolved”).

  7. Use the methods of the Files class to copy, move, or delete files and to recursively walk through a directory tree.

  8. To read or update a ZIP file, use a ZIP file system.

  9. You can read the contents of a web page with the URL class. To read metadata or write data, use the URLConnection class.

  10. With the Pattern and Matcher classes, you can find all matches of a regular expression in a string, as well as the captured groups for each match.

  11. The serialization mechanism can save and restore any object implementing the Serializable interface, provided its instance variables are also serializable.

9.1. Input/Output Streams, Readers, and Writers

In the Java API, a source from which one can read bytes is called an input stream. The bytes can come from a file, a network connection, or an array in memory. (These streams are unrelated to the streams of Chapter 8.) Similarly, a destination for bytes is an output stream. In contrast, readers and writers consume and produce sequences of characters. In the following sections, you will learn how to read and write bytes and characters.

9.1.1. Obtaining Streams

The easiest way to obtain a stream from a file is with the static methods

InputStream in = Files.newInputStream(path);
OutputStream out = Files.newOutputStream(path);

Here, path is an instance of the Path class that is covered in Section 9.2.1. It describes a path in a file system.

If you have a URL object, you can read its contents from the input stream returned by the openStream method. (The URL constructors are deprecated, and you should create a URL instance as shown here.)

var url = URI.create("https://horstmann.com/index.html").toURL();
InputStream in = url.openStream();

Section 9.3 shows how to send data to a web server.

The ByteArrayInputStream class lets you read from an array of bytes.

byte[] bytes = ...;
var in = new ByteArrayInputStream(bytes);
// Read from in

Conversely, to send output to a byte array, use a ByteArrayOutputStream:

var out = new ByteArrayOutputStream();
// Write to out
byte[] bytes = out.toByteArray();

9.1.2. Reading Bytes

The InputStream class has a method to read a single byte:

InputStream in = ...;
int b = in.read();

This method either returns the byte as an integer between 0 and 255, or returns -1 if the end of input has been reached.

More commonly, you will want to read the bytes in bulk. The most convenient method is the readAllBytes method that simply reads all bytes from the stream into a byte array:

byte[] bytes = in.readAllBytes();

If you want to read some, but not all bytes, provide a byte array and call the readNBytes method:

var bytes = new byte[len];
int bytesRead = in.readNBytes(bytes, offset, n);

The method reads until either n bytes are read or no further input is available, and returns the actual number of bytes read. If no input was available at all, readNBytes returns 0. (The older read(bytes, offset, n) method returns -1 in that case.)

Finally, you can skip bytes:

long bytesToSkip = ...;
in.skipNBytes(bytesToSkip);
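
As a minimal sketch of how these calls fit together, the following hypothetical ReadBytesDemo class reads from an in-memory stream. The record layout (a one-byte tag, two filler bytes, then a payload) and all names are assumptions made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

class ReadBytesDemo {
    // Hypothetical layout: 1-byte tag, 2 filler bytes, then up to n payload bytes.
    static byte[] readPayload(InputStream in, int n) throws IOException {
        int tag = in.read();                 // 0 to 255, or -1 at end of input
        if (tag == -1) return new byte[0];
        in.skipNBytes(2);                    // EOFException if fewer than 2 bytes remain
        var buffer = new byte[n];
        int bytesRead = in.readNBytes(buffer, 0, n); // may be fewer than n, possibly 0
        return Arrays.copyOf(buffer, bytesRead);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = { 7, 0, 0, 10, 20, 30 };
        try (var in = new ByteArrayInputStream(data)) {
            System.out.println(Arrays.toString(readPayload(in, 5)));
        }
    }
}
```

Asking for five payload bytes here yields only the three that remain, which is why the count returned by readNBytes should always be checked.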

9.1.3. Writing Bytes

The write methods of an OutputStream can write individual bytes and byte arrays.

OutputStream out = ...;
int b = ...;
out.write(b);
byte[] bytes = ...;
out.write(bytes);
out.write(bytes, start, length);

When you are done writing to a stream, you must close it in order to commit any buffered output. This is best done with a try-with-resources statement:

try (OutputStream out = ...) {
    out.write(bytes);
}

If you need to copy an input stream to an output stream, use the InputStream.transferTo method:

try (InputStream in = ...; OutputStream out = ...) {
    in.transferTo(out);
}

Both streams need to be closed after the call to transferTo. It is best to use a try-with-resources statement, as in the code example.

To write a file to an OutputStream, call

Files.copy(path, out);

Conversely, to save an InputStream to a file, call

Files.copy(in, path, StandardCopyOption.REPLACE_EXISTING);
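
The copying calls above can be combined as in the following sketch. The class name CopyDemo, its method names, and the temporary file name are hypothetical; the sketch only assumes that a temporary file can be created.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class CopyDemo {
    // Copies everything from the input stream into a byte array, closing both streams.
    static byte[] drain(InputStream in) throws IOException {
        try (in; var out = new ByteArrayOutputStream()) {
            in.transferTo(out);
            return out.toByteArray();
        }
    }

    // Saves the stream to a file, then writes the file to an output stream.
    static byte[] roundTrip(InputStream in, Path path) throws IOException {
        Files.copy(in, path, StandardCopyOption.REPLACE_EXISTING);
        var out = new ByteArrayOutputStream();
        Files.copy(path, out);
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Path temp = Files.createTempFile("copy-demo", ".bin"); // hypothetical file name
        try (var in = new ByteArrayInputStream(new byte[] { 1, 2, 3 })) {
            System.out.println(roundTrip(in, temp).length);
        } finally {
            Files.deleteIfExists(temp);
        }
    }
}
```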

9.1.4. Character Encodings

Input and output streams are for sequences of bytes, but in many cases you will work with text, that is, sequences of characters. It then matters how characters are encoded into bytes.

Java uses the Unicode standard for characters. Each character or “code point” has a 21-bit integer number. There are different character encodings—methods for packaging those 21-bit numbers into bytes.

The most common encoding is UTF-8, which encodes each Unicode code point into a sequence of one to four bytes (see Table 9.1). UTF-8 has the advantage that the characters of the traditional ASCII character set, which contains all characters used in English, only take up one byte each.

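Table 9.1: UTF-8 Encoding

Character range    Encoding
0...7F             0 a6 a5 a4 a3 a2 a1 a0
80...7FF           110 a10 a9 a8 a7 a6 | 10 a5 a4 a3 a2 a1 a0
800...FFFF         1110 a15 a14 a13 a12 | 10 a11 a10 a9 a8 a7 a6 | 10 a5 a4 a3 a2 a1 a0
10000...10FFFF     11110 a20 a19 a18 | 10 a17 a16 a15 a14 a13 a12 | 10 a11 a10 a9 a8 a7 a6 | 10 a5 a4 a3 a2 a1 a0

(Each a followed by a number denotes that bit of the code point; | separates bytes.)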
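
To make the bit layout of the UTF-8 table concrete, here is a sketch that encodes one code point by hand and compares the result with the library encoder. The class Utf8Demo is hypothetical, and the sketch assumes its argument is a valid code point that is not a surrogate.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8Demo {
    // Encodes one code point by hand, following the bit layout of the UTF-8 table.
    // Assumption: cp is a valid, non-surrogate code point.
    static byte[] encode(int cp) {
        if (cp < 0x80) return new byte[] { (byte) cp };
        if (cp < 0x800) return new byte[] {
            (byte) (0b11000000 | (cp >> 6)),
            (byte) (0b10000000 | (cp & 0x3F)) };
        if (cp < 0x10000) return new byte[] {
            (byte) (0b11100000 | (cp >> 12)),
            (byte) (0b10000000 | ((cp >> 6) & 0x3F)),
            (byte) (0b10000000 | (cp & 0x3F)) };
        return new byte[] {
            (byte) (0b11110000 | (cp >> 18)),
            (byte) (0b10000000 | ((cp >> 12) & 0x3F)),
            (byte) (0b10000000 | ((cp >> 6) & 0x3F)),
            (byte) (0b10000000 | (cp & 0x3F)) };
    }

    public static void main(String[] args) {
        int[] samples = { 'A', 0xE9, 0x2122, 0x1F600 };
        for (int cp : samples) {
            byte[] expected = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
            System.out.println(Arrays.equals(encode(cp), expected));
        }
    }
}
```

In real programs there is no reason to encode by hand; String.getBytes with a Charset does the same work.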

A less common encoding is UTF-16, which encodes each Unicode code point into one or two 16-bit values (see Table 9.2). This is the encoding used in Java strings. Actually, there are two forms of UTF-16, called “big-endian” and “little-endian.” Consider the 16-bit value 0x2122. In big-endian format, the more significant byte comes first: 0x21 followed by 0x22. In little-endian format, it is the other way around: 0x22 0x21. To indicate which of the two is used, a file can start with the “byte order mark,” the 16-bit quantity 0xFEFF. A reader can use this value to determine the byte order and discard it.

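Table 9.2: UTF-16 Encoding

Character range    Encoding
0...FFFF           a15 a14 a13 a12 a11 a10 a9 a8 a7 a6 a5 a4 a3 a2 a1 a0
10000...10FFFF     110110 b19 b18 b17 b16 a15 a14 a13 a12 a11 a10 | 110111 a9 a8 a7 a6 a5 a4 a3 a2 a1 a0
                   where b19 b18 b17 b16 = a20 a19 a18 a17 a16 - 1

(Each a followed by a number denotes that bit of the code point; | separates 16-bit values.)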
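
The two-unit case of the UTF-16 table can be sketched in a few lines. The class Utf16Demo is hypothetical and assumes a code point in the range 10000...10FFFF; the library method Character.toChars performs the same split.

```java
import java.util.Arrays;

class Utf16Demo {
    // Splits a supplementary code point (0x10000 to 0x10FFFF) into two code units.
    static char[] toSurrogates(int cp) {
        int v = cp - 0x10000;  // same as subtracting 1 from bits a20...a16
        char high = (char) (0xD800 | (v >> 10));   // 110110 followed by the top 10 bits
        char low = (char) (0xDC00 | (v & 0x3FF));  // 110111 followed by the low 10 bits
        return new char[] { high, low };
    }

    public static void main(String[] args) {
        int cp = 0x1F600;
        char[] units = toSurrogates(cp);
        System.out.printf("%04X %04X%n", (int) units[0], (int) units[1]);
        System.out.println(Arrays.equals(units, Character.toChars(cp)));
    }
}
```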

In addition to the UTF encodings, there are partial encodings that cover a character range suitable for a given user population. For example, ISO 8859-1 is a one-byte code that includes accented characters used in Western European languages. Shift_JIS is a variable-length code for Japanese characters. A large number of these encodings are still in widespread use.

Because UTF-8 is so common, it has been the default encoding since Java 18. Previously, the default encoding was the native encoding, that is, the character encoding preferred by the operating system of the computer running your program. On Windows, that is generally not UTF-8. If you are using an older version of Java, or if you are working with text in an encoding other than UTF-8, you need to explicitly specify the encoding.

The StandardCharsets class has static variables of type Charset for the character encodings that every Java virtual machine must support:

StandardCharsets.UTF_8
StandardCharsets.UTF_16
StandardCharsets.UTF_16BE
StandardCharsets.UTF_16LE
StandardCharsets.ISO_8859_1
StandardCharsets.US_ASCII

To obtain the Charset for another encoding, use the static forName method:

Charset shiftJIS = Charset.forName("Shift_JIS");

You use the Charset object to specify a character encoding. For example, you can turn an array of bytes into a string as

var contents = new String(bytes, StandardCharsets.ISO_8859_1);
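
The following sketch shows why the choice of Charset matters when converting between bytes and strings. The class CharsetDemo and the sample text are assumptions for illustration only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetDemo {
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // ends with an accented e
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);         // accented e takes two bytes
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);  // accented e takes one byte
        System.out.println(utf8.length + " " + latin1.length);
        System.out.println(decode(utf8, StandardCharsets.UTF_8).equals(text));
        // Decoding with the wrong charset does not fail; it yields different characters
        System.out.println(decode(utf8, StandardCharsets.ISO_8859_1).equals(text));
    }
}
```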

9.1.5. Text Input

To read text input, use a Reader. You can obtain a Reader from any input stream with the InputStreamReader adapter:

InputStream inStream = ...;
var in = new InputStreamReader(inStream, charset);

If you want to process the input one UTF-16 code unit at a time, you can call the read method:

int ch = in.read();

The method returns a code unit between 0 and 65535, or -1 at the end of input.

That is not very convenient. Here are several alternatives.

With a short text file, you can read it into a string like this:

String content = Files.readString(path, charset);

But if you want the file as a sequence of lines, call

List<String> lines = Files.readAllLines(path, charset);

If the file is large, process the lines lazily as a Stream<String>:

try (Stream<String> lines = Files.lines(path, charset)) {
    ...
}

To read numbers or words from a file, use a Scanner, as you have seen in Chapter 1. For example,

var in = new Scanner(path);
while (in.hasNextDouble()) {
    double value = in.nextDouble();
    ...
}

If your input does not come from a file, wrap the InputStream into a BufferedReader:

try (var reader = new BufferedReader(new InputStreamReader(url.openStream()))) {
    Stream<String> lines = reader.lines();
    ...
}

A BufferedReader reads input in chunks for efficiency. (Oddly, this is not an option for basic readers.) It has methods readLine to read a single line and lines to yield a stream of lines.

If a method asks for a Reader and you want it to read from a file, call Files.newBufferedReader(path, charset).
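
Here is a sketch that exercises several of the text input approaches from this section on a small temporary file. The class TextInputDemo, its method names, and the file contents are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Stream;

class TextInputDemo {
    // Counts nonblank lines lazily, without loading the whole file.
    static long countNonBlank(Path path) throws IOException {
        try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
            return lines.filter(line -> !line.isBlank()).count();
        }
    }

    // Reads only the first line through a Reader.
    static String firstLine(Path path) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        Path temp = Files.createTempFile("text-input", ".txt"); // hypothetical file name
        try {
            Files.writeString(temp, "alpha\n\nbeta\n", StandardCharsets.UTF_8);
            String content = Files.readString(temp, StandardCharsets.UTF_8);
            List<String> lines = Files.readAllLines(temp, StandardCharsets.UTF_8);
            System.out.println(content.length() + " chars, " + lines.size() + " lines");
            System.out.println(countNonBlank(temp) + " nonblank, first: " + firstLine(temp));
        } finally {
            Files.deleteIfExists(temp);
        }
    }
}
```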

9.1.6. Text Output

To write text, use a Writer. With the write method, you can write strings. You can turn any output stream into a Writer:

OutputStream outStream = ...;
var out = new OutputStreamWriter(outStream, charset);
out.write(str);

To get a writer for a file, use

Writer out = Files.newBufferedWriter(path, charset);

It is more convenient to use a PrintWriter, which has the print, println, and printf methods that you have always used with System.out. Using those methods, you can print numbers and use formatted output.

If you write to a file, construct a PrintWriter like this:

var out = new PrintWriter(Files.newBufferedWriter(path, charset));

If you write to another stream, use

var out = new PrintWriter(new OutputStreamWriter(outStream, charset));

If you already have the text to write in a string, call

String content = ...;
Files.writeString(path, content, charset);

or

Files.write(path, lines, charset);

Here, lines can be a Collection<String>, or even more generally, an Iterable<? extends CharSequence>.

To append to a file, use

Files.writeString(path, content, charset, StandardOpenOption.APPEND);
Files.write(path, lines, charset, StandardOpenOption.APPEND);

Sometimes, a library method wants a Writer to write output. If you want to capture that output in a string, hand it a StringWriter. Or, if it wants a PrintWriter, wrap the StringWriter like this:

var writer = new StringWriter();
throwable.printStackTrace(new PrintWriter(writer));
String stackTrace = writer.toString();
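
As a sketch of the text output techniques in this section, the following hypothetical TextOutputDemo class formats text into a string through a PrintWriter and appends to a file. The names and sample values are assumptions; Locale.ROOT is passed to printf so that the decimal separator does not depend on the machine's locale.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.Locale;

class TextOutputDemo {
    // Captures formatted output in a string.
    static String report(String name, double value) {
        var writer = new StringWriter();
        try (var out = new PrintWriter(writer)) {
            out.printf(Locale.ROOT, "%s: %.2f", name, value);
        }
        return writer.toString();
    }

    // Writes two lines, appends a third, and reads everything back.
    static List<String> writeThenAppend(Path path) throws IOException {
        Files.write(path, List.of("first", "second"), StandardCharsets.UTF_8);
        Files.writeString(path, "third" + System.lineSeparator(),
            StandardCharsets.UTF_8, StandardOpenOption.APPEND);
        return Files.readAllLines(path, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(report("pi", 3.14159));
        Path temp = Files.createTempFile("text-output", ".txt"); // hypothetical file name
        try {
            System.out.println(writeThenAppend(temp));
        } finally {
            Files.deleteIfExists(temp);
        }
    }
}
```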

9.1.7. Reading Character Input

If you read a file with a structured format such as JSON or XML, you will use a parser written by someone who understands the fiddly details of that format. Such a parser typically reads a character at a time.

In the uncommon case that you need to write such a parser, use a BufferedReader for efficiency. Keep calling its read method, which yields a char value or -1 at the end of input. The reader converts the encoding of the input stream into UTF-16.

If you want to process Unicode code points, you need to handle the UTF-16 encoding. Here is how to read one code point:

int ch = reader.read();
if (ch != -1)
{
   int codePoint;
   if (Character.isHighSurrogate((char) ch))
   {
      int ch2 = reader.read();
      if (Character.isLowSurrogate((char) ch2))
         codePoint = Character.toCodePoint((char) ch, (char) ch2);
      else
         // java.nio.charset.MalformedInputException requires the length of the bad input
         throw new MalformedInputException(1);
   }
   else
      codePoint = ch;
}

The Character class contains methods to tell whether a particular code point has a given property. For example,

Character.isLetter(codePoint)

returns true if codePoint is a letter in some language. Here are some other classification methods:

isUpperCase
isLowerCase
isDigit
isSpaceChar
isEmoji

These methods use the rules of the Unicode standard. Others refer to the rules of the Java language:

isJavaIdentifierStart
isJavaIdentifierPart
isWhitespace

After analyzing the code points, you often need to store them in strings, converting them back to UTF-16. The appendCodePoint method of the StringBuilder class turns a code point into one or two char values which are appended to the builder.
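
Putting these pieces together, a code point loop might look like the following sketch. The class CodePointDemo and the task of keeping only letters are made up for illustration; the sketch passes unpaired low surrogates through unchanged, which a stricter parser might reject.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.MalformedInputException;

class CodePointDemo {
    // Returns the next code point, or -1 at the end of input.
    static int readCodePoint(Reader reader) throws IOException {
        int ch = reader.read();
        if (ch == -1 || !Character.isHighSurrogate((char) ch)) return ch;
        int ch2 = reader.read();
        if (ch2 == -1 || !Character.isLowSurrogate((char) ch2))
            throw new MalformedInputException(1);
        return Character.toCodePoint((char) ch, (char) ch2);
    }

    // Keeps only the letters of the input, one code point at a time.
    static String lettersOnly(Reader reader) throws IOException {
        var builder = new StringBuilder();
        int cp;
        while ((cp = readCodePoint(reader)) != -1) {
            if (Character.isLetter(cp)) builder.appendCodePoint(cp);
        }
        return builder.toString();
    }

    public static void main(String[] args) throws IOException {
        // \uD835\uDC9C is the surrogate pair for U+1D49C, a mathematical script letter
        String result = lettersOnly(new StringReader("a1\uD835\uDC9C!"));
        System.out.println(result.codePointCount(0, result.length()));
    }
}
```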

9.1.8. Reading and Writing Binary Data

The DataInput interface declares the following methods for reading a number, a character, a boolean value, or a string in binary format:

byte readByte()
int readUnsignedByte()
char readChar()
short readShort()
int readUnsignedShort()
int readInt()
long readLong()
float readFloat()
double readDouble()
boolean readBoolean()
String readUTF()
void readFully(byte[] b)

The DataOutput interface declares corresponding write methods.

The advantage of binary I/O is that it is fixed width and efficient. For example, writeInt always writes an integer as a big-endian 4-byte binary quantity regardless of the number of digits. The space needed is the same for each value of a given type, which speeds up random access. Also, reading binary data is faster than parsing text. The main drawback is that the resulting files cannot be easily inspected in a text editor.

You can use the DataInputStream and DataOutputStream adapters with any stream. For example,

DataInput in = new DataInputStream(Files.newInputStream(path));
DataOutput out = new DataOutputStream(Files.newOutputStream(path));
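
A round trip through the binary format can be sketched with in-memory streams. The class BinaryDemo and its record layout (an int, a double, and a string) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class BinaryDemo {
    // Hypothetical record: an int id, a double price, and a name string.
    static byte[] write(int id, double price, String name) throws IOException {
        var bytes = new ByteArrayOutputStream();
        try (var out = new DataOutputStream(bytes)) {
            out.writeInt(id);        // always 4 bytes, big-endian
            out.writeDouble(price);  // always 8 bytes
            out.writeUTF(name);      // 2-byte length, then modified UTF-8
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] record = write(1, 2.5, "pen");
        try (var in = new DataInputStream(new ByteArrayInputStream(record))) {
            // Values must be read back in the order in which they were written
            System.out.println(in.readInt() + " " + in.readDouble() + " " + in.readUTF());
        }
        System.out.println(record.length + " bytes");
    }
}
```

Note that the format carries no type information: the reader has to know the order and types of the values.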

9.1.9. Random-Access Files

The RandomAccessFile class lets you read or write data anywhere in a file. You can open a random-access file either for reading only or for both reading and writing; specify the option by using the string "r" (for read access) or "rw" (for read/write access) as the second argument in the constructor. For example,

var file = new RandomAccessFile(path.toString(), "rw");

A random-access file has a file pointer that indicates the position of the next byte to be read or written. The seek method sets the file pointer to an arbitrary byte position within the file. The argument to seek is a long integer between zero and the length of the file (which you can obtain with the length method). The getFilePointer method returns the current position of the file pointer.

The RandomAccessFile class implements both the DataInput and DataOutput interfaces. To read and write numbers from a random-access file, use methods such as readInt/writeInt that you saw in the preceding section. For example,

int value = file.readInt();
file.seek(file.getFilePointer() - 4);
file.writeInt(value + 1);
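
Because each int occupies four bytes, the position of the nth value is easy to compute. The following sketch uses that fact to update one value in place; the class CounterFileDemo, its methods, and the file of three counters are assumptions for illustration.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

class CounterFileDemo {
    // Increments the int stored at the given index and returns the new value.
    static int increment(Path path, int index) throws IOException {
        try (var file = new RandomAccessFile(path.toString(), "rw")) {
            file.seek((long) index * Integer.BYTES);
            int value = file.readInt();
            file.seek(file.getFilePointer() - Integer.BYTES);
            file.writeInt(value + 1);
            return value + 1;
        }
    }

    // Reads the int stored at the given index.
    static int get(Path path, int index) throws IOException {
        try (var file = new RandomAccessFile(path.toString(), "r")) {
            file.seek((long) index * Integer.BYTES);
            return file.readInt();
        }
    }

    public static void main(String[] args) throws IOException {
        Path temp = Files.createTempFile("counters", ".bin"); // hypothetical file name
        try {
            try (var file = new RandomAccessFile(temp.toString(), "rw")) {
                for (int v : new int[] { 10, 20, 30 }) file.writeInt(v);
            }
            increment(temp, 1);
            System.out.println(get(temp, 0) + " " + get(temp, 1) + " " + get(temp, 2));
        } finally {
            Files.deleteIfExists(temp);
        }
    }
}
```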

9.1.10. Memory-Mapped Files

Memory-mapped files provide another, very efficient approach for random access that works well for very large files. However, the API for data access is completely different from that of input/output streams. First, get a channel to the file:

FileChannel channel = FileChannel.open(path,
    StandardOpenOption.READ, StandardOpenOption.WRITE);

Then, map an area of the file (or, if it is not too large, the entire file) into memory:

ByteBuffer buffer = channel.map(FileChannel.MapMode.READ_WRITE,
    0, channel.size());

Use methods get, getInt, getDouble, and so on to read values, and the equivalent put methods to write values.

int offset = ...;
int value = buffer.getInt(offset);
buffer.putInt(offset, value + 1);

At some point, and certainly when the channel is closed, these changes are written back to the file.
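
The steps above can be combined into one small sketch. The class MappedDemo and the temporary file are hypothetical. The sketch calls the force method of MappedByteBuffer to request that changes be written out right away, and it uses deleteOnExit because a file that is still mapped may not be deletable immediately on some systems.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedDemo {
    // Increments the int at the given byte offset through a memory mapping.
    static int incrementAt(Path path, int offset) throws IOException {
        try (FileChannel channel = FileChannel.open(path,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_WRITE,
                0, channel.size());
            int value = buffer.getInt(offset);
            buffer.putInt(offset, value + 1);
            buffer.force(); // request that the change be written to the file now
            return value + 1;
        }
    }

    public static void main(String[] args) throws IOException {
        Path temp = Files.createTempFile("mapped", ".bin"); // hypothetical file name
        temp.toFile().deleteOnExit();
        // Two big-endian ints, which is the default byte order of a ByteBuffer
        Files.write(temp, ByteBuffer.allocate(8).putInt(41).putInt(7).array());
        System.out.println(incrementAt(temp, 0));
        System.out.println(ByteBuffer.wrap(Files.readAllBytes(temp)).getInt(0));
    }
}
```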

9.1.11. File Locking

When multiple simultaneously executing programs modify the same file, they need to communicate in some way, or the file can easily become damaged. File locks can solve this problem.

Suppose your application saves a configuration file with user preferences. If a user invokes two instances of the application, it could happen that both of them want to write the configuration file at the same time. In that situation, the first instance should lock the file. When the second instance finds the file locked, it can decide to wait until the file is unlocked or simply skip the writing process. To lock a file, call either the lock or the tryLock method of the FileChannel class.

FileChannel channel = FileChannel.open(path, StandardOpenOption.WRITE);
FileLock lock = channel.lock();

or

FileLock lock = channel.tryLock();

The first call blocks until the lock becomes available. The second call returns immediately, either with the lock or with null if the lock is not available. The file remains locked until the lock or the channel is closed. It is best to use a try-with-resources statement:

try (FileLock lock = channel.lock()) {
    ...
}
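
The "skip the writing process" strategy from the configuration file scenario could be sketched as follows. The class LockDemo, the method name, and the sample setting are hypothetical. Keep in mind that tryLock returns null only when another program holds the lock; if the same virtual machine already holds an overlapping lock, it throws an OverlappingFileLockException instead.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class LockDemo {
    // Writes the text if the lock is free; returns false if another program holds it.
    static boolean writeIfUnlocked(Path path, String text) throws IOException {
        try (FileChannel channel = FileChannel.open(path,
                 StandardOpenOption.WRITE, StandardOpenOption.CREATE);
             FileLock lock = channel.tryLock()) {
            if (lock == null) return false; // a null resource is simply not closed
            ByteBuffer buffer = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
            while (buffer.hasRemaining()) channel.write(buffer);
            return true;
        } // the lock is released first, then the channel is closed
    }

    public static void main(String[] args) throws IOException {
        Path temp = Files.createTempFile("settings", ".txt"); // hypothetical file name
        try {
            System.out.println(writeIfUnlocked(temp, "theme=dark"));
            System.out.println(Files.readString(temp, StandardCharsets.UTF_8));
        } finally {
            Files.deleteIfExists(temp);
        }
    }
}
```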
