2.5 Memory-Mapped Files
Most operating systems can take advantage of a virtual memory implementation to “map” a file, or a region of a file, into memory. Then the file can be accessed as if it were an in-memory array, which is often much faster than traditional file operations.
2.5.1 Memory-Mapped File Performance
At the end of this section, you can find a program that computes the CRC32 checksum of a file using traditional file input and a memory-mapped file. On one machine, we got the timing data shown in Table 2.5 when computing the checksum of the 37MB file rt.jar in the jre/lib directory of the JDK.
Table 2.5 Timing Data for File Operations
Method | Time
Plain input stream | 110 seconds
Buffered input stream | 9.9 seconds
Random access file | 162 seconds
Memory-mapped file | 7.2 seconds
As you can see, on this particular machine, memory mapping is a bit faster than using buffered sequential input and dramatically faster than using a RandomAccessFile.
Of course, the exact values will differ greatly from one machine to another, but it is obvious that the performance gain, compared to random access, can be substantial. For sequential reading of files of moderate size, on the other hand, there is no reason to use memory mapping.
The java.nio package makes memory mapping quite simple. Here is what you do.
First, get a channel for the file. A channel is an abstraction for a disk file that lets you access operating system features such as memory mapping, file locking, and fast data transfers between files.
FileChannel channel = FileChannel.open(path, options);
Then, get a ByteBuffer from the channel by calling the map method of the FileChannel class. Specify the area of the file that you want to map and a mapping mode. Three modes are supported:
FileChannel.MapMode.READ_ONLY: The resulting buffer is read-only. Any attempt to write to the buffer results in a ReadOnlyBufferException.
FileChannel.MapMode.READ_WRITE: The resulting buffer is writable, and the changes will be written back to the file at some time. Note that other programs that have mapped the same file might not see those changes immediately. The exact behavior of simultaneous file mapping by multiple programs depends on the operating system.
FileChannel.MapMode.PRIVATE: The resulting buffer is writable, but any changes are private to this buffer and not propagated to the file.
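The following is a minimal sketch of how the READ_ONLY and PRIVATE modes behave, assuming a small scratch file. The class name MapModeDemo, its method names, and the temporary file are hypothetical and chosen only for illustration.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.ReadOnlyBufferException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MapModeDemo
{
   /** Returns true if a write to a READ_ONLY mapping of the file is rejected. */
   static boolean readOnlyRejectsWrites(Path path) throws IOException
   {
      try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ))
      {
         MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
         try
         {
            buffer.put(0, (byte) 'X');
            return false;
         }
         catch (ReadOnlyBufferException e)
         {
            return true;
         }
      }
   }

   /** Overwrites the first byte through a PRIVATE mapping, then returns the first byte of the file. */
   static byte firstByteAfterPrivateWrite(Path path) throws IOException
   {
      // A PRIVATE mapping requires a channel opened for both reading and writing
      try (FileChannel channel = FileChannel.open(path,
            StandardOpenOption.READ, StandardOpenOption.WRITE))
      {
         MappedByteBuffer buffer = channel.map(FileChannel.MapMode.PRIVATE, 0, channel.size());
         buffer.put(0, (byte) 'X'); // the change is visible only through this buffer
      }
      return Files.readAllBytes(path)[0];
   }

   public static void main(String[] args) throws IOException
   {
      Path path = Files.createTempFile("mapmode", ".txt"); // hypothetical scratch file
      path.toFile().deleteOnExit();
      Files.write(path, "hello".getBytes(StandardCharsets.US_ASCII));

      System.out.println(readOnlyRejectsWrites(path));
      System.out.println((char) firstByteAfterPrivateWrite(path));
   }
}
```

The second method reads the file back through Files.readAllBytes rather than through the buffer, because the point is to check what reached the file, not what the private copy holds.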
Once you have the buffer, you can read and write data using the methods of the ByteBuffer class and the Buffer superclass.
Buffers support both sequential and random data access. A buffer has a position that is advanced by get and put operations. For example, you can sequentially traverse all bytes in the buffer as
while (buffer.hasRemaining()) { byte b = buffer.get(); . . . }
Alternatively, you can use random access:
for (int i = 0; i < buffer.limit(); i++) { byte b = buffer.get(i); . . . }
You can also read and write arrays of bytes with the methods
get(byte[] bytes)
get(byte[] bytes, int offset, int length)
Finally, there are methods
getInt getChar getLong getFloat getShort getDouble
to read primitive-type values that are stored as binary values in the file. As we already mentioned, Java uses big-endian ordering for binary data. However, if you need to process a file containing binary numbers in little-endian order, simply call
buffer.order(ByteOrder.LITTLE_ENDIAN);
To find out the current byte order of a buffer, call
ByteOrder b = buffer.order();
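To see the effect of the byte order, here is a small sketch that stores an int in the default big-endian order and then reads the same four bytes as a little-endian value. It uses an ordinary allocated buffer instead of a mapped one to keep the example short; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class ByteOrderDemo
{
   /** Stores value in big-endian order, then reads the same four bytes as little-endian. */
   static int writeBigReadLittle(int value)
   {
      ByteBuffer buffer = ByteBuffer.allocate(Integer.BYTES); // big-endian by default
      buffer.putInt(0, value);
      buffer.order(ByteOrder.LITTLE_ENDIAN); // changes interpretation, not the stored bytes
      return buffer.getInt(0);
   }

   public static void main(String[] args)
   {
      // The bytes 01 02 03 04 are read back in reverse significance
      System.out.println(Integer.toHexString(writeBigReadLittle(0x01020304))); // prints 4030201
   }
}
```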
To write numbers to a buffer, use one of the methods
putInt putChar putLong putFloat putShort putDouble
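At some point, these changes are written back to the file. A mapping, once established, does not depend on the channel staying open, so if you need the changes on disk at a specific time, call the force method of the MappedByteBuffer class.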
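As an illustration of writing through a mapping, the sketch below stores an int at the start of a file mapped in READ_WRITE mode and then reads the file back. It calls the force method of MappedByteBuffer to request that the changes be written out right away; whether you need that call depends on your application. The class name, method names, and scratch file are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedWriteDemo
{
   /** Writes value at the start of the file through a READ_WRITE mapping. */
   static void writeIntAtStart(Path path, int value) throws IOException
   {
      // READ_WRITE mapping requires a channel opened for both reading and writing
      try (FileChannel channel = FileChannel.open(path,
            StandardOpenOption.READ, StandardOpenOption.WRITE))
      {
         MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, Integer.BYTES);
         buffer.putInt(0, value);
         buffer.force(); // request that the changed region be written to the storage device
      }
   }

   /** Reads the first four bytes of the file as a big-endian int. */
   static int readIntAtStart(Path path) throws IOException
   {
      return ByteBuffer.wrap(Files.readAllBytes(path)).getInt(0);
   }

   public static void main(String[] args) throws IOException
   {
      Path path = Files.createTempFile("mappedwrite", ".bin"); // hypothetical scratch file
      path.toFile().deleteOnExit();
      Files.write(path, new byte[Integer.BYTES]);

      writeIntAtStart(path, 0xCAFEBABE);
      System.out.println(Integer.toHexString(readIntAtStart(path))); // prints cafebabe
   }
}
```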
Listing 2.5 computes the 32-bit cyclic redundancy checksum (CRC32) of a file. That checksum is often used to determine whether a file has been corrupted. Corruption of a file makes it very likely that the checksum has changed. The java.util.zip package contains a class CRC32 that computes the checksum of a sequence of bytes, using the following loop:
var crc = new CRC32();
while (more bytes)
   crc.update(next byte);
long checksum = crc.getValue();
The details of the CRC computation are not important. We just use it as an example of a useful file operation. (In practice, you would read and update data in larger blocks, not a byte at a time. Then the speed differences are not as dramatic.)
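The remark about larger blocks can be sketched as follows. This variant is not part of Listing 2.5; it reads the file into a byte array and passes each filled portion to the update(byte[], int, int) method of CRC32. The block size of 8192 bytes is an arbitrary choice, and the class name and scratch file are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;

class BlockChecksum
{
   /** Computes the CRC32 checksum of a file, updating it one block at a time. */
   static long checksum(Path filename) throws IOException
   {
      try (InputStream in = Files.newInputStream(filename))
      {
         CRC32 crc = new CRC32();
         byte[] block = new byte[8192]; // arbitrary block size
         int n;
         while ((n = in.read(block)) != -1)
            crc.update(block, 0, n); // only the first n bytes were filled by this read
         return crc.getValue();
      }
   }

   public static void main(String[] args) throws IOException
   {
      Path path = Files.createTempFile("crc", ".txt"); // hypothetical scratch file
      path.toFile().deleteOnExit();
      Files.write(path, "123456789".getBytes(StandardCharsets.US_ASCII));

      // "123456789" is the customary CRC32 check input
      System.out.println(Long.toHexString(checksum(path))); // prints cbf43926
   }
}
```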
Run the program as
java memoryMap.MemoryMapTest filename
Listing 2.5 memoryMap/MemoryMapTest.java
package memoryMap;

import java.io.*;
import java.nio.*;
import java.nio.channels.*;
import java.nio.file.*;
import java.util.zip.*;

/**
 * This program computes the CRC checksum of a file in four ways. <br>
 * Usage: java memoryMap.MemoryMapTest filename
 * @version 1.02 2018-05-01
 * @author Cay Horstmann
 */
public class MemoryMapTest
{
   public static long checksumInputStream(Path filename) throws IOException
   {
      try (InputStream in = Files.newInputStream(filename))
      {
         var crc = new CRC32();

         int c;
         while ((c = in.read()) != -1)
            crc.update(c);
         return crc.getValue();
      }
   }

   public static long checksumBufferedInputStream(Path filename) throws IOException
   {
      try (var in = new BufferedInputStream(Files.newInputStream(filename)))
      {
         var crc = new CRC32();

         int c;
         while ((c = in.read()) != -1)
            crc.update(c);
         return crc.getValue();
      }
   }

   public static long checksumRandomAccessFile(Path filename) throws IOException
   {
      try (var file = new RandomAccessFile(filename.toFile(), "r"))
      {
         long length = file.length();
         var crc = new CRC32();

         for (long p = 0; p < length; p++)
         {
            file.seek(p);
            int c = file.readByte();
            crc.update(c);
         }
         return crc.getValue();
      }
   }

   public static long checksumMappedFile(Path filename) throws IOException
   {
      try (FileChannel channel = FileChannel.open(filename))
      {
         var crc = new CRC32();
         int length = (int) channel.size();
         MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, length);

         for (int p = 0; p < length; p++)
         {
            int c = buffer.get(p);
            crc.update(c);
         }
         return crc.getValue();
      }
   }

   public static void main(String[] args) throws IOException
   {
      System.out.println("Input Stream:");
      long start = System.currentTimeMillis();
      Path filename = Paths.get(args[0]);
      long crcValue = checksumInputStream(filename);
      long end = System.currentTimeMillis();
      System.out.println(Long.toHexString(crcValue));
      System.out.println((end - start) + " milliseconds");

      System.out.println("Buffered Input Stream:");
      start = System.currentTimeMillis();
      crcValue = checksumBufferedInputStream(filename);
      end = System.currentTimeMillis();
      System.out.println(Long.toHexString(crcValue));
      System.out.println((end - start) + " milliseconds");

      System.out.println("Random Access File:");
      start = System.currentTimeMillis();
      crcValue = checksumRandomAccessFile(filename);
      end = System.currentTimeMillis();
      System.out.println(Long.toHexString(crcValue));
      System.out.println((end - start) + " milliseconds");

      System.out.println("Mapped File:");
      start = System.currentTimeMillis();
      crcValue = checksumMappedFile(filename);
      end = System.currentTimeMillis();
      System.out.println(Long.toHexString(crcValue));
      System.out.println((end - start) + " milliseconds");
   }
}
2.5.2 The Buffer Data Structure
When you use memory mapping, you make a single buffer that spans the entire file or the area of the file that you’re interested in. You can also use buffers to read and write more modest chunks of information.
In this section, we briefly describe the basic operations on Buffer objects. A buffer is an array of values of the same type. The Buffer class is an abstract class with concrete subclasses ByteBuffer, CharBuffer, DoubleBuffer, FloatBuffer, IntBuffer, LongBuffer, and ShortBuffer.
In practice, you will most commonly use ByteBuffer and CharBuffer. As shown in Figure 2.9, a buffer has
A capacity that never changes
A position at which the next value is read or written
A limit beyond which reading and writing is meaningless
Optionally, a mark for repeating a read or write operation
FIGURE 2.9 A buffer
These values fulfill the condition
0 ≤ mark ≤ position ≤ limit ≤ capacity
The principal purpose of a buffer is a “write, then read” cycle. At the outset, the buffer’s position is 0 and the limit is the capacity. Keep calling put to add values to the buffer. When you run out of data or reach the capacity, it is time to switch to reading.
Call flip to set the limit to the current position and the position to 0. Now keep calling get while the remaining method (which returns limit – position) is positive. When you have read all values in the buffer, call clear to prepare the buffer for the next writing cycle. The clear method resets the position to 0 and the limit to the capacity.
If you want to reread the buffer, use rewind or mark/reset (see the API notes for details).
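The write, flip, read, clear cycle described above can be traced with a small sketch that records the position, limit, and capacity after each step. The class name BufferCycleDemo and its helper methods are hypothetical; the capacity of 8 is an arbitrary choice.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

class BufferCycleDemo
{
   /** Formats the position, limit, and capacity of a buffer. */
   static String state(ByteBuffer buffer)
   {
      return "position=" + buffer.position() + " limit=" + buffer.limit()
         + " capacity=" + buffer.capacity();
   }

   /** Runs one write/read cycle and returns the buffer state after each step. */
   static List<String> trace()
   {
      List<String> states = new ArrayList<>();
      ByteBuffer buffer = ByteBuffer.allocate(8);
      states.add(state(buffer)); // position=0 limit=8 capacity=8

      buffer.put((byte) 10).put((byte) 20).put((byte) 30);
      states.add(state(buffer)); // position=3 limit=8 capacity=8

      buffer.flip(); // limit becomes the old position, position becomes 0
      states.add(state(buffer)); // position=0 limit=3 capacity=8

      while (buffer.remaining() > 0)
         buffer.get();
      states.add(state(buffer)); // position=3 limit=3 capacity=8

      buffer.rewind(); // position becomes 0, limit is unchanged
      states.add(state(buffer)); // position=0 limit=3 capacity=8

      buffer.clear(); // position becomes 0, limit becomes the capacity
      states.add(state(buffer)); // position=0 limit=8 capacity=8
      return states;
   }

   public static void main(String[] args)
   {
      for (String s : trace())
         System.out.println(s);
   }
}
```

Note that clear does not erase the stored bytes. It only resets the position and limit so that the next put calls overwrite the old contents.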
To get a buffer, call a static method such as ByteBuffer.allocate or ByteBuffer.wrap.
Then, you can fill a buffer from a channel, or write its contents to a channel. For example,
ByteBuffer buffer = ByteBuffer.allocate(RECORD_SIZE);
channel.read(buffer);
channel.position(newpos);
buffer.flip();
channel.write(buffer);
This can be a useful alternative to a random-access file.
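Here is one way to flesh out that fragment into a complete program, assuming a file of fixed-size records. The sketch copies one record over another within the same file. The record size of 4 bytes, the class name, and the method name are hypothetical. Since a single call to read or write on a channel may transfer fewer bytes than requested, the sketch loops until the buffer is full or drained.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class RecordCopyDemo
{
   static final int RECORD_SIZE = 4; // hypothetical fixed record length in bytes

   /** Copies the record at index from over the record at index to, within the same file. */
   static void copyRecord(Path path, int from, int to) throws IOException
   {
      try (FileChannel channel = FileChannel.open(path,
            StandardOpenOption.READ, StandardOpenOption.WRITE))
      {
         ByteBuffer buffer = ByteBuffer.allocate(RECORD_SIZE);

         // Fill the buffer from the source record
         channel.position((long) from * RECORD_SIZE);
         while (buffer.hasRemaining())
            if (channel.read(buffer) == -1) break; // end of file reached

         // Switch from writing into the buffer to reading from it
         buffer.flip();

         // Write the buffer contents to the target record
         channel.position((long) to * RECORD_SIZE);
         while (buffer.hasRemaining())
            channel.write(buffer);
      }
   }

   public static void main(String[] args) throws IOException
   {
      Path path = Files.createTempFile("records", ".dat"); // hypothetical scratch file
      path.toFile().deleteOnExit();
      Files.write(path, "AAAABBBBCCCC".getBytes(StandardCharsets.US_ASCII));

      copyRecord(path, 0, 2);
      System.out.println(new String(Files.readAllBytes(path), StandardCharsets.US_ASCII));
      // prints AAAABBBBAAAA
   }
}
```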