PCX Meets Image I/O: Creating an Image-Reading Java Plug-in
Back in the days of the MS-DOS and PC-DOS operating systems, images were often stored in files based on ZSoft Corporation’s PiCture eXchange (PCX) format. Because you might have some PCX images that you want your Java programs to read, and because Java does not support the PCX format, I’ve created a plug-in for Image I/O that lets Java programs read PCX images.
This article presents my PCX reader plug-in. After introducing the PCX file format, the article tours the PCX reader plug-in’s architecture. Along with source code excerpts, this tour reveals the generic architecture of reader plug-ins. The article concludes by revealing how I built and tested this plug-in. Even if PCX does not interest you, you can still apply this article’s plug-in concepts to your own reader plug-ins.
PCX File Format
This section introduces you to PCX, which ZSoft created so that its PC Paintbrush product family could store images in a standard file format. After presenting the organization of PCX’s header and optional VGA palette, the section presents PCX’s algorithms for encoding and decoding an image’s pixels. For more information about PCX, consult the ZSoft PCX File Format Technical Reference Manual (see the Resources section at the end of this article).
Header and VGA Palette
The first 128 bytes in a PCX file specify a header that describes the file’s image. The header provides image dimensions, a 16-entry color palette, and other items. It divides into 15 fields and regions, where two-byte fields store 16-bit integers according to Intel’s little-endian format—the least significant byte is stored at the lowest address:
- Manufacturer: This single-byte field (at offset 0) identifies the manufacturer of the PCX format. The value stored in this field is always 10, which identifies ZSoft as the manufacturer. The presence of 10 in this field is the only indication that the file might be a PCX file. To confirm that the file is PCX, other header fields must be interrogated for appropriate values. The PCX reader plug-in ignores files in which 10 does not appear in this field.
- Version: This single-byte field (at offset 1) identifies the Paintbrush product that created this PCX file. Values include 0 (Version 2.5 of PC Paintbrush), 2 (Version 2.8 with palette information), 3 (Version 2.8 without palette information), 4 (PC Paintbrush for Windows), and 5 (Version 3.0 and higher of PC Paintbrush, PC Paintbrush Plus, and Publisher’s PaintBrush). Version 5 also supports images with 24-bit color. The PCX reader plug-in recognizes only files in which the version number is 5.
- Encoding: This single-byte field (at offset 2) identifies the type of compression applied to the image. The only compression algorithm currently supported by PCX is a simple, byte-wise, run-length encoding (RLE) scheme indicated by a value of 1. It seems to follow that if a PCX file held an uncompressed image, this value would be 0. However, because PCX files always contain a compressed image, 1 is the only valid value. The PCX reader plug-in ignores files in which 1 does not appear in this field.
- BitsPerPixel: This single-byte field (at offset 3) identifies the number of bits per pixel per plane in the image. The possible values are 1, 2, 4, and 8 for 2-, 4-, 16-, and 256-color images (assuming that the NPlanes field contains 1). However, if this value is 8, and NPlanes contains 3, the image uses 24-bit color. The PCX reader plug-in recognizes only files in which BitsPerPixel contains 1, 4, or 8 and NPlanes contains 1, or BitsPerPixel contains 8 and NPlanes contains 3.
- Window: This eight-byte region (at offset 4) stores four integers that identify the image’s dimensions in successive Xmin, Ymin, Xmax, and Ymax fields. The Xmin and Ymin fields identify the leftmost column and topmost row of the image to be displayed, whereas the Xmax and Ymax fields identify the image’s rightmost column and bottommost row. The image’s width is Xmax - Xmin + 1 pixels, and the height is Ymax - Ymin + 1 pixels.
When they contain values other than 0, the Xmin and Ymin fields allow an image display program to display part of a larger image. Although the PCX reader plug-in uses the existing Xmin and Ymin values in its calculations of the image’s width and height, it always returns the image beginning with column 0 and row 0 (the leftmost pixel on the topmost row)—not Xmin columns from column 0 and Ymin rows from row 0.
- HDpi and VDpi: These two-byte fields (at offsets 12 and 14, respectively) contain the horizontal and vertical dots-per-inch resolutions of the image, assuming that the image was created via a scanner. Traditional scanner values include 300 dots per inch and 600 dots per inch. Because these fields are rarely used, the PCX reader plug-in ignores them.
- Colormap: This 48-byte field (at offset 16) stores 16 three-byte RGB entries that record a palette for the Enhanced Graphics Adapter (EGA)—an old video card that could display 16 colors out of a palette of 64 colors. The PCX reader plug-in examines only the first two palette entries when BitsPerPixel contains 1 and examines all 16 entries when BitsPerPixel contains 4.
- Reserved: This single-byte region (at offset 64) serves no real purpose and is ignored by the PCX reader plug-in. Older PCX versions used this region for file identification or to store the video mode value of the display screen on which the PCX image was created. Several paint and graphics display programs claim that a PCX file is invalid if this region is not set to 0.
- NPlanes: This single-byte field (at offset 65) identifies the number of image planes. The number of planes is usually 1, 3, or 4; it is used in conjunction with the BitsPerPixel value to determine the maximum number of colors a PCX image may have and the minimum graphics hardware on which to display the image.
- BytesPerLine: This two-byte field (at offset 66) identifies the number of bytes that make up a single uncompressed scanline plane. (A scanline is a sequence of planes, in which each plane contains a sequence of palette indexes, a sequence of red values, a sequence of green values, a sequence of blue values, or a sequence of intensities.) The PCX reader plug-in ignores files in which BytesPerLine is odd.
- PaletteInfo: This two-byte field (at offset 68) specifies whether the Colormap field contains color or gray-scale values. A value of 1 indicates color values, whereas a value of 2 indicates gray-scale values. (The Video Graphics Array (VGA) adapter has a special gray-scale mode.) The PCX reader plug-in ignores this field.
- HscreenSize and VscreenSize: These two-byte fields (at offsets 70 and 72, respectively) contain the horizontal and vertical sizes (in pixels) of the screen on which the image was created. Introduced by PaintBrush IV and IV Plus, these fields let graphics display programs adjust their video mode to allow for proper display of the PCX image. The PCX reader plug-in ignores these fields.
- Filler: This 54-byte region (at offset 74) completes the header. It is used to pad out the header to a full 128 bytes, and to save room for additional fields that might be added to the header in future revisions of the PCX format (although this is extremely doubtful). The PCX reader plug-in ignores this region.
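The header layout described above can be sketched in Java as follows. This is a minimal illustration, not the plug-in's actual source: the class name `PcxHeaderSketch` and its members are hypothetical, and the sketch assumes that the two-byte fields are unsigned. It reads only the fields that the plug-in is described as using and applies the acceptance rules stated in the field descriptions.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration; not taken from the plug-in's source.
final class PcxHeaderSketch {
    final int manufacturer, version, encoding, bitsPerPixel;
    final int xMin, yMin, xMax, yMax;
    final int nPlanes, bytesPerLine;
    final byte[] colormap = new byte[48]; // 16 three-byte RGB entries

    PcxHeaderSketch(byte[] header) {
        if (header.length < 128)
            throw new IllegalArgumentException("a PCX header occupies 128 bytes");
        // Two-byte fields are little-endian: least significant byte first.
        ByteBuffer buf = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        manufacturer = buf.get(0) & 0xff;
        version = buf.get(1) & 0xff;
        encoding = buf.get(2) & 0xff;
        bitsPerPixel = buf.get(3) & 0xff;
        // Assumption: the Window fields are treated as unsigned 16-bit values.
        xMin = buf.getShort(4) & 0xffff;
        yMin = buf.getShort(6) & 0xffff;
        xMax = buf.getShort(8) & 0xffff;
        yMax = buf.getShort(10) & 0xffff;
        System.arraycopy(header, 16, colormap, 0, colormap.length);
        nPlanes = buf.get(65) & 0xff;
        bytesPerLine = buf.getShort(66) & 0xffff;
    }

    int width() { return xMax - xMin + 1; }

    int height() { return yMax - yMin + 1; }

    // Mirrors the acceptance rules given in the field descriptions.
    boolean isSupported() {
        boolean indexed = nPlanes == 1
            && (bitsPerPixel == 1 || bitsPerPixel == 4 || bitsPerPixel == 8);
        boolean trueColor = nPlanes == 3 && bitsPerPixel == 8;
        return manufacturer == 10 && version == 5 && encoding == 1
            && (indexed || trueColor) && bytesPerLine % 2 == 0;
    }
}
```

For example, a header whose Xmin and Ymin are 0, Xmax is 319, and Ymax is 199 yields a width of 320 and a height of 200.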
PCX files typically store a row of image pixels as three planes of red, green, and blue values, or as a single plane of palette indexes. If BitsPerPixel contains 8 and NPlanes contains 3, this row is stored as a sequence of bytes containing red values (the red plane), followed by a sequence of bytes containing green values (the green plane), followed by a sequence of bytes containing blue values (the blue plane).
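The three-plane layout can be illustrated with a short Java sketch that converts one decoded scanline into packed RGB pixels. The class and method names are hypothetical, and the sketch assumes that the scanline has already been decompressed into a buffer of 3 * BytesPerLine bytes.

```java
// Hypothetical helper for illustration; not taken from the plug-in's source.
final class PlanarRowSketch {
    // Converts one decoded scanline (red plane, then green plane, then blue
    // plane) into packed 0xRRGGBB pixels. Each plane is bytesPerLine bytes
    // long; only the first width bytes of each plane hold pixel data.
    static int[] toPackedRgb(byte[] scanline, int bytesPerLine, int width) {
        if (width > bytesPerLine || scanline.length < 3 * bytesPerLine)
            throw new IllegalArgumentException("scanline too short for 3 planes");
        int[] rgb = new int[width];
        for (int x = 0; x < width; x++) {
            int r = scanline[x] & 0xff;
            int g = scanline[bytesPerLine + x] & 0xff;
            int b = scanline[2 * bytesPerLine + x] & 0xff;
            rgb[x] = (r << 16) | (g << 8) | b;
        }
        return rgb;
    }
}
```

Because BytesPerLine can exceed the image width, the sketch takes both values and skips any padding bytes at the end of each plane.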
If NPlanes contains 1, a row of image pixels is stored as a single plane of palette indexes. When BitsPerPixel contains 1, 2, or 4, the indexes are 1, 2, or 4 bits wide and refer to the header’s 16-entry Colormap field. When BitsPerPixel contains 8, each index occupies one byte and refers to a 256-entry VGA palette (each 3-byte entry stores an RGB value) appended to the PCX file. The VGA palette is preceded by a byte whose decimal value is 12, so this marker byte is located 769 bytes before the end of the file.
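The following Java sketch shows one way to locate the VGA palette and to unpack sub-byte palette indexes. The names are hypothetical. The sketch assumes that the whole file is available as a byte array, that the palette occupies the final 768 bytes with the marker byte immediately before it, and that indexes are packed with the leftmost pixel in the most significant bits of each byte.

```java
// Hypothetical helper for illustration; not taken from the plug-in's source.
final class PcxPaletteSketch {
    // Returns the 256-entry VGA palette as packed 0xRRGGBB values, or null
    // when the marker byte (decimal 12) is not found 769 bytes from the end.
    static int[] readVgaPalette(byte[] file) {
        int marker = file.length - 769;
        if (marker < 128 || (file[marker] & 0xff) != 12)
            return null;
        int[] palette = new int[256];
        for (int i = 0; i < 256; i++) {
            int o = marker + 1 + 3 * i;
            palette[i] = ((file[o] & 0xff) << 16)
                       | ((file[o + 1] & 0xff) << 8)
                       | (file[o + 2] & 0xff);
        }
        return palette;
    }

    // Unpacks 1-, 2-, 4-, or 8-bit indexes from a single decoded plane.
    // Assumption: the leftmost pixel occupies the highest-order bits.
    static int[] unpackIndexes(byte[] plane, int bitsPerPixel, int width) {
        int perByte = 8 / bitsPerPixel;
        int mask = (1 << bitsPerPixel) - 1;
        int[] indexes = new int[width];
        for (int x = 0; x < width; x++) {
            int b = plane[x / perByte] & 0xff;
            int shift = 8 - bitsPerPixel * (x % perByte + 1);
            indexes[x] = (b >> shift) & mask;
        }
        return indexes;
    }
}
```

With 4 bits per pixel, for instance, the byte 0xA5 unpacks to the two indexes 10 and 5.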
Image Encoding and Decoding
PCX encodes each row of pixel values using an RLE algorithm. This algorithm looks for runs of identical data bytes. For each run, two bytes are output: the first byte has its upper two bits set and stores the run’s length in its lower six bits; the second byte stores the data value. This six-bit count implies that a run cannot exceed 63 bytes.
Zero-length runs are not stored (unless there is something wrong with the encoding algorithm). If a data byte does not repeat, and zero or one of its top two bits is set, the data byte is output by itself. However, if a data byte does not repeat, and both of its top two bits are set, a byte with its upper two bits set and with a run length of 1 in its lower six bits is output, followed by the data byte.
The encoding algorithm (expressed as a mixture of Java and pseudocode) appears below:
    int scanlineLength = BytesPerLine * NPlanes
    byte [] buffer = scanlineLength bytes in appropriate color format
    int index = 0

    do
    {
       int i = 0
       while (i < 62 && index + i + 1 < scanlineLength &&
              buffer [index + i] == buffer [index + i + 1])
          ++i

       // If there is no run, i contains 0. If there is a run of 2 through 63
       // bytes, i contains 1 through 62. Essentially, i counts the number of
       // bytes that equal the first byte in a run.

       if (i > 0)
       {
          output byte ((i + 1) | 0xc0) to file
          output byte buffer [index] to file
          index += i + 1
       }
       else
       {
          if ((buffer [index] & 0xc0) == 0xc0)
             output byte 0xc1 to file
          output byte buffer [index++] to file
       }
    }
    while (index < scanlineLength)
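For readers who want to experiment, the encoding pseudocode translates into runnable Java roughly as follows. The class and method names are hypothetical, and writing to an in-memory `ByteArrayOutputStream` instead of a file is a simplification made for this sketch.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper for illustration; not taken from the plug-in's source.
final class PcxRleEncoderSketch {
    // Run-length encodes one scanline (all planes concatenated).
    static byte[] encodeScanline(byte[] buffer) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int index = 0;
        while (index < buffer.length) {
            // i counts the bytes after buffer[index] that repeat its value,
            // capped at 62 so that the run length (i + 1) fits in six bits.
            int i = 0;
            while (i < 62 && index + i + 1 < buffer.length
                   && buffer[index + i] == buffer[index + i + 1])
                ++i;
            if (i > 0) {
                out.write((i + 1) | 0xc0); // count byte: top two bits set
                out.write(buffer[index]);  // data byte
                index += i + 1;
            } else {
                // A lone byte whose top two bits are set would be mistaken
                // for a count byte, so it is emitted as a run of length 1.
                if ((buffer[index] & 0xc0) == 0xc0)
                    out.write(0xc1);
                out.write(buffer[index++]);
            }
        }
        return out.toByteArray();
    }
}
```

Encoding the six bytes 05 05 05 05 07 C8 (hexadecimal) produces C4 05 07 C1 C8: a run of four 05 bytes, the literal 07, and the value C8 escaped as a run of length 1.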
The equivalent decoding algorithm (expressed as a mixture of Java and pseudocode) appears below:
    int scanlineLength = BytesPerLine * NPlanes
    byte [] buffer = new byte [scanlineLength]
    int index = 0

    do
    {
       byte x = input byte from file

       if ((x & 0xc0) == 0xc0) // top two bits in x are set
       {
          int count = x & 0x3f // return lowest six bits in x
          if (count == 0 || index + count - 1 >= scanlineLength)
             Error

          x = input byte from file
          for (int i = 1; i <= count; i++)
             buffer [index++] = x
       }
       else
          buffer [index++] = x
    }
    while (index < scanlineLength)
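A runnable Java rendering of the decoding pseudocode might look like the following sketch. The names are hypothetical. Reading from a `ByteBuffer` (which tracks the current position across scanlines) and signaling a corrupt run with an `IllegalStateException` are choices made for this illustration, not details confirmed by the plug-in's source.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration; not taken from the plug-in's source.
final class PcxRleDecoderSketch {
    // Decodes one scanline of scanlineLength bytes, starting at the buffer's
    // current position. ByteBuffer.get() throws BufferUnderflowException if
    // the encoded data ends prematurely.
    static byte[] decodeScanline(ByteBuffer in, int scanlineLength) {
        byte[] buffer = new byte[scanlineLength];
        int index = 0;
        while (index < scanlineLength) {
            int x = in.get() & 0xff;
            if ((x & 0xc0) == 0xc0) {  // top two bits set: count byte
                int count = x & 0x3f;  // lowest six bits hold the run length
                if (count == 0 || index + count > scanlineLength)
                    throw new IllegalStateException("corrupt run at " + index);
                byte value = in.get();
                for (int i = 0; i < count; i++)
                    buffer[index++] = value;
            } else {
                buffer[index++] = (byte) x; // literal data byte
            }
        }
        return buffer;
    }
}
```

Decoding the five bytes C4 05 07 C1 C8 (hexadecimal) with a scanline length of 6 yields 05 05 05 05 07 C8. Note the parentheses in `(x & 0xc0) == 0xc0`: in Java, `==` binds more tightly than `&`, so omitting them does not compile.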