- .dbg and .pdb Symbol Files
- Global PDB File Layout
- Scanning the Root Stream
- Decomposing a PDB File
- Sample Code Archive
- Bibliography
Global PDB File Layout
In my Windows 2000 book [2], detailed information about the extraction of public symbols is found, along with plenty of reusable source code. In this article, I will cover the more general task of parsing a PDB file and dissecting it into its parts without regarding the contents of its various streams. The sample application presented here is a simple Win32 console mode program that expects the path of a PDB file, and stores its constituents in separate files. Depending on the option switches specified on the command line, the following information can be extracted:
- The PDB file header, containing general information about the file
- The allocation bit table, which specifies what parts of the file are in use
- The root stream, containing specific information about all data streams
- All data streams listed in the root stream
The most basic structural property of a PDB file is its subdivision into pages of equal size. The most frequently used page size is 1KB (1,024 bytes), but my research has revealed that 2-KB and 4-KB pages are legal as well. (You can verify this by examining how Microsoft's dbghelp.dll processes PDB files.) A PDB stream is a sequence of file pages that contains coherent information. The most essential property of a stream is that its pages can be located anywhere in the file, in arbitrary order.
When a stream is read or written, a stream directory is responsible for telling the application which pages need to be accessed in which order. This directory is itself stored in a stream called the root stream. Additionally, an embedded Allocation Bit Table keeps track of used and unused pages. This table is indispensable as soon as "holes" appear in the PDB file due to the rearrangement of streams. If a stream is rewritten to the end of the PDB file, releasing the pages it occupied before, the allocation bit table reflects that the previous pages are free while the new pages are in use. This scheme is borrowed from simple operating systems such as MS-DOS with its FAT file system, in which a similar table specifies which disk sectors are allocated to files.
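To illustrate how a reader follows a page list, here is a minimal sketch in C. It assumes that the whole PDB file has already been loaded into memory and that the caller already knows the stream's byte size and its list of page numbers. The function name pdb_gather_stream and its parameters are hypothetical and are not part of the sample application.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy a stream whose pages are scattered through an in-memory PDB image
 * into one contiguous buffer. 'pages' lists the page numbers in stream
 * order; 'out' must have room for 'stream_size' bytes.
 * Returns 0 on success, -1 if a page lies outside the image. */
int pdb_gather_stream(const uint8_t *image, size_t image_size,
                      uint32_t page_size,
                      const uint16_t *pages, uint32_t stream_size,
                      uint8_t *out)
{
    uint32_t copied = 0;
    size_t   i      = 0;

    if (page_size == 0) return -1;

    while (copied < stream_size) {
        size_t   offset = (size_t)pages[i] * page_size;
        uint32_t chunk  = stream_size - copied;

        /* Every page is full except possibly the last one. */
        if (chunk > page_size) chunk = page_size;

        /* Reject page numbers that point past the end of the image. */
        if (offset > image_size || chunk > image_size - offset) return -1;

        memcpy(out + copied, image + offset, chunk);
        copied += chunk;
        i++;
    }
    return 0;
}
```

Because the page list alone determines the order, the function never needs to consult the allocation bit table.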
Figure 1 shows the typical basic layouts of a PDB file in 1-KB, 2-KB, and 4-KB mode. Which page size should be used depends on the data to be stored in the streams. If the page size increases, the allocation bit table and the root stream become smaller. On the other hand, a larger page size results in more page overhang; that is, more file bytes are wasted if the stream size isn't an exact multiple of the page size. The same problem occurs in file systems, in which the disk sector size must be chosen properly to avoid excessive sector overhang. Most PDB files, such as the Windows 2000 symbol files and the debugging information generated by Microsoft Visual C/C++ 6.0, employ the 1-KB scheme, as depicted on the left side of Figure 1.
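The page overhang trade-off is easy to quantify. The following sketch uses two hypothetical helpers (the names are mine, not taken from any PDB library) that compute how many pages a stream occupies and how many bytes of its last page remain unused. Both assume a nonzero page size.

```c
#include <stdint.h>

/* Number of pages needed to hold stream_size bytes (rounds up). */
uint32_t pdb_pages_needed(uint32_t stream_size, uint32_t page_size)
{
    return stream_size / page_size + (stream_size % page_size != 0);
}

/* Bytes left unused in the last page of the stream. */
uint32_t pdb_page_overhang(uint32_t stream_size, uint32_t page_size)
{
    uint32_t rest = stream_size % page_size;
    return rest ? page_size - rest : 0;
}
```

For a 1,500-byte stream, 1-KB pages require two pages and waste 548 bytes, whereas a single 4-KB page wastes 2,596 bytes.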
Figure 1 Typical PDB file layouts.
In 1-KB page mode, a maximum of 64MB can be stored in the PDB file, which simply results from multiplying the number of allocation bits by the page size. As I will show in a moment, PDB page numbers are stored as zero-based, 16-bit quantities. Therefore, the allocation bit table will never comprise more than 0x10000 bits (8KB).
The PDB header always occupies the first file page, and is followed by one or more pages containing the allocation bits. The structural definition of the header is shown in Listing 1. The first 44 bytes make up the ID string PDB_SIGNATURE_200, specifying the file type and version. At the time of this writing, the most recent version is 2.00, and this is the version used by the Windows 2000 symbol files and the Visual C/C++ debugging information. The dPageSize member indicates the page size applying to all pages in the file, and wStartPage is the zero-based page number of the first data page following the allocation bits. The size of the allocation bit table can always be computed by subtracting 1 from wStartPage (for the header page), and multiplying the result by dPageSize. The wFilePages member specifies the number of pages stored in the PDB file, and should always match the file size in bytes divided by the page size.
Listing 1 PDB File Header
    #define PDB_SIGNATURE_200 "Microsoft C/C++ program database 2.00\r\n\x1AJG\0"
    #define PDB_SIGNATURE_    44 // size of signature (bytes)

    // -----------------------------------------------------------------

    typedef struct _PDB_HEADER
        {
        BYTE       abSignature [PDB_SIGNATURE_]; // PDB_SIGNATURE_200
        DWORD      dPageSize;                    // 0x0400, 0x0800, 0x1000
        WORD       wStartPage;                   // 0x0009, 0x0005, 0x0002
        WORD       wFilePages;                   // file size / dPageSize
        PDB_STREAM RootStream;                   // stream directory
        WORD       awRootPages [];               // pages containing PDB_ROOT
        }
        PDB_HEADER, *PPDB_HEADER, **PPPDB_HEADER;
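As a hedged illustration of how the header fields fit together, the sketch below validates the signature and derives the size of the allocation bit table from a raw header buffer. It reads the fields at fixed byte offsets instead of casting to PDB_HEADER, so it also works on platforms where pointers are wider than 32 bits. The offsets (44, 48, 50, 52) are my assumption, derived from Listing 1 with a 4-byte pwStreamPages member and little-endian storage. The names PdbHeaderInfo and pdb_parse_header are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PDB_SIGNATURE_200 "Microsoft C/C++ program database 2.00\r\n\x1AJG\0"
#define PDB_SIGNATURE_    44  /* size of signature (bytes) */

typedef struct {
    uint32_t page_size;    /* dPageSize                       */
    uint16_t start_page;   /* wStartPage                      */
    uint16_t file_pages;   /* wFilePages                      */
    uint32_t root_size;    /* RootStream.dStreamSize          */
    uint32_t alloc_bytes;  /* size of allocation bit table    */
} PdbHeaderInfo;

/* Little-endian readers, independent of host byte order. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 and fills *info if buf starts with a plausible 2.00 header,
 * otherwise returns -1. */
int pdb_parse_header(const uint8_t *buf, size_t len, PdbHeaderInfo *info)
{
    if (len < 60) return -1;  /* assumed size of the fixed header part */
    if (memcmp(buf, PDB_SIGNATURE_200, PDB_SIGNATURE_) != 0) return -1;

    info->page_size  = rd32(buf + 44);
    info->start_page = rd16(buf + 48);
    info->file_pages = rd16(buf + 50);
    info->root_size  = rd32(buf + 52);

    if (info->page_size != 0x0400 && info->page_size != 0x0800 &&
        info->page_size != 0x1000) return -1;
    if (info->start_page < 1) return -1;

    /* (wStartPage - 1) pages of allocation bits follow the header page. */
    info->alloc_bytes = (uint32_t)(info->start_page - 1) * info->page_size;
    return 0;
}
```

With the typical 1-KB values (dPageSize = 0x0400, wStartPage = 0x0009), alloc_bytes comes out as 8,192 bytes, that is, 0x10000 allocation bits.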
The RootStream member is another undocumented structure of type PDB_STREAM, as defined in Listing 2. This structure appears wherever a stream is defined. Here, it refers to the root stream that contains the sizes and locations of the data streams within the file. In a moment, we will revisit it when walking down the list of data streams in the root stream. Only the dStreamSize member of the PDB_STREAM structure is of interest when dealing with a PDB file on disk. The pwStreamPages member is apparently used as a scratch pad by PDB read/write utilities that handle PDB information in virtual memory. Simply ignore this value, because it may be a stale address that was valid only while the file was loaded in memory.
The RootStream structure is immediately followed by an array of 16-bit page numbers used by the root stream. Most PDB root streams I have seen so far don't exceed one page, so the awRootPages[] array usually contains only a single entry. One exception is the extraordinarily large symbol file of ntoskrnl.exe, which has a root stream that spans two pages.
Listing 2 The Basic PDB Stream Structure
    typedef struct _PDB_STREAM
        {
        DWORD dStreamSize;   // in bytes, -1 = free stream
        PWORD pwStreamPages; // array of page numbers
        }
        PDB_STREAM, *PPDB_STREAM, **PPPDB_STREAM;
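The following sketch shows one way to collect the root stream's page numbers from a raw header buffer. It rests on two assumptions of mine: that the number of awRootPages[] entries equals the root stream size divided by the page size, rounded up, and that dStreamSize sits at byte offset 52 with the page array starting at offset 60 (a 4-byte pwStreamPages member, little-endian storage). The function name pdb_root_pages is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define PDB_OFFSET_ROOT_SIZE  52  /* assumed offset of RootStream.dStreamSize */
#define PDB_OFFSET_ROOT_PAGES 60  /* assumed offset of awRootPages[]          */

/* Copy the root stream's page numbers into 'pages'.
 * Returns the number of entries, or 0 if the stream is marked free,
 * the buffer is too short, or 'pages' is too small. */
size_t pdb_root_pages(const uint8_t *header, size_t header_len,
                      uint32_t page_size,
                      uint16_t *pages, size_t max_pages)
{
    const uint8_t *p;
    uint32_t root_size;
    size_t count, i;

    if (page_size == 0 || header_len < PDB_OFFSET_ROOT_PAGES) return 0;

    p = header + PDB_OFFSET_ROOT_SIZE;
    root_size = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);

    if (root_size == 0xFFFFFFFFu) return 0;  /* -1 = free stream */

    /* One page number per started page. */
    count = root_size / page_size + (root_size % page_size != 0);

    if (count > max_pages) return 0;
    if (count > (header_len - PDB_OFFSET_ROOT_PAGES) / 2) return 0;

    for (i = 0; i < count; i++) {
        const uint8_t *q = header + PDB_OFFSET_ROOT_PAGES + 2 * i;
        pages[i] = (uint16_t)(q[0] | (q[1] << 8));
    }
    return count;
}
```

The pwStreamPages member is skipped entirely, in line with the advice to ignore it when working with an on-disk file.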
The allocation bits don't require much explanation. Each bit is associated with an individual page, and a value of 1 means that the corresponding page is currently available. The bits within a byte are ordered from least-significant to most-significant. That is, bit #0 of byte #0 refers to page #0, bit #1 of byte #0 to page #1, and so on. Note that the allocation bits of an on-disk PDB image do not necessarily reflect the status of the stored data stream pages. If you examine a couple of Windows 2000 symbol files, you will find out quickly that some data stream pages are located in pages that are marked free, and some pages marked in-use are not part of any stream. So, the allocation table is probably rebuilt by PDB readers/writers from the data in the root stream, and has meaningful content only while loaded into virtual memory.
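The bit ordering just described translates into a one-line lookup. The sketch below uses hypothetical helper names and assumes that the caller passes a page number that lies within the table. Given the caveat above, treat the result for an on-disk file as informational only.

```c
#include <stdint.h>

/* Returns 1 if the allocation bit for 'page' is set (page marked free),
 * 0 otherwise. Bit #0 of byte #0 is page #0, bit #1 is page #1, etc. */
int pdb_page_marked_free(const uint8_t *bits, uint32_t page)
{
    return (bits[page >> 3] >> (page & 7)) & 1;
}

/* Count the pages marked free among the first 'page_count' pages. */
uint32_t pdb_count_marked_free(const uint8_t *bits, uint32_t page_count)
{
    uint32_t page, total = 0;

    for (page = 0; page < page_count; page++)
        total += (uint32_t)pdb_page_marked_free(bits, page);
    return total;
}
```

For example, a first table byte of 0x05 marks pages #0 and #2 as free and pages #1 and #3 through #7 as in use.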