Decomposing a PDB File
My sample "PDB File Exploder" w2k_pdbx.exe is a barebones Win32 console-mode utility that performs the following processing steps:
First, it allocates a virtual memory block large enough to hold the entire PDB file data, and copies the file from disk to memory.
Before attempting any interpretation, the data has to undergo a simple verification test, performed by the PdbValid() function in Listing 4. Given a pointer to the memory block, which is supposed to start with a PDB_HEADER structure (function argument pph), and the number of bytes read from the file (function argument dData), this function first ensures that there is at least enough space for a complete PDB_HEADER structure. Otherwise, accessing any of its members might cause an exception. Next, the presence of a PDB V2.00 signature is verified. Finally, PdbValid() computes the number of data bytes indicated by the page count and page size, and matches the result against the file size. Of course, this test is very rough; a good PDB reader should also consider verifying that all page numbers in the header and root stream are within the proper range.
Depending on the user-supplied command switches, the utility writes the main components of the PDB file to separate files. It recognizes the options h (extract header), a (extract allocation bits), r (extract root stream), and d (extract data streams). The command line may contain multiple PDB file paths, and for each file, these four options can be turned on or off by prefixing the option ID with a plus or minus sign. For example, the command w2k_pdbx +hard D:\WINNT\Symbols\exe\ntoskrnl.pdb extracts all valid PDB data that is buried in the Windows 2000 kernel's symbol file.
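The option handling described above can be sketched as a small flag parser. This is not the utility's actual source; the OPT_* constants and the apply_options() name are hypothetical, and the sketch assumes that a sign may also switch in the middle of a token, which is one possible reading of "enable/disable subsequent options".

```c
#include <stddef.h>

/* Hypothetical flag bits, one per extraction option. */
enum {
    OPT_HEADER = 1 << 0,  /* h: extract header          */
    OPT_ALLOC  = 1 << 1,  /* a: extract allocation bits */
    OPT_ROOT   = 1 << 2,  /* r: extract root stream     */
    OPT_DATA   = 1 << 3   /* d: extract data streams    */
};

/*
 * Apply one option token such as "+hard" or "-a" to the current flags.
 * Returns the updated flags, or -1 if the token is not a valid option
 * string (it must start with '+' or '-' and contain only h, a, r, d
 * and further signs).
 */
int apply_options(int flags, const char *token)
{
    int enable;
    size_t i;

    if (token == NULL || (token[0] != '+' && token[0] != '-') ||
        token[1] == '\0')
        return -1;
    enable = (token[0] == '+');

    for (i = 1; token[i] != '\0'; i++) {
        int bit = 0;
        switch (token[i]) {
        case 'h': bit = OPT_HEADER; break;
        case 'a': bit = OPT_ALLOC;  break;
        case 'r': bit = OPT_ROOT;   break;
        case 'd': bit = OPT_DATA;   break;
        case '+': enable = 1; continue;  /* assumed: sign may change mid-token */
        case '-': enable = 0; continue;
        default:  return -1;
        }
        flags = enable ? (flags | bit) : (flags & ~bit);
    }
    return flags;
}
```

Because the flags are carried from one token to the next, a caller could keep the current set while walking the command line and apply it to each PDB path it encounters.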
Listing 4 Simple PDB Sanity Check
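    BOOL PdbValid (PPDB_HEADER pph,
                   DWORD       dData)
    {
        // the block must hold a complete header, carry the V2.00
        // signature, and match the size implied by the page count
        return (pph != NULL) &&
               (dData >= PDB_HEADER_) &&
               (!lstrcmpA (pph->abSignature, PDB_SIGNATURE_200)) &&
               ((DWORD) pph->wFilePages * pph->dPageSize == dData);
    }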
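The stricter check that the text recommends, namely verifying that page numbers stay within the file, might look like the following sketch. It is not part of w2k_pdbx.exe; pages_in_range() is a hypothetical helper, and it uses portable fixed-width types instead of the Win32 typedefs so that it compiles anywhere.

```c
#include <stdint.h>
#include <stddef.h>

/*
 * Return 1 if every page number in pages[0..count-1] refers to a page
 * inside a file that consists of file_pages pages, 0 otherwise.
 * An empty list is considered valid.
 */
int pages_in_range(const uint16_t *pages, size_t count, uint32_t file_pages)
{
    size_t i;

    if (pages == NULL && count != 0)
        return 0;
    for (i = 0; i < count; i++) {
        if (pages[i] >= file_pages)
            return 0;   /* page number points beyond the end of the file */
    }
    return 1;
}
```

A reader could run this over the root page numbers in the header first and, once the root stream is assembled, over the page number array that follows the stream directory.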
If the +h option is specified, the PDB_HEADER portion is saved, including all valid root stream page numbers. In this case, w2k_pdbx.exe simply writes the first PDB page to disk, which silently assumes that the header fits into a single page. In theory, this assumption can fail. Consider a weird scenario in which you have 65,536 zero-length data streams: the size of the root stream would be 524,292 bytes, or 513 pages in 1-KB mode. Because each page number takes up 16 bits in the header's awRootPages[] array, the page number list alone would occupy 1,026 bytes, so the header size would exceed 1,024 bytes. The situation gets even worse if the data streams are not empty. Frankly, I currently don't know how the Microsoft PDB tools handle this special case. However, I doubt that you will ever run into such a pathological file in real life.
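The arithmetic behind this scenario can be reproduced with a few lines of C. The sketch assumes a root stream layout of a 4-byte fixed part followed by 8 bytes per stream entry, which is inferred from the figures quoted in the text (524,292 = 4 + 65,536 × 8) rather than taken from a structure definition; the function names are hypothetical.

```c
#include <stdint.h>

/* Number of pages needed to hold `bytes` bytes (0 bytes need 0 pages). */
uint32_t pages_for(uint32_t bytes, uint32_t page_size)
{
    return bytes ? (bytes - 1) / page_size + 1 : 0;
}

/*
 * Root stream size if all `streams` data streams are empty.
 * Assumed layout: 4-byte fixed part plus 8 bytes per stream entry;
 * empty streams contribute no page numbers.
 */
uint32_t empty_root_bytes(uint32_t streams)
{
    return 4u + streams * 8u;
}

/* Bytes the root page numbers occupy in the header, at 16 bits each. */
uint32_t root_page_list_bytes(uint32_t streams, uint32_t page_size)
{
    return pages_for(empty_root_bytes(streams), page_size) * 2u;
}
```

For 65,536 streams and 1-KB pages, these functions yield 524,292 bytes, 513 pages, and a 1,026-byte page number list, which no longer fits into a 1,024-byte header page.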
Saving the allocation bits on behalf of the +a option is also quite simple. The number of pages occupied by the bit array is given by the wStartPage member of the PDB_HEADER minus 1, again assuming that the header doesn't exceed the one-page limit. Extracting the root stream (command option +r) requires a bit more work because the program must first find out the size of the root stream. This calculation is not trivial because the size depends on the number and size of the data streams, and you must take into account that the root stream may span multiple pages that are not necessarily contiguous. Listing 5 shows a possible iterative solution. The PdbRoot() function uses a three-step approximation procedure to find out the exact size in bytes, and copies the data to a contiguous memory block.
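As a small illustration of the allocation bit computation, the sketch below derives the location and size of the bit array from the start page. It assumes that the bit array begins immediately after a one-page header, which the text implies but does not state explicitly; alloc_offset() and alloc_bytes() are hypothetical names.

```c
#include <stdint.h>

/* Assumed: the allocation bits start right after the one-page header. */
uint32_t alloc_offset(uint32_t page_size)
{
    return page_size;
}

/*
 * Size of the allocation bit array in bytes: (start_page - 1) pages.
 * A start page of 0 or 1 leaves no room for the array, so 0 is returned.
 */
uint32_t alloc_bytes(uint16_t start_page, uint32_t page_size)
{
    return start_page > 1 ? (uint32_t)(start_page - 1) * page_size : 0;
}
```

With 1-KB pages, a start page of 9 would give 8,192 bytes, which is consistent with the size of the .alloc file in Example 2; the start page value itself is inferred here, not quoted from the file.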
Listing 5 Copying the Root Stream
    PPDB_ROOT PdbRoot (PPDB_HEADER pph,
                       PDWORD      pdBytes)
    {
        DWORD     dBytes, i, n;
        PPDB_ROOT ppr = NULL;

        // step 1: read the fixed-size part to learn the stream count
        if ((ppr = PdbRead (pph, PDB_ROOT_, pph->awRootPages)) != NULL)
        {
            dBytes = PDB_ROOT__ ((DWORD) ppr->wCount);
            free (ppr);

            // step 2: read the complete stream directory
            if ((ppr = PdbRead (pph, dBytes, pph->awRootPages)) != NULL)
            {
                for (n = i = 0; i < (DWORD) ppr->wCount; i++)
                {
                    n += PdbPages (pph, ppr->aStreams [i].dStreamSize);
                }
                dBytes += n * sizeof (WORD);
                free (ppr);

                // step 3: read the directory plus the page number array
                ppr = PdbRead (pph, dBytes, pph->awRootPages);
            }
        }
        *pdBytes = (ppr != NULL ? dBytes : 0);
        return ppr;
    }
The first approximation is based on the fact that the root stream starts out with the fixed-size portion of a PDB_ROOT structure, which will always fit into a single page. Therefore, PdbRoot() uses the general-purpose PdbRead() function defined in Listing 6 to load the first root stream page. PdbRead() is sort of the workhorse of the w2k_pdbx.exe utility; it copies pages from the PDB memory image to a contiguous memory block, given a page number array and the number of bytes to copy. It relies on the PdbPages() function at the top of Listing 6 that computes the number of stream pages from the stream size in bytes and the current page size.
In step 2, PdbRoot() can compute the size of the PDB_ROOT structure including all aStreams[] entries, but not including the following page number array. Although not very probable, this data might already exceed one page. In 1-KB page mode, this would happen as soon as the stream directory contained 128 or more data streams. However, PdbRead() comes to the rescue, and builds a faithful and contiguous copy in memory.
Now that the entire PDB_STREAM array of the PDB_ROOT structure is in memory, it is easy to find out the overall size of the root stream by adding up the number of pages taken up by each data stream, yielding the required size of the page number array following the PDB_ROOT data. Again, PdbRead() is employed to reshuffle all root stream pages into a newly allocated memory block.
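The size computation that the three steps arrive at can be condensed into one function once the stream sizes are known. The sketch below is an illustration, not the utility's code: root_stream_bytes() is a hypothetical name, and the 4-byte fixed part and 8-byte stream entries are assumptions inferred from the figures in this article.

```c
#include <stdint.h>
#include <stddef.h>

/* Number of pages needed to hold `bytes` bytes (0 bytes need 0 pages). */
static uint32_t pages_for(uint32_t bytes, uint32_t page_size)
{
    return bytes ? (bytes - 1) / page_size + 1 : 0;
}

/*
 * Total root stream size in bytes:
 *   fixed part (assumed 4 bytes)
 * + one directory entry per stream (assumed 8 bytes each)
 * + one 16-bit page number per page occupied by any data stream.
 */
uint32_t root_stream_bytes(const uint32_t *sizes, size_t count,
                           uint32_t page_size)
{
    uint32_t dir_bytes = 4u + (uint32_t)count * 8u;
    uint32_t pages = 0;
    size_t i;

    for (i = 0; i < count; i++)
        pages += pages_for(sizes[i], page_size);

    return dir_bytes + pages * 2u;
}
```

Feeding in the eight stream sizes reported for ntoskrnl.pdb in Example 2 with a page size of 1,024 gives 694 data pages and a root stream of 1,456 bytes, matching the summary printed by the utility.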
Listing 6 Joining Stream Pages In a Contiguous Memory Block
    DWORD PdbPages (PPDB_HEADER pph,
                    DWORD       dBytes)
    {
        // round up to whole pages; an empty stream occupies no page
        return (dBytes ? (((dBytes-1) / pph->dPageSize) + 1) : 0);
    }

    // -----------------------------------------------------------------

    PVOID PdbRead (PPDB_HEADER pph,
                   DWORD       dBytes,
                   PWORD       pwPages)
    {
        DWORD i, j;
        DWORD dPages = PdbPages (pph, dBytes);
        PVOID pPages = malloc (dPages * pph->dPageSize);

        if (pPages != NULL)
        {
            // copy page by page in the order given by pwPages[]
            for (i = 0; i < dPages; i++)
            {
                j = pwPages [i];

                CopyMemory ((PBYTE) pPages + (i * pph->dPageSize),
                            (PBYTE) pph    + (j * pph->dPageSize),
                            pph->dPageSize);
            }
        }
        return pPages;
    }
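Listing 6 trusts the page numbers it is given. A more defensive variant, sketched below, checks each page against the size of the memory image before copying. This is not the author's code: join_pages() is a hypothetical name, it uses portable types, and unlike PdbRead() it allocates exactly the requested number of bytes and copies only the used part of the last page.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Join the pages listed in pages[] into one newly allocated block that
 * holds exactly `bytes` bytes. Returns NULL if an argument is invalid,
 * memory is exhausted, or a page lies outside the image. The caller
 * releases the block with free().
 */
void *join_pages(const uint8_t *image, size_t image_size, uint32_t page_size,
                 const uint16_t *pages, uint32_t bytes)
{
    uint8_t *out;
    uint32_t done = 0, i = 0;

    if (image == NULL || page_size == 0 || (pages == NULL && bytes != 0))
        return NULL;

    out = malloc(bytes ? bytes : 1);   /* avoid a zero-size allocation */
    if (out == NULL)
        return NULL;

    while (done < bytes) {
        size_t   offset = (size_t)pages[i++] * page_size;
        uint32_t left   = bytes - done;
        uint32_t chunk  = left < page_size ? left : page_size;

        /* reject pages that start or end beyond the image */
        if (offset > image_size || chunk > image_size - offset) {
            free(out);
            return NULL;
        }
        memcpy(out + done, image + offset, chunk);
        done += chunk;
    }
    return out;
}
```

The design choice here is to fail the whole read when a single page number is out of range, so that a corrupt file produces a clean error instead of an access violation.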
By now, we are almost done. Saving the data streams is almost trivial once the root stream is assembled in memory. As you might have guessed, the PdbRead() function does the hard work again. Listing 7 shows the PdbStream() function that produces a virtual-memory copy of the data stream identified by the zero-based dStream index. Before calling PdbRead(), the function locates the page number subarray associated with the requested stream by looping through the stream directory, and passes this pointer to PdbRead() as its third argument. PdbStream() returns the size of the stream via its output parameter pdBytes.
Listing 7 Copying Data Streams
    PVOID PdbStream (PPDB_HEADER pph,
                     PPDB_ROOT   ppr,
                     DWORD       dStream,
                     PDWORD      pdBytes)
    {
        DWORD dBytes, i;
        PWORD pwPages;
        PVOID pPages = NULL;

        if (dStream < (DWORD) ppr->wCount)
        {
            // the page number array follows the stream directory
            pwPages = (PWORD) ((PBYTE) ppr +
                               PDB_ROOT__ ((DWORD) ppr->wCount));

            // skip the page numbers of all preceding streams
            for (i = 0; i < dStream; i++)
            {
                pwPages += PdbPages (pph, ppr->aStreams [i].dStreamSize);
            }
            dBytes = ppr->aStreams [dStream].dStreamSize;
            pPages = PdbRead (pph, dBytes, pwPages);
        }
        *pdBytes = (pPages != NULL ? dBytes : 0);
        return pPages;
    }
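The loop that locates a stream's page number subarray boils down to summing the page counts of all preceding streams. The following sketch isolates that step with portable types; first_page_index() is a hypothetical helper that returns an index into the page number array rather than a pointer.

```c
#include <stdint.h>
#include <stddef.h>

/* Number of pages needed to hold `bytes` bytes (0 bytes need 0 pages). */
static uint32_t pages_for(uint32_t bytes, uint32_t page_size)
{
    return bytes ? (bytes - 1) / page_size + 1 : 0;
}

/*
 * Index of the first page number that belongs to stream `stream` within
 * the page number array that follows the stream directory.
 * Returns -1 if the stream index is out of range.
 */
long first_page_index(const uint32_t *sizes, size_t count, size_t stream,
                      uint32_t page_size)
{
    long index = 0;
    size_t i;

    if (stream >= count)
        return -1;
    for (i = 0; i < stream; i++)
        index += (long)pages_for(sizes[i], page_size);
    return index;
}
```

Using the stream sizes from Example 2, stream 3 would start at index 4 and stream 5 at index 261; note that the empty stream 4 contributes no page numbers at all.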
If the w2k_pdbx.exe utility is run without any command arguments, it displays the help screen shown in Example 1. By default, the various output files are written to the current directory. However, you can override this setting by explicitly specifying a target directory. This can be a relative or absolute path, with or without a trailing backslash. In any case, this directory must exist, and the path specification must be prefixed by a slash character.
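Accepting a target directory with or without a trailing backslash comes down to inserting the separator only when it is missing. Here is a minimal sketch of that idea; build_output_path() is a hypothetical helper, and it expects the target with the leading slash of the command-line syntax already stripped.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build "<target>\<name>.<suffix>" in out. target may be NULL or empty,
 * in which case the file lands in the current directory; a trailing
 * backslash on target is optional.
 * Returns 0 on success, -1 if the result does not fit into out.
 */
int build_output_path(char *out, size_t out_size, const char *target,
                      const char *name, const char *suffix)
{
    size_t len = target ? strlen(target) : 0;
    const char *sep = (len > 0 && target[len - 1] != '\\') ? "\\" : "";
    int n = snprintf(out, out_size, "%s%s%s.%s",
                     len ? target : "", sep, name, suffix);

    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

The sketch does not check whether the directory exists; as stated above, the utility requires the target directory to be present already.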
Example 1: The w2k_pdbx.exe Command Help Screen
    D:\tmp>w2k_pdbx

    // w2k_pdbx.exe
    // SBS Program Database Exploder V1.00
    // 07-07-2001 Sven B. Schreiber
    // sbs@orgon.com

    Usage: w2k_pdbx { [+-hard] [/<target>] <PDB path> }

           + enable subsequent options
           - disable subsequent options

           h extract header
           a extract allocation bits
           r extract root stream
           d extract data streams

    Target paths:

           +h <target>\<PDB file>.header
           +a <target>\<PDB file>.alloc
           +r <target>\<PDB file>.root
           +d <target>\<PDB file>.<###>

    <###> = 0-based stream number.
    If /<target> is omitted, the files are written to the current directory.
Example 2 is another sample run of the w2k_pdbx.exe utility, this time specifying all available options (+h, +a, +r, and +d), as well as the path of the ntoskrnl.exe symbol file on the command line. Before writing the output files, w2k_pdbx.exe displays a summary of PDB file properties extracted from the file header and the root stream.
Example 2: Parsing the ntoskrnl.exe Symbol File
    w2k_pdbx +hard e:\winnt\symbols\exe\ntoskrnl.pdb

    // w2k_pdbx.exe
    // SBS Program Database Exploder V1.00
    // 07-07-2001 Sven B. Schreiber
    // sbs@orgon.com

    Properties of "e:\winnt\symbols\exe\ntoskrnl.pdb":

        67108864 bytes maximum size
          738304 bytes allocated
          706239 bytes used by 8 data streams
            1456 bytes used by the root stream
            1024 bytes per page
             721 pages allocated
             694 pages used by 8 data streams
               2 pages used by the root stream

    Saving "ntoskrnl.pdb.header"... 1024 bytes
    Saving "ntoskrnl.pdb.alloc"... 8192 bytes
    Saving "ntoskrnl.pdb.root"... 1456 bytes
    Saving "ntoskrnl.pdb.000"... 1456 bytes
    Saving "ntoskrnl.pdb.001"... 58 bytes
    Saving "ntoskrnl.pdb.002"... 56 bytes
    Saving "ntoskrnl.pdb.003"... 262825 bytes
    Saving "ntoskrnl.pdb.004"... 0 bytes
    Saving "ntoskrnl.pdb.005"... 16388 bytes
    Saving "ntoskrnl.pdb.006"... 106164 bytes
    Saving "ntoskrnl.pdb.007"... 319292 bytes