Cracking PDB Symbol Files
- .dbg and .pdb Symbol Files
- Global PDB File Layout
- Scanning the Root Stream
- Decomposing a PDB File
- Sample Code Archive
- Bibliography
Writers of software development and debugging tools frequently need to display symbolic information about the Windows 2000 system modules. For example, every good disassembler should not just display raw numbers and addresses, but should attempt to resolve them to meaningful names. Otherwise, it's tough for the user to figure out what the disassembled code actually does. The first (and simplest) step is to use the symbolic information buried inside the Portable Executable (PE) file of the module under examination, as well as in all modules it references via dynamic linking. However, this information is by no means sufficient for a thorough understanding of the disassembled code because it covers only the points of contact between the modules. To grasp the semantics of an internal (non-exported) function, it is usually quite helpful to know the function's name in the first place. It is also essential to know the names of the subordinate functions it calls and the global variables it accesses. Fortunately, Microsoft ships this important information with the Windows 2000 operating system in the form of symbol files, discernible by their file extensions .dbg and .pdb.
.dbg and .pdb Symbol Files
After installing Windows 2000, Microsoft Visual C/C++, and the Platform SDK, you will still be missing the symbol files; they have to be set up in a separate step. Note that the symbol files must always match the "Corrective Service Diskette" (CSD) level of the operating system. That is, after each Service Pack and Hot Fix installation, you must update the symbol files as well. Usually, the symbol setup comes with the new operating system files. The setup program is named symbolsx.exe, and the symbol files are found in the associated archive file symbols.cab. After running the setup for the first time, you will find that the hard disk containing the Windows 2000 operating system has lost 400MB–500MB of free space. By default, the symbol files are installed into a directory tree called Symbols, located under the Windows 2000 system root directory (for example, C:\WINNT\Symbols).
For each module file extension, a separate subdirectory is created, and each system module has two symbol files, with the extensions .dbg and .pdb. If you have used Windows NT 4.0 before, you'll probably wonder why each Windows 2000 module now requires two symbol files instead of a single .dbg file. The reason is that Microsoft has moved the so-called "public symbols" into a separate file called a Program Database, or PDB. Although the PDB file format is Microsoft proprietary [1], its basic structure is documented in my new book, Undocumented Windows 2000 Secrets [2]. Basically, a PDB file is a compound file that is made up of several independent streams. You can think of a PDB compound file as a simple flat file system in a single file, where the streams correspond to the files hosted by the file system. One of the streams contains a sequence of variable-length records that describe the symbols defined inside the associated module.
In the previous example, in which Windows 2000 is assumed to be installed in directory C:\WINNT and the symbol root directory is C:\WINNT\Symbols, the symbol files of the Windows 2000 kernel module ntoskrnl.exe would be installed as C:\WINNT\Symbols\exe\ntoskrnl.dbg and C:\WINNT\Symbols\exe\ntoskrnl.pdb. Likewise, the paths of the ntdll.dll symbol files would be C:\WINNT\Symbols\dll\ntdll.dbg and C:\WINNT\Symbols\dll\ntdll.pdb. One of the main purposes of these files is to allow a debugger or disassembler to look up the nearest symbolic name that can be attributed to a given binary address within a module. If a disassembler, for example, finds out that the next assembly language instruction to be displayed is call 72A05A2Eh, it would be great to provide the user with the real name that is associated with the function entry point 0x72A05A2E, such as call _pMemAlloc@4.
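The two ideas in this paragraph, the directory convention for symbol files and the nearest-symbol lookup, can be illustrated with a short Python sketch. It is not taken from any Microsoft tool or from the sample code archive. The function names symbol_file_paths and nearest_symbol, the SAMPLE_SYMBOLS table, and the two neighboring entries in it are hypothetical and exist only for illustration; only the address 0x72A05A2E and the name _pMemAlloc@4 come from the text. The lookup assumes the symbols have already been extracted into a list of (address, name) pairs sorted by address, and it uses a plain binary search rather than anything specific to the .dbg or .pdb formats.

```python
import bisect
import ntpath  # Windows path rules, usable on any host OS


def symbol_file_paths(module_name, symbol_root=r"C:\WINNT\Symbols"):
    # Layout from the text: <root>\<extension>\<basename>.dbg and .pdb.
    # The default root is the example directory used in the article.
    base, ext = ntpath.splitext(ntpath.basename(module_name))
    subdir = ntpath.join(symbol_root, ext.lstrip(".").lower())
    return (ntpath.join(subdir, base + ".dbg"),
            ntpath.join(subdir, base + ".pdb"))


# Hypothetical symbol table, sorted by address. Only the middle entry
# appears in the article; the other two are invented neighbors.
SAMPLE_SYMBOLS = [
    (0x72A05A00, "_hypotheticalBefore@0"),
    (0x72A05A2E, "_pMemAlloc@4"),
    (0x72A05A80, "_hypotheticalAfter@8"),
]


def nearest_symbol(symbols, address):
    # Find the last symbol whose address is <= the target address.
    addresses = [sym_addr for sym_addr, _ in symbols]
    index = bisect.bisect_right(addresses, address) - 1
    if index < 0:
        return None  # the address lies below the first known symbol
    sym_addr, name = symbols[index]
    # Return the name plus the byte offset from the symbol's start.
    return name, address - sym_addr


def format_call(symbols, target):
    # Render a call operand symbolically when a symbol is known,
    # otherwise fall back to the raw hexadecimal address.
    hit = nearest_symbol(symbols, target)
    if hit is None:
        return "call %08Xh" % target
    name, offset = hit
    return "call " + name + ("+%Xh" % offset if offset else "")
```

With this table, format_call(SAMPLE_SYMBOLS, 0x72A05A2E) yields call _pMemAlloc@4, while an address below the first entry falls back to the raw hexadecimal form. A real tool would also want to check that the address lies within the module's image before trusting the nearest match.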