A General Kernel Call Interface
This article is excerpted from Undocumented Windows 2000 Secrets: A Programmer's Cookbook (Addison-Wesley, 2001, ISBN: 0201721872).
If a program running in user-mode wants to call a kernel-mode function, it has to solve two problems. First, it must somehow jump across the barrier between user-mode and kernel-mode, and second, it must transfer data in and out. For the subset comprising the Native API, the ntdll.dll component takes over this duty, using an interrupt gate to accomplish the mode change, and CPU registers to pass in a pointer to the caller's argument stack and to return the function's result to the caller. For kernel functions not included in the Native API, the operating system doesn't offer such a gate mechanism. Therefore, we will have to roll our own.
Designing a Gate to Kernel-Mode
Part one of the problem is easily solved: the w2k_spy.sys driver crosses the user-to-kernel-mode border back and forth many times during its IOCTL transactions. And since an IOCTL can optionally pass data blocks in both directions, part two of the problem is solved in the same breath. In the end, the whole matter boils down to the following simple sequence of steps:
1. The user-mode application posts an IOCTL request, passing in information about the function to be called, as well as a pointer to its argument stack.
2. The kernel-mode driver dispatches the request, copies the arguments onto its own stack, calls the function, and passes the results back to the caller in the IOCTL output buffer.
3. The caller picks up the results of the IOCTL operation and proceeds as it would after a normal DLL function call.
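The three steps above can be modeled in portable user-mode C. This sketch assumes the buffered transfer method (METHOD_BUFFERED), in which the I/O manager copies the caller's input into one system buffer that the driver later overwrites with its output; the text does not say which method w2k_spy.sys actually uses. All names (demo_request, demo_reply, demo_dispatch, demo_ioctl) are hypothetical, and a multiplication stands in for the kernel function being called.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical request and reply layouts (not the spy driver's real ones). */
typedef struct { uint32_t a, b; } demo_request;
typedef struct { uint64_t result; } demo_reply;

/* Step 2, "driver" side: in buffered I/O, request and reply share one
   system buffer, so the request must be read before the reply is written. */
static size_t demo_dispatch(unsigned char *system_buffer,
                            size_t in_len, size_t out_len)
{
    demo_request req;
    demo_reply rep;

    if (in_len < sizeof req || out_len < sizeof rep)
        return 0;                                /* nothing returned */
    memcpy(&req, system_buffer, sizeof req);     /* read the request */
    rep.result = (uint64_t)req.a * req.b;        /* stand-in for the kernel call */
    memcpy(system_buffer, &rep, sizeof rep);     /* overwrite with the reply */
    return sizeof rep;                           /* bytes to copy back */
}

/* Steps 1 and 3: the copies performed around the dispatch routine. */
static size_t demo_ioctl(const void *in, size_t in_len,
                         void *out, size_t out_len)
{
    unsigned char system_buffer[64] = {0};
    size_t returned;

    if (in_len > sizeof system_buffer || out_len > sizeof system_buffer)
        return 0;
    memcpy(system_buffer, in, in_len);           /* user -> kernel copy */
    returned = demo_dispatch(system_buffer, in_len, out_len);
    memcpy(out, system_buffer, returned);        /* kernel -> user copy */
    return returned;
}
```

Because request and reply share a buffer in this model, demo_dispatch() reads the complete request before writing anything back; writing the reply first would clobber its own input.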
The main problem with this scenario is that the kernel-mode module must cope with various data formats and calling conventions. Following is a list of situations the driver must be prepared for:
The size of the argument stack depends on the target function. Because it is impractical to give the driver detailed knowledge about all functions it might possibly have to call, the caller must supply the size of the argument stack.
Windows 2000 kernel API functions use three calling conventions, __stdcall, __cdecl, and __fastcall, which differ considerably in how arguments are treated. __stdcall and __cdecl pass all arguments on the stack, while __fastcall reduces stack overhead by passing the first two arguments in the CPU registers ECX and EDX. On the other hand, __stdcall and __fastcall agree on how arguments are removed from the stack: the called code is responsible for it. __cdecl, however, leaves this task to the calling code. The stack cleanup problem is easily solved, regardless of the calling convention, by saving the stack pointer before the call and restoring it after the call returns. The __fastcall convention is another matter: the driver has no way to tell whether a function expects its first arguments in registers. Therefore, the caller must specify on every call whether the __fastcall convention is in effect, allowing the driver to load ECX and EDX if necessary.
Windows 2000 kernel functions return results in various sizes, ranging from zero to 64 bits. The 64-bit register pair EDX:EAX transports the results back to the caller. Data is filled in from the least-significant end towards the most-significant end. For example, if a function returns a 16-bit SHORT, only register AX (comprising AL and AH) is significant; the upper half of EAX and the entire contents of EDX are undefined. Because the driver knows nothing about the called function's result type, it must assume the worst case, which is 64 bits. Otherwise, the result might be truncated.
The application might supply invalid arguments. In user mode, this is usually benign: at worst, the application process is aborted with an error message box. Only occasionally does such an error damage the system badly enough that only a reboot can recover from it. In kernel-mode, the most frequent programming error, the "bad pointer," almost instantly results in a Blue Screen of Death that might even cause loss of user data. This problem can be addressed to a great extent by using the operating system's Structured Exception Handling (SEH) mechanism.
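To make the result-size point concrete, the following sketch models the EDX:EAX pair as two 32-bit fields and reads results of different widths from its least-significant end. The demo_regs type and the accessor names are hypothetical; they only illustrate why capturing all 64 bits never loses information, whatever the real result width is.

```c
#include <stdint.h>

/* Hypothetical model of the EDX:EAX register pair right after a CALL. */
typedef struct { uint32_t eax, edx; } demo_regs;

/* Full 64-bit result: EDX holds the high half, EAX the low half. */
static uint64_t demo_result64(demo_regs r)
{
    return ((uint64_t)r.edx << 32) | r.eax;
}

/* Narrower results sit at the least-significant end; higher bits are undefined. */
static uint32_t demo_result32(demo_regs r) { return r.eax; }                      /* EAX */
static uint16_t demo_result16(demo_regs r) { return (uint16_t)(r.eax & 0xFFFFu); } /* AX  */
static uint8_t  demo_result8(demo_regs r)  { return (uint8_t)(r.eax & 0xFFu); }    /* AL  */
```

A caller that expects a SHORT would apply demo_result16() to the full 64-bit capture and ignore everything above bit 15.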
That said, let's examine how our spy driver handles function properties, arguments, and results. Listing 1 shows the IOCTL input and output structures involved, SPY_CALL_INPUT and SPY_CALL_OUTPUT. The latter is quite simple: it consists of a single ULARGE_INTEGER structure, which Windows 2000 uses to represent a 64-bit value both as a single 64-bit integer and as a pair of 32-bit halves.
Listing 1 Definition of SPY_CALL_INPUT and SPY_CALL_OUTPUT
typedef struct _SPY_CALL_INPUT
{
    BOOL  fFastCall;
    DWORD dArgumentBytes;
    PVOID pArguments;
    PBYTE pbSymbol;
    PVOID pEntryPoint;
}
SPY_CALL_INPUT, *PSPY_CALL_INPUT, **PPSPY_CALL_INPUT;

#define SPY_CALL_INPUT_ sizeof (SPY_CALL_INPUT)

// -----------------------------------------------------------------

typedef struct _SPY_CALL_OUTPUT
{
    ULARGE_INTEGER uliResult;
}
SPY_CALL_OUTPUT, *PSPY_CALL_OUTPUT, **PPSPY_CALL_OUTPUT;

#define SPY_CALL_OUTPUT_ sizeof (SPY_CALL_OUTPUT)
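Listing 1 relies on ULARGE_INTEGER to expose one 64-bit value as two 32-bit halves. The hypothetical demo_ularge union below is a portable lookalike, not the real Windows type; like the original, its LowPart/HighPart order only works on a little-endian CPU such as x86. demo_store_result() imitates the final two MOV instructions of SpyCall() in Listing 2, which write EAX to LowPart and EDX to HighPart.

```c
#include <stdint.h>

/* Hypothetical lookalike of ULARGE_INTEGER (little-endian layout assumed). */
typedef union {
    struct {
        uint32_t LowPart;   /* bits 0..31, filled from EAX */
        uint32_t HighPart;  /* bits 32..63, filled from EDX */
    } u;
    uint64_t QuadPart;      /* the same 8 bytes viewed as one integer */
} demo_ularge;

/* Store a 64-bit result the way SpyCall() does: EAX -> LowPart, EDX -> HighPart. */
static demo_ularge demo_store_result(uint32_t eax, uint32_t edx)
{
    demo_ularge r;
    r.u.LowPart = eax;
    r.u.HighPart = edx;
    return r;
}
```

On a little-endian host, demo_store_result(2, 1).QuadPart equals 0x0000000100000002, and a caller expecting only a 16-bit result simply truncates QuadPart to its low 16 bits.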
SPY_CALL_INPUT needs a bit more explanation. The purpose of the fFastCall member should be obvious: it signals to the spy driver that the function to be called obeys the __fastcall convention, so the first two arguments, if any, must be passed in CPU registers rather than on the stack. dArgumentBytes specifies the number of bytes piled up on the argument stack, and pArguments points to the top of this stack. The remaining members, pbSymbol and pEntryPoint, are mutually exclusive and tell the driver which function to execute: you can specify either a function name or a plain entry point address, and the unused member should always be set to NULL. If both are non-NULL, pbSymbol takes precedence over pEntryPoint. Calling a function by name rather than by address adds a step in which the entry point of the specified symbol is looked up. If it can be found, the function is entered through this address. Passing in an entry point simply bypasses the symbol resolution step.
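The rules for these members can be captured in a few helper functions. The sketch uses a hypothetical, portable mirror of SPY_CALL_INPUT (same member names, <stdint.h> types, not binary-compatible with the driver's structure) and a hypothetical resolver callback standing in for the symbol lookup that the next section covers. It assumes 32-bit arguments stored in call order, so the first argument sits at the top of the argument stack.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical portable mirror of SPY_CALL_INPUT. */
typedef struct {
    int             fFastCall;       /* nonzero: target uses __fastcall */
    int32_t         dArgumentBytes;  /* size of the argument stack in bytes */
    const uint32_t *pArguments;      /* first argument = top of the argument stack */
    const char     *pbSymbol;        /* call by name ... */
    const void     *pEntryPoint;     /* ... or by address (the other one stays NULL) */
} demo_call_input;

/* Hypothetical hook standing in for the kernel symbol lookup. */
typedef const void *(*demo_resolver)(const char *symbol);

static demo_call_input demo_by_name(const char *symbol, const uint32_t *args,
                                    size_t count, int fastcall)
{
    demo_call_input in = { fastcall, (int32_t)(count * sizeof(uint32_t)),
                           args, symbol, NULL };
    return in;
}

static demo_call_input demo_by_address(const void *entry, const uint32_t *args,
                                       size_t count, int fastcall)
{
    demo_call_input in = { fastcall, (int32_t)(count * sizeof(uint32_t)),
                           args, NULL, entry };
    return in;
}

/* Pick the call target: a symbol name, if present, wins over an address. */
static const void *demo_target(const demo_call_input *in, demo_resolver resolve)
{
    if (in->pbSymbol != NULL)
        return resolve != NULL ? resolve(in->pbSymbol) : NULL;
    return in->pEntryPoint;
}
```

Setting exactly one of the two members per request, as the builders do, keeps the precedence rule from ever coming into play.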
Finding the linear address associated with a symbol exported by a kernel-mode module is harder than it sounds. The powerful Win32 functions GetModuleHandle() and GetProcAddress(), which work fine with all components within the Win32 subsystem, are completely ignorant of kernel-mode system modules and drivers. Implementing this part of the sample code was a royal pain; however, I finally made it, and I will tell you the details in the next section of this chapter. For now, let's assume that a valid entry point is available, no matter how it has been supplied. Listing 2 shows the function SpyCall(), the core of my kernel call interface. As you can see, it is almost 100% assembly language. It is always unpleasant to resort to ASM in a C program, but some tasks simply can't be done in pure C. In this case, the problem is that SpyCall() needs total control of the stack and the CPU registers, and therefore must bypass the C compiler and optimizer, which use the stack and registers as they see fit.
Before delving into the details of Listing 2, let me tell you about another special feature of the SpyCall() function that obscures the code if you are not aware of it. The Windows 2000 system modules export some of their variables by name; typical examples are NtBuildNumber and KeServiceDescriptorTable. The Portable Executable (PE) file format of Windows 2000/NT/9x provides a general-purpose mechanism for attaching symbols to addresses, and it doesn't care at all whether an address points to code or data. Therefore, a Windows 2000 module is free to attach exported symbols to its global variables at will. A client module can dynamically link to them just as it links to function symbols, and can use these variables as if they were located in its own global data section. Of course, my kernel call interface would not be complete if it could not cope with this kind of symbol as well, so I decided that negative values of the dArgumentBytes member inside the SPY_CALL_INPUT structure should indicate that data is to be copied from the entry point instead of calling it. Valid values range from –1 to –9, where –1 means that the entry point address itself is copied to the SPY_CALL_OUTPUT buffer. For the remaining values, the one's complement gives the number of bytes copied from the entry point: –2 copies a single BYTE or CHAR, –3 a 16-bit WORD or SHORT, –5 a 32-bit DWORD or LONG, and –9 a 64-bit DWORDLONG or LONGLONG. You might wonder why it should ever be necessary to copy the entry point itself. Well, some kernel symbols, such as KeServiceDescriptorTable, point to structures that exceed the 64-bit return value limit, so it is wiser to return the plain pointer than to truncate the value to 64 bits.
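The encoding of dArgumentBytes can be checked with a small portable model. The hypothetical helpers below decode a request the way Listing 2 does, with an unsigned comparison against -9 followed by a bitwise NOT, and encode a copy size into the matching negative value.

```c
#include <stdint.h>

/* Decode dArgumentBytes: returns -1 for a function call, otherwise the number
   of bytes to copy from the entry point (0 means "return the entry point itself"). */
static int demo_copy_size(int32_t dArgumentBytes)
{
    uint32_t u = (uint32_t)dArgumentBytes;  /* treat it like the ECX register */
    if (u < (uint32_t)-9)                   /* same test as "cmp ecx, -9 / jb" */
        return -1;                          /* zero, positive, or below -9: call */
    return (int)~u;                         /* one's complement: -1 -> 0 ... -9 -> 8 */
}

/* Encode a copy request: 0 -> -1, 1 -> -2, 2 -> -3, 4 -> -5, 8 -> -9. */
static int32_t demo_copy_request(uint32_t bytes)
{
    return -(int32_t)bytes - 1;             /* equals ~bytes in two's complement */
}
```

Because the comparison is unsigned, an ordinary request with zero arguments (dArgumentBytes == 0) is still treated as a function call.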
Listing 2 The Core Function of the Kernel Call Interface
void SpyCall (PSPY_CALL_INPUT  psci,
              PSPY_CALL_OUTPUT psco)
{
    PVOID pStack;

    __asm
    {
        pushfd
        pushad
        xor     eax, eax
        mov     ebx, psco                           ; get output parameter block
        lea     edi, [ebx.uliResult]                ; get result buffer
        mov     [edi  ], eax                        ; clear result buffer (lo)
        mov     [edi+4], eax                        ; clear result buffer (hi)
        mov     ebx, psci                           ; get input parameter block
        mov     ecx, [ebx.dArgumentBytes]
        cmp     ecx, -9                             ; call or store/copy?
        jb      SpyCall2
        mov     esi, [ebx.pEntryPoint]              ; get entry point
        not     ecx                                 ; get number of bytes
        jecxz   SpyCall1                            ; 0 -> store entry point
        rep     movsb                               ; copy data from entry point
        jmp     SpyCall5
SpyCall1:
        mov     [edi], esi                          ; store entry point
        jmp     SpyCall5
SpyCall2:
        mov     esi, [ebx.pArguments]
        cmp     [ebx.fFastCall], eax                ; __fastcall convention?
        jz      SpyCall3
        cmp     ecx, 4                              ; 1st argument available?
        jb      SpyCall3
        mov     eax, [esi]                          ; eax = 1st argument
        add     esi, 4                              ; remove argument from list
        sub     ecx, 4
        cmp     ecx, 4                              ; 2nd argument available?
        jb      SpyCall3
        mov     edx, [esi]                          ; edx = 2nd argument
        add     esi, 4                              ; remove argument from list
        sub     ecx, 4
SpyCall3:
        mov     pStack, esp                         ; save stack pointer
        jecxz   SpyCall4                            ; no (more) arguments
        sub     esp, ecx                            ; copy argument stack
        mov     edi, esp
        shr     ecx, 2
        rep     movsd
SpyCall4:
        mov     ecx, eax                            ; load 1st __fastcall arg
        call    [ebx.pEntryPoint]                   ; call entry point
        mov     esp, pStack                         ; restore stack pointer
        mov     ebx, psco                           ; get output parameter block
        mov     [ebx.uliResult.LowPart ], eax       ; store result (lo)
        mov     [ebx.uliResult.HighPart], edx       ; store result (hi)
SpyCall5:
        popad
        popfd
    }
    return;
}
With the special case of accessing exported variables kept in mind, Listing 2 shouldn't be too difficult to understand. First, the 64-bit result buffer is cleared, guaranteeing that unused bits are always zero. Next, the dArgumentBytes member of the input data is compared to –9 to find out whether the client requested a function call or a data copying operation. Because JB performs an unsigned comparison, only the values –9 through –1 (0xFFFFFFF7 through 0xFFFFFFFF) select the copy path; all other values, including 0, select a function call. On the copy path, NOT turns ECX into the byte count, and SpyCall() either stores the entry point itself (count 0) or copies the requested bytes from the entry point with REP MOVSB. The function call handler starts at the label SpyCall2. After loading register ESI with the top of the argument stack from the pArguments member, it is time to check the calling convention. If __fastcall is required and there is at least one 32-bit value on the stack, SpyCall() removes it and stores it temporarily in EAX. If another 32-bit value is available, it is removed as well and stored in EDX. Any remaining arguments stay on the argument stack. Either way, execution arrives at the label SpyCall3. Now the current top-of-stack address is saved to the local variable pStack, and the argument stack (minus the arguments removed in the __fastcall case) is copied to the spy driver's stack using the fast i386 REP MOVSD instruction. Note that the direction flag, which determines whether MOVSD proceeds upwards or downwards in memory, can be assumed to be clear by default, i.e. ESI and EDI are incremented after each copying step. The only thing left to do before executing the CALL instruction is to copy the first __fastcall argument from its preliminary location EAX to its final destination ECX. SpyCall() blindly copies EAX to ECX because this operation doesn't create any havoc if the calling convention is __stdcall or __cdecl. The MOV ECX, EAX instruction is so fast that executing it in vain is much more efficient than jumping around it after testing the value of the fFastCall member. Hey, don't you agree that ASM programming is a lot of fun?
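For readers who prefer C to assembly, the register-splitting logic between the labels SpyCall2 and SpyCall3 can be restated as a hypothetical model. It is not code the driver uses: demo_plan_call() only decides which 32-bit arguments end up in ECX and EDX and how many bytes remain to be copied onto the stack.

```c
#include <stdint.h>

/* Hypothetical result of the argument split performed before the CALL. */
typedef struct {
    uint32_t        ecx;          /* 1st __fastcall argument (0 otherwise) */
    uint32_t        edx;          /* 2nd __fastcall argument (0 here if unused;
                                     Listing 2 just leaves EDX as it is) */
    const uint32_t *stack;        /* arguments still to be copied onto the stack */
    uint32_t        stack_bytes;  /* their size in bytes */
} demo_call_plan;

static demo_call_plan demo_plan_call(const uint32_t *args, uint32_t arg_bytes,
                                     int fastcall)
{
    demo_call_plan p = { 0, 0, args, arg_bytes };

    if (fastcall && p.stack_bytes >= 4) {      /* 1st argument available? */
        p.ecx = *p.stack++;
        p.stack_bytes -= 4;
        if (p.stack_bytes >= 4) {              /* 2nd argument available? */
            p.edx = *p.stack++;
            p.stack_bytes -= 4;
        }
    }
    return p;                                  /* REP MOVSD copies stack_bytes / 4 DWORDs */
}
```

With three arguments and fFastCall set, the first two land in ECX and EDX and only the third (4 bytes) is copied to the stack; without fFastCall, all 12 bytes are copied.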
After the call to the function's entry point returns, SpyCall() resets the stack pointer to the value saved in the variable pStack. This takes care of the different stack cleanup policies of __stdcall and __fastcall versus __cdecl. A __cdecl function returns to the caller with the ESP register still pointing to the top of the argument stack, while a __stdcall or __fastcall function removes its arguments, leaving ESP at the address it had before the arguments were pushed. Forcing ESP to a previously saved address always cleans up the stack properly, no matter which calling convention is used. The last few ASM lines of SpyCall() finally store the function result returned in EDX:EAX to the caller's SPY_CALL_OUTPUT structure. No attempt is made to determine the actual result size. This is unnecessary because the caller knows exactly how many valid result bits to expect. Copying too many bits doesn't hurt in any way: the caller simply ignores the extra ones.
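The stack-cleanup argument can be checked with a tiny model of a 32-bit stack pointer. The function below is hypothetical; it computes where ESP ends up after the call under each cleanup policy, which shows why loading the saved value is correct in both cases.

```c
#include <stdint.h>

/* Hypothetical model: ESP right after the called function returns.
   The stack grows downward; CALL and RET cancel out for the return address. */
static uint32_t demo_esp_after_return(uint32_t saved_esp, uint32_t stack_arg_bytes,
                                      int callee_cleans)
{
    uint32_t esp = saved_esp - stack_arg_bytes;  /* arguments copied below saved ESP */
    if (callee_cleans)                           /* __stdcall / __fastcall: RET n */
        esp += stack_arg_bytes;
    return esp;                                  /* __cdecl: arguments still on the stack */
}
```

With a saved ESP of 0x1000 and 12 bytes of stack arguments, a __cdecl target returns with ESP at 0x0FF4, while a __stdcall target has already restored 0x1000. The "mov esp, pStack" instruction yields 0x1000 in both cases without knowing which convention applied.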
One thing to note about the code in Listing 2 is that it makes absolutely no provision for invalid arguments. It does not even check the validity of the pArguments pointer to the argument stack itself. In kernel-mode, this is equivalent to playing with fire. However, how could the poor spy driver exhaustively verify all arguments? A 32-bit value on the stack could be a counter, a bit-field array, or a pointer. Only the caller and the called target function know the argument semantics. The SpyCall() function is a simple pass-through layer that knows nothing about the type of data it forwards. Adding context-sensitive argument checking to this function would amount to rewriting large parts of the operating system. Fortunately, Windows 2000 offers an easy way out of this dilemma: Structured Exception Handling (SEH).
Listing 3 Adding Structured Exception Handling to the Kernel Call Interface
NTSTATUS SpyCallEx (PSPY_CALL_INPUT  psci,
                    PSPY_CALL_OUTPUT psco)
{
    NTSTATUS ns = STATUS_SUCCESS;

    __try
    {
        SpyCall (psci, psco);
    }
    __except (EXCEPTION_EXECUTE_HANDLER)
    {
        ns = STATUS_ACCESS_VIOLATION;
    }
    return ns;
}
While SEH catches the most common parameter errors, you should not expect it to be a remedy against every kind of garbage a client application might hand over to a kernel API function. Some bad function arguments silently wreck the system without raising an exception. For example, a function that copies a string can easily overwrite vital parts of system memory if the destination buffer pointer is set to the wrong address. This kind of bug might remain undetected for a long time, until the system suddenly breaks down when some code eventually touches the corrupted memory area. While I was testing the spy driver, I occasionally managed to get the test application hung in its IOCTL call to the spy device. The application stopped responding and even refused to be removed from memory. Even worse, the system became unable to shut down. This is almost as annoying as a Blue Screen!