SQL Server uses Microsoft's XML parser, MSXML, to load XML data, so we'll begin our discussion there. There are two basic ways to parse XML data using MSXML: using the Document Object Model (DOM) or using the Simple API for XML (SAX). DOM is a W3C standard; SAX is a de facto standard that was developed by members of the XML-DEV mailing list rather than by a standards body. The DOM method involves parsing the XML document and loading it into a tree structure in memory. The entire document is materialized and stored in memory when processed this way. An XML document parsed via DOM is known as a DOM document (or just “DOM” for short). XML parsers provide a variety of ways to manipulate DOM documents. Listing 18.1 shows a short Visual Basic app that demonstrates parsing an XML document via DOM and querying it for a particular node set. (You can find the source code to this app in the CH18\msxmltest subfolder on the CD accompanying this book.)
Listing 18.1
Private Sub Command1_Click()
    Dim bstrDoc As String
    bstrDoc = "<Songs> " & _
        "<Song>One More Day</Song>" & _
        "<Song>Hard Habit to Break</Song>" & _
        "<Song>Forever</Song>" & _
        "<Song>Boys of Summer</Song>" & _
        "<Song>Cherish</Song>" & _
        "<Song>Dance</Song>" & _
        "<Song>I Will Always Love You</Song>" & _
        "</Songs>"

    Dim xmlDoc As New DOMDocument30

    ' Fall back to the sample document if the text box is empty
    If Len(Text1.Text) = 0 Then
        Text1.Text = bstrDoc
    End If

    ' loadXML returns False if the text is not well-formed XML
    If Not xmlDoc.loadXML(Text1.Text) Then
        MsgBox "Error loading document"
    Else
        Dim oNodes As IXMLDOMNodeList
        Dim oNode As IXMLDOMNode
        Dim sName As String
        Dim sData As String

        ' Fall back to a default XPath query if none was supplied
        If Len(Text2.Text) = 0 Then
            Text2.Text = "//Song"
        End If

        ' Run the XPath query and display each matching node
        Set oNodes = xmlDoc.selectNodes(Text2.Text)
        For Each oNode In oNodes
            If Not (oNode Is Nothing) Then
                sName = oNode.nodeName
                sData = oNode.xml
                MsgBox "Node <" + sName + ">:" _
                    + vbNewLine + vbTab + sData + vbNewLine
            End If
        Next
        Set xmlDoc = Nothing
    End If
End Sub
We begin by instantiating a DOMDocument object, then call its loadXML method to parse the XML document and load it into the DOM tree. We call its selectNodes method to query it via XPath. The selectNodes method returns a node list object, which we then iterate through using For Each. In this case, we display each node name followed by its contents via VB's MsgBox function. We're able to access and manipulate the document as though it were an object because that's exactly what it is—parsing an XML document via DOM turns the document into a memory object that you can then work with just as you would any other object.
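For readers without Visual Basic at hand, the same load, query, and iterate pattern can be sketched with Python's standard-library DOM implementation, xml.dom.minidom. This is not MSXML: minidom has no XPath support, so getElementsByTagName stands in here for selectNodes("//Song"), and the function name describe_songs is purely illustrative.

```python
from xml.dom import minidom


def describe_songs(xml_text):
    """Parse xml_text into a DOM tree and return (node name, node XML) pairs.

    Assumption: getElementsByTagName("Song") is used in place of the XPath
    query //Song from Listing 18.1, because minidom does not support XPath.
    """
    doc = minidom.parseString(xml_text)   # the whole document is now in memory
    return [(node.nodeName, node.toxml())
            for node in doc.getElementsByTagName("Song")]


if __name__ == "__main__":
    sample = "<Songs><Song>Forever</Song><Song>Dance</Song></Songs>"
    for name, data in describe_songs(sample):
        print("Node <" + name + ">:", data)
```

As in the Visual Basic version, the parsed document is an ordinary in-memory object, so it can be queried repeatedly without reparsing.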
SAX, by contrast, is an event-driven API. You process an XML document via SAX by configuring your application to respond to SAX events. As the SAX processor reads through an XML document, it raises an event each time it encounters something the calling application should know about, such as an element starting or ending, a run of character data, and so on. (An element's attributes are delivered along with the event for its start tag rather than as events of their own.) It passes the relevant data about the event to the application's handler for the event. The application can then decide what to do in response: it could store the event data in some type of tree structure, as is the case with DOM processing; it could ignore the event; it could search the event data for something in particular; or it could take some other action. Once the event is handled, the SAX processor continues reading the document. At no point does it persist the document in memory as DOM does. It's really just a parsing mechanism to which an application can attach its own functionality. In fact, SAX is the underlying parsing mechanism for MSXML's DOM processor. Microsoft's DOM implementation sets up SAX event handlers that simply store the data handed to them by the SAX engine in a DOM tree.
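To make the event-driven model concrete, here is a minimal sketch that uses Python's standard-library xml.sax module rather than MSXML's SAX interfaces. The handler class SongHandler and the helper song_titles are illustrative names, not part of any library. The handler keeps only the text of Song elements and ignores every other event, so nothing resembling a full document tree is ever built.

```python
import xml.sax

SONGS_XML = (
    "<Songs>"
    "<Song>One More Day</Song>"
    "<Song>Hard Habit to Break</Song>"
    "<Song>Forever</Song>"
    "</Songs>"
)


class SongHandler(xml.sax.ContentHandler):
    """Collects the text of every <Song> element as SAX events arrive."""

    def __init__(self):
        super().__init__()
        self.titles = []      # completed song titles
        self._buffer = None   # text pieces of the <Song> being read, else None

    def startElement(self, name, attrs):
        # Raised at each start tag; attrs carries that tag's attributes.
        if name == "Song":
            self._buffer = []

    def characters(self, content):
        # The parser may deliver one text node in several pieces, so accumulate.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        # Raised at each end tag; a finished <Song> yields one title.
        if name == "Song" and self._buffer is not None:
            self.titles.append("".join(self._buffer))
            self._buffer = None


def song_titles(xml_text):
    """Parse xml_text with SAX and return the list of song titles found."""
    handler = SongHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.titles


if __name__ == "__main__":
    print(song_titles(SONGS_XML))
```

Note how much bookkeeping the handler has to do for itself. That is the price of not having the document held in memory for you.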
As you've probably surmised by now, SAX consumes far less memory than DOM does. That said, it's also much more trouble to set up and use. By persisting documents in memory, the DOM API makes working with XML documents as easy as working with any other kind of object.
SQL Server uses MSXML and the DOM to process documents you load via sp_xml_preparedocument. It restricts the virtual memory MSXML can use for DOM processing to one-eighth of the physical memory on the machine or 500MB, whichever is less. In actual practice, it's highly unlikely that MSXML would be able to access 500MB of virtual memory, even on a machine with 4GB of physical memory. The reason for this is that, by default, SQL Server reserves most of the user mode address space for use by its buffer pool. You'll recall that we talked about the MemToLeave space in Chapter 11 and noted that the non–thread stack portion defaults to 256MB on SQL Server 2000. This means that, by default, MSXML won't be able to use more than 256MB of memory—and probably considerably less given that other things are also allocated from this region—regardless of the amount of physical memory on the machine.
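The arithmetic behind these limits can be sketched as follows. This is only a model of the rule as described above, not code taken from SQL Server. The function names are hypothetical, and two things are assumed: that 500MB means 500 × 2^20 bytes, and that the whole 256MB default region is available to MSXML (in practice it will not be).

```python
MB = 1024 * 1024


def msxml_dom_ceiling(physical_bytes, cap_bytes=500 * MB):
    """One-eighth of physical memory or the cap, whichever is less."""
    return min(physical_bytes // 8, cap_bytes)


def practical_limit(physical_bytes, mem_to_leave_bytes=256 * MB):
    """The ceiling is further bounded by the region MSXML allocates from.

    Assumption: the full default 256MB is free, which overstates what
    MSXML can really get because other allocations share the region.
    """
    return min(msxml_dom_ceiling(physical_bytes), mem_to_leave_bytes)


if __name__ == "__main__":
    for gb in (1, 2, 4):
        phys = gb * 1024 * MB
        print(gb, "GB:",
              msxml_dom_ceiling(phys) // MB, "MB ceiling,",
              practical_limit(phys) // MB, "MB practical limit")
```

For a 4GB machine the one-eighth rule gives 512MB, the cap reduces that to 500MB, and the default region size reduces it again to at most 256MB.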
The reason MSXML is limited to no more than 500MB of virtual memory, regardless of the amount of memory on the machine, is that SQL Server calls the GlobalMemoryStatus Win32 API function to determine the amount of physical memory in the machine. GlobalMemoryStatus populates a MEMORYSTATUS structure with information about the status of memory use on the machine. On 32-bit Windows, the size fields in that structure are 32 bits wide, so they cannot represent more than 4GB. On machines with more than 4GB of physical memory, GlobalMemoryStatus can therefore return incorrect information, reporting a value of -1 to indicate an overflow. One-eighth of the largest amount the function can report is about 512MB, which is consistent with a ceiling of roughly 500MB. The Win32 API function GlobalMemoryStatusEx exists to address this shortcoming, but SQLXML does not call it. You can see this for yourself by working through the following exercise.
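Before starting the exercise, it may help to see why a 32-bit memory field puts an upper bound on any limit derived from it. The sketch below is a hypothetical model, not the Win32 implementation: reported_total_phys is an invented name, and it assumes the caller reads the overflow value of -1 as an unsigned 32-bit number, which the text above does not confirm for SQL Server.

```python
import ctypes

MB = 1024 * 1024
GB = 1024 * MB


def reported_total_phys(actual_bytes):
    """Model a 32-bit physical-memory field that reports -1 on overflow.

    Assumption: -1 is interpreted as an unsigned 32-bit value (0xFFFFFFFF).
    """
    if actual_bytes > 0xFFFFFFFF:
        return ctypes.c_uint32(-1).value   # -1 reinterpreted as 0xFFFFFFFF
    return actual_bytes


if __name__ == "__main__":
    for actual in (2 * GB, 8 * GB, 64 * GB):
        reported = reported_total_phys(actual)
        print(actual // GB, "GB installed: one-eighth of the reported value is",
              (reported // 8) // MB, "MB")
```

Under these assumptions, no amount of installed memory can push one-eighth of the reported figure past roughly 512MB.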
Exercise 18.1 Determining How MSXML Computes Its Memory Ceiling
1. Restart your SQL Server, preferably from a console since we will be attaching to it with WinDbg. This should be a test or development system, and, ideally, you should be its only user.

2. Start Query Analyzer and connect to your SQL Server.

3. Attach to SQL Server using WinDbg. (Press F6 and select sqlservr.exe from the list of running tasks; if you have multiple instances, be sure to select the right one.)

4. At the WinDbg command prompt, add the following breakpoint:

    bp kernel32!GlobalMemoryStatus

5. Once the breakpoint is added, type g and hit Enter to allow SQL Server to run.

6. Next, return to Query Analyzer and run the following query:

    declare @doc varchar(8000)
    set @doc='
    <Songs>
      <Song name="She''s Like the Wind" artist="Patrick Swayze"/>
      <Song name="Hard to Say I''m Sorry" artist="Chicago"/>
      <Song name="She Loves Me" artist="Chicago"/>
      <Song name="I Can''t Make You Love Me" artist="Bonnie Raitt"/>
      <Song name="Heart of the Matter" artist="Don Henley"/>
      <Song name="Almost Like a Song" artist="Ronnie Milsap"/>
      <Song name="I''ll Be Over You" artist="Toto"/>
    </Songs>
    '
    declare @hDoc int
    exec sp_xml_preparedocument @hDoc OUT, @doc

7. The first time you parse an XML document using sp_xml_preparedocument, SQLXML calls GlobalMemoryStatus to retrieve the amount of physical memory in the machine, then calls an undocumented function exported by MSXML to restrict the amount of virtual memory it may allocate. (I had you restart your server so that we'd be sure to go down this code path.) This undocumented MSXML function is exported by ordinal rather than by name from MSXMLn.DLL and was added to MSXML expressly for use by SQL Server.

8. At this point, Query Analyzer should appear to be hung because your breakpoint has been hit in WinDbg and SQL Server has been stopped. Switch back to WinDbg and type kv at the command prompt to dump the call stack of the current thread. Your stack should look something like this (I've omitted everything but the function names):

    KERNEL32!GlobalMemoryStatus (FPO: [Non-Fpo])
    sqlservr!CXMLLoadLibrary::DoLoad+0x1b5
    sqlservr!CXMLDocsList::Load+0x58
    sqlservr!CXMLDocsList::LoadXMLDocument+0x1b
    sqlservr!SpXmlPrepareDocument+0x423
    sqlservr!CSpecProc::ExecuteSpecial+0x334
    sqlservr!CXProc::Execute+0xa3
    sqlservr!CSQLSource::Execute+0x3c0
    sqlservr!CStmtExec::XretLocalExec+0x14d
    sqlservr!CStmtExec::XretExecute+0x31a
    sqlservr!CMsqlExecContext::ExecuteStmts+0x3b9
    sqlservr!CMsqlExecContext::Execute+0x1b6
    sqlservr!CSQLSource::Execute+0x357
    sqlservr!language_exec+0x3e1

9. You'll recall from Chapter 3 that we discovered that the entry point for T-SQL batch execution within SQL Server is language_exec. You can see the call to language_exec at the bottom of this stack. It was called when you submitted the T-SQL batch to the server to run. Working upward from the bottom, we can see the call to SpXmlPrepareDocument, the internal “spec proc” (an extended procedure implemented internally by the server rather than in an external DLL) responsible for implementing the sp_xml_preparedocument xproc. We can see from there that SpXmlPrepareDocument calls LoadXMLDocument, LoadXMLDocument calls a method named Load, Load calls a method named DoLoad, and DoLoad calls GlobalMemoryStatus. So, that's how we know how SQL Server determines the amount of physical memory in the machine on MSXML's behalf, and, knowing the limitations of this function, that's how we know the maximum amount of virtual memory MSXML can use.

10. Type q and hit Enter to quit WinDbg. Quitting the debugger also terminates the process it is attached to, so you will have to restart your SQL Server.