As we saw in the earlier discussions of updategrams and OPENXML, inserting XML data into a SQL Server database is relatively easy. However, both of these methods of loading data have one serious drawback: They're not suitable for loading large amounts of data. In the same way that using the Transact-SQL INSERT statement is suboptimal for loading large numbers of rows, using updategrams and OPENXML to load large volumes of XML data into SQL Server is slow and resource intensive.
SQLXML provides a facility intended specifically to address this problem. Called the XML Bulk Load component, it is a COM component you can call from OLE Automation–capable languages and tools such as Visual Basic, Delphi, and even Transact-SQL. It presents an object-oriented interface to loading XML data in bulk in a manner similar to the Transact-SQL BULK INSERT command.
Architecturally, XML Bulk Load is an in-process COM component named SQLXMLBulkLoad that resides in a DLL named XBLKLDn.DLL. When it bulk loads data to SQL Server, it does so via the bulk load interface of SQL Server's SQLOLEDB native OLE DB provider. If you have a Profiler trace running while the bulk load is occurring, you'll see an INSERT BULK language event show up in the trace. INSERT BULK is indicative of a special TDS packet type designed especially for bulk loading data. It's neither a true language event nor an RPC event; instead, it is a distinct type of data packet that bulk load facilities send to the server when they want to initiate a bulk copy operation.
Using the Component
The first step in using the XML Bulk Load component is to define a mapping schema that maps the XML data you're importing to tables and columns in your database. When the component loads your XML data, it will read it as a stream and use the mapping schema to decide where the data goes in the database.
The mapping schema determines the scope of each row added by the Bulk Load component. As the closing tag for each row is read, its corresponding data is written to the database.
You access the Bulk Load component itself via the SQLXMLBulkLoad interface on the SQLXMLBulkLoad COM object. The first step in using it is to connect to the database using an OLE DB connection string or by setting its ConnectionCommand property to an existing ADO Command object. The second step is to call its Execute method. The VBScript code in Listing 18.70 illustrates.
Listing 18.70
Set objBulkLoad = CreateObject("SQLXMLBulkLoad.SQLXMLBulkLoad") objBulkLoad.ConnectionString = _ "provider=SQLOLEDB;data source=KUFNATHE;database=Northwind;" & _ "Integrated Security=SSPI;" objBulkLoad.Execute "d:\xml\OrdersSchema.xml", "d:\xml\OrdersData.xml" Set objBulkLoad = Nothing
You can also specify an XML stream (rather than a file) to load, making cross-DBMS data transfers (from platforms that feature XML support) fairly easy.
XML Fragments
Setting the XMLFragment property to True allows the Bulk Load component to load data from an XML fragment (an XML document with no root element, similar to the type returned by Transact-SQL's FOR XML extension). Listing 18.71 shows an example.
Listing 18.71
Set objBulkLoad = CreateObject("SQLXMLBulkLoad.SQLXMLBulkLoad") objBulkLoad.ConnectionString = _ "provider=SQLOLEDB;data source=KUFNATHE;database=Northwind;" & _ "Integrated Security=SSPI;" objBulkLoad.XMLFragment = True objBulkLoad.Execute "d:\xml\OrdersSchema.xml", "d:\xml\OrdersData.xml" Set objBulkLoad = Nothing
Enforcing Constraints
By default, the XML Bulk Load component does not enforce check and referential integrity constraints. Enforcing constraints as data is loaded slows down the process significantly, so the component doesn't enforce them unless you tell it to. For example, you might want to do that when you're loading data directly into production tables and you want to ensure that the integrity of your data is not compromised. To cause the component to enforce your constraints as it loads data, set the CheckConstraints property to True, as shown in Listing 18.72.
Listing 18.72
Set objBulkLoad = CreateObject("SQLXMLBulkLoad.SQLXMLBulkLoad") objBulkLoad.ConnectionString = _ "provider=SQLOLEDB;data source=KUFNATHE;database=Northwind;" & _ "Integrated Security=SSPI;" objBulkLoad.CheckConstraints = True objBulkLoad.Execute "d:\xml\OrdersSchema.xml", "d:\xml\OrdersData.xml" Set objBulkLoad = Nothing
Duplicate Keys
Normally you'd want to stop a bulk load process when you encounter a duplicate key. Usually this means you've got unexpected data values or data corruption of some type and you need to look at the source data before proceeding. There are, however, exceptions. Say, for example, that you get a daily data feed from an external source that contains the entirety of a table. Each day, a few new rows show up, but, for the most part, the data in the XML document already exists in your table. Your interest is in loading the new rows, but the external source that provides you the data may not know which rows you have and which ones you don't. They may provide data to lots of companies—what your particular database contains may be unknown to them.
In this situation, you can set the IgnoreDuplicateKeys property before the load, and the component will ignore the duplicate key values it encounters. The bulk load won't halt when it encounters a duplicate key—it will simply ignore the row containing the duplicate key, and the rows with nonduplicate keys will be loaded as you'd expect. Listing 18.73 shows an example.
Listing 18.73
Set objBulkLoad = CreateObject("SQLXMLBulkLoad.SQLXMLBulkLoad") objBulkLoad.ConnectionString = _ "provider=SQLOLEDB;data source=KUFNATHE;database=Northwind;" & _ "Integrated Security=SSPI;" objBulkLoad.IgnoreDuplicateKeys = True objBulkLoad.Execute "d:\xml\OrdersSchema.xml", "d:\xml\OrdersData.xml" Set objBulkLoad = Nothing
When IgnoreDuplicateKeys is set to True, inserts that would cause a duplicate key will still fail, but the bulk load process will not halt. The remainder of the rows will be processed as though no error occurred.
IDENTITY Columns
SQLXMLBulkLoad's KeepIdentity property is True by default. This means that values for identity columns in your XML data will be loaded into the database rather than being generated on-the-fly by SQL Server. Normally, this is what you'd want, but you can set KeepIdentity to False if you'd rather have SQL Server generate these values.
There are a couple of caveats regarding the KeepIdentity property. First, when KeepIdentity is set to True, SQL Server uses SET IDENTITY_ INSERT to enable identity value insertion into the target table. SET IDENTITY_ INSERT has specific permissions requirements—execute permission defaults to the sysadmin role, the db_owner and db_ddladmin fixed database roles, and the table owner. This means that a user who does not own the target table and who also is not a sysadmin, db_owner, or DDL administrator will likely have trouble loading data with the XML Bulk Load component. Merely having bulkadmin rights is not enough.
Another caveat is that you would normally want to preserve identity values when bulk loading data into a table with dependent tables. Allowing these values to be regenerated by the server could be disastrous—you could break parent-child relationships between tables with no hope of reconstructing them. If a parent table's primary key is its identity column and KeepIdentity is set to False when you load it, you may not be able to resynchronize it with the data you load for its child table. Fortunately, KeepIdentity is enabled by default, so normally this isn't a concern, but be sure you know what you're doing if you choose to set it to False.
Listing 18.74 illustrates setting the KeepIdentity property.
Listing 18.74
Set objBulkLoad = CreateObject("SQLXMLBulkLoad.SQLXMLBulkLoad") objBulkLoad.ConnectionString = _ "provider=SQLOLEDB;data source=KUFNATHE;database=Northwind;" & _ "Integrated Security=SSPI;" objBulkLoad.KeepIdentity = False objBulkLoad.Execute "d:\xml\OrdersSchema.xml", "d:\xml\OrdersData.xml" Set objBulkLoad = Nothing
Another thing to keep in mind is that KeepIdentity is a very binary option—either it's on or it's not. The value you give it affects every object into which XML Bulk Load inserts rows within a given bulk load. You can't retain identity values for some tables and allow SQL Server to generate them for others.
NULL Values
For a column not mapped in the schema, the column's default value is inserted. If the column doesn't have a default, NULL is inserted. If the column doesn't allow NULLs, the bulk load halts with an error message.
The KeepNulls property allows you to tell the bulk load facility to insert a NULL value rather than a column's default when the column is not mapped in the schema. Listing 18.75 demonstrates.
Listing 18.75
Set objBulkLoad = CreateObject("SQLXMLBulkLoad.SQLXMLBulkLoad") objBulkLoad.ConnectionString = _ "provider=SQLOLEDB;data source=KUFNATHE;database=Northwind;" & _ "Integrated Security=SSPI;" objBulkLoad.KeepNulls = True objBulkLoad.Execute "d:\xml\OrdersSchema.xml", "d:\xml\OrdersData.xml" Set objBulkLoad = Nothing
Table Locks
As with SQL Server's other bulk load facilities, you can configure SQLXMLBulkLoad to lock the target table before it begins loading data into it. This is more efficient and faster than using more granular locks but has the disadvantage of preventing other users from accessing the table while the bulk load runs. To force a table lock during an XML bulk load, set the ForceTableLock property to True, as shown in Listing 18.76.
Listing 18.76
Set objBulkLoad = CreateObject("SQLXMLBulkLoad.SQLXMLBulkLoad") objBulkLoad.ConnectionString = _ "provider=SQLOLEDB;data source=KUFNATHE;database=Northwind;" & _ "Integrated Security=SSPI;" objBulkLoad.ForceTableLock = True objBulkLoad.Execute "d:\xml\OrdersSchema.xml", "d:\xml\OrdersData.xml" Set objBulkLoad = Nothing
Transactions
By default, XML bulk load operations are not transactional—that is, if an error occurs during the load process, the rows loaded up to that point will remain in the database. This is the fastest way to do things, but it has the disadvantage of possibly leaving a table in a partially loaded state. To force a bulk load operation to be handled as a single transaction, set SQLXMLBulkLoad's Transaction property to True before calling Execute.
When Transaction is True, all inserts are cached in a temporary file before being loaded onto SQL Server. You can control where this file is written by setting the TempFilePath property. TempFilePath has no meaning unless Transaction is True. If TempFilePath is not otherwise set, it defaults to the folder specified by the TEMP environmental variable on the server.
I should point out that bulk loading data within a transaction is much slower than loading it outside of one. That's why the component doesn't load data within a transaction by default. Also note that you can't bulk load binary XML data from within a transaction.
Listing 18.77 illustrates a transactional bulk load.
Listing 18.77
Set objBulkLoad = CreateObject("SQLXMLBulkLoad.SQLXMLBulkLoad") objBulkLoad.ConnectionString = _ "provider=SQLOLEDB;data source=KUFNATHE;database=Northwind;" & _ "Integrated Security=SSPI;" objBulkLoad.Transaction = True objBulkLoad.TempFilePath = "c:\temp\xmlswap" objBulkLoad.Execute "d:\xml\OrdersSchema.xml", "d:\xml\OrdersData.xml" Set objBulkLoad = Nothing
In this example, SQLXMLBulkLoad establishes its own connection to the server over OLE DB, so it operates within its own transaction context. If an error occurs during the bulk load, the component rolls back its own transaction.
When SQLXMLBulkLoad uses an existing OLE DB connection via its ConnectionCommand property, the transaction context belongs to that connection and is controlled by the client application. When the bulk load completes, the client application must explicitly commit or roll back the transaction. Listing 18.78 shows an example.
Listing 18.78
On Error Resume Next Err.Clear Set objCmd = CreateObject("ADODB.Command") objCmd.ActiveConnection= _ "provider=SQLOLEDB;data source=KUFNATHE;database=Northwind;" & _ "Integrated Security=SSPI;" Set objBulkLoad = CreateObject("SQLXMLBulkLoad.SQLXMLBulkLoad") objBulkLoad.Transaction = True objBulkLoad.ConnectionCommand = objCmd objBulkLoad.Execute "d:\xml\OrdersSchema.xml", "d:\xml\OrdersData.xml" If Err.Number = 0 Then objCmd.ActiveConnection.CommitTrans Else objCmd.ActiveConnection.RollbackTrans End If Set objBulkLoad = Nothing Set objCmd = Nothing
Note that when using the ConnectionCommand property, Transaction is required—it must be set to True.
Errors
The XML Bulk Copy component supports logging error messages to a file via its ErrorLogFile property. This file is an XML document itself that lists any errors that occurred during the bulk load. Listing 18.79 demonstrates how to use this property.
Listing 18.79
Set objBulkLoad = CreateObject("SQLXMLBulkLoad.SQLXMLBulkLoad") objBulkLoad.ConnectionString = _ "provider=SQLOLEDB;data source=KUFNATHE;database=Northwind;" & _ "Integrated Security=SSPI;" objBulkLoad.ErrorLogFile = "c:\temp\xmlswap\errors.xml" objBulkLoad.Execute "d:\xml\OrdersSchema.xml", "d:\xml\OrdersData.xml" Set objBulkLoad = Nothing
The file you specify will contain a Record element for each error that occurred during the last bulk load. The most recent error message will be listed first.
Generating Database Schemas
In addition to loading data into existing tables, the XML Bulk Copy component can also create target tables for you if they do not already exist, or drop and recreate them if they do exist. To create nonexistent tables, set the component's SchemaGen property to True, as shown in Listing 18.80.
Listing 18.80
Set objBulkLoad = CreateObject("SQLXMLBulkLoad.SQLXMLBulkLoad") objBulkLoad.ConnectionString = _ "provider=SQLOLEDB;data source=KUFNATHE;database=Northwind;" & _ "Integrated Security=SSPI;" objBulkLoad.SchemaGen = True objBulkLoad.Execute "d:\xml\OrdersSchema.xml", "d:\xml\OrdersData.xml" Set objBulkLoad = Nothing
Since SchemaGen is set to True, any tables in the schema that don't already exist will be created when the bulk load starts. For tables that already exist, data is simply loaded into them as it would normally be.
If you set the BulkLoad property of the component to False, no data is loaded. So, if SchemaGen is set to True but BulkLoad is False, you'll get empty tables for those in the mapping schema that did not already exist in the database, but you'll get no data. Listing 18.81 presents an example.
Listing 18.81
Set objBulkLoad = CreateObject("SQLXMLBulkLoad.SQLXMLBulkLoad") objBulkLoad.ConnectionString = _ "provider=SQLOLEDB;data source=KUFNATHE;database=Northwind;" & _ "Integrated Security=SSPI;" objBulkLoad.SchemaGen = True objBulkLoad.BulkLoad = False objBulkLoad.Execute "d:\xml\OrdersSchema.xml", "d:\xml\OrdersData.xml" Set objBulkLoad = Nothing
When XML Bulk Load creates tables, it uses the information in the mapping schema to define the columns in each table. The sql:datatype annotation defines column data types, and the dt:type attribute further defines column type information. To define a primary key within the mapping schema, set a column's dt:type attribute to id and set the SGUseID property of the XML Bulk Load component to True. The mapping schema in Listing 18.82 illustrates.
Listing 18.82
<ElementType name="Orders" sql:relation="Orders"> <AttributeType name="OrderID" sql:datatype="int" dt:type="id"/> <AttributeType name="ShipCity" sql:datatype="nvarchar(30)"/> <attribute type="OrderID" sql:field="OrderID"/> <attribute type="ShipCity" sql:field="ShipCity"/> </ElementType>
Listing 18.83 shows some VBScript code that sets the SGUseID property so that a primary key will automatically be defined for the table that's created on the server.
Listing 18.83
Set objBulkLoad = CreateObject("SQLXMLBulkLoad.SQLXMLBulkLoad") objBulkLoad.ConnectionString = _ "provider=SQLOLEDB;data source=KUFNATHE;database=Northwind;" & _ "Integrated Security=SSPI;" objBulkLoad.SchemaGen = True objBulkLoad.SGUseID = True objBulkLoad.Execute "d:\xml\OrdersSchema.xml", "d:\xml\OrdersData.xml" Set objBulkLoad = Nothing
Here's the Transact-SQL that results when the bulk load executes:
CREATE TABLE Orders ( OrderID int NOT NULL, ShipCity nvarchar(30) NULL, PRIMARY KEY CLUSTERED (OrderID) )
In addition to being able to create new tables from those in the mapping schema, SQLXMLBulkLoad can also drop and recreate tables. Set the SGDropTables property to True to cause the component to drop and recreate the tables mapped in the schema, as shown in Listing 18.84.
Listing 18.84
Set objBulkLoad = CreateObject("SQLXMLBulkLoad.SQLXMLBulkLoad") objBulkLoad.ConnectionString = _ "provider=SQLOLEDB;data source=KUFNATHE;database=Northwind;" & _ "Integrated Security=SSPI;" objBulkLoad.SchemaGen = True objBulkLoad.SGDropTables = True objBulkLoad.Execute "d:\xml\OrdersSchema.xml", "d:\xml\OrdersData.xml" Set objBulkLoad = Nothing