SQL Server and OPENXML
- The OPENXML Function in Transact-SQL
- Joining OPENXML Output with Other Tables
- Advantages of Using OPENXML
- Performance Issues With OPENXML
- Summary
The OPENXML Function in Transact-SQL
A couple of my other articles discussed XML support in Transact-SQL (T-SQL). I showed you how to use the FOR XML extensions of the SELECT statement. This article discusses the OPENXML function, which lets you parse an XML string with T-SQL.
Overview of OPENXML
XML is one of the easiest ways to exchange data between applications. When XML first became available, many developers attempted to write their own XML parsers in the language of their choice. Indeed, because XML consists of tags, writing such a parser in T-SQL isn't very difficult. However, going though thousands of lines of XML can quickly degrade performance. That's why it is nice to have the OPENXML function, which does the parsing work for you fairly efficiently. We'll look at some of the performance limitations of OPENXML a bit later; for now, let's examine how this function is used.
The syntax of OPENXML can look somewhat convoluted at first:
OPENXML(idoc int [in],rowpattern nvarchar[in],[flags byte[in]]) [WITH (SchemaDeclaration | TableName)]
However, after you have a look at a couple of examples, it should become fairly easy to grasp. The OPENXML function must be used with two system stored procedures: sp_xml_preparedocument and sp_xml_removedocument. As the names of these procedures suggest, the former prepares an internal representation of the XML document in memory, and the latter removes such representation to free up resources. In fact, you can get away without using sp_xml_removedocument (whether this is a feature or bug, I'm not sure) because SQL Server will automatically destroy the internal structure once the session that created it disconnects.
sp_xml_preparedocument has two parameters: the XML document, which is accepted as an input parameter, and an output parameter with the integer type. Once the document is prepared with sp_xml_preparedocument, OPENXML can translate it into a row set. Consider the following example:
USE PUBS DECLARE @xml_text VARCHAR(4000), @i INT SELECT @xml_text = ' <root> <authors au_id="172-32-1176" au_lname="White" au_fname="Johnson" phone="408 496-7223" address="10932 Bigge Rd." city="Menlo Park" state="CA" zip="94025" contract="1"/> <authors au_id="213-46-8915" au_lname="Green" au_fname="Marjorie" phone="415 986-7020" address="309 63rd St. #411" city="Oakland" state="CA" zip="94618" contract="1"/> <authors au_id="238-95-7766" au_lname="Carson" au_fname="Cheryl" phone="415 548-7723" address="589 Darwin Ln." city="Berkeley" state="CA" zip="94705" contract="1"/></root>' EXEC sp_xml_preparedocument @i OUTPUT, @xml_text SELECT * FROM OPENXML(@i, '/root/authors') WITH authors EXEC sp_xml_removedocument @i
Results:
au_id |
au_lname |
au_fname |
phone |
Address |
city |
state |
zip |
contract |
172-32-1176 |
White |
Johnson |
408 496-7223 |
10932 Bigge Rd. |
Menlo Park |
CA |
94025 |
1 |
213-46-8915 |
Green |
Marjorie |
415 986-7020 |
309 63rd St. #411 |
Oakland |
CA |
94618 |
1 |
238-95-7766 |
Carson |
Cheryl |
415 548-7723 |
589 Darwin Ln. |
Berkeley |
CA |
94705 |
1 |
In this case, I "cheated." I knew that the structure of the XML document passed to the OPENXML function was identical to the authors structure. That's why I could specify the WITH authors clause at the end of OPENXML. Alternatively, I could specify the structure of the parsed document for OPENXML as follows:
SELECT * FROM OPENXML(@i, '/root/authors') WITH ( au_id VARCHAR(11), au_lname VARCHAR(20), au_fname VARCHAR(30), phone VARCHAR(12), address VARCHAR(50), city VARCHAR(20), state CHAR(2), zip CHAR(5), contract BIT)
The output would be the exactly the same as in the first example. The advantage of specifying the structure in this case is that it makes code more readable. However, if the parsed XML doesn't match any of the user table's structure, then you have no choice but to provide the document structure for OPENXML.
The flags option in OPENXML syntax simply lets you change from attribute-centric to element-centric mapping. In plain English, this means that the structure you supply for OPENXML will be applied to elements rather than the attributes of the parsed XML document. The default value of this parameter is 1, which stands for attribute-centric. If my XML document was element-centric, I could use almost an identical version of the same query with the flags parameter equal to 2, as in the following:
USE pubs DECLARE @xml_text VARCHAR(4000), @i INT SELECT @xml_text = ' <root><authors> <au_id>172-32-1176</au_id> <au_lname>White</au_lname> <au_fname>Johnson</au_fname> <phone>408 496-7223</phone> <address>10932 Bigge Rd.</address> <city>Menlo Park</city> <state>CA</state> <zip>94025</zip> <contract>1</contract> </authors> <authors> <au_id>213-46-8915</au_id> <au_lname>Green</au_lname> <au_fname>Marjorie</au_fname> <phone>415 986-7020</phone> <address>309 63rd St. #411</address> <city>Oakland</city> <state>CA</state> <zip>94618</zip> <contract>1</contract> </authors> <authors> <au_id>238-95-7766</au_id> <au_lname>Carson</au_lname> <au_fname>Cheryl</au_fname> <phone>415 548-7723</phone> <address>589 Darwin Ln.</address> <city>Berkeley</city> <state>CA</state> <zip>94705</zip> <contract>1</contract> </authors></root>' EXEC sp_xml_preparedocument @i OUTPUT, @xml_text SELECT * FROM OPENXML(@i, '/root/authors', 2) WITH authors EXEC sp_xml_removedocument @i
The results are again the same as in the first example.
Notice also that I specified the row pattern of '/root/authors' in the above example. This was fairly simple because the parsed XML only contained a single level inside the root tag. By the way, although many developers like to use '/root' with OPENXML, there is no magic in using root. You can use the outer tag of your preference.
The row pattern parameter of OPENXML lets you traverse the XML hierarchy. The following example has an additional hierarchy in the XML document:
DECLARE @xml_text VARCHAR(4000), @i INT SELECT @xml_text = ' <root> <authors au_id="172-32-1176" au_lname="White" au_fname="Johnson" phone="408 496-7223" address="10932 Bigge Rd." city="Menlo Park" state="CA" zip="94025" contract="1"> <titles au_id="172-32-1176" title_id="PS3333" au_ord="1" royaltyper="100"/> </authors> <authors au_id="213-46-8915" au_lname="Green" au_fname="Marjorie" phone="415 986-7020" address="309 63rd St. #411" city="Oakland" state="CA" zip="94618" contract="1"> <titles au_id="213-46-8915" title_id="BU1032" au_ord="2" royaltyper="40"/> <titles au_id="213-46-8915" title_id="BU2075" au_ord="1" royaltyper="100"/> </authors> </root>' EXEC sp_xml_preparedocument @i OUTPUT, @xml_text SELECT au_id AS author_id, au_lname AS last_name, au_fname AS first_name, title_id, royaltyper AS royalty FROM OPENXML(@i, '/root/authors/titles', 1) WITH ( au_id VARCHAR(11), au_lname VARCHAR(20) '../@au_lname', au_fname VARCHAR(30) '../@au_fname', title_id VARCHAR(15), royaltyper INT) EXEC sp_xml_removedocument @i
Results:
author_id |
last_name |
first_name |
title_id |
royalty |
172-32-1176 |
White |
Johnson |
PS3333 |
100 |
213-46-8915 |
Green |
Marjorie |
BU1032 |
40 |
213-46-8915 |
Green |
Marjorie |
BU2075 |
100 |