SQL Server 2000 and the XDR Schema
XML schemas describe XML document structure and are also capable of placing constraints on the data in the XML document. With relational database management systems, it is common practice to create a view of a database and then query that view utilizing SQL.
Schemas, Microsoft's version of XML schemas, create XML views of relational data which can then be queried via XPath expressions. This article looks at XDR Schemas and how they work with SQL Server 2000.
SQL Server 2000 and the XDR Schema
Microsoft's XDR schemas differ from the schemas documented in the W3C specification. This came about from Microsoft taking the initial specification, immediately adopting it, modifying it accordingly, and not paralleling the W3C specification.
SQL Server 2000 utilizes the XDR language for its schemas. This language is similar, but only similar, to the W3C specification for a schema language, which is located at http://www.w3.org/TR/2000/CR-xmlschema-0-20001024/. Although the two methods accomplish basically the same thing, the languages are very, very different.
The XDR language is a subset of ideas described in the XML-Data specification. Microsoft's XML Parser (MSXML) implementation utilizes the XDR language specification, which is based on the XML-Data Note posted by the W3C in January 1998. It is still available at http://www.w3.org/TR/1998/NOTE-XML-data-0105/. The parser implementation is also based on the Document Content Description (DCD) for XML, which is available at http://www.w3.org/TR/NOTE-dcd. XML Schemas in Microsoft Internet Explorer 5.0 and later provide support for the subset of XML-Data that coincides directly with the functionality expressed in this DCD, although in a slightly different XML grammar.
Elements and Attributes
Just as in the W3C schema specification, specifying an <ElementType ...> and <AttributeType ...> defines the elements and attributes contained in an XDR schema, respectively. These provide the definition and type of the elements and attributes. Then an instance of an element or an attribute is declared using <element ...> or <attribute ...> tags.
Consider the XDR schema shown in Listing 1.
Listing 1: Example XDR Schema
<?xml version="1.0"?> <Schema xmlns="schemas-microsoft-com:xml-data"> <ElementType name="title" /> <ElementType name="author" /> <ElementType name="pages" /> <ElementType name="book" model="closed"> <element type="title" /> <element type="author" /> <element type="pages" /> <AttributeType name="copyright" /> <attribute type="copyright" /> </ElementType> </Schema>
The schema defines four elements: <title>, <author>, <pages>, and <book> using the <ElementType> element. The <book> element specifies the individual elements and attributes that make it up. In other words, it describes the content model for the element. There is much more to the content model than this simple example shows, and we will discuss the additional components in the next section.
This example shows that each <book> element contains <title>, <author>, and <pages> child elements. This content model is specified using the <element> element along with the type attribute that references the <element> type defined earlier.
There is also support for global attributes that allow multiple elements to share the definition of a common attribute. Take a look at Listing 2. This schema declares an attribute, copyright, for the <book> element. This is done using the <AttributeType> element, which defines an attribute type, and then declaring it using the <attribute> element. You specify the <AttributeType> element globally by placing it outside the context of any <ElementType>.
Listing 2: Example of Global Attribute Declaration
<Schema xmlns:s="urn:schemas-microsoft-com:xml-data"> <ElementType name="title" content="textOnly"/> <ElementType name="authors" content="textOnly"/> <AttributeType name="pages" content="textOnly"/> <ElementType name="book" order="seq" content="eltOnly"> <attribute type="pages" /> <element type="title" /> <element type="authors" /> </ElementType> </Schema>
Data Types
Data type specification is a necessary part of schemas. In fact, it was one of the major driving forces behind their creation. The W3C XML 1.0 Recommendation defines enumerated types and a set of tokenized types. These types are referred to as primitive types in Microsoft's XML documentation. The primitive types are defined in Section 3.3.1 of the W3C XML 1.0 Recommendation.
Table 1: Microsoft's Primitive Data Types
Primitive Data Type |
Description |
entity |
Represents the XML ENTITY type. |
entities |
Represents the XML ENTITIES type. |
enumeration |
Represents an enumerated type (supported on attributes only). |
id |
Represents the XML ID type. |
idref |
Represents the XML IDREF type. |
idrefs |
Represents the XML IDREFS type. |
nmtoken |
Represents the XML NMTOKEN type. |
nmtokens |
Represents the XML NMTOKENS type. |
notation |
Represents a NOTATION type. |
string |
Represents a string type. |
In addition to the primitive types, Microsoft's schema specification enumerates many other different types. These data types are listed in Table 2.
Table 2: Microsoft Supported XML Non-Primitive Data Types
Data Type |
Description |
bin.base64 |
MIME-style Base64 encoded Binary Large Object (BLOB). |
bin.hex |
Hexadecimal digits representing octets. |
Boolean |
0 or 1, where 0 == "false" and 1 == "true". |
char |
String, one character long. |
date |
Date in a subset ISO 8601 format, without the time data, for example, "1994-11-05". The date itself is not validated. (For example, 2-31-99 will pass validation.) |
dateTime |
Date in a subset of ISO 8601 format, with optional time and no optional zone. Fractional seconds can be as precise as nanoseconds; for example, "1988-04-07T18:39:09". |
dateTime.tz |
Date in a subset ISO 8601 format, with optional time and optional zone. Fractional seconds can be as precise as nanoseconds; for example, "1988-04-07T18:39:09-08:00". |
fixed.14.4 |
Same as "number" but no more than 14 digits to the left of the decimal point, and no more than 4 to the right. |
float |
Real number with no limits on digits; can potentially have a leading sign, fractional digits, and, optionally, an exponent. Punctuation as in U.S. English. Values range from 1.7976931348623157E+308 to 2.2250738585072014E-308. |
int |
Number with optional sign, no fractions, and no exponent. |
number |
Number with no limit on digits; can potentially have a leading sign, fractional digits, and, optionally, an exponent. Punctuation as in U.S. English. (Values have the same range as the most significant number, R8, 1.7976931348623157E+308 to 2.2250738585072014E-308.) |
time |
Time in a subset ISO 8601 format, with no date and no time zone; for example, "08:15:27". |
time.tz |
Time in a subset ISO 8601 format, with no date but optional time zone; for example, "08:1527-05:00". |
i1 |
Integer represented in one byte. A number with optional sign, no fractions, no exponent, for example, "1, 127, -128". |
i2 |
Integer represented in one word. A number with optional sign, no fractions, no exponent; for example, "1, 703, -32768". |
i4 |
Integer represented in four bytes. A number with optional sign, no fractions, no exponent, for example, "1, 703, -32768, 148343, -1000000000". |
i8 |
Integer represented in eight bytes. A number with optional sign, no fractions, no exponent, and 19-digit precision. Range is from –9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 |
r4 |
Real number with seven-digit precision; can potentially have a leading sign, fractional digits, and, optionally, an exponent. Punctuation as in U.S. English. Values range from 3.40282347E+38F to 1.17549435E-38F. |
r8 |
Same as "float." Real number with 15-digit precision; can potentially have a leading sign, fractional digits, and, optionally, an exponent. Punctuation as in U.S. English. Values range from 1.7976931348623157E+308 to 2.2250738585072014E-308. |
ui1 |
Unsigned integer. A number, unsigned, no fractions, no exponent; for example, "1, 255". |
ui2 |
Unsigned integer, two bytes. A number, unsigned, no fractions, no exponent; for example, "1, 255, 65535". |
ui4 |
Unsigned integer, four bytes. A number, unsigned, no fractions, no exponent; for example, "1, 703, 3000000000". |
ui8 |
Unsigned integer, eight bytes. A number, unsigned, no fractions, no exponent. Range is 0 to 18,446,744,073,709,551,615. |
uri |
Universal Resource Identifier (URI); for example, "urn:schemas-microsoft-com:Office9". |
uuid |
Hexadecimal digits representing octets, optional embedded hyphens that are ignored; for example, "333C7BC4-460F-11D0-BC04-0080C7055A83". |