Ordering Up Some XQuery
For more than a year, XQuery had been inextricably tied to the acronym FLWR (presumably pronounced "flower") for For|Let|Where|Return. However, as of the November 15, 2002 draft, this acronym became FLWOR, an explicit recognition that ordering plays a fairly major part in database retrieval. The O stands for order by, a keyword that performs much the same actions as the SQL ORDER BY command.
The order by command follows for expressions (and where expressions where they exist), and indicates for a given set of data the order in which the data is output. When it is not explicitly specified with order by, the output order generally depends on the specific system architecture. With an XML file, for instance, the specific order is the order in which the XML parser walks the tree (usually the child of each node is processed, then the next sibling, and then the parent)the so-called document order.
The order by keyword indicates that the content should be sorted in ascending order, unless otherwise indicated according to the data type of the expressions being evaluated. This also implicitly assumes that the data type has some internal notion of ordering, which in turn assumes that there is some sense of schema validation acting on the data. For instance, consider the characters.xml file defined earlier in the chapter. Assume that it has an XSD schema (characters.xsd), which looks something like this:
<?xml version="1.0"?> <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"> <xsd:element name="characters" type="Characters"/> <xsd:complexType name="Character"> <xsd:sequence> <xsd:element name="name" type="xsd:string"/> <xsd:element name="gender" type="Gender"/> <xsd:element name="species" type="xsd:string"/> <xsd:element name="vocation" type="xsd:string"/> <xsd:element name="level" type="xsd:nonNegativeInteger"/> <xsd:element name="health" type="xsd:int"/> </xsd:sequence> </xsd:complexType> <xsd:simpleType name="Gender"> <xsd:restriction base="xsd:string"> <xsd:enumeration value="Female"/> <xsd:enumeration value="Male"/> <xsd:enumeration value="Other"/> </xsd:restriction> </xsd:simpleType> <xsd:complexType name="Characters"> <xsd:sequence> <xsd:element name="character" type="Character" minOccurs="0" maxOccurs="unbounded"/> </xsd:sequence> </xsd:complexType> </xsd:schema>
The schema can then be associated with the characters.xml file by editing the enclosing <characters> element:
<characters xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="http://www.kurtcagle.net/schemas/Characters.xsd"> <character> <name>Aleria Delamare</name> <gender>Female</gender> <species>Heroleim</species> <vocation>Bard</vocation> <level>5</level> <health>25</health> </character> <character> <name>Shar Denasthenes</name> <gender>Male</gender> <species>Human</species> <vocation>Merchant</vocation> <level>6</level> <health>28</health> </character> ... </characters>
The xsi:noNamespaceSchemaLocation gives the URL (here on a local machine) to the schema file. This schema thus indicates that <level> is a nonnegative integer, and <health> is an unbounded integer.
Given that, an XQuery can be written that orders the characters by health:
for $character in document('characters.xml')//character order by health return <healthReport> {$character/name} {$character/health} </healthReport>
This will return the following collection:
<healthReport> <name>Sheira</name> <health>9</health> </healthReport> <healthReport> <name>Gite</name> <health>18</health> </healthReport> <healthReport> <name>Paccu</name> <health>24</health> </healthReport> <healthReport> <name>Aleria</name> <health>25</health> </healthReport> <healthReport> <name>Shar</name> <health>28</health> </healthReport> <healthReport> <name>Gounna</name> <health>31</health> </healthReport> <healthReport> <name>Horukkan</name> <health>32</health> </healthReport> <healthReport> <name>Drue</name> <health>32</health> </healthReport>
Notice here that the order was done implicitly using integers. The first health report indicates a health of 9:
<healthReport> <name>Sheira</name> <health>9</health> </healthReport>
Had the ordering been done without regard to the data type of the element, this entry would have been last, because 9 is alphanumerically higher in order than 32.
Suppose, however, that you didn't have a specific schema against which to validate the characters.xml file. Would this have made sorting numerically impossible? No, because you can also declare an element's data type inline:
for $character in document('characters.xml')//character order by $character\health as xs:integer return <healthReport> {$character/name} {$character/health} </healthReport>
In this case, the contents of the <health> element are cast to an integer. The xs: namespace is implicitly defined within the XQuery schema, and indicates the XSD schema data types.
You can also determine the order of a query. The ascending and descending keywords determine the direction of the sort, and appear after the sort expression is given. For instance, suppose that each character element has an additional child element: <createdBy> of type xs:dateTime. To order the characters in descending order by the time they were created, you'd write this query:
for $character in document('characters.xml')//character order by $character\dateCreated as xs:dateTime descending return <dateReport> {$character/name} {$character/dateCreated} </dateReport>
Creating more sophisticated queries requires the intelligent use of FLWOR type expressions. For instance, if you want to list all entries in the characters.xml file by level, then sort all characters at that level by name, your query would need to use the distinct-values() function to retrieve the levels, then use that as the key for determining the set of each character within that level:
let $characters := document('characters.xml')//character for $level in distinct-values($characters/level) order by $level return <level value="{$level}">{ for $character in $characters where $character/level = $level order by $character/name return $character/name } </level>
This would create the following output:
<level value="4"> <name>Sheira</name> </level> <level value="5"> <name>Aleria</name> <name>Horukkan</name> <name>Paccu</name> </level> <level value="6"> <name>Shar</name> <name>Drue</name> </level> <level value="7"> <name>Gite</name> </level> <level value="8"> <name>Gounna</name> </level>
One of the problems with such sorting is that occasionally an empty order by expression will be used. For instance, suppose that a new character is added to the characters.xml list:
<character> <name>Yane Helavela</name> <gender>Female</gender> <species>Human</species> <vocation>Illusionist</vocation> </character>
In this case, no level element exists. The question that arises is whether this particular element should appear at the beginning or end of a sequence when sorted. This problem is answered by the empty greatest and empty least keywords. The empty greatest command indicates that whenever an item to be sorted has an empty sort key (or doesn't have one at all), it will always be assumed to be at the end of the sort order. On the other hand, empty least will place the same element at the beginning of the sort order.
The stable keyword, when added to the list of order by qualifiers, indicates another condition: what happens when two identical sort keys are found for different elements. For instance, if two different entries have the same creation date, the order of the output for the items becomes more questionable. In that situation, the stable keyword is used to tell the processor to retain the initial order in which the elements are retrieved, whereas the processor is free to implement its own ordering scheme if the stable keyword is absent. In most cases, this shouldn't make a major difference in the ordering.
Finally, if you have a reference to a specific collation (such as "eng-us"), you can indicate this particular collation as the basis for sorting through the collation keyword on the order by expression. For instance,
for $character in $characters order by $character/name collation "eng-us" return $character
will order the characters in ascending (the default) order using the U.S. English collation.