3.2 Paths
Navigation involves starting from one part of an XML document and moving to another part of the document (or a different document). XQuery performs navigation using paths. Hierarchical path names date back to the file systems of the 1960s and 1970s, such as those of Multics and UNIX. The path concept has been so generally useful that it has found broad application in a variety of systems, including XML query processing.
In XQuery, every path consists of a sequence of steps which, conceptually at least, are executed in order from left to right. A step consists of three parts, illustrated in Figure 3.1:
- A direction of travel, called the axis
- A description of the nodes to select upon arrival, called the node test
- Zero or more filters to further narrow that selection (each filter is called a predicate)
Figure 3.1. Anatomy of a path
By allowing some of these parts to be abbreviated or omitted entirely, XQuery keeps paths very concise. Each of these parts is described next, and then Section 3.5 has many examples demonstrating how to use paths to accomplish common tasks.
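As a rough illustration of this three-part anatomy, the following Python sketch models a step as an axis, a node test, and a list of predicates, and prints a path in its unabbreviated form. The `Step` class and `render_path` function are hypothetical names invented for this sketch; they are not part of XQuery or of any library.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One navigation step: axis, node test, and zero or more predicates."""
    axis: str
    node_test: str
    predicates: list = field(default_factory=list)

    def render(self) -> str:
        # Unabbreviated syntax: axis::node-test[predicate]...
        preds = "".join(f"[{p}]" for p in self.predicates)
        return f"{self.axis}::{self.node_test}{preds}"


def render_path(steps):
    # Steps are separated by a single forward slash.
    return "/".join(step.render() for step in steps)


# The relative path x/y[1]/@z, written out in full.
path = [Step("child", "x"), Step("child", "y", ["1"]), Step("attribute", "z")]
print(render_path(path))  # child::x/child::y[1]/attribute::z
```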
Each step affects the evaluation context for the next step. This context and how it changes with each step are described in Section 3.4, but for now it's enough to know that there is a current context item that affects—and is affected by—each step in the path. Except for predicates, navigation steps can be applied only when the current context item is a node (in which case it is often called the current context node).
3.2.1 Beginnings
Every path starts somewhere. For the purpose of XQuery navigation, there are effectively three places from which a path can begin:
- The current context node
- The root of the tree in which the current context node resides
- Any other node set, such as a variable or an XML constructor
With each successive step, the path may move to other nodes or alter the context.
The root of the tree in which the current context node resides is selected by a lone forward slash (/) or equivalently using the built-in root() function. Paths beginning from the root are absolute. In contrast, paths starting from the current context node are relative. Paths may also start from certain other expressions, such as variables, function calls, or parenthesized expressions (XQuery does not give a name to such paths).
From these humble beginnings, paths may navigate anywhere in the document, or even to other documents, step by step. Listing 3.1 shows a few paths. In a path, individual steps are almost always separated by one forward slash (/). (The exception, two forward slashes (//), is described in Section 3.2.4.)
Listing 3.1. Absolute, relative, and other paths
    /AbsolutePath/First/Second
    RelativePath[. = "fun"]
    $other//x
    id("other")[@y > 1]/z
Paths with more than one step always result in a (possibly empty) sequence of nodes, sorted in document order. To sort nodes in some other order, you must use a FLWOR expression (see Chapter 6).
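To make the role of the starting point concrete, here is a small Python sketch using the standard library's `xml.etree.ElementTree` module. That module implements only a small subset of XPath (not XQuery), so treat this as an approximation: the same relative path yields different results depending on the context node it starts from, and the results come back in document order. The sample document and variable names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document for this sketch.
lib = ET.fromstring(
    "<lib>"
    "<shelf id='a'><book>One</book></shelf>"
    "<shelf id='b'><book>Two</book><book>Three</book></shelf>"
    "</lib>"
)
shelf_a, shelf_b = lib.findall("shelf")

# The relative path "book" evaluated from two different context nodes.
titles_a = [b.text for b in shelf_a.findall("book")]
titles_b = [b.text for b in shelf_b.findall("book")]

# A two-step path from the document element; results are in document order.
all_titles = [b.text for b in lib.findall("shelf/book")]

print(titles_a)    # ['One']
print(titles_b)    # ['Two', 'Three']
print(all_titles)  # ['One', 'Two', 'Three']
```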
3.2.2 Axes
Each step consists of three parts: the axis (optional), the node test, and zero or more predicates. XPath defines a total of thirteen axes, and all but the namespace axis appear in XQuery. Of these, the four simplest and most commonly used ones are child, attribute, parent, and self (see Table 3.1). The other axes are explained in Section 3.2.4.
Table 3.1. The four basic axes and their abbreviations
Axis name | Abbreviation | Unabbreviated example | Abbreviated example |
---|---|---|---|
attribute | @ | x/attribute::y | x/@y |
child | (none; default axis) | x/child::y | x/y |
parent | .. | x/parent::node() | x/.. |
self | . | x/self::node() | x/. |
The child axis is so common that it is the default axis if no axis name is specified explicitly. The other three common axes all have shorthand abbreviations for convenience. XPath gets much of its succinctness from these shorthand forms. When the non-abbreviated name is used, it is followed by two colons (::) to distinguish axis names from XML qualified names (which contain at most one colon).
These four axes behave exactly as their names suggest:
- The child axis navigates into the children of the current context node.
- The attribute axis navigates into the attributes of the current context node.
- The self axis essentially goes nowhere (navigating into the current context node itself).
- The parent axis navigates to the parent of the current context node.
For example, x, which is short for child::x, selects the child elements named x from the current context node. The path x/y, which is short for child::x/child::y, first selects those same x children and then, from each of them, selects the child elements named y.
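The four basic axes can be approximated in Python with `xml.etree.ElementTree`, as sketched below. This is only an analogy under stated assumptions: ElementTree supports a limited XPath subset, it has no attribute nodes (attribute values are read with `get`), and its `..` step works only below the element on which the search started. The sample document is invented for the example.

```python
import xml.etree.ElementTree as ET

x = ET.fromstring('<x id="top"><y z="1"/><y z="2"/></x>')

# child axis: y is short for child::y
children = x.findall("y")

# self axis: . is short for self::node()
self_nodes = x.findall(".")

# parent axis: y/.. is short for y/parent::node(); duplicates are removed
parents = x.findall("y/..")

# attribute axis: ElementTree exposes attribute values rather than nodes,
# so y/@z is approximated by reading the z attribute of each y
z_values = [y.get("z") for y in children]

print(len(children))       # 2
print(self_nodes == [x])   # True
print(parents == [x])      # True
print(z_values)            # ['1', '2']
```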
3.2.3 Node Tests
Following the axis is the second part of the step, the node test. Node tests come in three varieties: names (qualified or unqualified), node kinds, and wildcards.
3.2.3.1 Name Tests
By far the most common node test is the name test. A name test selects only those nodes with the same name. Names in XQuery, as in XML, are case-sensitive. For example, the absolute path /x/y/@z starts at the root of the current document, navigates to the top-level elements named x, navigates to their child elements named y, and finally navigates to their attribute nodes named z. If you were to execute this XQuery over the XML document in Listing 3.2, it would select the two attributes named z and no other nodes.
Name tests can also select names that are in an XML namespace. However, this process is fairly complicated, so this description is deferred until Section 3.6.1.
Listing 3.2. A sample XML document
    <x thisAttribute="isNotSelected">
      <y z="1"/>
      <y z="2" thisAttribute="alsoIsNotSelected"></y>
    </x>
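The effect of the path /x/y/@z on this sample document can be checked with a short Python sketch. It assumes a well-formed copy of the listing and uses `xml.etree.ElementTree`, which starts at the document element (here x) and returns attribute values rather than attribute nodes, so it only approximates the XQuery path.

```python
import xml.etree.ElementTree as ET

# A well-formed copy of the sample document from Listing 3.2.
x = ET.fromstring(
    '<x thisAttribute="isNotSelected">'
    '<y z="1"/>'
    '<y z="2" thisAttribute="alsoIsNotSelected"></y>'
    '</x>'
)

# /x/y/@z: from the x element, step to the y children, then read z.
z_values = [y.get("z") for y in x.findall("y")]

# Name tests are case-sensitive: there are no elements named Y.
wrong_case = x.findall("Y")

print(z_values)    # ['1', '2']
print(wrong_case)  # []
```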
3.2.3.2 Node Kind Tests
Name tests are not the only node tests available in navigation steps. In fact, some kinds of XML nodes (for example, text, comment, and document nodes) have no names at all. To select nodes by kind, XQuery uses the same node kind tests used by sequence type matching (described in Chapter 2). Listing 3.3 shows several node kind tests.
Listing 3.3. Examples of node kind tests
    x/comment()                (: select all comment children of x :)
    x/attribute()              (: select all attributes of x :)
    attribute(*, xs:integer)   (: select all integer attributes :)
    attribute(y)               (: select all attributes named y :)
    attribute(y, xs:integer)   (: select integer attributes named y :)
Recall from Chapter 2 that the node() node test matches any kind of node, including the document node. The text() and comment() node kind tests match text nodes and comment nodes, respectively. The processing-instruction() node test accepts an optional name argument. When no name is specified, it matches all processing instruction nodes; otherwise, it matches only those with the same name.
The document-node() test matches the invisible document node that occurs at the root of any tree loaded from an XML document using doc() (or constructed using the document constructor; see Chapter 7). It accepts an optional argument specifying an element node kind test, in which case it matches the document node only if its element content matches that element test.
And finally, the element() and attribute() node kind tests accept optional name and type arguments. Without these extra arguments, they match all elements and attributes, respectively; with these arguments, they match only elements or attributes that have the specified name and/or type. The name can also be *, in which case it matches all names. The name in an attribute() test is written without a leading @ symbol, because the test itself already restricts the match to attributes.
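Selecting by node kind can be illustrated in Python with the standard `xml.dom.minidom` module, which keeps comments, text, and processing instructions as distinct node types. This is a sketch by analogy rather than an XQuery implementation; the helper `children_of_kind` and the sample document are hypothetical.

```python
from xml.dom import minidom, Node

doc = minidom.parseString('<x a="1"><!-- note -->text<?target data?><y/></x>')
x = doc.documentElement


def children_of_kind(node, kind):
    """Keep only the children whose DOM node type equals `kind`."""
    return [child for child in node.childNodes if child.nodeType == kind]


comments = children_of_kind(x, Node.COMMENT_NODE)                # x/comment()
texts = children_of_kind(x, Node.TEXT_NODE)                      # x/text()
pis = children_of_kind(x, Node.PROCESSING_INSTRUCTION_NODE)      # x/processing-instruction()
elements = children_of_kind(x, Node.ELEMENT_NODE)                # x/element()

print(comments[0].data)              # ' note '
print(texts[0].data)                 # 'text'
print(pis[0].target, pis[0].data)    # target data
print(elements[0].tagName)           # y

# Attributes are not children, so they never appear in childNodes.
print(x.attributes.length)           # 1
```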
3.2.3.3 Wildcards
Sometimes you want to select all nodes whose name is in a particular namespace, or conversely all nodes with the same local name regardless of the namespace. There are two equivalent ways to accomplish this goal. One is to use predicates; in fact, as you will see later, predicates can be used to perform all kinds of tests.
A more succinct way is to use the third kind of node test, the wildcard. Wildcard node tests combine aspects of both name and node kind tests; the names matched depend on the wildcard, and the node kind matched depends on the axis. The attribute axis by default selects attribute nodes; all other XQuery axes select elements by default. The default node kind is called the principal node kind for the axis.
XQuery supports three wildcard node tests. Two of these come from XPath 1.0: the star (*), which matches any name at all, and a qualified star (prefix:*) that matches all names in the namespace to which the prefix is bound. XQuery adds a third wildcard node test, *:local-name, which matches all names with the given local name and any namespace.
The only difference between the star wildcard * and the node() node kind test is that node() matches every kind of node with any name, while * matches only nodes of the principal node kind (with any name).
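Python's `xml.etree.ElementTree` offers close analogues of the three wildcards, which makes for a quick experiment. The sketch assumes Python 3.8 or later, where `{namespace}*` and `{*}name` are accepted in `findall`; the namespace URIs and element names are invented. ElementTree writes qualified names in `{uri}local` form instead of using prefixes.

```python
import xml.etree.ElementTree as ET

r = ET.fromstring(
    '<r xmlns:a="urn:a" xmlns:b="urn:b">'
    '<a:item/><b:item/><a:other/><item/>'
    '</r>'
)

# Analogue of *: any child element, whatever its name.
any_name = [e.tag for e in r.findall("*")]

# Analogue of a:*: any local name in the namespace urn:a.
in_ns_a = [e.tag for e in r.findall("{urn:a}*")]

# Analogue of *:item: local name item in any namespace (or none).
local_item = [e.tag for e in r.findall("{*}item")]

print(any_name)    # ['{urn:a}item', '{urn:b}item', '{urn:a}other', 'item']
print(in_ns_a)     # ['{urn:a}item', '{urn:a}other']
print(local_item)  # ['{urn:a}item', '{urn:b}item', 'item']
```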
3.2.4 Other Axes
XQuery supports two more axes from XPath 1.0, called descendant and descendant-or-self. The descendant axis matches all descendants of the current context node: its children, their children, and so on. (It is the transitive closure of the child axis.) The descendant-or-self axis includes the current context node as well, and so is equivalent to the union of the descendant and self axes.
The descendant-or-self axis is so commonly used that it has its own abbreviation, //, which is short for /descendant-or-self::node()/. Some caution should be observed when using it; it's easy to make mistakes when using predicates with // (see Chapter 11 for examples).
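The difference between the two axes is easy to see in a small Python sketch with `xml.etree.ElementTree`. As an approximation, `.//*` stands in for descendant::* and `iter()` for descendant-or-self::*; the sample document is made up.

```python
import xml.etree.ElementTree as ET

a = ET.fromstring("<a><b><c/></b><d/></a>")

# descendant::* : every element below the context node, in document order.
descendants = [e.tag for e in a.findall(".//*")]

# descendant-or-self::* : the same nodes plus the context node itself.
descendant_or_self = [e.tag for e in a.iter()]

print(descendants)         # ['b', 'c', 'd']
print(descendant_or_self)  # ['a', 'b', 'c', 'd']
```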
Additionally, implementations are allowed but not required to support the other six axes from XPath: ancestor, ancestor-or-self, following, following-sibling, preceding, and preceding-sibling. The first two of these are the inverses of descendant and descendant-or-self axes. They select all the ancestors of the current node (ancestor-or-self includes the node itself).
The following and preceding axes select all the nodes in the same document as the current context node that occur after and before it in document order, respectively (following excludes the node's descendants, and preceding excludes its ancestors). There's really no reason to use them in XQuery, because the >> and << node comparison operators allow you to write the same meaning more compactly (see Chapter 5).
Finally, the following-sibling and preceding-sibling axes restrict their selections to the siblings of the current context node (that is, those nodes having the same parent as it).
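Because `xml.etree.ElementTree` has no direct support for these axes, the Python sketch below computes them by hand from a document-order list and a child-to-parent map. It covers element nodes only, and all function names are hypothetical helpers written for this illustration.

```python
import xml.etree.ElementTree as ET

a = ET.fromstring("<a><b><c/><d/></b><e><f/></e><g/></a>")
order = list(a.iter())                                  # document order
parent = {child: p for p in a.iter() for child in p}    # child -> parent


def ancestors(node):
    out = []
    while node in parent:
        node = parent[node]
        out.append(node)
    return out


def following(node):
    # Nodes after `node` in document order, excluding its descendants.
    skip = set(list(node.iter())[1:])
    return [n for n in order[order.index(node) + 1:] if n not in skip]


def preceding(node):
    # Nodes before `node` in document order, excluding its ancestors.
    skip = set(ancestors(node))
    return [n for n in order[:order.index(node)] if n not in skip]


def siblings(node):
    return list(parent[node]) if node in parent else [node]


def following_siblings(node):
    sibs = siblings(node)
    return sibs[sibs.index(node) + 1:]


def preceding_siblings(node):
    sibs = siblings(node)
    return sibs[:sibs.index(node)]


e = a.find("e")
print([n.tag for n in ancestors(e)])           # ['a']
print([n.tag for n in following(e)])           # ['g']
print([n.tag for n in preceding(e)])           # ['b', 'c', 'd']
print([n.tag for n in following_siblings(e)])  # ['g']
print([n.tag for n in preceding_siblings(e)])  # ['b']
```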
3.2.5 Predicates
The third and final part of each navigation step consists of zero or more predicates. Like the node test, each predicate acts as a filter on the selected nodes, eliminating some from consideration and keeping the rest. For each node selected by the current step, the current context item is set to that node and then the predicate condition is evaluated with that context.
Any XQuery expression may be used inside a predicate; the meaning of the predicate depends on the type of the expression it contains. There are two cases: numeric and boolean predicates.
3.2.5.1 Numeric Predicates
Numeric predicates select nodes by their position in the current context. For example, /x/y[1] selects the first y child element of each x element. As this example demonstrates, predicates bind tightly to the current step. To apply a predicate to the entire results of a path, you must use parentheses. For example, (/x/y)[1] selects the first y element out of all the nodes selected by /x/y.
Because paths can start with other kinds of expressions, such as parenthesized expressions, predicates can be applied to more than just sequences of nodes. For example, the expression ("a", "b", "c")[2] selects the second item in the sequence, the string "b".
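The difference between a predicate on the last step and a predicate on the whole path can be tried out in Python. In this sketch `xml.etree.ElementTree` evaluates `x/y[1]` per parent, much as XQuery does, while slicing the complete result list plays the role of the parenthesized form; the sample document and its id attributes are invented.

```python
import xml.etree.ElementTree as ET

r = ET.fromstring(
    '<r>'
    '<x><y id="1"/><y id="2"/></x>'
    '<x><y id="3"/></x>'
    '</r>'
)

# x/y[1]: the first y child of each x element.
per_step = [y.get("id") for y in r.findall("x/y[1]")]

# (x/y)[1]: the first node out of everything selected by x/y.
whole_path = [y.get("id") for y in r.findall("x/y")[:1]]

# ("a", "b", "c")[2]: positions are 1-based, Python indexes are 0-based.
second = ("a", "b", "c")[2 - 1]

print(per_step)    # ['1', '3']
print(whole_path)  # ['1']
print(second)      # b
```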
Numeric predicates, like the ones in Listing 3.4, filter by position. In general, when a predicate evaluates to a number N, it's as if the predicate were actually the boolean-valued predicate position()=N. For example, the path /x[1] is equivalent to the path /x[position() = 1]. This expansion applies not only to numeric constants, but also to any numeric-typed expression. For example, the path /x[@y + 1] is equivalent to the path /x[position() = @y + 1].
Listing 3.4. Numeric predicates filter by position
    (//Customer)[2]
    Fruit[@index + 1]
The position is 1-based (the first item in the sequence is at position 1). When the predicate evaluates to a non-integral value, a value less than 1, or a value greater than the length of the sequence, then the predicate will be false for all items in the sequence and the result will be the empty sequence. In other words, it isn't an error to select an index that is out of bounds for the sequence.
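The rule that a numeric predicate N behaves like position() = N, including the out-of-bounds cases, can be modeled in a few lines of Python. The function `numeric_predicate` is a hypothetical helper for this sketch, not an XQuery API.

```python
def numeric_predicate(items, n):
    """Keep the items whose 1-based position equals n, like [position() = n]."""
    return [item for pos, item in enumerate(items, start=1) if pos == n]


letters = ["a", "b", "c"]
print(numeric_predicate(letters, 2))    # ['b']
print(numeric_predicate(letters, 0))    # []  (less than 1)
print(numeric_predicate(letters, 4))    # []  (past the end)
print(numeric_predicate(letters, 2.5))  # []  (non-integral)
```

No position ever equals an out-of-range or fractional value, so the result is simply empty and no error is raised.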
3.2.5.2 Boolean Predicates
All other kinds of predicate expressions, such as the ones in Listing 3.5, filter a sequence so that only those items for which the predicate evaluates to true are kept. The predicate is converted to a boolean value by computing the Effective Boolean Value of the expression.
Listing 3.5. All other predicates filter as boolean conditions
    /x[@a=1 and @b=1]
    /x[@a=1]/y[@b < 2]
As described in Section 2.6.2, the Effective Boolean Value acts as an existence test on sequences. Consequently, when the predicate is itself a path, the predicate evaluates to true if and only if the node(s) selected by that path exist. For example, x[y] matches all x elements that have a y child element, and x[not(@y)] matches all x elements that don't have a y attribute.
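Existence tests of this kind can be sketched in Python with `xml.etree.ElementTree`, which accepts `x[y]` and `x[@a='1']` but has no `not()` or `and`. The sketch therefore assumes two workarounds: a list comprehension stands in for not(@y), and two successive predicates stand in for `and`. ElementTree also compares attribute values as strings, not numbers. The sample document and its n attributes are invented.

```python
import xml.etree.ElementTree as ET

r = ET.fromstring(
    '<r>'
    '<x n="1" a="1" b="1"><y/></x>'
    '<x n="2" a="1"/>'
    '<x n="3" y="k" b="1"/>'
    '</r>'
)

# x[y]: x elements that have at least one y child element.
has_y_child = [x.get("n") for x in r.findall("x[y]")]

# x[not(@y)]: x elements without a y attribute.
no_y_attr = [x.get("n") for x in r.findall("x") if x.get("y") is None]

# x[@a=1 and @b=1], approximated with two successive string comparisons.
a_and_b = [x.get("n") for x in r.findall("x[@a='1'][@b='1']")]

print(has_y_child)  # ['1']
print(no_y_attr)    # ['1', '2']
print(a_and_b)      # ['1']
```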
3.2.5.3 Successive and Nested Predicates
Several predicates can be applied to a step, with the effect that each predicate is evaluated with respect to the nodes remaining after the previous predicate. The order of evaluation of the predicates is always left to right, which matters only when computing positional predicates. For example, the path x[1][@y=2] selects the first x element (if there is one), and then only if that element has a y attribute whose value is 2; while the path x[@y=2][1] selects all x elements that have a y attribute whose value is 2, and then from that set selects the first one. Over the XML <x y="3"/><x y="2"/> the first path selects nothing (because the first x element has y="3"), while the second path selects the second element.
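The order dependence described above can be modeled in Python by treating each predicate as a function applied to the list left by the previous one. The helpers `first` and `y_is_2` are hypothetical; the elements are wrapped in an r element only so that the fragment parses as a single document.

```python
import xml.etree.ElementTree as ET

r = ET.fromstring('<r><x y="3"/><x y="2"/></r>')
xs = r.findall("x")


def first(nodes):
    """Model of the predicate [1]: keep the node at position 1, if any."""
    return nodes[:1]


def y_is_2(nodes):
    """Model of the predicate [@y=2]."""
    return [n for n in nodes if n.get("y") == "2"]


# x[1][@y=2]: take the first x, then test its y attribute.
first_then_test = y_is_2(first(xs))

# x[@y=2][1]: keep the x elements with y=2, then take the first of those.
test_then_first = first(y_is_2(xs))

print(len(first_then_test))                  # 0
print([n.get("y") for n in test_then_first]) # ['2']
```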
Predicates can also be nested. For example, the path x[y[@z=1] = 2] selects all x elements where there exists a y element with a z attribute equal to 1 and the value of the y element itself equals 2.
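A nested predicate can be read as a nested filter. The Python sketch below spells out x[y[@z=1] = 2] as a comprehension over `xml.etree.ElementTree` elements, under the simplifying assumption that values are compared as strings; the sample document and its n attributes are invented.

```python
import xml.etree.ElementTree as ET

r = ET.fromstring(
    '<r>'
    '<x n="1"><y z="1">2</y></x>'
    '<x n="2"><y z="1">3</y><y z="0">2</y></x>'
    '<x n="3"/>'
    '</r>'
)


def matches(x):
    # Inner predicate: y children whose z attribute is 1.
    candidates = [y for y in x.findall("y") if y.get("z") == "1"]
    # Outer comparison: true if any such y has the value 2.
    return any((y.text or "").strip() == "2" for y in candidates)


selected = [x.get("n") for x in r.findall("x") if matches(x)]
print(selected)  # ['1']
```

The second x element is rejected because no single y child satisfies both conditions at once.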