Searching XML Documents
Information storage is only useful if the information can be easily retrieved. Information is usually retrieved from a relational database by sending commands to the server, for example, Structured Query Language (SQL) statements. Because an XML document can be used to store data, it can also be considered a rudimentary database. Just as SQL is the standard language for querying relational databases, the XML community has developed its own set of standards for retrieving information from an XML document. XPath is one of these standards: it defines the syntax for addressing and selecting parts of an XML document.
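To make "addressing parts of a document" concrete, here is a minimal sketch in Python. It uses the standard library's xml.etree.ElementTree module, which implements only a limited subset of XPath; full XPath 1.0 requires a dedicated processor. The catalog document and its element and attribute names are hypothetical, invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document used only for this illustration.
CATALOG = """
<catalog>
  <book id="bk101"><title>XML Basics</title><price>29.99</price></book>
  <book id="bk102"><title>Learning XPath</title><price>34.50</price></book>
  <book id="bk103"><title>XSLT in Practice</title><price>41.00</price></book>
</catalog>
"""

root = ET.fromstring(CATALOG)  # root is the <catalog> element

# Each path below is evaluated relative to <catalog>.
all_titles = [t.text for t in root.findall("book/title")]  # every <title> child of a <book>
second_book = root.find("book[2]").get("id")               # positional predicate (1-based)
by_attribute = root.findtext("book[@id='bk103']/title")    # attribute-value predicate
anywhere = len(root.findall(".//price"))                   # <price> elements at any depth

print(all_titles)    # ['XML Basics', 'Learning XPath', 'XSLT in Practice']
print(second_book)   # bk102
print(by_attribute)  # XSLT in Practice
print(anywhere)      # 3
```

Each string passed to findall, find, or findtext is a path expression that names the part of the document to fetch; the code around it only decides what to do with the result.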
NOTE
The XPath specification and many other XML-related standards are published on the web at http://www.w3.org.
Other languages for querying XML documents also exist, such as XML Query (XQuery) and XML Query Language (XQL). Each defines its own query syntax, although the syntaxes can be similar, and each relies on a query engine to evaluate the queries written in it.
Being able to extract data from an XML document is very convenient. By writing a query in XPath, XML Query, or XQL, you can easily retrieve information from a specific part of the document. Figure 2.4 shows, at a high level, how XPath's query engine works.
Figure 2.4 XML XPath Query Engine.
As Figure 2.4 shows, the application starts the query engine, initializes the query, and submits it to the engine for processing. The query engine then determines what is being requested and retrieves the matching information. After the data is located, it is returned to the application, which can then process it. Conceptually, the process is simple and straightforward.
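The flow in Figure 2.4 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the engine the figure depicts: the standard library's ElementTree module stands in for the query engine (again supporting only a subset of XPath), and the order document and the run_query helper are hypothetical. ElementTree has no separate "start the engine" call, so parsing the document plays that role here.

```python
import xml.etree.ElementTree as ET

# Hypothetical order document the application wants to search.
ORDERS = """
<orders>
  <order id="1001" status="shipped"><customer>Ada</customer></order>
  <order id="1002" status="pending"><customer>Grace</customer></order>
  <order id="1003" status="shipped"><customer>Linus</customer></order>
</orders>
"""


def run_query(xml_text, path):
    """Hypothetical helper: hand the document and query to the engine, return the matches."""
    root = ET.fromstring(xml_text)  # the stand-in engine loads the document
    return root.findall(path)       # it locates every element the path matches


# 1. The application builds the query (an XPath-style path expression).
query = "order[@status='shipped']"

# 2. The query is submitted to the engine, which returns the located elements.
matches = run_query(ORDERS, query)

# 3. The application processes the returned data.
shipped_ids = [order.get("id") for order in matches]
customers = [order.findtext("customer") for order in matches]
print(shipped_ids)  # ['1001', '1003']
print(customers)    # ['Ada', 'Linus']
```

Keeping the query as a plain string, separate from the code that evaluates it, mirrors the figure: the application decides what to ask for, and the engine decides how to find it.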
One area of XML processing that uses XPath extensively is XSL Transformations (XSLT). XPath expressions in an eXtensible Stylesheet Language (XSL) file locate the portions of the XML document that match the criteria the stylesheet defines, that is, which elements of the XML document will be extracted to create the output file.