A Brief Example
For convenience, let's say that I want to keep track of books by ISBN. ISBNs are convenient because they provide a unique numbering scheme for books. Let's take the previous example of the book references marked up by ISBN:
<document id="1">Homer's <book isbn="0987-2343">Odyssey</book> is a revered relic of the ancient world.</document>
I've added <document id="1"> and </document> tags around the body of my document so each document can uniquely identify itself. Each XML document I write has an ID number, which I've designated should be in a tag named "document" that wraps around the entire document. Again, remember that I'm just making these tags up. They're not a documented standard; they're just being used for the purpose of these examples.
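To make the idea of machine-readable markup concrete, here is a minimal sketch of how a program might pull the document ID and the ISBN references out of the sample above. It uses Python's standard xml.etree.ElementTree module; the function name extract_isbn_refs and the SAMPLE constant are my own illustrative choices, not part of any standard, and I am assuming the id attribute always holds an integer.

```python
import xml.etree.ElementTree as ET

# The sample document from the text, using the made-up <document> and <book> tags.
SAMPLE = (
    '<document id="1">Homer\'s <book isbn="0987-2343">Odyssey</book> '
    'is a revered relic of the ancient world.</document>'
)

def extract_isbn_refs(xml_text):
    """Return a list of (doc_id, isbn) pairs, one per <book> element."""
    root = ET.fromstring(xml_text)
    doc_id = int(root.get("id"))  # assumption: document IDs are integers
    return [(doc_id, book.get("isbn")) for book in root.iter("book")]

print(extract_isbn_refs(SAMPLE))  # prints [(1, '0987-2343')]
```

Each pair the function returns corresponds to one row that could be stored in a relational table.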
For easy reference, I want to keep track of which ISBN numbers are referred to from which documents; thus I design an SQL table, doc_isbn, to look something like this:
doc_id | ISBN
-------+-----------
1      | 0987-2343
2      | 0872-8237
doc_id has referential integrity to a list of valid document ID numbers, and the isbn field has referential integrity to a list of valid ISBN numbers. "Great," I hear you saying, "this is a lot of complexity for a bunch of stupid book names. Explain to me why this is better than using HTML again."
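One way such a table and its referential integrity might be declared is sketched below, using Python's built-in sqlite3 module with an in-memory database. The doc_isbn name is the one used by the query in this section; the documents and books tables, which stand in for the lists of valid document IDs and valid ISBNs, are names I have assumed for illustration. Other database engines declare and enforce foreign keys somewhat differently.

```python
import sqlite3

def create_schema(conn):
    """Create the hypothetical lookup tables plus the doc_isbn link table."""
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE documents (doc_id INTEGER PRIMARY KEY);  -- valid document IDs
        CREATE TABLE books (isbn TEXT PRIMARY KEY);           -- valid ISBNs
        CREATE TABLE doc_isbn (
            doc_id INTEGER NOT NULL REFERENCES documents(doc_id),
            isbn   TEXT    NOT NULL REFERENCES books(isbn),
            PRIMARY KEY (doc_id, isbn)
        );
    """)

conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.executemany("INSERT INTO documents VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO books VALUES (?)", [("0987-2343",), ("0872-8237",)])
conn.executemany("INSERT INTO doc_isbn VALUES (?, ?)",
                 [(1, "0987-2343"), (2, "0872-8237")])

# A row that points at an unknown document should be refused by the database.
try:
    conn.execute("INSERT INTO doc_isbn VALUES (?, ?)", (3, "0987-2343"))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM doc_isbn").fetchone()[0]
```

With the constraints in place, the database itself refuses a reference to a document or book it does not know about, so the table cannot silently drift out of step with the documents.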
Suppose I have a thousand documents (book reviews, articles, bulletin board messages, and so on), and I want to determine which of them refer to a specific book. In the HTML universe, I can perform a textual search for occurrences of the book name. But what if I have documents that refer to Homer's Odyssey and Arthur C. Clarke's 2001: A Space Odyssey? If I search for the word "odyssey," my search results list both books. However, if I've marked up all my references to books by ISBN and I've decomposed or extracted this information into a table in a database, I can use a simple SQL query to get the information I need quickly and reliably:
select doc_id from doc_isbn where isbn = '0987-2343'
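As a rough sketch of how an application might run this query, the following uses Python's built-in sqlite3 module against an in-memory copy of the table. The helper name docs_referring_to is my own, and the third row (document 3) is a hypothetical extra document added only so that the query returns more than one result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doc_isbn (doc_id INTEGER, isbn TEXT)")
conn.executemany(
    "INSERT INTO doc_isbn VALUES (?, ?)",
    [
        (1, "0987-2343"),
        (2, "0872-8237"),
        (3, "0987-2343"),  # hypothetical extra row, for illustration only
    ],
)

def docs_referring_to(conn, isbn):
    """Return the IDs of all documents that reference the given ISBN."""
    rows = conn.execute(
        "SELECT doc_id FROM doc_isbn WHERE isbn = ? ORDER BY doc_id", (isbn,)
    )
    return [doc_id for (doc_id,) in rows]

print(docs_referring_to(conn, "0987-2343"))  # prints [1, 3]
```

Passing the ISBN as a bound parameter rather than pasting it into the SQL string keeps the query safe if the value ever comes from user input.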
The search results are a set of document ID numbers. I can choose to display the title of each document as a hyperlink, clickable to the actual document, or I can concatenate the documents and display them to the user one after another on a page, depending on the requirements of my application. By combining the power of XML machine-readable metadata with the simplicity and power of my relational database, I've created a powerful document retrieval tool that can answer the question, "What do I have?" Creating such a tool simply required a little forethought and design skill.
If I'm going too fast for you, don't worry. I discuss these topics in detail in the following chapters.