- XML Reference Guide
Google Base is the search engine giant's attempt at a semantic search engine. As with the Semantic Web, data is marked up in such a way that machines can understand its context. In other words, Google Base knows that the string "40,000" is a starting salary for a job, rather than the altitude of a plane.
How does it know? Because it employs some space-age algorithm to analyze the data? No. It doesn't even use OWL or any of the other Semantic Web technologies we've talked about. No, Google Base uses a much simpler strategy:
It asks.
If you point your browser to http://base.google.com, you'll have the opportunity to search for specific Jobs, Blogs, Mobile content, Recipes, Products, and so on. Google Base knows that these objects are what they are because the users who entered them into the system in the first place said so.
For example, if you click the Jobs link, you'll find not just a selection of jobs, but also the opportunity to narrow your search for these jobs based on criteria such as location, education, job type, and industry. Google Base can accomplish this because users have the opportunity to add this data in an easily understandable way.
You can see this in action if you enter one of your own items. Click "Post your own item". You'll need to sign in using a Google account, but other than that, the service is free.
(Before I go any further, you may be wondering just what this has to do with XML. Well, in addition to its relevance with regard to semantic technology, Google also provides an opportunity to add information using XML data feeds, which we'll look at in a little while.)
Once you sign in, in addition to any items you've already added, you'll see the opportunity to add a new item. As you can see in this screenshot, you have a choice between existing item types and creating your own. Google Base does not limit you to the existing types, but because of the way searches are conducted, it's to your advantage to use an existing type whenever possible. In this way, it's much like TV's Family Feud; yes, it may be clever to call your "product" an "objet d'art", but it's unlikely anyone will actually search for a category called "objet d'art", so nobody will find you.
Once you've chosen an item type, Google Base presents you with some "standard" attributes, along with the opportunity to add attributes of your own. For example, if we chose to add a new Recipe, we might see a screen much like this:
In addition to the title of the recipe, enter the main ingredient, cuisine, and cooking time. These fields also suggest common values so a search is more likely to find your content.
Whether you choose an existing item type or create your own, you also have the option of creating your own attributes. To do that, you'll need to give each attribute a name and choose its type. These types include simple types, such as "Text" and "Number", as well as complex types such as "Number unit" (such as "10 minutes") or "Location" (such as "1600 Pennsylvania Ave, Washington, DC").
In this case, we can add several "Ingredient" attributes.
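To make the idea of a complex type such as "Number unit" concrete, here is a minimal Python sketch that splits a value like "10 minutes" into its number and its unit. The function name and the parsing rule are assumptions for illustration only; the article does not say how Google Base itself interprets these values.

```python
import re


def parse_number_unit(text):
    """Split a "Number unit" style value into (number, unit).

    Hypothetical helper: the accepted syntax here (a number, whitespace,
    then a unit) is an assumption, not Google Base's documented rule.
    """
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s+(\S.*?)\s*", text)
    if match is None:
        raise ValueError("expected '<number> <unit>', got %r" % text)
    number, unit = match.groups()
    return float(number), unit


cooking_time = parse_number_unit("10 minutes")
```

A plain "Number" value such as "10" has no unit, so this sketch rejects it with a ValueError rather than guessing one.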
From there, you can add a description (the main body of the content) and, if you like, a photo.
Photos can be uploaded or referenced by URL. Each has its advantage. Uploading the photo to Google means you won't have to worry about bandwidth, but referring to an image on your site is an easy way to track how many times your item has been seen.
You also have the option to enter "labels", which are essentially tags, or keywords by which items can be grouped.
At the bottom of the page, you can determine whether and how users can contact you -- Google provides an anonymous address for you, so you don't have to add your main address to the page. You can also specify location and delivery options, if that's applicable.
Once you have entered your information, click the "Preview" button to see how it will look.
Notice that the identically named attributes have been combined into a single, comma-delimited field. Also notice that the labels are now linked to pages that list other items that use the same label. Once you're happy with your item, go ahead and submit it.
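The combining behavior in the preview can be pictured with a short Python sketch. This is only an illustration of the effect described above, not Google's code; the function name and the sample data are hypothetical.

```python
def combine_attributes(pairs):
    """Merge identically named attributes into one comma-delimited value.

    `pairs` is a list of (name, value) tuples in the order they were entered.
    """
    grouped = {}
    for name, value in pairs:
        grouped.setdefault(name, []).append(value)
    return {name: ", ".join(values) for name, values in grouped.items()}


entered = [
    ("Ingredient", "frozen perogies"),
    ("Ingredient", "sour cream"),
    ("Cuisine", "polish"),
]
preview = combine_attributes(entered)
```

Here the two "Ingredient" entries end up in a single field, while "Cuisine" keeps its single value.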
Click "My Items" to see the status of each of your items.
Okay, so we've created a recipe and submitted it. Where does it go?
Well, first, as you would expect, it goes to Google Base. If you go to the Google Base home page and click "recipes," you should see the new entry. (Obviously, if you have created a different type of object, click that instead.)
Now that's nice, but Google Base has a negligible amount of traffic compared to Google proper. But try this: go to Google and enter a query of "recipes". Notice that Google gives you an option to refine your search based on cuisine or main ingredient. (Here's where it's to your advantage to use common terms for your attributes. Our perogie recipe is listed as "European", but that's not a choice, so our recipe can't be found that way.) These searches look at Google Base content right from Google!
In fact, all Google Base content is also shown on "Google properties" based on relevance.
Let's think about this for a moment. You enter information into Google Base and it's immediately entered into Google. The Holy Grail! (Well, okay, maybe not the Holy Grail, but certainly a reasonably-good-with-maybe-a-few-minor-sins-when-it-was-a-teenager grail.)
But there's still one problem. If all you want to do is make your information available -- for job or want ad listings, for example -- Google's ability to host your content is fine. But what if you want to enable this kind of search for your own content?
Fortunately, you have a second option for entering information, and it enables you to redirect users to your page, rather than showing the Google Base page. For example, our perogie recipe actually resides on Sarah's "Quick and easy cooking" page, and we'd like to send users there instead of the generic page. We can do that by creating a "bulk upload".
You can build a bulk upload as a traditional tab-delimited text file, or as a feed. Google supports bulk uploads in RSS 2.0, RSS 1.0, and Atom 0.3.
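For the tab-delimited option, a file could be generated along the following lines. This is a sketch under stated assumptions: the article does not show the tab-delimited layout, so the header row and the choice of columns (borrowed from attribute names used in the feeds in this section) are hypothetical, as are the function and variable names.

```python
import csv
import io

# Assumed column names; check Google's bulk upload documentation for the real layout.
COLUMNS = ["title", "description", "link", "location"]


def to_tab_delimited(rows, columns):
    """Serialize a list of dicts as tab-delimited text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        # Missing attributes become empty cells.
        writer.writerow([row.get(column, "") for column in columns])
    return buffer.getvalue()


jobs = [
    {"title": "Eclipse expert needed",
     "link": "http://www.backstopmedia.com/jobs.html",
     "location": "telecommute"},
    {"title": "Geronimo expert needed",
     "link": "http://www.backstopmedia.com/jobs.html",
     "location": "telecommute"},
]
upload_text = to_tab_delimited(jobs, COLUMNS)
```

Using the csv module rather than joining strings by hand means a value that happens to contain a tab or a quote is quoted instead of silently shifting columns.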
For example, we can create a number of job listings as an RSS 2.0 feed:
<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
  <channel>
    <title>Jobs at Backstop Media</title>
    <description>Backstop Media provides technical content to major online outlets...</description>
    <link>http://www.backstopmedia.com</link>
    <item>
      <title>Eclipse expert needed</title>
      <description>Do you know your way around the Eclipse IDE? Would you ...</description>
      <link>http://www.backstopmedia.com/jobs.html</link>
      <g:image_link></g:image_link>
      <guid isPermaLink="true">http://www.backstopmedia.com/jobs.html#eclipse</guid>
      <g:expiration_date>2006-04-30</g:expiration_date>
      <g:label>Eclipse</g:label>
      <g:label>open source</g:label>
      <g:label>technical writing</g:label>
      <g:job_industry>technology</g:job_industry>
      <g:employer>Backstop Media</g:employer>
      <g:job_function>Technical Writer</g:job_function>
      <g:job_type>Technical Writer</g:job_type>
      <!--
      <g:currency></g:currency>
      <g:salary></g:salary>
      <g:salary_type></g:salary_type>
      <g:education></g:education>
      <g:immigration_status></g:immigration_status>
      -->
      <g:location>telecommute</g:location>
    </item>
    <item>
      <title>Geronimo expert needed</title>
      <description>Do you know your way around the Geronimo application server? Would you ...</description>
      <link>http://www.backstopmedia.com/jobs.html</link>
      <g:image_link></g:image_link>
      <guid isPermaLink="true">http://www.backstopmedia.com/jobs.html#geronimo</guid>
      <g:expiration_date>2006-04-30</g:expiration_date>
      <g:label>Geronimo</g:label>
      <g:label>open source</g:label>
      <g:label>technical writing</g:label>
      <g:job_industry>technology</g:job_industry>
      <g:employer>Backstop Media</g:employer>
      <g:job_function>Technical Writer</g:job_function>
      <g:job_type>Technical Writer</g:job_type>
      <!--
      <g:currency></g:currency>
      <g:salary></g:salary>
      <g:salary_type></g:salary_type>
      <g:education></g:education>
      <g:immigration_status></g:immigration_status>
      -->
      <g:location>telecommute</g:location>
    </item>
  </channel>
</rss>
Notice that this is a regular RSS listing, extended with additional attributes expressed as elements in a new namespace, http://base.google.com/ns/1.0. Because they're in a different namespace, they won't interfere with the normal RSS data.
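The separation that namespaces provide can be seen by reading such a feed with Python's standard xml.etree.ElementTree module. The sketch below uses a trimmed-down copy of the job feed; the function name split_item is hypothetical, and the code only illustrates how a consumer could tell the two kinds of elements apart.

```python
import xml.etree.ElementTree as ET

G_NS = "http://base.google.com/ns/1.0"

# Trimmed-down version of the job feed, for illustration only.
FEED = """<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
  <channel>
    <title>Jobs at Backstop Media</title>
    <link>http://www.backstopmedia.com</link>
    <description>Sample feed</description>
    <item>
      <title>Eclipse expert needed</title>
      <link>http://www.backstopmedia.com/jobs.html</link>
      <g:label>Eclipse</g:label>
      <g:label>open source</g:label>
      <g:location>telecommute</g:location>
    </item>
  </channel>
</rss>"""


def split_item(item):
    """Separate plain RSS children from Google Base children by namespace."""
    prefix = "{" + G_NS + "}"
    rss_fields, base_fields = {}, {}
    for child in item:
        if child.tag.startswith(prefix):
            # Repeated elements such as g:label are collected into a list.
            base_fields.setdefault(child.tag[len(prefix):], []).append(child.text)
        else:
            rss_fields[child.tag] = child.text
    return rss_fields, base_fields


root = ET.fromstring(FEED)
rss_fields, base_fields = split_item(root.find("channel/item"))
```

An ordinary RSS reader that looks only at title, link, and description never sees the g: elements, which is why the extra data doesn't interfere with it.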
Google provides a list of "standard" types and their attributes at http://base.google.com/base/attribute_list.html. For example, "wanted ads" items define only expiration_date, location, expiration_date_time, and image_link.
But you can add any attribute you like, as long as you use a second namespace, http://base.google.com/cns/1.0. For example, we can create a feed of our recipes:
<?xml version="1.0"?>
<rss version="2.0"
     xmlns:g="http://base.google.com/ns/1.0"
     xmlns:c="http://base.google.com/cns/1.0">
  <channel>
    <title>Quick and Easy Cooking</title>
    <link>http://www.squidoo.com/quickandeasycooking/</link>
    <description>How many times have you come home exhausted only to be met at the door with, "I'm starving. What's for dinner?" How many ...</description>
    <item>
      <title>Cheat and Eat Donuts</title>
      <link>http://www.squidoo.com/quickandeasycooking/#module1322541</link>
      <description>Want to have fun with the kids and make something yummy for dessert? Try this easy way to make donuts. Get a container of ...</description>
      <!-- <g:image_link></g:image_link> -->
      <c:main_ingredient type="string">biscuit dough</c:main_ingredient>
      <c:cuisine type="string">american</c:cuisine>
      <c:cooking_time type="intUnit">30 min</c:cooking_time>
      <g:label>donut</g:label>
      <g:label>doughnut</g:label>
      <g:label>snack</g:label>
    </item>
    <item>
      <title>Smothered Perogies</title>
      <link>http://www.squidoo.com/quickandeasycooking#module1343065</link>
      <description>Start with enough frozen perogies to ...</description>
      <!-- <g:image_link></g:image_link> -->
      <c:main_ingredient type="string">perogies</c:main_ingredient>
      <c:cuisine type="string">polish</c:cuisine>
      <c:cooking_time type="intUnit">10 min</c:cooking_time>
      <c:ingredients type="string">frozen perogies</c:ingredients>
      <c:ingredients type="string">ready to serve bacon</c:ingredients>
      <c:ingredients type="string">shredded cheddar cheese</c:ingredients>
      <c:ingredients type="string">sour cream</c:ingredients>
      <g:label>cheese</g:label>
      <g:label>easy</g:label>
      <g:label>fast</g:label>
      <g:label>perogies</g:label>
      <g:label>quick</g:label>
      <g:label>recipe</g:label>
      <g:label>sour cream</g:label>
    </item>
  </channel>
</rss>
Notice that recipes don't have any standard attributes specific to them; even those that appear on the individual posting page are custom attributes. You can find more information on formatting custom attributes at http://base.google.com/base/provider_module.html.
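If your recipes live in a database or a content management system, you would probably generate these items rather than type them by hand. The following Python sketch builds one item with the standard xml.etree.ElementTree module, mirroring the structure of the recipe feed: custom attributes go in the cns namespace with a type attribute, and labels go in the ns namespace. The function name build_item and its parameters are hypothetical.

```python
import xml.etree.ElementTree as ET

G_NS = "http://base.google.com/ns/1.0"   # standard Google Base attributes
C_NS = "http://base.google.com/cns/1.0"  # custom attributes

# Ask ElementTree to use the same prefixes as the feeds in this section.
ET.register_namespace("g", G_NS)
ET.register_namespace("c", C_NS)


def build_item(title, link, description, custom, labels):
    """Build an RSS <item>; `custom` is a list of (name, type, value) tuples."""
    item = ET.Element("item")
    for tag, text in (("title", title), ("link", link), ("description", description)):
        ET.SubElement(item, tag).text = text
    for name, type_name, value in custom:
        element = ET.SubElement(item, "{%s}%s" % (C_NS, name), {"type": type_name})
        element.text = value
    for label in labels:
        ET.SubElement(item, "{%s}label" % G_NS).text = label
    return item


item = build_item(
    "Smothered Perogies",
    "http://www.squidoo.com/quickandeasycooking#module1343065",
    "Start with enough frozen perogies to ...",
    custom=[
        ("cuisine", "string", "polish"),
        ("cooking_time", "intUnit", "10 min"),
        ("ingredients", "string", "frozen perogies"),
        ("ingredients", "string", "sour cream"),
    ],
    labels=["perogies", "quick"],
)
item_xml = ET.tostring(item, encoding="unicode")
```

When serialized on its own, the item carries the namespace declarations itself; in a complete feed you would append each item to a channel element inside rss, and the declarations would appear on the root instead.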
Google knows what type of items the feed includes because you specify it when you create the bulk upload. From the "My Items" page, click "Bulk Upload files." From there, click "specify a bulk upload file" (or "specify another bulk upload file"). Enter the name of the file and choose or specify an item type.
Once you've specified the file, you can upload the actual data. To do that through the browser, choose the appropriate file from the drop-down box and click "Browse..." to find the file. Click "upload and process this file" to complete the process. You'll see a message flash across the top of the page telling you to check back in an hour, but in reality it may be much longer. The file and status appear on the page. You can also create an FTP account for uploading the file.
Now when your items come up, they'll point not at your Google Base page, but at your own page instead:
So all of this enables Google to start to approach the Semantic Web, and it enables you to include your data in it. But this is very much a walled garden, in which we are dealing with data specifically for Google. What we really need is a standard set of attributes for specific item types.