XML Reference Guide
The Semantic Web is a great idea. It defines a way in which content can be marked up so that it may be understood by various applications that can make sense of data that is currently opaque to analysis. It allows meaningful relationships to be drawn from the data by machines as well as people. But it has one tiny minor little problem.
The web already exists.
Okay, so that's not a tiny minor little problem. It's a huge, honking problem. Why? For two reasons. The first is that millions, or perhaps billions, of pages already exist, and they don't have Semantic Web information in them. The second is that thousands, or perhaps millions, of web programmers and content authors already exist, and they don't have the Semantic Web mindset.
That's not to say that they don't have an appreciation for what could be achieved with the Semantic Web. Most people would love to see a world that includes Tim Berners-Lee's vision of the Web as a place where you can perform actions such as having an intelligent agent coordinate schedules and make a doctor's appointment for you, but they don't want to have to learn complicated RDF to make it happen.
Enter microformats. Microformats are, on the surface, standard ways of representing commonly published information. For example, vCard, a standard way of exchanging "business card"-type information, has been around for more than a decade, and the iCalendar standard has been around almost that long. Both provide a standard, interoperable way of representing information that is commonly published on the Web. If an easy, and above all standard, way can be found for developers and content authors to represent this information in web pages, that information can be made available for semantic searches.
Note the lowercase "s" in "semantic". Tantek Çelik has been giving talks on microformats and calling them the "semantic web", emphasizing the lowercase. This is not the Semantic Web of RDF and OWL, where the emphasis is on making data machine-readable. Instead, the emphasis is on making data human-readable first and machine-readable second.
In essence, the microformats movement has two aspects. The first is to define the information represented by a particular kind of entity. For example, a vCard might contain the following information:
BEGIN:VCARD
VERSION:3.0
N:Chase;Nicholas
FN:Nicholas Chase
URL:http://nicholaschase.com/
ORG:InformIT
END:VCARD
The hCard microformat takes this information and provides a standard way to represent it in XHTML:
<div class="vcard">
  <a class="url fn" href="http://nicholaschase.com/">Nicholas Chase</a>
  <div class="org">InformIT</div>
</div>
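Because the mapping from vCard properties to hCard class names is so direct, it can be automated. The following is a minimal Python sketch, not part of any microformats tool, that converts the simple vCard above into equivalent hCard markup. The function names parse_vcard and vcard_to_hcard are hypothetical, and the parser assumes one property per line with no line folding and no parameters (such as TYPE=), which is all this example needs.

```python
from html import escape


def parse_vcard(text):
    """Parse a simple vCard into a dict of PROPERTY -> value.

    Assumption: one property per line, no folded lines, no parameters.
    """
    props = {}
    for line in text.strip().splitlines():
        name, _, value = line.partition(":")  # split on the first colon only
        props[name.upper()] = value
    return props


def vcard_to_hcard(text):
    """Render the FN, URL, and ORG properties as hCard XHTML."""
    props = parse_vcard(text)
    name = escape(props["FN"])
    lines = ['<div class="vcard">']
    if "URL" in props:
        # hCard lets one element carry several properties: here url and fn.
        lines.append(f'  <a class="url fn" href="{escape(props["URL"])}">{name}</a>')
    else:
        lines.append(f'  <span class="fn">{name}</span>')
    if "ORG" in props:
        lines.append(f'  <div class="org">{escape(props["ORG"])}</div>')
    lines.append("</div>")
    return "\n".join(lines)


VCARD = """BEGIN:VCARD
VERSION:3.0
N:Chase;Nicholas
FN:Nicholas Chase
URL:http://nicholaschase.com/
ORG:InformIT
END:VCARD"""

print(vcard_to_hcard(VCARD))
```

The output matches the hCard example line for line; the only design decision is to fall back to a plain span for fn when the vCard has no URL.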
Note the use of classes to specify information. This is a feature of microformats; they try very hard not to change what people are already doing, but rather to add to it in a useful way. For example, because this is XHTML with class information, you can simply create a style sheet such as:
<style type="text/css">
  .vcard {border: 1px dotted black}
  .fn {font-weight: bold; text-decoration: none}
  .org {display: inline}
</style>
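With this style sheet applied, the hCard is displayed inside a dotted border, with the name in bold and without the usual link underline, and with the organization on the same line as the name.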
Adding information this way is not onerous for content authors, but it provides enough information that an aggregator could easily find specific hCard data.
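To illustrate how an aggregator might pick this data out of a page, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It assumes the page is well-formed XHTML with no named entities such as &nbsp; (much real-world HTML is not, and would need a more forgiving parser), and it looks only for the fn, url, and org properties used in the example above. The names has_class and find_hcards are hypothetical helpers, not part of any microformats library.

```python
import xml.etree.ElementTree as ET


def has_class(elem, name):
    """True if name is one of the space-separated values in elem's class."""
    return name in elem.get("class", "").split()


def find_hcards(xhtml):
    """Return one dict of fn/url/org for each element with class "vcard"."""
    root = ET.fromstring(xhtml)
    cards = []
    for card in root.iter():
        if not has_class(card, "vcard"):
            continue
        info = {}
        # Search the card's descendants for the properties by class name only;
        # the element type (a, div, span, ...) does not matter.
        for elem in card.iter():
            text = "".join(elem.itertext()).strip()
            if has_class(elem, "fn"):
                info["fn"] = text
            if has_class(elem, "org"):
                info["org"] = text
            if has_class(elem, "url"):
                info["url"] = elem.get("href")
        cards.append(info)
    return cards


PAGE = """<html><body>
<div class="vcard">
  <a class="url fn" href="http://nicholaschase.com/">Nicholas Chase</a>
  <div class="org">InformIT</div>
</div>
</body></html>"""

print(find_hcards(PAGE))
```

Matching on class values rather than element names is the point: because microformats reuse ordinary markup, the class attribute is the only reliable signal.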
And that's the other focus of microformats: defining how this information should be represented. The Microformats wiki defines them as:
simple conventions for embedding semantic markup for a specific problem domain in human-readable (X)HTML/XML documents, Atom/RSS feeds, and "plain" XML that normalize existing content usage patterns using brief, descriptive class names often based on existing interoperable standards to enable decentralized development of resources, tools, and services
And it does seem to be catching on. The wiki lists nine specifications, and at least as many drafts. They fall into two categories: elemental microformats and compound microformats.
Elemental microformats are meant to be composed into compound microformats. For example, the list of elemental microformats includes:
- XOXO:
- This microformat represents lists. In fact, to the naked eye it is indistinguishable from XHTML lists and definitions, encompassing the ol, ul, li, dl, dt, and dd elements. All are used in their normal manner, but since most people don't use the dl, dt, and dd elements properly, it is a formal declaration that they are to be used as definition lists, definition terms, and definition descriptions, respectively. (For example, this list is marked up as a definition list.) Also, by adding the xoxo class to lists intended to be part of a microformat, you specify their purpose, and not just their presentation.
- RelTag:
- This microformat has a leg up in the blogging world because of its adoption by Technorati, where Tantek Çelik is Chief Technologist. It specifies that a link designates a "keyword" or, more commonly, a "tag" for the current page. For example, I might tag this document as:
<a href="http://www.technorati.com/tag/microformats" rel="tag">microformats</a>
to indicate that this is a document about microformats. The rel attribute is what marks the link as a tag, and the href attribute points into the "tag space", with the tag itself being the last segment of the URL. So while most people use URLs that point back to, say, Technorati, which maintains a listing of pages for each tag, all that matters is that the URL points to something relevant. For example, as the specification suggests, we could point to
<a href="http://www.wikipedia.com/microformats" rel="tag">microformats</a>
and still have a relevant tag.
- XFN:
- This is an interesting little specification that enables you to note your relationship to other people in social networks. I could create a blogroll and specify my relationship with each author along several axes, such as friendship, whether I've met them, geography, profession, family, and so on. For example:
<a href="http://www.chaosmagnet.com" rel="me">Chaos Magnet</a><br />
<a href="http://www.squidoo.com/genealogyhowtoinfo" rel="spouse">Sarah Chase</a><br />
<a href="http://www.backstopmedia.com" rel="co-worker friend met">Troy Mott</a><br />
<a href="http://www.workbench.com" rel="acquaintance colleague">Rogers Cadenhead</a><br />
- RelLicense:
- This format enables you to link to the license for a particular page. For example, my blog is subject to a Creative Commons license, so I could add the following link:
This weblog is licensed under a <a href="http://creativecommons.org/licenses/by-nd/1.0/" rel="license">Creative Commons License</a>
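RelTag, XFN, and RelLicense all work the same way: the meaning lives in the space-separated values of a link's rel attribute. The following minimal Python sketch, built on the standard library's html.parser, collects every link that has a rel attribute and sorts the links into tags, licenses, and XFN relationships. The class and function names are hypothetical; the value list comes from XFN 1.1, whose value for a link to your own site is me; and stripping a trailing slash before taking the last URL segment is an assumption of this sketch.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, unquote

# XFN 1.1 relationship values ("me" marks a link to one of your own pages).
XFN_VALUES = {
    "contact", "acquaintance", "friend", "met", "co-worker", "colleague",
    "co-resident", "neighbor", "child", "parent", "sibling", "spouse",
    "kin", "muse", "crush", "date", "sweetheart", "me",
}


class RelLinkParser(HTMLParser):
    """Collect (href, rel values, link text) for each <a> with a rel attribute."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("rel"):
            self._current = [(attrs.get("href") or "").strip(), attrs["rel"].split(), ""]

    def handle_data(self, data):
        if self._current is not None:
            self._current[2] += data  # accumulate the link text

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.links.append(tuple(self._current))
            self._current = None


def tag_from_href(href):
    """RelTag: the tag is the last segment of the URL path."""
    path = urlparse(href).path.rstrip("/")  # trailing-slash handling is assumed
    return unquote(path.rsplit("/", 1)[-1])


def classify(html):
    parser = RelLinkParser()
    parser.feed(html)
    parser.close()
    result = {"tags": [], "licenses": [], "xfn": []}
    for href, rels, text in parser.links:
        if "tag" in rels:
            result["tags"].append(tag_from_href(href))
        if "license" in rels:
            result["licenses"].append(href)
        xfn = [r for r in rels if r in XFN_VALUES]
        if xfn:
            result["xfn"].append((text.strip(), xfn))
    return result


SAMPLE = """
<a href="http://www.technorati.com/tag/microformats" rel="tag">microformats</a>
<a href="http://www.chaosmagnet.com" rel="me">Chaos Magnet</a>
<a href="http://www.backstopmedia.com" rel="co-worker friend met">Troy Mott</a>
<a href="http://creativecommons.org/licenses/by-nd/1.0/" rel="license">Creative Commons License</a>
"""

print(classify(SAMPLE))
```

Since a single rel attribute can carry several values at once (co-worker friend met), the sketch checks each value independently rather than comparing the whole attribute string.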
The Microformats group has also specified several compound formats:
- hCard:
- This is the format we described at the start of this entry.
- hCalendar:
- Used to mark up events, appointments, and other calendar-related information, this format encodes the information normally found in the iCalendar specification as XHTML. For example, if I wanted to add an upcoming June conference to my blog, I could mark it up as:
<div class="vcalendar">
  <div class="vevent">
    <a class="url" href="http://www-306.ibm.com/software/rational/events/rsdc2006/">
      <span class="summary">IBM Rational Software Development Conference 2006</span>:
      <abbr class="dtstart" title="2006-06-04">June 4</abbr>-
      <abbr class="dtend" title="2006-06-09">8</abbr>, at the
      <span class="location">Walt Disney World Swan and Dolphin Resort Orlando, Florida</span>
    </a>
  </div>
  <div class="vevent">
    ...
  </div>
</div>
Note that the vcalendar div is optional; you'll want to include it, however, if you're showing more than one event. Also note that only the class names matter. You can just as easily include the event in a span, and list several of them on one line. Note also that the dtend value refers to the day after the event ends, because the end date is exclusive; hence the apparent mismatch between the data and the human-readable version.
- VoteLinks:
- This specification provides a way to indicate whether a link represents your approval of its target, so you can link to something without endorsing it. Consider this example taken from the specification:
<a rev="vote-for" href="http://ragingcow.blogspot.com" title="neat spoof">Raging Cow</a>
<a rev="vote-for" href="http://ragingcow.blogspot.com" title="the boycott site">I support the Raging Cow</a>
<a rev="vote-against" href="http://ragingcow.com" title="nasty corn syrup drink">Raging Cow</a>
The page links to both favored and unfavored sites, using the rev attribute to distinguish between them. Yes, rev is a real attribute of the a element, intended to denote a "reverse link", but it is rarely used.
- hReview:
- Reviews are perhaps one of the most commonly cited purposes for microformats, probably because so much blogging consists of telling people what you think about something. This specification is intentionally minimalist, capturing only properties that are common to all reviews. Consider this example from the specification:
<div class="hreview">
  <span><span class="rating">5</span> out of 5 stars</span>
  <h4 class="summary">Crepes on Cole is awesome</h4>
  <span class="reviewer vcard">Reviewer: <span class="fn">Tantek</span> -
    <abbr class="dtreviewed" title="20050418T2300-0700">April 18, 2005</abbr></span>
  <div class="description item vcard"><p>
    <span class="fn org">Crepes on Cole</span> is one of the best little creperies in
    <span class="adr"><span class="locality">San Francisco</span></span>.
    Excellent food and service. Plenty of tables in a variety of sizes for parties
    large and small. Window seating makes for excellent people watching to/from the
    N-Judah which stops right outside. I've had many fun social gatherings here, as
    well as gotten plenty of work done thanks to neighborhood WiFi.
  </p></div>
  <p>Visit date: <span>April 2005</span></p>
  <p>Food eaten: <span>Florentine crepe</span></p>
</div>
Notice that the hReview specification makes use of other microformats, such as hCard. This is, after all, a compound specification. It also defines additional properties for specific types of reviews, such as product reviews or "multidimensional" reviews. See the specification for more information.
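Returning to hCalendar for a moment: because dtend is exclusive, any program that displays hCalendar data has to undo that one-day offset. Here is a minimal Python sketch, using xml.etree.ElementTree on a simplified version of the conference example, that reads each vevent and reports its first and last days. It assumes well-formed XHTML and plain YYYY-MM-DD values in the abbr title attributes; full date-times (like the dtreviewed value in the hReview example) would need additional parsing. The helper names has_class, find_first, and parse_events are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import date, timedelta


def has_class(elem, name):
    return name in elem.get("class", "").split()


def find_first(elem, name):
    """First element (including elem itself) whose class list contains name."""
    for e in elem.iter():
        if has_class(e, name):
            return e
    return None


def parse_events(xhtml):
    """Return (summary, first day, last day) for each vevent.

    Dates come from the abbr title attributes (assumed YYYY-MM-DD). Because
    dtend is exclusive, the last day of the event is dtend minus one day.
    """
    root = ET.fromstring(xhtml)
    events = []
    for ev in root.iter():
        if not has_class(ev, "vevent"):
            continue
        start = find_first(ev, "dtstart")
        if start is None:
            continue  # dtstart is required; skip malformed events
        summary = find_first(ev, "summary")
        end = find_first(ev, "dtend")
        title = "".join(summary.itertext()).strip() if summary is not None else ""
        first = date.fromisoformat(start.get("title"))
        if end is not None:
            last = date.fromisoformat(end.get("title")) - timedelta(days=1)
        else:
            last = first  # no dtend: treat as a one-day event
        events.append((title, first, last))
    return events


EVENTS = """<div class="vcalendar">
  <div class="vevent">
    <span class="summary">IBM Rational Software Development Conference 2006</span>:
    <abbr class="dtstart" title="2006-06-04">June 4</abbr>-<abbr class="dtend" title="2006-06-09">8</abbr>
  </div>
</div>"""

for summary, first, last in parse_events(EVENTS):
    print(f"{summary}: {first:%B} {first.day}-{last.day}")
```

Keeping the exclusive dtend in the markup and adjusting only at display time preserves compatibility with iCalendar, which treats the end date the same way.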
Now we need an easy way for authors to generate this information.