Introducing HTML 5
- The XHTML 2 Disaster
- The WHAT Working Group?
- Apps, Not Documents
- Save Me!
- Next Week
In 1991, Tim Berners-Lee described a very simple SGML profile for marking up online documents. This defined a few tags for marking up text. The original web browsers, Sir Tim's WorldWideWeb for NeXTSTEP and a terminal-based UNIX version, included support for this language. A couple of years later, Mosaic introduced support for inline images.
Over the next couple of years, things like tables and forms appeared in various browsers. The HTML 2.0 specification was published in 1995 by the IETF and was a formalization of the various features that were well-supported.
After this, the World Wide Web Consortium (W3C) took over evolving the standard. HTML 3 was released a couple of years later and, again, standardized roughly the overlap between the existing implementations. This was at the height of the browser war, when Microsoft, Netscape, and a few other players were competing heavily based on features.
Then the W3C adopted a different approach. For HTML 4, they decided to define the standard as the working group felt it should be used, rather than as it was used. This involved deprecating a lot of presentation-related markup and delegating things to CSS that had previously been done in HTML.
HTML 4 had both a transitional profile, which deprecated these tags, and a strict version, which removed them. It was followed by XHTML 1.0, which used the same tags but required documents to be well-formed XML.
The XHTML 2 Disaster
Flushed with success, the W3C then began working on XHTML 2. XHTML 1 had a few advantages over HTML 4. Both can describe the same documents, and it is possible to transform one to the other quite trivially. The requirement that XHTML be well-formed XML provides some additional advantages, however. XML documents are allowed to contain arbitrary tags. As long as they are properly namespaced, you can include things like MathML and SVG inline inside XHTML documents, without needing to make them separate files. This speeds up loading, because the browser gets everything in one go (it doesn't need to parse the main file and then fetch the referenced files), and it means that everything goes in the same DOM tree, so it can be modified from JavaScript.
This is only now starting to be well-supported by browsers, and even modern browsers that do support inline SVG are quite picky about the documents that they will accept as being XML (for example, requiring DTDs, .xhtml file extensions, or specific MIME types).
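To make the namespace point concrete, here is a minimal sketch, in Python rather than in a browser, of an XHTML document with an SVG image inline. The sample document and the variable names are invented for illustration; the only things taken as given are the standard XHTML and SVG namespace URIs. Because the document is well-formed XML, a generic XML parser puts the HTML and SVG elements into a single tree, and the SVG can be found and changed in much the same way that a script would change it through the DOM.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for XHTML and SVG.
XHTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"

# A hypothetical XHTML page with an SVG drawing embedded inline.
DOCUMENT = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Inline SVG</title></head>
  <body>
    <p>A circle, drawn inline:</p>
    <svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
      <circle cx="50" cy="50" r="40" fill="red"/>
    </svg>
  </body>
</html>"""

# One parse yields one tree containing both vocabularies.
root = ET.fromstring(DOCUMENT)

# Prefixes used only for lookups in this script; they are not
# part of the document itself.
ns = {"h": XHTML_NS, "svg": SVG_NS}

paragraph = root.find("h:body/h:p", ns)
circle = root.find("h:body/svg:svg/svg:circle", ns)

# Modify the SVG element in place, as a script could via the DOM.
circle.set("fill", "blue")

print(root.tag)            # element names carry their namespace
print(paragraph.text)
print(circle.get("fill"))
```

A real browser does far more than this, of course, but the principle is the same: the namespace on each element tells the parser which vocabulary it belongs to, so several vocabularies can share one file and one tree.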
The goal for XHTML 2 was simplicity. All presentation-related tags were removed. All tags that duplicated the function of other tags were removed. Anything that duplicated the functionality of some other W3C standard (XForms, XFrames, and so on) was removed. The standard was then decomposed into various smaller profiles so that implementers could easily define subsets for different uses.
It was, unfortunately, a classic example of second-system syndrome. XHTML 2, in the current working drafts, is quite a nice standard. It's clean, easy to implement, and easy to produce. It is, unfortunately, only vaguely compatible with XHTML 1. That is to say, there is a small (and not very useful) subset of XHTML 1 that is also a subset of XHTML 2.
This rather destroys any advantages of the simplicity of XHTML 2. Browser writers still have to support XHTML 1 and earlier versions, but now they'd have to support what is effectively a completely new language as well.
XHTML 2 is what, in hindsight, the W3C would like HTML 1.0 to have been. Unfortunately, it's not possible to completely reset the web and say “Okay, that was just a trial run; now we'll have the real version of HTML, so please update all your sites now.” XHTML 2 has been 10 years in the making, and now looks like it will never be released.