- What Is Well-Formedness?
- Change Name to Lowercase
- Quote Attribute Value
- Fill In Omitted Attribute Value
- Replace Empty Tag with Empty-Element Tag
- Add End-tag
- Remove Overlap
- Convert Text to UTF-8
- Escape Less-Than Sign
- Escape Ampersand
- Escape Quotation Marks in Attribute Values
- Introduce an XHTML DOCTYPE Declaration
- Terminate Each Entity Reference
- Replace Imaginary Entity References
- Introduce a Root Element
- Introduce the XHTML Namespace
Introduce the XHTML Namespace
Add an xmlns="http://www.w3.org/1999/xhtml" attribute to every html element.
<html>
<html xmlns="http://www.w3.org/1999/xhtml">
Motivation
XSLT and other XML-based tools can treat the same element differently, depending on its namespace. XML-based XHTML tools expect to find HTML elements in the XHTML namespace and will usually not function correctly if they are in no namespace instead.
Furthermore, many browser extensions such as XForms, SVG, and MathML operate correctly only when embedded inside a properly namespaced XHTML document.
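To make the namespace distinction concrete, here is a small sketch using Python's standard xml.etree.ElementTree module as a stand-in for "an XML-based tool". The sample documents and the find_title helper are hypothetical illustrations, not taken from any particular XHTML tool. The sketch shows that a namespace-aware lookup finds nothing in a document whose elements are in no namespace.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Two otherwise identical documents: one in no namespace, one in the XHTML namespace.
plain = ET.fromstring("<html><head><title>Test</title></head><body/></html>")
namespaced = ET.fromstring(
    f'<html xmlns="{XHTML_NS}"><head><title>Test</title></head><body/></html>'
)

# ElementTree stores element names in {namespace}local form,
# so the two root elements do not have the same name.
print(plain.tag)
print(namespaced.tag)


def find_title(root):
    """Look up the title the way a namespace-aware XHTML tool would (hypothetical helper)."""
    node = root.find(f"{{{XHTML_NS}}}head/{{{XHTML_NS}}}title")
    return None if node is None else node.text


print(find_title(plain))       # no match: the elements are in no namespace
print(find_title(namespaced))  # match: the elements are in the XHTML namespace
```

An XSLT stylesheet that matches elements in the XHTML namespace would skip the first document for the same reason.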
Potential Trade-offs
None. This will not affect browser display.
Mechanics
This can mostly be fixed with search and replace. The most common html start-tag is simply <html> with no attributes. Without even using regular expressions, you can do a multifile search and replace that converts this into <html xmlns="http://www.w3.org/1999/xhtml">.
However, you may also encounter html start-tags that carry other attributes. The lang attribute is particularly common, but other possibilities include id and dir. For example:
<html lang='en-GB'>
Thus, as a first step, I suggest searching for <html\s (that is, <html followed by any whitespace character). If only a few files match, you can fix them manually. If many match, most likely some person, tool, or program made a habit of adding a particular attribute to the html start-tag, and the usage is likely to be consistent across the site. In that case, you may need to search for <html lang='en'> instead of just <html>.
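If your editor's multifile search does not summarize its matches, a short script can take the same survey. The following is a minimal sketch, assuming the pages are UTF-8 files ending in .html under a single directory. The function names survey_html_tags and survey_site are made up for this illustration. It counts each distinct html start-tag that carries attributes, so a habitual form such as <html lang='en'> stands out.

```python
import re
from collections import Counter
from pathlib import Path

# Matches an html start-tag that carries at least one attribute:
# <html, one whitespace character, then everything up to the closing >.
HTML_WITH_ATTRS = re.compile(r"<html\s[^>]*>")


def survey_html_tags(texts):
    """Count each distinct attribute-bearing html start-tag in the given strings."""
    counts = Counter()
    for text in texts:
        counts.update(HTML_WITH_ATTRS.findall(text))
    return counts


def survey_site(root_dir):
    """Run the survey over every .html file under root_dir (assumed site layout)."""
    paths = Path(root_dir).rglob("*.html")
    return survey_html_tags(
        p.read_text(encoding="utf-8", errors="replace") for p in paths
    )
```

Calling survey_site on the site's root directory and printing the result of most_common() lists the start-tags from most to least frequent. A single dominant entry suggests that one more literal search and replace will handle most of the site.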
The one thing to watch for is that someone may already have changed some (but not all) of the HTML documents to use the XHTML namespace. You may wish to search for this first. Thus, the order is:
- Search for http://www.w3.org/1999/xhtml. If no results are found, continue. Otherwise, exclude the files containing this string from future replacements.
- Search for "<html\s" and replace it with "<html xmlns='http://www.w3.org/1999/xhtml' ".
- Search for <html> and replace it with <html xmlns='http://www.w3.org/1999/xhtml'>.
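The three steps above can also be sketched as a script instead of an editor's search and replace. This is only an illustration under stated assumptions: the files are UTF-8, they end in .html, and element names are already lowercase. The names add_xhtml_namespace and convert_site are hypothetical. A single pattern covers both the bare <html> tag and the form with attributes, and only the first html start-tag in each file is changed.

```python
import re
from pathlib import Path

XHTML_NS = "http://www.w3.org/1999/xhtml"

# <html followed by either whitespace (attributes follow) or the closing >.
# The lookahead leaves that following character in place.
HTML_START = re.compile(r"<html(?=[\s>])")


def add_xhtml_namespace(text):
    """Return text with the xmlns attribute added, or unchanged if already present."""
    if XHTML_NS in text:
        # Step 1: skip documents that already mention the namespace.
        return text
    # Steps 2 and 3: insert the attribute right after the element name,
    # at most once per document.
    return HTML_START.sub(f'<html xmlns="{XHTML_NS}"', text, count=1)


def convert_site(root_dir):
    """Rewrite every .html file under root_dir in place (assumed site layout)."""
    for path in Path(root_dir).rglob("*.html"):
        original = path.read_text(encoding="utf-8")
        updated = add_xhtml_namespace(original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
```

Because convert_site rewrites files in place, it is prudent to run it on a copy of the site or on files under version control, and to review the resulting diff before publishing.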
When you're done, set your validator to check for XHTML specifically. It should warn you of any lingering problems that you missed.