Terminate Each Entity Reference
Place semicolons after entity references.
Before:
&copy 2007 TIC Corp.
if (i &lt 7) {
Ben &amp Jerry's Ice Cream

After:
&copy; 2007 TIC Corp.
if (i &lt; 7) {
Ben &amp; Jerry's Ice Cream
Motivation
XML requires that each entity reference end with a semicolon.
Web browsers can usually work around a missing semicolon, but only if the entity name is followed by whitespace. For instance, most browsers can handle "Ben &amp Jerry's" but not "A&ampP".
Potential Trade-offs
None. All browsers recognize entity references that end with semicolons.
Mechanics
To find unterminated entity references, search for any ampersand where whitespace appears before the next semicolon:
&[^;]*\s
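The same search can be run from a script. The following is a minimal Python sketch; the function name find_suspects is hypothetical. It writes the pattern as a lookahead, &(?=[^;]*\s), which tests the same condition separately for each ampersand. The plain form can let one greedy match run across several ampersands and report only the first.

```python
import re

# The search from the text, rewritten as a lookahead so that every
# ampersand is tested on its own instead of being swallowed by one
# greedy match that starts at an earlier ampersand.
SUSPECT = re.compile(r"&(?=[^;]*\s)")

def find_suspects(text: str) -> list[tuple[int, int]]:
    """Return 1-based (line, column) positions of ampersands where
    whitespace appears before the next semicolon."""
    positions = []
    for m in SUSPECT.finditer(text):
        line = text.count("\n", 0, m.start()) + 1
        line_start = text.rfind("\n", 0, m.start()) + 1
        positions.append((line, m.start() - line_start + 1))
    return positions

sample = "&copy 2007 TIC Corp.\nif (i &lt; 7) {\nBen &amp Jerry's Ice Cream\n"
print(find_suspects(sample))  # [(1, 1), (3, 5)]
```

The terminated &lt; on line 2 is skipped. Like the original pattern, this one finds nothing in a bare "A&ampP" because no whitespace follows it before a semicolon.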
Because you can't predict which character follows the entity name, a simple search-and-replace won't fix these reliably. You're better off fixing them by hand or letting Tidy or TagSoup do the work; both can repair most of these problems.
This search will also find a number of bare ampersands that were never meant to start an entity reference. These are especially common in two places: JavaScript (the && operator) and URLs (query strings such as ?id=7&page=2).
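When the search turns up a mix of unterminated references and bare ampersands, a script can sort the easy cases from the ambiguous ones. This is a minimal Python sketch, not a substitute for Tidy or TagSoup. The helper name fix_ampersands, the use of Python's html.entities.name2codepoint (the HTML 4 entity names) as the list of known names, and the repair rules are all assumptions for illustration.

```python
import re
from html.entities import name2codepoint

# Names treated as known entities: Python's HTML 4 table plus XML's
# "apos". This set is an assumption; use the names your DTD defines.
KNOWN = set(name2codepoint) | {"apos"}
NUMERIC = re.compile(r"#[0-9]+|#[xX][0-9A-Fa-f]+")

# An ampersand, the name-like run after it, and an optional semicolon.
AMP = re.compile(r"&(#?[A-Za-z0-9]*)(;?)")

def fix_ampersands(text: str) -> tuple[str, list[int]]:
    """Hypothetical helper. Returns the repaired text plus the offsets of
    ampersands left unchanged because the entity boundary is ambiguous."""
    out: list[str] = []
    review: list[int] = []
    last = 0
    for m in AMP.finditer(text):
        name, semi = m.group(1), m.group(2)
        out.append(text[last:m.start()])
        if name and semi:
            # Already terminated (&amp;, &#169;): leave it to validation.
            out.append(m.group())
        elif name in KNOWN or NUMERIC.fullmatch(name):
            # A complete name with no semicolon: &amp Jerry's -> &amp; Jerry's
            out.append(f"&{name};")
        elif name.startswith("#") or any(name.startswith(k) for k in KNOWN):
            # A&ampP: &amp; followed by P, or a typo? A human must decide.
            out.append(m.group())
            review.append(m.start())
        else:
            # Bare ampersand, as in "a && b" or "?id=7&page=2".
            out.append("&amp;" + name + semi)
        last = m.end()
    out.append(text[last:])
    return "".join(out), review

print(fix_ampersands("a && b"))  # ('a &amp;&amp; b', [])
print(fix_ampersands("A&ampP"))  # ('A&ampP', [1])
```

The rules report "A&ampP" instead of guessing. They still misfire on URL parameters that happen to share a name with an entity (?a=1&copy=2 becomes ?a=1&copy;=2), so review the changes before committing them.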
Validation should find any remaining cases, and you can fix those by hand. Sometimes manual inspection is necessary to see exactly where the entity boundary lies.
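Python's standard library has no DTD validator, but its expat-based XML parser rejects an unterminated reference as not well-formed and reports where it stopped, which is enough to locate the remaining cases. This is a minimal sketch using a hypothetical first_wellformedness_error helper. It assumes the page is otherwise well-formed XHTML. ElementTree knows only XML's five predefined entities, so it also stops at a correctly terminated &copy; unless you supply the extra entity definitions.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

def first_wellformedness_error(xhtml: str) -> Optional[Tuple[int, int]]:
    """Hypothetical helper: return the (line, column) where the parser
    gave up, or None if the document parsed. Lines count from 1 and
    columns from 0, as expat reports them."""
    try:
        ET.fromstring(xhtml)
    except ET.ParseError as err:
        return err.position
    return None

good = "<p>Ben &amp; Jerry's</p>"
bad = "<p>\nBen &amp Jerry's\n</p>"
print(first_wellformedness_error(good))  # None
print(first_wellformedness_error(bad))   # a (line, column) pair on line 2
```

Because the parser stops at the first error, run it again after each fix until it returns None.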