- What Is Well-Formedness?
- Change Name to Lowercase
- Quote Attribute Value
- Fill In Omitted Attribute Value
- Replace Empty Tag with Empty-Element Tag
- Add End-tag
- Remove Overlap
- Convert Text to UTF-8
- Escape Less-Than Sign
- Escape Ampersand
- Escape Quotation Marks in Attribute Values
- Introduce an XHTML DOCTYPE Declaration
- Terminate Each Entity Reference
- Replace Imaginary Entity References
- Introduce a Root Element
- Introduce the XHTML Namespace
Replace Imaginary Entity References
Make sure all entity references used in the document are defined.
&copyright; 2007 TIC Corp.
&copy; 2007 TIC Corp.
Motivation
Occasionally, authors begin to use entity references that simply don't exist. Sometimes it's a simple typo, such as &apm; instead of &amp;. Sometimes it's misremembered code, such as &tm; instead of &trade; or &copyright; instead of &copy;. Either way, this causes display problems for all browsers and should be fixed.
Potential Trade-offs
None. This is only good.
Mechanics
The hardest problem is finding these imaginary entity references, because there's not necessarily any rhyme or reason to them. Often, the first time you realize there's a problem is while browsing your site. If you're lucky it will appear in the plain text like this:
&copyright; 2007 TIC Corp.
If not, the browser will just drop it out completely:
2007 TIC Corp.
The same mistakes do tend to repeat themselves, so once you've noticed a problem, a straight search and replace will usually find and fix all other occurrences.
Otherwise, validation (or at least well-formedness checking) is necessary to identify these issues. Once a validator finds such imaginary entity references, you can fix them by hand if they aren't too numerous, or with a targeted search and replace if they are.
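When a full validator isn't at hand, a rough scan can give a first list of suspects. The following is a minimal sketch in Python, not a substitute for validation: the function name find_imaginary_entities is hypothetical, and it assumes the document relies only on the entity names defined by the XHTML 1.0 DTDs (the HTML 4 set that Python ships in html.entities, plus apos). It matches with a plain regular expression, so it also reports hits inside comments, scripts, and CDATA sections, and it does not notice references that lack their closing semicolon.

```python
import re
from html.entities import name2codepoint

# Names assumed to be defined: the HTML 4 entity set plus apos,
# which XHTML 1.0 adds.
DEFINED = set(name2codepoint) | {"apos"}

# A named entity reference: "&", a name, ";".
# Numeric references such as &#xA5; do not match this pattern.
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def find_imaginary_entities(text):
    """Return (line_number, reference) pairs for undefined named references."""
    problems = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in ENTITY_REF.finditer(line):
            # Entity names are case-sensitive, so compare them as written.
            if match.group(1) not in DEFINED:
                problems.append((line_number, match.group(0)))
    return problems
```

Running it over each file in the site and printing the pairs gives a worklist for the manual fix or the targeted search and replace.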
Occasionally, you'll find someone has invented an entity reference that perhaps should exist but doesn't: a made-up name for the ¥ sign, or &bet; for the Hebrew letter ב. Although it's theoretically possible to define new entity references such as these in the internal DTD subset or external DTD, I do not recommend this. XML parsers can handle this, but browsers cannot. Either replace the references with the actual characters (especially if you already reencoded the document in UTF-8) or use a numeric character reference such as &#xA5; or &#x5D1;.
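If the invented names are numerous, the replacement can be scripted. This is only a sketch under stated assumptions: the REPLACEMENTS table and the function replace_invented_entities are hypothetical, and you would fill the table with whatever names turned up in your own documents and the code points they were meant to stand for.

```python
import re

# Hypothetical table: invented entity name -> intended Unicode code point.
REPLACEMENTS = {
    "bet": 0x05D1,  # Hebrew letter bet
    "tm": 0x2122,   # trade mark sign
}

# A named entity reference: "&", a name, ";".
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def replace_invented_entities(text, replacements=REPLACEMENTS, numeric=True):
    """Swap invented references for numeric references or literal characters."""

    def substitute(match):
        code_point = replacements.get(match.group(1))
        if code_point is None:
            # Not in the table: leave the reference exactly as written.
            return match.group(0)
        if numeric:
            return f"&#x{code_point:X};"
        return chr(code_point)

    return ENTITY_REF.sub(substitute, text)
```

With numeric=True the output stays pure ASCII, which is the safer choice if the document has not yet been converted to UTF-8. Pass numeric=False to insert the actual characters instead.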