- What Is Well-Formedness?
- Change Name to Lowercase
- Quote Attribute Value
- Fill In Omitted Attribute Value
- Replace Empty Tag with Empty-Element Tag
- Add End-tag
- Remove Overlap
- Convert Text to UTF-8
- Escape Less-Than Sign
- Escape Ampersand
- Escape Quotation Marks in Attribute Values
- Introduce an XHTML DOCTYPE Declaration
- Terminate Each Entity Reference
- Replace Imaginary Entity References
- Introduce a Root Element
- Introduce the XHTML Namespace
Escape Quotation Marks in Attribute Values
Convert " to &quot; or ' to &apos; in attribute values.
<blockquote cite='Jane's Fighting Ships 2007-2008, Stephen, R.N. Saunders, p. 32'>
<a title="How the Supreme Court "elected" George W. Bush president">
becomes any of these:
<blockquote cite='Jane&apos;s Fighting Ships 2007-2008, Stephen, R.N. Saunders, p. 32'>
<blockquote cite="Jane's Fighting Ships 2007-2008, Stephen, R.N. Saunders, p. 32">
<a title='How the Supreme Court "elected" George W. Bush'>
<a title="How the Supreme Court &quot;elected&quot; George W. Bush president">
Motivation
A quotation mark that appears inside an attribute value delimited with the same style of quotation mark prematurely closes the value. Different browsers deal differently with this situation, but the result is almost never anything you want. Even if you aren't transitioning to full XHTML, this refactoring is an important fix.
Potential Trade-offs
None. This change can only improve your web pages.
Mechanics
Because this is a real bug that visibly breaks pages, it's unlikely to have survived in many significant places. You can usually fix all the occurrences by hand fairly easily.
Because the legality or illegality of any one quote mark depends on others, it's not easy to check for this problem using regular expressions. However, well-formedness testing will find this problem. Indeed, you may need to fix this one before fixing other, lesser problems because it's likely to hide other errors.
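As a rough illustration of how a well-formedness test catches the problem, here is a minimal sketch in Python. It uses the standard library's xml.etree.ElementTree parser, and the helper name is_well_formed is made up for this example. It checks only a small, self-contained fragment; a real page with a DOCTYPE and HTML entity references would need a fuller XHTML-aware toolchain.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the XML parser accepts the markup, False otherwise.

    Hypothetical helper for illustration; it only handles a single
    self-contained fragment with one root element.
    """
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# The inner double quotes end the title value early, so the parser rejects it.
broken = '<a title="How the Supreme Court "elected" George W. Bush president">link</a>'

# Escaping the inner quotes as &quot; makes the fragment well-formed again.
fixed = '<a title="How the Supreme Court &quot;elected&quot; George W. Bush president">link</a>'

for fragment in (broken, fixed):
    print(is_well_formed(fragment), fragment)
```

The parser stops at the first error it meets, which is why an unescaped quote can mask other problems later in the same document.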
As with < and &, this problem is most often caused by blindly copying data from a database or other external source without first scanning it for reserved characters. Be sure to clean the data using a function such as PHP's htmlspecialchars to convert quotation marks and apostrophes into the equivalent entity references before inserting them into attribute values. Pass the ENT_QUOTES flag when you do: before PHP 8.1, htmlspecialchars by default converted double quotes but left apostrophes alone.
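The same cleaning step can be sketched outside PHP. The following assumes Python and its standard html.escape function, which is a stand-in for htmlspecialchars rather than anything the text prescribes. The wrapper name attr_safe is invented for this example.

```python
from html import escape


def attr_safe(value: str) -> str:
    """Escape &, <, >, and both kinds of quote mark for use in an attribute value.

    Hypothetical wrapper; quote=True makes html.escape handle " and ' as well.
    """
    return escape(value, quote=True)


# Pretend these strings came from a database or other external source.
title = 'How the Supreme Court "elected" George W. Bush president'
cite = "Jane's Fighting Ships 2007-2008, Stephen, R.N. Saunders, p. 32"

print(f'<a title="{attr_safe(title)}">')
print(f"<blockquote cite='{attr_safe(cite)}'>")
```

Note that html.escape writes the apostrophe as the numeric character reference &#x27; rather than &apos;. Both are acceptable in XHTML.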
Contrary to popular belief, you do not need to escape all quotation marks, only those inside attribute values. You can escape quote marks in plain text if you want to, but this is superfluous. I usually don't bother. Even inside attribute values, you only need to escape the kind of quote that delimits the attribute value. Because different authors, editors, and tools differ in whether they prefer single or double quote marks, I usually escape both to be safe.
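If you would rather escape only the delimiting kind of quote, one option is to let a library pick the delimiter. This sketch assumes Python's xml.sax.saxutils.quoteattr, which is not something the text itself recommends. It wraps the value in whichever quote style the data does not contain, and escapes double quotes only when both styles appear. The sample strings are shortened from the examples above, and the variable names are illustrative.

```python
from xml.sax.saxutils import quoteattr

# Contains only an apostrophe: double-quote delimiters, nothing escaped.
cite = "Jane's Fighting Ships 2007-2008"

# Contains only double quotes: single-quote delimiters, nothing escaped.
title = 'How the Supreme Court "elected" George W. Bush president'

# Contains both kinds: double-quote delimiters, inner " becomes &quot;.
both = 'Jane\'s "Fighting Ships"'

print("<blockquote cite=" + quoteattr(cite) + ">")
print("<a title=" + quoteattr(title) + ">")
print("<blockquote cite=" + quoteattr(both) + ">")
```

Because quoteattr returns the value together with its delimiters, it suits code that builds the whole attribute itself. When a template already supplies the quote marks, escaping both kinds is the safer habit.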
Tidy and TagSoup cannot reliably fix quotation marks inside attribute values. For example, Tidy turned this:
<blockquote cite='Jane's Fighting Ships 2007-2008, Stephen, R.N. Saunders, p. 32'>
into this:
<blockquote cite='Jane' s="" fighting="" ships="" r.n.="" p.="">
You shouldn't encounter a lot of these problems, though, so it's best to fix them by hand once a validator points them out.