2.7 Text-Level Semantics—More New Tags
Apart from focusing on clear structures, the HTML5 specification also attaches importance to semantics and tries to assign each element a certain meaning on the text level. At the same time, it determines in which contexts each tag may and may not be used. Some elements are new, some have disappeared completely (such as font, center, and big), and the definitions of others have changed slightly. This section introduces the new and changed elements. Later, Table 2.2 shows the typical uses of all elements that appear in the specification's Text-level semantics chapter. Let's start with the most exotic of the new elements: ruby.
2.7.1 The Elements "ruby," "rt," and "rp"
The term ruby refers to a typographic annotation system, meaning "short runs of text alongside the base text, typically used in East Asian documents to indicate pronunciation or to provide a short annotation" (www.w3.org/TR/ruby). Ruby annotation is used in Chinese and Japanese to show the pronunciation of characters, as you can see in the example on the left in Figure 2.7.
Figure 2.7 Two examples of ruby annotation
The markup for ruby annotations contains the elements ruby, rt, and rp. First, the expression that will be explained is specified within a ruby element. The explanation is then provided by the following rt element, and in browsers with ruby support the content of this rt element is positioned above the expression described. As you can see in the Beijing example, several words in a row can be annotated this way.
Browsers without ruby support (such as Firefox and Opera) display the individual components consecutively, which can make the words more difficult to read. Because it is not necessarily clear that the second word is the explanation of the first word, a visual separation of the two components is required. That is what the rp element is for: It adds optional parentheses that are only displayed if a browser does not support ruby. As you can see in Figure 2.7, Google Chrome can interpret ruby and position the annotation separately. A browser without ruby support would display the examples with the annotations in parentheses, as 北 (Běi) 京 (Jīng) and HTML N°5 (Web Standard).
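The structure described above might be marked up as follows. This is a minimal sketch, not the listing behind Figure 2.7: the Chinese characters, the pinyin readings, and the exact split of the second example into base text and annotation are assumed for illustration.

```html
<!-- Each base text is followed by its annotation in rt. -->
<!-- The rp elements hold fallback parentheses for browsers without ruby support. -->
<p>
  <ruby>
    北 <rp>(</rp><rt>Běi</rt><rp>)</rp>
    京 <rp>(</rp><rt>Jīng</rt><rp>)</rp>
  </ruby>
</p>
<p>
  <ruby>
    HTML N°5 <rp>(</rp><rt>Web Standard</rt><rp>)</rp>
  </ruby>
</p>
```

A browser with ruby support hides the rp content and places each rt above its base text. A browser without ruby support shows everything inline, so the parentheses keep the annotation readable.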
2.7.2 The "time" Element
The time element represents either a time in the 24-hour format or a date in the Gregorian calendar with optional time and time-zone components. Its purpose is to give modern date and time specifications in a machine-readable format within an HTML5 document. Vague time references, such as in the spring of 2011 or five minutes before the turn of the millennium, are therefore not allowed.
To ensure machine readability, we can use the attribute datetime, whose value can be specified as a time, a date, or a combination of both. The syntax for specifying the time components is clearly defined in the specification and is described in Table 2.1.
Table 2.1. The Rules for Timestamps for the "time" Element's "datetime" Attribute
Component | Syntax | Example
Date | YYYY-MM-DD | 2011-07-13
Time with hours and minutes | hh:mm | 18:28
Time with seconds | hh:mm:ss | 18:28:05
Time with fractions of a second | hh:mm:ss.f | 18:28:05.2318
Date and time | T to join date and time | 2011-07-13T18:28
With time zone GMT (UTC) | Z at the end | 2011-07-13T18:28:05Z
With time zone as offset | +hh:mm / -hh:mm | 2011-07-13T18:28:05+02:00
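The formats in Table 2.1 could be used in markup along these lines. The datetime values follow the table; the surrounding sentences and the human-readable text inside the time elements are invented for illustration.

```html
<!-- Date only -->
<p>The workshop takes place on <time datetime="2011-07-13">July 13, 2011</time>.</p>

<!-- Time only, in 24-hour format -->
<p>Doors open at <time datetime="18:28">6:28 p.m.</time></p>

<!-- Date and time joined by T, followed by a time-zone offset -->
<p>The stream starts <time datetime="2011-07-13T18:28:05+02:00">this evening</time>.</p>

<!-- Without a datetime attribute, the element content itself must be a valid value -->
<p>Last updated: <time>2011-07-13</time></p>
```

The visible text can be as informal as you like, as long as the datetime attribute carries the machine-readable value.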
The pubdate attribute is a boolean attribute. It indicates that the specified date is the publication date of the nearest enclosing article element or, if there is none, of the document as a whole. If you use pubdate, the time element must also have a datetime attribute. If it does not, the content between the time element's start tag and end tag must be a valid date.
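As a brief sketch of how pubdate might look in practice, the following article carries its publication timestamp in a time element. The headline and text are hypothetical.

```html
<article>
  <h2>Lemonade season opens</h2>
  <!-- pubdate marks this timestamp as the publication date of the enclosing article -->
  <p>Published on <time datetime="2011-07-13T18:28Z" pubdate>July 13, 2011</time></p>
  <p>The first stands opened this week.</p>
</article>
```

If the same time element appeared outside any article, the date would apply to the document as a whole.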
2.7.3 The "mark" Element
The mark element represents a text segment that is highlighted because it is relevant in a different context. That sounds a bit cumbersome, so here are two brief examples. If you highlight a certain passage of a quotation, you change the original text and almost force a new meaning onto it; mark makes clear that the highlighting is yours and not the original author's. You can also use mark to call attention to certain words in a document or code listing, for example because the user searched for them or because you are discussing that part of the code.
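Both uses might look like this in markup. The quotation and the search term are invented for illustration.

```html
<!-- Highlighting part of a quotation that the original author did not emphasize -->
<blockquote>
  <p>Take a lemon and <mark>squeeze it with a juicer</mark> before adding water.</p>
</blockquote>

<!-- Highlighting the term a user searched for in a result listing -->
<p>Your search for "juicer" found: Take a lemon and squeeze it with a <mark>juicer</mark>.</p>
```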
2.7.4 The "wbr" Element
Unsurprisingly, the wbr element enables the browser to insert an optional line break in long words. For example, inserting a couple of wbr elements in a rather long word, such as supercalifragilisticexpialidocious, would give the browser the opportunity to break the word over two lines if the layout requires it:
supercali<wbr>fragilistic<wbr>expialidocious
Whether and where the line break occurs depends entirely on the layout: wbr only allows a line break; it does not force one. Possible applications are long URLs or code listings. Like br, wbr is a so-called void element, which means it must not have an end tag, a quality it shares with 14 other elements in HTML5. Here is the complete list of void elements:
area | base | br | col | command
embed | hr | img | input | keygen
link | meta | param | source | wbr
Void elements may, however, contain a slash in the start tag itself (e.g., <br />), which is useful for meeting the requirements of valid XHTML5 documents.
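The two spellings can be compared side by side in the following sketch. The image file name lemon.png is a placeholder.

```html
<!-- HTML syntax: void elements consist of a start tag only -->
<p>Simply Orange Juice Company<br>Apopka, FL 32703</p>
<img src="lemon.png" alt="A lemon">

<!-- XML-compatible syntax: the trailing slash is allowed in HTML5 and required in XHTML5 -->
<p>Simply Orange Juice Company<br />Apopka, FL 32703</p>
<img src="lemon.png" alt="A lemon" />
```

In neither spelling may a void element have a separate end tag such as </br> or </img>.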
2.7.5 Elements with Marginal Changes
The list of elements with marginal changes starts with b and i, two tags whose very names no longer fit the concept of HTML5: b for bold and i for italic are explicit formatting instructions, and these are not popular in HTML5. What matters now is the meaning of the text, so we should instead use strong to mark importance and em (as in emphasis) to stress a word. Unfortunately, b and i are among the most widely used tags, which is why it was impossible to prevent their use altogether. The solution was a compromise that continues to allow both but alters their meaning: b now refers to offset text that is typically shown in bold, and i to offset text that is typically shown in italics. But if you want to write clean HTML5, you should avoid b and i in the future and use strong and em instead.
Other small changes: cite now designates the title of a work and must explicitly not be used for the names of people. small no longer just means smaller text; it represents side comments such as legal small print, without making any statement about their importance. hr now signals a thematic break, not just a horizontal line to break up the layout.
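The changed meanings could be applied as in the following sketch. The author name, the book title, and all sentences are hypothetical.

```html
<!-- strong for importance and em for stress emphasis, instead of b and i -->
<p><strong>Warning:</strong> this tea is <em>very</em> hot.</p>

<!-- cite marks the title of a work, not the name of its author -->
<p>Jane Doe describes the recipe in <cite>The Lemonade Handbook</cite>.</p>

<!-- small for side comments such as legal notices -->
<p><small>Prices include tax. All rights reserved.</small></p>

<!-- hr as a thematic break between two topics -->
<p>That concludes the recipe.</p>
<hr>
<p>Now for the history of lemonade.</p>
```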
The specification offers a usage summary of individual tags with examples at the end of the chapter Text-level semantics. To save you from having to look it up, here it is in our Table 2.2.
Table 2.2. Usage of Semantic Text Elements
Element | Purpose | Example
a | Hyperlinks | Visit my <a href="drinks.html">drinks</a> page.
em | Stress emphasis | I must say I <em>adore</em> lemonade.
strong | Importance | This tea is <strong>very hot</strong>.
small | Side comments | These grapes are made into wine. <small>Alcohol is addictive.</small>
s | Inaccurate text | Price: <s>£4.50</s> £2.00!
cite | Titles of works | The case <cite>Hugo v. Danielle</cite> is relevant here.
q | Quotations | The judge said <q>You can drink water from the fish tank</q> but advised against it.
dfn | Defining instance | The term <dfn>organic food</dfn> refers to food produced without synthetic chemicals.
abbr | Abbreviations | Organic food in Ireland is certified by the <abbr title="Irish Organic Farmers and Growers Association">IOFGA</abbr>.
code | Computer code | The <code>fruitdb</code> program can be used for tracking fruit production.
var | Variables | If there are <var>n</var> fruit in the bowl, at least <var>n</var>÷2 will be ripe.
samp | Computer output | The computer said <samp>Unknown error -3</samp>.
kbd | User input | Hit <kbd>F1</kbd> to continue.
sub | Subscripts | Water is H<sub>2</sub>O.
sup | Superscripts | The hydrogen in heavy water is usually <sup>2</sup>H.
i | Alternative voice | Lemonade consists primarily of <i>Citrus limon</i>.
b | Keywords | Take a <b>lemon</b> and squeeze it with a <b>juicer</b>.
mark | Highlight | Elderflower cordial, with one <mark>part</mark> cordial to ten <mark>part</mark>s water, stands a<mark>part</mark> from the rest.
ruby, rt, rp | Ruby annotations | <ruby> OJ <rp>(<rt>Orange Juice<rp>)</ruby>
bdi | Text directionality isolation | The recommended restaurant is <bdi lang="">My Juice Café (At The Beach)</bdi>.
bdo | Text directionality formatting | The proposal is to write English but in reverse order. "Juice" would become "<bdo dir=rtl>Juice</bdo>"
span | Other | In French we call it <span lang="fr">sirop de sureau</span>.
br | Line break | Simply Orange Juice Company<br>Apopka, FL 32703<br>U.S.A.
wbr | Line breaking opportunity | www.simply<wbr>orange<wbr>juice.com