Space Handling
Although internationalization is often about separating presentation from content, in a few cases the tools must know certain presentation parameters in order to translate efficiently. White space handling is one of them.
White spaces are defined as spaces, tabs, carriage returns, and line-feeds:
WS ::= (#x20 | #x9 | #xD | #xA)+
You will notice that other "space"-like characters such as NO-BREAK SPACE (U+00A0), IDEOGRAPHIC SPACE (U+3000), EM SPACE (U+2003), THIN SPACE (U+2009), EN QUAD (U+2000), and so forth are not included in the white space list. They are treated just like regular characters as far as XML processors are concerned.
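The distinction can be checked with a few lines of Python. This is only a sketch: the helper name is_xml_whitespace is made up for illustration, and the character set is copied from the WS production above. Note that Python's own str.isspace() uses a broader Unicode definition (it returns True for NO-BREAK SPACE, for example), so it is not a substitute for the XML rule.

```python
# Characters matched by the white space production quoted above.
XML_WHITESPACE = {"\u0020", "\u0009", "\u000D", "\u000A"}


def is_xml_whitespace(text: str) -> bool:
    """Return True if text is non-empty and contains only XML white space.

    Hypothetical helper for illustration; not part of any XML library.
    """
    return len(text) > 0 and all(ch in XML_WHITESPACE for ch in text)


if __name__ == "__main__":
    samples = {
        "SPACE": "\u0020",
        "TAB": "\u0009",
        "NO-BREAK SPACE": "\u00A0",
        "IDEOGRAPHIC SPACE": "\u3000",
        "EM SPACE": "\u2003",
        "THIN SPACE": "\u2009",
    }
    for name, ch in samples.items():
        print(f"{name:18} U+{ord(ch):04X} XML white space: {is_xml_whitespace(ch)}")
```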
As with language settings, XML defines a special attribute, xml:space, to indicate how white spaces should be handled in a given element and its content.
The attribute can have two values: default or preserve. The first lets the XML processor apply its default white space handling, whereas the second indicates that all white spaces must be preserved and passed on without transformation.
If you use a DTD to specify your format, xml:space must be declared, just as any other attribute:
<!ATTLIST SourceCode xml:space (default|preserve) 'preserve' >
or
<!ATTLIST pre xml:space (preserve) #FIXED 'preserve'>
Always keep in mind that xml:space is an indicator for the parser, not the rendering engine, although some rendering engines do take it into consideration (such as Adobe's SVG viewer).
Localization tools should take the presence of xml:space into account when extracting content. It is the best indicator of whether the white spaces in a run of text should be left alone. This information should be carried along throughout the translation process.
Listing 3.5 shows an XML file where the element <cmdline> contains preformatted text, while the multiple spaces in the <p> element should be reduced to a single blank.
Listing 3.5 Spaces1.xml: Usage of the xml:space Attribute
<?xml version="1.0" ?>
<doc>
 <cmdline id="1" xml:space="preserve">Command line:
  -x        run the tool with option x
  -f[name]  specify [name] for font</cmdline>
 <p id="2">Text    where any set of
  white spaces    is reduced to 1.</p>
</doc>
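The extraction behavior described above can be sketched in Python with the standard xml.etree.ElementTree module. This is a minimal illustration under stated assumptions, not a complete localization filter: the function name extract_text, the sample string, and the "collapse runs and trim" policy for the default mode are all choices made for this example, and the sketch looks only at each element's leading text, ignoring text that follows child elements.

```python
import re
import xml.etree.ElementTree as ET

# ElementTree exposes xml:space under the predefined XML namespace.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
# Only the four XML white space characters are collapsed.
XML_WS_RUN = re.compile(r"[ \t\r\n]+")


def extract_text(element, preserve=False):
    """Yield (tag, id, text) for each element that has non-blank text.

    xml:space applies to an element and its descendants until another
    element overrides it, so the current state is passed down.
    """
    mode = element.get(XML_SPACE)
    if mode == "preserve":
        preserve = True
    elif mode == "default":
        preserve = False

    text = element.text or ""
    if not preserve:
        # Assumed default policy for this sketch: collapse and trim.
        text = XML_WS_RUN.sub(" ", text).strip(" ")
    if text.strip(" \t\r\n"):
        yield element.tag, element.get("id"), text

    for child in element:
        yield from extract_text(child, preserve)


# Hypothetical sample modeled on the command-line listing.
SAMPLE = (
    "<doc>\n"
    ' <cmdline id="1" xml:space="preserve">Command line:\n'
    "  -x    run the tool</cmdline>\n"
    ' <p id="2">Text    where any set of\n'
    "  white spaces   is reduced to 1.</p>\n"
    "</doc>"
)

if __name__ == "__main__":
    for tag, ident, text in extract_text(ET.fromstring(SAMPLE)):
        print(tag, ident, repr(text))
```

A real tool would also need to write the translated text back with the same white space policy, so that the preserved runs survive the round trip.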
XHTML
The XHTML specifications add a few clauses to the handling of white spaces.
In addition to line-breaks, tabulations, and space, the characters FORM FEED (U+000C) and ZERO WIDTH SPACE (U+200B) must also be treated as white spaces.
Leading and trailing white spaces in block elements should be removed unless the xml:space attribute is set to preserve. In other words, the following XHTML fragments are identical to one another.
<p> This is an example </p>
<p> This is an example </p>
<p>This is an example</p>
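The trimming rule can be sketched as follows. The function name trim_block_text and its xml_space parameter are hypothetical, introduced only for this illustration; the character set is the XML white space list plus the two extra characters named above.

```python
# XML white space plus FORM FEED and ZERO WIDTH SPACE.
XHTML_WHITESPACE = " \t\r\n\u000C\u200B"


def trim_block_text(text: str, xml_space: str = "default") -> str:
    """Strip leading and trailing white space from block-level text.

    Text is returned untouched when xml:space is 'preserve'.
    Hypothetical helper for illustration only.
    """
    if xml_space == "preserve":
        return text
    return text.strip(XHTML_WHITESPACE)


if __name__ == "__main__":
    padded = "  This is an example  "
    compact = "This is an example"
    # Both fragments yield the same content once trimmed.
    print(trim_block_text(padded) == trim_block_text(compact))
    # With xml:space="preserve" the padding is kept.
    print(repr(trim_block_text(padded, "preserve")))
```

Characters such as NO-BREAK SPACE are deliberately left out of the set, so they are kept even at the edges of a block.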
CSS
When rendering is involved, you can use the white-space property of CSS to specify how white spaces should be handled. In CSS2, the available values are normal, pre, nowrap, and inherit.
If you take the document shown in Listing 3.5 and apply to it the style sheet displayed in Listing 3.6, you can see in Figure 3.1 that the rendering of the <cmdline> is done correctly.
Listing 3.6 Spaces3.css: Style Sheet Used to Display Figure 3.1
doc {
 display: block;
 margin-top: 10px;
 margin-left: 10px;
}
p {
 display: block;
 margin-bottom: 10px;
}
cmdline {
 display: block;
 margin-bottom: 10px;
 white-space: pre;
 font-family: "Courier New";
}
Figure 3.1 Rendering of white spaces with the white-space CSS property in Navigator 6.
Note that the same example would not work with Internet Explorer 5.5, which does not yet support the white-space property correctly (version 5.5.4522.1800, with SP1).