Glyphs
It is easy to confuse “glyphs” with “characters” because it is the glyph of the character that is drawn onscreen and hence what we are looking at. A glyph is a pattern, a shape, or an outline of the character’s image. Characters are what you type; glyphs are what you see.
Two points need to be called out:
- A character conveys differences in meaning or sound. No appearance property is associated with it.
- A glyph conveys differences in appearance. The key thing is appearance. A glyph has no intrinsic meaning.
Figure 2.4 shows a sampling of some glyphs for the Latin character “c.” Note that they are all of the same character, lowercase “c,” and therefore have the same meaning, but they are displayed, shaped, and outlined differently.
Figure 2.4 A sampling of glyphs for the lowercase character “c.”
There are also cases in which a font assigns a character a glyph whose shape is nothing like the traditional shape of the character. “Wingdings” and similar symbol fonts are prime examples: in them, the lowercase “c” character displays as a symbol rather than as a letter.
Also, variations in a character’s glyph can be associated with things like cursive connectors. In this scenario, the character is displayed traditionally, but it has slight changes depending on what character precedes it and what character follows it. Again, no meaning is lost; we have just gained flourishes.
Contextual Glyphs
For the Arabic and Indic families of scripts, a character’s glyph can change greatly depending on the character’s position within the word and on the characters that precede and follow it. Let’s focus on the Arabic character “Ain,” “ع” (Unicode code point U+0639). Figure 2.5 displays the different glyphs used for “Ain” depending on its context and position.
Figure 2.5 Examples of contextual glyphs.
Arabic is a right-to-left language, so the first/initial character is the rightmost, the second is to the left of that, the third is to the left of the second, and so on.
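Text normally stores only the base character U+0639 and leaves the choice of shape to the text layout engine. For compatibility, Unicode also encodes the four positional shapes of “Ain” as separate characters (U+FEC9 isolated, U+FECA final, U+FECB initial, U+FECC medial). The following is a minimal sketch, not a definitive implementation, showing that compatibility normalization folds those forms back to the base character. The helper name GRIsFormOfAin is hypothetical; precomposedStringWithCompatibilityMapping is the NSString property that applies normalization form KC.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original text): returns YES when `form`
// folds to the base letter Ain (U+0639) under compatibility normalization.
static BOOL GRIsFormOfAin(NSString *form) {
    // NFKC replaces a positional presentation form with its base character.
    NSString *folded = [form precomposedStringWithCompatibilityMapping];
    // isEqualToString: compares literally, code unit by code unit.
    return [folded isEqualToString:@"\u0639"];
}
```

Calling GRIsFormOfAin(@"\uFECB") with the initial form should return YES, whereas a different letter, such as Beh (@"\u0628"), should return NO.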
Another example is the Greek character sigma (σ). When it is used at the end of a word and the characters of that word are not all uppercase, the final form “ς” is used, for example, “Ὀδυσσεύς” (Odysseus). Note the two sigmas in the center of the name that remain the same and the word-final sigma at the end. Same letter, different glyphs (although Unicode does give the final form its own code point, U+03C2). This example also demonstrates that uppercase and lowercase letters are handled as separate characters with separate values, not as a single character that is merely displayed differently.
Now that we’ve covered glyphs, let’s move on to fonts.
Fonts
Font is a common, everyday term. We generally think of a font as the shape and display of the characters we are working with, as well as their size and spacing. It is a combination of these properties and the typeface associated with the font.
Font files are your storage depot for the glyphs that are associated with the characters. Well-crafted fonts won’t fake bold, italic, and bold-italic variations but have built-in, designed glyphs for these variations. After your application has worked out what characters it is dealing with, it will look in the font for glyphs in order to display or print those characters. Of course, if the encoding information was wrong, it will be looking up glyphs for the wrong characters.
A given font will usually cover a single character set. For a large character set, like Unicode, a font will usually contain only a subset of the available characters. This is one of the many reasons you will see specific fonts for CJK characters. It is more practical for specific fonts to hold specific character sets, for both performance and file-size benefits.
If your font doesn’t have a glyph for a particular character, some applications, as well as the OS, will look for the missing glyph in other fonts on your system. Although this prevents a missing glyph from displaying as an empty box or a box containing a question mark, the substituted glyph can look different from the surrounding text, like a ransom note.
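Glyph coverage and font substitution can be inspected with Core Text. The following is a rough sketch under stated assumptions rather than a definitive implementation: it assumes ARC, text limited to the Basic Multilingual Plane, and a font name that exists on the system (CTFontCreateWithName may hand back a substitute font when the name cannot be matched). The helper names GRFontCoversString and GRFallbackFontName are hypothetical.

```objective-c
#import <Foundation/Foundation.h>
#import <CoreText/CoreText.h>

// Hypothetical helper: YES when the named font has a glyph for every
// UTF-16 code unit in `text`.
static BOOL GRFontCoversString(NSString *fontName, NSString *text) {
    NSUInteger count = text.length;
    if (count == 0) {
        return YES; // Nothing to draw, so nothing is missing.
    }
    CTFontRef font = CTFontCreateWithName((__bridge CFStringRef)fontName, 12.0, NULL);
    unichar *characters = malloc(count * sizeof(unichar));
    CGGlyph *glyphs = malloc(count * sizeof(CGGlyph));
    [text getCharacters:characters range:NSMakeRange(0, count)];

    // Returns false if any character could not be mapped to a glyph.
    bool covered = CTFontGetGlyphsForCharacters(font, characters, glyphs, (CFIndex)count);

    free(characters);
    free(glyphs);
    CFRelease(font);
    return covered ? YES : NO;
}

// Hypothetical helper: PostScript name of the font Core Text would use to
// draw `text` when starting from the named font.
static NSString *GRFallbackFontName(NSString *fontName, NSString *text) {
    CTFontRef font = CTFontCreateWithName((__bridge CFStringRef)fontName, 12.0, NULL);
    CTFontRef fallback = CTFontCreateForString(font, (__bridge CFStringRef)text,
                                               CFRangeMake(0, (CFIndex)text.length));
    NSString *name = (__bridge_transfer NSString *)CTFontCopyPostScriptName(fallback);
    CFRelease(fallback);
    CFRelease(font);
    return name;
}
```

If GRFontCoversString returns NO for a string, GRFallbackFontName shows which font the system would substitute, which is the source of the ransom-note effect described above. The actual results depend on the fonts installed on the device.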
Ligatures
The term ligature, which simply means “connection,” originates from the Latin ligare (“to bind”). The term itself doesn’t imply a certain purpose or use. Today, there are two possible ways to define a ligature, and the two can apply together or individually. If we talk about the display of characters, a ligature is made from two or more letters that appear connected. In handwriting such connections are created all the time, especially in cursive writing.
Some ligatures are two separate characters displayed with a single connected glyph, whereas others are a single character with its own code point.
Standard ligatures might include fi, fl, ff, ffi, ffl, and ft. The purpose of these ligatures is to make letter parts that tend to collide with each other look more attractive.
Here are some individual Unicode ligature characters:
- ӕ: CYRILLIC SMALL LIGATURE A IE; Unicode: U+04D5; UTF-8: D3 95
- ﬂ: LATIN SMALL LIGATURE FL; Unicode: U+FB02; UTF-8: EF AC 82
- ﬁ: LATIN SMALL LIGATURE FI; Unicode: U+FB01; UTF-8: EF AC 81
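The UTF-8 byte sequences in this list can be checked in code. The following is a minimal sketch that encodes a string as UTF-8 and formats the bytes as hexadecimal; the helper name GRUTF8HexString is hypothetical.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: formats the UTF-8 encoding of `text` as
// space-separated uppercase hex bytes.
static NSString *GRUTF8HexString(NSString *text) {
    NSData *data = [text dataUsingEncoding:NSUTF8StringEncoding];
    const uint8_t *bytes = data.bytes;
    NSMutableArray<NSString *> *parts = [NSMutableArray arrayWithCapacity:data.length];
    for (NSUInteger i = 0; i < data.length; i++) {
        // Two hex digits per byte, zero-padded.
        [parts addObject:[NSString stringWithFormat:@"%02X", (unsigned int)bytes[i]]];
    }
    return [parts componentsJoinedByString:@" "];
}
```

For example, GRUTF8HexString(@"\uFB01") yields “EF AC 81” and GRUTF8HexString(@"\u04D5") yields “D3 95,” matching the values listed above.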
Code Snippet to Compare Ligatures
Listing 2.7 compares the single-character ligature “ﬀ” (U+FB00) to the two-character equivalent “ff.” The localizedCompare: method returns an NSComparisonResult value, which is one of the enum values NSOrderedAscending, NSOrderedSame, or NSOrderedDescending.
Listing 2.7 Using Localized Compare to Determine Whether Characters Are Equal
NSString *characters = @"ff";     // Two "f" characters
NSString *ligature = @"\uFB00";   // Single character - "ff" ligature
NSComparisonResult result = [characters localizedCompare:ligature];
if (result == NSOrderedSame) {
    NSLog(@"%@ is equal to %@", characters, ligature);
} else {
    NSLog(@"Characters are not equal.");
}
The code logs “ff is equal to ﬀ.”
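Listing 2.7 relies on localized comparison to treat the ligature and the letter pair as equal. A literal comparison sees two different strings, because U+FB00 is one character and “ff” is two. The following is a minimal sketch, assuming that compatibility normalization (form KC) is an acceptable notion of equality for your data; the helper names GRLiterallyEqual and GRCompatibilityEqual are hypothetical.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: YES only when both strings contain exactly the same
// sequence of code units.
static BOOL GRLiterallyEqual(NSString *a, NSString *b) {
    return [a isEqualToString:b];
}

// Hypothetical helper: YES when the strings match after compatibility
// normalization, which expands ligature characters such as U+FB00 to "ff".
static BOOL GRCompatibilityEqual(NSString *a, NSString *b) {
    NSString *foldedA = [a precomposedStringWithCompatibilityMapping];
    NSString *foldedB = [b precomposedStringWithCompatibilityMapping];
    return [foldedA isEqualToString:foldedB];
}
```

With these helpers, GRLiterallyEqual(@"ff", @"\uFB00") returns NO, whereas GRCompatibilityEqual(@"ff", @"\uFB00") returns YES. Unlike localizedCompare:, the normalization approach does not depend on the user’s locale.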