Summary and additional reference
Enabling your product for the world market makes economic sense, and the steps above show that the process is relatively straightforward. Now here's that quiz we mentioned in the introduction:
True or False: The majority of IBM's worldwide software sales revenue is within the United States.
False. Indeed, more than 50% of IBM software revenue comes from outside the United States.
Fortunately, those developers with products based on the Eclipse platform benefit from having ready translations of the base product. All that is left is to follow the clear steps outlined in this article to open your Eclipse-based product to a worldwide market!
Eclipse-specific (non-Java) translatable resources
Here is a summary of the previously presented list of translatable resources, along with a brief explanation of how they are processed.
Table 2. Eclipse-specific (non-Java) translatable resources
Translated items | Required or optional | High-level steps
Plug-in files | Required |
Plug-in "About" file | Optional |
Online help | Required |
Splash* | Optional | To localize the splash screen, create locale subdirectories under eclipse/splash. The names of these directories follow the standard Java locale-naming conventions. For example, the platform looks up the splash screen for the US English locale (en_US) as follows:
Product configuration* | Optional |
Plug-in product files* | Required |
License* | Optional |
For more information on translatable resources, see the article "Creating Product Branding" by Greg Adams on the eclipse.org Web site (see Resources).
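The locale-subdirectory convention described for the splash screen can be sketched in Java. This is a minimal illustration, assuming the usual Java fallback from the most specific locale (language plus country) to the language alone and then to the unlocalized base directory. The class and method names are hypothetical and are not part of the Eclipse API, and the sketch does not reproduce the platform's actual lookup code or file names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative sketch only: SplashLocaleLookup and candidateDirs are
// hypothetical names, not Eclipse API.
class SplashLocaleLookup {

    // Lists candidate directories from most specific to least specific,
    // assuming the standard Java locale-naming fallback.
    static List<String> candidateDirs(String baseDir, Locale locale) {
        List<String> dirs = new ArrayList<>();
        String language = locale.getLanguage();
        String country = locale.getCountry();
        if (!language.isEmpty() && !country.isEmpty()) {
            dirs.add(baseDir + "/" + language + "_" + country);
        }
        if (!language.isEmpty()) {
            dirs.add(baseDir + "/" + language);
        }
        dirs.add(baseDir); // unlocalized default
        return dirs;
    }

    public static void main(String[] args) {
        // Print the candidates for the US English locale (en_US).
        for (String dir : candidateDirs("eclipse/splash", Locale.US)) {
            System.out.println(dir);
        }
    }
}
```

Keeping the fallback order in one small method makes it easy to check which directory a given locale would resolve to before you lay out the translated files.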
Unicode codepoints of common accented Latin characters
Table 3. Unicode codepoints of common accented Latin characters
Characters |
\u00e0 | a grave
\u00e1 | a acute
\u00c0 | A grave
\u00c1 | A acute
\u00c2 | A circumflex
\u00e2 | a circumflex
\u00c3 | A tilde
\u00e4 | a dieresis
\u00c4 | A dieresis
\u00e8 | e grave
\u00c8 | E grave
\u00e9 | e acute
\u00c9 | E acute
\u00ea | e circumflex
\u00eb | e dieresis
\u00cb | E dieresis
\u00ca | E circumflex
\u00ef | i dieresis
\u00ec | i grave
\u00ed | i acute
\u00cc | I grave
\u00cd | I acute
\u00ee | i circumflex
\u00ce | I circumflex
\u00f6 | o dieresis
\u00d6 | O dieresis
\u00e3 | a tilde
\u00f4 | o circumflex
\u00d4 | O circumflex
\u00f2 | o grave
\u00d2 | O grave
\u00f3 | o acute
\u00d3 | O acute
\u00f5 | o tilde
\u00d5 | O tilde
\u00f1 | n tilde
\u00d1 | N tilde
\u00f9 | u grave
\u00d9 | U grave
\u00fa | u acute
\u00da | U acute
\u00fb | u circumflex
\u00db | U circumflex
\u00fc | u dieresis
\u00dc | U dieresis
\u00df | s sharp
Special symbols |
\u00ba | masculine ordinal indicator
\u00a7 | section sign
\u00aa | feminine ordinal indicator
\u00ac | not sign
\u00b9 | 1 superscript
\u00b2 | 2 superscript
\u00b3 | 3 superscript
\u00a3 | pound sign
\u00a2 | cent sign
\u00b0 | degree sign
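The escapes in Table 3 can also be generated rather than looked up by hand. The following is a minimal sketch, assuming you want every character outside 7-bit ASCII written as a backslash, the letter u, and four lowercase hexadecimal digits, which is the notation the table uses. The class and method names are hypothetical.

```java
// Illustrative sketch only: UnicodeEscaper is a hypothetical helper.
class UnicodeEscaper {

    // Copies ASCII characters unchanged and rewrites every other character
    // as a backslash, the letter u, and its four-digit hexadecimal codepoint.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The input contains e acute and c cedilla, written here as escapes
        // so that the source file itself stays pure ASCII.
        System.out.println(escapeNonAscii("Caf\u00e9 fran\u00e7ais"));
    }
}
```

Because the method works one Java char at a time, a character outside the Basic Multilingual Plane comes out as two escapes, one for each half of its surrogate pair.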
Glossary
Codepoint
Characters can be represented by one or more bytes of information. Codepoints are the hexadecimal values assigned to each graphic character.
Codepage
A codepage is a specification of codepoints for each graphic character in a set, or in a collection of graphic character sets. Within a given codepage, a codepoint can have only one specific meaning. You can display the active codepage on the Windows® operating system with the CHCP command (only one codepage is active at any given moment).
Encoding
The codepage associated with a given piece of data. A file is said to be "encoded" in a given codepage; for example, Notepad on a US-English Windows machine has traditionally saved its data in the ANSI codepage 1252 by default (437 is the OEM codepage that CHCP reports in a console window). The Save As dialog lets the user select several other encodings, Unicode and UTF-8 most notable among them.
Internationalization (sometimes abbreviated "I18N")
Internationalization refers to the process of developing programs without prior knowledge of the language, cultural data, or character encoding schemes they are expected to handle. In system terms, it refers to the provision of interfaces that enable internationalized programs to change their behavior at run time for specific language operation.
Single-Byte Coded Character Set (SBCS)
In a single-byte coded character set, a one-byte codepoint represents each character in the set. Typically, SBCS is used to represent the characters of the English language, the European languages, the Cyrillic languages, the Arabic language, and the Hebrew language, to name a few.
Double-Byte Coded Character Set (DBCS)
In a double-byte coded character set, a two-byte codepoint represents each character in the set. Languages that are ideographic in nature, such as Japanese, Chinese, and Korean, have more characters than can be represented internally by 256 codepoints and thus require double-byte coded character sets.
Localization (sometimes abbreviated "L10N")
Localization refers to the process of establishing information within a computer system specific to each supported language, cultural data, and coded character set combination.
Mixed-Byte Coded Character Set (MBCS)
A mixed-byte coded character set is a set of characters containing both single-byte characters and double-byte characters. In an MBCS, each byte of data must be examined to see whether it is a single-byte character or the first byte of a double-byte character. If the byte is in a certain range (greater than X'80', for example), then it is the first byte of a double-byte character.
NLS
National Language Support.
Unicode
Directly from http://www.unicode.org/: "Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language."
NOTE
While it is true that Java text manipulation classes are Unicode-centric, this is often not the case for data stored outside your program's control. Java programmers must take the data's encoding into consideration and perform local codepage-to-Unicode transformations where necessary.
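The codepage-to-Unicode transformation mentioned in the note can be sketched as follows. This is a minimal illustration, assuming the external data is known to be ISO-8859-1; the class and method names are hypothetical. It decodes the same byte twice to show that naming the wrong encoding silently produces a different character.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only: CodepageDecoding is a hypothetical helper.
class CodepageDecoding {

    // Converts raw bytes to a Java (Unicode) string using an explicitly named
    // source encoding instead of relying on the platform default.
    static String decode(byte[] data, Charset sourceEncoding) {
        return new String(data, sourceEncoding);
    }

    public static void main(String[] args) {
        // A single byte, X'E9', which is e acute in ISO-8859-1.
        byte[] raw = { (byte) 0xE9 };

        String right = decode(raw, StandardCharsets.ISO_8859_1);
        String wrong = decode(raw, StandardCharsets.UTF_8);

        // Print codepoints in hex so the result does not depend on the
        // console's own encoding.
        System.out.println(Integer.toHexString(right.codePointAt(0)));
        System.out.println(Integer.toHexString(wrong.codePointAt(0)));
    }
}
```

Passing the charset explicitly keeps the behavior the same on every machine; constructors and readers that omit it fall back to the platform default, which may differ between the machine that wrote the data and the one reading it.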