3.11 Character Sets and Collations That MySQL Supports
Here is an annotated list of character sets and collations that MySQL supports. Because options and installation settings differ, some sites might not have all items listed, and some sites might have items not listed.
MySQL supports 70+ collations for 30+ character sets. The SHOW CHARACTER SET statement displays the character sets and their default collations. (The actual output also includes a Maxlen column, which is omitted here so that the example fits better on the page.)
mysql> SHOW CHARACTER SET;
+----------+-----------------------------+---------------------+
| Charset  | Description                 | Default collation   |
+----------+-----------------------------+---------------------+
| big5     | Big5 Traditional Chinese    | big5_chinese_ci     |
| dec8     | DEC West European           | dec8_swedish_ci     |
| cp850    | DOS West European           | cp850_general_ci    |
| hp8      | HP West European            | hp8_english_ci      |
| koi8r    | KOI8-R Relcom Russian       | koi8r_general_ci    |
| latin1   | ISO 8859-1 West European    | latin1_swedish_ci   |
| latin2   | ISO 8859-2 Central European | latin2_general_ci   |
| swe7     | 7bit Swedish                | swe7_swedish_ci     |
| ascii    | US ASCII                    | ascii_general_ci    |
| ujis     | EUC-JP Japanese             | ujis_japanese_ci    |
| sjis     | Shift-JIS Japanese          | sjis_japanese_ci    |
| cp1251   | Windows Cyrillic            | cp1251_bulgarian_ci |
| hebrew   | ISO 8859-8 Hebrew           | hebrew_general_ci   |
| tis620   | TIS620 Thai                 | tis620_thai_ci      |
| euckr    | EUC-KR Korean               | euckr_korean_ci     |
| koi8u    | KOI8-U Ukrainian            | koi8u_general_ci    |
| gb2312   | GB2312 Simplified Chinese   | gb2312_chinese_ci   |
| greek    | ISO 8859-7 Greek            | greek_general_ci    |
| cp1250   | Windows Central European    | cp1250_general_ci   |
| gbk      | GBK Simplified Chinese      | gbk_chinese_ci      |
| latin5   | ISO 8859-9 Turkish          | latin5_turkish_ci   |
| armscii8 | ARMSCII-8 Armenian          | armscii8_general_ci |
| utf8     | UTF-8 Unicode               | utf8_general_ci     |
| ucs2     | UCS-2 Unicode               | ucs2_general_ci     |
| cp866    | DOS Russian                 | cp866_general_ci    |
| keybcs2  | DOS Kamenicky Czech-Slovak  | keybcs2_general_ci  |
| macce    | Mac Central European        | macce_general_ci    |
| macroman | Mac West European           | macroman_general_ci |
| cp852    | DOS Central European        | cp852_general_ci    |
| latin7   | ISO 8859-13 Baltic          | latin7_general_ci   |
| cp1256   | Windows Arabic              | cp1256_general_ci   |
| cp1257   | Windows Baltic              | cp1257_general_ci   |
| binary   | Binary pseudo charset       | binary              |
| geostd8  | GEOSTD8 Georgian            | geostd8_general_ci  |
+----------+-----------------------------+---------------------+
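Because the available character sets differ between installations, it can be more reliable to ask the server than to rely on a printed list. The following is a minimal sketch, assuming a server that supports the SHOW CHARACTER SET and SHOW COLLATION statements with a LIKE clause; the patterns are only illustrative.

```sql
-- List only the character sets whose names start with "latin".
SHOW CHARACTER SET LIKE 'latin%';

-- List every collation available for one character set (here latin1).
-- The output indicates which collation is the default for the set.
SHOW COLLATION LIKE 'latin1%';
```

If a statement returns no rows, the character set or collation was probably not included when that server was built.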
3.11.1 Unicode Character Sets
MySQL has two Unicode character sets. With these character sets, you can store text in about 650 languages. We have not yet added many collations for these two new sets, although more are planned. Currently, each set has a default case-insensitive, accent-insensitive collation, plus the binary collation.
The ucs2_general_uca collation currently has only partial support for the Unicode Collation Algorithm; some characters are not yet supported.
- ucs2 (UCS-2 Unicode) collations:
  - ucs2_bin
  - ucs2_general_ci (default)
  - ucs2_general_uca
- utf8 (UTF-8 Unicode) collations:
  - utf8_bin
  - utf8_general_ci (default)
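A character set and collation from the list above can be named explicitly in a column definition. The sketch below contrasts the default case-insensitive collation with the binary collation. The table and column names are hypothetical and chosen only for illustration.

```sql
-- Hypothetical table; the names are illustrative only.
CREATE TABLE unicode_demo (
    title VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_general_ci,
    code  VARCHAR(20)  CHARACTER SET utf8 COLLATE utf8_bin
);

INSERT INTO unicode_demo (title, code) VALUES ('abc', 'abc');

-- utf8_general_ci is case-insensitive, so this comparison can match 'abc'.
SELECT * FROM unicode_demo WHERE title = 'ABC';

-- utf8_bin compares by the binary value of each character,
-- so 'ABC' is not expected to match 'abc' here.
SELECT * FROM unicode_demo WHERE code = 'ABC';
```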
3.11.2 West European Character Sets
West European character sets cover most West European languages, such as French, Spanish, Catalan, Basque, Portuguese, Italian, Albanian, Dutch, German, Danish, Swedish, Norwegian, Finnish, Faroese, Icelandic, Irish, Scottish, and English.
- ascii (US ASCII) collations:
  - ascii_bin
  - ascii_general_ci (default)
- cp850 (DOS West European) collations:
  - cp850_bin
  - cp850_general_ci (default)
- dec8 (DEC West European) collations:
  - dec8_bin
  - dec8_swedish_ci (default)
- hp8 (HP West European) collations:
  - hp8_bin
  - hp8_english_ci (default)
- latin1 (ISO 8859-1 West European) collations:
  - latin1_bin
  - latin1_danish_ci
  - latin1_general_ci
  - latin1_general_cs
  - latin1_german1_ci
  - latin1_german2_ci
  - latin1_spanish_ci
  - latin1_swedish_ci (default)
- macroman (Mac West European) collations:
  - macroman_bin
  - macroman_general_ci (default)
- swe7 (7-bit Swedish) collations:
  - swe7_bin
  - swe7_swedish_ci (default)
latin1 is the default character set, and latin1_swedish_ci is its default collation, so it is probably the collation used by the majority of MySQL customers. It is often stated that this collation is based on the Swedish/Finnish collation rules, but you will find Swedes and Finns who disagree with that statement.
The latin1_german1_ci and latin1_german2_ci collations are based on the DIN-1 and DIN-2 standards, where DIN stands for Deutsches Institut für Normung (that is, the German answer to ANSI). DIN-1 is called the dictionary collation and DIN-2 is called the phone-book collation.
- latin1_german1_ci (dictionary) rules:
  'Ä' = 'A', 'Ö' = 'O', 'Ü' = 'U', 'ß' = 's'
- latin1_german2_ci (phone-book) rules:
  'Ä' = 'AE', 'Ö' = 'OE', 'Ü' = 'UE', 'ß' = 'ss'
In the latin1_spanish_ci collation, 'Ñ' (N-tilde) is a separate letter between 'N' and 'O'.
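The German and Spanish rules above can be checked with explicit COLLATE clauses. This is a sketch, assuming the client connection can transmit the accented characters correctly (for example, a UTF-8 connection); CONVERT ... USING is used so that both operands are latin1 strings. The table name words is hypothetical.

```sql
-- Dictionary rules (DIN-1): 'Ä' is compared as 'A'.
SELECT CONVERT('Ä' USING latin1) COLLATE latin1_german1_ci
     = CONVERT('A' USING latin1) AS din1_a;

-- Phone-book rules (DIN-2): 'Ä' is compared as 'AE'.
SELECT CONVERT('Ä' USING latin1) COLLATE latin1_german2_ci
     = CONVERT('AE' USING latin1) AS din2_ae;

-- Phone-book rules (DIN-2): 'ß' is compared as 'ss'.
SELECT CONVERT('ß' USING latin1) COLLATE latin1_german2_ci
     = CONVERT('ss' USING latin1) AS din2_ss;

-- Hypothetical table used to observe where 'Ñ' sorts in Spanish.
CREATE TABLE words (name VARCHAR(20) CHARACTER SET latin1);
INSERT INTO words (name) VALUES ('O'), ('Ñ'), ('N');

-- With latin1_spanish_ci, 'Ñ' should sort between 'N' and 'O'.
SELECT name FROM words ORDER BY name COLLATE latin1_spanish_ci;
```

If the rules listed above apply on your server, each of the three comparisons should return 1.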
3.11.3 Central European Character Sets
We have some support for character sets used in the Czech Republic, Slovakia, Hungary, Romania, Slovenia, Croatia, and Poland.
- cp1250 (Windows Central European) collations:
  - cp1250_bin
  - cp1250_czech_ci
  - cp1250_general_ci (default)
- cp852 (DOS Central European) collations:
  - cp852_bin
  - cp852_general_ci (default)
- keybcs2 (DOS Kamenicky Czech-Slovak) collations:
  - keybcs2_bin
  - keybcs2_general_ci (default)
- latin2 (ISO 8859-2 Central European) collations:
  - latin2_bin
  - latin2_croatian_ci
  - latin2_czech_ci
  - latin2_general_ci (default)
  - latin2_hungarian_ci
- macce (Mac Central European) collations:
  - macce_bin
  - macce_general_ci (default)
3.11.4 South European and Middle East Character Sets
- armscii8 (ARMSCII-8 Armenian) collations:
  - armscii8_bin
  - armscii8_general_ci (default)
- cp1256 (Windows Arabic) collations:
  - cp1256_bin
  - cp1256_general_ci (default)
- geostd8 (GEOSTD8 Georgian) collations:
  - geostd8_bin
  - geostd8_general_ci (default)
- greek (ISO 8859-7 Greek) collations:
  - greek_bin
  - greek_general_ci (default)
- hebrew (ISO 8859-8 Hebrew) collations:
  - hebrew_bin
  - hebrew_general_ci (default)
- latin5 (ISO 8859-9 Turkish) collations:
  - latin5_bin
  - latin5_turkish_ci (default)
3.11.5 Baltic Character Sets
The Baltic character sets cover the Estonian, Latvian, and Lithuanian languages. Two Baltic character sets are currently supported:
- cp1257 (Windows Baltic) collations:
  - cp1257_bin
  - cp1257_general_ci (default)
  - cp1257_lithuanian_ci
- latin7 (ISO 8859-13 Baltic) collations:
  - latin7_bin
  - latin7_estonian_cs
  - latin7_general_ci (default)
  - latin7_general_cs
3.11.6 Cyrillic Character Sets
Here are the Cyrillic character sets and collations for use with the Belarusian, Bulgarian, Russian, and Ukrainian languages.
- cp1251 (Windows Cyrillic) collations:
  - cp1251_bin
  - cp1251_bulgarian_ci
  - cp1251_general_ci (default)
  - cp1251_general_cs
  - cp1251_ukrainian_ci
- cp866 (DOS Russian) collations:
  - cp866_bin
  - cp866_general_ci (default)
- koi8r (KOI8-R Relcom Russian) collations:
  - koi8r_bin
  - koi8r_general_ci (default)
- koi8u (KOI8-U Ukrainian) collations:
  - koi8u_bin
  - koi8u_general_ci (default)
3.11.7 Asian Character Sets
The Asian character sets that we support include Chinese, Japanese, Korean, and Thai. These can be complicated. For example, the Chinese sets must allow for thousands of different characters.
- big5 (Big5 Traditional Chinese) collations:
  - big5_bin
  - big5_chinese_ci (default)
- euckr (EUC-KR Korean) collations:
  - euckr_bin
  - euckr_korean_ci (default)
- gb2312 (GB2312 Simplified Chinese) collations:
  - gb2312_bin
  - gb2312_chinese_ci (default)
- gbk (GBK Simplified Chinese) collations:
  - gbk_bin
  - gbk_chinese_ci (default)
- sjis (Shift-JIS Japanese) collations:
  - sjis_bin
  - sjis_japanese_ci (default)
- tis620 (TIS620 Thai) collations:
  - tis620_bin
  - tis620_thai_ci (default)
- ujis (EUC-JP Japanese) collations:
  - ujis_bin
  - ujis_japanese_ci (default)
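One practical consequence of needing thousands of characters is that these character sets use more than one byte per character. The sketch below compares character length with byte length for a single Chinese character. It assumes the client connection can transmit the character (for example, a UTF-8 connection) and that the big5 character set is available on the server.

```sql
-- CHAR_LENGTH() counts characters; LENGTH() counts bytes.
-- For a multi-byte character set such as big5, the byte count
-- of a Chinese character is expected to exceed the character count.
SELECT CHAR_LENGTH(CONVERT('中' USING big5)) AS chars,
       LENGTH(CONVERT('中' USING big5))      AS bytes;
```

The Maxlen column in the output of SHOW CHARACTER SET reports the maximum number of bytes that one character can occupy in each character set.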