- Character Sets and Collations in General
- Character Sets and Collations in MySQL
- Determining the Default Character Set and Collation
- Operations Affected by Character Set Support
- Unicode Support
- UTF8 for Metadata
- Compatibility with Other DBMSs
- New Character Set Configuration File Format
- National Character Set
- Upgrading Character Sets from MySQL 4.0
- Character Sets and Collations That MySQL Supports
3.2 Character Sets and Collations in MySQL
The MySQL server can support multiple character sets. To list the available character sets, use the SHOW CHARACTER SET statement:
mysql> SHOW CHARACTER SET;
+----------+-----------------------------+---------------------+
| Charset  | Description                 | Default collation   |
+----------+-----------------------------+---------------------+
| big5     | Big5 Traditional Chinese    | big5_chinese_ci     |
| dec8     | DEC West European           | dec8_swedish_ci     |
| cp850    | DOS West European           | cp850_general_ci    |
| hp8      | HP West European            | hp8_english_ci      |
| koi8r    | KOI8-R Relcom Russian       | koi8r_general_ci    |
| latin1   | ISO 8859-1 West European    | latin1_swedish_ci   |
| latin2   | ISO 8859-2 Central European | latin2_general_ci   |
...
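To use this listing in a program, one option is to capture the client's table output as text and parse it. The following is a minimal Python sketch, assuming the text has the boxed layout shown above and that no cell value contains a "|" character. The helper names split_row and parse_mysql_table are hypothetical. The sketch builds a mapping from each character set to its default collation, using a few of the rows above.

```python
# A few rows copied from the SHOW CHARACTER SET output above.
SAMPLE = """\
+----------+-----------------------------+---------------------+
| Charset  | Description                 | Default collation   |
+----------+-----------------------------+---------------------+
| big5     | Big5 Traditional Chinese    | big5_chinese_ci     |
| latin1   | ISO 8859-1 West European    | latin1_swedish_ci   |
| latin2   | ISO 8859-2 Central European | latin2_general_ci   |
+----------+-----------------------------+---------------------+
"""


def split_row(line):
    """Split one '| a | b |' line into its whitespace-stripped cell values."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def parse_mysql_table(text):
    """Turn boxed table output into a list of dicts keyed by column name.

    Border lines (+---+) and any other line not starting with '|' are skipped.
    Assumes no cell value itself contains a '|' character.
    """
    lines = [line for line in text.splitlines() if line.startswith("|")]
    if not lines:
        return []
    header = split_row(lines[0])
    return [dict(zip(header, split_row(line))) for line in lines[1:]]


rows = parse_mysql_table(SAMPLE)
# Map each character set to its default collation.
default_collation = {row["Charset"]: row["Default collation"] for row in rows}
print(default_collation["latin1"])  # latin1_swedish_ci
```

If you control how the client is invoked, its --batch option prints tab-separated rows, which are simpler to split than the boxed layout.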
The actual output includes a fourth column, Maxlen (the maximum number of bytes needed to store one character of the set), which is omitted here so that the example fits on the page.
Every character set has at least one collation, and it may have several.
To list the collations for a character set, use the SHOW COLLATION statement. For example, to see the collations for the latin1 ("ISO 8859-1 West European") character set, use a LIKE pattern that matches the collation names beginning with latin1:
mysql> SHOW COLLATION LIKE 'latin1%';
+-------------------+---------+----+---------+----------+---------+
| Collation         | Charset | Id | Default | Compiled | Sortlen |
+-------------------+---------+----+---------+----------+---------+
| latin1_german1_ci | latin1  |  5 |         |          |       0 |
| latin1_swedish_ci | latin1  |  8 | Yes     | Yes      |       1 |
| latin1_danish_ci  | latin1  | 15 |         |          |       0 |
| latin1_german2_ci | latin1  | 31 |         | Yes      |       2 |
| latin1_bin        | latin1  | 47 |         | Yes      |       1 |
| latin1_general_ci | latin1  | 48 |         |          |       0 |
| latin1_general_cs | latin1  | 49 |         |          |       0 |
| latin1_spanish_ci | latin1  | 94 |         |          |       0 |
+-------------------+---------+----+---------+----------+---------+
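Because a character set may have several collations but exactly one default, a program that reads SHOW COLLATION results usually groups them by the Charset column. The following is a minimal Python sketch, assuming the rows are already available as (Collation, Charset, Id, Default) tuples copied from the output above. The function name summarize is hypothetical.

```python
from collections import defaultdict

# (Collation, Charset, Id, Default) rows copied from the SHOW COLLATION output above.
ROWS = [
    ("latin1_german1_ci", "latin1", 5, ""),
    ("latin1_swedish_ci", "latin1", 8, "Yes"),
    ("latin1_danish_ci", "latin1", 15, ""),
    ("latin1_german2_ci", "latin1", 31, ""),
    ("latin1_bin", "latin1", 47, ""),
    ("latin1_general_ci", "latin1", 48, ""),
    ("latin1_general_cs", "latin1", 49, ""),
    ("latin1_spanish_ci", "latin1", 94, ""),
]


def summarize(rows):
    """Group collation names by character set and record each set's default."""
    collations = defaultdict(list)
    defaults = {}
    for name, charset, _id, is_default in rows:
        collations[charset].append(name)
        if is_default == "Yes":
            defaults[charset] = name
    return dict(collations), defaults


collations, defaults = summarize(ROWS)
print(len(collations["latin1"]), "collations; default:", defaults["latin1"])
```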
The latin1 collations have the following meanings:
Collation         | Meaning
------------------+------------------------------------
latin1_bin        | Binary according to latin1 encoding
latin1_danish_ci  | Danish/Norwegian
latin1_general_ci | Multilingual
latin1_general_cs | Multilingual, case sensitive
latin1_german1_ci | German DIN-1
latin1_german2_ci | German DIN-2
latin1_spanish_ci | Modern Spanish
latin1_swedish_ci | Swedish/Finnish
Collations have these general characteristics:
- Two different character sets cannot have the same collation.
- Each character set has one collation that is the default collation. For example, the default collation for latin1 is latin1_swedish_ci.
- There is a convention for collation names: They start with the name of the character set with which they are associated, they usually include a language name, and they end with _ci (case insensitive), _cs (case sensitive), _bin (binary), or _uca (Unicode Collation Algorithm, http://www.unicode.org/reports/tr10/).
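Because of the naming convention, a collation name describes itself. The following is a minimal Python sketch that splits a name into its character set, an optional language part, and the meaning of its suffix. It assumes that the character set name itself contains no underscore, which holds for the latin1 collations shown above. describe_collation is a hypothetical helper, not a MySQL API. The loop also checks that each name's prefix matches its Charset column, which is consistent with the rule that two character sets cannot share a collation.

```python
# Suffix meanings as listed in the naming convention above.
SUFFIX_MEANINGS = {
    "ci": "case insensitive",
    "cs": "case sensitive",
    "bin": "binary",
    "uca": "Unicode Collation Algorithm",
}


def describe_collation(name):
    """Split a collation name into (charset, language or None, suffix meaning).

    Assumes the character set name contains no underscore.
    """
    parts = name.split("_")
    if len(parts) < 2 or parts[-1] not in SUFFIX_MEANINGS:
        raise ValueError(f"{name!r} does not follow the naming convention")
    charset = parts[0]
    # Names such as latin1_bin have no language part between prefix and suffix.
    language = "_".join(parts[1:-1]) or None
    return charset, language, SUFFIX_MEANINGS[parts[-1]]


# (Collation, Charset) pairs taken from the SHOW COLLATION output above.
ROWS = [
    ("latin1_german1_ci", "latin1"),
    ("latin1_german2_ci", "latin1"),
    ("latin1_general_cs", "latin1"),
    ("latin1_bin", "latin1"),
]

for collation, charset in ROWS:
    # The name's prefix identifies exactly one character set.
    assert describe_collation(collation)[0] == charset
    print(collation, describe_collation(collation))
```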