3.10 Upgrading Character Sets from MySQL 4.0
Now, what about upgrading from older versions of MySQL? MySQL 4.1 is almost upward compatible with MySQL 4.0 and earlier for the simple reason that almost all the features are new, so there's nothing in earlier versions to conflict with. However, there are some differences and a few things to be aware of.
Most important: The "MySQL 4.0 character set" has the properties of both "MySQL 4.1 character sets" and "MySQL 4.1 collations." You will have to unlearn this. Henceforth, we will not bundle character set/collation properties in the same conglomerate object.
There is a special treatment of national character sets in MySQL 4.1. NCHAR is not the same as CHAR, and N'...' literals are not the same as '...' literals.
Finally, there is a different file format for storing information about character sets and collations. Make sure that you have reinstalled the /share/mysql/charsets/ directory containing the new configuration files.
If you want to start mysqld from a 4.1.x distribution with data created by MySQL 4.0, you should start the server with the same character set and collation that the 4.0 server used. In this case you won't need to reindex your data.
There are two ways to do so: set the defaults when you configure the build, or pass them as options when you start the server:
shell> ./configure --with-charset=... --with-collation=...
shell> ./mysqld --default-character-set=... --default-collation=...
If you used mysqld with, for example, the MySQL 4.0 danish character set, you should now use the latin1 character set and the latin1_danish_ci collation:
shell> ./configure --with-charset=latin1 \
           --with-collation=latin1_danish_ci
shell> ./mysqld --default-character-set=latin1 \
           --default-collation=latin1_danish_ci
Use the table shown in Section 3.10.1, "4.0 Character Sets and Corresponding 4.1 Character Set/Collation Pairs," to find old 4.0 character set names and their 4.1 character set/collation pair equivalents.
If you have non-latin1 data stored in a 4.0 latin1 table and want to convert the table column definitions to reflect the actual character set of the data, use the instructions in Section 3.10.2, "Converting 4.0 Character Columns to 4.1 Format."
3.10.1 4.0 Character Sets and Corresponding 4.1 Character Set/Collation Pairs
| ID | 4.0 Character Set | 4.1 Character Set | 4.1 Collation |
|----|-------------------|-------------------|---------------|
| 1 | big5 | big5 | big5_chinese_ci |
| 2 | czech | latin2 | latin2_czech_ci |
| 3 | dec8 | dec8 | dec8_swedish_ci |
| 4 | dos | cp850 | cp850_general_ci |
| 5 | german1 | latin1 | latin1_german1_ci |
| 6 | hp8 | hp8 | hp8_english_ci |
| 7 | koi8_ru | koi8r | koi8r_general_ci |
| 8 | latin1 | latin1 | latin1_swedish_ci |
| 9 | latin2 | latin2 | latin2_general_ci |
| 10 | swe7 | swe7 | swe7_swedish_ci |
| 11 | usa7 | ascii | ascii_general_ci |
| 12 | ujis | ujis | ujis_japanese_ci |
| 13 | sjis | sjis | sjis_japanese_ci |
| 14 | cp1251 | cp1251 | cp1251_bulgarian_ci |
| 15 | danish | latin1 | latin1_danish_ci |
| 16 | hebrew | hebrew | hebrew_general_ci |
| 17 | win1251 | (removed) | (removed) |
| 18 | tis620 | tis620 | tis620_thai_ci |
| 19 | euc_kr | euckr | euckr_korean_ci |
| 20 | estonia | latin7 | latin7_estonian_ci |
| 21 | hungarian | latin2 | latin2_hungarian_ci |
| 22 | koi8_ukr | koi8u | koi8u_ukrainian_ci |
| 23 | win1251ukr | cp1251 | cp1251_ukrainian_ci |
| 24 | gb2312 | gb2312 | gb2312_chinese_ci |
| 25 | greek | greek | greek_general_ci |
| 26 | win1250 | cp1250 | cp1250_general_ci |
| 27 | croat | latin2 | latin2_croatian_ci |
| 28 | gbk | gbk | gbk_chinese_ci |
| 29 | cp1257 | cp1257 | cp1257_lithuanian_ci |
| 30 | latin5 | latin5 | latin5_turkish_ci |
| 31 | latin1_de | latin1 | latin1_german2_ci |
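The lookup in the table can be scripted if you have many servers to migrate. The following is only a minimal sketch: the function name charset_opts is hypothetical, and it covers just a handful of the rows above whose names changed between 4.0 and 4.1. It prints the mysqld startup options described in the previous section for a given 4.0 character set name.

```shell
#!/bin/sh
# Hypothetical helper (not part of MySQL): map a 4.0 character set name
# to the 4.1 server options. Only a few table rows are included here;
# extend the case list from the table as needed.
charset_opts() {
    case "$1" in
        danish)    cs=latin1; coll=latin1_danish_ci ;;
        german1)   cs=latin1; coll=latin1_german1_ci ;;
        latin1_de) cs=latin1; coll=latin1_german2_ci ;;
        czech)     cs=latin2; coll=latin2_czech_ci ;;
        hungarian) cs=latin2; coll=latin2_hungarian_ci ;;
        croat)     cs=latin2; coll=latin2_croatian_ci ;;
        usa7)      cs=ascii;  coll=ascii_general_ci ;;
        dos)       cs=cp850;  coll=cp850_general_ci ;;
        *)
            # Name not in this partial list: report it and fail.
            echo "no mapping listed for '$1'" >&2
            return 1 ;;
    esac
    echo "--default-character-set=$cs --default-collation=$coll"
}

charset_opts danish
# prints: --default-character-set=latin1 --default-collation=latin1_danish_ci
```

A name that is missing from the case list makes the function return a nonzero status, so a calling script can stop instead of starting the server with the wrong settings.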
3.10.2 Converting 4.0 Character Columns to 4.1 Format
By default, the server uses the latin1 character set. If you have been storing column data that actually is in some other character set that the 4.1 server now supports directly, you can convert the column. However, avoid converting directly from latin1 to the "real" character set: the server would treat the stored bytes as latin1 characters and translate them, which may result in data loss. Instead, convert the column to a binary column type, and then from the binary type to a non-binary type with the desired character set. Conversion to and from binary involves no attempt at character value conversion and preserves your data intact.
For example, suppose that you have a 4.0 table with three columns that are used to store values represented in latin1, latin2, and utf8:
CREATE TABLE t
(
    latin1_col CHAR(50),
    latin2_col CHAR(100),
    utf8_col CHAR(150)
);
After upgrading to MySQL 4.1, you want to convert this table to leave latin1_col alone but change the latin2_col and utf8_col columns to have character sets of latin2 and utf8. First, back up your table, then convert the columns as follows:
ALTER TABLE t MODIFY latin2_col BINARY(100);
ALTER TABLE t MODIFY utf8_col BINARY(150);
ALTER TABLE t MODIFY latin2_col CHAR(100) CHARACTER SET latin2;
ALTER TABLE t MODIFY utf8_col CHAR(150) CHARACTER SET utf8;
The first two statements "remove" the character set information from the latin2_col and utf8_col columns. The last two statements assign the proper character sets to the two columns.
If you like, you can combine the to-binary conversions and from-binary conversions into single statements:
ALTER TABLE t MODIFY latin2_col BINARY(100),
              MODIFY utf8_col BINARY(150);
ALTER TABLE t MODIFY latin2_col CHAR(100) CHARACTER SET latin2,
              MODIFY utf8_col CHAR(150) CHARACTER SET utf8;
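If many columns need the same treatment, the statements can be generated rather than typed by hand. The sketch below assumes a hypothetical shell function, two_step_alter, that takes a table name, column name, column length, and target character set, and prints the to-binary and from-binary statements for that column. It only prints SQL. Review the output and back up the table before running anything against a server.

```shell
#!/bin/sh
# Hypothetical helper (not part of MySQL): print the two-step conversion
# for one CHAR column. Arguments: table, column, length, character set.
two_step_alter() {
    # Step 1: drop the character set information by going to binary.
    printf 'ALTER TABLE %s MODIFY %s BINARY(%s);\n' "$1" "$2" "$3"
    # Step 2: assign the character set the data is really stored in.
    printf 'ALTER TABLE %s MODIFY %s CHAR(%s) CHARACTER SET %s;\n' \
        "$1" "$2" "$3" "$4"
}

two_step_alter t latin2_col 100 latin2
# prints:
# ALTER TABLE t MODIFY latin2_col BINARY(100);
# ALTER TABLE t MODIFY latin2_col CHAR(100) CHARACTER SET latin2;
```

The generated statements could then be saved to a file and fed to the mysql client once you have checked them.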