Convert latin1 to UTF-8 in MySQL

Since MySQL 4.1, UTF-8 is the default charset. If you have an old database, containing data encoded in latin1, and you want upgrade to a newer MySQL server, then you can do the following if you want all your data in UTF-8 instead:

Be careful when switching to UTF-8. Once you have converted your data, any program/webapp that uses the database will have to check that the data they are sending to the database is valid UTF-8. If it isn't then MySQL will silently truncate the data after the invalid part, which can cause all sorts of problems. If your program/webapp doesn't specifically say that it supports unicode then you may want to stick with latin1 instead.

MySQL dump
First of all, we need to dump the old data into a file.

Please note: You have to replace the user, the host and the dbname, otherwise it won't connect to your database.

Convert dump
Before the next step, it'd be useful to run the following command to check the current charset in your dump file

If your file is already in UTF-8, you may need to skip the following iconv command.

Next thing to do is, converting the characters in the MySQL dump from latin1 to UTF-8

If you see after your conversion, that umlauts in your database are converted correctly, but that the sign ß and € are broken, you might get it working by using -f CP1252 instead of -f ISO8859-1 in this command.

There may be other places in your database, where latin-1 character set is used. For example, there may be lines in your dump similar to:

The best is to either grep your dump against the word "latin1" or look for those lines in some text editor (don't forget to use case-insensitive search).

If you have another source charset, you need to replace the -f option with your local character set.

Drop and create
Now it's time to drop the old database and create a new one with UTF-8 support.

(MySql seems to recommend utf8_unicode_ci over utf8_general_ci for 5.1 +, see http://dev.mysql.com/doc/refman/5.1/en/charset-unicode-sets.html)

Import dump to database
Last but not least, we need to import the converted data back to the new database.

The max_allowed_packet option is sometimes important. If your import ends up with a "ERROR 1153 at line 42: Got a packet bigger than 'max_allowed_packet'", you need to increase the packet size. To do this, edit and set  under the [mysqld] directive. This is because utf-8 coded characters will take more space than latin-1.

Dont forget to restart your mysql server.

Alternative Method
Instead of using the above method of dumping and reloading your data, you can use the ALTER TABLE/DATABASE SQL statements to convert your data.

For each table you want to convert to utf8, use the following statement.

This will automatically convert all text columns to utf8.

To set the default character set for a database, use the following statement.

This does not affect any existing tables, but any future tables created in this database will use utf8 by default.

Note this warning from: http://dev.mysql.com/doc/refman/5.0/en/alter-table.html

"The CONVERT TO operation converts column values between the character sets. This is not what you want if you have a column in one character set (like latin1) but the stored values actually use some other, incompatible character set (like utf8). In this case, you have to do the following for each such column:

The reason this works is that there is no conversion when you convert to or from BLOB columns."

Alternative Method 2
When the data has been double-encoded, none of the methods above will work. You will still have funny looking characters in the database. Someone provided the following solution on his blog and it works very well

The blog post can be found here: http://blog.hno3.org/2010/04/22/fixing-double-encoded-utf-8-data-in-mysql/