The most surprising thing about utf8mb4 is that it’s not just a slightly better utf8: it’s a complete replacement. MySQL’s utf8 (an alias for utf8mb3) stores at most 3 bytes per character, while utf8mb4 is a superset that also covers the 4-byte characters utf8 simply cannot represent.
Let’s see it in action. Imagine you have a MySQL table like this:
CREATE TABLE messages (
    id INT AUTO_INCREMENT PRIMARY KEY,
    content VARCHAR(255)
);
INSERT INTO messages (content) VALUES ('Hello, world!');
INSERT INTO messages (content) VALUES ('This is a test with an emoji: 👍');
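If your messages table and the content column are currently using utf8 (which is actually utf8mb3 in MySQL terms), that emoji will cause trouble. In strict SQL mode (the default in MySQL 5.7 and later), the insert fails with an Incorrect string value error. In non-strict mode it is worse: the insert succeeds with only a warning, and the emoji may be silently dropped or replaced with a question mark (?).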
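Before changing anything, it helps to confirm what the table actually uses. The following is a minimal sketch, assuming the messages table from the example lives in the currently selected database:

```sql
-- Show the full table definition, including any CHARSET/COLLATE clauses.
SHOW CREATE TABLE messages;

-- List the character set and collation of each string column in the table.
-- Non-string columns have a NULL character set, so they are filtered out.
SELECT COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'messages'
  AND CHARACTER_SET_NAME IS NOT NULL;
```

Depending on the server version, a 3-byte column reports its character set as either utf8 or utf8mb3.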
To unlock full Unicode support, including emojis, ancient scripts, and other symbols, you need to migrate to utf8mb4. This involves changing the character set of your database, tables, and specific columns.
Here’s how you do it, step-by-step:
First, check your current default character set and collation:
SHOW VARIABLES LIKE 'character_set_server';
SHOW VARIABLES LIKE 'collation_server';
On an older installation you’ll likely see utf8 and utf8_general_ci, or latin1 and latin1_swedish_ci (MySQL 8.0 already defaults to utf8mb4). We want to change this to utf8mb4 and utf8mb4_unicode_ci.
Next, you need to alter your MySQL server configuration. This is usually done in your my.cnf or my.ini file. Locate the [mysqld] section and add or modify these lines:
[mysqld]
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
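After saving the configuration file, restart your MySQL server. This is a critical step that applies the server-wide defaults. Note that these defaults only affect databases created afterwards; existing databases, tables, and columns keep their current character set.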
Now, you need to alter your existing database(s) to use the new default character set. For each database, run:
ALTER DATABASE your_database_name CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
Replace your_database_name with the actual name of your database.
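To confirm the new database default took effect, you can query information_schema. This is a sketch that reuses the your_database_name placeholder from above:

```sql
-- Show the default character set and collation recorded for the database.
-- New tables created without an explicit CHARSET inherit these values.
SELECT SCHEMA_NAME, DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME
FROM information_schema.SCHEMATA
WHERE SCHEMA_NAME = 'your_database_name';
```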
The most important change is altering your tables. You need to do this for every table that stores character data. A common mistake is assuming that changing the database default is enough; it isn’t, because ALTER DATABASE only sets the default for tables created later and leaves existing tables untouched.
ALTER TABLE your_table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
Again, replace your_table_name with your table’s name. If you have many tables, you can generate this SQL dynamically.
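One way to generate those statements is to build them from information_schema.TABLES. The query below is a sketch, assuming the your_database_name placeholder and the utf8mb4_unicode_ci collation used throughout; it only prints the statements, so you can review them before running anything:

```sql
-- Emit one ALTER TABLE statement per base table that is not yet utf8mb4.
-- Views are excluded because they cannot be converted this way.
SELECT CONCAT(
         'ALTER TABLE `', TABLE_SCHEMA, '`.`', TABLE_NAME,
         '` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;'
       ) AS alter_statement
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'your_database_name'
  AND TABLE_TYPE = 'BASE TABLE'
  AND TABLE_COLLATION NOT LIKE 'utf8mb4%';
```

Each conversion rebuilds the table, so on large tables it is worth running the generated statements one at a time during a quiet period.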
Finally, you might have specific columns that need to be explicitly set, especially if they were created with a different character set before the database default was changed.
ALTER TABLE your_table_name MODIFY your_column_name VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
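CONVERT TO CHARACTER SET already converts every VARCHAR, TEXT, CHAR, and other string-based column in the table, so you only need this for columns that carry their own character set or collation, or in tables you chose not to convert wholesale. Keep in mind that MODIFY replaces the entire column definition, so restate any NOT NULL or DEFAULT clauses the column had.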
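To find out whether any columns were left behind, you can list every string column that is still not utf8mb4. A sketch, again assuming the your_database_name placeholder:

```sql
-- List string columns whose character set is still something other than utf8mb4.
-- An empty result means every character column in the schema has been migrated.
SELECT TABLE_NAME, COLUMN_NAME, COLUMN_TYPE, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'your_database_name'
  AND CHARACTER_SET_NAME IS NOT NULL
  AND CHARACTER_SET_NAME <> 'utf8mb4'
ORDER BY TABLE_NAME, ORDINAL_POSITION;
```

The result can also include columns of views, which follow their underlying tables and do not need a separate ALTER.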
After these changes, try inserting that emoji again:
INSERT INTO messages (content) VALUES ('This is a test with an emoji: 👍');
-- The re-inserted row gets a new id, so fetch the most recent row.
SELECT content FROM messages ORDER BY id DESC LIMIT 1;
You should see the emoji perfectly preserved.
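If your terminal cannot render emoji, the on-screen result can be misleading. A more reliable check is to look at the stored bytes; the sketch below assumes the messages table from the example:

```sql
-- Compare character count and byte count, and show the last character as hex.
-- A correctly stored thumbs-up ends in F09F918D (4 bytes);
-- a row that was mangled before the migration ends in 3F, the byte for '?'.
SELECT id,
       content,
       CHAR_LENGTH(content)   AS chars,
       LENGTH(content)        AS bytes,
       HEX(RIGHT(content, 1)) AS last_char_hex
FROM messages
ORDER BY id;
```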
The key to utf8mb4’s power is its ability to store up to 4 bytes per character, whereas the older utf8 (which MySQL calls utf8mb3) is limited to 3 bytes. That fourth byte is what allows for characters outside the Basic Multilingual Plane (BMP), which is where emojis, rarer CJK ideographs, and many historic scripts and symbols reside. When you convert a table or column, MySQL re-encodes the existing data to the new character set; because every utf8mb3 character is also valid utf8mb4, nothing is lost in the conversion. Characters that were already stored as ? before the migration can’t be recovered, though.
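The byte widths are easy to see for yourself. The sketch below builds four characters from their raw UTF-8 bytes, so it does not depend on your client’s connection character set; the code point labels are only there for readability:

```sql
-- Each row decodes raw UTF-8 bytes as utf8mb4, then reports characters vs bytes.
-- Every row is a single character; the byte count grows from 1 to 4.
SELECT 'U+0041'  AS code_point,                                    -- 'A'
       CHAR_LENGTH(CONVERT(UNHEX('41') USING utf8mb4))       AS chars,
       LENGTH(CONVERT(UNHEX('41') USING utf8mb4))            AS bytes
UNION ALL
SELECT 'U+00E9',                                                   -- e with acute accent
       CHAR_LENGTH(CONVERT(UNHEX('C3A9') USING utf8mb4)),
       LENGTH(CONVERT(UNHEX('C3A9') USING utf8mb4))
UNION ALL
SELECT 'U+20AC',                                                   -- euro sign
       CHAR_LENGTH(CONVERT(UNHEX('E282AC') USING utf8mb4)),
       LENGTH(CONVERT(UNHEX('E282AC') USING utf8mb4))
UNION ALL
SELECT 'U+1F44D',                                                  -- thumbs-up emoji
       CHAR_LENGTH(CONVERT(UNHEX('F09F918D') USING utf8mb4)),
       LENGTH(CONVERT(UNHEX('F09F918D') USING utf8mb4));
```

Only the last row needs the fourth byte, which is exactly the case utf8mb3 cannot store.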
One subtle but important point: after migrating, indexes on VARCHAR columns may need to be adjusted. InnoDB tables that use the COMPACT or REDUNDANT row format (the default in MySQL 5.6 and earlier) limit an index key to 767 bytes per column, regardless of character set. With utf8mb3, an indexed VARCHAR(255) column needs at most 255 * 3 = 765 bytes, which fits. With utf8mb4, the same column can need 255 * 4 = 1020 bytes, which exceeds the limit, so converting the table or creating the index can fail with an error like Specified key was too long; max key length is 767 bytes. To fix this, you’ll need to either:
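- Reduce the VARCHAR length (e.g., to VARCHAR(191), which is 191 * 4 = 764 bytes).
- Set innodb_large_prefix to ON and innodb_file_format to Barracuda in your my.cnf, then alter the table to use ROW_FORMAT=DYNAMIC or COMPRESSED, which raises the limit to 3072 bytes. MySQL 5.7.9 and later already default to this configuration, and MySQL 8.0 removed the two settings entirely.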
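The two options above might look like this in practice. This is a sketch using the messages table purely for illustration (the example table has no index on content), and the SET GLOBAL statements are an assumption that you are on MySQL 5.6 or 5.7, since both variables no longer exist in 8.0:

```sql
-- Check which row format each InnoDB table uses.
-- COMPACT and REDUNDANT are the formats that carry the 767-byte limit.
SELECT TABLE_NAME, ROW_FORMAT
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND ENGINE = 'InnoDB';

-- Option 1: shrink the indexed column so that 4 bytes per character still fits.
-- Check first that no existing value is longer than 191 characters.
ALTER TABLE messages
  MODIFY content VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- Option 2 (MySQL 5.6/5.7 only): enable large index prefixes, then switch row format.
-- SET GLOBAL does not survive a restart; put the same values in my.cnf to persist them.
SET GLOBAL innodb_file_format = 'Barracuda';
SET GLOBAL innodb_large_prefix = ON;
ALTER TABLE messages ROW_FORMAT=DYNAMIC;
```

Pick one option per table; the first keeps the schema portable to older servers, while the second keeps the full column length.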
The next hurdle you’ll likely encounter is ensuring your application code and ORM frameworks are also configured to use utf8mb4 when connecting to the database.
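On the SQL side, the connection character set can be set and checked per session. This is a minimal sketch; most drivers and ORMs expose an equivalent charset option in their connection settings, and the exact name depends on the library:

```sql
-- Tell the server that this session sends and expects utf8mb4.
SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci;

-- Confirm what the session is actually using.
SELECT @@character_set_client,
       @@character_set_connection,
       @@character_set_results,
       @@collation_connection;
```

If any of these still report utf8 or latin1, 4-byte characters can be mangled in transit even though the tables themselves are utf8mb4.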