15

Inserting UTF-8 encoded string into UTF-8 encoded table gives incorrect string value.

PDOException: SQLSTATE[HY000]: General error: 1366 Incorrect string value: '\xF0\x9D\x84\x8E i...' for column 'body_value' at row 1: INSERT INTO

I have a 𝄎 character, in a string that mb_detect_encoding claims is UTF-8 encoded. I try to insert this string into a MySQL table, which is defined as (among other things) DEFAULT CHARSET=utf8

Edit: Drupal always does SET NAMES utf8 with optional COLLATE (atleast when talking to MySQL).

Edit 2: Some more details that appear to be relevant. I grab some text from a PostgreSQL database. I stick it onto an object, use mb_detect_encoding to verify that it's UTF-8, and persist the object to the database, using node_save. So while there is an HTTP request that triggers the import, the data does not come from the browser.

Edit 3: Data is denormalized over two tables:

SELECT character_set_name FROM information_schema.COLUMNS C WHERE table_schema = "[database]" AND table_name IN ("field_data_body", "field_revision_body") AND column_name = "body_value";

>+--------------------+
| character_set_name |
+--------------------+
| utf8               |
| utf8               |
+--------------------+

Edit 4: Is it possible that the character is "to new"? I'm more than a little fuzzy on the relationship between unicode and UTF-8, but this wikipedia article, implies that the character was standardized very recently.

I don't understand how that can fail with "Incorrect string value".

5
  • Is the field on that table defined with a UTF-8 charset? Commented Aug 13, 2012 at 15:01
  • possible duplicate of UTF-8 Database Problem Commented Aug 13, 2012 at 15:03
  • 1
    What does SELECT character_set_name FROM information_schema.`COLUMNS` C WHERE table_schema = "db_name" AND table_name = "table_name" AND column_name = "column_name"; give Commented Aug 13, 2012 at 15:11
  • MySQL's utf8 is only the BMP. Its utf8mb4 corresponds to the outside world's UTF-8 (and includes 4-byte characters). Commented May 30, 2019 at 5:38
  • The error is caused by trying to cram 4 bytes into a charset that can't handle it (namely MySQL's "utf8"). Commented May 30, 2019 at 5:39

4 Answers 4

26

𝄎 (U+1D10E) is a character Unicode found outside the BMP (Basic Multilingual Plane) (above U+FFFF) and thus can't be represented in UTF-8 in 3 bytes. MySQL charset utf8 only accepts UTF-8 characters if they can be represented in 3 bytes. If you need to store this in MySQL, you'll need to use MySQL charset utf8mb4. You'll need MySQL 5.5.3 or later. You can use ALTER TABLE to change the character set without much problem; since it needs more space to store the characters, a couple issues show up that may require you to reduce string size. See http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-upgrading.html .

Sign up to request clarification or add additional context in comments.

5 Comments

Great, thank you so much. ALTERing the table, and doing "SET NAMES" when doing this particular type of save, solved the problem for me.
Is there solutions for older versions of MySQL?
If he can change databases, surely he can upgrade MySQL a lot easier then using Postgresql.
@JeromeJ - 5.5.3 is when utf8mb4 was introduced. Sorry, there is no text way to store 4-byte UTF-8 (non-BMP) characters.
@RickJames I think I did end up replacing all 4-byte UTF-8 by their HTML character before saving the string.
8

to solve this issue, first you change your database field to utf8m4b charset. For example:

ALTER TABLE `tb_name` CHANGE `field_name` `field_name` VARCHAR(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL; 

then in your db connection, set driver_options for it to utf8mb4. For example, if you use PDO

$db = new PDO('mysql:host=localhost;dbname=testdb;charset=utf8mb4', 'username', 'password');

or in zend framework 1.2

$dbParam = array('host' => 'localhost', 'username' => 'db_user_name',
            'password' => 'password', 'dbname' => 'db_name',
            'driver_options' => array(
                '1002' => "SET NAMES 'utf8mb4'",
                '12'    => 0 //this is not necessary
            )
        );

3 Comments

What do 1002 and 12 represent here?
They are options of driver_options in PDO mysql. You can refer here for more detail php.net/manual/en/ref.pdo-mysql.php
PDO::MYSQL_ATTR_INIT_COMMAND (1002) and PDO::FETCH_KEY_PAIR (12). Is FETCH_KEY_PAIR necessary here in relation to the question?
4

In your PDO connecton, set the charset.

new PDO('mysql:host=localhost;dbname=the_db;charset=utf8mb4', $user, $password);

4 Comments

Does my edit to the question about 'SET NAMES' change this? I can't imagine a framework as well used would get that wrong.
That's odd, what/where are you inserting? Might try adding utf8 in the header: <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
My <HEAD> contains <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />, but no data comes from the browser. (I updated the question with more data, again)
No. It must be utf8mb4, not just utf8. (I edited the answer.)
3

I fixed the error: SQLSTATE[HY000]: General error: 1366 Incorrect string value ...... with this method:

I use utf8mb4_unicode_ci for database database Set utf8mb4_unicode_ci for all tables tables

Set longblog datatype for column (not text, longtext.... you need big datatype to store 4 bytes of your content) fields

It is okay now. If you use laravel, continue to edit config/database.php

'charset' => 'utf8mb4',
'collation' => 'utf8mb4_unicode_ci',

laravel

If you use function strtolower, replace it with mb_strtolower Notice: you have to put <meta charset="utf-8"> on your head tag

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.