1

What is causing this incorrect string? I have read many questions and answers and here are my results. I still am getting same error after reading answers.

I am getting the following error: ERROR 1366 (HY000) at line 34373: Incorrect string value: '\xEF\xBB\xBF<?x...' for column 'change' at row 1

When I try to enter the following into SQL: Line number 34373: INSERT INTO gitlog_changes VALUES ('123456', 'NhincCommonEntity.xsd', '<?xml version=\"1.0\" encoding=\"UTF-8\"?>');

My table looks like this:

DROP TABLE IF EXISTS `gitlog_changes`;
CREATE TABLE `gitlog_changes` (
  `hashID` varchar(40) NOT NULL,
  `filename` varchar(450) DEFAULT NULL,
  `change` mediumtext
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

I read many answers that say change the charset to UTF8 [1][2][3][4]. So I execute this: alter table yourTableName DEFAULT CHARACTER SET utf8;

I continue to get the same error. Then I alter table yourTableName DEFAULT CHARACTER SET utf8mb4_general_ci;

Still same error occurs.

I also attempt to read in a file from python and do direct commits to the database. From this answer[1]. I get a warning instead of an error.

I insert the following code into my python script:

    cursor.execute("SET NAMES 'utf8'")
    cursor.execute("SET CHARACTER SET utf8")

Python script:

def insert_changes(modList):
    db = MySQLdb.connect("localhost", "user", "password", "table")
    cursor = db.cursor()

    cursor.execute("SET NAMES 'utf8'")
    cursor.execute("SET CHARACTER SET utf8")

    for mod in modList:
        hashID = mod["hashID"]
        fileName = mod["fileName"]
        change = mod["change"]

        cursor.execute("INSERT INTO gitlog_changes VALUES (%s, %s, %s" , (hashID, fileName, change))
    # # disconnect from server
    db.commit()
    db.close()

The warning I get here is: Warning: Invalid utf8 character string: '\xEF\xBB\xBF<?x...' cursor.execute("INSERT INTO gitlog_changes VALUES (%s, %s, %s)" , (hashID, fileName, change))

2 Answers 2

3

The string you're trying to insert into db has an unusual character at its beginning. I just copied your string:

In [1]: a = '<'

In [2]: a
Out[2]: '\xef\xbb\xbf<'

You need to get rid of those characters. This is a good post explaining what these characters are.

Sign up to request clarification or add additional context in comments.

Comments

2

Text you're trying to insert contains UTF-8 BOM in the beginning (that's the \xEF\xBB\xBF in your error).

Please check this answer to see how to convert from UTF-8 with BOM into UTF-8.


As stated in MySQL docs

MySQL uses no BOM for UTF-8 values.

So the only solution is decoding this string in your python code.

6 Comments

What do I change in the database?
@Whitecat As I mentioned in edit, the only way is to decode the string inside your python code
Correct me if I am wrong. All strings need to be edited to put them into SQL. I even have to edit the insert statements from the file if I want to use the file.
The reason I avoided decode and encode is this answer: stackoverflow.com/questions/1168036/… says it is a bad idea.
Arg. I get the error 'utf8' codec can't decode byte 0xe9 in position 77: invalid continuation byte
|

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.