I am reading Scandinavian-language websites with a web crawler and want to insert the text into my PostgreSQL database.

Originally I set the database encoding to UTF-8 and then manually tried to insert the characters that might cause problems, like this:

INSERT INTO name (surname) VALUES ('Børre');

This was done in the psql shell on Windows.

This gave me the following error: ERROR: invalid byte sequence for encoding "UTF8": 0x9b. After some googling I changed the client encoding to latin1, and the statement then succeeded. The server encoding is still UTF8.

When I do the same insert through my Python script, the name appears in my database as B°rre. If I change the client encoding back to utf8, I also get entries with the wrong special characters.

My Python script is saved as UTF-8 and prints the name correctly.

Insert statement:

con = psycopg2.connect(*database details*)

print("Opened database successfully")

cur = con.cursor()

#INSERT NAME

query = "INSERT INTO name (surname) VALUES (%s) RETURNING id"

data = ('børre',)

cur.execute(query,data)

As previously stated, print(personObject.surname) gives 'Børre'

If I try the following:

query = "INSERT INTO name (surname) VALUES (%s) RETURNING id"

data = ('børre'.encode('utf-8'),)

cur.execute(query,data)

I get the following in my database:

\x62c383c2b8727265

  • Which version of Python? Commented Dec 30, 2016 at 21:32
  • Can you post your stack trace? Commented Dec 30, 2016 at 21:32
  • Why don't you use UTF-8 encoding? Today there is no reason not to use it. Commented Dec 30, 2016 at 21:39
  • Python version is 3.x. The reason I changed from UTF-8 is stated at the start of the question. I will update the question with the stack trace asap. Commented Dec 30, 2016 at 21:44
  • There is no stack trace to post; I get no error in Python. @LaurentLAPORTE Commented Dec 30, 2016 at 21:53

2 Answers

psycopg2 doesn't understand PostgreSQL queries; it just converts the arguments you give it into their PostgreSQL representation.

If you give it a bytes object, it will convert it to a PostgreSQL BYTEA literal.

'børre'.encode('utf-8') gets you a bytes object.

So don't do that; use a string.

The code fragment you have at the top should work.

In the stored value I see ø encoded as hex c383c2b8. Decoded as UTF-8, that hex is two characters, Ã and ¸. It looks to me like Python thinks your script is not written in UTF-8 but in some other code page.
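To make the hex above concrete, here is a small Python 3 sketch that uses only the standard library. The first part shows that taking the UTF-8 bytes of ø, misreading them as Latin-1, and encoding the result as UTF-8 again produces exactly the bytes from the question. The second part rests on an assumption the question does not state, namely that the Windows console uses code page 850; under that assumption both the 0x9b in the error and the ° in B°rre have a simple explanation.

```python
# Part 1: reproduce the double encoding seen in the database.
utf8_bytes = "ø".encode("utf-8")             # b'\xc3\xb8'
misread = utf8_bytes.decode("latin-1")       # 'Ã¸' (each byte read as one character)
double_encoded = misread.encode("utf-8")     # b'\xc3\x83\xc2\xb8'
print(double_encoded.hex())                  # c383c2b8

stored = "børre".encode("utf-8").decode("latin-1").encode("utf-8")
print(stored.hex())                          # 62c383c2b8727265

# Reversing the two steps recovers the original text.
repaired = stored.decode("utf-8").encode("latin-1").decode("utf-8")
print(repaired)                              # børre

# Part 2 (ASSUMPTION: the console code page is cp850).
# The console sends ø as 0x9b, which is not valid UTF-8 on its own.
console_byte = "ø".encode("cp850")
print(console_byte.hex())                    # 9b

# A correctly stored ø, sent to the console as Latin-1 (0xf8),
# is displayed by a cp850 console as the degree sign.
shown = "ø".encode("latin-1").decode("cp850")
print(shown)                                 # °
```

If the second part matches your setup, the row inserted from Python may actually be stored correctly and only displayed wrongly by the console.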

  • Thank you for your answer! Any suggestion on how I can get 'børre' stored as 'børre' in the PostgreSQL database as well?

Use the client_encoding keyword, e.g.:
conn = psycopg2.connect("dbname='foo' user='dbuser' password='mypass' client_encoding='utf8'")
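As a rough sketch of how this fits together with the insert from the question: the helper below (insert_surname is a hypothetical name, not part of psycopg2) connects, sets the client encoding explicitly with set_client_encoding, and passes the surname as a plain str inside a one-element tuple. It assumes the name table with surname and id columns from the question and has not been run against the asker's database.

```python
def insert_surname(dsn, surname):
    """Insert one surname as text and return the id of the new row."""
    # Imported inside the function so the sketch loads even without the driver.
    import psycopg2

    conn = psycopg2.connect(dsn)
    try:
        # Tell the server that this client sends and expects UTF-8.
        conn.set_client_encoding("UTF8")
        with conn:  # commits on success, rolls back on an exception
            with conn.cursor() as cur:
                cur.execute(
                    "INSERT INTO name (surname) VALUES (%s) RETURNING id",
                    (surname,),  # a str in a one-element tuple, not bytes
                )
                return cur.fetchone()[0]
    finally:
        conn.close()


# Example call (the connection details are placeholders from the answer above):
# new_id = insert_surname("dbname='foo' user='dbuser' password='mypass'", "børre")
```

Passing a str lets psycopg2 encode the value according to the connection's client encoding, so no manual .encode() call is needed.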
