
I have recently started using PostgreSQL to create and update SQL databases. Being rather new to this, I ran into the problem of selecting the correct encoding when creating a new database. UTF-8 (the default) did not work for me, as the data to be stored is in various languages (English, Chinese, Japanese, Russian, etc.) and also includes symbolic characters.

Question: What is the right database encoding to satisfy my needs?

Any help is highly appreciated.

  • Actually, UTF-8 is your only option to accept characters from all languages. Commented Nov 10, 2013 at 13:12
  • Thanks Daniel, and thanks for your prompt response. That is strange, then: I have been trying to import a .csv file, and the error I am getting refers to an unknown character for the UTF8 format. It might be related to the fact that I'm creating the .csv file in Office for Mac. Commented Nov 10, 2013 at 15:31
  • Yes, presumably it's invalid UTF-8. Check out Filtering invalid utf8. The usual answer is the one based on iconv. Commented Nov 10, 2013 at 21:26
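To make the "invalid UTF-8" diagnosis concrete, here is a minimal Python sketch that reports where a byte string stops being valid UTF-8 and, optionally, drops the offending bytes. The function names `find_invalid_utf8` and `strip_invalid_utf8` are made up for this illustration. Dropping bytes loses data, so it is usually better to find the file's real encoding first.

```python
def find_invalid_utf8(data: bytes):
    """Return (offset, bad_bytes) pairs for each invalid UTF-8 sequence in data."""
    problems = []
    pos = 0
    while pos < len(data):
        try:
            # Strict decoding raises at the first invalid sequence.
            data[pos:].decode("utf-8")
            break
        except UnicodeDecodeError as exc:
            start = pos + exc.start
            end = pos + exc.end
            problems.append((start, data[start:end]))
            pos = end  # resume scanning after the bad bytes
    return problems


def strip_invalid_utf8(data: bytes) -> bytes:
    """Drop bytes that are not valid UTF-8 (lossy; use only as a last resort)."""
    return data.decode("utf-8", errors="ignore").encode("utf-8")


if __name__ == "__main__":
    # Hypothetical sample: "café" saved as latin-1, then read as if it were UTF-8.
    sample = b"caf\xe9,ok\n"
    for offset, bad in find_invalid_utf8(sample):
        print(f"invalid byte(s) {bad!r} at offset {offset}")
```

Running it over the raw bytes of the CSV file (opened in binary mode) shows which positions to inspect before deciding how to convert the file.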

1 Answer


There are four different encoding settings at play here:

  • The server side encoding for the database

  • The client_encoding that the PostgreSQL client announces to the PostgreSQL server. The PostgreSQL server assumes that text coming from the client is in client_encoding and converts it to the server encoding.

  • The operating system default encoding. This is the default client_encoding set by psql if you don't provide a different one. Other client drivers might have different defaults; e.g. PgJDBC always uses utf-8.

  • The encoding of any files or text being sent via the client driver. This is usually the OS default encoding, but it might be a different one - for example, your OS might be set to use utf-8 by default, but you might be trying to COPY some CSV content that was saved as latin-1.
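The interaction between these settings can be illustrated with a small Python simulation. This is only a sketch of the idea, not PostgreSQL's actual conversion code: the hypothetical function `convert_to_server` interprets incoming bytes using the declared client encoding and re-encodes them in the server encoding.

```python
def convert_to_server(raw: bytes, client_encoding: str,
                      server_encoding: str = "utf-8") -> bytes:
    """Simulate the conversion: interpret raw bytes as client_encoding,
    then re-encode them in server_encoding."""
    text = raw.decode(client_encoding)  # fails if the declaration is wrong
    return text.encode(server_encoding)


if __name__ == "__main__":
    latin1_bytes = "café".encode("latin-1")

    # Declared encoding matches the real one: conversion succeeds.
    print(convert_to_server(latin1_bytes, "latin-1"))

    # Declared utf-8, but the bytes are latin-1: conversion is rejected.
    try:
        convert_to_server(latin1_bytes, "utf-8")
    except UnicodeDecodeError as exc:
        print("rejected:", exc.reason)

    # Declared latin-1, but the bytes are utf-8: no error, silently wrong text.
    print(convert_to_server("café".encode("utf-8"), "latin-1").decode("utf-8"))
```

The last case is worth noting: a single-byte encoding such as latin-1 accepts any byte sequence, so a wrong declaration there produces garbled text rather than an error.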

You almost always want the server encoding set to utf-8. It's the rest that you need to change depending on what's appropriate for your situation. You would have to give more detail (exact error messages, file contents, etc) to be able to get help with the details.
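If the input file turns out to be in some other encoding, one option is to rewrite it as UTF-8 before importing. A minimal Python sketch follows. The function name `transcode_to_utf8` is made up for this example, and the source encoding passed to it is an assumption you have to verify yourself; `mac_roman` is only a guess at what an older Mac application might produce.

```python
def transcode_to_utf8(src_path: str, dst_path: str, source_encoding: str) -> None:
    """Rewrite a text file as UTF-8, assuming it really is in source_encoding."""
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding=source_encoding, newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line)


if __name__ == "__main__":
    # Hypothetical file names; replace with your own paths and encoding.
    transcode_to_utf8("export.csv", "export_utf8.csv", "mac_roman")
```

The alternative is to leave the file alone and declare its real encoding to the server through client_encoding, so that the server does the conversion.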


2 Comments

Sorry for the late reaction - the error that i'm getting says: error invalid byte sequence in utf-8.
@user2959843 If you want more info at this late stage post a new question - and this time include as much detail as possible. Server version, client type/version, exact error, input text, actual raw bytes of the input text (i.e. hexdump), OS default encoding, client_encoding, exact command that produces the error, etc. If you comment here with a link to the new question I can take a look.
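For the raw bytes, any hexdump tool will do. If none is at hand, a small Python helper along these lines produces a comparable listing; `hexdump` here is a hypothetical function written for this example, not a standard library call.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex values, and printable ASCII, one row per line."""
    pad = width * 3 - 1  # width of a full row of "xx xx ... xx"
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Show printable ASCII as-is and everything else as a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{pad}}  {text_part}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical file name; read in binary mode so no decoding happens.
    with open("export.csv", "rb") as f:
        print(hexdump(f.read(256)))
```

Posting the rows around the position named in the error message is usually enough to identify the file's actual encoding.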
