
I am having some problems encoding some Unicode characters. This is the code I am using:

test = raw_input("Test: ")
print test.encode("utf-8")

When I use normal ASCII characters it works, and so do some "strange" Unicode characters like ☃. But when I use characters like ß ä ö ü § it fails with this error:

Traceback (most recent call last):
  File "C:\###\Test.py", line 5, in <module>
    print test.encode("utf-8")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xdf in position 0: ordinal not in range(128)

Note that I am using a PC where German is the default language (so these characters are standard on my keyboard).

1 Answer


raw_input() returns a byte string. You don't need to encode that byte string; it is already encoded.

What happens instead is that Python first decodes the byte string to get a unicode value that it can then encode; you asked Python to encode, so it will dutifully try to get you something that can be encoded. It is that implicit decoding that fails here. Implicit decoding uses ASCII, which is why you got a UnicodeDecodeError exception (note the Decode in the name) for that codec.
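
To make the hidden step visible, here is a rough Python 2 sketch (the byte value 0xdf is taken from your traceback; it is what a CP1252/Latin-1 console sends for ß):

test = '\xdf'                     # the single byte the console produced for ß
try:
    test.decode('ascii')          # the implicit step Python performs first
except UnicodeDecodeError as exc:
    print exc                     # the same error you saw
# test.encode('utf-8') fails with exactly this UnicodeDecodeError,
# even though you never called .decode() yourself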

If you wanted to produce a unicode object you'd have to explicitly decode. Use the codec Python has detected for stdin:

import sys

test = raw_input("Test: ")
print test.decode(sys.stdin.encoding)

You don't need to do that here because you are printing, so writing right back to the same terminal which will use the same codec for input and output. Writing a byte string encoded with UTF-8 when you just received that byte string is then fine. Decoding to unicode is fine too, as printing will auto-encode to sys.stdout.encoding.
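
So for this script either of the following is fine (a sketch; the decode variant assumes you run it from an interactive console, so sys.stdin.encoding is set):

import sys

test = raw_input("Test: ")
print test                               # byte string, in the terminal's own codec
print test.decode(sys.stdin.encoding)    # unicode; print re-encodes to sys.stdout.encoding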
