
I am having some problems encoding some Unicode characters. This is the code I am using:

test = raw_input("Test: ")
print test.encode("utf-8")

When I use normal ASCII characters it works, and so do some "strange" Unicode characters like ☃. But when I use characters like ß ä ö ü § it fails with this error:

Traceback (most recent call last):
  File "C:\###\Test.py", line 5, in <module>
    print test.encode("utf-8")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xdf in position 0: ordinal not in range(128)

Note that I am using a PC where German is the default language (so these characters are standard on my keyboard).

1 Answer


raw_input() returns a byte string. You don't need to encode that byte string; it is already encoded.

What happens instead is that Python first decodes the byte string to get a unicode value that it can then encode; you asked Python to encode, so it will dutifully try to get you something that can be encoded. It is that implicit decoding that fails here. Implicit decoding uses ASCII, which is why you got a UnicodeDecodeError exception (note the Decode in the name) for that codec.
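
To make the hidden step visible, here is a rough Python 2 sketch (the byte value 0xdf is taken from your traceback; it is what a CP1252/Latin-1 console sends for ß):

test = '\xdf'                     # the single byte the console produced for ß
try:
    test.decode('ascii')          # the implicit step Python performs first
except UnicodeDecodeError as exc:
    print exc                     # the same error you saw
# test.encode('utf-8') fails with exactly this UnicodeDecodeError,
# even though you never called .decode() yourself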

If you wanted to produce a unicode object you'd have to explicitly decode. Use the codec Python has detected for stdin:

import sys

test = raw_input("Test: ")
print test.decode(sys.stdin.encoding)

You don't need to do that here because you are printing, so writing right back to the same terminal which will use the same codec for input and output. Writing a byte string encoded with UTF-8 when you just received that byte string is then fine. Decoding to unicode is fine too, as printing will auto-encode to sys.stdout.encoding.
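
So for this script either of the following is fine (a sketch; the decode variant assumes you run it from an interactive console, so sys.stdin.encoding is set):

import sys

test = raw_input("Test: ")
print test                               # byte string, in the terminal's own codec
print test.decode(sys.stdin.encoding)    # unicode; print re-encodes to sys.stdout.encoding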
