How to convert encoding in Python?

Question

I have a mis-encoded string, »Æ¹ûÊ÷. On the http://2cyr.com/decode/?lang=en website, you can encode it with gb2312 and then decode it with iso8859 to display it correctly.

In C#, there's a function called Encoding.Convert, which converts bytes from one encoding to another. The process is straightforward:

encode the string into bytesA, using gb2312 encoder
Encoding.Convert bytesA from gb2312 encoding to iso8859 encoding
decode the bytes using iso8859 encoder

In Python, I have tried every encoding and decoding method I can think of, but none of them converts the given string into a form that displays correctly.

Martijn Pieters · Accepted Answer · 2014-01-05 00:24:27Z

6

Your data is GB2312 bytes that were misread as Latin-1 and then encoded as UTF-8, at least as pasted into my UTF-8 configured terminal window:

>>> data = '»Æ¹ûÊ÷'
>>> data.decode('utf8').encode('latin1').decode('gb2312')
u'\u9ec4\u679c\u6811'
>>> print _
黄果树

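Encoding to Latin-1 maps each character back to the single byte with the same value (code points U+0000 to U+00FF correspond one-to-one to bytes 0x00 to 0xFF), which recovers the original bytes so they can be decoded with the correct codec.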
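
To illustrate why Latin-1 works as a lossless bridge between characters and bytes, here is a small Python 3 check. The variable names are only illustrative; it simply confirms that every byte value survives a decode/encode round trip through Latin-1.

```python
# Every possible byte value, 0x00 through 0xFF.
all_bytes = bytes(range(256))

# Decoding as Latin-1 never fails: byte N becomes code point U+00NN.
as_text = all_bytes.decode("latin-1")
code_points = [ord(ch) for ch in as_text]

# Encoding back gives exactly the bytes we started with.
round_tripped = as_text.encode("latin-1")

print(code_points == list(range(256)))
print(round_tripped == all_bytes)
```

Both comparisons hold, which is why encoding mis-decoded text with Latin-1 gives back the original bytes unchanged.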

Rule of thumb: whenever you have double-encoded data, undo the extra 'layer' of encoding by decoding to Unicode using that codec, then encoding again with Latin-1 to get bytes again.
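
As a rough Python 3 sketch of this rule of thumb (the helper name `fix_mojibake` and its parameter names are made up for illustration, and it assumes the text was wrongly decoded as Latin-1 from GB2312 bytes):

```python
def fix_mojibake(text, wrong_codec="latin-1", real_codec="gb2312"):
    """Undo one wrong decoding layer (hypothetical helper, not a library API)."""
    # Step 1: turn the mis-decoded characters back into the original bytes.
    raw = text.encode(wrong_codec)
    # Step 2: decode those bytes with the codec they were really written in.
    return raw.decode(real_codec)


# The garbled string from the question, written with escapes so the
# example does not depend on the source file's encoding.
garbled = "\u00bb\u00c6\u00b9\u00fb\u00ca\u00f7"
fixed = fix_mojibake(garbled)
print(fixed)  # 黄果树
```

If the guess about either codec is wrong, `encode` or `decode` may raise `UnicodeEncodeError` or `UnicodeDecodeError`, or quietly produce different garbage, so the result is worth checking by eye.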

edited Jan 5, 2014 at 0:24

answered Jan 4, 2014 at 14:24


2 Comments

user1220978 Over a year ago

This won't work in Python 3 (str has no decode method). But this will: "»Æ¹ûÊ÷".encode("latin1").decode("gb2312"). The string must be encoded in UTF-8, use #encoding: utf-8 for example.

Martijn Pieters Over a year ago

@arbautjc: Note that both my method and yours require that the raw string bytes use a certain encoding, yes. My terminal used UTF-8, hence the decode from UTF-8 first.
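
To make the difference between the two starting points concrete, here is a Python 3 sketch under the assumption that the garbled text arrives either as UTF-8 bytes (as in a Python 2 `str` typed into a UTF-8 terminal) or as an already decoded `str`. The variable names are illustrative only.

```python
# Case 1: the garbled text as a Python 3 str (already Unicode).
garbled_text = "\u00bb\u00c6\u00b9\u00fb\u00ca\u00f7"
from_text = garbled_text.encode("latin-1").decode("gb2312")

# Case 2: the same garbled text as UTF-8 bytes, as a UTF-8 terminal
# would hand it to Python 2. An extra UTF-8 decode is needed first.
garbled_utf8 = garbled_text.encode("utf-8")
from_bytes = garbled_utf8.decode("utf-8").encode("latin-1").decode("gb2312")

print(from_text == from_bytes)
print(from_bytes)  # 黄果树
```

Both paths end at the same string; the only difference is whether a UTF-8 decode is needed to get from raw bytes to Unicode before the Latin-1 step.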
