7

Actual data: CN=username,OU=CompanyName,DC=company,DC=intra (how it appears in the MySQL db). When I fetch this data, this is how it appears in the Python variable (retrieved from MySQL): CN=username,OU=CompanyName,DC=company,DC=intra

When I try this:

truestr = unicode(str,'utf-8');

it throws an exception with this message:

'ascii' codec can't decode byte 0xc4 in position 4: ordinal not in range(128)

How can I fix this issue? (I use Python 2.6.)

3
  • What is the actual value of str variable? Please update your question. Commented Oct 22, 2015 at 6:22
  • In MySQL I see the string CN=Uğur ... When I select and fetch it into the str variable in Python, it appears as CN=UÄŸur ... Commented Oct 22, 2015 at 6:26
  • You need to update your question to show us how you populate str. Use the edit link. Commented Oct 22, 2015 at 6:30

4 Answers

4

Can you check the default encoding with the following method:

>>> import sys
>>> sys.getdefaultencoding()
'utf-8'

If the encoding is ascii, then set it to utf-8:

  1. Open the following file (I am using Python 2.7):

    /usr/lib/python2.7/sitecustomize.py

  2. Then update the following to utf-8:

    sys.setdefaultencoding("utf-8")

[Edit 2]

Can you add the following to your code (at the start) and then check:

>>> try:
...     import apport_python_hook
... except ImportError:
...     pass
... else:
...     apport_python_hook.install()
... 
>>> import sys
>>> 
>>> sys.setdefaultencoding("utf-8")

4 Comments

There is no file like sitecustomize.py in /usr/lib/python2.6, only site-packages, and it is not in there either.
@MehmetYenerYILMAZ: Hi, can you add sys.setdefaultencoding("utf-8") to your code, as mentioned above?
I see your answer will solve my problem, but there is an exception: 'module' object has no attribute 'setdefaultencoding'
@MehmetYenerYILMAZ: If you write only sys.setdefaultencoding("utf-8"), it will raise an exception; we have to write the try and except block too.
1

This error means that your message is already a unicode object; no decoding is needed.

When you are doing:

truestr = unicode(string, 'utf-8')

your variable string is first implicitly converted to the str type using the default 'ascii' codec. And of course it fails, because your string contains non-ASCII characters.

If you want to write string somewhere as UTF-8, use string.encode('utf-8').


Note: I've renamed your str variable to string because of the name clash with the built-in str type. Naming a variable str (or int, or float, etc.) is very bad style.
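
To make the difference between byte strings and text more concrete, here is a minimal sketch. The value raw is a hypothetical stand-in for what comes back from MySQL: the UTF-8 bytes of CN=Uğur, the string mentioned in the comments on the question. It is written to run on Python 2.6+ as well as Python 3, and it only illustrates the codecs involved, not how the asker's value was actually produced.

```python
# Hypothetical stand-in for the fetched value: the UTF-8 bytes of the name
# from the question's comments (g-with-breve is encoded as 0xC4 0x9F).
raw = b"CN=U\xc4\x9fur"

# Explicit decoding with the right codec turns the bytes into text.
text = raw.decode("utf-8")

# Encoding that text again gives back the original bytes.
round_trip = text.encode("utf-8")

# Decoding with the ASCII codec fails at the first non-ASCII byte (index 4).
try:
    raw.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = str(exc)

# Decoding with a single-byte codec does not fail, but yields mojibake.
mojibake = raw.decode("cp1252")
```

Here ascii_error holds the same message as in the question, and mojibake is the CN=UÄŸur form reported in the comments, which suggests the bytes themselves are valid UTF-8 and only the codec applied to them differs.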

2 Comments

name = cursor.fetchall()[0]["NAME"]; name_ = name.encode('utf-8') raises the exception: 'ascii' codec can't decode byte 0xc4 in position 4: ordinal not in range(128)
Read some docs about unicode in Python: 1. docs.python.org/2/howto/unicode.html 2. farmdev.com/talks/unicode
1

Go to this file:

vi /usr/lib/python2.7/site-packages/sitecustomize.py

Add this text:

import sys

reload(sys)

sys.setdefaultencoding("utf-8")

2 Comments

What does reload mean?
No, this is a really bad idea. You'll end up writing code that will fail on any other machine and will mask all kind of issues. Please understand it before suggesting it
0

The default encoding of your system is ASCII. Use sys.setdefaultencoding to switch it to UTF-8. This function is only available at startup, while Python scans the environment; to use it afterwards you have to reload sys after importing the module. The following is the code for your problem.

import sys
reload(sys)
sys.setdefaultencoding("utf-8")

Edit:

If you want to use UTF-8 encoding, then set it at the very beginning of your code. If you set it in the middle of your code, it will create problems with already loaded ASCII data.

4 Comments

This should be a comment, not an answer. Once you gain a bit of reputation on this site, these things will be easier. Until then, please hold back for a bit.
Thanks.I will keep it in my mind but at the moment I don't have enough reputation to post a comment.
Right. That's why I suggest you hold back until you gained a bit of rep.
No, you should not be suggesting this nasty hack. It masks a multitude of issues and means that any code written with it becomes very brittle.
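
As an alternative to changing the interpreter-wide default, one option is to decode explicitly at the point where data enters the program. The sketch below is only an illustration: to_text is a hypothetical helper, not part of any library, and it assumes the database returns UTF-8 bytes. It leaves values that are already text untouched and runs on Python 2.6+ as well as Python 3.

```python
def to_text(value, encoding="utf-8"):
    """Return text for value, decoding only if it is a byte string.

    Hypothetical helper for illustration; the utf-8 default is an assumption
    about how the database connection delivers its data.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    # Already text (unicode on Python 2, str on Python 3): return unchanged.
    return value


# Byte string, e.g. as it might come from a database driver.
from_db = to_text(b"CN=U\xc4\x9fur")

# Text passed in by mistake is returned as-is instead of raising.
already_text = to_text(u"CN=U\u011fur")
```

Both calls end up with the same text value, so the rest of the code can work with text only and encode again when writing output.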