I have a file named "SSE-Künden, SSE-Händler.pdf", which contains two non-ASCII characters (ü, ä). When I print this file name in the Python interpreter, those characters are converted into escaped values, I guess: 'SSE-K\x81nden, SSE-H\x84ndler.pdf'. But I want to see the original name.

The test directory contains a PDF file named 'SSE-Künden, SSE-Händler.pdf'.

I tried this:

import os
path = r'C:\test'
for a, b, c in os.walk(path):
    print c

['SSE-K\x81nden, SSE-H\x84ndler.pdf']

How do I convert these escaped characters back to their Unicode values? I want to show the original name ("SSE-Künden, SSE-Händler.pdf") in the interpreter and also write it to a file unchanged. How can I achieve this? I am using Python 2.6 on Windows.

Thanks.

  • Is your terminal session's character encoding set to UTF-8? Commented Sep 22, 2011 at 6:57
  • Sorry, but how do I verify that? Commented Sep 22, 2011 at 6:59
  • If you're using Ubuntu: Terminal (from the menu) --> Set Character Encoding. Commented Sep 22, 2011 at 7:00

3 Answers

Assuming your terminal supports displaying the characters, iterate over the list of files and print them individually (or use Python 3, which displays Unicode in lists):

Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> for p,d,f in os.walk(u'.'):
...  for n in f:
...   print n
...
SSE-Künden, SSE-Händler.pdf

Also note I used a Unicode string (u'.') for the path. This instructs os.walk to return Unicode strings as opposed to byte strings. When dealing with non-ASCII filenames this is a good idea.
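As a minimal sketch of the same idea in reusable form, the helper below collects the names that os.walk returns for a given root. The function name list_files is hypothetical and not part of the answer above; the code is written so it should run on both Python 2.6+ and Python 3.

```python
import os


def list_files(root):
    """Return every file name found under root.

    Passing a text path (for example u'.') asks os.walk for text
    (Unicode) names; passing a byte-string path in Python 2 yields
    byte-string names instead.
    """
    names = []
    for dirpath, dirnames, filenames in os.walk(root):
        names.extend(filenames)
    return names
```

Calling list_files(u'.') and printing each name in a loop, rather than printing the list itself, avoids the escaped repr() form that Python 2 uses for list elements.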

In Python 3 strings are Unicode by default and non-ASCII characters are displayed to the user instead of displayed as escape codes:

Python 3.2.1 (default, Jul 10 2011, 21:51:15) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> for p,d,f in os.walk('.'):
...  print(f)
...
['SSE-Künden, SSE-Händler.pdf']

9 Comments

Sorry, I didn't mention before that I am using Python 2.6 on Windows, with IPython.
His question is how to display the unicode characters in their native form (non-byte format)
+1 Using a unicode path does indeed work, interesting and non-obvious.
No, I tried on Python 2.6.7 and I am getting the following error: UnicodeEncodeError: 'charmap' codec can't encode character u'\x81' in position 22: character maps to <undefined>
@Shashi, interesting. Your filename is a Unicode string but contains the cp437 (US Windows console encoding) character value for ü. Was this file originally created on Windows? I created the file for the example above and the Unicode characters for ü and ä are \xfc and \xe4.
for a,b,c in os.walk(path):
    for n in c:
        print n.decode('utf-8')

3 Comments

+1: This should work if his terminal session is set to display unicode.
To set the windows terminal to unicode see stackoverflow.com/questions/5419/…
This won't work if the file system doesn't use UTF-8, such as Windows.
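To illustrate that last comment, here is a hedged sketch that tries several candidate encodings in turn. It assumes the byte string from the question is cp437 data, as the comment under the first answer suggests; the helper name decode_name and the candidate list are hypothetical.

```python
# Byte string exactly as printed in the question (assumed to be cp437 data).
RAW_NAME = b'SSE-K\x81nden, SSE-H\x84ndler.pdf'


def decode_name(raw, encodings=('utf-8', 'cp437')):
    """Return raw decoded with the first candidate encoding that succeeds."""
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            # Not valid in this encoding; try the next candidate.
            continue
    raise ValueError('none of the candidate encodings matched')
```

Here the UTF-8 attempt fails because \x81 cannot start a UTF-8 sequence, and the cp437 attempt maps \x81 to ü and \x84 to ä. Note that cp437 assigns a character to every byte value, so it never raises; it only belongs last in the list, and a wrong guess produces wrong characters rather than an error.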

For writing to a file: http://docs.python.org/howto/unicode.html#reading-and-writing-unicode-data
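A minimal sketch of what that could look like, assuming the names are already Unicode strings and that UTF-8 is an acceptable encoding for the output file. The function name write_names is hypothetical; io.open accepts an encoding argument in Python 2.6+ as well as in Python 3.

```python
import io


def write_names(names, out_path):
    """Write each text (Unicode) name to out_path, one per line, as UTF-8."""
    # In text mode io.open encodes on write, so no manual .encode() is needed.
    with io.open(out_path, 'w', encoding='utf-8') as handle:
        for name in names:
            handle.write(name + u'\n')
```

Reading the file back with io.open(out_path, encoding='utf-8') should return the same names. In Python 2 the names must be unicode objects, not byte strings, or the write call raises a TypeError.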
