
In my brain, the following:

>>> re.sub('([eo])', '_\1_', 'aeiou')

should return:

'a_e_i_o_u'

instead it returns:

'a_\x01_i_\x01_u'

I'm sure I'm having a brain cramp, but I can't for the life of me figure out what's wrong.


2 Answers


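In an ordinary Python string literal, \1 is an octal escape that produces the control character \x01, so re.sub never sees a backslash at all. Double the backslash, or use a raw string literal: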

>>> import re
>>> re.sub('([eo])', '_\1_', 'aeiou')
'a_\x01_i_\x01_u'
>>> re.sub('([eo])', '_\\1_', 'aeiou')
'a_e_i_o_u'
>>> re.sub('([eo])', r'_\1_', 'aeiou')
'a_e_i_o_u'

See The Backslash Plague in the Python regex HOWTO:

As stated earlier, regular expressions use the backslash character ('\') to indicate special forms or to allow special characters to be used without invoking their special meaning. This conflicts with Python’s usage of the same character for the same purpose in string literals.
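A short sketch of what the interpreter does to each replacement string before re.sub receives it may make the quote concrete. The variable names below are illustrative only; the point is that the plain literal is 3 characters long, while the escaped and raw versions are 4 characters and identical.

```python
import re

# In an ordinary string literal, "\1" is an octal escape: it becomes the
# single control character U+0001 before re.sub ever sees it.
plain = '_\1_'
print(len(plain), repr(plain))   # 3 '_\x01_'

# A doubled backslash and a raw string both keep a real backslash followed
# by "1", which re.sub interprets as "insert capture group 1".
escaped = '_\\1_'
raw = r'_\1_'
print(escaped == raw, len(raw))  # True 4

result = re.sub('([eo])', raw, 'aeiou')
print(result)                    # a_e_i_o_u
```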


Use a raw string (prefix the literal with r):

re.sub('([eo])', r'_\1_', 'aeiou')

Output:

In [3]: re.sub('([eo])', r'_\1_', 'aeiou')
Out[3]: 'a_e_i_o_u'
In [4]: "\1"
Out[4]: '\x01'   
In [5]: r"\1"
Out[5]: '\\1'
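Two other replacement forms accepted by the re module may also help here. The snippet below is a sketch, not part of either answer: \g<1> is an explicit way to write a group reference (it still needs a raw string), and passing a function as the replacement avoids backslash escaping altogether.

```python
import re

pattern = re.compile(r'([eo])')

# \g<1> is an explicit spelling of "group 1" in the replacement string;
# it still needs a raw string (or a doubled backslash).
explicit = pattern.sub(r'_\g<1>_', 'aeiou')

# \g<1> also removes ambiguity when the group is followed by a digit:
# r'\10' would be read as a reference to group 10, whereas r'\g<1>0'
# means group 1 followed by a literal "0".
with_digit = pattern.sub(r'\g<1>0', 'aeiou')

# Passing a function as the replacement sidesteps backslash escapes entirely.
via_function = pattern.sub(lambda m: '_' + m.group(1) + '_', 'aeiou')

print(explicit)      # a_e_i_o_u
print(with_digit)    # ae0io0u
print(via_function)  # a_e_i_o_u
```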
  • Brain cramp indeed. Thank you. (commented May 14, 2015 at 13:15)
