In my brain, the following:
>>> re.sub('([eo])', '_\1_', 'aeiou')
should return:
'a_e_i_o_u'
instead it returns:
'a_\x01_i_\x01_u'
I'm sure I'm having a brain cramp, but I can't for the life of me figure out what's wrong.
\1 is an octal escape in ordinary Python string literals, so '_\1_' contains the character \x01 rather than a backslash followed by 1. Double the backslash, or use a raw string literal:
>>> import re
>>> re.sub('([eo])', '_\1_', 'aeiou')
'a_\x01_i_\x01_u'
>>> re.sub('([eo])', '_\\1_', 'aeiou')
'a_e_i_o_u'
>>> re.sub('([eo])', r'_\1_', 'aeiou')
'a_e_i_o_u'
See The Backslash Plague in the Python regex HOWTO:
As stated earlier, regular expressions use the backslash character ('\') to indicate special forms or to allow special characters to be used without invoking their special meaning. This conflicts with Python’s usage of the same character for the same purpose in string literals.
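To make that conflict concrete, here is a small sketch that inspects what each literal actually contains before re ever sees it. The variable names (plain, doubled, raw) are just illustrative choices, not anything from the HOWTO.

```python
import re

# Three spellings of the replacement string from the question.
plain = '_\1_'      # \1 is an octal escape: the string holds chr(1)
doubled = '_\\1_'   # escaped backslash: the string holds a backslash and '1'
raw = r'_\1_'       # raw literal: the backslash is kept as written

print(len(plain), len(doubled), len(raw))   # 3 4 4
print(doubled == raw)                       # True

# Only the forms that still contain a real backslash act as a group reference.
result = re.sub('([eo])', raw, 'aeiou')
print(result)                               # a_e_i_o_u
```

The length check is the quickest way to tell whether the backslash survived: the plain literal is one character shorter because the backslash and the 1 were collapsed into a single control character.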
Use a raw string (the r prefix) for the replacement:
re.sub('([eo])', r'_\1_', 'aeiou')
Output:
In [3]: re.sub('([eo])', r'_\1_', 'aeiou')
Out[3]: 'a_e_i_o_u'
In [4]: "\1"
Out[4]: '\x01'
In [5]: r"\1"
Out[5]: '\\1'
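If it helps, here are two other ways to write the same substitution, sketched under the assumption that you just want the matched vowel wrapped in underscores. The helper name wrap is made up for this example. Passing a function avoids backslashes in the replacement entirely, and \g<1> is the longer spelling of \1 (it still needs a raw string or a doubled backslash).

```python
import re

def wrap(match):
    # Hypothetical helper: build the replacement from the captured group.
    return '_' + match.group(1) + '_'

# A callable replacement never goes through backslash processing.
by_function = re.sub('([eo])', wrap, 'aeiou')

# \g<1> refers to group 1, the same as \1, written in a raw string.
by_group_ref = re.sub('([eo])', r'_\g<1>_', 'aeiou')

print(by_function)    # a_e_i_o_u
print(by_group_ref)   # a_e_i_o_u
```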