47

I need to replace part of a string. I was looking through the Python documentation and found re.sub.

import re
s = '<textarea id="Foo"></textarea>'
output = re.sub(r'<textarea.*>(.*)</textarea>', 'Bar', s)
print(output)

>>>'Bar'

I was expecting this to print '<textarea id="Foo">Bar</textarea>' and not 'Bar'.

Could anybody tell me what I did wrong?

2
  • 3
    The usual recommendation is that you not use regex for HTML. It is a longstanding response on this site, with some classic responses, culminating in this one. stackoverflow.com/questions/1732348/…
    – hughdbrown
    Commented Oct 22, 2010 at 15:59
  • Yep, I was thinking of using a regex since it's a really small piece, but I switched to BeautifulSoup instead.
    – Pickels
    Commented Oct 22, 2010 at 19:27

2 Answers

80

Instead of capturing the part you want to replace, you can capture the parts you want to keep and then refer to them with backreferences such as \1 and \2 to include them in the substituted string.

Try this instead:

output = re.sub(r'(<textarea.*>).*(</textarea>)', r'\1Bar\2', s)

Also, assuming this is HTML, you should consider using an HTML parser for this task, for example Beautiful Soup.
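
For reference, a self-contained version of the substitution might look like the sketch below. The string and the replacement text 'Bar' come from the question; the Python 3 style print call is an assumption about the interpreter in use.

```python
import re

s = '<textarea id="Foo"></textarea>'

# Group 1 captures the opening tag and group 2 the closing tag.
# Only the text between them is replaced; \1 and \2 put the tags back.
output = re.sub(r'(<textarea.*>).*(</textarea>)', r'\1Bar\2', s)
print(output)  # <textarea id="Foo">Bar</textarea>
```

The replacement is written as a raw string so that \1 and \2 reach re.sub as backreferences rather than as escape characters.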

2
  • I think you mean r'\1Bar\3'.
    – nmichaels
    Commented Oct 22, 2010 at 14:07
  • 3
    As mentioned, it's best not to parse HTML yourself. But for the sake of completeness, I should point out that regular expression quantifiers are greedy by default, so in this example the first capture group would match up to the last closing angle bracket that still lets the rest of the pattern match. If the string had tags inside the <textarea>, those would be included in the first group and survive the substitution. It would be better to use the question mark to prevent this: r'(<textarea.*?>).*(</textarea>)'
    Commented Mar 25, 2013 at 21:22
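
To illustrate the greedy versus non-greedy point, here is a small sketch. The input with a nested <b> tag is a hypothetical example, not taken from the question.

```python
import re

# Hypothetical input: the textarea content itself contains a tag.
s = '<textarea id="Foo"><b>old</b></textarea>'

# Greedy: group 1 runs up to the last '>' before the closing tag,
# so the nested <b>old</b> ends up inside group 1 and is kept.
greedy = re.sub(r'(<textarea.*>).*(</textarea>)', r'\1Bar\2', s)

# Non-greedy: group 1 stops at the first '>', so the whole body is replaced.
lazy = re.sub(r'(<textarea.*?>).*(</textarea>)', r'\1Bar\2', s)

print(greedy)  # <textarea id="Foo"><b>old</b>Bar</textarea>
print(lazy)    # <textarea id="Foo">Bar</textarea>
```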
3

Or you could just use the search function instead:

match = re.search(r'(<textarea.*>).*(</textarea>)', s)
output = match.group(1) + 'bar' + match.group(2)
print(output)
>>>'<textarea id="Foo">bar</textarea>'
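
Note that re.search returns None when nothing matches, so calling match.group on the result would raise an AttributeError in that case. A slightly more defensive sketch is shown below; the helper name replace_textarea_body is hypothetical, and the non-greedy opening-tag pattern is a swapped-in variant of the one above.

```python
import re

def replace_textarea_body(s, new_text):
    """Hypothetical helper: swap the text between the textarea tags."""
    match = re.search(r'(<textarea.*?>).*(</textarea>)', s)
    if match is None:
        # No textarea found: return the input unchanged.
        return s
    # Slice around the groups so text outside the tags is preserved.
    return s[:match.end(1)] + new_text + s[match.start(2):]

print(replace_textarea_body('<textarea id="Foo"></textarea>', 'bar'))
# <textarea id="Foo">bar</textarea>
```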
