Python Regex sub() with multiple patterns

Question

I'm wondering if there's any way to combine patterns with re.sub() instead of using multiples like below:

import re
s1 = "Please check with the store to confirm holiday hours."
s2 = ''' Hours:
            Monday: 9:30am - 6:00pm
Tuesday: 9:30am - 6:00pm
Wednesday: 9:30am - 6:00pm
Thursday: 9:30am - 6:00pm
Friday: 9:30am - 9:00pm
Saturday: 9:30am - 6:00pm
Sunday: 11:00am - 6:00pm

Please check with the store to confirm holiday hours.'''

strip1 = re.sub(s1, '', s2)
strip2 = re.sub('\t', '', strip1)
print(strip2)

Desired output:

Hours:
Monday: 9:30am - 6:00pm
Tuesday: 9:30am - 6:00pm
Wednesday: 9:30am - 6:00pm
Thursday: 9:30am - 6:00pm
Friday: 9:30am - 9:00pm
Saturday: 9:30am - 6:00pm
Sunday: 11:00am - 6:00pm

If you want to use s1 as a literal pattern, call re.escape on it so that none of its characters are interpreted as regex special characters, and/or make it a raw string literal with an r prefix, e.g. r'Please check ...'. If you want to remove each component word, you'd have to split it up and replace each part.
– ShadowRanger, Commented Nov 11, 2015 at 1:20
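
To illustrate the re.escape point from the comment above, here is a minimal sketch. The variable names `loose` and `strict` and the altered test string are made up for this example; only s1 comes from the question.

```python
import re

s1 = "Please check with the store to confirm holiday hours."
almost = s1[:-1] + "!"  # same sentence, but ending in "!" instead of "."

# Used as-is, the trailing "." is a regex wildcard, so the "!" variant matches too.
loose = re.fullmatch(s1, almost) is not None

# re.escape backslash-escapes special characters, so only the exact text matches.
literal = re.escape(s1)
strict = re.fullmatch(literal, almost) is not None
exact = re.fullmatch(literal, s1) is not None

print(loose, strict, exact)  # True False True
```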

ShadowRanger · Accepted Answer · 2015-11-11 01:25:44Z

13

If you're just trying to delete specific substrings, you can combine the patterns with alternation for single-pass removal:

pat1 = r"Please check with the store to confirm holiday hours."
pat2 = r'\t'
combined_pat = r'|'.join((pat1, pat2))
stripped = re.sub(combined_pat, '', s2)

It's more complicated if the "patterns" use actual regex special characters (because then you need to worry about wrapping them to ensure the alternation breaks at the right places), but for fixed patterns it's straightforward.
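
For fixed substrings, one possible way to package this is sketched below. The helper name `remove_literals` and the shortened sample text are assumptions for illustration, not part of the answer. It escapes each substring with re.escape and tries longer substrings first, since regex alternation takes the first alternative that matches.

```python
import re

def remove_literals(text, literals):
    """Remove every occurrence of each literal substring in one pass."""
    # Longest first, so a literal that is a prefix of another cannot shadow it.
    ordered = sorted(literals, key=len, reverse=True)
    pattern = "|".join(re.escape(lit) for lit in ordered)
    return re.sub(pattern, "", text)

# Shortened, hypothetical stand-in for s2 from the question.
sample = "\tMonday: 9:30am\n\tPlease check with the store to confirm holiday hours."
cleaned = remove_literals(
    sample,
    ["Please check with the store to confirm holiday hours.", "\t"],
)
print(repr(cleaned))  # 'Monday: 9:30am\n'
```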

If you had real regexes, rather than fixed patterns, you might do something like:

all_pats = [...]
combined_pat = r'|'.join(map(r'(?:{})'.format, all_pats))

so any regex specials remain grouped without possibly "bleeding" across an alternation.
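
As a sketch of the wrapping step with concrete values: the two patterns below are hypothetical, picked only to show what the joined result looks like and how it behaves on one line of the sample text.

```python
import re

# Hypothetical patterns, chosen only for illustration.
all_pats = [r"\d{1,2}:\d{2}(?:am|pm)", r"[ \t]+-[ \t]+"]

# Wrap each pattern in a non-capturing group, then join with alternation.
combined_pat = "|".join(map("(?:{})".format, all_pats))
print(combined_pat)

line = "Monday: 9:30am - 6:00pm"
result = re.sub(combined_pat, "", line)
print(repr(result))  # 'Monday: '
```

Non-capturing groups are used so the wrapping does not add or renumber capture groups in the individual patterns.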


2 Comments

David Metcalfe Over a year ago

It seems your answer most accurately addresses the Regex portion of my question, but I'm confused on pat2 as I thought r' ' would treat the string as raw, so my \t Tab would break. Am I confused here?

ShadowRanger Over a year ago

r'\t' and '\t' happen to work the same by coincidence. The latter searches for a literal tab character; the former is the regex escape \t, which, as it happens, also matches a tab. It's the same end result. I'm just OCD about using raw strings; '\n' and '\t' work fine raw or non-raw, but if you search for '\b' instead of r'\b' (for example), you're looking for an ASCII backspace, not a word boundary, and you almost never want the former.
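
The difference described in this comment can be checked directly. This is a small sketch; the sample `text` is invented for the demonstration.

```python
import re

text = "a tab\there, a word boundary there"

# Both spellings find the tab: '\t' is a real tab character, while
# r'\t' is backslash + t, which the regex engine itself reads as a tab.
tab_plain = re.findall('\t', text)
tab_raw = re.findall(r'\t', text)

# '\b' is an ASCII backspace (0x08); r'\b' is the word-boundary assertion.
backspace_hits = re.findall('\bword\b', text)
boundary_hits = re.findall(r'\bword\b', text)

print(tab_plain == tab_raw, backspace_hits, boundary_hits)  # True [] ['word']
```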

Jack · Accepted Answer · 2015-11-11 01:11:34Z

7

You're not even using regular expressions so you may as well just chain replace:

s1 = "Please check with the store to confirm holiday hours."
s2 = ''' Hours:
            Monday: 9:30am - 6:00pm
Tuesday: 9:30am - 6:00pm
Wednesday: 9:30am - 6:00pm
Thursday: 9:30am - 6:00pm
Friday: 9:30am - 9:00pm
Saturday: 9:30am - 6:00pm
Sunday: 11:00am - 6:00pm

Please check with the store to confirm holiday hours.'''

strip2 = s2.replace(s1, "").replace("Hours:","").strip()

print(strip2)


2 Comments

David Metcalfe Over a year ago

Ah, chaining. Hadn't thought of that. Can re.sub() be chained as well then, for when I am using actual regular expressions?

Jack Over a year ago

@DavidMetcalfe No, because re.sub returns a string, and strings don't have a sub method. You could nest the calls, but that would get ugly fast.
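
One alternative to nesting, sketched here as an assumption rather than something proposed in the thread, is to loop over (pattern, replacement) pairs and feed each result into the next call. The helper name `sub_all` and the rules are hypothetical.

```python
import re

def sub_all(text, rules):
    """Apply each (pattern, replacement) pair in order to the running result."""
    for pattern, repl in rules:
        text = re.sub(pattern, repl, text)
    return text

# Hypothetical rules for illustration.
rules = [
    (r"[ \t]+", " "),    # collapse runs of spaces and tabs into one space
    (r"(?m)^ | $", ""),  # drop a single leading or trailing space on each line
]

messy = "  Monday:\t9:30am  \nTuesday:  9:30am"
tidy = sub_all(messy, rules)
print(repr(tidy))  # 'Monday: 9:30am\nTuesday: 9:30am'
```

Unlike the single combined pattern, this makes one pass per rule, but each rule can have its own replacement text.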
