
I'm wondering if there's any way to combine patterns in a single re.sub() call instead of making multiple calls like below:

import re
s1 = "Please check with the store to confirm holiday hours."
s2 = ''' Hours:
            Monday: 9:30am - 6:00pm
Tuesday: 9:30am - 6:00pm
Wednesday: 9:30am - 6:00pm
Thursday: 9:30am - 6:00pm
Friday: 9:30am - 9:00pm
Saturday: 9:30am - 6:00pm
Sunday: 11:00am - 6:00pm

Please check with the store to confirm holiday hours.'''

strip1 = re.sub(s1, '', s2)
strip2 = re.sub('\t', '', strip1)
print(strip2)

Desired output:

Hours:
Monday: 9:30am - 6:00pm
Tuesday: 9:30am - 6:00pm
Wednesday: 9:30am - 6:00pm
Thursday: 9:30am - 6:00pm
Friday: 9:30am - 9:00pm
Saturday: 9:30am - 6:00pm
Sunday: 11:00am - 6:00pm
    If you want to use s1 as literal regex, you should be calling re.escape on it to prevent random characters in it from being interpreted as regex special characters and/or making it a raw string literal with an r prefix, e.g. r'Please check ...'. If you want to remove each component word, you'd have to split it up and replace each part. Commented Nov 11, 2015 at 1:20

2 Answers


If you're just trying to delete specific substrings, you can combine the patterns with alternation for a single pass removal:

# re.escape keeps the trailing "." from matching any character
pat1 = re.escape("Please check with the store to confirm holiday hours.")
pat2 = r'\t'
combined_pat = r'|'.join((pat1, pat2))
stripped = re.sub(combined_pat, '', s2)

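It's more complicated if the "patterns" use actual regex special characters (because then you need to worry about wrapping them to ensure the alternation breaks at the right places), but for simple fixed strings, joining with | is enough (pass them through re.escape if they might contain characters such as . that the regex engine treats specially).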
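As a minimal sketch of the fixed-string case, assuming every entry is meant literally: the list name `literals` and the sample `text` are made up for illustration, and `re.escape` is applied before joining so no character is read as regex syntax.

```python
import re

# Hypothetical fixed strings to delete; none of them is meant as a regex.
literals = ["Please check with the store to confirm holiday hours.", "\t"]

# Escape each literal, then join the escaped pieces with "|" (alternation).
combined_pat = "|".join(re.escape(lit) for lit in literals)

# Made-up sample input containing both substrings.
text = "\tMonday: 9:30am - 6:00pm\nPlease check with the store to confirm holiday hours."
stripped = re.sub(combined_pat, "", text)
print(repr(stripped))
```

Without the escaping, the final "." in the sentence would match any character, so the pattern would also remove text that merely resembles the sentence.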

If you had real regexes, rather than fixed patterns, you might do something like:

all_pats = [...]
combined_pat = r'|'.join(map(r'(?:{})'.format, all_pats))

so any regex specials remain grouped without possibly "bleeding" across an alternation.
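To make that concrete, here is a small sketch with `all_pats` filled in. The three patterns and the sample `text` are hypothetical, chosen only to show the shape of the combined pattern.

```python
import re

# Hypothetical regexes, for illustration only.
all_pats = [
    r"\t",                                 # a tab
    r"Please check .*? holiday hours\.",   # the notice sentence
    r"[ ]{2,}",                            # runs of two or more spaces
]

# Wrap each pattern in a non-capturing group, then join with "|".
combined_pat = "|".join(map(r"(?:{})".format, all_pats))
print(combined_pat)

# Made-up sample input.
text = "Hours:\n\t  Monday: 9:30am\nPlease check with the store to confirm holiday hours."
result = re.sub(combined_pat, "", text)
print(repr(result))
```

The grouping matters most if the combined pattern is later embedded in a larger expression, for example between a prefix and a suffix, where an ungrouped | would split the whole expression.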


2 Comments

It seems your answer most accurately addresses the Regex portion of my question, but I'm confused on pat2 as I thought r' ' would treat the string as raw, so my \t Tab would break. Am I confused here?
r'\t' and '\t' happen to work the same by coincidence. The latter is looking for the literal tab character, the former is looking for the regex pattern \t that, as it happens, matches a tab. It's the same end result. I'm just OCD about using raw strings; \n and \t work fine raw or non-raw, but if you search for '\b' instead of r'\b' (for example), you're looking for an ASCII backspace, not a word boundary, and you almost never want the former.
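A quick way to see the difference for yourself (the sample strings here are made up):

```python
import re

# '\t' is one character (a tab); r'\t' is two (a backslash and "t").
# As regex patterns, both end up matching a tab.
tab_plain = re.sub('\t', '', 'a\tb')
tab_raw = re.sub(r'\t', '', 'a\tb')

# '\b' is one character (ASCII backspace); r'\b' is the regex word boundary.
text = "cat concat"
with_raw = re.sub(r'\bcat\b', 'dog', text)   # replaces the whole word "cat"
without_raw = re.sub('\bcat\b', 'dog', text) # searches for backspace + "cat" + backspace

print(tab_plain, tab_raw)
print(with_raw)
print(without_raw)
```

The non-raw '\b' version leaves the text unchanged because the input contains no backspace characters.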

You're not even using regular expressions so you may as well just chain replace:

s1 = "Please check with the store to confirm holiday hours."
s2 = ''' Hours:
            Monday: 9:30am - 6:00pm
Tuesday: 9:30am - 6:00pm
Wednesday: 9:30am - 6:00pm
Thursday: 9:30am - 6:00pm
Friday: 9:30am - 9:00pm
Saturday: 9:30am - 6:00pm
Sunday: 11:00am - 6:00pm

Please check with the store to confirm holiday hours.'''

# remove the notice and the tabs; "Hours:" stays, as in the desired output
strip2 = s2.replace(s1, "").replace("\t", "").strip()

print(strip2)

2 Comments

Ah, chaining. Hadn't thought of that. Can re.sub() be chained as well then, for when I am using actual regular expressions?
@DavidMetcalfe No, because re.sub returns a str, and str has no sub method. You could nest the calls, but that would get ugly fast.
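If you do need several real regexes applied in sequence, one alternative to nesting is a small loop. This is only a sketch: `sub_all` is a hypothetical helper (not part of the re module), and the patterns and sample text are made up.

```python
import re

def sub_all(text, replacements):
    """Apply each (pattern, replacement) pair to text, in order."""
    for pattern, repl in replacements:
        text = re.sub(pattern, repl, text)
    return text

# Hypothetical patterns, applied one after another.
replacements = [
    (r"\t", ""),               # drop tabs
    (r"Please check .*", ""),  # drop the notice through the end of its line
    (r"\s+$", ""),             # trim trailing whitespace
]

# Made-up sample input.
text = "Hours:\n\tMonday: 9:30am - 6:00pm\n\nPlease check with the store to confirm holiday hours."
cleaned = sub_all(text, replacements)
print(cleaned)

# The nested equivalent of the first two steps, for comparison:
nested = re.sub(r"Please check .*", "", re.sub(r"\t", "", text))
```

Unlike a single alternation, each pattern here can have its own replacement string, at the cost of one pass over the text per pattern.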
