
I have the sample text descriptions below.

Example 1 (with comments in the text):

1. ADEQUATE HANDWASHING SINKS PROPERLY SUPPLIED AND ACCESSIBLE - Comments: AN HANDWASHING SINK INSTALLED AT EAST BAR AREA.NEED TO RELOCATE HANDWASHING SINK AT MIDDLE OF AREA BEHIND THE BAR COUNTER FOR ACCESSIBILITY TO HAND WASHING. PRIORITY FOUNDATION VIOLATION :7-38-030(C), NO CITATION ISSUED.

Example 2 (no comments in the text):

47. FOOD & NON-FOOD CONTACT SURFACES CLEANABLE, PROPERLY DESIGNED, CONSTRUCTED & USED

The output should look like the lines below. I need to extract all the text that follows the leading index number and its '.', up to '- Comments:'. If there is no '- Comments:', extract all the text after the leading index number.

ADEQUATE HANDWASHING SINKS PROPERLY SUPPLIED AND ACCESSIBLE

FOOD & NON-FOOD CONTACT SURFACES CLEANABLE, PROPERLY DESIGNED, CONSTRUCTED & USED

I tried the regular expression '(?:^[\d+. ]*)(.*?)(?:\-\s\w*: ?)', which works for Example 1 but not for Example 2.
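A quick check shows why that pattern fails on Example 2: its trailing non-capturing group (?:\-\s\w*: ?) is mandatory, so a description with no "- <word>:" marker cannot match at all. The sketch below is a minimal illustration, not the asker's or answerer's code; the helper 'title' and the pattern 'FIXED' are hypothetical. 'FIXED' turns the trailing group into an alternation with $, so the lazy capture stops at either "- Comments:" or the end of the string.

```python
import re

# Pattern from the question: the trailing group is not optional, so any
# description without a "- <word>:" marker fails to match entirely.
ORIGINAL = re.compile(r'(?:^[\d+. ]*)(.*?)(?:\-\s\w*: ?)')

# Hypothetical fix: end the lazy capture at either the "- Comments:"
# marker (with surrounding whitespace) or the end of the string.
FIXED = re.compile(r'^\d+\.\s*(.*?)(?:\s*-\s*Comments:|$)')

def title(pattern, text):
    """Return the captured title, or None when the pattern does not match."""
    m = pattern.match(text)
    return m.group(1) if m else None

# Example 1 (shortened) and Example 2 from the question.
ex1 = ("1. ADEQUATE HANDWASHING SINKS PROPERLY SUPPLIED AND ACCESSIBLE"
       " - Comments: AN HANDWASHING SINK INSTALLED AT EAST BAR AREA.")
ex2 = "47. FOOD & NON-FOOD CONTACT SURFACES CLEANABLE, PROPERLY DESIGNED, CONSTRUCTED & USED"

for text in (ex1, ex2):
    # repr() makes trailing whitespace and None visible.
    print(repr(title(ORIGINAL, text)), '->', repr(title(FIXED, text)))
```

The original pattern also leaves a trailing space on Example 1, because the space before the hyphen falls inside the lazy group; the \s* in the fixed pattern absorbs it.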

Is it possible to match both examples in one regular expression?

Answer

Here is one approach using re.findall. We can match on the following regex:

\d+\. (.*?)(?: - |\r?\n|$)

This captures the text that follows a number and a period, stopping lazily at the first " - ", the first line break (\n or \r\n), or the end of the input, whichever comes first.

import re

inp = """1. ADEQUATE HANDWASHING SINKS PROPERLY SUPPLIED AND ACCESSIBLE - Comments: AN HANDWASHING SINK INSTALLED AT EAST BAR AREA.NEED TO RELOCATE HANDWASHING SINK AT MIDDLE OF AREA BEHIND THE BAR COUNTER FOR ACCESSIBILITY TO HAND WASHING. PRIORITY FOUNDATION VIOLATION :7-38-030(C), NO CITATION ISSUED.

47. FOOD & NON-FOOD CONTACT SURFACES CLEANABLE, PROPERLY DESIGNED, CONSTRUCTED & USED"""

# Capture the text after "N. " up to " - ", a line break, or the end of the input
matches = re.findall(r'\d+\. (.*?)(?: - |\r?\n|$)', inp)
print(matches)

This prints:

['ADEQUATE HANDWASHING SINKS PROPERLY SUPPLIED AND ACCESSIBLE',
 'FOOD & NON-FOOD CONTACT SURFACES CLEANABLE, PROPERLY DESIGNED, CONSTRUCTED & USED']
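Because the pattern above is not anchored and stops at any " - ", it can pick up a "2. " that happens to appear inside comment text, and it would cut short a title that itself contains " - ". If the real data has one numbered item per line, a line-anchored variant avoids both. The sketch below assumes that layout and \n line endings; the name 'ANCHORED' and the sample text (the comment on item 1 and all of item 3) are made up for illustration.

```python
import re

# Pattern from the answer above, kept for comparison.
ANSWER = re.compile(r'\d+\. (.*?)(?: - |\r?\n|$)')

# Hypothetical line-anchored variant. With re.MULTILINE, ^ and $ match at
# every line boundary, so only a number at the start of a line begins an
# item, and only the literal "- Comments:" marker (not any " - ") ends it.
ANCHORED = re.compile(r'^\d+\.\s+(.*?)(?:\s+-\s+Comments:.*)?$', re.MULTILINE)

# Made-up sample that exercises both edge cases.
sample = (
    "1. ADEQUATE HANDWASHING SINKS PROPERLY SUPPLIED AND ACCESSIBLE - Comments: SEE SECTION 2. FOR DETAILS\n"
    "47. FOOD & NON-FOOD CONTACT SURFACES CLEANABLE, PROPERLY DESIGNED, CONSTRUCTED & USED\n"
    "3. HYPOTHETICAL ITEM - WITH A DASH IN THE TITLE"
)

print(ANSWER.findall(sample))    # also returns 'FOR DETAILS' and a truncated item 3
print(ANCHORED.findall(sample))  # one title per numbered line
```

Which pattern fits depends on the real data: if titles never contain " - " and comments never contain a number followed by ". ", the shorter pattern from the answer is enough.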