0

So, I'm trying to capture this big string in Python but it is failing me. The regex I wrote works fine in regexr: http://regexr.com/3cmdc

But using it in Python to capture the text returns None. This is the code:

pattern = "var initialData = (.*?);\\n"
match = re.search(pattern, source).group(1)

What am I missing?

4
  • Are you sure the input has no linebreaks as in the regex demo? If you have linebreaks, you can try match = re.search(pattern, source, re.S).group(1), but if your string is very large, you might have an issue related to the lazy matching stack "overflow". Commented Jan 28, 2016 at 23:32
  • Try r'var initialData = ([^;]*(?:;(?!\\n)[^;]*)*)' pattern if your \n contains a literal \. Else, if \n is a normal linebreak, I'd advise r'var initialData = ([^;]*(?:;(?!\n)[^;]*)*)' Commented Jan 28, 2016 at 23:37
  • Consider paring down your example string. The problem is easily demonstrated with a much smaller example that could be added to the question directly.
    – tdelaney
    Commented Jan 29, 2016 at 0:06
  • There is only one line break in that sample string. Why do you care about it, though? Use var initialData = (.*?); or var initialData = (.*?);\r?\n and get on with life. Or even var initialData = ([\S\s]*?);
    – user557597
    Commented Jan 29, 2016 at 0:16
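The unrolled pattern suggested in the comments above can be tried on a small input. This is only a sketch: the `source` string is a made-up stand-in for the real page, and it assumes the statement ends with a real line break rather than a literal backslash followed by n.

```python
import re

# Hypothetical sample: the value contains a ";" and spans two lines.
source = 'var initialData = {"a": "x;y",\n"b": 2};\nvar other = 0;\n'

# [^;]* consumes everything up to a semicolon (line breaks included);
# the group then accepts a ";" only when it is NOT followed by a line break.
pattern = r'var initialData = ([^;]*(?:;(?!\n)[^;]*)*)'

match = re.search(pattern, source)
captured = match.group(1) if match else None
print(repr(captured))
```

Because the negated character class already matches line breaks, this form needs no re.S flag and avoids the lazy `.*?` that the comments warn about for very large inputs.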

2 Answers

1

You need to set the appropriate flags:

re.search(pattern, source, re.MULTILINE | re.DOTALL).group(1)
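A minimal sketch of what the flag changes, assuming the captured value itself contains line breaks. The `source` string here is invented for illustration. Only re.DOTALL (re.S) matters for this pattern; re.MULTILINE affects only ^ and $, which the pattern does not use.

```python
import re

# Hypothetical sample: the captured value spans two lines.
source = 'var initialData = {"a": 1,\n"b": 2};\nvar other = 0;\n'
pattern = r"var initialData = (.*?);\n"

# Without DOTALL, "." cannot match a line break, so the search fails.
no_flag = re.search(pattern, source)

# With DOTALL, "." also matches "\n", so the lazy group can cross lines.
with_flag = re.search(pattern, source, re.DOTALL)

print(no_flag)
print(repr(with_flag.group(1)) if with_flag else None)
```

Checking the result for None before calling .group(1), as done here, avoids an AttributeError when nothing matches.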
3
  • 1
    Why use re.MULTILINE | re.DOTALL? OP does not use them at regexr but gets a match. And there are no ^/$ here, so re.M is totally redundant. Commented Jan 28, 2016 at 23:30
  • @WiktorStribiżew well, I suspect the newlines on regexr are not handled properly. Adding flags actually worked for me. Let's see if the answer is gonna help the OP. If not, I'll definitely remove the answer. Thanks.
    – alecxe
    Commented Jan 28, 2016 at 23:50
  • Most probably the \n in the regex demo are real linebreaks, so re.S is OK to use, I believe. Although OP says the strings are huge, and I think lazy dot matching is very fragile here. Commented Jan 28, 2016 at 23:51
1

Use Python's raw string notation:

pattern = r"var initialData = (.*?);\\n"
match = re.search(pattern, source).group(1)

More information

1
  • 1
    It would be helpful to mention why... to match a literal \n in the text, the regex needs to escape the backslash to \\n but then python either needs a raw string r"\\n" or extra escaping of the backslashes "\\\\n".
    – tdelaney
    Commented Jan 29, 2016 at 0:04
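The escaping difference described in the comment above can be checked directly. This is a sketch with two invented inputs, since the real `source` is not shown in the question: one ends the statement with a real line break, the other with the two characters backslash and n.

```python
import re

# Hypothetical inputs (not from the original question).
real_newline = 'var initialData = {"a": 1};\nrest'
literal_backslash_n = 'var initialData = {"a": 1};\\nrest'

plain = "var initialData = (.*?);\\n"  # regex engine sees \n: a line break
raw = r"var initialData = (.*?);\\n"   # regex engine sees \\n: a backslash, then n

results = {}
for pat_name, pat in (("plain", plain), ("raw", raw)):
    for text_name, text in (("newline", real_newline),
                            ("backslash_n", literal_backslash_n)):
        m = re.search(pat, text)
        results[(pat_name, text_name)] = m.group(1) if m else None
        print(pat_name, text_name, results[(pat_name, text_name)])
```

So the raw-string version helps only if the text really contains a backslash followed by n; if it contains an actual line break, the original non-raw pattern is the one that matches.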
