If that line of JS is guaranteed to contain no newline characters before the terminating ; then the problem is simple enough: match the literal text var config = followed by non-newline characters captured in a group, and then match a semicolon and the end of the line. If the JSON is delimited with ' characters, then, for example, use the pattern
var config = '(.+)';$
and extract the first group.
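import re

# Named js_source rather than input, to avoid shadowing the built-in input()
js_source = '''
var config = '{ "foo": "b\\ar", "ba{{}}}{{z": ["buzz}", "qux", {"innerprop": "innerval"}]}';
var someOtherVar = 'bar';
'''
# (?m) makes $ match at the end of every line, not only at the end of the string
match = re.search(r"(?m)var config = '(.+)';$", js_source)
print(match.group(1))  # the text between the ' delimiters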
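As a rough end-to-end sketch of this first approach, the captured group can be handed straight to json.loads. The helper name extract_config and the sample input are made up for illustration, and the sketch assumes the JS string contains no escape sequences, so the text between the quotes is already valid JSON:

```python
import json
import re

def extract_config(js_source):
    """Return the parsed config object, or None if no matching line is found."""
    # Same pattern as above: everything between the quotes, up to ';' at end of line
    match = re.search(r"(?m)var config = '(.+)';$", js_source)
    if match is None:
        return None
    return json.loads(match.group(1))

# Hypothetical input; braces inside JSON strings are harmless to this pattern
sample = """
var config = '{"foo": "bar", "items": ["buzz}", {"innerprop": "innerval"}]}';
var someOtherVar = 'bar';
"""
config = extract_config(sample)
print(config["items"][1]["innerprop"])  # innerval
```

If the JS string does contain escapes (for example doubled backslashes), they would have to be undone before parsing.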
If the JSON isn't guaranteed to be on its own line, then it's a lot more complicated. Parsing nested structures like JSON is difficult: the general problem is only solvable with regular expressions if the structure is known beforehand (which often isn't the case, and can require a lot of repetitive code in the pattern), or if the RE engine being used supports recursive matches. Without that, there's no way to express in the pattern that every { must be balanced by a matching }.
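To see why balancing matters, here is a small illustration with Python's built-in re module. The input is invented for the example; neither the lazy nor the greedy version of a simple brace pattern stops at the right place:

```python
import re

# Hypothetical input with two JSON strings on the same line
source = """var config = '{"a": {"b": 1}, "c": 2}'; var other = '{"d": 3}';"""

# Lazy: stops at the first }, which closes the inner object only
lazy = re.search(r"\{.*?\}", source).group(0)
# Greedy: runs on to the last } on the line, swallowing the second variable
greedy = re.search(r"\{.*\}", source).group(0)

print(lazy)    # {"a": {"b": 1}
print(greedy)  # {"a": {"b": 1}, "c": 2}'; var other = '{"d": 3}
```

Neither result is the config object, because the pattern has no notion of nesting depth.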
Luckily, if you're working with Python, even though Python's native REs don't support recursion, there's a regex module available that does. You'll also need to make sure that any { and } characters that appear inside strings in the JSON don't affect the current nesting level. For a String.raw template literal, you'd need a pattern like
var config = String\.raw`\K({(?:"(?:\\|\\"|[^"])*"|[^{}]|(?1))*})(?=`;)
The outside of the capture group is
var config = String\.raw`\K({ ... })(?=`;)
matching the line you want and the string delimiters, with a capturing group of
{(?:"(?:\\|\\"|[^"])*"|[^{}]|(?1))*}
which means: a {, followed by any number of the following alternatives, and then a closing }. The alternatives are either
"(?:\\|\\"|[^"])*"
which matches a string inside the JSON (either a key or a value), from its starting delimiter to its ending delimiter, ignoring escaped " characters, or
[^{}]
which matches anything that isn't a { or } (other characters can be ignored, since we just want to get the nesting level right), or
(?1)
which recurses the whole first capture group (the one that matches the { ... }).
This ensures that the { and } brackets are balanced by the end of the pattern.
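A minimal sketch of using that pattern from Python follows. It assumes the third-party regex package is installed (pip install regex); the names PATTERN, SAMPLE and extract_config are invented for the example, and the import guard is only there so the sketch still loads when the package is missing:

```python
import json

try:
    import regex  # third-party package, not the standard-library re
except ImportError:
    regex = None  # assumption: degrade to "no result" if it is not installed

# The recursive pattern from above, unchanged; (?1) re-enters capture group 1
PATTERN = r'var config = String\.raw`\K({(?:"(?:\\|\\"|[^"])*"|[^{}]|(?1))*})(?=`;)'

# Hypothetical input: the config does not sit on a line of its own
SAMPLE = r'''
var a = 1; var config = String.raw`{ "foo": "b\\ar", "ba{{}}}{{z": ["buzz}", "qux", {"innerprop": "innerval"}]}`; var b = 2;
'''

def extract_config(js_source):
    """Return the parsed config, or None if regex is unavailable or nothing matches."""
    if regex is None:
        return None
    match = regex.search(PATTERN, js_source)
    if match is None:
        return None
    # Inside String.raw the backslashes are literal, so group 1 is already JSON
    return json.loads(match.group(1))

config = extract_config(SAMPLE)
```

Here the braces inside the "ba{{}}}{{z" key and the "buzz}" value are consumed by the string alternative, so only the structural braces count toward the nesting.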
However, the above is an example where String.raw was used, where literal backslashes in the JavaScript code indicate literal backslashes in the string. With ' delimiters, on the other hand, literal backslashes need to be double-escaped in the JS, so the above input would look like
var config = '{ "foo": "b\\\\ar", "ba{{}}}{{z": ["buzz}", "qux", {"innerprop": "innerval"}]}';
requiring double-escaping the backslashes in the pattern as well:
var config = '\K({(?:"(?:\\\\|\\\\"|[^"])*"|[^{}]|(?1))*})(?=';)
https://regex101.com/r/8rSrGf/1
It's pretty complicated. I'd recommend going with the first approach or a variation on it instead, if at all possible.
{aslkdjsakljdkalsj{asdasdas}askldjaskljd}; is not JSON