TL;DR: How can JSON containing a regex with escaped backslashes be loaded using Python's JSON decoder?

Detail: The regular expression \\[0-9]\\ will match (for example):

\2\

The same regular expression could be encoded as a JSON value:

{
  "pattern": "\\[0-9]\\"
}

And in turn, the JSON value could be encoded as a string in Python (note the single quotes):

'{"pattern": "\\[0-9]\\"}'

When loading the JSON in Python, a JSONDecodeError is raised:

import json
json.loads('{"pattern": "\\[0-9]\\"}')

The problem seems to be caused by the escaped backslashes in the regular expression:

>>> json.loads('{"pattern": "\\[0-9]\\"}')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Invalid \escape: line 1 column 14 (char 13)

This surprised me since each step seems reasonable (i.e. valid regex, valid JSON, and valid Python).

How can JSON containing a regex with escaped backslashes be loaded using Python's JSON decoder?

  • json.loads(r'{"pattern": "\\[0-9]\\"}')
    – JacobIRR
    Commented Jan 3, 2019 at 19:58
  • What's your actual JSON? Your string literal isn't right but whatever you're loading ought to be.
    – jonrsharpe
    Commented Jan 3, 2019 at 20:00
  • What is the regex error?
    – user557597
    Commented Jan 3, 2019 at 20:00

2 Answers

What's happening is that Python first processes the escape sequences in the string literal passed to loads, turning it into '{"pattern": "\[0-9]\"}' (each double backslash becomes a single backslash). loads then tries to interpret \[ as a JSON escape sequence, which is invalid. To fix this, escape the backslashes again. However, it's easier and more practical to write the literal as a raw string:

>>> import json
>>> json.loads('{"pattern": "\\[0-9]\\"}')
json.decoder.JSONDecodeError: Invalid \escape: line 1 column 14 (char 13)
>>> json.loads(r'{"pattern": "\\[0-9]\\"}')
{'pattern': '\\[0-9]\\'} # No error
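
To see what each literal actually hands to json.loads, it can help to print both strings. This is only a sketch; the variable names are illustrative and not part of the original question.

```python
import json

# Regular literal: Python collapses each \\ into a single backslash.
normal = '{"pattern": "\\[0-9]\\"}'
# Raw literal: both backslashes of each pair are kept.
raw = r'{"pattern": "\\[0-9]\\"}'

print(normal)  # {"pattern": "\[0-9]\"}
print(raw)     # {"pattern": "\\[0-9]\\"}

# The regular literal is not valid JSON, because \[ is not a JSON escape.
try:
    json.loads(normal)
    normal_ok = True
except json.JSONDecodeError:
    normal_ok = False

# The raw literal is valid JSON; each \\ decodes to one backslash.
pattern = json.loads(raw)["pattern"]
print(pattern)  # \[0-9]\
```

The decoded value contains single backslashes, which is why the REPL shows it as '\\[0-9]\\' when echoing the dict.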

Note that this problem doesn't arise when loading from a file, because file contents never pass through Python's string-literal parsing.

test.json:

{"pattern": "\\[0-9]\\"}

Python:

import json

with open('test.json', 'r') as infile:
    json.load(infile) # no problem
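
A self-contained version of the same idea, as a sketch: it writes the JSON text to a temporary file first so it can run anywhere. The temporary directory and the file_text variable are assumptions made for this illustration.

```python
import json
import os
import tempfile

# Exact text of the file; a raw literal is used so nothing gets collapsed
# before it is written to disk.
file_text = r'{"pattern": "\\[0-9]\\"}'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.json")
    with open(path, "w") as outfile:
        outfile.write(file_text)
    # Text read from a file is passed to the JSON decoder unchanged.
    with open(path, "r") as infile:
        data = json.load(infile)

print(data["pattern"])  # \[0-9]\
```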

Basically, the problem arises from the fact that you're passing in a string literal, but ironically, your string literal isn't being taken literally.
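
If the goal is to store the regex from the question (the one that matches \2\) in JSON, one option is to let json.dumps produce the escaping rather than writing it by hand. A minimal sketch, assuming the regex really is \\[0-9]\\ as stated in the question; the variable names are illustrative. Under that assumption the JSON text ends up with four backslashes on each side, since JSON escapes every backslash once more.

```python
import json
import re

# The regex from the question: backslash, one digit, backslash.
regex = r'\\[0-9]\\'

# json.dumps escapes each backslash, so no manual escaping is needed.
doc = json.dumps({"pattern": regex})
print(doc)  # {"pattern": "\\\\[0-9]\\\\"}

# Round trip: decoding gives back exactly the original regex.
loaded = json.loads(doc)["pattern"]

# The subject string \2\ written as a regular Python literal.
subject = '\\2\\'
matched = re.fullmatch(loaded, subject) is not None
print(matched)  # True
```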

  • And how would this be done without using the raw syntax?
    – user557597
    Commented Jan 3, 2019 at 20:04
  • @sln: Escape the backslashes again: '{"pattern": "\\\\[0-9]\\\\"}'
    – jwodder
    Commented Jan 3, 2019 at 20:05
  • @sln I mean, you could do '{"pattern": "\\\\[0-9]\\\\"}', but especially if you're getting the json string from an outside source, the raw syntax is better.
    – iz_
    Commented Jan 3, 2019 at 20:05
  • So in Python, a single-quoted string is the same as a double-quoted one? Where '\n' is a carriage return?
    – user557597
    Commented Jan 3, 2019 at 20:07
  • @Jack I should have worded that better, I meant that Python processes escape sequences in the string literal. For example, replacing \n with newline, \t with tab, etc.
    – iz_
    Commented Jan 4, 2019 at 18:48

The r prefix means that the string is treated as a raw string, so backslash escape sequences are not processed:

json.loads(r'{"pattern": "\\[0-9]\\"}')
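
The effect of the r prefix can be checked on small strings. This is a sketch with illustrative variable names; it also shows that the choice between single and double quotes makes no difference.

```python
# Quote style does not change how escapes are handled.
same_quotes = '\n' == "\n"

# Without r, \n is one character (a newline); with r, it is two characters.
newline = '\n'
raw_newline = r'\n'
print(len(newline), len(raw_newline))  # 1 2

# The same applies to backslash pairs.
regular = '\\[0-9]\\'
raw = r'\\[0-9]\\'
print(regular)  # \[0-9]\
print(raw)      # \\[0-9]\\
```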
