
I've scraped some JSON from a server on my LAN, but I can't parse it with json.loads.

I'm trying to get all of the HTML out of the "text" fields, as you can see at the bottom.

I get the following error: ValueError: Expecting , delimiter: line 9 column 39 (char 222)

import json
import urllib2
##json_data = urllib2.urlopen("http://192.168.1.221:39285/set/json/sec", timeout=30).read()
json_data = """
    {
        "line_type":"Test",
        "title":"Test",
        "timestamp":"201310201628",
        "line": [
                                            { 
                "id":2,
                "text": "<span class=\"prefix\">\n                                Result            <\/span>\n            \n"                } ,                                             { 
                "id":1,
                "text": "<span class=\"prefix\">\n                                Result            <\/span>\n            \n"                }                     ]
    }
"""

s = json.loads(r'{}'.format(json_data))
print s['line']

I want to be able to print: <span class=\"prefix\">\n Testing <\/span>\n \n and <span class=\"prefix\">\n Test <\/span>\n \n

Any help would be much appreciated

I should have mentioned: I'm looking for a regex or a workaround...

1 Answer

Try to print json_data. That's what you'll see:

    {
        "line_type":"Test",
        "title":"Test",
        "timestamp":"201310201628",
        "line": [
                                            { 
                "id":2,
                "text": "<span class="prefix">
                                Testing            <\/span>

"                } ,                                             { 
                "id":1,
                "text": "<span class="prefix">
                                Test            <\/span>

"                }                     ]
    }

As you can see, this isn't valid JSON. The escaping is mixed up:

  1. Quotes must be written as \\" in the Python literal: the first backslash escapes the second one for Python, so the string passed to json.loads contains \", which is the JSON escape for a quote.
  2. You can't have literal newlines in JSON string literals; you have to write them as \n, therefore in a Python string they should be written as \\n.
  3. You don't need to escape forward slashes. \/ is allowed in JSON and simply decodes to /, so <\/span> becomes </span>. Or is <\/span> (with the backslash) actually the result you want?
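A minimal sketch that checks each of the three rules above directly against json.loads; the sample strings are made up for illustration and are not the scraped data:

```python
import json

# Rule 1: in a non-raw Python literal, '\\"' is two characters (backslash, quote).
# json.loads then sees the JSON escape \" and produces a bare quote.
escaped_quote = '"class=\\"prefix\\""'
print(json.loads(escaped_quote))  # class="prefix"

# Rule 2: a real newline character inside a JSON string literal is rejected...
try:
    json.loads('"line one\nline two"')
except ValueError as exc:  # on Python 3, json.JSONDecodeError subclasses ValueError
    print("rejected: %s" % exc)

# ...while the two-character sequence \n is decoded into a newline.
print(repr(json.loads('"line one\\nline two"')))

# Rule 3: \/ is a legal JSON escape for a forward slash, so it is harmless.
print(json.loads('"<\\/span>"'))  # </span>
```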

(Note: you can get rid of one level of escaping if you use r""" """ string literals.)
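For example, here is a minimal sketch with data shaped like the question's (whitespace trimmed, which doesn't affect parsing). The only change is the r prefix, so Python leaves every backslash in place and json.loads receives the JSON escapes untouched:

```python
import json

# The r prefix stops Python from interpreting \" and \n itself, so the
# string handed to json.loads still contains the JSON escape sequences.
json_data = r"""
{
    "line_type": "Test",
    "title": "Test",
    "timestamp": "201310201628",
    "line": [
        {"id": 2, "text": "<span class=\"prefix\">\n    Testing    <\/span>\n    \n"},
        {"id": 1, "text": "<span class=\"prefix\">\n    Test    <\/span>\n    \n"}
    ]
}
"""

s = json.loads(json_data)
for entry in s["line"]:
    print("%d %r" % (entry["id"], entry["text"]))
```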

After these changes, the JSON can be loaded:

>>> s = """
...     {
...         "line_type":"Test",
...         "title":"Test",
...         "timestamp":"201310201628",
...         "line": [
...                                             { 
...                 "id":2,
...                 "text": "<span class=\\"prefix\\">\\n                                Testing            </span>\\n            \\n"                } ,                                             { 
...                 "id":1,
...                 "text": "<span class=\\"prefix\\">\\n                                Test            </span>\\n            \\n"                }                     ]
...     }
... """
>>> 
>>> json.loads(s)
{'line': [{'text': '<span class="prefix">\n                                Testing            </span>\n            \n', 'id': 2}, {'text': '<span class="prefix">\n                                Test            </span>\n            \n', 'id': 1}], 'timestamp': '201310201628', 'title': 'Test', 'line_type': 'Test'} 
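Once the string parses, no regex is needed to get at the HTML. A minimal sketch that collects every "text" value from the parsed structure; extract_texts is a hypothetical helper name, and the sample uses the same escaping as the corrected session above, shortened:

```python
import json

def extract_texts(raw_json):
    """Return the "text" field of every entry in the "line" list."""
    data = json.loads(raw_json)
    return [entry["text"] for entry in data["line"]]

# Non-raw literal, so quotes and newlines are written as \\" and \\n.
sample = """
{"line": [
    {"id": 2, "text": "<span class=\\"prefix\\">\\n Testing </span>\\n \\n"},
    {"id": 1, "text": "<span class=\\"prefix\\">\\n Test </span>\\n \\n"}
]}
"""

for html in extract_texts(sample):
    print(html)
```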
  • The problem is I can't change the JSON. I should have mentioned that I'm trying to make a regex for this data.
    – Ryflex
    Commented Oct 21, 2013 at 12:31
  • See the note. If you add r before the triple quote in your question, it becomes valid too.
    – fjarri
    Commented Oct 21, 2013 at 12:38
  • Well, it works for me with your exact example. What is the error you are getting?
    – fjarri
    Commented Oct 21, 2013 at 12:47
  • jsonlint.com is your friend. If you copy the actual text between """ """, it is valid JSON. As Bogdan says, your mistake is not escaping the \. If you've copy/pasted the data from another source, you may be getting confused by the escapes. A single \ is a valid escape in JSON, but a double \\ is required to put a literal backslash in a Python string. After replacing every \ with \\, your original code works for me too.
    – Paul
    Commented Oct 21, 2013 at 12:50
  • Basically, I'm scraping a LAN-based website with urllib2, which sets json_data, so I'm unable to put the r before the variable.
    – Ryflex
    Commented Oct 21, 2013 at 12:54
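The r prefix only matters for data pasted into source code. Bytes read from a network response already contain real backslash characters, which is the same thing a raw literal gives you, so fetched data can go straight into json.loads. A minimal sketch to check this offline; io.BytesIO is an assumed stand-in for the object returned by urllib2.urlopen(...), and the payload is a shortened, made-up sample:

```python
import io
import json

# Stand-in for urllib2.urlopen(url, timeout=30): a file-like object whose
# read() returns the raw bytes a server would send. The br'' literal keeps the
# backslashes unchanged, as they would arrive over the wire.
response = io.BytesIO(
    br'{"line": [{"id": 2, "text": "<span class=\"prefix\">\n Testing <\/span>\n \n"}]}'
)

json_data = response.read().decode("utf-8")  # backslashes are real characters here
s = json.loads(json_data)                     # no r prefix or extra escaping needed
print(s["line"][0]["text"])
```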
