0

I am trying to parse this json file: http://pastebin.com/VcVR0ue0

While using these modules

from pprint import pprint
import codecs
import json

file = 'Desktop10000_760_CurtSacks.json'

I've tried these methods

a)

data = json.load(open(file))

b)

data = json.load(codecs.open(file, encoding='utf_8_sig'))

In both cases the output has a u in front of every key and every string value:

{u'document_tone': {u'tone_categories': [{u'category_id': u'emotion_tone',
                                          u'category_name': u'Emotion Tone',
                                          u'tones': [{u'score': 0.111838,
                                                      u'tone_id': u'anger',
                                                      u'tone_name': u'Anger'},
                                                     {u'score': 0.159831,
                                                      u'tone_id': u'disgust',
                                                      u'tone_name': u'Disgust'},
                                                     {u'score': 0.17082,
                                                      u'tone_id': u'fear',
                                                      u'tone_name': u'Fear'},
                                                     {u'score': 0.507748,
                                                      u'tone_id': u'joy',
                                                      u'tone_name': u'Joy'},
                                                     {u'score': 0.520722,
                                                      u'tone_id': u'sadness',
                                                      u'tone_name': u'Sadness'}]},

How do I read the file correctly?

1
  • thank you. now i feel this question wasn't an important one at all. but i'm glad i learned something basic. Commented Mar 11, 2017 at 3:22

2 Answers

1

It looks like everything's being parsed properly.

Python's syntax for a unicode string is:

u'Here is the string.'

So the Python equivalent of this JSON:

{"foo": "bar"}

is this:

{u'foo': u'bar'}

If you just print out the Python representation of the data, you'll see the Python syntax.
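To make that concrete, here is a minimal sketch using the same {"foo": "bar"} example. It assumes nothing beyond the standard json module. The u only shows up in the repr of the parsed data under Python 2; it is not part of the string contents, so printing a value or dumping back to JSON shows no prefix.

```python
import json

# Parse the same JSON text used in the example above.
data = json.loads('{"foo": "bar"}')

# repr() shows Python syntax: Python 2 prints {u'foo': u'bar'},
# Python 3 prints {'foo': 'bar'} because every str is already unicode.
print(repr(data))

# The string itself contains no "u"; it is only part of the repr.
print(data["foo"])

# Serializing back to JSON gives ordinary JSON text again.
print(json.dumps(data))
```

If the goal is just to inspect the file without the prefixes, printing json.dumps(data, indent=2) instead of using pprint is one option.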


0

The 'u' indicates a Python unicode string, which is normal. In Python 2 the json library returns unicode strings by design, so your data is being parsed properly.

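If for whatever reason you don't want unicode strings in the parsed data, you can load the file with PyYAML instead. This works because JSON is, for practical purposes, a subset of YAML, and under Python 2 PyYAML returns plain str objects for ASCII-only strings: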

import yaml
data = yaml.safe_load(open(file))
print(data)

So you'd get

{'key':'item'}

Instead of

{u'key': u'item'}

That said, I don't see a reason to avoid unicode; for most purposes it makes no difference. (see Python str vs unicode types)
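As a rough illustration of why the unicode strings rarely matter, the sketch below parses a small excerpt shaped like the output in the question (the inline raw string is an assumption standing in for the real file) and reads values using ordinary string literals as keys. The same lookups work whether the keys come back as unicode (Python 2) or str (Python 3).

```python
import json

# A hand-written excerpt shaped like the question's output; the real
# file would be read with json.load(open(file)) instead.
raw = """
{"document_tone": {"tone_categories": [
    {"category_id": "emotion_tone",
     "category_name": "Emotion Tone",
     "tones": [
        {"score": 0.111838, "tone_id": "anger", "tone_name": "Anger"},
        {"score": 0.507748, "tone_id": "joy", "tone_name": "Joy"}
     ]}
]}}
"""

data = json.loads(raw)

# Plain string literals work as keys even when the parsed keys are unicode.
tones = data["document_tone"]["tone_categories"][0]["tones"]

# Build a simple tone_id -> score mapping from the parsed list.
scores = {tone["tone_id"]: tone["score"] for tone in tones}

print(scores["joy"])
```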
