I am trying to find and extract a pattern from a JSON file. As a test with an inline dict, the regex finds and prints the ID, because json.dumps turns the dict into a string:

    import json
    import re

    my_mi = {"_links": {"self": {"href": "/xx-beta/media/111ee111-1e11-11a1-b111-111bb11b0ada/metadata"}}}
    new = json.dumps(my_mi)  # dict -> JSON-formatted str
    my_id = re.findall(r'\w{1,9}\-\w{1,5}\-\w{1,5}\-\w{1,5}\-\w{1,13}', new)
    print(my_id)

The problem is that when I read the data from the actual JSON file, I can't find a way to convert it that works. Instead it throws "TypeError: <open file 'resTwo.json', mode 'r' at 0x1109eee40> is not JSON serializable". This is what I'm running:

    with open("resTwo.json", "r") as input_file:
        # Raises the TypeError: input_file is a file object, not a serializable Python value
        new = json.dumps(input_file)

        my_id = re.findall(r'\w{1,9}\-\w{1,5}\-\w{1,5}\-\w{1,5}\-\w{1,13}', new)
        print(my_id)

I thought json.dumps converted its argument into a string, so the regex would then work as in the test example?
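
For reference: json.dumps serializes a Python object (dict, list, str, number, and so on) into a JSON-formatted string. It does not read from files, and an open file object is not a type it knows how to serialize, which is where the TypeError comes from. The Python 3 sketch below uses io.StringIO as a stand-in for resTwo.json (the stand-in and its contents are assumptions, modeled on the sample dict) to show the error and two ways to get a string the regex can search: read the file's text directly, or parse it with json.load and re-serialize it.

```python
import io
import json
import re

# Regex from the question
PATTERN = r'\w{1,9}\-\w{1,5}\-\w{1,5}\-\w{1,5}\-\w{1,13}'

# Hypothetical file contents standing in for resTwo.json
SAMPLE = '{"_links": {"self": {"href": "/xx-beta/media/111ee111-1e11-11a1-b111-111bb11b0ada/metadata"}}}'


def dumps_file_object(f):
    """Reproduce the question's error: a file object is not JSON serializable."""
    try:
        json.dumps(f)
    except TypeError as exc:
        return str(exc)
    return None


def ids_from_text(f):
    """Option 1: read the raw text; f.read() already returns a str."""
    return re.findall(PATTERN, f.read())


def ids_from_parsed(f):
    """Option 2: parse the JSON into a dict, then dump that dict back to a str."""
    data = json.load(f)
    return re.findall(PATTERN, json.dumps(data))


print(dumps_file_object(io.StringIO(SAMPLE)))
print(ids_from_text(io.StringIO(SAMPLE)))
print(ids_from_parsed(io.StringIO(SAMPLE)))
```

With the real file, either helper would be called on input_file inside the with open("resTwo.json", "r") block; option 1 skips json entirely, while option 2 also validates that the file is well-formed JSON.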

2 Answers

The rows returned from a csv reader object will be lists. re.findall expects a string as the second argument.

Either specify which field you want the regex to match on, or add another for-loop to iterate through each of the fields (i.e. iterate the row).
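
A minimal Python 3 sketch of both options from this answer, assuming a hypothetical two-column CSV (the header and values are made up for illustration, and io.StringIO stands in for the file): pass one chosen field such as row[1] to re.findall, or loop over every field of the row so findall always receives a string.

```python
import csv
import io
import re

# Regex from the question
PATTERN = r'\w{1,9}\-\w{1,5}\-\w{1,5}\-\w{1,5}\-\w{1,13}'

# Hypothetical CSV contents; the column names are assumptions
SAMPLE_CSV = (
    "kind,href\n"
    "media,/xx-beta/media/111ee111-1e11-11a1-b111-111bb11b0ada/metadata\n"
)


def ids_from_field(f, column):
    """Run the regex on one chosen field of every row (row[column] is a str)."""
    ids = []
    for row in csv.reader(f):  # each row is a list of str
        ids.extend(re.findall(PATTERN, row[column]))
    return ids


def ids_from_all_fields(f):
    """Inner loop over the fields of each row, so findall always gets a str."""
    ids = []
    for row in csv.reader(f):
        for field in row:
            ids.extend(re.findall(PATTERN, field))
    return ids


print(ids_from_field(io.StringIO(SAMPLE_CSV), 1))
print(ids_from_all_fields(io.StringIO(SAMPLE_CSV)))
```

Picking a single column is cheaper and avoids accidental matches elsewhere in the row; scanning every field is the safer choice when you don't know which column holds the ID.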

  • So the string I want is in row[0], but when I print it, it looks like this: {"_links": {"self": {"href": "/xx-beta/media/111ee111-1e11-11a1-b111-111bb11b0ada/metadata"}}. If I want the regex to match on that field, do I need to convert it to a string first? Even if I iterate through the rows, they still aren't in the format findall expects (a string), correct? I'd like more explanation on how to get the data into a form the regex can match. Thanks. Commented Aug 28, 2017 at 18:50
  • That doesn't look much like a csv file
    – wim
    Commented Aug 28, 2017 at 18:52
  • It's JSON, but the script it comes from saves the file as a CSV. If it were a JSON file to begin with, would there be a simpler solution? I tried working with it as JSON but didn't manage to turn it into a string, or anything else the regex could work with. Commented Aug 28, 2017 at 18:58
  • I'm going to rework the question as json rather than csv because that makes more sense given the syntax of it. Commented Aug 28, 2017 at 19:23
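
If the data is valid JSON, one simpler route (a sketch, not something proposed in the thread) is to parse it with json.load and read the href directly, so the regex only has to search that one string. This assumes the file has the same nesting as the sample in the question (_links, then self, then href); media_id is a hypothetical helper name, and io.StringIO stands in for the real file.

```python
import io
import json
import re

# Regex from the question
PATTERN = r'\w{1,9}\-\w{1,5}\-\w{1,5}\-\w{1,5}\-\w{1,13}'

# Hypothetical file contents standing in for resTwo.json
SAMPLE = '{"_links": {"self": {"href": "/xx-beta/media/111ee111-1e11-11a1-b111-111bb11b0ada/metadata"}}}'


def media_id(f):
    """Parse the JSON, pull out the href, and search only that string."""
    data = json.load(f)  # a dict, not a file object
    href = data["_links"]["self"]["href"]  # assumes the structure shown in the question
    match = re.search(PATTERN, href)
    return match.group(0) if match else None


print(media_id(io.StringIO(SAMPLE)))
```

With the real file this would be called as media_id(input_file) inside the question's with open("resTwo.json", "r") block. A KeyError would signal that a file doesn't follow the assumed structure.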

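I solved it by iterating over the open file instead. Each line it yields is already a string, so re.findall can be applied to it directly:

    # input_file is the handle from the question's with open("resTwo.json", "r") block
    for value in input_file:
        mediaid = re.findall(r'\w{1,9}\-\w{1,5}\-\w{1,5}\-\w{1,5}\-\w{1,13}', value)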
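
A self-contained Python 3 version of this line-by-line approach, with io.StringIO standing in for the file and two made-up IDs as sample data. Unlike the loop above, which reassigns mediaid on every line and so keeps only the last line's matches, this sketch accumulates matches from all lines; ids_by_line is a hypothetical helper name.

```python
import io
import re

# Regex from the question, compiled once since it runs on every line
PATTERN = re.compile(r'\w{1,9}\-\w{1,5}\-\w{1,5}\-\w{1,5}\-\w{1,13}')


def ids_by_line(lines):
    """Collect matches from every line; iterating a text file yields str lines."""
    ids = []
    for value in lines:
        ids.extend(PATTERN.findall(value))
    return ids


# Hypothetical two-line file with one ID per line
SAMPLE = (
    '{"_links": {"self": {"href": "/xx-beta/media/111ee111-1e11-11a1-b111-111bb11b0ada/metadata"}}}\n'
    '{"_links": {"self": {"href": "/xx-beta/media/222ff222-2f22-22b2-c222-222cc22c0bdb/metadata"}}}\n'
)

print(ids_by_line(io.StringIO(SAMPLE)))
```

The same function accepts a real file handle, e.g. ids_by_line(input_file) inside the with open("resTwo.json", "r") block.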
