1

There are loads of answers on this topic, but for the life of me I cannot a solution to my issue.

Say I have a JSON like

json_2_explode = [{'scalar': '43',
  'units': 'm',
  'parameter': [{'no_1': '45',
    'no_2': '1038',
    'no_3': '356'}],
  'name': 'Foo'},
 {'scalar': '54.1',
  'units': 's',
  'parameter': [{'no_1': '78',
    'no_2': '103',
    'no_3': '356'}],
  'name': 'Yoo'},
 {'scalar': '1123.1',
  'units': 'Hz',
  'parameter': [{'no_1': '21',
    'no_2': '43',
    'no_3': '3577'}],
  'name': 'Baz'}]

documenting some readings for attributes Foo, Yoo and Baz. For each I detail a number, that is, the value itself, some parameters, and the name.

Say this JSON is a column in a dataframe,

df = pd.DataFrame(data = {'col1': [11, 9, 23, 1],
                         'col2': [7, 3, 1, 12],
                         'col_json' : [json_2_explode,
                                       json_2_explode,
                                       json_2_explode,
                                       json_2_explode]}, index=[0, 1, 2, 3])
    col1    col2    col_json
0   11      7        [{'scalar': '43', 'units': 'MPa', 'parameter':...
1   9       3        [{'scalar': '43', 'units': 'MPa', 'parameter':...
2   23      1        [{'scalar': '43', 'units': 'MPa', 'parameter':...
3   1       12       [{'scalar': '43', 'units': 'MPa', 'parameter':...

The issue I have is that if I try

df = pd.json_normalize(df['col_json'].explode())

I get

    scalar  units            parameter                          name
0   43      m   [{'no_1': '45', 'no_2': '1038', 'no_3': '356'}] Foo
1   54.1    s   [{'no_1': '78', 'no_2': '103', 'no_3': '356'}]  Yoo
2   1123.1  Hz  [{'no_1': '21', 'no_2': '43', 'no_3': '3577'}]  Baz
3   43      m   [{'no_1': '45', 'no_2': '1038', 'no_3': '356'}] Foo
4   54.1    s   [{'no_1': '78', 'no_2': '103', 'no_3': '356'}]  Yoo
5   1123.1  Hz  [{'no_1': '21', 'no_2': '43', 'no_3': '3577'}]  Baz
6   43      m   [{'no_1': '45', 'no_2': '1038', 'no_3': '356'}] Foo
7   54.1    s   [{'no_1': '78', 'no_2': '103', 'no_3': '356'}]  Yoo
8   1123.1  Hz  [{'no_1': '21', 'no_2': '43', 'no_3': '3577'}]  Baz
9   43      m   [{'no_1': '45', 'no_2': '1038', 'no_3': '356'}] Foo
10  54.1    s   [{'no_1': '78', 'no_2': '103', 'no_3': '356'}]  Yoo
11  1123.1  Hz  [{'no_1': '21', 'no_2': '43', 'no_3': '3577'}]  Baz

So it is exploding each JSON in 3 rows (admitteldy each JSON does contain 3 sub-dicts, so to say).
I actually would like Foo, Yoo and Baz to be documented in the same row, adding columns. Is there maybe solution not involving manually manipulating rows/piercing it as desired? I would love to see one of your fancy one-liners, thanks so much for your help

1
  • kindly add your expected output dataframe
    – sammywemmy
    Commented 16 hours ago

1 Answer 1

0

json_normalize may work after adding prefixes, I didn't try but this puts all your data on one line.

from flatten_json import flatten

prefix_json_2_explode = {}
for d in json_2_explode:
    prefix_json_2_explode.update({d['name'] + '_' + key: value for key, value in d.items()})
print(prefix_json_2_explode)
dict_flattened = (flatten(d, '.') for d in [prefix_json_2_explode])
df = pd.DataFrame(dict_flattened)
df

  Foo_scalar Foo_units Foo_parameter.0.no_1 Foo_parameter.0.no_2  \
0         43         m                   45                 1038   

  Foo_parameter.0.no_3 Foo_name Yoo_scalar Yoo_units Yoo_parameter.0.no_1  \
0                  356      Foo       54.1         s                   78   

  Yoo_parameter.0.no_2 Yoo_parameter.0.no_3 Yoo_name Baz_scalar Baz_units  \
0                  103                  356      Yoo     1123.1        Hz   

  Baz_parameter.0.no_1 Baz_parameter.0.no_2 Baz_parameter.0.no_3 Baz_name  
0                   21                   43                 3577      Baz  

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.