I have a dataframe column containing a JSON array that I want to split into columns for every row.

Dataframe

       FIRST_NAME                                       CUSTOMFIELDS
    0       Maria  [{'FIELD_NAME': 'CONTACT_FIELD_1', 'FIELD_VALU...
    1        John  [{'FIELD_NAME': 'CONTACT_FIELD_1', 'FIELD_VALU...
    ...

Goal

I need to convert the JSON content in that column into a dataframe:

+------------+-----------------+-------------+-----------------+
| FIRST NAME |   FIELD_NAME    | FIELD_VALUE | CUSTOM_FIELD_ID |
+------------+-----------------+-------------+-----------------+
| Maria      | CONTACT_FIELD_1 | EN          | CONTACT_FIELD_1 |
| John       | CONTACT_FIELD_1 | false       | CONTACT_FIELD_1 |
+------------+-----------------+-------------+-----------------+

1 Answer

The code snippet below should work for you.

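import pandas as pd

# Recreate the example: one row whose FIELD cell holds a list of two dicts
df = pd.DataFrame()
df['FIELD'] = [[{'FIELD_NAME': 'CONTACT_FIELD_1', 'FIELD_VALUE': 'EN', 'CUSTOM_FIELD_ID': 'CONTACT_FIELD_1'}, {'FIELD_NAME': 'CONTACT_FIELD_10', 'FIELD_VALUE': 'false', 'CUSTOM_FIELD_ID': 'CONTACT_FIELD_10'}]]

# Key each dict in the first row's list by its position in the list
temp_dict = {}
for counter, entry in enumerate(df['FIELD'][0]):
    temp_dict[counter] = entry

# Each dict becomes one row; its keys become the columns
new_dataframe = pd.DataFrame.from_dict(temp_dict, orient='index')

new_dataframe  # displays the dataframe in a notebook or REPL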
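
As a shorter variant of the same idea, the DataFrame constructor also accepts a list of dicts directly, so the intermediate dictionary is optional. A minimal sketch, assuming the cell already holds a Python list of dicts (not a raw JSON string); the variable name fields is only for illustration:

```python
import pandas as pd

# Same sample data as above: one cell holding a list of two dicts
fields = [
    {'FIELD_NAME': 'CONTACT_FIELD_1', 'FIELD_VALUE': 'EN', 'CUSTOM_FIELD_ID': 'CONTACT_FIELD_1'},
    {'FIELD_NAME': 'CONTACT_FIELD_10', 'FIELD_VALUE': 'false', 'CUSTOM_FIELD_ID': 'CONTACT_FIELD_10'},
]
df = pd.DataFrame({'FIELD': [fields]})

# A list of dicts maps to one row per dict, with the dict keys as columns
new_dataframe = pd.DataFrame(df['FIELD'][0])
print(new_dataframe)
```

If the column held JSON text instead, each cell would need to be parsed first (for example with json.loads) before this step.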

Edited answer to reflect edited question:

Assuming each entry in CUSTOMFIELDS is a list with one element (unlike the original question, where the entry had two elements), the following creates a dataframe in the requested format.

import pandas as pd

# Recreate the example problem
df = pd.DataFrame()
df['CUSTOMFIELDS'] = [[{'FIELD_NAME': 'CONTACT_FIELD_1', 'FIELD_VALUE': 'EN', 'CUSTOM_FIELD_ID': 'CONTACT_FIELD_1'}],
                      [{'FIELD_NAME': 'CONTACT_FIELD_1', 'FIELD_VALUE': 'FR', 'CUSTOM_FIELD_ID': 'CONTACT_FIELD_1'}]]
df['FIRST_NAME'] = ['Maria', 'John']

# Begin solution
counter = 0
dataframe_solution = pd.DataFrame()
for index, row in df.iterrows():
    # Turn the single dict in this row's list into a one-row dataframe
    row_frame = pd.DataFrame.from_dict(row['CUSTOMFIELDS'][0], orient='index').transpose()
    dataframe_solution = pd.concat([dataframe_solution, row_frame], sort=False, ignore_index=True)
    # Attach the name to the row that was just appended
    dataframe_solution.loc[counter, 'FIRST_NAME'] = row['FIRST_NAME']
    counter += 1

The resulting dataframe is in dataframe_solution.
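
The loop above relies on each CUSTOMFIELDS list having exactly one element. If the lists can hold several elements, one possible alternative is to explode the column first so that every element gets its own row. This is a rough sketch rather than a tested drop-in: it assumes every list is non-empty and already parsed into Python dicts, and the sample data (including Maria's second field) is made up for illustration.

```python
import pandas as pd

# Hypothetical sample: Maria has two custom fields, John has one
df = pd.DataFrame({
    'FIRST_NAME': ['Maria', 'John'],
    'CUSTOMFIELDS': [
        [{'FIELD_NAME': 'CONTACT_FIELD_1', 'FIELD_VALUE': 'EN', 'CUSTOM_FIELD_ID': 'CONTACT_FIELD_1'},
         {'FIELD_NAME': 'CONTACT_FIELD_10', 'FIELD_VALUE': 'false', 'CUSTOM_FIELD_ID': 'CONTACT_FIELD_10'}],
        [{'FIELD_NAME': 'CONTACT_FIELD_1', 'FIELD_VALUE': 'FR', 'CUSTOM_FIELD_ID': 'CONTACT_FIELD_1'}],
    ],
})

# One row per list element; FIRST_NAME is repeated for each element
exploded = df.explode('CUSTOMFIELDS').reset_index(drop=True)

# Expand each dict into its own columns, then put FIRST_NAME back in front
fields = pd.DataFrame(exploded['CUSTOMFIELDS'].tolist())
result = pd.concat([exploded[['FIRST_NAME']], fields], axis=1)
print(result)
```

Building the frame once from a list avoids calling pd.concat inside a loop, which copies the accumulated data on every iteration.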

4 Comments

Thank you jammin, I forgot to mention that I need to get the JSON array field values for every JSON document. I've updated my question.
Originally, you had a list in the CUSTOMFIELDS column with 2 elements in it. Is that still the case?
I made an assumption about the data; let me know if that doesn't work and I need to adjust it.
@Vince no problem! Happy to help!
