I have a data.csv file containing one column, called data trend, that holds JSON. Each row looks like this:

 1. [{"period":"11/2020", "UNTPRICE": 49940000,"DIST_UNTPRICE": 30789500},
     {"period":"12/2020", "UNTPRICE": 48710000,"DIST_UNTPRICE": 30719773}]
 2. [{"period":"12/2020", "UNTPRICE": 28540000,"DIST_UNTPRICE": 27428824}]
 3. [{"period":"12/2020", "UNTPRICE": 27428824,"DIST_UNTPRICE": 28540000}]

The question is how to convert this column into an array like this in Python:

|UNTPRICE(11/2020)|DIST_UNTPRICE(11/2020)|UNTPRICE(12/2020)|DIST_UNTPRICE(12/2020)|
|-----------------|----------------------|-----------------|----------------------|
|49940000         |30789500              |48710000         |30719773              |
|NULL             |NULL                  |28540000         |27428824              |
|NULL             |NULL                  |27428824         |28540000              |

[Image: sample of the raw CSV file]

  • Does it really have the numbers 1, 2, 3 in the CSV? Please just post the raw contents of the CSV. Commented Jan 12, 2021 at 7:20
  • Are you using pandas? It has a pivot function. Commented Jan 12, 2021 at 7:24
  • @Barmar Can you explain in more detail? Commented Jan 12, 2021 at 7:27
  • @bremen_matt I just edited the question. Commented Jan 12, 2021 at 7:30

2 Answers

First, write a function that converts a row from the CSV file into a row of the data frame:

import json

def csv_row_to_df_row(csv_row):
    csv_row = json.loads(csv_row)
    df_row = {}
    for entry in csv_row:
        period = entry['period']
        for k, v in entry.items():
            if k != 'period':
                df_row[f'{k}({period})'] = int(v)
    return df_row

Then iterate over all the lines in the file and build the data frame from the converted rows.

import pandas as pd

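# DataFrame.append was removed in pandas 2.0, so collect the rows in a list first
rows = []
with open('yourfile.csv') as f:
    for csv_row in f:
        rows.append(csv_row_to_df_row(csv_row))

# Keys that are missing from a row become NaN in the resulting frame
df = pd.DataFrame(rows)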

To get the same column order as in the desired output:

df = df[['UNTPRICE(11/2020)', 'DIST_UNTPRICE(11/2020)', 'UNTPRICE(12/2020)', 'DIST_UNTPRICE(12/2020)']]
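
If the JSON string is only one column of a larger CSV file (the case raised in the comments), a minimal sketch along the same lines might read the file with pd.read_csv and join the expanded columns back onto the remaining ones. The column names id and data_trend, the helper name flatten, and the inline sample data are assumptions made for illustration; they are not taken from the original file.

```python
import io
import json

import pandas as pd

# Hypothetical sample: a two-column CSV whose "data_trend" column holds JSON.
# Inside a quoted CSV field, embedded double quotes are escaped by doubling them.
raw = '''id,data_trend
a,"[{""period"":""11/2020"",""UNTPRICE"":49940000,""DIST_UNTPRICE"":30789500},{""period"":""12/2020"",""UNTPRICE"":48710000,""DIST_UNTPRICE"":30719773}]"
b,"[{""period"":""12/2020"",""UNTPRICE"":28540000,""DIST_UNTPRICE"":27428824}]"
'''


def flatten(cell):
    """Turn one JSON cell into a flat dict keyed by 'FIELD(period)'."""
    row = {}
    for entry in json.loads(cell):
        period = entry["period"]
        for key, value in entry.items():
            if key != "period":
                row[f"{key}({period})"] = value
    return row


source = pd.read_csv(io.StringIO(raw))

# Build the wide frame in one step; periods missing from a row become NaN.
wide = pd.DataFrame([flatten(cell) for cell in source["data_trend"]])

# Put the expanded columns next to the other CSV columns (aligned on the row index).
result = source.drop(columns="data_trend").join(wide)
print(result)
```

With a real file, io.StringIO(raw) would be replaced by the file path, and data_trend by the actual name of the JSON column.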

7 Comments

Thanks for the answer, but the output is not in the format I wanted; it does not contain NULL values.
@Al3xSund3r When appending using pandas.DataFrame.append, NaN is automatically added.
I want these added as new columns; append does not work in this case.
@Al3xSund3r I've edited the answer. Does it work now?
For example, my CSV file contains 5 columns, with the JSON string in column 3. How can I do this?

First, load the JSON using pandas:

import pandas as pd
df = pd.read_json('data.json')

Then transpose the data frame:

df.T
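
Transposing alone may not yield one column per field and period. As a hedged sketch of the pivot idea mentioned in the comments on the question, the rows can first be put into long form (one record per row and period) and then pivoted into wide form. The rows list below is an in-memory stand-in for the three JSON rows from the question, and the helper column name row is an assumption.

```python
import json

import pandas as pd

# Hypothetical in-memory stand-in for the three JSON rows from the question.
rows = [
    '[{"period":"11/2020","UNTPRICE":49940000,"DIST_UNTPRICE":30789500},'
    '{"period":"12/2020","UNTPRICE":48710000,"DIST_UNTPRICE":30719773}]',
    '[{"period":"12/2020","UNTPRICE":28540000,"DIST_UNTPRICE":27428824}]',
    '[{"period":"12/2020","UNTPRICE":27428824,"DIST_UNTPRICE":28540000}]',
]

# Long form: one record per (original row, period) pair.
records = [
    {"row": i, **entry}
    for i, text in enumerate(rows)
    for entry in json.loads(text)
]
long_df = pd.DataFrame(records)

# Wide form: one line per original row, one column per (field, period) pair.
wide = long_df.pivot(
    index="row", columns="period", values=["UNTPRICE", "DIST_UNTPRICE"]
)

# Flatten the two-level column index into 'FIELD(period)' labels.
wide.columns = [f"{field}({period})" for field, period in wide.columns]
print(wide)
```

Rows that have no entry for a period end up with NaN in that period's columns, which corresponds to the NULL cells in the desired output.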
