
I need to convert an Excel sheet into an array of JSON objects using Python 3.

The Excel sheet looks like the following:

CARTS    ITEMS
Cart A   Lemons
         Apples
         Strawberries
Cart B   Chocolate
         Cake
         Biscuits

etc etc...

Items can be added to each "Cart". For example, someone will add a row to Cart A in which the CARTS cell stays empty and the ITEMS cell contains the new item (e.g. Banana). The rows below shift down, so the new row doesn't overlap Cart B.

The goal is to convert this sheet/table into an array of JSON objects that looks like this:

[
    {
        "CARTS": "Cart A",
        "ITEMS": ["Lemons", "Apples", "Strawberries"]
    },
    {
        "CARTS": "Cart B",
        "ITEMS": ["Chocolate", "Cake", "Biscuits"]
    }
]

What is the best approach for this in Python 3? I'm a beginner in Python, so I'm not yet aware of everything it can do with Excel.

I attempted a solution, but it's not working. Where I'm struggling is handling the empty cells and collecting the items into a JSON array.

3 Answers


There is probably a faster and more Pythonic way, but this one works and is easy to follow:

import pandas as pd

df = pd.read_excel("sample.xlsx")

sample = []
for _, row in df.iterrows():
    # pandas reads an empty CARTS cell as NaN (a float), so a string marks a new cart
    if isinstance(row.iloc[0], str):
        sample.append({"CARTS": row.iloc[0], "ITEMS": [row.iloc[1]]})
    else:
        # blank CARTS cell: the item belongs to the most recent cart
        sample[-1]["ITEMS"].append(row.iloc[1])
print(sample)

# [
#     {"CARTS": "Cart A", "ITEMS": ["Lemons", "Apples", "Strawberries"]},
#     {"CARTS": "Cart B", "ITEMS": ["Chocolate", "Cake", "Biscuits"]},
# ]

You need to install the pandas and openpyxl libraries to be able to read Excel files:

pip install pandas openpyxl
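
The same "blank cell continues the previous cart" logic can also be sketched without pandas. The function name rows_to_carts and the sample rows below are hypothetical, and the sketch assumes each row arrives as a (cart, item) pair with None for an empty cell. That is the shape openpyxl should yield from ws.iter_rows(min_row=2, values_only=True) on a two-column sheet, though this has not been checked against the asker's file.

```python
def rows_to_carts(rows):
    """Group (cart, item) rows; a blank cart cell continues the previous cart."""
    carts = []
    for cart, item in rows:
        if cart is not None:
            # A filled CARTS cell starts a new cart
            carts.append({"CARTS": cart, "ITEMS": []})
        if item is not None and carts:
            # Attach the item to the most recently started cart
            carts[-1]["ITEMS"].append(item)
    return carts


# Hypothetical sample data mirroring the sheet in the question
sample_rows = [
    ("Cart A", "Lemons"),
    (None, "Apples"),
    (None, "Strawberries"),
    ("Cart B", "Chocolate"),
    (None, "Cake"),
    (None, "Biscuits"),
]

carts = rows_to_carts(sample_rows)
print(carts)
```

Because the function only looks at the most recent cart, a row inserted under Cart A later on is picked up without any change to the code.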

Using pandas groupby:

import pandas as pd
import pprint

df = pd.read_excel(r"...")
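# Forward-fill the blank CARTS cells so every row carries its cart name
# (fillna(method="ffill") is deprecated in recent pandas versions)
df["CARTS"] = df["CARTS"].ffill()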

d = []

# Iterate over the groups; sort=False keeps the carts in sheet order
for name, group in df.groupby("CARTS", sort=False):
    d.append({
        "CARTS": name,
        "ITEMS": list(group["ITEMS"])
    })

# Pretty-print the resulting list (Python repr, not JSON text)
pprint.pprint(d, indent=2)

Output:

$ python3 parser.py
[ {'CARTS': 'Cart A', 'ITEMS': ['Lemons', 'Apples', 'Strawberries']},
  {'CARTS': 'Cart B', 'ITEMS': ['Chocolate', 'Cake', 'Biscuits']}]
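
Note that the pprint output above is a Python repr with single quotes, which is not valid JSON. A minimal sketch of producing actual JSON text with the standard json module follows; the list d here is a hard-coded stand-in for the one built by the groupby loop, not data read from the real file.

```python
import json

# Hypothetical stand-in for the list `d` built by the groupby loop
d = [
    {"CARTS": "Cart A", "ITEMS": ["Lemons", "Apples", "Strawberries"]},
    {"CARTS": "Cart B", "ITEMS": ["Chocolate", "Cake", "Biscuits"]},
]

# json.dumps emits double-quoted strings, i.e. valid JSON text
json_text = json.dumps(d, indent=4)
print(json_text)

# Round-trip check: parsing the text gives back the same structure
assert json.loads(json_text) == d
```

To write to a file instead of printing, json.dump(d, f, indent=4) with an open file handle f does the same serialization.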

Another way:

import pandas as pd
import json

df = pd.read_excel('carts.xlsx')
df["CARTS"] = df["CARTS"].ffill()  # forward-fill the blank CARTS cells
data = json.loads(df.to_json())
carts = data.get("CARTS").values()
# dict_values(['Cart A', 'Cart A', 'Cart A', 'Cart B', 'Cart B', 'Cart B'])
items = data.get("ITEMS").values()
# dict_values(['Lemons', 'Apples', 'Strawberries', 'Chocolate', 'Cake', 'Biscuits'])
result = {k: [] for k in carts}  # dict keys keep first-seen order, unlike set()
# result = {'Cart A': [], 'Cart B': []}
for k, v in zip(carts, items):
    result[k].append(v)
print(result)

This should produce:

{'Cart A': ['Lemons', 'Apples', 'Strawberries'], 'Cart B': ['Chocolate', 'Cake', 'Biscuits']}

If you haven't installed the dependencies yet:

pip install pandas openpyxl

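PS: for me, the empty cells are read as NaN. If yours come through as empty strings instead, convert them before the forward fill (only in the CARTS column, so the ITEMS values are left alone):

df["CARTS"] = df["CARTS"].replace("", np.nan)

which requires:

import numpy as np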
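
The result dict in this answer maps each cart name to its items, which is not quite the array of objects the question asks for. A minimal sketch of the reshaping step follows; the dict is hard-coded here as a stand-in for the one built by the loop, and the name records is purely illustrative.

```python
# Hypothetical stand-in for the `result` dict built by the loop
result = {
    "Cart A": ["Lemons", "Apples", "Strawberries"],
    "Cart B": ["Chocolate", "Cake", "Biscuits"],
}

# One object per cart, in the dict's insertion order
records = [{"CARTS": cart, "ITEMS": items} for cart, items in result.items()]
print(records)
```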
