Create multiple dataframes inside a for loop - pandas

Question

I am a beginner in Python.

I have three datasets with a similar structure, and I want to run the same process on each one and save the results. My code runs without errors, but it only keeps the result for the last base in bases. How can I save the results as three separate datasets?

My code looks like:

bases = [base1, base1_a, base1_b]
for base in bases:
    base = base.groupby(["Cliente_id", "Periodo_id"]).agg({"monto_2018": ["sum", "mean", "min", "max"]}).reset_index().round(1)
    base.columns = base.columns.get_level_values(1)
    base = base.set_axis(["Cliente_id", "Periodo_id", "sum_v", "mean_v", "min_v", "max_v"], axis=1)

What do you mean by "it only save the results for the last one base in bases"? Where does your code save the results? What is the actual output of your code? What do you want it to be instead? — Code-Apprentice, Commented Nov 25, 2021 at 16:20
What I mean is that I do not know how to save the results of my code in three separate dataframes. The code I wrote only runs the process on each dataframe I have. — Shirley Michelle Redroban, Commented Nov 25, 2021 at 16:26
Did you make sure to indent the code within the for loop, or is that just a mistake in the post? — morhc, Commented Nov 25, 2021 at 16:28
@ShirleyMichelleRedroban Refresh the page to see my latest edit to make sure your code is formatted correctly. — Code-Apprentice, Commented Nov 25, 2021 at 16:30
Try for base in bases: print(base). I think you are already achieving what you want, and this lets you verify it. — Ivan, Commented Nov 25, 2021 at 16:30

Code-Apprentice · Accepted Answer · 2021-11-25 17:51:07Z

bases = [base1, base1_a, base1_b]
for base in bases:
    base = base.groupby(["Cliente_id", "Periodo_id"]).agg({"monto_2018": ["sum", "mean", "min", "max"]}).reset_index().round(1)
    base.columns = base.columns.get_level_values(1)
    base = base.set_axis(["Cliente_id", "Periodo_id", "sum_v", "mean_v", "min_v", "max_v"], axis=1)

When you assign base = base.groupby(...) in the loop, you rebind the loop variable base to a new dataframe, the result of the groupby() chain. This does NOT modify the original bases list. Each iteration also overwrites the previous result, so after the loop only the last one is still reachable.
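
The same rebinding behaviour shows up with plain Python values. The following is only a minimal sketch: the numbers and the names values and scaled are made up for illustration and stand in for the dataframes in the question.

```python
# Hypothetical stand-ins for the dataframes in the question.
values = [1, 2, 3]

for value in values:
    # Rebinds the loop variable only; the list still holds the old objects.
    value = value * 10

print(values)  # prints [1, 2, 3]: the list is unchanged
print(value)   # prints 30: only the last computed result is still reachable

# Collecting each result in a new list keeps all of them.
scaled = []
for item in values:
    scaled.append(item * 10)

print(scaled)  # prints [10, 20, 30]
```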

To get a list of your new data frames, you should just create a new list:

bases = [base1, base1_a, base1_b]
results = [] # -----------------------------> create an empty list
for base in bases:
    base = base.groupby(["Cliente_id", "Periodo_id"]).agg({"monto_2018": ["sum", "mean", "min", "max"]}).reset_index().round(1)
    base.columns = base.columns.get_level_values(1)
    base = base.set_axis(["Cliente_id", "Periodo_id", "sum_v", "mean_v", "min_v", "max_v"], axis=1)
    results.append(base) # ------------------> Add dataframe to the new list

I chose the name results here as a generic name. I strongly encourage you to use a different name that is more descriptive for what you are doing.
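
For anyone who wants to try the pattern end to end, here is a small self-contained sketch. The sample data, the make_sample helper, and the name summaries are invented for illustration. It also assigns the flat column names directly, which is a simplification swapped in for the get_level_values() and set_axis() steps above.

```python
import pandas as pd


def make_sample(offset):
    """Build a small made-up dataframe with the question's column names."""
    return pd.DataFrame({
        "Cliente_id": [1, 1, 2, 2],
        "Periodo_id": [201801, 201801, 201801, 201802],
        "monto_2018": [10.0 + offset, 20.0 + offset, 5.0 + offset, 7.0 + offset],
    })


# Hypothetical stand-ins for base1, base1_a and base1_b.
bases = [make_sample(0), make_sample(100), make_sample(200)]

summaries = []  # one aggregated dataframe per input dataframe
for base in bases:
    summary = (
        base.groupby(["Cliente_id", "Periodo_id"])
        .agg({"monto_2018": ["sum", "mean", "min", "max"]})
        .reset_index()
        .round(1)
    )
    # Replace the two-level column index with flat names.
    summary.columns = ["Cliente_id", "Periodo_id", "sum_v", "mean_v", "min_v", "max_v"]
    summaries.append(summary)

# summaries[0] belongs to bases[0], summaries[1] to bases[1], and so on.
print(summaries[1])
```

The results keep the order of the input list, so each one can be matched to its source by position.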

edited Nov 25, 2021 at 17:51

answered Nov 25, 2021 at 16:35

Code-Apprentice


Thank you so much!!!
– Shirley Michelle Redroban
Commented Nov 25, 2021 at 17:43
Good to see a Java/Android guru answer a Python pandas question. We need more polyglots on the Stack! BTW, the Pythonic list comprehension avoids the bookkeeping of the empty list and the append calls.
– Parfait
Commented Nov 25, 2021 at 18:36
@Parfait Yes, a list comprehension could be used here. Since the expression for each element in the result list is so complex, I prefer the explicit for loop. To avoid the initial empty list and .append() calls, I would create a generator that yields each element, but I decided to leave that out of my answer here because generators are a more advanced topic.
– Code-Apprentice
Commented Nov 25, 2021 at 18:47
@Parfait And thanks for the kudos. I was learning Android on my own when I first started contributing to Stack Overflow. I have much more professional experience with Python in general, but not much with pandas.
– Code-Apprentice
Commented Nov 25, 2021 at 18:49
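
The two alternatives mentioned in these comments might look roughly like the sketch below. It assumes the loop body is first moved into a helper function. The names summarize and summarize_all and the sample data are hypothetical, not taken from the answer.

```python
import pandas as pd


def summarize(base):
    """Aggregate monto_2018 per client and period (same steps as the loop body)."""
    out = (
        base.groupby(["Cliente_id", "Periodo_id"])
        .agg({"monto_2018": ["sum", "mean", "min", "max"]})
        .reset_index()
        .round(1)
    )
    out.columns = ["Cliente_id", "Periodo_id", "sum_v", "mean_v", "min_v", "max_v"]
    return out


def summarize_all(bases):
    """Generator: yields one summary per dataframe, computed lazily."""
    for base in bases:
        yield summarize(base)


# Made-up sample data for illustration only.
sample = pd.DataFrame({
    "Cliente_id": [1, 1, 2],
    "Periodo_id": [1, 1, 1],
    "monto_2018": [1.5, 2.5, 4.0],
})
bases = [sample, sample.assign(monto_2018=sample["monto_2018"] * 2)]

# List comprehension: no empty list and no append calls.
from_comprehension = [summarize(base) for base in bases]

# Generator: nothing is computed until it is consumed, here by list().
from_generator = list(summarize_all(bases))
```

Once the per-dataframe logic lives in a function, the comprehension stays short even though the expression itself is complex.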


Parfait · Accepted Answer · 2021-11-25 16:47:16Z

Avoid saving similarly structured data frames as separate objects in the global environment. Right now there are three, but there could easily be 300! Instead, keep using a list (bases = [base1, base1_a, base1_b]) or, even better, a dictionary as shown below, then process it with a list or dict comprehension that calls a user-defined function. The code below uses named aggregation to avoid the column header manipulation.

# ASSIGN DATA FRAMES IN DICTIONARY
# IDEALLY, YOU CREATE THE DICT DIRECTLY FROM SOURCE VIA read_* METHODS
df_dict = {"base1": base1, "base1_a": base1_a, "base1_b": base1_b}

def process_data(base):
    base = (
        base.groupby(["Cliente_id", "Periodo_id"], as_index=False)
            .agg(
                 sum_v=("monto_2018", "sum"),
                 mean_v=("monto_2018", "mean"),
                 min_v=("monto_2018", "min"),
                 max_v=("monto_2018", "max")
            )
            .round(1)
    )

    return base

# DICTIONARY COMPREHENSION TO ITERATE THROUGH ALL DATA FRAMES
new_df_dict = { k:process_data(v) for k,v in df_dict.items() }

Then access the dictionary elements as needed. A data frame loses no functionality when it is stored in a container such as a list or dict:

new_df_dict["base1"].head()
new_df_dict["base1_a"].tail()
new_df_dict["base1_b"].describe()
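
The code comment above suggests building the dictionary directly from the source with read_* methods. One way that could look is sketched below, assuming CSV input. The CSV text and the name sources are made up, and io.StringIO stands in for real file paths so the example runs on its own.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for files on disk; with real files,
# pass each path to pd.read_csv instead of an io.StringIO object.
sources = {
    "base1": "Cliente_id,Periodo_id,monto_2018\n1,1,10.0\n1,1,20.0\n2,1,5.0\n",
    "base1_a": "Cliente_id,Periodo_id,monto_2018\n1,1,1.0\n2,2,3.0\n",
}

# Build the dictionary of data frames straight from the sources.
df_dict = {name: pd.read_csv(io.StringIO(text)) for name, text in sources.items()}


def process_data(base):
    """Named aggregation, as in the answer above."""
    return (
        base.groupby(["Cliente_id", "Periodo_id"], as_index=False)
        .agg(
            sum_v=("monto_2018", "sum"),
            mean_v=("monto_2018", "mean"),
            min_v=("monto_2018", "min"),
            max_v=("monto_2018", "max"),
        )
        .round(1)
    )


new_df_dict = {name: process_data(df) for name, df in df_dict.items()}
print(new_df_dict["base1"])
```

With this layout, adding another dataset only means adding one more entry to the source mapping.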
