
I am a beginner in Python.

I have three datasets with a similar structure, and I want to run the same process on each one and save the results. My code runs without errors, but it only keeps the result for the last base in bases. How can I save the results in three different datasets?

My code looks like:

bases = [base1, base1_a, base1_b]
for base in bases:
    base = base.groupby(["Cliente_id", "Periodo_id"]).agg({"monto_2018": ["sum", "mean", "min", "max"]}).reset_index().round(1)
    base.columns = base.columns.get_level_values(1)
    base = base.set_axis(["Cliente_id", "Periodo_id", 'sum_v', 'mean_v', 'min_v', "max_v"], axis=1, inplace=False)
  • What do you mean by "it only save the results for the last one base in bases"? Where does your code save the results? What is the actual output of your code? What do you want it to be instead? Commented Nov 25, 2021 at 16:20
  • What I mean is that I do not know how to save the results of my code in three separate dataframes. The code I wrote only runs the process on every dataframe I have. Commented Nov 25, 2021 at 16:26
  • Did you make sure to indent the code within the for loop, or is that just a mistake in the post?
    – morhc
    Commented Nov 25, 2021 at 16:28
  • @ShirleyMichelleRedroban Refresh the page to see my latest edit to make sure your code is formatted correctly. Commented Nov 25, 2021 at 16:30
  • Try for base in bases: print(base). I think you are already achieving what you want; with this you can verify it.
    – Ivan
    Commented Nov 25, 2021 at 16:30

2 Answers

bases = [base1, base1_a, base1_b]
for base in bases:
    base = base.groupby(["Cliente_id", "Periodo_id"]).agg({"monto_2018": ["sum", "mean", "min", "max"]}).reset_index().round(1)
    base.columns = base.columns.get_level_values(1)
    base = base.set_axis(["Cliente_id", "Periodo_id", 'sum_v', 'mean_v', 'min_v', "max_v"], axis=1, inplace=False)

When you assign base = base.groupby() in the loop, you are reassigning the variable base to refer to a new dataframe which is the result of the groupby() action. This does NOT modify the original bases list.
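The same rebinding behavior can be seen without pandas. This is a minimal sketch that uses plain lists of numbers as hypothetical stand-ins for the data frames:

```python
# Hypothetical stand-in data: plain lists instead of DataFrames.
bases = [[1, 2], [3, 4], [5, 6]]

for base in bases:
    # This rebinds the loop name `base` to a new object.
    # The elements stored in `bases` are not replaced.
    base = sum(base)

print(bases)  # [[1, 2], [3, 4], [5, 6]]  (unchanged)
print(base)   # 11  (only the result of the last iteration survives)
```

After the loop, only the last value assigned to base is still reachable, which matches the behavior described in the question.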

To get a list of your new data frames, you should just create a new list:

bases = [base1, base1_a, base1_b]
results = [] # -----------------------------> create an empty list
for base in bases:
    base = base.groupby(["Cliente_id", "Periodo_id"]).agg({"monto_2018": ["sum", "mean", "min", "max"]}).reset_index().round(1)
    base.columns = base.columns.get_level_values(1)
    # inplace=False was the default and the argument was removed in pandas 2.0, so it is omitted here
    base = base.set_axis(["Cliente_id", "Periodo_id", 'sum_v', 'mean_v', 'min_v', "max_v"], axis=1)
    results.append(base) # ------------------> Add dataframe to the new list

I chose the name results here as a generic name. I strongly encourage you to use a different name that is more descriptive for what you are doing.
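If three separate variables are wanted at the end, the list can be unpacked in the same order as bases. The following is a sketch with small made-up frames; the toy values and the names summary, res1, res1_a and res1_b are assumptions for illustration, and the column labels are assigned directly instead of going through get_level_values and set_axis:

```python
import pandas as pd

# Hypothetical toy frames standing in for base1, base1_a and base1_b.
base1 = pd.DataFrame({
    "Cliente_id": [1, 1, 2],
    "Periodo_id": [201801, 201801, 201801],
    "monto_2018": [10.0, 20.0, 5.0],
})
base1_a = base1.assign(monto_2018=base1["monto_2018"] * 2)
base1_b = base1.assign(monto_2018=base1["monto_2018"] * 3)

bases = [base1, base1_a, base1_b]
results = []
for base in bases:
    summary = (
        base.groupby(["Cliente_id", "Periodo_id"])
        .agg({"monto_2018": ["sum", "mean", "min", "max"]})
        .reset_index()
        .round(1)
    )
    # Flatten the two-level column index by assigning the final labels directly.
    summary.columns = ["Cliente_id", "Periodo_id", "sum_v", "mean_v", "min_v", "max_v"]
    results.append(summary)

# Unpack in the same order as `bases` to get three separate data frames.
res1, res1_a, res1_b = results
print(res1)
```

The original frames in bases are left untouched, so both the raw data and the summaries remain available.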

  • Thank you so much!!! Commented Nov 25, 2021 at 17:43
  • Good to see a Java/Android guru answer a Python pandas question. We need more polyglots on the Stack! BTW, the Pythonic list comprehension avoids the bookkeeping of the empty list and append calls.
    – Parfait
    Commented Nov 25, 2021 at 18:36
  • @Parfait Yes, a list comprehension could be used here. Since the expression for each element in the result list is so complex, I prefer the explicit for loop. To avoid the initial empty list and .append() calls, I would create a generator that yields each element, but I decided to leave that out of my answer here because generators are a more advanced topic. Commented Nov 25, 2021 at 18:47
  • @Parfait And thanks for the kudos. I was learning Android on my own when I first started contributing to Stack Overflow. I have much more professional experience with Python in general, but not much with pandas. Commented Nov 25, 2021 at 18:49

Avoid saving similarly structured data frames as separate objects in the global environment. Right now there are three, but there could easily be 300! Instead, continue using a list (bases = [base1, base1_a, base1_b]) or, even better, a dictionary as shown below, then process it with a list or dict comprehension using a user-defined function. The code below uses named aggregation to avoid the column header manipulation.

# ASSIGN DATA FRAMES IN DICTIONARY
# IDEALLY, YOU CREATE THE DICT DIRECTLY FROM SOURCE VIA read_* METHODS
df_dict = {"base1": base1, "base1_a": base1_a, "base1_b": base1_b}

def process_data(base):
    base = (
        base.groupby(["Cliente_id", "Periodo_id"], as_index=False)
            .agg(
                 sum_v=("monto_2018", "sum"),
                 mean_v=("monto_2018", "mean"),
                 min_v=("monto_2018", "min"),
                 max_v=("monto_2018", "max")
            )
            .round(1)
    )

    return base

# DICTIONARY COMPREHENSION TO ITERATE THROUGH ALL DATA FRAMES
new_df_dict = { k:process_data(v) for k,v in df_dict.items() }

Then call your dictionary elements as needed, since you lose no data frame functionality when it is stored in a container like a list or dict:

new_df_dict["base1"].head()
new_df_dict["base1_a"].tail()
new_df_dict["base1_b"].describe()
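The comment in the code above suggests building the dictionary directly from the source. One way that might look is sketched below, assuming the data comes from CSV files; the in-memory CSV text and the name csv_sources are hypothetical stand-ins for real file paths:

```python
import io

import pandas as pd

# Hypothetical in-memory CSV text standing in for real files on disk.
# With real files, the dict values would be file paths passed to pd.read_csv.
csv_sources = {
    "base1": "Cliente_id,Periodo_id,monto_2018\n1,201801,10.0\n1,201801,20.0\n2,201801,5.0\n",
    "base1_a": "Cliente_id,Periodo_id,monto_2018\n1,201801,11.0\n2,201801,6.0\n",
}

# Build the dictionary of data frames in one pass, keyed by name.
df_dict = {
    name: pd.read_csv(io.StringIO(text))
    for name, text in csv_sources.items()
}

print(df_dict["base1"].head())
```

This keeps every frame in one container from the start, so no separate base1, base1_a, ... variables are needed.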
