I have a list of names. For each name, I start with my dataframe df and use the list element to define new columns for df. After my data manipulation is complete, I create a new data frame whose name is partially derived from the list element.

list = ['foo','bar']
for x in list :
      df = prior_df
      (long code for manipulating df)
      new_df_x = df
      new_df_x.to_parquet('new_df_x.parquet')
      del new_df_x

new_df_foo = pd.read_parquet('new_df_foo.parquet')
new_df_bar = pd.read_parquet('new_df_bar.parquet')
new_df = pd.merge(new_df_foo, new_df_bar, ...)

The reason I am using this approach is memory. If I skip the loop and add the foo and bar columns one after another to the original df, the data gets very big and highly fragmented before I reshape it from wide to long, and I run into an insufficient memory error. My workaround is to loop over the list, store one data frame per element, and join the long-format data frames at the very end. Therefore, I cannot use the approach suggested in other answers, such as creating dictionaries. I am stuck at the line

new_df_x = df

where, within the loop, I want to use the list element in the name of the data frame. I'd appreciate any help.

2 Answers

IIUC, you only want the filenames, i.e. the stored parquet files to have the foo and bar markers, and you can reuse the variable name itself.

names = ['foo', 'bar']  # renamed from `list` to avoid shadowing the built-in
for x in names:
    df = prior_df
    # (long code for manipulating df)
    df.to_parquet(f'new_df_{x}.parquet')  # the f-string inserts x into the file name
    del df

new_df_foo = pd.read_parquet('new_df_foo.parquet')
new_df_bar = pd.read_parquet('new_df_bar.parquet')
new_df = pd.merge(new_df_foo, new_df_bar, ...)
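
For reference, the whole store-then-join pattern can be sketched end to end so that no per-name variable is needed at all. This is only a rough illustration under stated assumptions: the `id` key column, the toy data, the temporary directory, and the placeholder manipulation are all hypothetical, and pickle files stand in for parquet purely so the sketch runs without a parquet engine installed. With pyarrow or fastparquet available, `to_parquet` / `read_parquet` would slot into the same places.

```python
import tempfile
from functools import reduce
from pathlib import Path

import pandas as pd

# Hypothetical stand-in for the asker's prior_df; "id" is an assumed join key.
prior_df = pd.DataFrame({"id": [1, 2, 3], "A": [42, 38, 39], "B": [13, 25, 45]})
names = ["foo", "bar"]
out_dir = Path(tempfile.mkdtemp())  # assumed scratch location for the files

for x in names:
    df = prior_df[["id"]].copy()
    # Placeholder for the long manipulation: one new column named after x.
    if x == "foo":
        df[x] = prior_df["A"] + prior_df["B"]
    else:
        df[x] = prior_df["A"] - prior_df["B"]
    # Pickle is a stand-in for parquet so the sketch has no extra dependency.
    df.to_pickle(out_dir / f"new_df_{x}.pkl")
    del df  # drop the name so the frame can be garbage collected

# Read the stored frames back by rebuilding the same file names, then join them.
frames = [pd.read_pickle(out_dir / f"new_df_{x}.pkl") for x in names]
new_df = reduce(lambda left, right: pd.merge(left, right, on="id"), frames)
print(new_df)
```

Because the file names are rebuilt from the same list, adding a third name to `names` requires no further changes to the read-and-merge step.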
  • Thanks, it was so simple and I don't know why I never looked at it this way!
    – jayjunior
    Commented Oct 20, 2022 at 16:25
  • quick question. Is the f'new_df_{x}.parquet' correct? or is the leading f a typo?
    – jayjunior
    Commented Oct 20, 2022 at 16:26
  • No, the f is correct - it indicates an f-string - see here
    – Mortz
    Commented Oct 20, 2022 at 16:41
  • It's alright - the simplest "bugs" in our own codes are always the hardest to spot by ourselves ;) - sometimes explaining your code to a Rubber Duck helps :)
    – Mortz
    Commented Oct 20, 2022 at 16:44
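
To make the f-string point from the comments concrete, here is a small sketch of what the leading `f` changes. The variable names `names`, `filenames`, and `literal` are illustrative only.

```python
names = ["foo", "bar"]

# With the leading f, the value of x is substituted into the braces at run time.
filenames = [f"new_df_{x}.parquet" for x in names]
print(filenames)

# Without the leading f, the braces are kept as literal characters.
literal = "new_df_{x}.parquet"
print(literal)
```

So dropping the `f` would write every iteration to the same file, literally named `new_df_{x}.parquet`.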

Here is an example, in case you want to create variables whose names are derived from a list element.

import pandas as pd

data = {"A": [42, 38, 39], "B": [13, 25, 45]}
prior_df = pd.DataFrame(data)

names = ['foo', 'bar']  # renamed from `list` to avoid shadowing the built-in

# globals() is the module namespace; writing to locals() only works at module level.
variables = globals()

for x in names:
    df = prior_df.copy()  # work on a copy so prior_df stays unchanged
    # (sample code for manipulating df)
    #-----------------------------------
    if x == 'foo':
        df['B'] = df['A'] + df['B']
    elif x == 'bar':
        df['B'] = df['A'] - df['B']
    #-----------------------------------

    new_df_x = "new_df_{0}".format(x)  # e.g. "new_df_foo"
    variables[new_df_x] = df  # bind the DataFrame to a variable of that name
    # del variables[new_df_x]

print(new_df_foo)  # print the 1st df variable.
print(new_df_bar)  # print the 2nd df variable.
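
A commonly suggested alternative to injecting names into the module namespace is an ordinary dictionary keyed by the derived name. The sketch below assumes the same toy data as above; `frames` is a hypothetical name. Note that, like the namespace approach, it keeps every data frame in memory at once, so it does not by itself address the asker's memory constraint.

```python
import pandas as pd

prior_df = pd.DataFrame({"A": [42, 38, 39], "B": [13, 25, 45]})
names = ["foo", "bar"]

frames = {}  # maps "new_df_<name>" to its DataFrame
for x in names:
    df = prior_df.copy()  # copy so prior_df is left untouched
    if x == "foo":
        df["B"] = df["A"] + df["B"]
    elif x == "bar":
        df["B"] = df["A"] - df["B"]
    frames[f"new_df_{x}"] = df

print(frames["new_df_foo"])
print(frames["new_df_bar"])
```

Lookups such as `frames["new_df_foo"]` replace the dynamically created variables, and the keys can be iterated over without knowing the names in advance.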
