I have a list of names. For each name, I start with my data frame df and use the list element to define new columns in df. Once the data manipulation is complete, I create a new data frame whose name is partly derived from the list element.
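import pandas as pd

names = ['foo', 'bar']  # renamed from `list` to avoid shadowing the built-in
for x in names:
    df = prior_df
    # (long code for manipulating df)
    new_df_x = df  # intended names: new_df_foo, then new_df_bar
    new_df_x.to_parquet('new_df_x.parquet')  # intended files: new_df_foo.parquet, new_df_bar.parquet
    del new_df_x

new_df_foo = pd.read_parquet('new_df_foo.parquet')
new_df_bar = pd.read_parquet('new_df_bar.parquet')
new_df = pd.merge(new_df_foo, new_df_bar, ...)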
The reason I am using this approach is that, if I skip the loop and just add the foo and bar columns one after another to the original df, the data gets very big and highly fragmented before I reshape it from wide to long, and I run into an insufficient memory error. My workaround is to loop over the list, store the data frame for each element, and at the very end join the long-format data frames together. Therefore, I cannot use the approaches suggested in other answers, such as creating dictionaries. I am stuck at the line
new_df_x = df
where, inside the loop, I want to use the list element in the name of the data frame. I'd appreciate any help.