I have a list of n values (in my case, n=19), and I want to generate all possible combinations of these values. My goal is to use each combination as a filter for a Polars DataFrame, iterate over the combinations, apply some functions, and save the results into a new DataFrame.
However, since n=19, this results in 19! combinations, which overwhelms my RAM. Iterating over such a large number of combinations is impractical due to memory constraints.
How can I handle this computation efficiently without consuming too much RAM? Is there a way to either reduce memory usage or process this iteratively without holding everything in memory at once? Any suggestions for optimizing this workflow with Polars?
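For scale, it may help to check the counts directly. Assuming unordered subsets of the 19 values are meant (which is what `itertools.combinations` produces), the total is 2**19 - 1; 19! would be the number of orderings of all 19 values:

```python
import math

n = 19

# Non-empty combinations (unordered subsets) of n values:
n_combinations = sum(math.comb(n, r) for r in range(1, n + 1))
print(n_combinations)  # 524287, i.e. 2**n - 1

# Permutations (orderings) of all n values:
print(math.factorial(n))  # 121645100408832000
```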
My current approach:
import polars as pl
import itertools

states = ["a", "b", "c", "d"]

df = pl.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "state": ["b", "b", "a", "d", "a", "b", "c", "d", "c", "d"],
    "Value": [3, 6, 9, 12, 15, 18, 21, 24, 27, 30],
})

# Build every non-empty combination of states up front.
all_combinations = []
for r in range(1, len(states) + 1):
    all_combinations.extend(itertools.combinations(states, r))

# Placeholder for the real per-combination computation.
def foo(df):
    return df

new_rows = []
for i in range(len(all_combinations)):
    df_filtered = df.filter(pl.col("state").is_in(all_combinations[i]))
    df_func = foo(df_filtered)
    x = df_func.shape[0]
    new_rows.append({"loop_index": i, "shape": x})

df_final = pl.DataFrame(new_rows)
df_final
EDIT: Thanks for the feedback! I've realized my current approach isn't optimal. I'll post a new question with full context soon.
EDIT 2: Link to my new question: Optimizing Variable Combinations to Maximize a Classification
Comment: Consider how large 19! is. It's not just memory; even if you could create 1,000,000,000 such filters every second, it would take nearly 4 years just to create, let alone use, them all.

Comment: What is the actual function foo you're trying to calculate? That might make the difference between whether it's feasible or not.