2

I have a list of n values (in my case, n=19), and I want to generate all possible combinations of these values. My goal is to use each combination as a filter for a Polars DataFrame, iterate over the combinations, do some functions, and save the results into a new DataFrame.

However, since n=19, this results in 19! combinations, which overwhelms my RAM. Iterating over such a large number of combinations is impractical due to memory constraints.

How can I handle this computation efficiently without consuming too much RAM? Is there a way to either reduce memory usage or process this iteratively without holding everything in memory at once? Any suggestions for optimizing this workflow with Polars?

My current approach:

import polars as pl
import itertools

states = ["a", "b", "c", "d"]

df = pl.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6,7,8,9,10],
    "state": ["b", "b", "a", "d","a", "b", "c", "d","c", "d"],
    "Value" : [3,6,9,12,15,18,21,24,27,30],
})

all_combinations = []

for r in range(1,len(states)+1):
    all_combinations.extend(itertools.combinations(states, r))

def foo(df):
    return(
    df
    )

new_rows = []

for i in range(len(all_combinations)):
    df_filtered = df.filter(pl.col("state").is_in(all_combinations[i]))
    df_func = foo(df_filtered)
    x = df_func.shape[0]
    new_rows.append({"loop_index": i, "shape": x})
    
df_final = pl.DataFrame(new_rows)
df_final

EDIT: Thanks for the feedback! I've realized my current approach isn't optimal. I'll post a new question with full context soon.

EDIT2 : Link to my new question: Optimizing Variable Combinations to Maximize a Classification

7
  • 2
    I'm not sure you are aware of just how big 19! is. It's not just memory; even if you could create 1,000,000,000 such filters every second, it would take nearly 4 years just to create, let alone use, them all. Commented Oct 8, 2024 at 17:37
  • 1
    19! is very, very large, and would have an extremely long runtime. Have you considered using Monte Carlo sampling to get some estimate of the statistics you would like? Commented Oct 8, 2024 at 17:37
  • 3
    What you call 'combinations' is actually the powerset of your states, so you actually only have 2^19 possible 'combinations'. Either way, this also doesn't scale if the number of states goes up. Commented Oct 8, 2024 at 18:34
  • 2
    @Simon Can you give us more information on the foo you're trying to calculate? That might make the difference between whether it's feasible or not. Commented Oct 8, 2024 at 18:36
  • 3
    for the record 19! === 121,645,100,408,832,000, please ,re-think your challenge, stating what you are attempting not 'the how' would be helpful, don't be vague be concise. Commented Oct 8, 2024 at 22:28

0

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.