I have a list of n values (in my case, n=19), and I want to generate all possible combinations of these values. My goal is to use each combination as a filter for a Polars DataFrame, iterate over the combinations, apply some functions, and save the results into a new DataFrame.
However, since n=19, this results in 19! combinations, which overwhelms my RAM. Iterating over such a large number of combinations is impractical due to memory constraints.
How can I handle this computation efficiently without consuming too much RAM? Is there a way to either reduce memory usage or process this iteratively without holding everything in memory at once? Any suggestions for optimizing this workflow with Polars?
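For scale, it may help to check the counts directly. Assuming unordered subsets of the 19 values are meant (which is what `itertools.combinations` produces), the total is 2**19 - 1; 19! would be the number of orderings of all 19 values:

```python
import math

n = 19

# Non-empty combinations (unordered subsets) of n values:
n_combinations = sum(math.comb(n, r) for r in range(1, n + 1))
print(n_combinations)  # 524287, i.e. 2**n - 1

# Permutations (orderings) of all n values:
print(math.factorial(n))  # 121645100408832000
```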
My current approach:
import polars as pl
import itertools

states = ["a", "b", "c", "d"]

df = pl.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "state": ["b", "b", "a", "d", "a", "b", "c", "d", "c", "d"],
    "Value": [3, 6, 9, 12, 15, 18, 21, 24, 27, 30],
})

# Build every non-empty combination of states up front.
all_combinations = []
for r in range(1, len(states) + 1):
    all_combinations.extend(itertools.combinations(states, r))

# Placeholder for the real per-combination computation.
def foo(df):
    return df

new_rows = []
for i in range(len(all_combinations)):
    df_filtered = df.filter(pl.col("state").is_in(all_combinations[i]))
    df_func = foo(df_filtered)
    x = df_func.shape[0]
    new_rows.append({"loop_index": i, "shape": x})

df_final = pl.DataFrame(new_rows)
df_final
EDIT: Thanks for the feedback! I've realized my current approach isn't optimal. I'll post a new question with full context soon.
EDIT 2: Link to my new question: Optimizing Variable Combinations to Maximize a Classification
Comment: Consider how large 19! is. It's not just memory; even if you could create 1,000,000,000 such filters every second, it would take nearly 4 years just to create, let alone use, them all.

Comment: What is the actual function foo you're trying to calculate? That might make the difference between whether it's feasible or not.