Faster solution for ascending sublist combinations

Question

This is a follow-up question of my previous question: Retrieve all possible combinations of ascending integers from sublists

The code that works for this problem is really slow for bigger lists. This is my solution so far:

from typing import List, Tuple
from itertools import product, combinations

def is_ascending(t: tuple) -> bool:
    return sorted(t) == list(t)

def get_ascending_combinations(a: List[List[int]]) -> List[Tuple, ...]]:
    all_combs = []
    add_comb = all_combs.append
    input_length = len(a) + 1
    for comb_length in range(2, input_length):
        for main_comb in combinations(a, comb_length):
            for comb in product(*main_comb):
                if is_ascending(comb):
                    add_comb(comb)
    return all_combs

example_list = [[0], [1, 4], [2]]
get_ascending_combinations(example_list)
>> [(0, 1), (0, 4), (0, 2), (1, 2), (0, 1, 2)]

example_list1 = [[0], [1, 4, 6], [2]]
get_ascending_combinations(example_list1)
>> [(0, 1), (0, 4), (0, 2), (1, 2), (0, 1, 2), (0, 6)]

example_list2 = [[0], [1, 4], [2], [5, 6]]
get_ascending_combinations(example_list2)
>> [(0, 1), (0, 4), (0, 2), (0, 5), (0, 6), (1, 2), (1, 5), (1, 6), (4, 5), (4, 6), (2, 5),
    (2, 6), (0, 1, 2), (0, 1, 5), (0, 1, 6), (0, 4, 5), (0, 4, 6), (0, 2, 5), (0, 2, 6),
    (1, 2, 5), (1, 2, 6), (0, 1, 2, 5), (0, 1, 2, 6)]

As shown in the example it should only make combinations in ascending order. Another restriction is that is only allowed to make combinations with only one item from each unique sublist.

However this code is really slow on larger lists, since the number of possible combinations is much larger. How can make a faster algorithm that does this? See the example list below.

large_example = [[19, 67], [21, 43], [64], [47, 65, 69, 90], [48, 70], [0, 27, 63], [1], [2], [3], 
                 [4, 53], [5], [6], [7, 57], [8, 58], [9], [10], [11, 82], [12, 37], [13], 
                 [14, 77], [15], [16], [17], [18, 74], [20], [22], [23], [24, 30], [25], [26], 
                 [28], [29], [31], [32], [33], [34], [35], [36], [38], [39], [40], [41], [78], 
                 [79], [80], [46, 84], [85], [86], [87], [88], [89], [91], [50], [93]]

Is each sublist in ascending order? The time complexity for long sublists will be better if you take advantage of having sorted sublists so you don't have to juxtapose every value of one sublist against every value of another. If you want to work with long sublists and the sublists are not already sorted, that should be the first stage of get_ascending_combinations. — TMBailey
– TMBailey, Commented Aug 16, 2021 at 5:55
Are sublists mutually exclusive so if an integer appears in one sublist the same integer does not appear in any other? — TMBailey
– TMBailey, Commented Aug 16, 2021 at 6:02
It is not really in ascending order, since it represents the indices of two texts that matched. The order of the sublists is the order in which the tokens occur in both texts — chatax
– chatax, Commented Aug 16, 2021 at 7:39
Sublists are therefore not mutually exclusive, since I want to find all combinations of matching tokens that are in a valid order. This is the reason that the combination lists should be in ascending order — chatax
– chatax, Commented Aug 16, 2021 at 7:42

gimix · Accepted Answer · 2021-08-13 14:06:15Z

Ok, I put here this code as is, though it probably needs further examination.

Approach

My approach is, first build all order-2 combinations, then create order-n combos (n = 3, 4, ..., length_of_list) by appending to each order-(n-1) combo each acceptable number from each subsequent sublist which 1. has no numbers in that combo, and 2. is subsequent to the last sublist used in that combo.

Performance

I checked the result with your example_list2 and it's ok, but my implementation is about 2x slower, because it has some overhead. Then I created a mid_example list containing the first 14 sublists from your large_example and the result is amazing. On my laptop:

>>> timeit("get_ascending_combinations(mid_example)", number=50, globals=globals())
40.589324300000044

>>> a=AscendingCombos(mid_example)
>>> timeit("a.calculate()", number=50, globals=globals())
0.23768389999997908

Caveats

I didn't check the output, although I'm reasonably confident it will be correct.

I wasn't able to execute neither of the procedures on the large_example because both were blocking other jobs I had in execution, although once again I expect the speed up to be confirmed

Code

class AscendingCombos():

    def __init__(self, mainlist):
        self.mainlist = mainlist
        self.maindict = {k:v for k,v in enumerate(mainlist)}
        self.mainrange = set(range(len(self.mainlist)))
        self.out = {}
        
    def calculate(self):
        ##order 2
        self.out[2] = []
        for combo in combinations(self.maindict.keys(), 2):
            for first in self.maindict[combo[0]]:
                for second in self.maindict[combo[1]]:
                    if second > first:
                        self.out[2].append([list(combo),[first,second]])
        ##order 3+        
        for order in range(3,len(self.mainlist)+1):
            self.out[order] = []
            for prevcombo in self.out[order-1]:
                cands = self.mainrange.difference(
                    prevcombo[0]).difference(
                        range(prevcombo[0][-1]))
                for cand in cands:
                    ##i = self.index[prevcombo[0]].append(self.mainlist.index(self.mainlist[cand]))
                    for nextnum in self.maindict[cand]:
                        if nextnum > prevcombo[1][-1]:
                            self.out[order].append([prevcombo[0]+[cand],prevcombo[1]+[nextnum]])
        return self.out

    def tolist(self):
        for order in range(2,len(self.mainlist)+1):
            print(order, [c[1] for c in self.out[order]])

the output contains integers within combinations that are not even in the input list
That shouldn't be possible. You will already have noted it, but just to be sure... each combination, i.e. each item in a.out[x] contains two lists, where the second one is the real combo: the first one is just a helper, it stores to which input sublist the numbers in the second one belong. To see the real list of combinations you need to run a.tolist() too
Anyhow later I will check the code to see whether some bugs slipped in :)

Balaïtous · Accepted Answer · 2021-06-26 10:30:18Z

Using recursive function you can do:

def get_ascending_combinations(a):
    if len(a) == 0:
        return []
    for left in get_ascending_combinations(a[:-1]):
        yield left
        for right in a[-1]:
            if left[-1] < right:
                yield left + [right]
    for right in a[-1]:
        yield [right]

>>> for x in get_ascending_combinations([[0], [1, 4], [2]]):
...     print(x)
[0]
[0, 2]
[0, 1]
[0, 1, 2]
[0, 4]
[1]
[1, 2]
[4]
[2]

But since number of combination is exponential it is not possible to get all as a list:

>>> it = get_ascending_combinations(large_example)
>>> for x in range(10):
...     print(next(it))
[19]
[19, 93]
[19, 50]
[19, 50, 93]
[19, 91]
[19, 91, 93]
[19, 89]
[19, 89, 93]
[19, 89, 91]
[19, 89, 91, 93]

The idea is interesting, but the result is wrong for example_list2 for example.
I dont't find difference when you remove results of size 1. But it's that's not matter according to first question.

Collectives™ on Stack Overflow

Faster solution for ascending sublist combinations

2 Answers 2

8 Comments

2 Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

8 Comments

2 Comments

Linked

Related