7

Given a combination of k of the first n natural numbers, for some reason I need to find the position of such combination among those returned by itertools.combination(range(1,n),k) (the reason is that this way I can use a list instead of a dict to access values associated to each combination, knowing the combination).

Since itertools yields its combinations in a regular pattern it is possible to do it (and I also found a neat algorithm), but I'm looking for an even faster/natural way which I might ignore.

By the way here is my solution:

def find_idx(comb,n):
    k=len(comb)
    idx=0
    last_c=0
    for c in comb:
        #idx+=sum(nck(n-2-x,k-1) for x in range(c-last_c-1)) # a little faster without nck caching
        idx+=nck(n-1,k)-nck(n-c+last_c,k) # more elegant (thanks to Ray), faster with nck caching
        n-=c-last_c
        k-=1
        last_c=c
    return idx

where nck returns the binomial coefficient of n,k.

For example:

comb=list(itertools.combinations(range(1,14),6))[654] #pick the 654th combination
find_idx(comb,14) # -> 654

And here is an equivalent but maybe less involved version (actually I derived the previous one from the following one). I considered the integers of the combination c as positions of 1s in a binary digit, I built a binary tree on parsing 0/1, and I found a regular pattern of index increments during parsing:

def find_idx(comb,n):
    k=len(comb)
    b=bin(sum(1<<(x-1) for x in comb))[2:]
    idx=0
    for s in b[::-1]:
        if s=='0':
            idx+=nck(n-2,k-1)
        else:
            k-=1
        n-=1
    return idx
4
  • 4
    +1 It might be helpful to take a look at the "source code" of itertools.combinations: docs.python.org/2/library/itertools.html#itertools.combinations Commented Jan 22, 2013 at 9:54
  • 2
    These look like iterative versions of the unchoose ranker I keep in the toolbox. find_idx seems faster than unchoose, which isn't surprising, as Python recursion tends to be slow. Commented Jan 22, 2013 at 10:06
  • 1
    Thinking out loud: would combinatorial numbers help here? Commented Jan 22, 2013 at 17:03
  • @Iarsmans Great link, thanks. Nevertheless I found a difference between lexicographic ordering explained in Wikipedia and the one used by itertools: the former assume decreasing ordering of chosen items, the latter increasing ordering. I'm trying to reduce my algorithm to something analogous (is not that far) to the elegant and fast one in Wikipedia, but I don't know whether it's feasible, due to such difference. Commented Jan 22, 2013 at 21:49

3 Answers 3

2

Your solution seems quite fast. In find_idx, you have two for loop, the inner loop can be optimized using the formular:

C(n, k) + C(n-1, k) + ... + C(n-r, k) = C(n+1, k+1) - C(n-r, k+1)

so, you can replace sum(nck(n-2-x,k-1) for x in range(c-last_c-1)) with nck(n-1, k) - nck(n-c+last_c, k).

I don't know how you implement your nck(n, k) function, but it should be O(k) measured in time complexity. Here I provide my implementation:

from operator import mul
from functools import reduce # In python 3
def nck_safe(n, k):
    if k < 0 or n < k: return 0
    return reduce(mul, range(n, n-k, -1), 1) // reduce(mul, range(1, k+1), 1)

Finally, your solution become O(k^2) without recursion. It's quite fast since k wouldn't be too large.

Update

I've noticed that nck's parameters are (n, k). Both n and k won't be too large. We may speed up the program by caching.

def nck(n, k, _cache={}):
    if (n, k) in _cache: return _cache[n, k]
    ....
    # before returning the result
    _cache[n, k] = result
    return result

In python3 this can be done by using functools.lru_cache decorator:

@functools.lru_cache(maxsize=500)
def nck(n, k):
    ...
Sign up to request clarification or add additional context in comments.

4 Comments

Great "theoretical" improvement, I was looking exactly for such a formula. Unfortunately it seems not to bring much performance gain. I tried some tests and with great disappointment I found no speed advancement in comparison to the original version with sum. I guess that after all you need more cycles to calculate the 2 nck than to compute the several nck of the sum in the original version. By the way I updated the code to reflect your "theoretical" optimization.
Next optimization step should be to find a similar formula even for the main loop.
What a pity that the trick don't work well. I've introduced another way in my updated answer. The two method can be combined to use.
Caching surely is a possibility when the ranges of n and k are small (I actually used it in my real implementation), and in such case your improvement is useful cause it always needs just 2 lookups.
0

I dug up some old (although it's been converted to Python 3 syntax) code that includes the function combination_index which does what you request:

def fact(n, _f=[1, 1, 2, 6, 24, 120, 720]):
    """Return n!
    The “hidden” list _f acts as a cache"""
    try:
        return _f[n]
    except IndexError:
        while len(_f) <= n:
            _f.append(_f[-1] * len(_f))
        return _f[n]

def indexed_combination(n: int, k: int, index: int) -> tuple:
    """Select the 'index'th combination of k over n
    Result is a tuple (i | i∈{0…n-1}) of length k

    Note that if index ≥ binomial_coefficient(n,k)
    then the result is almost always invalid"""

    result= []
    for item, n in enumerate(range(n, -1, -1)):
        pivot= fact(n-1)//fact(k-1)//fact(n-k)
        if index < pivot:
            result.append(item)
            k-= 1
            if k <= 0: break
        else:
            index-= pivot
    return tuple(result)

def combination_index(combination: tuple, n: int) -> int:
    """Return the index of combination (length == k)

    The combination argument should be a sorted sequence (i | i∈{0…n-1})"""

    k= len(combination)
    index= 0
    item_in_check= 0
    n-= 1 # to simplify subsequent calculations
    for offset, item in enumerate(combination, 1):
        while item_in_check < item:
            index+= fact(n-item_in_check)//fact(k-offset)//fact(n+offset-item_in_check-k)
            item_in_check+= 1
        item_in_check+= 1
    return index

def test():
    for n in range(1, 11):
        for k in range(1, n+1):
            max_index= fact(n)//fact(k)//fact(n-k)
            for i in range(max_index):
                comb= indexed_combination(n, k, i)
                i2= combination_index(comb, n)
                if i2 != i:
                    raise RuntimeError("mismatching n:%d k:%d i:%d≠%d" % (n, k, i, i2))

indexed_combination does the inverse operation.

PS I remember that I sometime attempted removing all those fact calls (by substituting appropriate incremental multiplications and divisions) but the code became much more complicated and wasn't actually faster. A speedup was achievable if I substituted a pre-calculated list of factorials for the fact function, but again the speed difference was negligible for my use cases, so I kept this version.

1 Comment

You can just use stdlib math.factorial. And probably using math.comb then you won't even need the factorial function (it looks like using fact just as a means to compute nCr, correct me if I'm wrong).
-1

Looks like you need to better specify your task or I am just getting it wrong. For me it seems that when you iterating through the itertools.combination you can save indexes you need to an appropriate data structure. If you need all of them then I would go with the dict (one dict for all your needs):

combinationToIdx = {}
for (idx, comb) in enumerate(itertools.combinations(range(1,14),6)):
    combinationToIdx[comb] = idx

def findIdx(comb):
    return combinationToIdx[comb]

1 Comment

Using dict is more time and memory consuming than using list. Moreover, if you have (like I do) billions of combinations, to store all comb/index pairs you might need more memory and time than you have, whereas a function needs few bytes of memory and you spend time to get the index only when you need it.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.