2

I need some logic or idea for Achieving the below use case:

Base Input List:

data=[
["10000025",710],
["1000138833",1065],
["100274005",820],
["1353180",3160],
["481584",3670],
["4851845845",1690],
["485584",1310],
["48848448",1000],
["49849948",1050],
["585598",4620],
["84154858584",620],
["841584584",2050],
["8451184584",2860],
["845188956",1800],
["845845184",1300],
["8458484",2300],
["8954884",1780],
["9416481584",2720],
["9448155584",1000],
["94949494",1000],
["959494158",1590],
["98558858",1550]
]

Expected Output: List of Combos from the list having the sum of 10000, and if no sum of a specific value is possible, then a list of leftover elements with a max of 6 elements at each combo. enter image description here

I tried following some threads:

Find all combinations of a list of numbers with a given sum

How to get all combination for a given sum with a given number of elements

My code:

data = [
    ["10000025", 710], ["1000138833", 1065], ["100274005", 820], ["1353180", 3160], ["481584", 3670],
    ["4851845845", 1690], ["485584", 1310], ["48848448", 1000],
    ["49849948", 1050], ["585598", 4620], ["84154858584", 620], ["841584584", 2050], ["8451184584", 2860],
    ["845188956", 1800], ["845845184", 1300], ["8458484", 2300], ["8954884", 1780], ["9416481584", 2720],
    ["9448155584", 1000],
    ["94949494", 1000],
    ["959494158", 1590], ["98558858", 1550]
]
valueList = [x[1] for x in data]


def GetNumbers(number):
    result = []
    for i in sorted(valueList, reverse=True):
        sum_list = sum(result)
        if sum_list + i == number:
            result.append(i)
            return result
        elif sum_list + i < number:
            result.append(i)
    return result


for i in range(0, len(data)):
    print(GetNumbers(10000))

Output in code could be in the list of the list or dict, whichever is achievable.

3 Answers 3

1

Here is a code, that solve this problem however there is a high complexity, so it will quickly become too long to compute if you have more element in your dataset.

data=[
["10000025",710],
["1000138833",1065],
["100274005",820],
["1353180",3160],
["481584",3670],
["4851845845",1690],
["485584",1310],
["48848448",1000],
["49849948",1050],
["585598",4620],
["84154858584",620],
["841584584",2050],
["8451184584",2860],
["845188956",1800],
["845845184",1300],
["8458484",2300],
["8954884",1780],
["9416481584",2720],
["9448155584",1000],
["94949494",1000],
["959494158",1590],
["98558858",1550]
]


targetAmmount = 10000
groupMaxSize = 6


def createGroup(current, remaining_data):
    currentAmmount = sum(e[1] for e in current)
    missingAmmount = targetAmmount - currentAmmount

    if len(current) >= groupMaxSize:
        if currentAmmount == targetAmmount:
            yield current
        else:
            yield None
    

    possibilities = sorted(filter(lambda e: e[1] <= missingAmmount, remaining_data), key=lambda e: e[1], reverse=True)
    for possibility in possibilities:
        next_current = [*current, possibility]
        next_remaining = list(filter(lambda e: e!=possibility, possibilities))


        if sum(e[1] for e in next_current) == targetAmmount:
            yield next_current
        else:
            for res in createGroup(next_current, next_remaining):
                yield res



res = []
ungrouped = []

while len(data) != 0:
    first = [data.pop(0)]

    groupeCreated = False
    for group in createGroup(first, data):
        if group != None:
            groupeCreated = True
            res.append(group)
            for elem in group:
                if elem in data:
                    data.remove(elem)
            break

    if not groupeCreated:
        ungrouped.append(first[0])


print("Res : \n", res)
print("Ungrouped : \n", ungrouped)
Sign up to request clarification or add additional context in comments.

Comments

1

The below code works for me. In jupyter lab on a 10 year old laptop it executes in 0.4 seconds.

from itertools import combinations, chain, islice

data=[["10000025",710],["1000138833",1065],["100274005",820],["1353180",3160],["481584",3670],["4851845845",1690],["485584",1310],["48848448",1000],["49849948",1050],["585598",4620],["84154858584",620],["841584584",2050],["8451184584",2860],["845188956",1800],["845845184",1300],["8458484",2300],["8954884",1780],["9416481584",2720],["9448155584",1000],["94949494",1000],["959494158",1590],["98558858",1550]]
limit = 10000

def data_sum(xs):
    global limit # using global to avoid having to do multiple argument passing in filter
    return (sum(x[1] for x in xs)) == limit

def data_uptil(xs):
    global limit
    return (sum(x[1] for x in xs)) < limit

match_limit = chain.from_iterable([(filter(data_sum,combinations(data,k))) for k in range(2,len(data)+1)])
uptil_limit = (filter(data_uptil,combinations(data,6)))

print("Exact matchest that sum to limit")
for m in match_limit:
    print(list(m))

print("Samples with 6 elements that sum to less than limit")
for u in uptil_limit:
    print(list(u))

A few notes:

  • If you run with pypy it might be faster, although the itertools are pretty fast on their own.

  • You said all possible combinations, so I included combinations of size 2 up to the size of the input list

  • I believe your code will not produce correct results as it is summing up from a sorted list as it goes. This would not work if limit = 10 and the list is (2,4,7,8) because it would not find (2,8).

  • match_limit and uptil_limit generators are created almost instantaneous using len consumes the list and will take processing time, for example print(len(list(match_limit))) which gives 278 possible combinations in 4.2 seconds on my 10 year old laptop

  • you can use islice if you want just a few answers, for example for m in islice(match_limit,5):

  • your choice if you want a dict or a list. this also works: print(dict(m))

  • as noted, it is the calling of list that will consume the generator this means results start be produced immediately as they are found.

2 Comments

If I am not wrong, this is giving an output of only a single possible combination in the list of list, and if you see the image in question, it needs to have a unique combination whose sum is 10000 and leftover whose sum can't be 10000 could be divided in chunks whose sum would be less then 10000.
The code checks every possible combination of k elements of the list where 1 < k < len(k)+1. That's the for k in in match_limit. It is an exhaustive search of all possible combinations. If there are no results, you could put in an if statement to use the uptil_limit and then just choose the maximum from that set.
0

Here's another version. It forces full evaluation of the lists to allow for the conditions from OP, I believe.

from itertools import combinations, chain

# data = [["10000025",710],["1000138833",1065],["100274005",820],["1353180",3160],["481584",3670],["4851845845",1690],["485584",1310],["48848448",1000],["49849948",1050],["585598",4620],["84154858584",620],["841584584",2050],["8451184584",2860],["845188956",1800],["845845184",1300],["8458484",2300],["8954884",1780],["9416481584",2720],["9448155584",1000],["94949494",1000],["959494158",1590],["98558858",1550]]
data = [["10000025",71],["1000138833",165],["100274005",820],["1353180",160]]

limit = 10000

def data_sum(xs):
    global limit # using global to avoid having to do multiple argument passing in filter
    return (sum(x[1] for x in xs)) == limit

def my_sum(xs):
    return (sum(x[1] for x in xs))

match_limit = chain.from_iterable([filter(data_sum,combinations(data2,k)) for k in range(2,len(data)+1)])
match_limit_list = list(match_limit)
if len(match_limit_list) > 0:
    [print(m) for m in match_limit_list]
else:
    uptil_limit_list = list(chain.from_iterable([list(combinations(data2,k)) for k in range(2,7)]))
    uptil_limit_max = sorted(uptil_limit_list, key=my_sum, reverse=True)
    [print(u) for u in uptil_limit_max]

For the test data that does not sum to limit, the output is:

(['10000025', 71], ['1000138833', 165], ['100274005', 820], ['1353180', 160])
(['1000138833', 165], ['100274005', 820], ['1353180', 160])
(['10000025', 71], ['1000138833', 165], ['100274005', 820])
(['10000025', 71], ['100274005', 820], ['1353180', 160])
(['1000138833', 165], ['100274005', 820])
(['100274005', 820], ['1353180', 160])
(['10000025', 71], ['100274005', 820])
(['10000025', 71], ['1000138833', 165], ['1353180', 160])
(['1000138833', 165], ['1353180', 160])
(['10000025', 71], ['1000138833', 165])
(['10000025', 71], ['1353180', 160])

2 Comments

You still didn't got the point of expected output cause your code is just making combinations try to the run the other answer posted by @Xiidref it is slow but gives correct output where as your does not.
I see, OP wants to extract from the data set a subset of up to 6 elements that sum to 10,000. And then run again against that reduced set, and then print remaining. Is that correct?

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.