
Say I have a dictionary of sets like this:

d = {'a': {1,2,8}, 'b': {3,1,2,6}, 'c': {0,4,1,2}, 'd': {9}, 'e': {2,5},
     'f': {4,8}, 'g': {0,9}, 'h': {7,2,3}, 'i': {5,6,3}, 'j': {4,6,8}}

Each set is a subset of the total set s = set(range(10)). I would like an efficient algorithm that finds the fewest keys whose sets together make up the whole set, and returns an empty list if no combination of keys can do so. If several combinations achieve the minimum, I only need one of them, and any one will do.

So far I am using an exhaustive approach that checks all possible combinations and then takes the combination with the fewest keys.

import copy
def append_combinations(combo, keys):
    for i in range(len(keys)):
        new_combo = copy.copy(combo)
        new_combo.append(keys[i])
        new_keys = keys[i+1:]
        if {n for k in new_combo for n in d[k]} == s:
            valid_combos.append(new_combo)
        append_combinations(new_combo, new_keys)

valid_combos = []
combo = []
keys = sorted(d.keys())
append_combinations(combo, keys)
sorted_combos = sorted(valid_combos, key=len)
print(sorted_combos[0])
# ['a', 'c', 'd', 'h', 'i']

However, this becomes very expensive when the dictionary has many keys (in practice I will have around 100 keys). Any suggestions for a faster algorithm?

  • @ShaunHan For the sake of finding a solution to your question, the fact that you are looking for keys of a dict is actually not relevant in my understanding. In the spirit of a minimal, reproducible example, wouldn't it suffice to ask for finding the actual subsets, rather than the keys of your dict? I guess, this would be the gist of your problem. Commented May 27, 2025 at 13:21
  • @simon Yeah, I get that I can reduce the problem to just a list of sets, but I wanted to make it easier to show the number of subsets in terms of the number of keys Commented May 27, 2025 at 13:28
  • This is a set cover problem. It is known to be computationally intensive. In other words, you should focus on whether your actual task can be solved in a reasonable amount of time. How large is your dictionary exactly? Commented May 27, 2025 at 13:56
  • Is the total set always set(range(10))? Commented May 30, 2025 at 9:48

3 Answers


"Fast" is relative. For this size of input, it's quite fast to build up a sparse cover constraint matrix like this:

import time

import numpy as np
from scipy.optimize import milp, Bounds, LinearConstraint
from scipy.sparse import coo_array

sets = {
    'a': {1,2,8}, 'b': {3,1,2,6}, 'c': {0,4,1,2}, 'd': {9}, 'e': {2,5},
    'f': {4,8}, 'g': {0,9}, 'h': {7,2,3}, 'i': {5,6,3}, 'j': {4,6,8},
}
m = 1 + max(v for vset in sets.values() for v in vset)
n = len(sets)

# Variables: for each set, binary assignment.
cost = np.ones(n)  # minimize keys assigned
bounds = Bounds(lb=0, ub=1)  # binary variables

# For every value: it must be covered by at least one set
coords = np.array(
    [
        (y, x)
        for y in range(m)
        for x, vset in enumerate(sets.values())
        if y in vset
    ],
    dtype=np.int32,
).T
cover_constraint = LinearConstraint(
    A=coo_array((
        np.ones(coords.shape[1]),  # data
        coords,
    )).tocsc(),
    lb=1,
)

start = time.perf_counter()
result = milp(
    c=cost, integrality=1, bounds=bounds, constraints=cover_constraint,
)
end = time.perf_counter()
assert result.success

print(result.message)
print(f'Cost: {result.fun} in {(end - start)*1e3:.2f} ms')
print(
    'Assigned:',
    ' '.join(
        k
        for k, assigned in zip(sets.keys(), result.x)
        if assigned > 0.5  # MILP solutions are floats; round rather than truthiness-test
    ),
)
Optimization terminated successfully. (HiGHS Status 7: Optimal)
Cost: 5.0 in 2.58 ms
Assigned: c g h i j
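The question also asks for an empty list when no combination of keys covers the whole set; milp reports that case as an infeasible problem, so it only takes a small wrapper around the same formulation. A sketch (the helper name min_cover is mine, not from the question):

```python
import numpy as np
from scipy.optimize import milp, Bounds, LinearConstraint
from scipy.sparse import coo_array

def min_cover(sets, universe):
    """Return a minimum-size list of keys whose sets cover `universe`,
    or [] if no combination of keys can cover it."""
    keys = list(sets)
    elems = sorted(universe)
    m, n = len(elems), len(keys)
    idx = {e: i for i, e in enumerate(elems)}  # element -> constraint row

    # One constraint row per element, one binary column per key.
    rows, cols = [], []
    for j, k in enumerate(keys):
        for e in sets[k]:
            if e in idx:
                rows.append(idx[e])
                cols.append(j)
    A = coo_array((np.ones(len(rows)), (rows, cols)), shape=(m, n))

    result = milp(
        c=np.ones(n),                  # minimize the number of keys chosen
        integrality=1,                 # all variables binary
        bounds=Bounds(lb=0, ub=1),
        constraints=LinearConstraint(A.tocsc(), lb=1),  # cover every element
    )
    if not result.success:             # infeasible: universe cannot be covered
        return []
    return [k for k, x in zip(keys, result.x) if x > 0.5]
```

Passing shape=(m, n) explicitly guards against the matrix silently shrinking when the last element or key has no entries.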

For 100 keys:

rand = np.random.default_rng(seed=0)
m = 10
n = 100
sets = {
    i: rand.choice(
        a=m, size=rand.integers(low=1, high=4, endpoint=True),
        replace=False, shuffle=False,
    )
    for i in range(n)
}
# ...
print(result.message)
print(f'Cost: {result.fun} in {(end - start)*1e3:.2f} ms')
print(
    'Assigned:',
    ' '.join(
        str(k)
        for k, assigned in zip(sets.keys(), result.x)
        if assigned > 0.5
    ),
)
for (k, vset), assigned in zip(sets.items(), result.x):
    if assigned > 0.5:
        print(f'{k}: {vset}')
Optimization terminated successfully. (HiGHS Status 7: Optimal)
Cost: 3.0 in 18.49 ms
Assigned: 0 82 99
0: [4 7 2 3]
82: [6 0 5 8]
99: [1 5 4 9]

It processes input of size ~1200 in about a second on my nothing-special laptop.




This is the optimization version of the well-known Set Cover Problem. It is NP-hard, so no efficient exact algorithm is known in general. However, a number of heuristics are available for it.
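That said, the hardness is in the size of the universe, not the number of keys: the universe here has only 10 elements, so an exact breadth-first search over covered-element bitmasks visits at most 2^10 = 1024 states and stays cheap even with ~100 keys. A sketch (the function name is hypothetical):

```python
def exact_cover_bitmask(d, universe):
    """BFS over bitmasks of covered elements.

    Each BFS level adds one more key, so the first time the full mask
    is reached, the key list is guaranteed to be minimal.  Returns []
    if no combination of keys covers `universe`.
    """
    elems = sorted(universe)
    bit = {e: 1 << i for i, e in enumerate(elems)}
    full = (1 << len(elems)) - 1
    if full == 0:
        return []  # empty universe: nothing to cover
    # Precompute each key's coverage as a bitmask.
    masks = {k: sum(bit[e] for e in v if e in bit) for k, v in d.items()}
    seen = {0}
    frontier = {0: []}  # covered mask -> key list reaching it
    while frontier:
        nxt = {}
        for covered, ks in frontier.items():
            for k, mk in masks.items():
                new = covered | mk
                if new not in seen:
                    seen.add(new)
                    nxt[new] = ks + [k]
        if full in nxt:
            return nxt[full]
        frontier = nxt
    return []  # no new masks reachable: covering is impossible
```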



You're tackling a Set Cover Problem, which is NP-hard. Your current brute-force approach checks all subsets, which has exponential complexity and is unsuitable for ~100 keys. A greedy heuristic (repeatedly pick the key covering the most still-uncovered elements) runs fast, though it does not guarantee a minimum-size cover.

Greedy Implementation

def greedy_set_cover(d, s):
    uncovered = s.copy()
    result = []

    while uncovered:
        best_key = None
        best_covered = set()

        for key, subset in d.items():
            covered = uncovered & subset
            if len(covered) > len(best_covered):
                best_key = key
                best_covered = covered

        if not best_key:
            return []  # No key can cover remaining elements

        result.append(best_key)
        uncovered -= best_covered

    return result

1 Comment

Just tried your code and it returns ['b', 'c', 'a', 'd', 'e', 'h'], which is longer than the minimum of 5 keys
