1

I have 3 lists and I am trying to find matching combinations.

mylist1 = ["a", "b", "c", "d", "e", "f", "x", "y", "p"]
mylist2 = ["a", "b", "c", "d", "p", "q"]
mylist3 = ["a", "b", "c", "d", "e", "f", "g", "h", "q"]

g, h, x and y do not have any match so I will discard them. The result ["a", "b", "c" ] 3 is valid but I need to discard that as well because that is the subset of ["a", "b", "c", "d"] 3 The expected output is:

["a", "b", "c", "d"] 3
["a", "b", "c", "d", "e", "f"] 2
["a", "b", "c", "d", "p"] 2
["a", "b", "c", "d", "q"] 2
6
  • Just to make sure I am reading this right, you want the largest set that contains letters that occur in at least two arrays? Are you guaranteed that each item will show up at most one time in each array? Commented Dec 9, 2020 at 3:51
  • @Jon Yes and Yes. Commented Dec 9, 2020 at 4:51
  • What's your question exactly? Do you want to ask how to do it? If so, what have you already tried? Commented Dec 9, 2020 at 17:40
  • 1
    What's the expected output if I swap the "b" and "c" in mylist3? Commented Dec 9, 2020 at 17:56
  • Output will be the same. My list contains unique, sorted values. Commented Dec 10, 2020 at 2:57

4 Answers 4

2
+50

Supposing you requirement is: You don't want to see anything which occurs only once - But only want to display anything that is at least common in two of your lists.

First you need to figure out how many combinations you can choose from your lists. Here you have 3 lists --> that is 4 combinations - itertools.combinations can help with that

Then you need to define the combinations and intersect them one by one see it below:

import itertools
from functools import reduce

mylist1 = ["a", "b", "c", "d", "e", "f", "x", "y", "p"]
mylist2 = ["a", "b", "c", "d", "p", "q"]
mylist3 = ["a", "b", "c", "d", "e", "f", "g", "h", "q"]

def definer(*args):
    # Number of lists for input
    counter = len(args)
    my_outputs = []
    # Only collecting where values are at least in two lists:
    for i in range(2, counter+1):
        x = (g for g in itertools.combinations(args, i))
        for item in x:
            result = reduce(set.intersection, (set(a) for a in item))

            my_outputs.append([sorted(list(result)), i])
    return my_outputs

print(definer(mylist1,mylist2,mylist3))
Sign up to request clarification or add additional context in comments.

Comments

2
s1 = set(mylist1)
s2 = set(mylist2)
s3 = set(mylist3)

print (s1.intersection(s2).intersection(s3), 3)
print (s1.intersection(s2), 2)
print (s1.intersection(s3), 2)
print (s2.intersection(s3), 2)

Output:

{'a', 'b', 'd', 'c'} 3
{'d', 'c', 'a', 'p', 'b'} 2
{'d', 'f', 'c', 'a', 'e', 'b'} 2
{'d', 'c', 'a', 'b', 'q'} 2

2 Comments

Will it scale if I have a few hundred (if not thousands) of lists? This is a sample of original data. Should I compare each set with all other sets with an increment of 2,3,4...?
for large dataset, i think you should use Trie
0

This will get you the expected output

mylist1 = ["a", "b", "c", "d", "e", "f", "x", "y", "p"]
mylist2 = ["a", "b", "c", "d", "p", "q"]
mylist3 = ["a", "b", "c", "d", "e", "f", "g", "h", "q"]

s1 = set(mylist1)
s2 = set(mylist2)
s3 = set(mylist3)

print(sorted(list(s1.intersection(s2).intersection(s3))), 3)
print(sorted(list(s1.intersection(s3))), 2)
print(sorted(list(s1.intersection(s2))), 2)
print(sorted(list(s2.intersection(s3))), 2)

First, convert the list to set. then do intersection with the sets, then convert that into the list, then sort it.

Comments

0
import itertools

def main(T0:list[set]):
    for i in range(2,len(T0)+1):
        for o1 in itertools.combinations(T0,i):
            t1=o1[0].copy()
            for o2 in o1[1:]:
                t1&=o2
            n1=0
            for o3 in T0:
                n1+=t1.issubset(o3)
            print(t1,n1)


main([
    {"a", "b", "c", "d", "e", "f", "x", "y", "p"},
    {"a", "b", "c", "d", "p", "q"},
    {"a", "b", "c", "d", "e", "f", "g", "h", "q"},
    ])

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.