Before Python 2.3, use dict() :
>>> lst = [1, 2, 3, 5, 6, 7, 5, 2]
>>> stats = {}
>>> for x in lst : # count occurrences of each letter:
... stats[x] = stats.get(x, 0) + 1
>>> print stats
{1: 1, 2: 2, 3: 1, 5: 2, 6: 1, 7: 1} # filter letters appearing more than once:
>>> duplicates = [dup for (dup, i) in stats.items() if i > 1]
>>> print duplicates
So a function :
def getDuplicates(iterable):
"""
Take an iterable and return a generator yielding its duplicate items.
Items must be hashable.
e.g :
>>> sorted(list(getDuplicates([1, 2, 3, 5, 6, 7, 5, 2])))
[2, 5]
"""
stats = {}
for x in iterable :
stats[x] = stats.get(x, 0) + 1
return (dup for (dup, i) in stats.items() if i > 1)
With Python 2.3 comes set(), and it's even a built-in after than :
def getDuplicates(iterable):
"""
Take an iterable and return a generator yielding its duplicate items.
Items must be hashable.
e.g :
>>> sorted(list(getDuplicates([1, 2, 3, 5, 6, 7, 5, 2])))
[2, 5]
"""
try: # try using built-in set
found = set()
except NameError: # fallback on the sets module
from sets import Set
found = Set()
for x in iterable:
if x in found : # set is a collection that can't contain duplicate
yield x
found.add(x) # duplicate won't be added anyway
With Python 2.7 and above, you have the collections
module providing the very same function than the dict one, and we can make it shorter (and faster, it's probably C under the hood) than solution 1 :
import collections
def getDuplicates(iterable):
"""
Take an iterable and return a generator yielding its duplicate items.
Items must be hashable.
e.g :
>>> sorted(list(getDuplicates([1, 2, 3, 5, 6, 7, 5, 2])))
[2, 5]
"""
return (dup for (dup, i) in collections.counter(iterable).items() if i > 1)
I'd stick with solution 2.