How to find duplicate elements in array using for loop in Python?

Question

I have a list with duplicate elements:

 list_a=[1,2,3,5,6,7,5,2]

 tmp=[]

 for i in list_a:
     if tmp.__contains__(i):
         print i
     else:
         tmp.append(i)

I used the code above to find the duplicate elements in list_a. I don't want to remove any elements from the list.

But I want to use a for loop here. In C/C++ I guess we would normally write it like this:

 for (int i=0;i<=list_a.length;i++)
     for (int j=i+1;j<=list_a.length;j++)
         if (list_a[i]==list_a[j])
             print list_a[i]

How do we write this in Python?

for i in list_a:
    for j in list_a[1:]:
    ....

I tried the code above, but it gives the wrong result. I don't know how to advance the value of j.

YOU · Accepted Answer · 2009-12-17 08:43:58Z

67

Just for information: in Python 2.7+, we can use Counter.

>>> import collections

>>> x=[1, 2, 3, 5, 6, 7, 5, 2]

>>> x
[1, 2, 3, 5, 6, 7, 5, 2]

>>> y=collections.Counter(x)
>>> y
Counter({2: 2, 5: 2, 1: 1, 3: 1, 6: 1, 7: 1})

Unique List

>>> list(y)
[1, 2, 3, 5, 6, 7]

Items found more than 1 time

>>> [i for i in y if y[i]>1]
[2, 5]

Items found only one time

>>> [i for i in y if y[i]==1]
[1, 3, 6, 7]
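On Python 3 the same idea can be wrapped in a small helper. This is only a minimal sketch, not part of the original answer; the name duplicates_with_counts is hypothetical, and it assumes Python 3.7+ so that Counter keeps first-seen order.

```python
from collections import Counter

def duplicates_with_counts(items):
    """Return {item: count} for every item that occurs more than once."""
    counts = Counter(items)
    # Counter.items() yields (element, count) pairs, so no second lookup is needed.
    return {item: n for item, n in counts.items() if n > 1}

print(duplicates_with_counts([1, 2, 3, 5, 6, 7, 5, 2]))  # {2: 2, 5: 2}
```

Iterating over the (element, count) pairs avoids looking each key up again in the Counter.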

edited Dec 17, 2009 at 8:43

answered Dec 17, 2009 at 8:36

YOU


[n for n, i in y.iteritems() if i > 1] instead, and i == 1.
– Roger Pate
Commented Dec 17, 2009 at 8:40
...but why the list(y), isn't Counter iterable ?
– LeMiz
Commented Dec 17, 2009 at 8:41
@Roger Pate, thanks. Yours needs no dict lookup, so it could be better.
– YOU
Commented Dec 17, 2009 at 8:50
The only drawbacks of Counter are that it doesn't terminate early if duplicates are found early in a big list, and that it doesn't work for infinite iterators.
– Lie Ryan
Commented Sep 25, 2012 at 8:11
My solution: for k, v in y.most_common(): if v > 1: print k
– luistm
Commented Jul 3, 2014 at 13:44
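Regarding the comment about early termination: if you only need the first repeated item, a plain loop with a set can stop as soon as it finds one. The following is a rough Python 3 sketch under that assumption; first_duplicate is a hypothetical name, and the items must be hashable.

```python
from itertools import count

def first_duplicate(iterable):
    """Return the first item seen a second time, or None if there is none."""
    seen = set()
    for item in iterable:
        if item in seen:
            return item  # stop as soon as a repeat shows up
        seen.add(item)
    return None

# Works on an endless iterator because it stops at the first repeat.
endless = (n % 4 for n in count())  # 0, 1, 2, 3, 0, 1, ...
print(first_duplicate(endless))  # 0
```

Note that it still never returns on an infinite iterator that contains no repeats at all.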


score 28 · Accepted Answer · 2009-12-17 08:26:30Z

Use the in operator instead of calling __contains__ directly.

What you have almost works (but is O(n**2)):

for i in xrange(len(list_a)):
  for j in xrange(i + 1, len(list_a)):
    if list_a[i] == list_a[j]:
      print "duplicate:", list_a[i]

But it's far easier to use a set (roughly O(n) due to the hash table):

seen = set()
for n in list_a:
  if n in seen:
    print "duplicate:", n
  else:
    seen.add(n)

Or a dict, if you want to track locations of duplicates (also O(n)):

import collections
items = collections.defaultdict(list)
for i, item in enumerate(list_a):
  items[item].append(i)
for item, locs in items.iteritems():
  if len(locs) > 1:
    print "duplicates of", item, "at", locs

Or even just detect a duplicate somewhere (also O(n)):

if len(set(list_a)) != len(list_a):
  print "duplicate"
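For readers on Python 3, where print is a function and iteritems() is gone, the dict variant might look roughly like this. It is a sketch only; duplicate_positions is a hypothetical name, and the output order assumes Python 3.7+ dict ordering.

```python
from collections import defaultdict

def duplicate_positions(items):
    """Map each repeated item to the list of indexes where it occurs."""
    positions = defaultdict(list)
    for index, item in enumerate(items):
        positions[item].append(index)
    # Keep only the items that were seen at more than one index.
    return {item: locs for item, locs in positions.items() if len(locs) > 1}

print(duplicate_positions([1, 2, 3, 5, 6, 7, 5, 2]))  # {2: [1, 7], 5: [3, 6]}
```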

Evan Fosmark · Accepted Answer · 2009-12-17 08:14:26Z

22

You could always use a list comprehension:

dups = [x for x in list_a if list_a.count(x) > 1]

answered Dec 17, 2009 at 8:14

Evan Fosmark


This traverses the list once for each element (Although, OP's code is O(N**2), too).
– Alok Singhal
Commented Dec 17, 2009 at 8:41
Yeah, I understood it's inefficient. If the OP is looking for that, he should go with Roger's answers for sure.
– Evan Fosmark
Commented Dec 17, 2009 at 10:23
I think this is slightly more efficient: [x for i,x in enumerate(list_a) if list_a[i:].count(x) > 1]
– dMb
Commented Aug 12, 2011 at 19:50
This will return a list that itself contains duplicates, because list_a.count(x) > 1 is True for each occurrence of the element. I'd use set() to get unique duplicates.
– dmitko
Commented Nov 8, 2012 at 14:21
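To illustrate the last comment: the comprehension lists every occurrence of a repeated item, so a second step is needed to get each duplicate once. A small Python 3 sketch, assuming Python 3.7+ so that dict.fromkeys keeps first-seen order (set() would also work but loses the order); unique_dups is a hypothetical name.

```python
list_a = [1, 2, 3, 5, 6, 7, 5, 2]

# Same O(n**2) count() test as in the answer: every occurrence is reported.
dups = [x for x in list_a if list_a.count(x) > 1]
print(dups)  # [2, 5, 5, 2]

# Collapse the repeats while keeping the order of first appearance.
unique_dups = list(dict.fromkeys(dups))
print(unique_dups)  # [2, 5]
```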


Bite code · Accepted Answer · 2009-12-17 09:36:28Z

Before Python 2.3, use a dict:

>>> lst = [1, 2, 3, 5, 6, 7, 5, 2]
>>> stats = {}
>>> for x in lst:  # count occurrences of each item
...     stats[x] = stats.get(x, 0) + 1
...
>>> print stats
{1: 1, 2: 2, 3: 1, 5: 2, 6: 1, 7: 1}
>>> # keep the items appearing more than once
>>> duplicates = [dup for (dup, i) in stats.items() if i > 1]
>>> print duplicates
[2, 5]

As a function:

def getDuplicates(iterable):
    """
       Take an iterable and return a generator yielding its duplicate items.
       Items must be hashable.

       e.g :

       >>> sorted(list(getDuplicates([1, 2, 3, 5, 6, 7, 5, 2])))
       [2, 5]
    """
    stats = {}
    for x in iterable : 
        stats[x] = stats.get(x, 0) + 1
    return (dup for (dup, i) in stats.items() if i > 1)

With Python 2.3 comes the sets module, and set() is even a built-in from Python 2.4 on:

def getDuplicates(iterable):
    """
       Take an iterable and return a generator yielding its duplicate items.
       Items must be hashable.

       e.g :

       >>> sorted(list(getDuplicates([1, 2, 3, 5, 6, 7, 5, 2])))
       [2, 5]
    """
    try: # try using built-in set
        found = set() 
    except NameError: # fallback on the sets module
        from sets import Set
        found = Set()

    for x in iterable:
        if x in found : # set is a collection that can't contain duplicate
            yield x
        found.add(x) # duplicate won't be added anyway

With Python 2.7 and above, the collections module provides a Counter that does the same job as the dict version, and we can make it shorter (and faster, since it's probably C under the hood) than solution 1:

import collections

def getDuplicates(iterable):
    """
       Take an iterable and return a generator yielding its duplicate items.
       Items must be hashable.

       e.g :

       >>> sorted(list(getDuplicates([1, 2, 3, 5, 6, 7, 5, 2])))
       [2, 5]
    """
    return (dup for (dup, i) in collections.Counter(iterable).items() if i > 1)

I'd stick with solution 2.
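One thing to be aware of with solution 2: an item that occurs three or more times is yielded once per extra occurrence. If each duplicate should come out only once, a second set can track what has already been reported. This is a Python 3 sketch under that assumption, with the hypothetical name iter_duplicates:

```python
def iter_duplicates(iterable):
    """Yield each repeated item once, at the moment its second copy is seen."""
    seen = set()      # every item encountered so far
    reported = set()  # duplicates that were already yielded
    for item in iterable:
        if item in seen and item not in reported:
            reported.add(item)
            yield item
        seen.add(item)

print(list(iter_duplicates([1, 2, 1, 3, 4, 5, 4, 4, 6, 7, 8, 2])))  # [1, 4, 2]
```

The result follows the order in which the second occurrences appear, not the sorted order.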

kalehmann · Accepted Answer · 2019-06-09 19:06:05Z

7

You can use this function to find duplicates:

def get_duplicates(arr):
    dup_arr = arr[:]
    for i in set(arr):
        dup_arr.remove(i)       
    return list(set(dup_arr))

Examples

print get_duplicates([1,2,3,5,6,7,5,2])

[2, 5]

print get_duplicates([1,2,1,3,4,5,4,4,6,7,8,2])

[1, 2, 4]

edited Jun 9, 2019 at 19:06

kalehmann


answered Jun 26, 2012 at 5:21

ASKN


Alok Singhal · Accepted Answer · 2009-12-17 08:12:50Z

3

If you're looking for one-to-one mapping between your nested loops and Python, this is what you want:

n = len(list_a)
for i in range(n):
    for j in range(i+1, n):
        if list_a[i] == list_a[j]:
            print list_a[i]

The code above is not "Pythonic". I would do it something like this:

seen = set()
for i in list_a:
   if i in seen:
       print i
   else:
       seen.add(i)

Also, don't use __contains__, rather, use in (as above).

answered Dec 17, 2009 at 8:12

Alok Singhal


LeMiz · Accepted Answer · 2009-12-17 08:35:43Z

2

The following requires the elements of your list to be hashable (not just implementing __eq__ ). I find it more pythonic to use a defaultdict (and you have the number of repetitions for free):

import collections
l = [1, 2, 4, 1, 3, 3]
d = collections.defaultdict(int)
for x in l:
   d[x] += 1
print [k for k, v in d.iteritems() if v > 1]
# prints [1, 3]

answered Dec 17, 2009 at 8:35

LeMiz


Zoran Pavlovic · Accepted Answer · 2016-01-28 07:42:36Z

2

Using only itertools, and works fine on Python 2.5

from itertools import groupby
list_a = sorted([1, 2, 3, 5, 6, 7, 5, 2])
result = dict([(r, len(list(grp))) for r, grp in groupby(list_a)])

Result:

{1: 1, 2: 2, 3: 1, 5: 2, 6: 1, 7: 1}
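The dict above holds the count of every value. To keep only the duplicates, the groups can be filtered directly. A short sketch written for Python 3 (it is not part of the original answer):

```python
from itertools import groupby

list_a = [1, 2, 3, 5, 6, 7, 5, 2]

# groupby only merges adjacent equal items, hence the sorted() call.
duplicates = [key for key, grp in groupby(sorted(list_a)) if len(list(grp)) > 1]
print(duplicates)  # [2, 5]
```

Because of the sort this is O(n log n), and the items must be orderable rather than hashable.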

answered Jan 28, 2016 at 7:42

Zoran Pavlovic


David Andrei Ned · Accepted Answer · 2016-10-14 14:11:02Z

1

It looks like you have a list (list_a) potentially including duplicates, which you would rather keep as it is, and build a de-duplicated list tmp based on list_a. In Python 2.7, you can accomplish this with one line:

tmp = list(set(list_a))

Comparing the lengths of tmp and list_a at this point should clarify if there were indeed duplicate items in list_a. This may help simplify things if you want to go into the loop for additional processing.

edited Oct 14, 2016 at 14:11

David Andrei Ned


answered Jan 27, 2013 at 2:55

Ahmer Kureishi


Fire Lancer · Accepted Answer · 2009-12-17 09:15:23Z

0

You could just "translate" it line by line.

c++

for (int i=0;i<=list_a.length;i++)
    for (int j=i+1;j<=list_a.length;j++)
        if (list_a[i]==list_a[j])
            print list_a[i]

Python

for i in range(0, len(list_a)):
    for j in range(i + 1, len(list_a)):
        if list_a[i] == list_a[j]:
            print list_a[i]

c++ for loop:

for(int x = start; x < end; ++x)

Python equivalent:

for x in range(start, end):
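Put together as a runnable Python 3 function, the translation could look like the sketch below. The name duplicates_by_index is hypothetical, and the results are collected in a list instead of printed. Note that range(start, end) stops before end, which corresponds to x < end in C++, so there is no out-of-range access.

```python
def duplicates_by_index(items):
    """Compare every pair (i, j) with i < j, like the nested C-style loops."""
    found = []
    n = len(items)
    for i in range(n):             # for (int i = 0; i < n; i++)
        for j in range(i + 1, n):  # for (int j = i + 1; j < n; j++)
            if items[i] == items[j]:
                found.append(items[i])
    return found

print(duplicates_by_index([1, 2, 3, 5, 6, 7, 5, 2]))  # [2, 5]
```

Like the C version, this reports an item once per matching pair, so a value that occurs three times is listed three times.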

edited Dec 17, 2009 at 9:15

Roger Pate

answered Dec 17, 2009 at 8:13

Fire Lancer


You should not accept this answer. Yes, it's valid code, but it's not the way you should code in Python. Don't code Python like C/C++, or Java. They are not the same languages, and are not meant to be used the same way.
– Bite code
Commented Dec 17, 2009 at 9:27
I agree with e-satis. Although the question specifically tries to compare the routine to C/C++, we should try to nudge it in the right direction.
– Mizipzor
Commented Dec 17, 2009 at 9:43


Komu · Accepted Answer · 2013-09-26 09:15:23Z

0

Just quick and dirty,

list_a=[1,2,3,5,6,7,5,2] 
holding_list=[]

for x in list_a:
    if x in holding_list:
        pass
    else:
        holding_list.append(x)

print holding_list

Output [1, 2, 3, 5, 6, 7]

answered Sep 26, 2013 at 9:15

Komu


Juh_ · Accepted Answer · 2013-10-29 15:43:43Z

0

Using numpy:

import numpy as np
count,value = np.histogram(list_a,bins=np.hstack((np.unique(list_a),np.inf)))
print 'duplicate value(s) in list_a: ' + ', '.join([str(v) for v in value[count>1]])

edited Oct 29, 2013 at 15:43

answered Apr 4, 2012 at 7:42

Juh_


Prashant Lakhera · Accepted Answer · 2017-01-26 01:00:27Z

0

In Python 3, if you have two lists:

def removedup(List1,List2):
    # iterate over a copy, because List1 is modified inside the loop
    List1_copy = List1[:]
    for i in List1_copy:
        if i in List2:
            List1.remove(i)

List1 = [4,5,6,7]
List2 = [6,7,8,9]
removedup(List1,List2)
print (List1)
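If modifying List1 in place is not required, a list comprehension that builds a new list may be simpler. A sketch with the hypothetical name without_common, assuming the elements are hashable so that a set can be used for the membership test:

```python
def without_common(first, second):
    """Return the items of first that do not appear in second."""
    exclude = set(second)  # set lookup is O(1) on average
    return [item for item in first if item not in exclude]

print(without_common([4, 5, 6, 7], [6, 7, 8, 9]))  # [4, 5]
```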

answered Jan 26, 2017 at 1:00

Prashant Lakhera


Make42 · Accepted Answer · 2017-07-10 15:23:40Z

0

Granted, I haven't done tests, but I guess it's going to be hard to beat pandas in speed:

 import pandas as pd
 pd.DataFrame(list_a, columns=["x"]).groupby('x').size().to_dict()

answered Jul 10, 2017 at 15:23

Make42


double-beep · Accepted Answer · 2019-06-09 12:19:17Z

0

You can use:

b=['E', 'P', 'P', 'E', 'O', 'E']
c={}
for i in b:
    value=0
    for j in b:
        if(i == j):
            value+=1
            c[i]=value
print(c)

Output:

{'E': 3, 'P': 2, 'O': 1}

edited Jun 9, 2019 at 12:19

double-beep


answered Jun 9, 2019 at 12:15

Prince Vijay


Zaheer Niazi · Accepted Answer · 2020-11-21 08:43:48Z

0

Find duplicates in the list using loops, conditional logic, logical operators, and list methods

some_list = ['a','b','c','d','e','b','n','n','c','c','h',]

duplicates = []

for values in some_list:
    if some_list.count(values) > 1:
        if values not in duplicates:
            duplicates.append(values)

print("Duplicate Values are : ",duplicates)

edited Nov 21, 2020 at 8:43

answered Nov 21, 2020 at 8:31

Zaheer Niazi


How to get this without using any python library such as count?@Zaheer
– Navi
Commented Nov 27, 2020 at 10:24
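As for the comment asking how to do this without count(): one possible approach is to replace the count() call with an inner loop over the rest of the list. This is a sketch only; it still relies on the in operator and append(), and it is O(n**2) like the original.

```python
some_list = ['a', 'b', 'c', 'd', 'e', 'b', 'n', 'n', 'c', 'c', 'h']
duplicates = []

for i in range(len(some_list)):
    value = some_list[i]
    if value in duplicates:
        continue  # already recorded as a duplicate
    # Look for the same value anywhere later in the list.
    for j in range(i + 1, len(some_list)):
        if some_list[j] == value:
            duplicates.append(value)
            break

print(duplicates)  # ['b', 'c', 'n']
```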


Reza Ghorbani · Accepted Answer · 2021-09-19 14:59:42Z

0

Finding the number of repeating elements in a list:

myList = [3, 2, 2, 5, 3, 8, 3, 4, 'a', 'a', 'f', 4, 4, 1, 8, 'D']
listCleaned = set(myList)
for s in listCleaned:
    count = 0
    for i in myList:
        if s == i :
            count += 1
    print(f'total {s} => {count}')

edited Sep 19, 2021 at 14:59

answered Sep 19, 2021 at 10:45

Reza Ghorbani


marc_s · Accepted Answer · 2021-10-12 14:54:34Z

0

Try like this:

list_a=[1,2,3,5,6,7,5,2]
unique_values = []
duplicates = []

for i in list_a:
    if i not in unique_values:
        unique_values.append(i)
    else:
        # look for an existing record for this value
        match = None
        for x in duplicates:
            if x.get("key") == i:
                match = x
                break
        if match is not None:
            match["occurrence"] += 1
        else:
            duplicates.append({
                "key": i,
                "occurrence": 1
            })

edited Oct 12, 2021 at 14:54

marc_s


answered Sep 19, 2021 at 10:36

ilyas Jumadurdyew


joanis · Accepted Answer · 2021-11-23 13:23:15Z

0

some_string= list(input("Enter any string:\n"))
count={}
dup_count={}
for i in some_string:
    if i not in count:
        count[i]=1
    else:
        count[i]+=1
        dup_count[i]=count[i]
print("Duplicates of given string are below:\n",dup_count)

edited Nov 23, 2021 at 13:23

joanis


answered Nov 14, 2021 at 11:33

Dr.Raj Kulkarni


fortran · Accepted Answer · 2009-12-17 11:24:14Z

-2

A slightly more Pythonic implementation (not the most Pythonic, of course), but in the spirit of your C code, could be:

for i, elem in enumerate(seq):
    if elem in seq[i+1:]:
        print elem

Edit: yes, it prints the elements more than once if there're more than 2 repetitions, but that's what the op's C pseudo code does too.

edited Dec 17, 2009 at 11:24

answered Dec 17, 2009 at 8:52

fortran


You must sort before doing that. Use sorted. What's more, you will print the same duplicate several times if there is more than one of the same.
– Bite code
Commented Dec 17, 2009 at 9:03
This will print the same element multiple times if it occurs more than 2 times in the list.
– mthurlin
Commented Dec 17, 2009 at 9:04
Have you guys bothered to read the op's code? It does exactly the same. @e-satis There's no need to sort, maybe you meant something like [k for k, it in itertools.groupby(sorted(l)) if len(list(it)) > 1] ?
– fortran
Commented Dec 17, 2009 at 11:21
