Remove duplicates in a list while keeping its order (Python)

Question

This is actually an extension of this question. The answers of that question did not keep the "order" of the list after removing duplicates. How to remove these duplicates in a list (python)

biglist = 

[ 

    {'title':'U2 Band','link':'u2.com'}, 
    {'title':'Live Concert by U2','link':'u2.com'},
    {'title':'ABC Station','link':'abc.com'}

]

In this case, the 2nd element should be removed because a previous "u2.com" element already exists. However, the order should be kept.

egafni · Accepted Answer · 2014-08-29 00:59:31Z

39

use set(), then re-sort using the index of the original list.

>>> mylist = ['c','a','a','b','a','b','c']
>>> sorted(set(mylist), key=lambda x: mylist.index(x))
['c', 'a', 'b']

answered Aug 29, 2014 at 0:59

egafni

1,9881 gold badge16 silver badges11 bronze badges

This is fantastic! Exactly what I was looking for. Would you mind explaining how it works? (with the use of lambda etc) Thanks
– user4587874
Commented Oct 1, 2015 at 15:44
2

Neat Python code. The downside is that it causes an extra sort, thus an unneeded O(n * log(n)) (where otherwise O(n) would be sufficient).
– yprez
Commented Nov 1, 2015 at 12:48
3

Worse, it calls .index for each distinct element, which does a linear search, so it's an unneeded O(n^2).
– kaya3
Commented Apr 23, 2021 at 21:42

Add a comment |

Alex Martelli · Accepted Answer · 2009-10-11 01:11:15Z

26

My answer to your other question, which you completely ignored!, shows you're wrong in claiming that

The answers of that question did not keep the "order"

my answer did keep order, and it clearly said it did. Here it is again, with added emphasis to see if you can just keep ignoring it...:

Probably the fastest approach, for a really big list, if you want to preserve the exact order of the items that remain, is the following...:

biglist = [ 
    {'title':'U2 Band','link':'u2.com'}, 
    {'title':'ABC Station','link':'abc.com'}, 
    {'title':'Live Concert by U2','link':'u2.com'} 
]

known_links = set()
newlist = []

for d in biglist:
  link = d['link']
  if link in known_links: continue
  newlist.append(d)
  known_links.add(link)

biglist[:] = newlist

answered Oct 11, 2009 at 1:11

Alex Martelli

884k174 gold badges1.2k silver badges1.4k bronze badges

3

Hey Alex, out of curiosity why do you put the [:] on the left hand side of the assignment? I've usually seen it on the RHS. Is it just personal preference? Looking at it at first I wasn't even sure what it would do, haha.
– xitrium
Commented Sep 8, 2011 at 3:07
3

@xitrium Using [:] on the left replaced all the items in the list, instead of the list itself. It could have an effect e.g. if you do this inside a function with a list that is passed in: if you change the list it's changed outside the function, if you replace it then the outside list is unaffected). In this particular case, there are no observable effect that I can see.
– Mark
Commented Oct 23, 2015 at 15:00

Add a comment |

Jochen Ritzel · Accepted Answer · 2009-10-11 01:19:01Z

13

Generators are great.

def unique( seq ):
    seen = set()
    for item in seq:
        if item not in seen:
            seen.add( item )
            yield item

biglist[:] = unique( biglist )

edited Oct 11, 2009 at 1:19

answered Oct 11, 2009 at 1:11

Jochen Ritzel

108k33 gold badges204 silver badges195 bronze badges

1

This is what I needed for my problem. I would suggest making it more generic adding key=lambda item: item to the method signature. Then, use key(item) for the set.
– Harvey
Commented Jan 20, 2017 at 18:31

Add a comment |

Tarnay Kálmán · Accepted Answer · 2009-10-22 21:31:38Z

This page discusses different methods and their speeds: http://www.peterbe.com/plog/uniqifiers-benchmark

The recommended* method:

def f5(seq, idfun=None):  
    # order preserving 
    if idfun is None: 
        def idfun(x): return x 
    seen = {} 
    result = [] 
    for item in seq: 
        marker = idfun(item) 
        # in old Python versions: 
        # if seen.has_key(marker) 
        # but in new ones: 
        if marker in seen: continue 
        seen[marker] = 1 
        result.append(item) 
    return result

f5(biglist,lambda x: x['link'])

*by that page

rools · Accepted Answer · 2014-03-19 23:20:46Z

3

This is an elegant and compact way, with list comprehension (but not as efficient as with dictionary):

mylist = ['aaa','aba','aaa','aea','baa','aaa','aac','aaa',]

[ v for (i,v) in enumerate(mylist) if v not in mylist[0:i] ]

And in the context of the answer:

[ v for (i,v) in enumerate(biglist) if v['link'] not in map(lambda d: d['link'], biglist[0:i]) ]

answered Mar 19, 2014 at 23:20

rools

1,67515 silver badges24 bronze badges

Add a comment |

Peter · Accepted Answer · 2009-10-11 01:08:24Z

1

dups = {}
newlist = []
for x in biglist:
    if x['link'] not in dups:
      newlist.append(x)
      dups[x['link']] = None

print newlist

produces

[{'link': 'u2.com', 'title': 'U2 Band'}, {'link': 'abc.com', 'title': 'ABC Station'}]

Note that here I used a dictionary. This makes the test not in dups much more efficient than using a list.

edited Oct 11, 2009 at 1:08

answered Oct 11, 2009 at 0:59

Peter

133k53 gold badges183 silver badges214 bronze badges

1

You're wrong about checking in a dict being faster than in a set (lists are a completely different matter).
– Alex Martelli
Commented Oct 11, 2009 at 1:07
ok, fixed, thanks. I guess set is probably implemented with a hash.
– Peter
Commented Oct 11, 2009 at 1:08

Add a comment |

Arkistarvh Kltzuonstev · Accepted Answer · 2019-09-09 08:35:29Z

1

Try this :

list = ['aaa','aba','aaa','aea','baa','aaa','aac','aaa',]
uniq = []
for i in list:
               if i not in uniq:
                   uniq.append(i)

print list
print uniq

output will be :

['aaa', 'aba', 'aaa', 'aea', 'baa', 'aaa', 'aac', 'aaa']
['aaa', 'aba', 'aea', 'baa', 'aac']

edited Sep 9, 2019 at 8:35

Arkistarvh Kltzuonstev

6,9357 gold badges30 silver badges61 bronze badges

answered Mar 5, 2012 at 18:50

falco

111 bronze badge

Add a comment |

Greg Hewgill · Accepted Answer · 2009-10-11 00:55:38Z

0

A super easy way to do this is:

def uniq(a):
    if len(a) == 0:
        return []
    else:
        return [a[0]] + uniq([x for x in a if x != a[0]])

This is not the most efficient way, because:

it searches through the whole list for every element in the list, so it's O(n^2)
it's recursive so uses a stack depth equal to the length of the list

However, for simple uses (no more than a few hundred items, not performance critical) it is sufficient.

answered Oct 11, 2009 at 0:55

Greg Hewgill

997k192 gold badges1.2k silver badges1.3k bronze badges

Can anyone come up with a way that is scalable?
– TIMEX
Commented Oct 11, 2009 at 0:59

Add a comment |

ABentSpoon · Accepted Answer · 2009-10-11 00:59:49Z

0

I think using a set should be pretty efficent.

seen_links = set()
for index in len(biglist):
    link = biglist[index]['link']
    if link in seen_links:
        del(biglist[index])
    seen_links.add(link)

I think this should come in at O(nlog(n))

answered Oct 11, 2009 at 0:59

ABentSpoon

5,1891 gold badge29 silver badges24 bronze badges

2

in fact it is O(n^2) because del on a list is O(n)
– Xavier Combelle
Commented Apr 1, 2011 at 14:33

Add a comment |

Collectives™ on Stack Overflow

Remove duplicates in a list while keeping its order (Python)

9 Answers 9

Linked

Hot Network Questions

Collectives™ on Stack Overflow

9 Answers 9

Linked

Related