27

This is actually an extension of this question. The answers of that question did not keep the "order" of the list after removing duplicates. How to remove these duplicates in a list (python)

biglist = 

[ 

    {'title':'U2 Band','link':'u2.com'}, 
    {'title':'Live Concert by U2','link':'u2.com'},
    {'title':'ABC Station','link':'abc.com'}

]

In this case, the 2nd element should be removed because a previous "u2.com" element already exists. However, the order should be kept.

9 Answers 9

39

use set(), then re-sort using the index of the original list.

>>> mylist = ['c','a','a','b','a','b','c']
>>> sorted(set(mylist), key=lambda x: mylist.index(x))
['c', 'a', 'b']
3
  • This is fantastic! Exactly what I was looking for. Would you mind explaining how it works? (with the use of lambda etc) Thanks
    – user4587874
    Commented Oct 1, 2015 at 15:44
  • 2
    Neat Python code. The downside is that it causes an extra sort, thus an unneeded O(n * log(n)) (where otherwise O(n) would be sufficient).
    – yprez
    Commented Nov 1, 2015 at 12:48
  • 3
    Worse, it calls .index for each distinct element, which does a linear search, so it's an unneeded O(n^2).
    – kaya3
    Commented Apr 23, 2021 at 21:42
26

My answer to your other question, which you completely ignored!, shows you're wrong in claiming that

The answers of that question did not keep the "order"

  • my answer did keep order, and it clearly said it did. Here it is again, with added emphasis to see if you can just keep ignoring it...:

Probably the fastest approach, for a really big list, if you want to preserve the exact order of the items that remain, is the following...:

biglist = [ 
    {'title':'U2 Band','link':'u2.com'}, 
    {'title':'ABC Station','link':'abc.com'}, 
    {'title':'Live Concert by U2','link':'u2.com'} 
]

known_links = set()
newlist = []

for d in biglist:
  link = d['link']
  if link in known_links: continue
  newlist.append(d)
  known_links.add(link)

biglist[:] = newlist
2
  • 3
    Hey Alex, out of curiosity why do you put the [:] on the left hand side of the assignment? I've usually seen it on the RHS. Is it just personal preference? Looking at it at first I wasn't even sure what it would do, haha.
    – xitrium
    Commented Sep 8, 2011 at 3:07
  • 3
    @xitrium Using [:] on the left replaced all the items in the list, instead of the list itself. It could have an effect e.g. if you do this inside a function with a list that is passed in: if you change the list it's changed outside the function, if you replace it then the outside list is unaffected). In this particular case, there are no observable effect that I can see.
    – Mark
    Commented Oct 23, 2015 at 15:00
13

Generators are great.

def unique( seq ):
    seen = set()
    for item in seq:
        if item not in seen:
            seen.add( item )
            yield item

biglist[:] = unique( biglist )
1
  • 1
    This is what I needed for my problem. I would suggest making it more generic adding key=lambda item: item to the method signature. Then, use key(item) for the set.
    – Harvey
    Commented Jan 20, 2017 at 18:31
3

This page discusses different methods and their speeds: http://www.peterbe.com/plog/uniqifiers-benchmark

The recommended* method:

def f5(seq, idfun=None):  
    # order preserving 
    if idfun is None: 
        def idfun(x): return x 
    seen = {} 
    result = [] 
    for item in seq: 
        marker = idfun(item) 
        # in old Python versions: 
        # if seen.has_key(marker) 
        # but in new ones: 
        if marker in seen: continue 
        seen[marker] = 1 
        result.append(item) 
    return result

f5(biglist,lambda x: x['link'])

*by that page

3

This is an elegant and compact way, with list comprehension (but not as efficient as with dictionary):

mylist = ['aaa','aba','aaa','aea','baa','aaa','aac','aaa',]

[ v for (i,v) in enumerate(mylist) if v not in mylist[0:i] ]

And in the context of the answer:

[ v for (i,v) in enumerate(biglist) if v['link'] not in map(lambda d: d['link'], biglist[0:i]) ]
1
dups = {}
newlist = []
for x in biglist:
    if x['link'] not in dups:
      newlist.append(x)
      dups[x['link']] = None

print newlist

produces

[{'link': 'u2.com', 'title': 'U2 Band'}, {'link': 'abc.com', 'title': 'ABC Station'}]

Note that here I used a dictionary. This makes the test not in dups much more efficient than using a list.

2
  • 1
    You're wrong about checking in a dict being faster than in a set (lists are a completely different matter). Commented Oct 11, 2009 at 1:07
  • ok, fixed, thanks. I guess set is probably implemented with a hash.
    – Peter
    Commented Oct 11, 2009 at 1:08
1

Try this :

list = ['aaa','aba','aaa','aea','baa','aaa','aac','aaa',]
uniq = []
for i in list:
               if i not in uniq:
                   uniq.append(i)

print list
print uniq

output will be :

['aaa', 'aba', 'aaa', 'aea', 'baa', 'aaa', 'aac', 'aaa']
['aaa', 'aba', 'aea', 'baa', 'aac']
0

A super easy way to do this is:

def uniq(a):
    if len(a) == 0:
        return []
    else:
        return [a[0]] + uniq([x for x in a if x != a[0]])

This is not the most efficient way, because:

  • it searches through the whole list for every element in the list, so it's O(n^2)
  • it's recursive so uses a stack depth equal to the length of the list

However, for simple uses (no more than a few hundred items, not performance critical) it is sufficient.

1
  • Can anyone come up with a way that is scalable?
    – TIMEX
    Commented Oct 11, 2009 at 0:59
0

I think using a set should be pretty efficent.

seen_links = set()
for index in len(biglist):
    link = biglist[index]['link']
    if link in seen_links:
        del(biglist[index])
    seen_links.add(link)

I think this should come in at O(nlog(n))

1
  • 2
    in fact it is O(n^2) because del on a list is O(n) Commented Apr 1, 2011 at 14:33

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.