
I have a dictionary whose keys are strings and whose values are the number of times each string occurs in a file. I want to find the strings that differ by exactly one character and then remove the one with the lowest count from the dictionary.

From this:

dictionary = {'ATAA':5, 'GGGG':34, 'TTTT':34, 'AGAA':1}

To this:

new_dictionary = {'ATAA':5, 'GGGG':34, 'TTTT':34}

The dictionary is huge, so I am trying to find an efficient way to solve this. Any suggestions of how one could solve it would be super appreciated.


1 Answer


This would be my homemade recipe. First, we gather all the keys made up of exactly two distinct characters, which is how we spot a key with one odd character out. Then we sort this new dictionary by value. In your case we end up with {'AGAA': 1, 'ATAA': 5}, which means we can take AGAA, the key with the lowest count, and delete it from the dictionary.

dic = {'ATAA':5, 'GGGG':34, 'TTTT':34, 'AGAA':1}
# Keep only the keys made up of exactly two distinct characters
candidates = {k: v for k, v in dic.items() if len(set(k)) == 2}
# Delete the candidate with the lowest count, then show the result
del dic[min(candidates, key=candidates.get)]
print(dic)

output

{'ATAA': 5, 'GGGG': 34, 'TTTT': 34}
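Note that the filter above is a shortcut: it looks at the characters inside each key and never compares two keys with each other. If you want to check directly whether two keys differ by one character, here is a minimal sketch. The helper name differs_by_one is made up for illustration, and it assumes that "differ by one character" means same length with exactly one mismatched position.

```python
def differs_by_one(a, b):
    """Return True if a and b have the same length and differ at exactly one position."""
    if len(a) != len(b):
        return False
    # Count the positions where the two strings disagree
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches == 1


print(differs_by_one('ATAA', 'AGAA'))  # True: only the second character differs
print(differs_by_one('GGGG', 'TTTT'))  # False: all four characters differ
```

Calling this on every pair of keys is fine for a small dictionary, but the number of pairs grows quadratically, so it is probably too slow for a huge one.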

But there is more. What if several keys have the same count? The above code will not work, because it only ever deletes one key. I spent the last couple of minutes cooking up some new code.

I'll break it down.

from collections import defaultdict
#----------
#This will give us {'ATAA': 5, 'AGAA': 5}, we have located the different keys
dictionary = {'ATAA':5, 'GGGG':34, 'TTTT':34, 'AGAA':5}
lowest = dict(sorted(((k, v) for k, v in dictionary.items() if len(set(k)) == 2), key=lambda item: item[1]))
#----------
#This will give us ['ATAA', 'AGAA']. Groups the keys by count and takes the group with the lowest count.
grouped = defaultdict(list)
for key in lowest:
    grouped[lowest[key]].append(key)
simKeys = grouped[min(grouped)]
#----------
#This will check if we have to delete many keys or just one
if len(simKeys) > 1:
    dictionary = {k: v for k, v in dictionary.items() if k not in simKeys}
else:
    del dictionary[simKeys[0]]
#----------

2 Comments

Thanks BuddyBob! What if I have the following dictionary: dictionary = {'ATAA':53, 'GGGG':34, 'GCGG':3, 'AGAA':5}? Then I would like the following output: dictionary = {'ATAA': 53, 'GGGG': 34}. Your solution only makes one comparison.
Why would it be that? GCGG is the lowest different key. My output is {'ATAA': 53, 'GGGG': 34, 'AGAA': 5}
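For the case raised in the comments, where several pairs have to be handled at once, one possible approach is to bucket the keys by "key with one position masked out", so that two keys share a bucket only when they differ at exactly that position. This is a sketch under assumptions, not the method from the answer above: the function name drop_weaker_neighbours is hypothetical, a key is dropped only when a one-character neighbour has a strictly higher count, and keys with tied counts are both kept.

```python
from collections import defaultdict


def drop_weaker_neighbours(counts):
    """Return a copy of counts without the keys that have a
    one-character neighbour with a strictly higher count."""
    buckets = defaultdict(list)
    for key in counts:
        for i in range(len(key)):
            # Two keys land in the same bucket only if they agree
            # everywhere except at position i
            buckets[(i, key[:i], key[i + 1:])].append(key)

    to_remove = set()
    for group in buckets.values():
        if len(group) > 1:
            best = max(counts[k] for k in group)
            # Every key in the group below the best count has a stronger neighbour
            to_remove.update(k for k in group if counts[k] < best)

    return {k: v for k, v in counts.items() if k not in to_remove}


print(drop_weaker_neighbours({'ATAA': 53, 'GGGG': 34, 'GCGG': 3, 'AGAA': 5}))
# {'ATAA': 53, 'GGGG': 34}
```

Each key is placed in one bucket per character position, so the number of bucket entries grows with the number of keys times the key length instead of with the number of key pairs, which should suit a large dictionary better.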
