
I'm trying to compare individual characters in a string in Python and I'm not sure how to do it. In a file of strings, every string belongs to a group, and I want to determine whether 75% of the strings in a group have the same character at a given position. If so, I want to delete all of the strings being compared to the original string, keeping only the original.

I'm thinking of something like the following, comparing the second character of the word big/bug (character position 9 of each string):

count=0
group1_big
group1_big
group1_bigs
group1_bugs
group2_bug

for(string in file)
    if(chars 1-7 of string == chars 1-7 of next string & char 9 is the same in both words)
        if(75% are the same at position 9)
            delete all other strings in the same group
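A runnable Python sketch of the pseudocode above. It assumes, as in the sample, that the members of a group sit on consecutive lines, so "compare with the next string" becomes grouping runs of lines with `itertools.groupby`. The function name `collapse_groups` and its parameters are hypothetical. Positions are 0-based, so index 8 is the 9th character. A group where no character reaches 75% is kept whole, which is the behaviour the asker describes in the comments.

```python
from collections import Counter
from itertools import groupby


def collapse_groups(lines, prefix_len=7, pos=8, threshold=0.75):
    """Hypothetical helper: for each run of consecutive lines sharing the
    first `prefix_len` characters, keep only the first string if at least
    `threshold` of the run has the same character at index `pos`
    (pos=8 is the 9th character); otherwise keep the whole run.

    Assumes every string has at least pos + 1 characters.
    """
    result = []
    for _prefix, members in groupby(lines, key=lambda s: s[:prefix_len]):
        members = list(members)
        # Most common character at `pos` in this group, and how often it occurs.
        _char, top = Counter(s[pos] for s in members).most_common(1)[0]
        if top >= threshold * len(members):
            result.append(members[0])   # enough agreement: keep only the first string
        else:
            result.extend(members)      # not enough agreement: leave the group as is
    return result


lines = ["group1_big", "group1_big", "group1_bigs", "group1_bugs", "group2_bug"]
print(collapse_groups(lines))  # ['group1_big', 'group2_bug']
```

If members of a group can be scattered through the file, `groupby` would split them into several runs; grouping them in a dictionary first, as the answer does, avoids that.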

In this case, comparing chars 1-7, all the group1 strings match, and since 75% of them have an 'i' at character position 9, all but the first one are deleted, resulting in the following file output:

group1_big
group2_bug
  • What if there are 2 groups of words that have the same character at position 9? Consider group1_big group1_big group1_bigs group1_bugs1 group1_bugs2 group1_bug3 group2_bug Commented Jun 2, 2015 at 22:12
  • Treat them as separate groups: only if the character is the same within a group should the rest of that group's members be deleted. Commented Jun 2, 2015 at 22:13
  • How are the groups written in the file? Commented Jun 2, 2015 at 22:35

1 Answer

>>> s="""group1_big
... group1_big
... group1_bigs
... group1_bugs
... group2_bug"""
>>> d={}
>>> for i in s.split('\n') :
...   d.setdefault(i[:7],[]).append(i)
... 
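>>> from collections import Counter
>>> count={len(j):Counter([t[8] for t in j]).most_common() for i,j in d.items()}  # keyed by group size: equal-sized groups overwrite each other
>>> final_count=[next((t[0] for t in j if t[1]>=0.75*i),j) for i,j in count.items()]  # with no 75% winner, next() falls back to j (a list)
>>> words=[next((t for t in v if t[8] in final_count),None) for v in d.values()]
>>> words  # on Python 3.7+ (insertion-ordered dicts) this shows ['group1_big', 'group2_bug']
['group2_bug', 'group1_big']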

This is my first try; I think it can be done better.

In the first part, you create a dictionary like the following:

>>> for i in s.split('\n') :
...   d.setdefault(i[:7],[]).append(i)

>>> d
{'group2_': ['group2_bug'], 'group1_': ['group1_big', 'group1_big', 'group1_bigs', 'group1_bugs']}

Then create a dictionary that counts the 9th characters of each list of values in d using collections.Counter, with the number of words in that list as the key:

>>> count
{1: [('u', 1)], 4: [('i', 3), ('u', 1)]}
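Because `count` is keyed by `len(j)`, the number of strings in a group, two groups of the same size land on the same key and the later one silently replaces the earlier. A minimal demonstration, using a hypothetical `group3_` whose strings are made up for illustration, next to the alternative of keying by the group prefix `i` (dict ordering as in Python 3.7+):

```python
from collections import Counter

# Hypothetical data: two groups that both contain four strings.
d = {
    "group1_": ["group1_big", "group1_big", "group1_bigs", "group1_bugs"],
    "group3_": ["group3_cat", "group3_cot", "group3_cut", "group3_cat"],
}

# Keyed by group size, as in the answer: both groups map to key 4,
# so group3_'s counts overwrite group1_'s.
by_size = {len(j): Counter(t[8] for t in j).most_common() for i, j in d.items()}
print(by_size)    # {4: [('a', 2), ('o', 1), ('u', 1)]}

# Keyed by the group prefix instead: each group keeps its own counts.
by_prefix = {i: Counter(t[8] for t in j).most_common() for i, j in d.items()}
print(by_prefix)  # {'group1_': [('i', 3), ('u', 1)], 'group3_': [('a', 2), ('o', 1), ('u', 1)]}
```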

Then find the 9th characters that meet your condition using the following list comprehension:

>>> final_count=[next((t[0] for t in j if t[1]>=0.75*i),j) for i,j in count.items()]
>>> final_count
['u', 'i']
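In this comprehension the default passed to `next()` is `j`, the group's whole list of `(character, count)` pairs. So for a group where no character reaches 75%, `final_count` receives a list rather than a character, and the last step then yields `None` for that group unless one of its strings happens to match another group's winner. A small sketch with a hypothetical two-string group, comparing that default with `None`:

```python
from collections import Counter

# Hypothetical group in which no character reaches 75% at index 8 (the 9th character).
group = ["group3_cat", "group3_cot"]
pairs = Counter(t[8] for t in group).most_common()  # [('a', 1), ('o', 1)]
size = len(group)  # plays the role of `i` in the answer

# The answer's pattern: with no winner, next() returns its default, the list itself.
answer_style = next((t[0] for t in pairs if t[1] >= 0.75 * size), pairs)
print(answer_style)  # [('a', 1), ('o', 1)]

# With None as the default, "no winner" is explicit and easy to test for.
explicit = next((c for c, n in pairs if n >= 0.75 * size), None)
print(explicit)  # None
```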

Finally, get the final words from the values of d using a list comprehension:

>>> words=[next((t for t in v if t[8] in final_count),None) for v in d.values()]
>>> words
['group2_bug', 'group1_big']
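Putting the steps together, here is a sketch of the same dictionary-plus-`Counter` approach with two changes: the counts are keyed by group prefix rather than group size, and a group with no character at 75% or more is kept whole instead of being reduced to one string (or `None`). The function name `dedupe_groups` is hypothetical, and it assumes every string has at least 9 characters.

```python
from collections import Counter


def dedupe_groups(lines, threshold=0.75):
    # Group the strings by their first 7 characters, as in the answer.
    d = {}
    for line in lines:
        d.setdefault(line[:7], []).append(line)

    # Count the 9th character (index 8) per group, keyed by prefix, not size.
    count = {k: Counter(t[8] for t in v).most_common() for k, v in d.items()}

    # Winning character per group, or None when none reaches the threshold.
    final_char = {
        k: next((c for c, n in pairs if n >= threshold * len(d[k])), None)
        for k, pairs in count.items()
    }

    words = []
    for k, v in d.items():
        if final_char[k] is None:
            words.extend(v)  # no agreement: keep the whole group
        else:
            # keep the first string carrying the winning character
            words.append(next(t for t in v if t[8] == final_char[k]))
    return words


s = """group1_big
group1_big
group1_bigs
group1_bugs
group2_bug"""
print(dedupe_groups(s.split('\n')))  # ['group1_big', 'group2_bug']
```

With the strings from the first comment (group1_big ... group1_bug3 group2_bug), group1 splits 3/3 between 'i' and 'u', so none of its strings are deleted.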