Compare Characters Within a String

Question

I'm trying to compare individual characters in a string in Python and I'm not sure how to do it. In a file of strings, every string belongs to a group. I want to determine whether 75% of the strings in a group have the same character at a given position and, if so, delete every string in that group except the first one.

I'm thinking something like the following, comparing char2 in the word big/bug:

File contents:

group1_big
group1_big
group1_bigs
group1_bugs
group2_bug

Pseudocode:

count = 0
for (string in file)
    if (chars 1-7 of string == chars 1-7 of next string and char 9 is the same in both words)
        if (75% are the same at position 9)
            delete all other strings in the same group

In this case, comparing chars 1-7 shows that all group1 strings match, and 75% of them have an 'i' at character position 9, so all but the first one are deleted. This results in the following file output:

group1_big
group2_bug

What if two groups of words have the same character at position 9? Consider group1_big group1_big group1_bigs group1_bugs1 group1_bugs2 group1_bug3 group2_bug
– Kasravnd, Commented Jun 2, 2015 at 22:12
Treat them as separate groups. The rest of a group's members should be deleted only if the character is the same within that group.
– The Nightman, Commented Jun 2, 2015 at 22:13

Kasravnd · Accepted Answer · 2015-06-02 22:45:10Z

>>> s="""group1_big
... group1_big
... group1_bigs
... group1_bugs
... group2_bug"""
>>> d={}
>>> for i in s.split('\n') :
...   d.setdefault(i[:7],[]).append(i)
... 
>>> from collections import Counter
>>> count={len(j):Counter([t[8] for t in j]).most_common() for i,j in d.items()}
>>> final_count=[next((t[0] for t in j if t[1]>=0.75*i),j) for i,j in count.items()]
>>> words=[next((t for t in v if t[8] in final_count),None) for v in d.values()]
>>> words
['group2_bug', 'group1_big']

This is my first try; I think it can be done better.

In the first part you can create a dictionary like the following:

>>> for i in s.split('\n') :
...   d.setdefault(i[:7],[]).append(i)

>>> d
{'group2_': ['group2_bug'], 'group1_': ['group1_big', 'group1_big', 'group1_bigs', 'group1_bugs']}
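
The slice i[:7] assumes every group prefix is exactly seven characters long. As a hedged alternative, here is a minimal sketch that splits on the first underscore instead. It assumes group names never contain an underscore themselves; the function name group_by_prefix is hypothetical and not part of the original answer.

```python
from collections import defaultdict

def group_by_prefix(lines):
    """Group strings by the text before the first underscore (assumed delimiter)."""
    groups = defaultdict(list)
    for line in lines:
        # partition returns (before, separator, after); only the prefix is needed
        prefix, _, _ = line.partition("_")
        groups[prefix].append(line)
    return dict(groups)

lines = ["group1_big", "group1_big", "group1_bigs", "group1_bugs", "group2_bug"]
groups = group_by_prefix(lines)
```

With this grouping, a prefix such as group10 would not be folded into group1, which the fixed-width slice would do.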

Then create a dictionary from the counts of the 9th characters of the values of d, using collections.Counter, with the number of strings in each group as the key:

>>> count
{1: [('u', 1)], 4: [('i', 3), ('u', 1)]}
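
One caveat: because count is keyed by len(j), two groups that happen to contain the same number of strings would overwrite each other. A possible variation, sketched here under the assumption that the group prefix is a better key, keeps one Counter per group. The name counts is illustrative only.

```python
from collections import Counter

groups = {
    "group1_": ["group1_big", "group1_big", "group1_bigs", "group1_bugs"],
    "group2_": ["group2_bug"],
}

# Position 9 (1-based) is index 8; strings too short to have it are skipped.
counts = {
    name: Counter(s[8] for s in members if len(s) > 8)
    for name, members in groups.items()
}
```

The group size is still available as len(groups[name]), so nothing is lost by dropping it from the key.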

Then find the 9th characters that meet your condition using the following list comprehension:

>>> final_count=[next((t[0] for t in j if t[1]>=0.75*i),j) for i,j in count.items()]
>>> final_count
['u', 'i']

Finally, get the final words from the values of d using a list comprehension:

>>> words=[next((t for t in v if t[8] in final_count),None) for v in d.values()]
>>> words
['group2_bug', 'group1_big']
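
Note that final_count is a flat list shared by all groups, so a character that qualifies in one group can also select a string in another. The following is a minimal sketch of a per-group variant, not the answer's original method. It assumes that a group meeting the threshold should be reduced to its first string, as in the question's expected output, and that any other group is left untouched. The function name collapse_groups and its parameters are hypothetical.

```python
from collections import Counter

def collapse_groups(lines, prefix_len=7, pos=8, threshold=0.75):
    """Keep only the first string of each group whose most common
    character at index `pos` reaches `threshold` of the group."""
    groups = {}
    for line in lines:
        groups.setdefault(line[:prefix_len], []).append(line)

    kept = []
    for members in groups.values():
        # Count the character at `pos`, ignoring strings that are too short
        chars = Counter(s[pos] for s in members if len(s) > pos)
        if chars:
            _, top_count = chars.most_common(1)[0]
            if top_count >= threshold * len(members):
                kept.append(members[0])  # assumption: keep the first string
                continue
        kept.extend(members)  # threshold not met: keep the whole group
    return kept

lines = ["group1_big", "group1_big", "group1_bigs", "group1_bugs", "group2_bug"]
result = collapse_groups(lines)
```

On Python 3.7 or later, where dictionaries preserve insertion order, result lists the groups in the order they first appear in the input.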
