I'm trying to find a way to calculate/determine the closest string match from a list of strings.

Here is the string that I want to find the closest match to: CTGGAG

From a list of strings:

matchlist = ['ACTGGA', 'CTGGAG', 'CTGGAA', 'CTGGTG', 'ACCGGT']

I've tried using the SequenceMatcher from difflib:

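from difflib import SequenceMatcher

seqratiomatchlist = []
for t in matchlist:
    # Compare the target string against each candidate
    assignseqmatch = SequenceMatcher(None, 'CTGGAG', t)
    ratio = assignseqmatch.ratio()
    seqratiomatchlist.append(ratio)
# neutralhex and neutralmatchscores come from elsewhere in my script (not shown)
for r, s in zip(seqratiomatchlist, neutralhex):
    neutralmatchscores[r].append(s)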

However, when I use this method, the first four values in the list are all reported to have the same ratio (0.833333), when the third and fourth values in the list should have the highest ratio, since CTGGAA and CTGGTG each differ from CTGGAG by only one letter. I basically just want to count how many letter changes there are between two strings. Is this possible?

  • You could use the Levenshtein distance between two strings (i.e. the edit distance: how many edits you need to make for your two strings to match). Several Python packages already implement it, and it is also fairly easy to implement yourself. It also lets you compare two strings of unequal length. Commented Mar 1, 2016 at 2:20
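
As a rough illustration of the edit distance mentioned in the comment, here is a minimal pure-Python sketch of the standard dynamic-programming approach. The function name `levenshtein` is just a placeholder chosen for this example, not the API of any particular package.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or match for free
            ))
        previous = current
    return previous[-1]


matchlist = ['ACTGGA', 'CTGGAG', 'CTGGAA', 'CTGGTG', 'ACCGGT']
scores = [levenshtein('CTGGAG', s) for s in matchlist]
```

Unlike a position-by-position comparison, this treats a shifted string such as 'ACTGGA' as two edits away from 'CTGGAG' (one insertion at the front, one deletion at the end), which may or may not be what you want for your sequences.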

1 Answer

To find the number of letter changes between two equal-length strings, x and y, do the following:

numChanges = sum(i != j for i, j in zip(x, y))
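
To apply this to the list from the question, a minimal sketch might look like the following. The helper name `hamming_distance` and the variable `target` are introduced here only for illustration, and the length check is an added safeguard, since `zip` would otherwise silently ignore the extra characters of the longer string.

```python
def hamming_distance(x, y):
    """Count the positions at which two equal-length strings differ."""
    if len(x) != len(y):
        raise ValueError("strings must have the same length")
    return sum(i != j for i, j in zip(x, y))


target = 'CTGGAG'
matchlist = ['ACTGGA', 'CTGGAG', 'CTGGAA', 'CTGGTG', 'ACCGGT']

# Number of letter changes between the target and each candidate
distances = {s: hamming_distance(target, s) for s in matchlist}

# Candidate with the fewest changes (the first one wins on ties)
closest = min(matchlist, key=lambda s: hamming_distance(target, s))
```

With this input, 'CTGGAA' and 'CTGGTG' each come out at one change, and the exact match 'CTGGAG' at zero.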