
I am trying to work out which entries in my data store are near-duplicates using approximate string matching.

Is there any implementation of the following approach in Python, or do I need to roll my own?

Thanks :)

From Wikipedia:

...

A brute-force approach would be to compute the edit distance to P for all substrings of T, and then choose the substring with the minimum distance. However, this algorithm would have the running time O(n³m).

A better solution [3][4], utilizing dynamic programming, uses an alternative formulation of the problem: for each position j in the text T and each position i in the pattern P, compute the minimum edit distance between the first i characters of the pattern, P_i, and any substring T_{j'..j} of T that ends at position j.

What is the most efficient way to apply this to many strings?
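For reference, here is a minimal sketch of what rolling my own might look like, following the formulation quoted above (the function name and layout are my own):

def approx_substring_distance(pattern, text):
    # Minimum edit distance between pattern and any substring of text.
    # Row 0 is all zeros, so a match may begin at any position in text.
    m, n = len(pattern), len(text)
    prev = [0] * (n + 1)               # distances for the empty pattern
    for i in range(1, m + 1):
        curr = [i] + [0] * n           # pattern[:i] vs empty text costs i
        for j in range(1, n + 1):
            cost = 0 if pattern[i - 1] == text[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # skip a pattern character
                          curr[j - 1] + 1,     # skip a text character
                          prev[j - 1] + cost)  # match or substitute
        prev = curr
    return min(prev)                   # best match may end anywhere in text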

4 Answers


Yes.

google("python levenshtein")



difflib.get_close_matches should do the job.
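A minimal example, adapted from the standard-library docs:

from difflib import get_close_matches

# Candidates are ranked by similarity; matches below the cutoff
# (default 0.6) are dropped
get_close_matches('appel', ['ape', 'apple', 'peach', 'puppy'])
# ['apple', 'ape']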



difflib might be the answer, e.g.:

from difflib import context_diff

a = 'acaacbaaca'
b = 'accabcaacc'

# context_diff compares sequences of lines, so two plain strings
# are diffed character by character here
print('\n'.join(context_diff(a, b, lineterm='')))
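For near-duplicate scoring specifically, difflib's SequenceMatcher gives a similarity ratio rather than a diff. A minimal sketch (the 0.8 threshold is illustrative, not a recommendation):

from difflib import SequenceMatcher

a = 'acaacbaaca'
b = 'accabcaacc'

# ratio() returns 2*M/T, where M is the number of matching characters
# and T is the total length of both strings; values near 1.0 suggest
# near-duplicates
if SequenceMatcher(None, a, b).ratio() > 0.8:
    print('near-duplicate')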



Levenshtein distance performs very similarly to fuzzywuzzy's standard ratio() function; fuzzywuzzy uses difflib under the hood: http://seatgeek.com/blog/dev/fuzzywuzzy-fuzzy-string-matching-in-python

example from the fuzzywuzzy documentation: https://github.com/seatgeek/fuzzywuzzy

from fuzzywuzzy import fuzz

fuzz.ratio("this is a test", "this is a test!")
    97
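To score one string against many candidates at once (the situation in the question), fuzzywuzzy's process module may help; example adapted from the same README:

from fuzzywuzzy import process

choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]

process.extract("new york jets", choices, limit=2)
# [('New York Jets', 100), ('New York Giants', 78)]

process.extractOne("cowboys", choices)
# ('Dallas Cowboys', 90)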

