edited Apr 8, 2019 at 19:14

71.2k
5
76
257

Tweeted twitter.com/StackCodeReview/status/1115268234708692992

occurred Apr 8, 2019 at 15:00

Became Hot Network Question

occurred Apr 8, 2019 at 11:43


asked Apr 8, 2019 at 10:01

Kyra_W


Calculate Levenshtein distance between two strings in Python

I need a function that measures how different two strings are. I chose the Levenshtein distance as a quick approach and implemented this function:

from difflib import ndiff

def calculate_levenshtein_distance(str_1, str_2):
    """
        The Levenshtein distance is a string metric for measuring the difference between two sequences.
        It is calculated as the minimum number of single-character edits necessary to transform one string into another
    """
    distance = 0
    buffer_removed = buffer_added = 0
    for x in ndiff(str_1, str_2):
        code = x[0]
        # Code ? is ignored as it does not translate to any modification
        if code == ' ':
            distance += max(buffer_removed, buffer_added)
            buffer_removed = buffer_added = 0
        elif code == '-':
            buffer_removed += 1
        elif code == '+':
            buffer_added += 1
    distance += max(buffer_removed, buffer_added)
    return distance

Then calling it as:

similarity = 1 - calculate_levenshtein_distance(str_1, str_2) / max(len(str_1), len(str_2))

How sloppy/prone to errors is this code? How can it be improved?
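
For comparison, here is a minimal sketch of the textbook dynamic-programming formulation of the edit distance (often called the Wagner-Fischer algorithm), which the ndiff-based function can be checked against. The names levenshtein_reference and similarity_reference are hypothetical and not part of the code above; the sketch assumes unit cost for each insertion, deletion, and substitution, and it treats two empty strings as fully similar, which is my assumption rather than a stated requirement.

```python
def levenshtein_reference(a, b):
    """Textbook dynamic-programming edit distance (hypothetical helper).

    Counts single-character insertions, deletions and substitutions,
    each with cost 1.
    """
    # At the start of iteration i, previous[j] is the distance between
    # the first i - 1 characters of a and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def similarity_reference(a, b):
    """Normalised similarity in [0, 1] (hypothetical helper)."""
    longest = max(len(a), len(b))
    if longest == 0:
        # Assumption: two empty strings are identical. This also avoids
        # the division by zero that the one-line call would hit.
        return 1.0
    return 1 - levenshtein_reference(a, b) / longest
```

Comparing the output of calculate_levenshtein_distance with this reference on a handful of string pairs would show whether the two ever disagree.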

python edit-distance
