Finding differences between strings

Question

I have the following function that takes a source string and a modified string, and bolds the changed words in the modified one.

def appendBoldChanges(s1, s2):
    "Adds <b></b> tags to words that are changed"
    l1 = s1.split(' ')
    l2 = s2.split(' ')
    for i, val in enumerate(l1):
        if l1[i].lower() != l2[i].lower():
            s2 = s2.replace(l2[i], "<b>%s</b>" % l2[i])

    return s2
print(appendBoldChanges("britney spirs", "britney spears"))  # prints britney <b>spears</b>

It works fine on strings with the same word count, but fails when the word counts differ, such as sora iro days and sorairo days.

How can I take the spaces into consideration?

@mata You can actually make that an answer. :)
– UltraInstinct
Commented May 27, 2012 at 15:44

fraxel · Accepted Answer · 2012-05-27 16:30:16Z

20

You could use difflib, and do it like this:

from difflib import Differ

def appendBoldChanges(s1, s2):
    "Adds <b></b> tags to words that are changed"
    l1 = s1.split(' ')
    l2 = s2.split(' ')
    dif = list(Differ().compare(l1, l2))
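    # Each entry is a two-character prefix plus the word: '+ ' marks a word only
    # in s2, '- ' a word only in s1, '? ' an intraline hint, '  ' an unchanged word.
    return " ".join(['<b>'+i[2:]+'</b>' if i[:1] == '+' else i[2:] for i in dif
                                                           if i[:1] not in '-?'])

print(appendBoldChanges("britney spirs", "britney sprears"))
print(appendBoldChanges("sora iro days", "sorairo days"))
# Output:
# britney <b>sprears</b>
# <b>sorairo</b> days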

edited May 27, 2012 at 16:30

answered May 27, 2012 at 15:58


1

+1 for a superb one-liner. You may want to clean up the returned string. Input appendBoldChanges("sorairo days", "sora iro days") results in <b>sora</b> <b>iro</b> days when OP would probably need <b>sora iro</b> days. Truly Pythonic elegance.
– daedalus
Commented May 27, 2012 at 16:09
@gauden - cheers ;) I figured it probably wouldn't matter as they display the same. If it's an issue, then yep, .replace('</b> <b>', ' ') on the end would be the fix.
– fraxel
Commented May 27, 2012 at 16:18
For sorairo days and soraiao days I get the following result back: ^\n <b>soraiao</b> ^\n days. I tried to replace the '\n' and '^', but I'm not sure it's the right way.
– iTayb
Commented May 27, 2012 at 16:23
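
If adjacent bold words should share a single pair of tags, as raised in the comments above, one possible cleanup is sketched below. It is only an illustration: the name append_bold_changes_merged is hypothetical, and it assumes words are separated by plain whitespace. It skips both the '-' and '?' entries that Differ emits and then collapses back-to-back tags.

```python
from difflib import Differ

def append_bold_changes_merged(s1, s2):
    """Bold the words of s2 that differ from s1, merging adjacent bold words."""
    words = []
    for line in Differ().compare(s1.split(), s2.split()):
        tag, word = line[:1], line[2:]
        if tag == '+':
            # word appears only in s2
            words.append('<b>' + word + '</b>')
        elif tag == ' ':
            # word is unchanged
            words.append(word)
        # '-' entries (words only in s1) and '?' hint lines are skipped
    # collapse "</b> <b>" so neighbouring changed words share one tag pair
    return ' '.join(words).replace('</b> <b>', ' ')

print(append_bold_changes_merged("sorairo days", "sora iro days"))
# <b>sora iro</b> days
```

Splitting with split() rather than split(' ') also avoids empty "words" when the input contains repeated spaces.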


mata · Accepted Answer · 2012-05-27 15:49:15Z

2

Have a look at the difflib module; you could use a SequenceMatcher to find the changed regions in your text.
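
As a rough sketch of that suggestion (not code from the original answer), SequenceMatcher can be run over the two word lists and its opcodes used to decide which words of the modified string to wrap. The function name bold_changed_words is made up for illustration, and comparing lowercased words is an assumption carried over from the .lower() calls in the question.

```python
from difflib import SequenceMatcher

def bold_changed_words(s1, s2):
    """Wrap the words of s2 that differ from s1 in <b></b> tags."""
    a, b = s1.split(), s2.split()
    # compare case-insensitively, but emit the words of s2 as written
    matcher = SequenceMatcher(None,
                              [w.lower() for w in a],
                              [w.lower() for w in b])
    out = []
    for tag, a0, a1, b0, b1 in matcher.get_opcodes():
        chunk = ' '.join(b[b0:b1])
        if tag == 'equal':
            out.append(chunk)
        elif tag in ('insert', 'replace'):
            out.append('<b>' + chunk + '</b>')
        # 'delete' means words were removed from s1; s2 has nothing to show
    return ' '.join(out)

print(bold_changed_words("britney spirs", "britney spears"))
# britney <b>spears</b>
print(bold_changed_words("sora iro days", "sorairo days"))
# <b>sorairo</b> days
```

Because each opcode covers a whole run of words, consecutive changed words end up inside one pair of tags without any post-processing.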

answered May 27, 2012 at 15:49



Rani · Accepted Answer · 2017-11-21 20:54:14Z

A small upgrade to @fraxel's answer that returns two outputs: the original and the new version, each with its changes marked. I also changed the one-liner into what is, in my opinion, a more readable version.

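import difflib

def show_diff(text, n_text):
    "Returns (original, new) with the differing regions wrapped in <font> tags"
    # text and n_text are compared character by character
    seqm = difflib.SequenceMatcher(None, text, n_text)
    output_orig = []
    output_new = []
    for opcode, a0, a1, b0, b1 in seqm.get_opcodes():
        orig_seq = seqm.a[a0:a1]
        new_seq = seqm.b[b0:b1]
        if opcode == 'equal':
            # unchanged text goes into both outputs as-is
            output_orig.append(orig_seq)
            output_new.append(orig_seq)
        elif opcode == 'insert':
            # text present only in the new version
            output_new.append("<font color=green>{}</font>".format(new_seq))
        elif opcode == 'delete':
            # text present only in the original
            output_orig.append("<font color=red>{}</font>".format(orig_seq))
        elif opcode == 'replace':
            # text that differs between the two versions
            output_new.append("<font color=blue>{}</font>".format(new_seq))
            output_orig.append("<font color=blue>{}</font>".format(orig_seq))
        else:
            # get_opcodes() only yields the four tags handled above
            print('Error')
    return ''.join(output_orig), ''.join(output_new)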
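
To see what get_opcodes() hands to a function like the one above, the following sketch lists each opcode together with the fragments it covers. The helper describe_opcodes is a hypothetical name used only for this illustration. Note that passing whole strings makes the comparison character-level rather than word-level.

```python
from difflib import SequenceMatcher

def describe_opcodes(old, new):
    """Return (tag, old_fragment, new_fragment) for each opcode."""
    matcher = SequenceMatcher(None, old, new)
    return [(tag, old[a0:a1], new[b0:b1])
            for tag, a0, a1, b0, b1 in matcher.get_opcodes()]

for step in describe_opcodes("britney spirs", "britney spears"):
    print(step)
# ('equal', 'britney sp', 'britney sp')
# ('replace', 'i', 'ea')
# ('equal', 'rs', 'rs')
```

Only the single differing region is tagged as 'replace', which is why a character-level diff highlights part of a word instead of the whole word.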
