6

I have the following function that takes a source string and a modified string, and bolds the changed words in the modified one.

def appendBoldChanges(s1, s2):
    "Adds <b></b> tags to words that are changed"
    l1 = s1.split(' ')
    l2 = s2.split(' ')
    for i, val in enumerate(l1):
        if l1[i].lower() != l2[i].lower():
            s2 = s2.replace(l2[i], "<b>%s</b>" % l2[i])

    return s2
print(appendBoldChanges("britney spirs", "britney spears")) # returns britney <b>spears</b>

It works fine on strings with the same word count, but fails when the word counts differ, such as sora iro days and sorairo days.

How can I take the spaces into consideration?

1
  • @mata You can actually make that an answer. :) Commented May 27, 2012 at 15:44

3 Answers

20

You could use difflib, and do it like this:

from difflib import Differ

def appendBoldChanges(s1, s2):
    "Adds <b></b> tags to words that are changed"
    l1 = s1.split(' ')
    l2 = s2.split(' ')
    dif = list(Differ().compare(l1, l2))
    # Keep unchanged words, bold the '+' (added) ones, skip '-' (removed) and '?' (hint) entries
    return " ".join(['<b>'+i[2:]+'</b>' if i[:1] == '+' else i[2:] for i in dif
                                                           if not i[:1] in '-?'])

print(appendBoldChanges("britney spirs", "britney sprears"))
print(appendBoldChanges("sora iro days", "sorairo days"))
# Output:
# britney <b>sprears</b>
# <b>sorairo</b> days
3
  • 1
    +1 for a superb one-liner. You may want to clean up the returned string. Input appendBoldChanges("sorairo days", "sora iro days") results in <b>sora</b> <b>iro</b> days when OP would probably need <b>sora iro</b> days. Truly Pythonic elegance.
    – daedalus
    Commented May 27, 2012 at 16:09
  • @gauden - cheers ;) I figured it probably wouldn't matter as they display the same. If it's an issue, then yep, .replace('</b> <b>',' ') on the end would be the fix.
    – fraxel
    Commented May 27, 2012 at 16:18
  • For sorairo days and soraiao days I get the following result back: ^\n <b>soraiao</b> ^\n days. I tried to replace the '\n' and '^', but I'm not sure that's the right way.
    – iTayb
    Commented May 27, 2012 at 16:23
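
The `.replace('</b> <b>', ' ')` clean-up suggested in the comments could look roughly like this. It is only a sketch, not the answerer's exact code: the name `append_bold_changes_merged` is made up for illustration, and it assumes words are separated by single spaces. Filtering with `startswith` also drops the `?` hint entries that `Differ` emits for near-identical words.

```python
from difflib import Differ

def append_bold_changes_merged(s1, s2):
    """Bold the words of s2 that differ from s1, merging adjacent bold runs."""
    l1 = s1.split(' ')
    l2 = s2.split(' ')
    marked = ['<b>' + d[2:] + '</b>' if d.startswith('+') else d[2:]
              for d in Differ().compare(l1, l2)
              # '-' entries are words only in s1; '?' entries are intraline hints
              if not d.startswith(('-', '?'))]
    # Collapse "<b>sora</b> <b>iro</b>" into "<b>sora iro</b>"
    return ' '.join(marked).replace('</b> <b>', ' ')

print(append_bold_changes_merged("sorairo days", "sora iro days"))
# <b>sora iro</b> days
```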
2

Have a look at the difflib module; you could use a SequenceMatcher to find the changed regions in your text.
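
One way this suggestion might be applied at the word level is sketched below. The function name `bold_changes` is hypothetical, and lowercasing before the comparison is an assumption carried over from the question's code. Each `replace` or `insert` region of the second string is wrapped in a single pair of tags, so a run of changed words stays together.

```python
from difflib import SequenceMatcher

def bold_changes(s1, s2):
    """Wrap the words of s2 that differ from s1 in <b></b> tags."""
    l1 = s1.split()
    l2 = s2.split()
    # Compare case-insensitively, but emit the words of s2 unchanged
    matcher = SequenceMatcher(None,
                              [w.lower() for w in l1],
                              [w.lower() for w in l2])
    out = []
    for tag, a0, a1, b0, b1 in matcher.get_opcodes():
        words = " ".join(l2[b0:b1])
        if tag == "equal":
            out.append(words)
        elif tag in ("replace", "insert"):
            out.append("<b>%s</b>" % words)
        # 'delete' regions exist only in s1, so they add nothing to the output
    return " ".join(out)

print(bold_changes("britney spirs", "britney spears"))  # britney <b>spears</b>
print(bold_changes("sora iro days", "sorairo days"))    # <b>sorairo</b> days
print(bold_changes("sorairo days", "sora iro days"))    # <b>sora iro</b> days
```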

1

A small upgrade to @fraxel's answer that returns two outputs: the original and the new version, each with its changes marked. I also changed the one-liner into what is, in my opinion, a more readable version.

import difflib

def show_diff(text, n_text):
    "Returns (original, new) with the changed regions wrapped in <font> tags"
    # Both arguments are strings, so the comparison is character by character
    seqm = difflib.SequenceMatcher(None, text, n_text)
    output_orig = []
    output_new = []
    for opcode, a0, a1, b0, b1 in seqm.get_opcodes():
        orig_seq = seqm.a[a0:a1]
        new_seq = seqm.b[b0:b1]
        if opcode == 'equal':
            output_orig.append(orig_seq)
            output_new.append(orig_seq)
        elif opcode == 'insert':
            # Text present only in the new version
            output_new.append("<font color=green>{}</font>".format(new_seq))
        elif opcode == 'delete':
            # Text present only in the original version
            output_orig.append("<font color=red>{}</font>".format(orig_seq))
        elif opcode == 'replace':
            # Text that differs between the two versions
            output_new.append("<font color=blue>{}</font>".format(new_seq))
            output_orig.append("<font color=blue>{}</font>".format(orig_seq))
        else:
            print('Error')
    return ''.join(output_orig), ''.join(output_new)
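
Because the function above passes plain strings to SequenceMatcher, the opcodes describe character ranges rather than words. A quick way to see what the loop iterates over, using the strings from the question as sample input:

```python
import difflib

seqm = difflib.SequenceMatcher(None, "britney spirs", "britney spears")
ops = seqm.get_opcodes()
for opcode, a0, a1, b0, b1 in ops:
    # Show each operation with the slices of the old and new string it covers
    print(opcode, repr(seqm.a[a0:a1]), repr(seqm.b[b0:b1]))
# equal 'britney sp' 'britney sp'
# replace 'i' 'ea'
# equal 'rs' 'rs'
```

So only the differing characters inside a word get coloured. Marking whole words would mean passing lists of words instead and joining the slices with spaces.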
