2
\$\begingroup\$

I am trying to write code that creates a partial match table given a list. The following seems to produce the correct outputs, but it looks quite clunky to me. In particular, it feels strange to me to have repeated checks for i < len(listB) and also to modify the variable in the for loop. How can this be improved?

def create_table(listB):
    tab = [0] * len(listB)
    tab[0] = -1
    
    if len(listB) == 1:
        return tab
        
    for i in range(1, len(listB)):
        while i < len(listB) and listB[i] != listB[0] :
            i += 1
        if i < len(listB):
            if listB[i] == listB[0]:
                    tab[i] = -1
                    cur_index = 1
                    i += 1
            while i < len(listB) and listB[cur_index] == listB[i]:
                i += 1
                cur_index += 1
            if i == len(listB):
                return tab
            tab[i] = cur_index
    
    return tab
\$\endgroup\$

2 Answers 2

3
\$\begingroup\$

Improper Output

The algorithm on the page you linked to reads:

algorithm kmp_table:
    input:
        an array of characters, W (the word to be analyzed)
    output:
        an array of integers, T (the table to be filled)

    define variables:
        an integer, pos ← 1 (the current position we are computing in T)
        an integer, cnd ← 0 (the zero-based index in W of the next character of the current candidate substring)

    let T[0] ← -1

    while pos < length(W) do
        if W[pos] = W[cnd] then
            let T[pos] ← T[cnd]
        else
            let T[pos] ← cnd
            while cnd ≥ 0 and W[pos] ≠ W[cnd] do
                let cnd ← T[cnd]
        let pos ← pos + 1, cnd ← cnd + 1

    let T[pos] ← cnd (only needed when all word occurrences are searched)

First thing to check is whether the algorithm is uses 0-based or 1-based arrays. Since it assigns to T[0] we can immediately tell it is zero based.

Since pos is being incremented by 1, the while pos < length(W) loop will end when pos == length(W).

The final statement of the algorithm assigns T[pos] a value. But if pos = length(W), this means that the T array will contain length(W) + 1 elements!

Since you initialize tab = [0] * len(listB), the output cannot contain the final value the algorithm should produce! The output table is supposed to be 1 element longer than the input! Using the example from the article, W = “ABCDABD” (7 letters) is supposed to produce [-1, 0, 0, 0, -1, 0, 2, 0] (8 elements)

Improper looping

The for i in range(…) loop does not allow you to advance the loop index. On each iteration, the next value from the range(…) will be used regardless of any modification made to i inside the loop. For example:

for i in range(10):
    if i == 2:
        i = 8
    print(i)

will print ten numbers: 0, 1, 8, 3, 4, 5, 6, 7, 8, and 9. After modifying i, the next iteration continues with i=3, and does not skip directly to i=9. For this reason, attempting to modify a for loop index is confusing to the reader, and should be avoided.

If you want to modify loop indexes manually, use a while loop, not a for loop.

\$\endgroup\$
2
\$\begingroup\$

Documentation

The PEP 8 style guide recommends adding docstrings for functions. The docstring should:

  • Describe the type of the input data
  • The type of the returned data
  • What the function is doing and how it is doing it. For example, if it implements a known algorithm, then mention it.
    def create_table(listB):
        """
        Create a table of ...
        """

Naming

The PEP-8 recommends snake_case for variable names. listB would become list_b, or preferably something that conveys more meaning than the letter "B".

The variable named tab is ambiguous. table would be clearer since you are creating a table.

DRY

The expression len(listB) is called several times. It would be more efficient to set it to a variable before its first use such that the len function is just called once:

number_things = len(listB)
tab = [0] * number_things

You should choose a more meaningful name than number_things.

Comments

modify the variable in the for loop

I agree with you than it feels strange, and I find it quite confusing that you modify the for loop iterator variable inside the loop.

I don't know how to improve this situation, but this is where adding meaningful comments to the code might help.

Add a comment before the first while loop to describe what you are doing and why you are doing it that way.

Do the same with the if and while statements that follow it. In doing so, it may become evident what you need to do to refactor the code.

Layout

There is some inconsistent indentation. The black program can be used to automatically reformat the code with consistency.

Tests

seems to produce the correct outputs

Add tests to prove that it is doing what you expect. Lots of tests.

\$\endgroup\$
1
  • \$\begingroup\$ No, the OP's output is not correct. \$\endgroup\$
    – Booboo
    Commented Feb 16 at 22:29

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.