58
\$\begingroup\$

I wrote a solution to the Knapsack problem in Python, using a bottom-up dynamic programming algorithm. It correctly computes the optimal value, given a list of items with values and weights, and a maximum allowed weight.

Any critique on code style, comment style, readability, and best-practice would be greatly appreciated. I'm not sure how Pythonic the code is; filling in a matrix seems like a natural way of implementing dynamic programming, but it doesn't "feel" Pythonic (and is that a bad thing, in this case?).

Note that the comments are a little on the verbose side; as I was writing the algorithm, I tried to be as explicit about each step as possible, since I was trying to understand it to the fullest extent possible. If they are excessive (or just plain incorrect), feel free to comment (hah!) on it.

import sys

def knapsack(items, maxweight):
    # Create an (N+1) by (W+1) 2-d list to contain the running values
    # which are to be filled by the dynamic programming routine.
    #
    # There are N+1 rows because we need to account for the possibility
    # of choosing from 0 up to and including N possible items.
    # There are W+1 columns because we need to account for possible
    # "running capacities" from 0 up to and including the maximum weight W.
    bestvalues = [[0] * (maxweight + 1)
                  for i in xrange(len(items) + 1)]

    # Enumerate through the items and fill in the best-value table
    for i, (value, weight) in enumerate(items):
        # Increment i, because the first row (0) is the case where no items
        # are chosen, and is already initialized as 0, so we're skipping it
        i += 1
        for capacity in xrange(maxweight + 1):
            # Handle the case where the weight of the current item is greater
            # than the "running capacity" - we can't add it to the knapsack
            if weight > capacity:
                bestvalues[i][capacity] = bestvalues[i - 1][capacity]
            else:
                # Otherwise, we must choose between two possible candidate values:
                # 1) the value of "running capacity" as it stands with the last item
                #    that was computed; if this is larger, then we skip the current item
                # 2) the value of the current item plus the value of a previously computed
                #    set of items, constrained by the amount of capacity that would be left
                #    in the knapsack (running capacity - item's weight)
                candidate1 = bestvalues[i - 1][capacity]
                candidate2 = bestvalues[i - 1][capacity - weight] + value

                # Just take the maximum of the two candidates; by doing this, we are
                # in effect "setting in stone" the best value so far for a particular
                # prefix of the items, and for a particular "prefix" of knapsack capacities
                bestvalues[i][capacity] = max(candidate1, candidate2)

    # Reconstruction
    # Iterate through the values table, and check
    # to see which of the two candidates were chosen. We can do this by simply
    # checking if the value is the same as the value of the previous row. If so, then
    # we say that the item was not included in the knapsack (this is how we arbitrarily
    # break ties) and simply move the pointer to the previous row. Otherwise, we add
    # the item to the reconstruction list and subtract the item's weight from the
    # remaining capacity of the knapsack. Once we reach row 0, we're done
    reconstruction = []
    i = len(items)
    j = maxweight
    while i > 0:
        if bestvalues[i][j] != bestvalues[i - 1][j]:
            reconstruction.append(items[i - 1])
            j -= items[i - 1][1]
        i -= 1

    # Reverse the reconstruction list, so that it is presented
    # in the order that it was given
    reconstruction.reverse()

    # Return the best value, and the reconstruction list
    return bestvalues[len(items)][maxweight], reconstruction


if __name__ == '__main__':
    if len(sys.argv) != 2:
        print('usage: knapsack.py [file]')
        sys.exit(1)

    filename = sys.argv[1]
    with open(filename) as f:
        lines = f.readlines()

    maxweight = int(lines[0])
    items = [map(int, line.split()) for line in lines[1:]]

    bestvalue, reconstruction = knapsack(items, maxweight)

    print('Best possible value: {0}'.format(bestvalue))
    print('Items:')
    for value, weight in reconstruction:
        print('V: {0}, W: {1}'.format(value, weight))

The input file that it expects is as follows:

165
92 23
57 31
49 29
68 44
60 53
43 38
67 63
84 85
87 89
72 82

The first line contains the maximum weight allowed, and subsequent lines contain the items, represented by value-weight pairs.

\$\endgroup\$
2
  • 5
    \$\begingroup\$ An old post, I know, but if you haven't run into it already enumerate allows you to specify the starting index (e.g. for i,item in enumerate(items,1): would have i begin at 1. \$\endgroup\$
    – SimonT
    Commented Oct 22, 2013 at 0:18
  • \$\begingroup\$ @SimonT: Fantastic! I've programmed in Python for 2+ years now, and I had never seen enumerate used that way. I'll definitely keep it in mind. \$\endgroup\$
    – voithos
    Commented Oct 22, 2013 at 4:44

3 Answers 3

62
\$\begingroup\$

1. Review

  1. The function knapsack lacks a docstring that would explain what arguments the function takes (what kind of things are in items? must items be a sequence, or can it be an iterable?) and what it returns.

  2. This kind of function is ideal for doctests.

  3. The comments say things like "Create an (N+1) by (W+1) 2-d list". But what is N and what is W? Presumably N is len(items) and W is maxweight, so it would be a good idea to use the same names in the comments and the code:

    N = len(items)
    W = maxweight
    
  4. The comment above bestvalues fails to explain what the values in this table mean. I would write something like this:

    # bestvalues[i][j] is the best sum of values for any
    # subsequence of the first i items, whose weights sum
    # to no more than j.
    

    This makes it obvious why \$0 ≤ i ≤ N\$ and why \$0 ≤ j ≤ W\$.

  5. In a loop like:

    bestvalues = [[0] * (maxweight + 1)
                  for i in xrange(len(items) + 1)]
    

    where the loop variable (here i) is unused, it's conventional to name it _.

  6. These lines could be omitted:

    # Increment i, because the first row (0) is the case where no items
    # are chosen, and is already initialized as 0, so we're skipping it
    i += 1
    

    and then in the rest of the code, use i + 1 instead of i and i instead of i - 1.

  7. The reconstruction loop:

    i = N
    while i > 0:
        # code
        i -= 1
    

    can be written like this:

    for i in reversed(range(1, N + 1)):
        # code
    
  8. The code prints an error message like this:

    print('usage: knapsack.py [file]')
    

    Error messages ought to go to standard error (not standard output). And it's a good idea not to hard-code the name of the program, because it would be easy to rename the program but forget to update the code. So write instead:

    sys.stderr.write("usage: {0} [file]\n".format(sys.argv[0]))
    
  9. The block of code that reads the problem description and prints the result only runs when __name__ == '__main__'. This makes it hard to test, for example from the interactive interpreter. It's usually best to put this kind of code in its own function, like this:

    def main(filename):
        with open(filename) as f:
            # etc.
    
    if __name__ == '__main__':
        if len(sys.argv) != 2:
            print('usage: knapsack.py [file]')
            sys.exit(1)
        main(sys.argv[1])
    

    and now you can run main('problem.txt') from the interpreter to test it.

  10. The code reads the whole of the file into memory as a list of lines:

    lines = f.readlines()
    

    this is harmless here because the file is small, but it's usually better to process a file one line at a time, like this:

    with open(filename) as f:
        maxweight = int(next(f))
        items = [[int(word) for word in line.split()] for line in f]
    

2. Recursive approach

Any dynamic programming algorithm can be implemented in two ways: by building a table of partial results from the bottom up (as in the code in the post), or by recursively computing the result from the top down, using memoization to avoid computing any partial result more than once.

The top-down approach often results in slightly simpler and clearer code, and it only computes the partial results that are needed for the particular problem instance (whereas in the bottom-up approach it's often hard to avoid computing all partial results even if some of them go unused).

So we could use @functools.lru_cache to implement a top-down solution, like this:

from functools import lru_cache

def knapsack(items, maxweight):
    """Solve the knapsack problem by finding the most valuable subsequence
    of items that weighs no more than maxweight.

    items must be a sequence of pairs (value, weight), where value is a
    number and weight is a non-negative integer.

    maxweight is a non-negative integer.

    Return a pair whose first element is the sum of values in the most
    valuable subsequence, and whose second element is the subsequence.

    >>> items = [(4, 12), (2, 1), (6, 4), (1, 1), (2, 2)]
    >>> knapsack(items, 15)
    (11, [(2, 1), (6, 4), (1, 1), (2, 2)])

    """
    @lru_cache(maxsize=None)
    def bestvalue(i, j):
        # Return the value of the most valuable subsequence of the first
        # i elements in items whose weights sum to no more than j.
        if j < 0:
            return float('-inf')
        if i == 0:
            return 0
        value, weight = items[i - 1]
        return max(bestvalue(i - 1, j), bestvalue(i - 1, j - weight) + value)

    j = maxweight
    result = []
    for i in reversed(range(len(items))):
        if bestvalue(i + 1, j) != bestvalue(i, j):
            result.append(items[i])
            j -= items[i][1]
    result.reverse()
    return bestvalue(len(items), maxweight), result

To see how many partial solutions this needs to compute, print bestvalue.cache_info() just before returning the result. When solving the example problem in the docstring, this outputs:

CacheInfo(hits=17, misses=42, maxsize=None, currsize=42)

The 42 entries in the cache are fewer than the 96 partial solutions computed by the bottom up approach.

\$\endgroup\$
8
  • \$\begingroup\$ Thanks for your response (especially points 3 and 9)! I can't believe I forgot to consider the memoized recursive method. I'll apply some of your suggestions later today, hopefully. Thanks again. \$\endgroup\$
    – voithos
    Commented Jan 16, 2013 at 22:19
  • \$\begingroup\$ Beware of recursion depth limit issues with this proposed solution. \$\endgroup\$ Commented Mar 8, 2014 at 15:13
  • 1
    \$\begingroup\$ @Frank: yes, this solution uses len(items) levels of stack. \$\endgroup\$ Commented Mar 13, 2014 at 8:26
  • \$\begingroup\$ @GarethRees Do you need to define bestvalue() inside knapsack()? I would tend to avoid defining a function inside a function to make things clearer. \$\endgroup\$
    – green diod
    Commented Jan 17, 2017 at 13:43
  • 4
    \$\begingroup\$ @greendiod: I defined bestvalue inside knapsack because (i) it's only used there; (ii) it uses items from the outer scope (it would be a pain to have to pass this down through all the recursive calls); (iii) it's memoized so it has an associated cache that we don't want to keep around when knapsack has returned. \$\endgroup\$ Commented Jan 17, 2017 at 18:06
2
\$\begingroup\$

For the first section of code involving nested loops, if you should choose to
- use curr instead of i,
- assign prev = curr - 1 and
- use enumerate(items, 1),

the code will literally document itself.

for curr, (value, weight) in enumerate(items, 1):
    prev = curr - 1
    for capacity in range(maxweight + 1):
        if weight > capacity:
            bestvalues[curr][capacity] = bestvalues[prev][capacity]
        else:
            candidate1 = bestvalues[prev][capacity]
            candidate2 = bestvalues[prev][capacity - weight] + value

            bestvalues[curr][capacity] = max(candidate1, candidate2)
\$\endgroup\$
1
\$\begingroup\$

Following problem can be solved using Dynamic Programming in a much efficient way, in term of lines of code and fastest time to perform computation. On top that , following code perform memoization to cache previously computed results.

try:
    from functools import lru_cache
except ImportError:
    # For Python2
    # pip install backports.functools_lru_cache
    from backports.functools_lru_cache import lru_cache
class knapsack:
    """
    Maximize sum of selected weight.
    Sum of selected size is less than capacity.
    Algorithm: Dynamic Programming
    ----
    >>>items = [(4, 12), (2, 1), (6, 4), (1, 1), (2, 2)]
    >>>weight, size   = zip(*items)
    >>>weight = list(weight)
    >>>size = list(size)
    >>>capacity = 15
    >>>knapsack(size, weight).solve(capacity)

    >>>(11, [1, 2, 3, 4])

    """
    def __init__(self, size, weight):
        self.size = size
        self.weight = weight
    @lru_cache()
    def solve(self, cap, i=0):
        if cap < 0: return -sum(self.weight), []
        if i == len(self.size): return 0, []
        res1 = self.solve(cap,  i + 1)
        res2 = self.solve(cap - self.size[i], i + 1)
        res2 = (res2[0] + self.weight[i], [i] + res2[1])
        return res1 if res1[0] >= res2[0] else res2
\$\endgroup\$
1
  • 2
    \$\begingroup\$ Welcome to Code Review! You have presented an alternative solution, but haven't reviewed the code. Please edit to show what aspects of the question code prompted you to write this version, and in what ways it's an improvement over the original. It may be worth (re-)reading How to Answer. \$\endgroup\$ Commented Oct 5, 2018 at 7:54

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.