All-Python implementation of n-dimensional convolution

First time posting on Code Review Stack Exchange. Apologies if I'm not following all norms! Please let me know if I should adjust anything or provide additional information!

To better understand how discrete finite convolution works (read: for educational purposes), I wrote an all-Python implementation of the convolution function. The idea was for it to give the same output as numpy.convolve, including the mode options. I generalized the code to handle n-dimensional convolutions rather than just 1-dimensional ones, which means it more closely matches the behavior of scipy.signal.convolve.

Here is the code.

import numpy as np


def py_nd_convolve(s, k, mode='full'):
    # All python implementation of n-dimensional scipy.signal.convolve for educational purposes

    # First ensure that one of the arrays fits inside of the other and by convention call this one
    # the kernel "k" and call the larger array the signal "s".
    if all(np.less_equal(k.shape, s.shape)):
        pass
    elif all(np.greater(k.shape, s.shape)):
        s, k = k.astype(float), s.astype(float)
    else:
        raise ValueError(f'Arrays have mismatched shapes: {k.shape} and {s.shape}')

    dim_list = np.arange(s.ndim)
    n_s = np.array(s.shape, dtype=int)
    n_k = np.array(k.shape, dtype=int)
    n_full = n_s + n_k - 1
    out = np.zeros(n_full, dtype=float)
    zero_arr = np.zeros(s.ndim, dtype=int)
    for ind, val in np.ndenumerate(out):
        ind_arr = np.array(ind)  # convert index tuple to numpy array so it can be used for arithmetic

        # i_min and i_max specify the allowed range which indices can take on in each dimension
        # to perform the convolution summation. The values for these are non-trivial near the
        # boundaries of the output array.
        i_min = np.maximum(zero_arr, ind_arr - n_k + 1)
        i_max = np.minimum(ind_arr, n_s - 1)

        # create a d-dimensional tuple of slices to slice both the kernel and signal to perform the convolution.
        s_slicer = tuple([slice(i_min[d], i_max[d] + 1) for d in dim_list])
        k_slicer = tuple([slice(ind[d]-i_max[d], ind[d]-i_min[d]+1) for d in dim_list])

        # element-wise product and summation of subset of signal and reversed kernel
        out[tuple(ind)] = np.multiply(s[s_slicer], np.flip(k[k_slicer])).sum()

    # This section of code clips the output data as per user specification to manage boundary effects
    n_min = np.zeros_like(n_full)
    n_max = n_full
    if mode == 'full':  # take all possible output points, maximal boundary effects
        pass
    elif mode == 'valid':  # eliminate any part of the output which is effect by boundary effects
        n_min = n_k-1
        n_max = n_s
    elif mode == 'same':  # clip the output so that it has the same shape as s
        n_min = (n_k-1)/2
        n_max = n_s + (n_k-1)/2
    slicer = [slice(int(n_min[d]), int(n_max[d])) for d in dim_list]  # d-dimensional slice tuple for clipping
    return out[tuple(slicer)]
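
As a sanity check against that goal, something like the following (shapes chosen arbitrarily) compares all three modes against scipy.signal.convolve:

import numpy as np
from scipy.signal import convolve

s = np.random.rand(6, 7)
k = np.random.rand(3, 4)

for mode in ('full', 'valid', 'same'):
    # the two implementations should agree to floating-point tolerance
    assert np.allclose(py_nd_convolve(s, k, mode=mode),
                       convolve(s, k, mode=mode)), mode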

Since this code is for educational purposes, I was not aiming for speed; rather, I wanted it to be very readable. As written, the code is very slow: while scipy.signal.convolve can convolve a (200, 200) array with a (50, 50) array in a few milliseconds, this code takes a few seconds.
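
For reference, a minimal timing sketch for that comparison (exact numbers will of course vary by machine):

import time
import numpy as np
from scipy.signal import convolve

s = np.random.rand(200, 200)
k = np.random.rand(50, 50)

t0 = time.perf_counter()
convolve(s, k)        # a few milliseconds
t1 = time.perf_counter()
py_nd_convolve(s, k)  # a few seconds
t2 = time.perf_counter()
print(f'scipy: {t1 - t0:.3f} s, pure Python: {t2 - t1:.3f} s')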

I am not interested in making this code as fast as possible, because for that I could simply use the built-in functions. I also know that I could likely perform this operation faster by using an FFT approach.
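
For completeness, here is a minimal sketch of that FFT approach for 'full' mode, relying on the fact that convolution in real space is pointwise multiplication in Fourier space (the helper name fft_convolve_full is mine):

import numpy as np

def fft_convolve_full(a, b):
    # zero-pad both inputs to the full output shape, transform,
    # multiply pointwise, and transform back
    shape = tuple(np.array(a.shape) + np.array(b.shape) - 1)
    fa = np.fft.fftn(a, s=shape)
    fb = np.fft.fftn(b, s=shape)
    return np.fft.ifftn(fa * fb).real  # real inputs give a real result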

Question

Rather, I am curious whether there are any glaring performance issues in the code that can easily be fixed, so that I can keep a similar level of readability and logical flow while gaining performance.

For example, am I making any egregious NumPy mistakes, such as copying arrays unnecessarily or misusing memory in some particularly bad way?
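
For what it's worth, hot spots like the ones listed below can be found with a quick run of the standard-library profiler, e.g.:

import cProfile
import numpy as np

s = np.random.rand(50, 50)
k = np.random.rand(10, 10)
cProfile.run('py_nd_convolve(s, k)', sort='cumulative')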

  • I've noticed that I could flip the k array outside of the loop, but it doesn't look like this would save much time (see the sketch after this list).
  • It looks like a big chunk of time is spent on defining s_slicer and k_slicer.
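
Here is a sketch of what hoisting the flip out of the main loop of py_nd_convolve would look like (the index remapping is mine, so treat it as lightly tested): with k_rev = np.flip(k) computed once, the reversed kernel window at each output index becomes a plain forward slice of k_rev.

    k_rev = np.flip(k)  # flip once, before the loop
    for ind, _ in np.ndenumerate(out):
        ind_arr = np.array(ind)
        i_min = np.maximum(zero_arr, ind_arr - n_k + 1)
        i_max = np.minimum(ind_arr, n_s - 1)
        s_slicer = tuple(slice(i_min[d], i_max[d] + 1) for d in dim_list)
        # the same window as np.flip(k[k_slicer]) in the original,
        # read as a forward slice of the pre-flipped kernel
        k_slicer = tuple(slice(n_k[d] - 1 - ind[d] + i_min[d],
                               n_k[d] - ind[d] + i_max[d]) for d in dim_list)
        out[ind] = (s[s_slicer] * k_rev[k_slicer]).sum()

Using generator expressions inside tuple(...) rather than list comprehensions also avoids building a throwaway list for every output element, although constructing the slices per element would presumably still dominate.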