
I have written the same code in Python (NumPy) and in Matlab, trying to use the same structure and follow the same procedure in both languages. My problem is that the Python version runs very slowly, while the Matlab version runs fast.

How can I improve my Python code? It is the version I want to keep working on.

In the code I create an MxM matrix (M can vary between 100 and 1000) and fill it 16 times, each time adding the current value to the previous value of M[i,j]. The operations involved are simple, just some additions and divisions, so I wonder why Matlab performs so much better than Python.

Python:

import numpy as np
import matplotlib.pyplot as plt

from datetime import datetime
startTime = datetime.now()

bb = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
bb = ['{0:04b}'.format(x) for x in bb]


step_prob=0.01

step = int(1/step_prob)

out_t = np.zeros((step,step), dtype=float)
out_f = np.zeros((step,step), dtype=float)
s = np.zeros((step,step), dtype=float)
improve = np.empty((step,step))

for ii in range(1, step+1):
    print(ii)
    pti=ii*step_prob
    for  jj in range(1, step+1):
        pfi=jj*step_prob
        improve[ii-1,jj-1]=0
        si=(1-pfi+pti)/2
        for k in range(len(bb)):
            f1 = int(bb[k][0])
            f2 = int(bb[k][1])  
            f3 = int(bb[k][2])
            f4 = int(bb[k][3])
            for i in range(1, step+1):
                for j in range(1, step+1):
                    ptj=i*step_prob
                    pfj=j*step_prob
                    out_t[i-1,j-1] = f1*pti*ptj+f2*pti*(1-ptj)+f3*(1-pti)*ptj+f4*(1-pti)*(1-ptj)
                    out_f[i-1,j-1] = f1*pfi*pfj+f2*pfi*(1-pfj)+f3*(1-pfi)*pfj+f4*(1-pfi)*(1-pfj)
                    sj=(1-pfj+ptj)/2;
                    s[i-1,j-1] = ((1-out_f[i-1,j-1]+out_t[i-1,j-1])/2)-max(si,sj);

#         temp=s*(s>0)*tril(ones(size(s)));
            temp = s*(s>0)*np.tril(np.ones((len(s),len(s))).astype(int))
            improve[ii-1,jj-1]=improve[ii-1,jj-1]+sum(sum(temp))

im = plt.imshow(improve*np.tril(np.ones((len(improve),len(improve))).astype(int)),interpolation='bilinear', aspect='auto') # s*[s>0]
# im = plt.imshow(improve, interpolation='bilinear', aspect='auto') # s*[s>0]
plt.colorbar(im, orientation='vertical')
plt.show()

Matlab:

close all
clear all
clc


bb = de2bi([0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15],4,'left-msb');
% bb = de2bi([1],4);

step_prob=0.01;
for ii=1:1*(1/step_prob)
    ii
    pti=ii*step_prob;
    for jj=1:1*(1/step_prob)
        pfi=jj*step_prob;
        improve(ii,jj)=0;
        si=(1-pfi+pti)/2;
        for k=1:size(bb,1)
            f1=bb(k,1);
            f2=bb(k,2);
            f3=bb(k,3);
            f4=bb(k,4);
            for i=1:1*(1/step_prob)
                for j=1:1*(1/step_prob)
                    ptj=i*step_prob;
                    pfj=j*step_prob;
                    out_t(i,j)=f1*pti*ptj+f2*pti*(1-ptj)+f3*(1-pti)*ptj+f4*(1-pti)*(1-ptj);
                    out_f(i,j)=f1*pfi*pfj+f2*pfi*(1-pfj)+f3*(1-pfi)*pfj+f4*(1-pfi)*(1-pfj);
                    sj=(1-pfj+ptj)/2;
                    s(i,j)=((1-out_f(i,j)+out_t(i,j))/2)-max(si,sj);
                end
            end
            temp=s.*(s>0).*tril(ones(size(s)));
            improve(ii,jj)=improve(ii,jj)+sum(sum(temp));
            clear f1 f2 f3 f4
        end
    end
end
scale=step_prob:step_prob:1;
imagesc(scale,scale,improve.*tril(ones(size(improve))))
xlabel('Pfj')
ylabel('Ptj')
axis xy

  • Your title should state what your code is for, not that it is slow! What is its purpose?
    – Adam
    Commented Apr 28, 2014 at 22:00
  • To make life easier for reviewers, please add sufficient context to your question. The more you tell us about what your code does and what the purpose of doing that is, the easier it will be for reviewers to help you.
    Commented Apr 28, 2014 at 23:03
  • Coincidentally, I have the exact same problem, except the speeds of the two languages are switched for me...
    Commented Jul 14, 2015 at 8:57

2 Answers


You use very short variable names, so it's hard to follow what your code is doing.

For speed in NumPy, you need to use vector operations. So instead of writing:

for ii in range(step):
    out[ii] = ii * x + b

You should write:

out = numpy.arange(step)*x + b

Individually accessing the elements of a NumPy array is actually quite slow. You need to write your code so that everything is done with vector operations.
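
To get a feel for the difference, here is a small self-contained timing sketch you could run (step, x and b are just placeholder values; the exact timings depend on your machine):

import timeit
import numpy as np

step = 1000
x, b = 0.5, 1.0

def with_loop():
    out = np.empty(step)
    for ii in range(step):
        out[ii] = ii * x + b     # one Python-level assignment per element
    return out

def vectorized():
    return np.arange(step) * x + b   # a single NumPy call for all elements

# The vectorized version is typically one to two orders of magnitude faster.
print(timeit.timeit(with_loop, number=1000))
print(timeit.timeit(vectorized, number=1000))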

Here is my rewrite of your inner loop to be vectorized. You could vectorize the outer loop as well, but this should make the biggest difference.

        f1 = int(bb[k][0])
        f2 = int(bb[k][1])  
        f3 = int(bb[k][2])
        f4 = int(bb[k][3])

        # ptj as a column vector and pfj as a row vector; broadcasting builds the
        # full grid when they are combined below, replacing the two inner loops
        ptj = np.arange(step_prob, 1.0 + step_prob, step_prob)[:, None]
        pfj = np.arange(step_prob, 1.0 + step_prob, step_prob)[None, :]

        out_t = f1*pti*ptj+f2*pti*(1-ptj)+f3*(1-pti)*ptj+f4*(1-pti)*(1-ptj)
        out_f = f1*pfi*pfj+f2*pfi*(1-pfj)+f3*(1-pfi)*pfj+f4*(1-pfi)*(1-pfj)
        sj = (1 - pfj + ptj) / 2

        the_max = np.maximum(sj, [si])   # element-wise max of sj against the scalar si
        s = ((1-out_f+out_t)/2)-the_max

        temp = s*(s>0)*np.tril(np.ones((len(s),len(s))).astype(int))
        improve[ii-1,jj-1]=improve[ii-1,jj-1]+temp.sum()

  • it's kinda impossible for me to do this, since I need to compute some "expected value" individually for different thresholds. Anyway, I don't understand why this way of coding works fine in Matlab but not in NumPy.
    – Am1rr3zA
    Commented Apr 30, 2014 at 0:41
  • @Am1rr3zA, Python and Matlab are different languages. In Python, manual loops like you've written are rather slow. They used to be slow in Matlab too, but the implementation was improved. Furthermore, I doubt it's actually impossible to vectorize, but since I have no idea what you are doing in this code, it's hard to give you suggestions.
    Commented Apr 30, 2014 at 0:52
  • To explain my code in simple words: I am trying to fill an MxM matrix (improve). Each value of the matrix is computed as follows: improve[i,j] is the sum, over the 16 elements of the bb array, of my formula (out_t(i,j) and out_f(i,j)), and the formula computes the expected value of pti and pfi (outer loop) over the whole range of ptj and pfj (inner loop).
    – Am1rr3zA
    Commented Apr 30, 2014 at 1:02
  • @Am1rr3zA, the third parameter to linspace is a count, not a step size. If you want to use linspace, it should be np.linspace(0, 1, steps). But that's not quite what you were doing, because your code skipped zero. You do need the [:, None] to make it work. I may well have not gotten it quite right, but hopefully it helps you see what you could do to speed up the Python.
    Commented Apr 30, 2014 at 2:59
  • @Am1rr3zA, as I noted, I may not have gotten it quite correct, but the general principle should work. If you fix my arange to refer to step_prob instead of step, I get the correct results, at least for the first iteration.
    Commented May 1, 2014 at 21:23

My observations on the Python implementation (without regard to improving the algorithm; that comes later):

  1. You are doing a LOT of unnecessary type conversions in Python that you aren't doing in MATLAB. Type conversions are always slow, so this is going to hurt your performance a lot.
  2. You shouldn't do for k in range(len(bb)):; always iterate over the sequence directly. In this case you can use unpacking to get the values without needing to index.
  3. You are iterating over range(1, foo+1) and then subtracting one to get back to Python's 0-based indexing. This is an unnecessary mathematical operation; it is better to do range(foo), or in your case enumerate(range(1, foo+1)). Better yet, pre-compute your values, then enumerate over those.
  4. numpy.unpackbits is the equivalent of de2bi and will be much faster (see the short sketch after this list).
  5. Python has in-place operations, which will be faster. So a += 1 instead of a = a + 1.
  6. NumPy sums work on flattened arrays by default. Further, summing is a method of NumPy arrays, which is simpler and probably faster.
  7. Numpy arrays are floats by default if created using ones or zeros, but there is a dtype argument that can set it to whatever you want.
  8. Numpy has a ones_like and zeros_like to create an array of the same shape as another. By default it also creates one with the same dtype, but you can override that.
  9. PEP 8 is the recommended style for Python code.
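
To make points 4 to 8 concrete, here is a short, self-contained sketch (the array shapes and values are only for illustration):

import numpy as np

# Point 4: the NumPy equivalent of MATLAB's de2bi(0:15, 4, 'left-msb').
vals = np.arange(16, dtype=np.uint8)
bits = np.unpackbits(vals[:, None], axis=1)[:, -4:]   # shape (16, 4), MSB first

# Point 5: in-place addition avoids creating a temporary array.
acc = np.zeros((16, 4))
acc += bits

# Point 6: summing is an array method and works on the flattened array by default.
total = acc.sum()              # instead of sum(sum(acc))

# Point 7: choose the dtype explicitly when creating an array.
mask = np.ones((16, 4), dtype=int)

# Point 8: create an array with the same shape (and dtype) as another.
zeros_same_shape = np.zeros_like(acc)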

Now, in terms of improving the algorithm:

  1. The arrays of si and sj values are not very big (on the order of tens of megabytes, tops). You can vectorize those and re-use the values. In fact, they are identical, so you only need one.
  2. Not only do you never use any values of out_t and out_f other than the current one, you also overwrite them repeatedly. So these are better off as scalars.
  3. You can vectorize many of the values for the two innermost loops.
  4. If you make the third loop (over bb) the outermost loop, you can vectorize out_t and out_f.

So here is my partially-vectorized version:

import numpy as np

# All 16 four-bit patterns, most significant bit first (the unpackbits
# equivalent of de2bi from point 4 above).
bb0 = np.arange(16, dtype='uint8')
bb = np.unpackbits(bb0[None], axis=0)[-4:, :].T

step_prob = 0.01
step = int(1/step_prob)

# Probability grid, and the matrix of (1 - pf + pt)/2 values shared by si and sj.
pxx = np.linspace(step_prob, 1, step)
pxxt = pxx[:, None]

sx = (1-pxx+pxxt)/2

improve = np.zeros((step, step))
for f4, f3, f2, f1 in bb:
    out_x = f1*pxx*pxxt + f2*pxx*(1-pxxt) + f3*(1-pxx)*pxxt + f4*(1-pxx)*(1-pxxt)
    for ii, (out_t, sxi) in enumerate(zip(out_x, sx)):
        for jj, (out_f, si) in enumerate(zip(out_x, sxi)):
            # same formula as the original: ((1 - out_f + out_t)/2) - max(si, sj)
            s = (1 - out_f + out_t[:, None])/2 - np.maximum(si, sx)
            temp = s*(s > 0)*np.tril(np.ones_like(s))
            improve[ii, jj] += temp.sum()   # accumulate over the 16 bit patterns

It is orders of magnitude faster than your version.
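
If you want a quick, self-contained sanity check of the bit-table construction, the unpackbits lines at the top produce exactly the same 16x4 table as the string formatting in your original code:

import numpy as np

bb0 = np.arange(16, dtype='uint8')
bb_bits = np.unpackbits(bb0[None], axis=0)[-4:, :].T

bb_strings = ['{0:04b}'.format(x) for x in range(16)]
bb_from_strings = np.array([[int(c) for c in word] for word in bb_strings])

print(np.array_equal(bb_bits, bb_from_strings))   # True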

  • Your points are all on point.
    – Am1rr3zA
    Commented May 9, 2016 at 14:35
