Trying to implement linear regression in Python

Question

I am implementing linear regression in Python. I think I am making a mistake when converting a matrix to a NumPy array, but I cannot figure out what it is. Any help would be appreciated.

I am loading data from a CSV file that has 100 columns. y is the last column, and I am not using columns 1 and 2 for the regression.

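import numpy as np

communities = np.genfromtxt("communities.csv", delimiter=",", dtype=float)
xdata = communities[1:, 2:99]  # skip row 0 and columns 1 and 2
x = np.array([np.concatenate((v, [1])) for v in xdata])  # append a constant 1 to each row
y = communities[1:, 99]  # last column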

Function definition

from numpy import mat, linalg

def standRegress(xArr, yArr):
    xMat = mat(xArr); yMat = mat(yArr).T
    xTx = xMat.T*xMat
    if linalg.det(xTx)==0.0:
        print("singular matrix")
        return
    ws = xTx.I*(xMat.T*yMat)  # normal equations: (X^T X)^-1 X^T y
    return ws

Calling the function

w = standRegress(x,y)
xMat = mat(x) #shape(1994L,98L)
yMat = mat(y) #shape (1L, 1994L)
yhat = xMat*w #shape (1994L, 1L)

Next, I am trying to calculate the RMSE, and this is where I am having a problem.

yMatT = yMat.T #shape(1994L, 1L)
err = yhat - yMatT #shape(1994L, 1L)
error = np.array(err)
total_error = np.dot(error,error)
rmse = np.sqrt(total_error/len(p))

I get an error when computing the dot product, so I cannot calculate the RMSE. I would appreciate it if someone could help me find my mistake.

Error: 
 ---> 11 np.dot(error,error)
 12 #test = (error)**2
 13 #test.sum()/len(y)
 ValueError: matrices are not aligned

Can you edit your question and include the specific error message you're receiving? — Michael0x2a, Commented Oct 31, 2014 at 15:57
as you're using numpy, just wonder why if there is any particular reason you're not using linalg? — Anzel, Commented Oct 31, 2014 at 16:04
@Anzel, did not think of using linalg. Can you please guide how to use that. — nasia jaffri, Commented Oct 31, 2014 at 16:15
@Michael0x2a, I have edited the question. Please have a look now. — nasia jaffri, Commented Oct 31, 2014 at 16:24
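
The linalg route raised in the comments above could look roughly like the sketch below. It uses np.linalg.lstsq in place of the explicit inverse in standRegress, which is a swapped-in technique rather than the asker's method. The data here is synthetic and hypothetical (it is not communities.csv), and the names rng and true_w are introduced only for illustration.

```python
import numpy as np

# Synthetic stand-in for the question's data (hypothetical values).
rng = np.random.RandomState(0)
xdata = rng.rand(50, 3)                       # 50 samples, 3 features
x = np.hstack([xdata, np.ones((50, 1))])      # append a column of ones, as in the question
true_w = np.array([2.0, -1.0, 0.5, 4.0])      # last entry plays the role of the intercept
y = x.dot(true_w)                             # noise-free targets, 1-D

# Least-squares solve without forming or inverting x.T * x.
w, residuals, rank, singular_values = np.linalg.lstsq(x, y, rcond=None)

yhat = x.dot(w)                               # predictions, 1-D array of length 50
rmse = np.sqrt(np.mean((yhat - y) ** 2))
print(w)
print(rmse)
```

Because x and y are plain arrays here, yhat and y are both 1-D, so the shape mismatch from the question does not come up.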

Falko · Accepted Answer · 2014-10-31 17:12:49Z

I'm not quite sure what the last dot is supposed to do, but you can't multiply error with itself this way. dot performs a matrix multiplication, so the dimensions have to align: the number of columns of the first argument must equal the number of rows of the second.

See, e.g., the following example:

import numpy as np
A = np.ones((3, 4))
B = np.ones((3, 4))
print(np.dot(A, B))

This yields the error ValueError: matrices are not aligned.

What is possible, however, is:

print(np.dot(A.T, B))

Output:

[[ 3.  3.  3.  3.]
 [ 3.  3.  3.  3.]
 [ 3.  3.  3.  3.]
 [ 3.  3.  3.  3.]]

In your example, error is just a column vector, but it is stored as a 2D array of shape (1994, 1):

A = np.ones((3, 1))
B = np.ones((3, 1))
print(np.dot(A, B))

Same error.

So you can either transpose one argument - as shown above - or extract one column as a 1D array:

print(np.dot(A[:, 0], B[:, 0]))

Output:

3.0
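
Applied to the RMSE calculation from the question, the second option might look like the following sketch. The arrays are small hypothetical stand-ins for yhat and the transposed y (both stored as columns), and ravel() is used here as one possible way to get a 1D array.

```python
import numpy as np

# Hypothetical stand-ins for yhat and yMat.T from the question: both (n, 1) columns.
yhat = np.array([[1.0], [2.0], [3.0], [4.0]])
y_col = np.array([[1.0], [2.0], [3.0], [2.0]])

err = np.asarray(yhat - y_col)            # still 2-D, shape (4, 1)
err_1d = err.ravel()                      # flattened to 1-D, shape (4,)

total_error = np.dot(err_1d, err_1d)      # sum of squared errors, a scalar
rmse = np.sqrt(total_error / len(err_1d))

# Equivalent without dot: square element-wise and take the mean.
rmse_alt = np.sqrt(np.mean(err_1d ** 2))
print(rmse)
```

Either form gives the same value; the element-wise version avoids any dependence on how the vector is oriented.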

edited Oct 31, 2014 at 17:12

answered Oct 31, 2014 at 16:08


Yes, you are right, but err is supposed to have 1994 rows and only 1 column. I am not sure what I am doing wrong before the dot product.
– nasia jaffri
Commented Oct 31, 2014 at 16:26
@nasiajaffri: Oh, I see. I edited my answer accordingly.
– Falko
Commented Oct 31, 2014 at 17:14
Error: matrices are not aligned
– wwii
Commented Oct 31, 2014 at 18:12
Also - from np.info(np.dot) - ...Raises ------ ValueError If the last dimension of `a` is not the same size as the second-to-last dimension of `b`....
– wwii
Commented Oct 31, 2014 at 18:20
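
The rule quoted from np.info(np.dot) can be checked directly on shapes. The sketch below is only an illustration for 2D inputs: can_dot is a hypothetical helper, not a NumPy function, and col is a stand-in with the same shape as error in the question.

```python
import numpy as np

def can_dot(a, b):
    """Hypothetical helper: True if np.dot(a, b) is shape-compatible for 2-D inputs."""
    return a.shape[-1] == b.shape[-2]

col = np.ones((1994, 1))        # same shape as `error` in the question

print(can_dot(col, col))        # last dim of a is 1, second-to-last dim of b is 1994
print(can_dot(col.T, col))      # last dim of a is 1994, second-to-last dim of b is 1994

sse = np.dot(col.T, col)        # (1, 1994) times (1994, 1) gives shape (1, 1)
print(sse[0, 0])
```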
