I am implementing linear regression in Python, and I think I am doing something wrong while converting a matrix to a NumPy array, but I cannot figure out what. Any help would be appreciated.
I am loading data from a CSV file that has 100 columns. y is the last column. I am not using columns 1 and 2 for the regression.
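import numpy as np
from numpy import linalg, asmatrix as mat

communities = np.genfromtxt("communities.csv", delimiter=",", dtype=float)
xdata = communities[1:, 2:99]  # skip the first row and the first two columns
x = np.array([np.concatenate((v, [1])) for v in xdata])  # append a constant 1 to each row
y = communities[1:, 99]  # last column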
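The list comprehension that appends a 1 to every row can also be written without a Python loop. Here is a small sketch on a made-up array (the 3x4 `data` array only stands in for `communities`; its values are illustrative, not taken from the CSV):

```python
import numpy as np

# Stand-in for the loaded CSV: 3 rows, 4 columns, last column is the target
data = np.arange(12, dtype=float).reshape(3, 4)

xdata = data[:, 1:3]   # feature columns 1 and 2 (column 0 is skipped)
y = data[:, 3]         # last column as the target

# Append a column of ones so the last weight acts as the intercept
x = np.hstack([xdata, np.ones((xdata.shape[0], 1))])

print(x.shape)  # (3, 3)
print(y.shape)  # (3,)
```

The slicing bounds here are chosen for the toy array; for the real file they would stay as in the code above.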
Function definition:
def standRegress(xArr, yArr):
    xMat = mat(xArr); yMat = mat(yArr).T
    xTx = xMat.T * xMat
    # The normal equations cannot be solved if xTx is singular
    if linalg.det(xTx) == 0.0:
        print("singular matrix")
        return
    ws = xTx.I * (xMat.T * yMat)
    return ws
Calling the function:
w = standRegress(x, y)
xMat = mat(x)    # shape (1994L, 98L)
yMat = mat(y)    # shape (1L, 1994L)
yhat = xMat * w  # shape (1994L, 1L)
Next, I am trying to calculate the RMSE, and this is where I am having a problem:
yMatT = yMat.T #shape(1994L, 1L)
err = yhat - yMatT #shape(1994L, 1L)
error = np.array(err)
total_error = np.dot(error,error)
rmse = np.sqrt(total_error/len(p))
I get an error when computing the dot product, so I am not able to calculate the RMSE. I would appreciate it if someone could help me find my mistake.
Error:
---> 11 np.dot(error,error)
12 #test = (error)**2
13 #test.sum()/len(y)
ValueError: matrices are not aligned
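The shapes noted in the comments suggest why the dot product fails: `error` is a 2-D array of shape (1994, 1), and `np.dot` on two (n, 1) arrays needs the inner dimensions to match (1 versus n). One possible fix, sketched below on made-up numbers rather than the communities data, is to flatten the error to 1-D before taking the dot product. The function name `rmse_from_column` is hypothetical, and dividing by the number of samples assumes that `len(p)` was meant to be the sample count:

```python
import numpy as np

def rmse_from_column(yhat, y_true):
    """RMSE between predictions and targets of matching size."""
    # Convert to plain arrays and flatten, so (n, 1) matrices become 1-D vectors of length n
    err = np.asarray(yhat).ravel() - np.asarray(y_true).ravel()
    # For 1-D vectors np.dot is the inner product, i.e. the sum of squared errors
    total_error = np.dot(err, err)
    return np.sqrt(total_error / err.size)

# Toy data (assumed for illustration): column vectors shaped like yhat and yMat.T
yhat = np.array([[1.0], [2.0], [3.0]])
y_true = np.array([[1.0], [2.0], [5.0]])
print(rmse_from_column(yhat, y_true))
```

An equivalent route that avoids the dot product altogether would be `np.sqrt(np.mean(err ** 2))` on the flattened error.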
numpy, just wondering whether there is any particular reason you're not using linalg?
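On the linalg suggestion: one option is `np.linalg.lstsq`, which solves the least-squares problem without explicitly inverting xTx. A minimal sketch on synthetic data follows (the data and the name `fit_least_squares` are assumptions for illustration, not the communities data set):

```python
import numpy as np

def fit_least_squares(x, y):
    """Return the weights w that minimize the squared error of x.dot(w) against y."""
    # rcond=None selects NumPy's default cutoff for small singular values
    w, residuals, rank, singular_values = np.linalg.lstsq(x, y, rcond=None)
    return w

# Synthetic data: y = 2*a + 3*b + 1, with a trailing column of ones for the intercept
rng = np.random.RandomState(0)
features = rng.rand(50, 2)
x = np.hstack([features, np.ones((50, 1))])
y = 2.0 * features[:, 0] + 3.0 * features[:, 1] + 1.0

w = fit_least_squares(x, y)
print(w)  # close to [2, 3, 1]
```

Because `lstsq` works on plain arrays, `x.dot(w)` then gives 1-D predictions, which also sidesteps the matrix-versus-array shape issue when computing the RMSE.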