While working on a simple machine learning project, I decided to rework a piece of list-based logic using numpy arrays: I just changed the numerical increments in the loops to numpy arithmetic. However, the numpy implementation gives decision boundaries far from those of the loop implementation, behaves chaotically, and doesn't converge.
A complete working example can be found on Google Colab here: https://colab.research.google.com/drive/1zLy0oTidhm2lrwkASgT1lpsI9yfrBPvx?usp=sharing
It really bothers me. If it's a numpy precision error, it might be far less obvious on larger projects and silently skew the results. If it's a conceptual error, I can't see it. This question probably looks naive, but I genuinely want to learn what went wrong.
Consider the following list implementation of Perceptron training. The logic behind it is explained further down, but most importantly, note that the in-loop numerical increments of W and b are nothing special.
def perceptronStep(X, y, W, b, learn_rate=0.01):
    diff = []
    for i in range(len(X)):
        y_hat = prediction(X[i], W, b)[0]
        # diff = 0 when the prediction is correct;
        # diff = 1 or diff = -1 gives the direction in which W and b change
        diff.append(y[i] - y_hat)
        dif = diff[i]
        W[0] += dif * X[i][0] * learn_rate
        W[1] += dif * X[i][1] * learn_rate
        b += dif * learn_rate
    return W, b, diff
As I see it, the following numpy implementation should perfectly recreate how perceptronStep behaves.
def np_perceptronStep(X, y, W, b, learn_rate=0.01):
    Y_hat = np.squeeze(prediction(X, W, b))
    diff = y - Y_hat
    sumX = np.sum(X * diff[..., np.newaxis], axis=0, keepdims=True)
    W += learn_rate * sumX.T        # W_np
    b += learn_rate * np.sum(diff)  # b_np
    return W, b, diff
Yet it doesn't. W_np and b_np differ from W and b: by about 1e-16 from epoch 0, and by values on the order of 10 by the end of training. b_np jumps all over the place, while b quickly settles down. Chaotic.
The prediction is obtained via a step function.
def prediction(X, W, b):
    Y_hat = np.matmul(X, W) + b
    # step function
    Y_hat[Y_hat >= 0] = 1
    Y_hat[Y_hat < 0] = 0
    return Y_hat
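For clarity, the shapes in the two call patterns look like this (a sketch based on my reading of the code above, assuming X has shape (n, 2), W has shape (2, 1) and b has shape (1,)):

import numpy as np

X = np.random.rand(20, 2)        # n = 20 points, 2 features (made-up data)
W = np.array([[1.0], [1.0]])     # column vector, hence W[0] and W[1] in the loop
b = np.array([0.0])

single = prediction(X[0], W, b)  # shape (1,): the loop version takes [0] to get a scalar
batch = prediction(X, W, b)      # shape (20, 1): np_perceptronStep squeezes it to (20,)
print(single.shape, batch.shape)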
Context
I'm going through Intro to ML with PyTorch on Udacity. A crude trick is introduced there before gradient descent is taught. Consider a Perceptron holding a linear classifier Wx + b, in this case w1x1 + w2x2 + b, where the two possible classes are 0 and 1. Taking the difference between the true and the predicted class for a point x' to be c, the Perceptron is updated like this: W + acx' and b + ac, where a is the learning rate. As simple as it gets.
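To make that concrete, here is one update of the rule written out with made-up numbers (a toy sketch, not from the course):

a = 0.01          # learning rate
x = (1.0, 2.0)    # a misclassified point (hypothetical values)
c = 1 - 0         # true class 1, predicted class 0
W = [3.0, 4.0]
b = -10.0

W[0] += a * c * x[0]   # W + a*c*x'
W[1] += a * c * x[1]
b += a * c             # b + a*c
print(W, b)            # [3.01, 4.02] -9.99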
I've tried a manual, piece-by-piece (ship-of-Theseus) transition from loops to numpy inside np_perceptronStep, hoping that a single change was causing the problem. I've tried different combinations of computing diff, W and b, either in a loop or with numpy. I've also tracked how the differences between the W and b of the two implementations change over training. They changed, but np_perceptronStep never came close to the perceptronStep results.
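Roughly, the comparison looked like this (a minimal sketch, not the exact code from the Colab notebook; the toy data, seed and epoch count are made up, and it assumes the three functions above are defined):

import numpy as np

np.random.seed(0)
X = np.random.rand(20, 2)                  # toy data: 20 points, 2 features
y = (X[:, 0] + X[:, 1] > 1).astype(float)  # arbitrary linearly separable labels

# identical starting parameters for both versions
W_loop, b_loop = np.array([[1.0], [1.0]]), np.array([0.0])
W_np, b_np = W_loop.copy(), b_loop.copy()

for epoch in range(25):
    W_loop, b_loop, _ = perceptronStep(X, y, W_loop, b_loop)
    W_np, b_np, _ = np_perceptronStep(X, y, W_np, b_np)
    # how far the two versions have drifted apart after this epoch
    print(epoch, np.max(np.abs(W_loop - W_np)), np.abs(b_loop - b_np)[0])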
In np_perceptronStep, Y_hat appears to be computed for all points using the W the step started with, whereas in the loop W[0] and W[1] are rewritten on each iteration, so later points are predicted with an already-updated W.