
I have recently started studying neural networks and thought I would write my own neural network before using libraries like TensorFlow or PyTorch, so that I deeply understand what happens inside the network. I developed the code based on the math of neural networks. Now there is some strange behaviour in the network. First of all, the network sometimes performs really well: with the iris dataset from scikit-learn and one small configuration I was able to get the network to 99.3% accuracy. But sometimes it performs very poorly. Also, in one problem I noticed that using the ReLU activation function for the first layer and sigmoid for all the remaining layers gave almost 63% accuracy, but when I made the first layer sigmoid, the second layer ReLU, and all the rest sigmoid, accuracy went up to 93%. The main problem now is regression, which doesn't seem to work at all: the model seems to output the same thing for every input.

Here's how the network works. It is a simple class for a single layer. When we declare a layer, it takes 4 parameters: number of inputs, number of neurons, the activation function, and the derivative of the activation function. We can then run the layer forward with the .forward(inputs) method. For backpropagation there is a backward() method which takes d(Loss)/d(activation outputs) and the learning rate. It also returns d(Loss)/d(activation outputs of the previous layer), which can be fed into the previous layer to continue backpropagating. Also, since the derivative of softmax is a little complex, I cheated a little and assumed that softmax, if used at all, will only appear in the last layer; for a layer with softmax activation you don't pass d(Loss)/d(activation outputs), just the learning rate and the true one-hot targets via the outPuts argument, and the layer computes dz = outputs - targets directly (because of this you can ignore the softmax_diff function, it is unused). With all that context, here is the code:

import numpy as np

def ReLU(z):
  return np.maximum(0,z)
def ReLU_diff(a):
  return (a>0).astype(int)
def sigmoid(z):
  z = np.clip(z, -500, 500)
  return 1 / (1 + np.exp(-z))
def sigmoid_diff(a):
  return a * (1 - a)
def tanh(z):
  return np.tanh(z)
def tanh_diff(a):
  return 1 - a ** 2
def softmax(z):
  z = z - np.max(z, axis=1, keepdims=True)
  return np.exp(z) / np.sum(np.exp(z), axis=1, keepdims=True)
def softmax_diff(a):
  return a * (1 - a)
def linear(z):
  return z
def linear_diff(a):
  return np.ones(a.shape)



class Layer():
  def __init__(self, numberOfInputs,numberOfNeurons, activationFunction,activationFunctionDiff):
    self.numberOfInputs = numberOfInputs
    self.numberOfNeurons = numberOfNeurons
    self.activationFunction = activationFunction
    self.activationFunctionDiff = activationFunctionDiff
    self.weights = np.random.randn(numberOfNeurons, numberOfInputs) * np.sqrt(2.0/numberOfInputs)  # He initialization
    self.biases = np.zeros((numberOfNeurons,1))  # Initialize biases to zero
  def forward(self, inputs):
    self.inputs = inputs
    self.z = np.dot(inputs, self.weights.T) + self.biases.T
    self.outputs = self.activationFunction(self.z)
    return self.outputs
  def backward(self, dl_da, learning_rate, outPuts=None):
    if self.activationFunction == softmax:
      # softmax output layer: outPuts holds the one-hot targets, so dz = a - y directly
      dz = self.outputs - outPuts
    else:
      da_dz = self.activationFunctionDiff(self.outputs)
      dz = dl_da * da_dz
    # gradient w.r.t. the previous layer's activations, returned for chaining
    dl_da_prev = np.dot(dz, self.weights)
    self.backParam = dl_da_prev
    # accumulate the weight gradient over the batch, then average
    dw = np.zeros((self.numberOfNeurons, self.numberOfInputs))
    for i in range(dz.shape[0]):
      dws = np.dot(dz[i].reshape(self.numberOfNeurons,1), self.inputs[i].reshape(1,self.numberOfInputs))
      dw += dws
    dw = dw / dz.shape[0]
    db = np.sum(dz, axis=0, keepdims=True).T / dz.shape[0]
    self.weights = self.weights - learning_rate * dw
    self.biases = self.biases - learning_rate * db
    return dl_da_prev
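
To make the softmax special case concrete, here is a small sketch of how I would call it for a 3-class output layer (hidden_out and y_onehot are just placeholder names for this example, not part of the code above):

# hypothetical 3-class sketch: a softmax output layer's backward() takes the
# learning rate plus the one-hot targets via outPuts, instead of dL/da
out_layer = Layer(8, 3, softmax, softmax_diff)
probs = out_layer.forward(hidden_out)                         # hidden_out: (batch, 8) activations from the previous layer
grad_prev = out_layer.backward(None, 0.01, outPuts=y_onehot)  # y_onehot: (batch, 3) one-hot labels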

Now here is how I use this code for regression:

from sklearn.datasets import fetch_california_housing
data = fetch_california_housing()
X, y = data.data, data.target
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
layer1 = Layer(8,8,ReLU,ReLU_diff)
layer2 = Layer(8,1,linear,linear_diff)
y_t = np.array(y_train).reshape(len(y_train),1)
for i in range(1000):
  outputs = layer1.forward(X_train)
  outputs = layer2.forward(outputs)
  dl = -2*(y_t-outputs)
  backParam = layer2.backward(dl,0.001)
  backParam = layer1.backward(backParam,0.001)

So the final output is in layer2.outputs, which is the same as outputs. But first of all the output is badly wrong, and second it is the same for every input. Why is this happening?
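
For example, a quick check like this after the training loop is how I see the problem:

# check a few different test rows after training
preds = layer2.forward(layer1.forward(X_test[:5]))
print(preds)   # every row comes out as (almost) the same value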

I have been thinking about this for a long time and have tried the help of ChatGPT and other models, but I still don't see where the error is. Thanks for the help!

  • One issue is that the different features have very different ranges, so the data needs to be normalized. PS: there is probably no need to go to 1000 epochs. – rehaqds, Commented Jan 4 at 17:03
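
A minimal sketch of the normalization suggested above, assuming sklearn's StandardScaler (applied right after the train/test split, before building the layers and running the training loop):

from sklearn.preprocessing import StandardScaler

# scale each feature to zero mean / unit variance; fit on the training split only
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)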
