I am coding an MNIST digit recognition neural network. I thought I was finished, but when I run the training program, the accuracy after each epoch stays flat. I use MSE as my cost function and tanh(x) as my activation function. The learning rate is currently set to 0.1.

Here is the accuracy for the first few epochs, with the first value measured before any training:

8.35

9.8

9.8

9.8

9.8

9.8

My functions are these:

tanh(z): Takes a vector as input and outputs the activated vector

tanhDerivative(z): Takes a vector as input and calculates the gradient vector

feedforward(input, stop): calculates the output. The stop argument lets it stop before the last layer, so it can return the activations of any layer

feedforward2(input, stop): calculates the output of the network, but without the activation function applied (i.e. the raw pre-activation values)

MSE(input, desiredOutput): calculates the MSE

transformer(label): takes a label such as 0 and outputs [[1],[-1],[-1],[-1],[-1],[-1],[-1],[-1],[-1],[-1]]

The following 2 functions are where I suspect the error lies:

    #The actual training. Backpropagation is a separate method below
    def train(self, dataLocation, learnRate, batchSize=100):
        self.bias_updates = self.bias_templates #This will contain the updates required for the biases
        self.weight_updates = self.weight_templates #This will contain the updates required for the weights
        file = np.loadtxt(dataLocation, delimiter=",", dtype="float128")
        count = 0
        print("Starting training")
        for row in file:
            count += 1
            data = []
            for item in row:
                data.append([item/255]) #This takes an array like [1,2,3,4] and makes it [[1],[2],[3],[4]] which is necessary for this program
            desired = self.Transformer(data.pop(0)) #Look at transformer method to understand
            self.backpropagation(data, desired)
            if count % 100 == 0:
                for n in range(0,len(self.bias_updates)):
                    self.bias_updates[n] *= learnRate
                    self.weight_updates[n] *= learnRate
                    self.biases[n] -= self.bias_updates[n]
                    self.weights[n] -= self.weight_updates[n]
                self.bias_updates = self.bias_templates
                self.weight_updates = self.weight_templates
                print(count//100)

    #The heart of the training algorithm. BACKPROPAGATION
    #NOTE: Learning rate is only applied in train()
    def backpropagation(self, input, desiredOutput):
        SigmoidLastLayerActivations = np.array(self.feedforward(input, self.size-1))
        LastLayerActivation = np.array(self.feedforward2(input, self.size-1))
        δ = 2 * (SigmoidLastLayerActivations-desiredOutput) * self.tanhDerivative(LastLayerActivation) #This first value of the delta is just the standard ∂C/∂z(L)
        self.bias_updates[-1] += δ #The update for the biases in the last layer
        self.weight_updates[-1] += np.matmul(δ, np.transpose(self.feedforward(input, self.size-2))) #The update for the weights in the last layer

        for i in range(len(self.weight_templates)-2 ,-1 , -1):
           requiredWeights = np.transpose(np.array(self.weights[i+1])) #This is the required weight matrix from the formulas
           LayerActivations = np.array(self.feedforward2(input, i+1)) #This is the z thing from the formulas
           SigmoidLayerActivations = np.transpose(np.array(self.feedforward(input,i))) #Look at formula
           
           δ = np.matmul(requiredWeights, δ) * self.tanhDerivative(LayerActivations)
           self.bias_updates[i] += δ
           otherVariable = np.matmul(δ, SigmoidLayerActivations) #This variable contains the update required for this layer's weights
           self.weight_updates[i] += otherVariable

I apologise if this looks confusing. Any help as to why the accuracy always converges to 9.8 would be much appreciated. If any other function is needed to find the error, please ask.

The reason some of the variable names contain "sigmoid" is that I originally used the sigmoid activation function, but changed to tanh to see whether it made any difference.

1 Answer

9.8% is a huge clue: it's almost exactly the proportion of "0" digits in the MNIST test set. It very often means the model is effectively always predicting 0, or that your training labels are being turned into 0.
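
To see where that number comes from: the MNIST test set contains 980 images of the digit "0" out of 10,000, so a model that always predicts "0" scores exactly:

```python
# MNIST test set: 980 of the 10,000 images are labelled "0"
zeros, total = 980, 10_000
accuracy = 100 * zeros / total
print(f"{accuracy:.2f}%")  # 9.80%
```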

This snippet in your code:

for item in row:
    data.append([item/255])
desired = self.Transformer(data.pop(0))

...divides everything by 255, including the label in row[0]. Thus, data.pop(0) is not 7, it’s [7/255] (a nested list containing a small float like 0.027...).

If Transformer() does anything like int(label) or uses that value as an index, it will become 0 almost every time, so your desired output becomes “class 0” for every training example, and the network learns to output 0 for everything.
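
You can check this collapse directly: once a label 0..9 has been divided by 255, truncating it with int() always gives 0:

```python
# every digit label, after the accidental /255, truncates to 0
for label in range(10):
    scaled = label / 255        # e.g. 7 -> 0.0274...
    assert int(scaled) == 0     # so every example is treated as class 0
```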

Don’t normalize the label, and don’t wrap it in a list. A clean version looks like:

row = row.astype(np.float32)

label = int(row[0])                  # keep as 0..9
x = (row[1:] / 255.0).reshape(-1, 1) # pixels only

desired = self.Transformer(label)
self.backpropagation(x, desired)

If you’re using tanh everywhere and your targets are -1/+1, also consider mapping inputs to [-1, 1] instead of [0, 1]:

x = (row[1:] / 255.0) * 2.0 - 1.0
x = x.reshape(-1, 1)
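
With the label kept as a plain 0..9 integer, the target-building function can also be simplified to match the -1/+1 targets you describe. A minimal sketch (transformer_tanh is a hypothetical name, not from your code) could be:

```python
import numpy as np

def transformer_tanh(label):
    # hypothetical helper: column vector of -1s with +1 at the label's index,
    # matching tanh's output range of (-1, 1)
    target = -np.ones((10, 1))
    target[int(label)] = 1.0
    return target

# transformer_tanh(0) gives [[1], [-1], ..., [-1]], as in your Transformer
```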

3 Comments

My function, I admit it might be quite inefficient, is:

Sorry, I forgot to add the code. It is:

    def Transformer(self, label):
        desiredOuput = [[0],[0],[0],[0],[0],[0],[0],[0],[0],[0]]
        desiredOuput[int(label[0]*255)] = [1]
        return desiredOuput

Also I changed back to sigmoid, as I only changed to tanh(z) to see whether it made any difference.
