
I am supposed to implement gradient descent for linear regression. Here is the implementation:

import numpy as np

class SimpleLinearRegressionModel():

    def __init__(self, x, y, theta, alpha):
        self.x = x
        self.y = y
        self.theta = theta
        self.alpha = alpha

    '''
    Equation for the regression line. 
    input x_i (float) - single input feature
    @return corresponding model output (float)
    '''
    def h(self, x_i):
        return self.theta[0] + x_i * self.theta[1]
    '''
    Loss function measuring mean squared error of the regression line for a given training 
    set and model parameters. 
    @return MSE based on the current parameters (float)
    '''
    def J(self):
        m = len(self.y)
        return (1 / m) * np.sum((self.h(self.x) - self.y) ** 2)

    def get_gradient(self):
        m = len(self.y)
        return np.array([(1 / m) * np.sum(self.h(self.x) - self.y),
                         (1 / m) * np.sum((self.h(self.x) - self.y) * self.x)])

    '''
    Update the model parameters (i.e. the two theta values) for one gradient descent step. 
    '''
    def gradient_descent_step(self):
        return self.theta - self.alpha * self.get_gradient()
    '''
    Run gradient descent to optimize the model parameters.
    @param threshold (float) - run gradient descent until the magnitude of the gradient is 
    below this value. 
    @return a list storing the value of the cost function after every step of gradient descent (float list)
    '''

    def run_gradient_descent(self, threshold=.01):
        cost_values = []
        while np.linalg.norm(self.get_gradient()) > threshold:
            self.theta = self.gradient_descent_step()
            cost_values.append(self.J())
        return cost_values

This works for a small dataset (25 elements), but with a large one (20000 elements) it becomes impossibly slow. How can I optimize this? I've tried to vectorize all the functions, but J() and get_gradient() still seem especially slow. I've also noticed, while debugging with the large dataset, that the error increases as the algorithm runs, which definitely should not be happening.

1 Answer


A few things stand out immediately:

  1. You are evaluating self.h(self.x) several times in every loop iteration (twice in the loop condition's get_gradient(), twice more via gradient_descent_step(), and once in J()); once should be enough. Store and re-use intermediate results, also for self.h(self.x) - self.y etc. You may need to combine J() and gradient_descent_step() for this (and change the ordering, i.e., save the loss before the update instead of afterwards); see the sketch after this list, which folds everything into run_gradient_descent().

  2. Most likely, using np.mean(...) instead of (1 / m) * np.sum(...) is beneficial (at the very least, it simplifies your code).

  3. You could just do self.theta -= self.alpha * self.get_gradient() in the gradient step to avoid having to reassign self.theta outside the method.
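Putting the three points together, a condensed sketch of how this could look (untested; it assumes x and y are 1-D NumPy arrays and theta is a length-2 float array, and it folds J() and get_gradient() into run_gradient_descent()):

import numpy as np

class SimpleLinearRegressionModel():

    def __init__(self, x, y, theta, alpha):
        self.x = x
        self.y = y
        self.theta = theta
        self.alpha = alpha

    def h(self, x_i):
        return self.theta[0] + x_i * self.theta[1]

    def run_gradient_descent(self, threshold=.01):
        cost_values = []
        while True:
            # evaluate the prediction error once per iteration and reuse it
            # for both the loss and the gradient (point 1)
            residuals = self.h(self.x) - self.y
            cost_values.append(np.mean(residuals ** 2))  # point 2
            gradient = np.array([np.mean(residuals),
                                 np.mean(residuals * self.x)])
            if np.linalg.norm(gradient) <= threshold:
                break
            # update the parameters in place inside the method (point 3)
            self.theta -= self.alpha * gradient
        return cost_values

This evaluates self.h(self.x) exactly once per iteration and computes the gradient once instead of twice, which should remove most of the redundant work; note that it records the loss before each update, as described in point 1.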

Your convergence behavior (e.g., whether the loss sometimes increases) will depend a lot on the choice of the learning rate (self.alpha), so play around with its value to see what happens; a quick way to probe it without risking an endless loop is sketched below.
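One rough way to do that probing (the data and the alpha values here are made up purely for illustration; substitute your real x and y) is to take a fixed number of manual steps for each candidate learning rate and watch the loss:

import numpy as np

# synthetic data purely for illustration; replace with your real x and y
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 20000)
y = 3.0 + 2.0 * x + rng.normal(size=x.size)

for alpha in (1e-1, 1e-2, 1e-3, 1e-4):
    theta = np.zeros(2)
    losses = []
    for _ in range(50):  # fixed step budget, so this always terminates
        residuals = theta[0] + theta[1] * x - y
        losses.append(np.mean(residuals ** 2))
        gradient = np.array([np.mean(residuals), np.mean(residuals * x)])
        theta = theta - alpha * gradient
    print(f"alpha={alpha:g}: loss went from {losses[0]:.3g} to {losses[-1]:.3g}")

If the loss explodes (or even just creeps upwards) for the alpha you are currently using, scale it down until the recorded cost decreases monotonically.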
