
I am supposed to implement gradient descent for linear regression. Here is the implementation:

import numpy as np

class SimpleLinearRegressionModel():

    def __init__(self, x, y, theta, alpha):
        self.x = x
        self.y = y
        self.theta = theta
        self.alpha = alpha

    '''
    Equation for the regression line. 
    input x_i (float) - single input feature
    @return corresponding model output (float)
    '''
    def h(self, x_i):
        return self.theta[0] + x_i * self.theta[1]
    '''
    Loss function measuring mean squared error of the regression line for a given training 
    set and model parameters. 
    @return MSE based on the current parameters (float)
    '''
    def J(self):
        m = len(self.y)
        return (1 / m) * np.sum((self.h(self.x) - self.y) ** 2)

    def get_gradient(self):
        m = len(self.y)
        return np.array([(1 / m) * np.sum(self.h(self.x) - self.y),
                         (1 / m) * np.sum((self.h(self.x) - self.y) * self.x)])

    '''
    Update the model parameters (i.e. the two theta values) for one gradient descent step. 
    '''
    def gradient_descent_step(self):
        return self.theta - self.alpha * self.get_gradient()
    '''
    Run gradient descent to optimize the model parameters.
    @param threshold (float) - run gradient descent until the magnitude of the gradient is 
    below this value. 
    @return a list storing the value of the cost function after every step of gradient descent (float list)
    '''

    def run_gradient_descent(self, threshold=.01):
        cost_values = []
        while np.linalg.norm(self.get_gradient()) > threshold:
            self.theta = self.gradient_descent_step()
            cost_values.append(self.J())
        return cost_values

This works for a small dataset (25 elements), but with a large one (20000 elements) it becomes impossibly slow. How can I optimize this? I've tried to vectorize all the functions, but J() and get_gradient() still seem especially slow. I've also noticed, while debugging with the large dataset, that the error increases as the algorithm runs, which definitely should not be happening.

1 Answer


A few things stand out immediately:

  1. You are evaluating self.h(self.x) several times in every loop iteration (twice in the loop condition's get_gradient(), twice more via gradient_descent_step(), and once in J()); once should be enough. Store and re-use intermediate results, also for self.h(self.x) - self.y etc. You may need to combine J() and gradient_descent_step() for this (and change the ordering, i.e., save the loss before the update instead of afterwards); see the sketch after this list, which folds everything into run_gradient_descent().

  2. Most likely, using np.mean(...) instead of (1 / m) * np.sum(...) is beneficial (at the very least, it simplifies your code).

  3. You could just do self.theta -= self.alpha * self.get_gradient() in the gradient step to avoid having to reassign self.theta outside the method.
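Putting the three points together, a condensed sketch of how this could look (untested; it assumes x and y are 1-D NumPy arrays and theta is a length-2 float array, and it folds J() and get_gradient() into run_gradient_descent()):

import numpy as np

class SimpleLinearRegressionModel():

    def __init__(self, x, y, theta, alpha):
        self.x = x
        self.y = y
        self.theta = theta
        self.alpha = alpha

    def h(self, x_i):
        return self.theta[0] + x_i * self.theta[1]

    def run_gradient_descent(self, threshold=.01):
        cost_values = []
        while True:
            # evaluate the prediction error once per iteration and reuse it
            # for both the loss and the gradient (point 1)
            residuals = self.h(self.x) - self.y
            cost_values.append(np.mean(residuals ** 2))  # point 2
            gradient = np.array([np.mean(residuals),
                                 np.mean(residuals * self.x)])
            if np.linalg.norm(gradient) <= threshold:
                break
            # update the parameters in place inside the method (point 3)
            self.theta -= self.alpha * gradient
        return cost_values

This evaluates self.h(self.x) exactly once per iteration and computes the gradient once instead of twice, which should remove most of the redundant work; note that it records the loss before each update, as described in point 1.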

Your convergence behavior (e.g., whether the loss sometimes increases) will depend a lot on the choice of the learning rate (self.alpha), so play around with its value to see what happens; a quick way to probe it without risking an endless loop is sketched below.
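One rough way to do that probing (the data and the alpha values here are made up purely for illustration; substitute your real x and y) is to take a fixed number of manual steps for each candidate learning rate and watch the loss:

import numpy as np

# synthetic data purely for illustration; replace with your real x and y
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 20000)
y = 3.0 + 2.0 * x + rng.normal(size=x.size)

for alpha in (1e-1, 1e-2, 1e-3, 1e-4):
    theta = np.zeros(2)
    losses = []
    for _ in range(50):  # fixed step budget, so this always terminates
        residuals = theta[0] + theta[1] * x - y
        losses.append(np.mean(residuals ** 2))
        gradient = np.array([np.mean(residuals), np.mean(residuals * x)])
        theta = theta - alpha * gradient
    print(f"alpha={alpha:g}: loss went from {losses[0]:.3g} to {losses[-1]:.3g}")

If the loss explodes (or even just creeps upwards) for the alpha you are currently using, scale it down until the recorded cost decreases monotonically.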
