8
$\begingroup$

SGD is able to jump out of local minima that would otherwise trap BGD

I don't really understand the above statement. Could someone please provide a mathematical explanation for why SGD (Stochastic Gradient Descent) is able to escape local minima, while BGD (Batch Gradient Descent) can't?

P.S.

While searching online, I read that it has something to do with "oscillations" while taking steps towards the global minima. What's that?

$\endgroup$
6
  • 1
    $\begingroup$ This is a popular question. You can find a good explanation in the answer here. $\endgroup$ Commented May 31, 2020 at 10:18
  • $\begingroup$ Where did you read that SGD is able to escape local minima? $\endgroup$ Commented Jun 2, 2020 at 10:30
  • 1
    $\begingroup$ The statement above is from a MOOC on Coursera, while I found more details here - proceedings.mlr.press/v80/kleinberg18a/kleinberg18a.pdf $\endgroup$ Commented Jun 2, 2020 at 11:08
  • 3
    $\begingroup$ @cogito_ai I wouldn't say that SGD can't really escape local minima, but that it can get stuck in other (better?) minima. It's not like simulated annealing, where sometimes you perform a random action to escape local minima. However, it's true that there's some form of stochasticity in SGD that probably helps to explore more the search space. And that's what people probably mean by that sentence. See also ai.stackexchange.com/a/20380/2444. $\endgroup$ Commented Jun 2, 2020 at 15:38
  • 1
    $\begingroup$ Anyway, can you please provide the link to the exact Coursera's lecture that claims that? Also, if you want, later, I could read that paper and try to provide an answer based on it. And don't forget to tag me with @nbro next time, so that I see the comment ;) $\endgroup$ Commented Jun 2, 2020 at 15:41

1 Answer 1

0
$\begingroup$

SGD has greater mobility precisely due to its stochastic nature. Because the gradient can drastically change (or oscillate) due to random training items being picked, it might produce a gradient large enough to jump out the local minimum.

Take a look on the purple line on the illustration below (source):

Look how chaotic it is compared to the non stochastic ones. It is harder to keep a naughty child in place than a well-behaved one.

$\endgroup$

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.