Search Results


Momentum is commonly understood as a variation of stochastic gradient descent, but with one important difference: stochastic gradient descent can unnecessarily oscillate, and doesn’t accelerate based on the shape of the curve.

In contrast, momentum can dampen oscillations and accelerate convergence.

Momentum was originally proposed in 1964 by Boris T. Polyak.