Adam, whose name is derived from adaptive moment estimation, is a stochastic optimization method introduced by Diederik P. Kingma and Jimmy Lei Ba.
The method was designed to combine the advantages of two earlier optimizers: AdaGrad, which works well with sparse gradients, and RMSProp, which works well in online and non-stationary settings.
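To make the idea of "moment estimation" concrete, here is a minimal sketch of a single update step in plain Python. The function name `adam_step` and its argument layout are hypothetical, chosen for illustration only. The default hyperparameters are the values commonly suggested for the method, and should be treated as starting points rather than fixed constants.

```python
import math


def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam-style update to a list of scalar parameters.

    theta, grad, m, v are lists of equal length; t is the 1-based step count.
    Returns the updated (theta, m, v) as new lists.
    """
    new_theta, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(theta, grad, m, v):
        # Exponential moving average of the gradient (first moment).
        m_i = beta1 * m_i + (1 - beta1) * g
        # Exponential moving average of the squared gradient (second moment).
        v_i = beta2 * v_i + (1 - beta2) * g * g
        # Bias correction: both averages start at zero, so early estimates
        # are scaled up to compensate.
        m_hat = m_i / (1 - beta1 ** t)
        v_hat = v_i / (1 - beta2 ** t)
        # Per-parameter step: the gradient average is divided by the root of
        # the squared-gradient average, so each parameter gets its own scale.
        new_theta.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_theta, new_m, new_v


# Illustrative usage: one step on f(x, y) = x^2 + y^2, whose gradient is (2x, 2y).
theta = [5.0, -3.0]
grad = [2 * p for p in theta]
theta, m, v = adam_step(theta, grad, [0.0, 0.0], [0.0, 0.0], t=1)
```

The division by the root of the squared-gradient average is the per-parameter scaling that Adam shares with AdaGrad and RMSProp. The use of a decaying average rather than a running sum is what lets the scale adapt when gradient statistics change over time. On the very first step the bias-corrected ratio is close to the sign of the gradient, so each parameter moves by roughly `lr` regardless of the gradient's magnitude.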