ADAM is a method for stochastic optimization, a process that is central to training deep learning and machine learning models. Because it updates parameters iteratively from noisy gradient estimates, it is well suited to tasks such as solving linear systems with noisy data and locating the extrema of functions that can only be observed through noisy measurements.
The ADAM algorithm is simple to implement, computationally efficient, has modest memory requirements, and is well suited to problems with large data sets and many parameters.
ADAM combines ideas from two earlier stochastic gradient descent extensions: Adaptive Gradient (AdaGrad) and Root Mean Square Propagation (RMSProp). Instead of using the entire data set to calculate the exact gradient, the algorithm forms a stochastic approximation of it from a randomly selected subset (mini-batch) of the data.
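To make the update rule concrete, here is a minimal sketch of a single ADAM step in plain Python, following the standard formulation (exponential moving averages of the gradient and its square, plus bias correction). The function name `adam_step`, the toy objective f(theta) = theta², and the hyperparameter values used in the demo loop are illustrative choices, not from the text above; the defaults (lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8) are the commonly cited ones.

```python
import math
import random

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One ADAM update for a scalar parameter (illustrative sketch)."""
    # Exponential moving averages of the gradient (m) and squared gradient (v)
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias correction compensates for m and v being initialized at zero
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Per-parameter adaptive step: large/noisy gradients get scaled down
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Demo: minimize f(theta) = theta^2 using only noisy gradient observations,
# mimicking the stochastic (mini-batch) setting described above.
random.seed(0)
theta, m, v = 5.0, 0.0, 0.0
for t in range(1, 5001):
    grad = 2.0 * theta + random.gauss(0.0, 1.0)  # true gradient plus noise
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.05)
print(theta)  # settles near the minimum at 0 despite the noisy gradients
```

Note how the division by the root of the second-moment estimate gives each parameter its own effective step size, which is the RMSProp/AdaGrad ingredient, while the moving average of the gradient itself adds momentum-like smoothing of the noise.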