SGD (Stochastic Gradient Descent) is a popular algorithm for training a wide
range of machine learning models, including (linear) support vector machines and logistic regression. It is an iterative process:
at each step the parameters are nudged in the direction that reduces the loss, and during training one typically watches for convergence toward a good (ideally optimal) point.
For efficiency, the gradient can be computed over a mini-batch of samples rather than for every single training sample, as illustrated in the sketch below.
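The following is a minimal, illustrative sketch of mini-batch SGD for logistic regression in plain NumPy; the function name, hyperparameter defaults, and model choice are my own assumptions, not something prescribed by this article.

```python
import numpy as np

def minibatch_sgd(X, y, lr=0.1, batch_size=32, epochs=10):
    """Mini-batch SGD for logistic regression (illustrative only)."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        # Shuffle so each epoch visits the batches in a different order
        idx = np.random.permutation(n_samples)
        for start in range(0, n_samples, batch_size):
            batch = idx[start:start + batch_size]
            Xb, yb = X[batch], y[batch]
            # Predicted probabilities and the gradient of the log-loss,
            # averaged over the mini-batch instead of a single sample
            p = 1.0 / (1.0 + np.exp(-(Xb @ w + b)))
            grad_w = Xb.T @ (p - yb) / len(batch)
            grad_b = np.mean(p - yb)
            # Gradient-descent update of the parameters
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b
```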
In Keras, the SGD optimizer takes the following parameters (a usage sketch follows the list):
- lr: learning rate
- momentum: momentum factor
- decay: decay of the learning rate applied after each update
- nesterov: true/false, whether to apply Nesterov momentum
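As a minimal sketch, assuming the standalone Keras API in which the argument is still named lr (newer tf.keras releases call it learning_rate), the optimizer can be configured and passed to model.compile like this; the toy model and layer sizes are arbitrary and only for illustration:

```python
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD

# Toy binary classifier; architecture chosen only to make the example runnable
model = Sequential([
    Dense(64, activation='relu', input_shape=(20,)),
    Dense(1, activation='sigmoid'),
])

# SGD with momentum, a small per-update learning-rate decay, and Nesterov momentum
sgd = SGD(lr=0.01, momentum=0.9, decay=1e-6, nesterov=True)
model.compile(optimizer=sgd, loss='binary_crossentropy', metrics=['accuracy'])
```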