SGD (Stochastic Gradient Descent) is a popular algorithm for training a wide
range of machine learning models, including (linear) support vector machines and logistic regression. It is an iterative process:
at each step the parameters are nudged in the direction that reduces the loss, and during training one typically watches for convergence toward a good (ideally optimal) point.
For efficiency, the gradient can be computed over a mini-batch of samples rather than for every single training sample, as illustrated in the sketch below.
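The following is a minimal, illustrative sketch of mini-batch SGD for logistic regression in plain NumPy; the function name, hyperparameter defaults, and model choice are my own assumptions, not something prescribed by this article.

```python
import numpy as np

def minibatch_sgd(X, y, lr=0.1, batch_size=32, epochs=10):
    """Mini-batch SGD for logistic regression (illustrative only)."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        # Shuffle so each epoch visits the batches in a different order
        idx = np.random.permutation(n_samples)
        for start in range(0, n_samples, batch_size):
            batch = idx[start:start + batch_size]
            Xb, yb = X[batch], y[batch]
            # Predicted probabilities and the gradient of the log-loss,
            # averaged over the mini-batch instead of a single sample
            p = 1.0 / (1.0 + np.exp(-(Xb @ w + b)))
            grad_w = Xb.T @ (p - yb) / len(batch)
            grad_b = np.mean(p - yb)
            # Gradient-descent update of the parameters
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b
```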
In Keras, the SGD optimizer takes the following parameters (a usage sketch follows the list):
- lr: learning rate
- momentum: momentum factor
- decay: decay of the learning rate applied after each update
- nesterov: true/false, whether to apply Nesterov momentum
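As a minimal sketch, assuming the standalone Keras API in which the argument is still named lr (newer tf.keras releases call it learning_rate), the optimizer can be configured and passed to model.compile like this; the toy model and layer sizes are arbitrary and only for illustration:

```python
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD

# Toy binary classifier; architecture chosen only to make the example runnable
model = Sequential([
    Dense(64, activation='relu', input_shape=(20,)),
    Dense(1, activation='sigmoid'),
])

# SGD with momentum, a small per-update learning-rate decay, and Nesterov momentum
sgd = SGD(lr=0.01, momentum=0.9, decay=1e-6, nesterov=True)
model.compile(optimizer=sgd, loss='binary_crossentropy', metrics=['accuracy'])
```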