Sharpness-Aware Minimization (SAM)

  • Simply minimizing a commonly used loss function (e.g., cross-entropy) on the training set is typically not sufficient to achieve satisfactory generalization
  • The training loss landscapes of today's models are typically complex and non-convex, with many local and global minima, and different global minima can yield models with markedly different generalization ability
  • SAM instead seeks parameters that lie in neighborhoods of uniformly low loss, rather than parameters that merely have low loss themselves (see the sketch after this list)
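
Concretely, SAM replaces the standard training objective with a min-max problem and approximates the inner maximization by a single rescaled gradient step. For a neighborhood radius ρ:

$$
\min_{w} \; \max_{\|\epsilon\|_2 \le \rho} L(w + \epsilon),
\qquad
\hat{\epsilon}(w) = \rho \, \frac{\nabla_w L(w)}{\|\nabla_w L(w)\|_2}
$$

Below is a minimal PyTorch sketch of one SAM update under these definitions. The helper name `sam_step` is hypothetical, and the default `rho=0.05` is an illustrative value rather than a universal setting:

```python
import torch

def sam_step(model, loss_fn, inputs, targets, base_optimizer, rho=0.05):
    # First pass: gradient of the loss at the current weights w
    model.zero_grad()
    loss_fn(model(inputs), targets).backward()

    # Build the ascent perturbation e = rho * g / ||g||_2 and move to w + e
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([g.norm(p=2) for g in grads]), p=2)
    perturbations = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            e = rho * p.grad / (grad_norm + 1e-12)  # 1e-12 guards against division by zero
            p.add_(e)
            perturbations.append((p, e))

    # Second pass: gradient at the perturbed point w + e (the sharpness-aware gradient)
    model.zero_grad()
    loss_fn(model(inputs), targets).backward()

    # Restore w, then let the base optimizer step using the perturbed gradient
    with torch.no_grad():
        for p, e in perturbations:
            p.sub_(e)
    base_optimizer.step()
    model.zero_grad()
```

In practice SAM is used as a wrapper around a base optimizer such as SGD with momentum, at the cost of roughly two forward/backward passes per update.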