Optimization
Convex Optimization
Optimizers
Stochastic Gradient Descent (SGD)
Adam
RMSProp
Sharpness-Aware Minimization (SAM)
- Simply minimizing a commonly used loss function (e.g., cross-entropy) on the training set is typically not sufficient for satisfactory generalization
- The training loss landscapes of today's models are typically complex and non-convex, with many local and global minima, and different global minima can yield models with markedly different generalization ability
- SAM instead seeks parameters whose entire neighborhood has uniformly low loss, rather than parameters that merely have low loss themselves (see the sketch after this list)
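To make the neighborhood-seeking idea concrete, below is a minimal NumPy sketch of one SAM step. SAM approximately solves min_w max_{||eps||_2 <= rho} L(w + eps); a first-order approximation of the inner max gives eps = rho * grad / ||grad||, so each step takes two gradient evaluations. The toy loss, rho, and learning rate here are illustrative assumptions, not values from the original notes.

```python
# Minimal SAM sketch (assumed toy setup, not a reference implementation).
import numpy as np

def loss(w):
    # Toy non-convex loss standing in for a training loss landscape;
    # it has minima at w = +/-1 in each coordinate.
    return np.sum(w**4 - 2 * w**2)

def grad(w):
    # Analytic gradient of the toy loss above.
    return 4 * w**3 - 4 * w

def sam_step(w, lr=0.01, rho=0.05):
    """One SAM update: ascend to the (approximate) worst point in a
    rho-ball around w, then descend using the gradient taken there."""
    g = grad(w)
    # First-order solution of the inner max: scale the gradient to
    # the boundary of the rho-ball (epsilon for numerical safety).
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Gradient at the perturbed point w + eps is the sharpness-aware
    # gradient: it is small only where the whole neighborhood is flat.
    g_sharp = grad(w + eps)
    # Base update (plain gradient descent here for simplicity).
    return w - lr * g_sharp

w = np.array([0.3, -1.5])
for _ in range(200):
    w = sam_step(w)
print(w, loss(w))
```

In practice the two gradient evaluations wrap an arbitrary base optimizer (e.g., SGD with momentum or Adam): the perturbed gradient g_sharp is simply handed to the base optimizer's update rule in place of the ordinary gradient.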