Paper
Width of Minima Reached by Stochastic Gradient Descent is Influenced by Learning Rate to Batch Size Ratio
No abstract available from the OpenAlex source record.
Authors: Stanislaw Jastrzębski · Zachary Kenton · Devansh Arpit · Nicolas Ballas · Asja Fischer · Yoshua Bengio · Amos Storkey