Width of Minima Reached by Stochastic Gradient Descent is Influenced by Learning Rate to Batch Size Ratio

No abstract available from the OpenAlex source record.

Lecture Notes in Computer Science · Published 2018-01-01

Authors: Stanislaw Jastrzębski · Zachary Kenton · Devansh Arpit · Nicolas Ballas · Asja Fischer · Yoshua Bengio · Amos Storkey
