Paper

BitPruning: Learning Bitlengths for Aggressive and Accurate Quantization

BitPruning is a training method for minimizing inference bitlengths at any granularity while maintaining accuracy. BitPruning extends the meaning of fixed-point bitlengths into the continuous domain by interpolating between the nearest two integers, enabling gradient descent to learn bitlengths together with other parameters. A novel regularizer penalizes large bitlength representations and can be modified to minimize other quantifiable criteria, such as number of operations or memory footprint. BitPruning learns thrifty representations while maintaining accuracy: on ImageNet, it produces an average per-layer bitlength of 3.76 and 4.36 bits on ResNet18 and MobileNet V2 respectively, remaining within 0.5% of the baseline top-1 accuracy. Simple modifications of the BitPruning regularizer can be used to further reduce compute workload by up to 24%, as well as memory footprint in activation-heavy or weight-heavy tasks by up to 14% and 8% respectively.
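The interpolation idea in the abstract can be illustrated with a small sketch. The abstract does not specify the quantizer or the exact form of the regularizer, so the following rests on assumptions: a uniform min-max quantizer, a linear blend between the outputs at the two neighbouring integer bitlengths, and a weighted-sum penalty. The names `quantize`, `interpolated_quantize`, and `bitlength_penalty` are hypothetical and not taken from the paper.

```python
import math


def quantize(values, bits):
    """Uniform min-max quantization to an integer number of bits (assumed quantizer)."""
    if bits < 1:
        raise ValueError("bits must be at least 1")
    lo, hi = min(values), max(values)
    if hi == lo:
        return list(values)  # a constant tensor needs no quantization
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels
    return [lo + round((v - lo) / scale) * scale for v in values]


def interpolated_quantize(values, bitlength):
    """Give meaning to a non-integer bitlength by blending the two nearest integers.

    For bitlength n = floor(n) + frac, the output is
    (1 - frac) * Q(values, floor(n)) + frac * Q(values, floor(n) + 1),
    which varies continuously with n, so a gradient with respect to n exists.
    """
    n_lo = math.floor(bitlength)
    frac = bitlength - n_lo
    q_lo = quantize(values, n_lo)
    if frac == 0:
        return q_lo
    q_hi = quantize(values, n_lo + 1)
    return [(1 - frac) * a + frac * b for a, b in zip(q_lo, q_hi)]


def bitlength_penalty(bitlengths, weights=None, strength=1.0):
    """Assumed regularizer: a weighted sum of the learned bitlengths.

    Uniform weights penalize bitlength alone; weighting each entry by, say,
    its layer's operation count or element count would steer the penalty
    toward compute or memory footprint instead.
    """
    if weights is None:
        weights = [1.0] * len(bitlengths)
    return strength * sum(w * b for w, b in zip(weights, bitlengths))
```

In a training loop, a term like `bitlength_penalty` would be added to the task loss so that gradient descent trades accuracy against bitlength; the choice of `weights` is where the abstract's "simple modifications" for compute or memory would plausibly enter.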

2024 IEEE International Symposium on Circuits and Systems (ISCAS) · Published 2024-05-19

Authors: Miloš Nikolić · Ghouthi Boukli Hacene · Ciaran Bannon · Alberto Delmas Lascorz · Matthieu Courbariaux · Omar Mohamed Awad · Isak Edo Vivancos · Yoshua Bengio · Vincent Gripon · Andreas Moshovos
