Paper

Gradient-Based Optimization of Hyperparameters

Many machine learning algorithms can be formulated as the minimization of a training criterion that involves a hyperparameter. This hyperparameter is usually chosen by trial and error with a model selection criterion. In this article we present a methodology to optimize several hyperparameters, based on the computation of the gradient of a model selection criterion with respect to the hyperparameters. In the case of a quadratic training criterion, the gradient of the selection criterion with respect to the hyperparameters is efficiently computed by backpropagating through a Cholesky decomposition. In the more general case, we show that the implicit function theorem can be used to derive a formula for the hyperparameter gradient involving second derivatives of the training criterion.

Neural ComputationPublished 2000-08-01Paper link

Authors: Yoshua Bengio

Topics

General AI

Relevant entities

People

openalex-author

Yoshua Bengio

Computer Scientist

Related coverage

Linked coverage will appear here.

Related events

Linked events will appear here.

Related discussions

Related discussion nodes will appear here.