Paper

Using an autoencoder with deformable templates to discover features for automated speech recognition

In this paper we show how we can discover non-linear features of frames of spectrograms using a novel autoencoder. The autoencoder uses a neural network encoder that predicts how a set of prototypes called templates need to be transformed to reconstruct the data, and a decoder that is a function that performs this operation of transforming prototypes and reconstructing the input. We demonstrate this method on spectrograms from the TIMIT database. The features are used in a Deep Neural Network Hidden Markov Model (DNN-HMM) hybrid system for automatic speech recognition. On the TIMIT monophone recognition task we were able to achieve gains of 0.5% over Mel log spectra, by augmenting traditional the spectra with the predicted transformation parameters. Further, using the recently discovered ‘dropout’ training, we were able to achieve a phone error rate (PER) of 17.9% on the dev set and 19.5% on the test set, which, to our knowledge is the best reported number on this task using a hybrid system.

Interspeech 2013Published 2013-08-25Paper link

Authors: Navdeep Jaitly · Geoffrey E. Hinton

Topics

Relevant entities

People

Related coverage

Linked coverage will appear here.

Related events

Linked events will appear here.

Related discussions

Related discussion nodes will appear here.