Paper

Gradient Flow in Recurrent Nets: the Difficulty of Learning Long-Term Dependencies

Introduction. Recurrent networks (crossreference Chapter 12) can, in principle, use their feedback connections to store representations of recent input events in the form of activations. The most widely used algorithms for learning what to put in short-term memory, however, take too much time to be feasible or do not work well at all, especially when minimal time lags between inputs and corresponding teacher signals are long. Although theoretically fascinating, they do not provide clear practical advantages over, say, backprop in feedforward networks with limited time windows (see crossreference Chapters 11 and 12). With conventional "algorithms based on the computation of the complete gradient", such as "Back-Propagation Through Time" (BPTT, e.g., [22, 27, 26]) or "Real-Time Recurrent Learning" (RTRL, e.g., [21]) error signals "flowing backwards in time" tend to either (1) blow up or (2) vanish: the temporal evolution of the backpropagated error exponentially depends on the size of th

ftp://ftp.idsia.ch/pub/juergen/ch7.ps.gz

Published 2001-01-01

Authors: Sepp Hochreiter · Yoshua Bengio · Paolo Frasconi · Jürgen Schmidhuber
