Fundamentals of Statistics contains material of various lectures and courses of H. Lohninger on statistics, data analysis and here for more.

ANN - Recurrent Networks

Networks with feedback loops belong to the group of recurrent networks. There, unit activations are not only delayed while being fed forward through the network, but are also delayed and fed back to preceding layers. This way, information can cycle in the network. At least theoretically, this allows an unlimited number of past activations to be taken into account. Practically, the influence of past inputs decays rapidly, because it is merged with new ones. The speed of decay can be tuned, but after a few time steps, past information cannot be recognized anymore. Recurrent networks should be trained by some training algorithm propagating the error back through time. This is the consequence of a sequence of past inputs having an influence on later outputs. This technique can be visualized by unfolding the network in time [Hertz et al., 1991]. It is used by the BPTT (Back Propagation Through Time) algorithm. Since the effort of propagating the error back through time is quite high, the temporal aspect is often ignored during training.

Simple Recurrent Networks:

A typical example of a simple recurrent network is the Elman network. It keeps a copy of the hidden layer for the next updating step. This copy is then used together with the new input. Since the hidden layer pre-structures the input for the output, it is a valuable source of information. This explains its success with many forecasting tasks. Its architecture is shown in the following figure:

Elman Network
Elman Network

Another example of a simple recurrent network is the Jordan network. There, the output layer is fed back to a memory layer, where the unit activations of the output layer are merged with those of the preceding activations of the memory layer. The Jordan network is displayed in the following figure:

Jordan Network Figure
Jordan Network


Merging can be done by weighting and adding the unit activations. For instance, the activation of a unit in a memory layer at can integrate the output ot-1 as follows:

at := 0.5 at-1 + 0.5 ot-1.

Of course, these weights can be altered. The memory layer can also be regarded as the state of the network. Moreover, multiple techniques for handling sequential information, such as various types of time delays, feedback loops, and memory layers, can be combined to form more powerful temporal neural networks. The strength of such networks is that they combine memories of different length. However, large networks have a large number of degrees of freedom, which makes their training difficult.

Fully Recurrent Network:

The Fully Recurrent Network (FRN), which is presented by Williams et al. , belongs to the group of recurrent networks because it contains several feedback loops. However, it differs from most other approaches. The layers are arranged differently, and a special training algorithm is available: the RTRL algorithm (Real Time Recurrent Learning Algorithm). It traces all activations back through time, which results in very good solutions. This algorithm is non-local in that single units have access to the whole network. This conflicts with the idea that the units receive input solely via the incoming connections. As a result, the tasks of the units cannot be parallelized. Due to the enormous effort and processing time required by the RTRL algorithm, the tested Fully Recurrent Networks are usually rather small and hardly applicable to real-world tasks.