6.6.2 Weight decay
A more interesting option is to prune weights that seem to have little influence on the network's operation. This requires global knowledge of all the weights in the network and is therefore less biologically plausible. The idea is that no single weight should grow too large, while the total weight of the connections into a particular output neuron remains fairly constant. One of the simplest rules for weight decay was developed by Grossberg in 1968 and has the form:
$$\frac{dw_{ij}}{dt} = \alpha\, x_i\, y_j - w_{ij} \qquad (6.23)$$
It is clear that the weights are stable when $\frac{dw_{ij}}{dt} = 0$, i.e., at the points where $w_{ij} = \alpha\, E\!\left( x_i\, y_j \right)$, taking the expectation over the inputs. For a linear neuron, $y_j = \sum_k w_{kj}\, x_k$, so that $E\!\left( x_i\, y_j \right) = \sum_k C_{ik}\, w_{kj}$, where $C = E\!\left( x x^T \right)$ is the correlation matrix of the inputs. One can thus show that, at stability, we must have $\alpha\, C w = w$ and, therefore, that $w$ must be an eigenvector of the correlation matrix $C$.
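As an illustration (not part of the original notes; the toy data, gain $\alpha$ and step size are assumptions), the following sketch integrates rule (6.23) for a single linear output neuron and checks that the weight direction aligns with the leading eigenvector of the empirical correlation matrix $C$. The weight vector is rescaled to unit length after each step, since the raw linear dynamics is unstable in scale; this only fixes the norm and leaves the limiting direction unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy zero-mean inputs with an anisotropic correlation structure C ~ A A^T.
n, dim = 5000, 3
A = rng.normal(size=(dim, dim))
X = rng.normal(size=(n, dim)) @ A.T

alpha, dt = 1.0, 0.01                 # assumed gain and Euler step size
w = rng.normal(size=dim)
w /= np.linalg.norm(w)

# Euler integration of dw/dt = alpha * x * y - w, with linear output y = w . x.
for x in X:
    y = w @ x
    w += dt * (alpha * x * y - w)
    w /= np.linalg.norm(w)            # fix the scale only; the direction is what matters

C = (X.T @ X) / n                     # empirical correlation matrix
v = np.linalg.eigh(C)[1][:, -1]       # leading eigenvector of C

print("|cos(w, top eigenvector of C)| =", abs(w @ v))  # close to 1
```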
We will next consider two more sophisticated learning equations, which use weight decay.
The instar rule, where the decay term is gated by the postsynaptic output $y_j$:

$$\frac{dw_{ij}}{dt} = \alpha \left( x_i - w_{ij} \right) y_j \qquad (6.24)$$

In the discrete case, with learning rate $\alpha$ and decay rate $\gamma$, we have:
$$\Delta w_{ij} = \alpha\, x_i\, y_j - \gamma\, w_{ij}\, y_j \qquad (6.25)$$
$$\alpha = \gamma \;\Rightarrow\; \Delta w_{ij} = \alpha\, y_j \left( x_i - w_{ij} \right) \qquad (6.26)$$
$$y_j = 1 \;\Rightarrow\; \Delta w_{ij} = \alpha \left( x_i - w_{ij} \right) \;\Rightarrow\; w_{ij}(t) = (1-\alpha)\, w_{ij}(t-1) + \alpha\, x_i(t) \qquad (6.27)$$
The weight thus moves in the direction of the input: when the output is held active, $w_{ij}$ is an exponentially weighted running average of the input $x_i$.
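A minimal sketch of the discrete instar update, again with assumed toy data and parameter values: with the output clamped to $y_j = 1$, as in (6.27), the weight vector becomes a running average of the inputs and settles near their mean.

```python
import numpy as np

rng = np.random.default_rng(1)

alpha = 0.05                    # learning rate (= decay rate gamma, as in eq. 6.26)
w = np.zeros(2)                 # weights into one output neuron
mu = np.array([2.0, -1.0])      # mean of the toy input distribution

for t in range(2000):
    x = mu + rng.normal(scale=0.3, size=2)  # noisy input sample
    y = 1.0                                 # output clamped active, as in (6.27)
    w += alpha * y * (x - w)                # instar update, eq. (6.26)

print("w after training:", w)   # close to [2, -1]: w has moved to the mean input
```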