MACHINE LEARNING TECHNIQUES - LASA
where g is an arbitrary function of the weight w. One can consider two cases:

1) Multiplicative constraint

$$\frac{d}{dt} w(t) = C\,w(t) - g\big(w(t)\big) \qquad (6.34)$$

$$g\big(w(t)\big) = \gamma\big(w(t)\big) \cdot w(t) \qquad (6.35)$$

where the decay term is multiplied by the weight. In this case, the decay term can be viewed as a feedback term that limits the rate of growth of each weight: the bigger the weight, the bigger the decay.
2) Subtractive constraint

$$\frac{d}{dt} w(t) = C\,w(t) - \gamma\big(w(t)\big) \qquad (6.36)$$

where the same decay amount is subtracted from the weight, independently of the weight's own magnitude. Both schemes are illustrated in the sketch below.
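To make the difference concrete, here is a minimal numerical sketch (not taken from the lecture notes) of discrete-time Hebbian updates on a single linear neuron y = w·x. It compares an unconstrained update, a multiplicative constraint with the illustrative choice gamma(w) = y^2 (Oja's rule), and a subtractive constraint that removes the same amount from every weight. The learning rate, the input statistics, and the particular choices of gamma are assumptions made for illustration only:

import numpy as np

rng = np.random.default_rng(0)
# Correlated, zero-mean, two-dimensional inputs (assumed for illustration).
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=2000)

def train(decay, eta=0.005):
    # One pass of online Hebbian learning: dw = eta * (y*x - decay(w, y, x)).
    w = np.array([0.3, 0.1])
    for x in X:
        y = w @ x
        w = w + eta * (y * x - decay(w, y, x))
    return w

# 1) No constraint: the weight norm grows without bound.
plain = train(lambda w, y, x: 0.0)

# 2) Multiplicative constraint: decay gamma(w) = y**2, multiplied by the weight (Oja's rule).
multiplicative = train(lambda w, y, x: (y ** 2) * w)

# 3) Subtractive constraint: the same amount is removed from every weight,
#    chosen here so that the sum of the weights is preserved at every step.
subtractive = train(lambda w, y, x: np.mean(y * x) * np.ones_like(w))

for name, w in [("plain", plain), ("multiplicative", multiplicative), ("subtractive", subtractive)]:
    print(f"{name:14s} ||w|| = {np.linalg.norm(w):.4g}   w = {np.round(w, 3)}")

With the multiplicative constraint the weight norm settles close to one, whereas the unconstrained weights explode; the subtractive variant keeps the sum of the weights fixed at its initial value.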
6.7 Anti-Hebbian learning
If the inputs to a neural network are correlated, then each input carries information about the others. In other words, there is redundancy in the information conveyed by the inputs, and I(x; y) > 0. Anti-Hebbian learning is designed to decorrelate the inputs. The ultimate goal is to maximize the information that can be processed by the network: the less redundancy there is, the more information is conveyed and the fewer output nodes are required to transfer it.
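As a concrete illustration of this point, assume (purely for illustration; this is not stated in the notes) that two inputs are jointly Gaussian with correlation coefficient $\rho$. Their redundancy can then be written in closed form:

$$ I(x_1; x_2) = -\tfrac{1}{2}\,\ln\!\bigl(1 - \rho^2\bigr) \;>\; 0 \qquad \text{whenever } \rho \neq 0, $$

so the mutual information vanishes exactly when the inputs are decorrelated, which is what anti-Hebbian learning sets out to achieve.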
Anti-Hebbian learning is also known as lateral inhibition, as the anti-Hebbian connections link members of the same layer. The basic model is defined by:

$$\Delta w_{ij} = -\alpha \,\langle y_i \cdot y_j \rangle \qquad (6.37)$$

where the angle brackets indicate the ensemble average taken over all training patterns. Note that this is a tremendous limitation of the system, as it forces global and off-line learning.
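A minimal sketch of rule (6.37) in action, assuming a simple linear lateral-inhibition layer y = x + W·y (so that y = (I - W)^{-1} x) with zero self-connections; the layer model, the learning rate and the input statistics are illustrative assumptions, not prescribed by this excerpt:

import numpy as np

rng = np.random.default_rng(0)
# Strongly correlated two-dimensional inputs (redundant channels).
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=5000)

W = np.zeros((2, 2))      # lateral anti-Hebbian weights, no self-connections
alpha = 0.05

for _ in range(200):
    Y = X @ np.linalg.inv(np.eye(2) - W).T      # y = (I - W)^{-1} x for every pattern
    C = (Y.T @ Y) / len(Y)                      # ensemble average  <y_i . y_j>
    dW = -alpha * C                             # rule (6.37)
    np.fill_diagonal(dW, 0.0)                   # adapt only the lateral connections
    W += dW

Y = X @ np.linalg.inv(np.eye(2) - W).T
print("output correlation <y1.y2>:", round(float((Y[:, 0] * Y[:, 1]).mean()), 4))  # close to 0
print("lateral weight w12        :", round(float(W[0, 1]), 3))                     # negative

Because the update uses the ensemble average over all patterns, the sketch necessarily runs in batch mode, which is exactly the off-line restriction pointed out above. The lateral weights are self-limiting: once the outputs are decorrelated, the update vanishes.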
© A.G.Billard 2004 – Last Update March 2011