MACHINE LEARNING TECHNIQUES - LASA
6.3 Perceptron

The earliest neural network models go back to 1943 with the McCulloch & Pitts perceptron. The perceptron consists of a simple threshold unit taking binary inputs $X = \{x_1, \dots, x_n\}$. Its output $y$ is the product of its entries with a (1-dimensional) weight matrix $W$, modulated by a transfer function (or activation function) $f$:

$$ y = f\left(\sum_{i=0}^{n} w_i \, x_i\right) = f\left(W \cdot X\right) \qquad (6.1) $$

where $w_0 x_0$ is the bias and is generally negative.

Figure 6-1: Perceptron with constant bias

Classical transfer functions are:

The linear function: $f(x) = x$

The step function:
$$ f(x) = \begin{cases} 1 & \text{if } x \geq 0 \\ 0 & \text{if } x < 0 \end{cases} $$
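As an illustration of Eq. (6.1), the short Python sketch below (not part of the original notes) evaluates a single threshold unit with a constant bias input $x_0 = 1$ and a step transfer function; the weights and the OR-gate behaviour are chosen purely for illustration.

import numpy as np

def step(x):
    """Step transfer function: 1 if x >= 0, else 0."""
    return np.where(x >= 0, 1, 0)

def perceptron(x, w):
    """Single threshold unit of Eq. (6.1): y = f(sum_i w_i * x_i), with x_0 = 1 as constant bias input."""
    x = np.concatenate(([1.0], x))   # prepend the bias input x_0 = 1
    return step(np.dot(w, x))

# Illustrative weights (an assumption, not from the notes): the negative w_0 acts as the bias,
# so the unit fires whenever at least one of the two binary inputs is on (an OR-gate).
w = np.array([-0.5, 1.0, 1.0])
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, "->", perceptron(np.array(x, dtype=float), w))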
The sigmoid: $f(x) = \tanh(x) \in [-1, 1]$

or its positive form: $f(x) = \dfrac{1}{1 + e^{-D \cdot x}} \in [0, 1]$, where $D$ defines the slope of the function.

Radial Basis Functions: $f(x) = e^{-a x^2}$. A radial basis function is simply a Gaussian.

The sigmoid is probably the most popular activation function, as it is differentiable; this is very important once one attempts to prove the convergence of the system.

Figure 6-2: The XOR problem is unsolvable using a perceptron and a sigmoid activation function (left). However, by choosing a Radial Basis Function (RBF) it is possible to obtain a working solution (right). Note that the parameter a of the RBF kernel needs to be suitably adapted to the size of the data.

[DEMOS\CLASSIFICATION\MLP-XOR.ML]

Exercise: Can you find suitable weights that would make the perceptron behave like an AND-gate, NOR-gate and NAND-gate?
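The effect shown in Figure 6-2 can be checked numerically. The following sketch is not the DEMOS\CLASSIFICATION\MLP-XOR.ML demo itself; the weights and the value a = 3 are hand-picked assumptions for illustration. It passes the weighted sum through the Gaussian activation $f(x) = e^{-a x^2}$, and the unit then outputs values close to 1 only for the two XOR-positive patterns, something a single perceptron with a sigmoid activation cannot achieve.

import numpy as np

def rbf(s, a=3.0):
    """Gaussian (radial basis) activation f(s) = exp(-a * s**2)."""
    return np.exp(-a * s**2)

# Hand-picked weights (an assumption, not taken from the demo): bias w_0 = -1, w_1 = w_2 = 1.
# The weighted sum is exactly 0 for the patterns (0,1) and (1,0), where the Gaussian peaks at 1,
# and +/-1 for (0,0) and (1,1), where it drops to exp(-a) << 0.5.
w = np.array([-1.0, 1.0, 1.0])
a = 3.0

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    s = np.dot(w, np.concatenate(([1.0], x)))   # weighted sum including the constant bias input
    y = rbf(s, a)
    print(x, "-> activation %.3f -> class %d" % (y, int(y >= 0.5)))

With a larger a the Gaussian narrows and the two classes separate more sharply, which is the sense in which the parameter a must be adapted to the scale of the data.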