MACHINE LEARNING TECHNIQUES - LASA
To do this, one proceeds iteratively and tries to find the optimal state at each time step. Such an iterative procedure is advantageous in that, if one is provided with part of the observations, as in the above weather prediction example, one can use the observations o_1, ..., o_t made over the first t time steps to guide the inference.

The optimal state at each time step is obtained by combining the inferences made with the forward and backward procedures and is given by:

\gamma_t(j) = \frac{\alpha_j(t)\,\beta_j(t)}{\sum_{i=1}^{N} \alpha_i(t)\,\beta_i(t)}

The most likely sequence is then obtained by computing:

q_t = \arg\max_{1 \le i \le N} \gamma_t(i)    (7.13)

see also Equation (7.4).

The state sequence maximizing the probability of a path which accounts for the first t observations and ends in state j is given by:

\delta_t(j) = \max_{q_1, \dots, q_{t-1}} p(q_1 \dots q_{t-1}, q_t = j, o_1 \dots o_t)    (7.14)

Computing this quantity requires taking into account both the emission probabilities and the transition probabilities. Again, one proceeds iteratively, through induction. This recursion forms the core of the Viterbi algorithm.

Hence, when inferring the weather over the next five days, given information on the weather for the last ten days, one would first compute the sequence of the first ten states q_1, ..., q_10 using (7.14), and then use (7.13) to infer the next five states q_11, ..., q_15. Given q_11, ..., q_15, one would then draw from the associated emission probabilities to predict the particular weather (i.e. the particular observation one should make) for each of these five time slots.
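To make the two recursions concrete, the following Python sketch computes \gamma_t(j) by a forward-backward pass and then the Viterbi path via \delta_t(j). The two-state "weather" model, its transition and emission probabilities, and the example observation sequence are illustrative assumptions, not values given in the text.

import numpy as np

# Illustrative HMM parameters (assumed for this sketch, not taken from the text)
A = np.array([[0.7, 0.3],            # transition probabilities a_ij
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],       # emission probabilities b_j(o)
              [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.4])            # initial state distribution

obs = [0, 1, 2, 2, 1]                # example observation sequence o_1, ..., o_T
T, N = len(obs), A.shape[0]

# Forward pass: alpha_j(t)
alpha = np.zeros((T, N))
alpha[0] = pi * B[:, obs[0]]
for t in range(1, T):
    alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]

# Backward pass: beta_j(t)
beta = np.zeros((T, N))
beta[-1] = 1.0
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

# gamma_t(j) = alpha_j(t) beta_j(t) / sum_i alpha_i(t) beta_i(t)
gamma = alpha * beta
gamma /= gamma.sum(axis=1, keepdims=True)
states_gamma = gamma.argmax(axis=1)      # per-step optimal states, Eq. (7.13)

# Viterbi recursion for delta_t(j), Eq. (7.14), with back-pointers psi
delta = np.zeros((T, N))
psi = np.zeros((T, N), dtype=int)
delta[0] = pi * B[:, obs[0]]
for t in range(1, T):
    trans = delta[t - 1][:, None] * A    # delta_{t-1}(i) * a_ij
    psi[t] = trans.argmax(axis=0)
    delta[t] = trans.max(axis=0) * B[:, obs[t]]

# Backtrack to recover the single most likely state sequence
path = np.zeros(T, dtype=int)
path[-1] = delta[-1].argmax()
for t in range(T - 2, -1, -1):
    path[t] = psi[t + 1, path[t + 1]]

print("per-step optimal states (gamma):", states_gamma)
print("Viterbi path:", path)

Note that decoding each step from \gamma can yield a state sequence that differs from the Viterbi path: the former maximizes the expected number of individually correct states, while the latter maximizes the probability of the state sequence as a whole. In practice, both recursions are usually carried out in the log domain, or with per-step scaling of \alpha and \beta, to avoid numerical underflow on long sequences.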
7.2.5 Further Readings

Rabiner, L.R. (1989), “A tutorial on hidden Markov models and selected applications in speech recognition”, Proceedings of the IEEE, 77(2), 257-286.

Fine, S., Singer, Y. and Tishby, N. (1998), “The Hierarchical Hidden Markov Model: Analysis and Applications”, Machine Learning, 32(1), 41-62.
© A.G.Billard 2004 – Last Update March 2011