MACHINE LEARNING TECHNIQUES - LASA
EM describes a class of related algorithms rather than one particular algorithm: it is a recipe, or meta-algorithm, used to devise particular algorithms. The Baum-Welch algorithm is an example of an EM algorithm applied to Hidden Markov Models; the K-means clustering algorithm is another. It can be shown that an EM iteration never decreases the observed-data likelihood, and that the only stationary points of the iteration are stationary points of the observed-data likelihood. In practice, this means that an EM algorithm converges to a local maximum of the observed-data likelihood. EM thus proceeds iteratively and is particularly well suited to parameter estimation with incomplete or missing data. Each iteration maximizes the so-called marginal (or incomplete-data) likelihood indirectly: the E-step computes the expectation of the complete-data log-likelihood with respect to the missing data, using the current parameter estimates, and the M-step obtains new parameter estimates by maximizing this expectation.
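To make the two steps concrete, below is a minimal sketch of EM for a two-component one-dimensional Gaussian mixture, where the missing data are the component labels of the samples. This example is not part of the original notes: the function name em_gmm_1d, the initialization scheme, and the toy data are illustrative assumptions.

import numpy as np

def em_gmm_1d(x, n_iter=50):
    """Fit a two-component 1D Gaussian mixture to samples x by EM (sketch)."""
    # Crude initialization (an assumption of this sketch): equal weights,
    # the extreme samples as means, the overall sample variance for both.
    w = np.array([0.5, 0.5])
    mu = np.array([x.min(), x.max()])
    var = np.array([x.var(), x.var()])
    for _ in range(n_iter):
        # E-step: expectation over the missing component labels, i.e. the
        # posterior responsibility of each component for each sample.
        dens = (w / np.sqrt(2.0 * np.pi * var)
                * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var))  # shape (N, 2)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: maximize the expected complete-data log-likelihood; for
        # Gaussian components this gives the closed-form updates below.
        nk = resp.sum(axis=0)
        w = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return w, mu, var

# Toy data from two Gaussians; EM should recover weights near (0.3, 0.7),
# means near (-2, 3), and standard deviations near (0.5, 1).
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 0.5, 300), rng.normal(3.0, 1.0, 700)])
print(em_gmm_1d(x))

Each pass through the loop performs one E-step/M-step pair; by the convergence property stated above, the observed-data likelihood never decreases across iterations, and the estimates settle at a local maximum.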
10 References
• Machine Learning, Tom Mitchell, McGraw Hill, 1997.
• Pattern Classification, Richard O. Duda, Peter E. Hart, and David G. Stork, Wiley-Interscience, 2001.
• Information Theory, Inference and Learning Algorithms, David J.C. MacKay, Cambridge University Press, 2003.
• Independent Component Analysis, A. Hyvärinen, J. Karhunen, and E. Oja, Wiley-Interscience, 2001.
• Artificial Neural Networks and Information Theory, Colin Fyfe, Tech. Report, Dept. of Computing and Information Science, The University of Paisley, 2000.
• Neural Networks, Simon Haykin, Prentice Hall International Editions, 1994.
• Self-Organizing Maps, Teuvo Kohonen, Springer Series in Information Sciences, Vol. 30, Springer, 2001.
• Learning with Kernels, B. Schölkopf and A. Smola, MIT Press, 2002.
• Reinforcement Learning: An Introduction, R. Sutton and A. Barto, A Bradford Book, MIT Press, 1998.
• Neural Networks for Pattern Recognition, C.M. Bishop, Oxford University Press, 1996.
10.1 Other Recommended Textbooks
• Elements of Machine Learning, Pat Langley, Morgan Kaufmann, 1996.
• Applied Nonlinear Control, J.J.E. Slotine and W. Li, Prentice-Hall, 1991.
• Neural Networks, Simon Haykin, Prentice Hall International Editions, 1994.
• Cluster Analysis, StatSoft, Inc., 1984-2004.
© A.G.Billard 2004 – Last Update March 2011