MACHINE LEARNING TECHNIQUES - LASA

01.11.2014
1 Introduction
  1.1 What is Machine Learning? - Definitions
    1.1.1 ML Resources
  1.2 What is Learning?
    1.2.1 Taxonomy of Learning Algorithms
    1.2.2 Other important terms in machine learning
    1.2.3 Key features for a good learning system
    1.2.4 Exercise
  1.3 Best Practices in ML
    1.3.1 Training, validation and testing sets
    1.3.2 Cross-validation
    1.3.3 Performance Measures in ML
    1.3.4 In Practice
  1.4 Focus of this course
2 Methods for Correlation Analysis: PCA, CCA, ICA
  2.1 Principal Component Analysis
    2.1.1 Dimensionality Reduction
    2.1.2 Solving PCA as a constrained optimization problem
    2.1.3 PCA limitations
    2.1.4 Projection Pursuit
    2.1.5 Probabilistic PCA
  2.2 Canonical Correlation Analysis
    2.2.1 CCA for more than two variables
    2.2.2 Limitations
  2.3 Independent Component Analysis
    2.3.1 Illustration of ICA
    2.3.2 Why Gaussian variables are forbidden
    2.3.3 Definition of ICA
    2.3.4 Whitening
    2.3.5 ICA Ambiguities
    2.3.6 ICA by maximizing non-Gaussianity
  2.4 Further Readings
3 Clustering and Classification
  3.1 Clustering Techniques
    3.1.1 Hierarchical Clustering
    3.1.2 K-means clustering
    3.1.3 Soft K-means
    3.1.4 Clustering with Mixtures of Gaussians
    3.1.5 Gaussian Mixture Models
  3.2 Linear Classifiers
    3.2.1 Linear Discriminant Analysis
    3.2.2 Fisher Linear Discriminant
    3.2.3 Mixture of linear classifiers (boosting and bagging)
  3.3 Bayes Classifier
  3.4 Linear classification with Gaussian Mixture Models
  3.5 Further Readings
4 Regression Techniques
  4.1 Linear Regression
  4.2 Partial Least Squares Methods
  4.3 Probabilistic Regression
  4.4 Gaussian Mixture Regression
    4.4.1 One Gaussian Case
    4.4.2 Multi-Gaussian Case
5 Kernel Methods
  5.1 The kernel trick
  5.2 Which kernel, when?
  5.3 Kernel PCA
  5.4 Kernel CCA
  5.5 Kernel ICA
  5.6 Kernel K-Means
  5.7 Support Vector Machines
    5.7.1 Support Vector Machine for Linearly Separable Datasets
    5.7.2 Support Vector Machine for Non-linearly Separable Datasets
    5.7.3 Non-Linear Support Vector Machines
    5.7.4 ν-SVM
  5.8 Support Vector Regression
    5.8.1 ν-SVR
  5.9 Gaussian Process Regression
    5.9.1 What is a Gaussian Process
    5.9.2 Equivalence of Gaussian Process Regression and Gaussian Mixture Regression
    5.9.3 Curse of dimensionality, choice of hyperparameters
  5.10 Gaussian Process Classification
6 Artificial Neural Networks
  6.1 Applications of ANN
  6.2 Biological motivation
    6.2.1 The Brain as an Information Processing System
    6.2.2 Neural Networks in the Brain
    6.2.3 Neurons and Synapses
    6.2.4 Synaptic Learning
    6.2.5 Summary
  6.3 Perceptron
    6.3.1 Learning rule for the Perceptron
    6.3.2 Information Theory and the Neuron
  6.4 The Backpropagation Learning Rule
    6.4.1 The Adaline
    6.4.2 The Backpropagation Network
    6.4.3 The Backpropagation Algorithm
  6.5 Willshaw net
  6.6 Hebbian Learning

© A.G.Billard 2004 – Last Update March 2011
