MACHINE LEARNING TECHNIQUES - LASA
9.1.2 Singular Value Decomposition (SVD)

When the eigenvectors are not linearly independent, the eigenvector matrix X does not have an inverse (it is singular) and the eigenvalue decomposition does not exist. One then looks instead for a factorization of the form:

A = U Λ V^T    (8.11)

with U and V two orthogonal (if real) or unitary (if complex) matrices and Λ a diagonal matrix. Such a decomposition is called the singular value decomposition (SVD). The SVD is useful insofar as A represents the mapping of an n-dimensional space onto itself, where n is the dimension of A.

An alternative to the SVD is to compute the Moore-Penrose pseudoinverse A^# of the non-invertible matrix A and then exploit the fact that, for a pair of vectors z and c, z = A^# c is the shortest-length least-squares solution of the problem Az = c. Methods such as PCA that find the optimal (in a least-squares sense) projection of a dataset can be approximated using the pseudoinverse when the transformation matrix is singular.

9.1.3 Frobenius Norm

The Frobenius norm of an m × n matrix A is given by:

‖A‖_F = √( ∑_{i=1}^{m} ∑_{j=1}^{n} a_{ij}^2 )    (8.12)
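As a minimal illustration of these two subsections (not part of the original notes; the matrix A and the vector c are arbitrary example values), the NumPy sketch below computes the SVD of a singular matrix, builds the Moore-Penrose pseudoinverse from it, uses it to obtain the minimum-norm least-squares solution of Az = c, and evaluates the Frobenius norm of Eq. (8.12).

```python
import numpy as np

# A rank-deficient (singular) matrix: the second row is twice the first,
# so A has no ordinary inverse.
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])
c = np.array([1.0, 2.0])

# Singular value decomposition A = U diag(s) V^T  (Eq. 8.11).
U, s, Vt = np.linalg.svd(A)

# Moore-Penrose pseudoinverse built from the SVD:
# invert only the non-zero singular values.
tol = 1e-10
s_inv = np.array([1.0 / si if si > tol else 0.0 for si in s])
A_pinv = Vt.T @ np.diag(s_inv) @ U.T
assert np.allclose(A_pinv, np.linalg.pinv(A))

# z = A^# c is the minimum-norm least-squares solution of A z = c.
z = A_pinv @ c
print("minimum-norm least-squares solution z:", z)

# Frobenius norm (Eq. 8.12): square root of the sum of squared entries.
fro = np.sqrt((A ** 2).sum())
assert np.isclose(fro, np.linalg.norm(A, "fro"))
print("Frobenius norm of A:", fro)
```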
9.2 Recall of basic notions of statistics and probabilities

9.2.1 Probabilities

Consider two variables x and y taking discrete values in {x_1, …, x_M} and {y_1, …, y_N} respectively. Then P(x = x_i) is the probability that the variable x takes the value x_i, with:

i) 0 ≤ P(x = x_i) ≤ 1, ∀ i = 1, ..., M,
ii) ∑_{i=1}^{M} P(x = x_i) = 1.

The same two properties apply to the probabilities P(y = y_j), ∀ j = 1, ..., N.

Some properties follow from the above. Let P(x = a) be the probability that the variable x takes the value a. If P(x = a) = 1, then x is a constant with value a. If x is an integer that can take any value a in [1, N] with equal probability, then the probability that x takes the value a is P(x = a) = 1/N.

Joint probability:
The joint probability that the two events A (variable x takes value a) and B (variable y takes value b) both occur is expressed as:

P(A, B) = P(A ∩ B) = P(x = a ∩ y = b)    (8.13)

Conditional probability:
P(A | B) is the conditional probability that event A takes place given that event B has already taken place:

P(A | B) = P(A ∩ B) / P(B)    (8.14)

It follows that:

P(A ∩ B) = P(A | B) P(B)    (8.15)

By the same reasoning, we have:

P(A ∩ B) = P(B | A) P(A)    (8.16)

Hence,

P(A | B) P(B) = P(B | A) P(A)    (8.17)

Bayes' theorem:

P(A | B) = P(B | A) P(A) / P(B)    (8.18)

Marginal probability:
The so-called marginal probability that variable x takes the value x_i is given by:

P(x = x_i) = ∑_{j=1}^{N} P(x = x_i, y = y_j)    (8.19)

To compute the marginal, one needs to know the joint distribution of the variables x and y. Often one does not know it and can only estimate it. Note that if x is a multidimensional variate, then the marginal is a joint distribution over the variate spanned by x.

The joint distribution is far richer than the marginals. The marginals of N variables, each taking K values, correspond to N(K − 1) probabilities, while their joint distribution corresponds to K^N − 1 probabilities.
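The short NumPy sketch below (not part of the original notes; the joint probability table is made up for illustration) makes Eqs. (8.14)–(8.19) concrete: it marginalizes a small joint distribution over one variable and checks Bayes' theorem numerically.

```python
import numpy as np

# Joint distribution P(x = x_i, y = y_j) for x in {x_1, x_2} and
# y in {y_1, y_2, y_3}; the entries are illustrative and sum to 1.
P_xy = np.array([[0.10, 0.20, 0.10],
                 [0.25, 0.05, 0.30]])
assert np.isclose(P_xy.sum(), 1.0)

# Marginals (Eq. 8.19): sum the joint over the other variable.
P_x = P_xy.sum(axis=1)   # P(x = x_i)
P_y = P_xy.sum(axis=0)   # P(y = y_j)

# Conditional probability (Eq. 8.14): P(x | y = y_1) = P(x, y_1) / P(y_1).
P_x_given_y1 = P_xy[:, 0] / P_y[0]

# Bayes' theorem (Eq. 8.18): P(x | y_1) = P(y_1 | x) P(x) / P(y_1).
P_y1_given_x = P_xy[:, 0] / P_x
via_bayes = P_y1_given_x * P_x / P_y[0]
assert np.allclose(via_bayes, P_x_given_y1)

print("P(x)        :", P_x)
print("P(x | y=y_1):", P_x_given_y1)
```

Dividing a column of the joint table by the corresponding marginal is exactly Eq. (8.14); the final assertion verifies that recovering the same conditional through Eq. (8.18) gives an identical result.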