MACHINE LEARNING TECHNIQUES - LASA


9.1.2 Singular Value Decomposition (SVD)

When the eigenvectors are not linearly independent, the matrix X of eigenvectors does not have an inverse (it is thus singular) and such a decomposition does not exist. The eigenvalue decomposition then consists in finding a similarity transformation such that:

$A = U \Lambda V^{T}$ (8.11)

with $U$ and $V$ two orthogonal (if real) or unitary (if complex) matrices and $\Lambda$ a diagonal matrix. Such a decomposition is called a singular value decomposition (SVD). The SVD is useful insofar as $A$ represents the mapping of an n-dimensional space onto itself, where n is the dimension of $A$.

An alternative to the SVD is to compute the Moore-Penrose pseudoinverse $A^{\#}$ of the non-invertible matrix $A$ and then exploit the fact that, for a pair of vectors $z$ and $c$, $z = A^{\#} c$ is the shortest-length least-squares solution to the problem $Az = c$. Methods such as PCA, which find the optimal (in a least-squares sense) projection of a dataset, can be approximated using the pseudoinverse when the transformation matrix is singular.

9.1.3 Frobenius Norm

The Frobenius norm of an $m \times n$ matrix $A$ is given by:

$\|A\|_F = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}^2}$ (8.12)

9.2 Recall of basic notions of statistics and probabilities

9.2.1 Probabilities

Consider two variables $x$ and $y$ taking discrete values over the intervals $[x_1, \dots, x_M]$ and $[y_1, \dots, y_N]$ respectively; then $P(x = x_i)$ is the probability that the variable $x$ takes the value $x_i$, with:

i) $0 \le P(x = x_i) \le 1, \quad \forall i = 1, \dots, M$,

ii) $\sum_{i=1}^{M} P(x = x_i) = 1$.

The same two properties apply to the probabilities $P(y = y_j)$, $\forall j = 1, \dots, N$.

Some properties follow from the above. Let $P(x = a)$ be the probability that the variable $x$ takes the value $a$. If $P(x = a) = 1$, then $x$ is a constant with value $a$. If $x$ is an integer that can take any value $a \in [1, N]$ with equal probability, then the probability that $x$ takes the value $a$ is $P(x = a) = 1/N$.
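To make Sections 9.1.2 and 9.1.3 concrete, here is a minimal NumPy sketch (the matrix $A$ and vector $c$ are illustrative assumptions, not taken from the notes): it factorizes a singular matrix by SVD, uses the Moore-Penrose pseudoinverse to obtain the shortest-length least-squares solution of $Az = c$, and evaluates the Frobenius norm of (8.12).

```python
import numpy as np

# Illustrative rank-deficient 3x3 matrix: the second row is twice the first,
# so A is singular and has no ordinary inverse.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0]])

# SVD as in (8.11): A = U diag(s) Vt, with U and Vt orthogonal and
# s the non-negative singular values (one of them is ~0 here).
U, s, Vt = np.linalg.svd(A)
assert np.allclose(A, U @ np.diag(s) @ Vt)

# Moore-Penrose pseudoinverse A^# (NumPy computes it from the SVD).
A_pinv = np.linalg.pinv(A)

# z = A^# c is the shortest-length least-squares solution of A z = c.
c = np.array([1.0, 2.0, 3.0])
z = A_pinv @ c
print("residual ||Az - c|| =", np.linalg.norm(A @ z - c))

# Frobenius norm as in (8.12): square root of the sum of squared entries.
assert np.isclose(np.sqrt((A ** 2).sum()), np.linalg.norm(A, "fro"))
```

Note that `np.linalg.lstsq` returns the same minimum-norm solution for a rank-deficient system; the pseudoinverse route simply makes the role of equation (8.11) explicit.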

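The two defining properties of a discrete distribution can be checked in the same way; the distribution below is a made-up example, not one from the notes.

```python
import numpy as np

# Hypothetical distribution of x over M = 4 discrete values.
P_x = np.array([0.1, 0.2, 0.3, 0.4])
assert np.all((0.0 <= P_x) & (P_x <= 1.0))  # property i)
assert np.isclose(P_x.sum(), 1.0)           # property ii)

# Uniform case: an integer x taking any value a in [1, N] with
# equal probability has P(x = a) = 1/N for every a.
N = 6
P_uniform = np.full(N, 1.0 / N)
assert np.isclose(P_uniform.sum(), 1.0)
```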
Joint probability:

The joint probability that the two events A (variable $x$ takes value $a$) and B (variable $y$ takes value $b$) occur is expressed as:

$P(A, B) = P(A \cap B) = P((x = a) \cap (y = b))$ (8.13)

Conditional probability:

$P(A \mid B)$ is the conditional probability that event A takes place given that event B has already taken place:

$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$ (8.14)

It follows that:

$P(A \cap B) = P(A \mid B) \, P(B)$ (8.15)

By the same reasoning, we have:

$P(A \cap B) = P(B \mid A) \, P(A)$ (8.16)

Hence,

$P(A \mid B) \, P(B) = P(B \mid A) \, P(A)$ (8.17)

Bayes' theorem:

$P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)}$ (8.18)

Marginal probability:

The so-called marginal probability that variable $x$ takes the value $x_i$ is given by:

$P(x = x_i) = \sum_{j=1}^{N} P(x = x_i, y = y_j)$ (8.19)

To compute the marginal, one needs to know the joint distribution of the variables $x$ and $y$. Often one does not know it and can only estimate it. Note that if $x$ is a multidimensional variate, then the marginal is a joint distribution over the variate spanned by $x$.

The joint distribution is far richer than the marginals: the marginals of $N$ variables taking $K$ values correspond to $N(K-1)$ probabilities, while their joint distribution corresponds to $K^N - 1$ probabilities.
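As a quick numerical illustration of this last count (the numbers are only an example), take $N = 10$ binary variables, i.e. $K = 2$:

```python
# Free parameters of the marginals vs. the joint, per the text above.
N, K = 10, 2
print(N * (K - 1))  # marginals: 10 probabilities
print(K ** N - 1)   # joint: 1023 probabilities (sum-to-one removes one)
```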

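More generally, the relations (8.13)-(8.19) can be verified on a small discrete joint distribution. The table below is a made-up example with $M = 2$ values for $x$ and $N = 3$ values for $y$, not data from the notes.

```python
import numpy as np

# Hypothetical joint distribution P(x = x_i, y = y_j); rows index x, columns y.
P_xy = np.array([[0.10, 0.25, 0.15],
                 [0.20, 0.05, 0.25]])
assert np.isclose(P_xy.sum(), 1.0)

# Marginals, eq. (8.19): sum the joint over the other variable.
P_x = P_xy.sum(axis=1)  # P(x = x_i)
P_y = P_xy.sum(axis=0)  # P(y = y_j)

# Conditional probability, eq. (8.14): P(x | y) = P(x, y) / P(y).
P_x_given_y = P_xy / P_y  # each column j is divided by P(y = y_j)

# Bayes' theorem, eq. (8.18): recover P(y | x) from P(x | y), P(y), P(x).
P_y_given_x = P_x_given_y * P_y / P_x[:, None]
# Cross-check against the direct definition P(y | x) = P(x, y) / P(x).
assert np.allclose(P_y_given_x, P_xy / P_x[:, None])
```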

© A.G.Billard 2004 – Last Update March 2011
