MACHINE LEARNING TECHNIQUES - LASA
The element $c_{ij}$ of the covariance matrix $C$ measures the correlation across the two components $X_i$ and $X_j$. If the two components are uncorrelated, their covariance is zero, $c_{ij} = c_{ji} = 0$. The covariance matrix is, by definition, always symmetric. It thus has an orthogonal basis, defined by $N$ eigenvectors $e_i$, $i = 1, \dots, N$, with associated eigenvalues $\lambda_i$:

$C e_i = \lambda_i e_i$          (2.3)

The eigenvalues $\lambda_i$ are calculated by solving the characteristic equation:

$\left| C - \lambda_i I \right| = 0$          (2.4)

where $I$ is the $N \times N$ identity matrix and $\left| \cdot \right|$ denotes the determinant of the matrix. If the data vector has $N$ components, the characteristic equation is of order $N$; this is easy to solve only if $N$ is small.

By ordering the eigenvectors in order of descending eigenvalues (largest first), one creates an ordered orthogonal basis whose first eigenvector gives the direction of largest variance of the data. In other words, the eigenvector corresponding to the largest eigenvalue is the direction along which the variance of the data is maximal. The directions of the eigenvectors are drawn as vectors in Figure 2-1, right. The first eigenvector, having the largest eigenvalue, points in the direction of largest variance (the longest axis of the ellipse), whereas the second eigenvector is orthogonal to the first one (pointing along the second axis of the ellipse).
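As an illustration of equations (2.3) and (2.4), the following sketch computes the covariance matrix of a small two-dimensional dataset and orders its eigenvectors by descending eigenvalue. It assumes NumPy is available; the dataset and variable names are illustrative and not taken from these notes.

```python
import numpy as np

# Illustrative two-dimensional dataset: 500 samples from a correlated Gaussian.
# Any data matrix of shape (n_samples, N) could be used instead.
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[1.0, -2.0],
                            cov=[[3.0, 1.2], [1.2, 0.8]],
                            size=500)

# Covariance matrix C of the data (symmetric, N x N).
C = np.cov(X, rowvar=False)

# Solve C e_i = lambda_i e_i (eq. 2.3); eigh exploits the symmetry of C.
eigvals, eigvecs = np.linalg.eigh(C)

# Order the eigenvectors by descending eigenvalue (largest variance first).
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]                  # column i is eigenvector e_i

print("eigenvalues:", eigvals)               # lambda_1 >= lambda_2
print("first eigenvector:", eigvecs[:, 0])   # direction of largest variance
```

In practice the eigenvalue problem is solved numerically in this way rather than through the characteristic polynomial, which becomes impractical as $N$ grows.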
2.1.1 Dimensionality Reduction

By choosing the $k$ eigenvectors with the largest eigenvalues, the data can be represented in a subspace of lower dimension. Let $W$ be the orthogonal matrix whose rows are the $N$ eigenvectors of $C$, and let $\mu$ denote the mean of the data. Each data vector $X_i$ is transformed according to:

$X'_i = W \left( X_i - \mu \right)$          (2.6)

The components of $X'_i$ can be seen as the coordinates of $X_i$ in the orthogonal basis. To reconstruct the original data vector $X_i$ from $X'_i$, one uses the property of an orthogonal matrix, $W^{-1} = W^T$, and computes:

$X_i = W^T X'_i + \mu$

Now, instead of using all the eigenvectors of the covariance matrix, one may represent the data in terms of only a few basis vectors of the orthogonal basis. Denote by $W_k$ the reduced transfer matrix that contains only the first $k$ eigenvectors. The reduced transformation is, thus:

$X'_i = W_k \left( X_i - \mu \right)$

$X'_i$ now lives in a coordinate system of dimension $k$. Among all linear projections onto a $k$-dimensional subspace, this transformation minimizes the mean-square error between the original data points and their projections. If the data is concentrated in a linear subspace, this provides a way to compress the data without losing much information, while simplifying its representation. By picking the eigenvectors having the largest eigenvalues, we lose as little information as possible in the mean-square sense.

One can, for example, choose a fixed number of eigenvectors (and their respective eigenvalues) and obtain a representation, or abstraction, of the data of constant dimension; the fraction of the energy of the original data that is preserved then varies. Alternatively, one can retain approximately the same amount of energy and let the number of eigenvectors vary. This yields an approximately constant amount of information, at the expense of representations whose dimension varies with the chosen subspace.
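A minimal sketch of the reduced transformation and the corresponding reconstruction, under the same assumptions as above (NumPy; the data and the helper names pca_project and pca_reconstruct are illustrative, not from these notes):

```python
import numpy as np

def pca_project(X, k):
    """Project data onto the k leading eigenvectors of its covariance matrix.

    X is an (n_samples, N) data matrix.  Returns (X_proj, W_k, mu), where
    each row of X_proj holds the coordinates X'_i = W_k (X_i - mu)."""
    mu = X.mean(axis=0)
    C = np.cov(X, rowvar=False)                 # covariance matrix C
    eigvals, eigvecs = np.linalg.eigh(C)        # C e_i = lambda_i e_i
    order = np.argsort(eigvals)[::-1]           # descending eigenvalues
    W_k = eigvecs[:, order[:k]].T               # rows = k first eigenvectors
    X_proj = (X - mu) @ W_k.T                   # X'_i = W_k (X_i - mu)
    return X_proj, W_k, mu

def pca_reconstruct(X_proj, W_k, mu):
    """Approximate reconstruction X_i ~ W_k^T X'_i + mu."""
    return X_proj @ W_k + mu

# Usage on an illustrative correlated Gaussian dataset.
rng = np.random.default_rng(0)
X = rng.multivariate_normal([1.0, -2.0], [[3.0, 1.2], [1.2, 0.8]], size=500)
X_proj, W_k, mu = pca_project(X, k=1)
X_rec = pca_reconstruct(X_proj, W_k, mu)
print("mean-square reconstruction error (k=1):",
      np.mean(np.sum((X - X_rec) ** 2, axis=1)))
```

With $k = N$ the reconstruction is exact; with $k < N$ the printed error equals the sum of the discarded eigenvalues, which is the mean-square loss discussed above.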