MACHINE LEARNING TECHNIQUES - LASA
6.8.2 Bayesian self-organizing map
Adapted from Yin, H.; Allinson, N.M., "Self-organizing mixture networks for probability density estimation," IEEE Transactions on Neural Networks, vol. 12, no. 2, March 2001, pp. 405-411.
The Bayesian self-organizing map (BSOM) is a method for estimating the probability distribution that generates a set of data points, on the basis of a Bayesian stochastic model. BSOM can be used to estimate the parameters of Gaussian Mixture Models (GMM). In this case, BSOM estimates the GMM parameters by minimizing the Kullback-Leibler divergence and thus provides an alternative to the classical Expectation-Maximization (EM) method; compared to EM, it converges faster and is better at escaping local minima. Since the KL divergence makes no assumption on the form of the component distributions, BSOM can also be applied to estimate mixtures of distributions other than purely Gaussian ones.
The term SOM in BSOM reflects the fact that the update rule uses a notion of neighbourhood similar to that of the update rule in the self-organizing map (SOM) neural network. BSOM maintains a set of K probability density functions (pdfs). Unlike EM, which updates the parameters of all the pdfs globally so as to maximize the total probability that the mixture explains the data, BSOM updates each pdf separately, using only the subset of the data located in a neighbourhood around the data point for which that pdf is maximal. The advantage is that the update and estimation steps are faster. The drawback is that, like the Kohonen network, the algorithm depends strongly on the choice of the hyperparameters (number of pdfs, size of the neighbourhood, learning rate).
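The neighbourhood-based update described above can be sketched in a few lines of Python. In this sketch, the winner selection by maximal posterior, the Gaussian neighbourhood function on the component index, the isotropic Gaussian components, and the specific learning rates are illustrative assumptions patterned on standard SOM practice, not the exact rule from Yin and Allinson's paper.

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    """Isotropic Gaussian density at a d-dimensional point x."""
    d = x.size
    diff = x - mu
    return np.exp(-0.5 * diff @ diff / var) / (2 * np.pi * var) ** (d / 2)

def bsom_step(x, mus, variances, priors, lr=0.05, sigma_n=1.0):
    """One online BSOM-style update (a sketch, not the published rule):
    only components near the winning component (on the 1-D map index)
    are adapted, instead of all K components as in batch EM."""
    K = len(mus)
    # posterior responsibility of each component for sample x
    p = np.array([priors[i] * gaussian_pdf(x, mus[i], variances[i])
                  for i in range(K)])
    post = p / p.sum()
    winner = np.argmax(post)  # component for which the pdf is maximal
    for i in range(K):
        # SOM-style neighbourhood weight on the component index
        h = np.exp(-0.5 * ((i - winner) / sigma_n) ** 2)
        mus[i] += lr * h * post[i] * (x - mus[i])
        priors[i] += lr * h * (post[i] - priors[i])
    priors /= priors.sum()  # keep the priors normalized
    return mus, priors
```

Note how the neighbourhood weight `h` localizes the update: components far from the winner on the map are left almost untouched, which is what makes each step cheaper than a full EM iteration.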
6.8.2.1 Stochastic Model and Parameter Estimation
BSOM proceeds as follows. Suppose that the distribution of the data points is given by $p(x)$. BSOM builds an estimate $\hat{p}(x)$ by constructing a mixture of K probability distribution functions (pdfs) $p_i(x)$ with associated parameters $\theta_i$, $i = 1 \dots K$, on a d-dimensional input space $x$. If $P_1, \dots, P_K$ are the prior probabilities of each pdf, then the joint probability density for each data sample is given by:

$$\hat{p}(x) = \sum_{i=1}^{K} p(x \mid \theta_i)\, P_i \qquad (6.61)$$
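As an illustration, the mixture density of Eq. (6.61) can be evaluated directly once a component family is chosen. The isotropic Gaussian components below are an assumption made for the sketch; the equation itself holds for any component pdf $p(x \mid \theta_i)$.

```python
import numpy as np

def mixture_density(x, mus, variances, priors):
    """Eq. (6.61): p_hat(x) = sum_i P_i * p(x | theta_i),
    sketched here with isotropic Gaussian components."""
    x = np.asarray(x)
    d = x.size
    total = 0.0
    for mu, var, P in zip(mus, variances, priors):
        diff = x - mu
        norm = (2 * np.pi * var) ** (-d / 2)  # Gaussian normalization
        total += P * norm * np.exp(-0.5 * diff @ diff / var)
    return total
```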
BSOM incrementally updates the mixture so as to minimize the divergence between the two distributions, measured by the Kullback-Leibler metric:
$$I = -\int p(x) \log\!\left(\frac{\hat{p}(x)}{p(x)}\right) dx \qquad (6.62)$$
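The integral in Eq. (6.62) rarely has a closed form for mixtures, but since it equals the expectation $E_{x \sim p}[\log p(x) - \log \hat{p}(x)]$, it can be estimated by Monte Carlo sampling from $p(x)$. The sketch below checks this estimator on two univariate Gaussians, a case where a closed form exists for comparison; the function names and the choice of test distributions are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_normal_pdf(x, mu, sigma):
    """Log-density of a univariate Gaussian N(mu, sigma^2)."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)

def kl_monte_carlo(mu1, s1, mu2, s2, n=200_000):
    """Monte Carlo estimate of Eq. (6.62) with p = N(mu1, s1^2) (the data
    distribution) and p_hat = N(mu2, s2^2) (the model):
    I = E_{x~p}[ log p(x) - log p_hat(x) ]."""
    x = rng.normal(mu1, s1, size=n)
    return np.mean(log_normal_pdf(x, mu1, s1) - log_normal_pdf(x, mu2, s2))

def kl_closed_form(mu1, s1, mu2, s2):
    """Closed-form KL divergence between two univariate Gaussians,
    used to sanity-check the Monte Carlo estimate."""
    return np.log(s2 / s1) + (s1**2 + (mu1 - mu2) ** 2) / (2 * s2**2) - 0.5
```

The same Monte Carlo form applies unchanged when $\hat{p}$ is the mixture of Eq. (6.61): only the log-density of the model changes.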
The update step (or M-step, by analogy to EM) consists of re-estimating the parameters of each pdf by minimizing $I$ via its partial derivatives $\partial I / \partial \theta_{ij}$ over each parameter, under the constraint that
© A.G.Billard 2004 – Last Update March 2011