
6.8.2 Bayesian self-organizing map

Adapted from Yin, H.; Allinson, N.M., "Self-organizing mixture networks for probability density estimation," IEEE Transactions on Neural Networks, Volume 12, Issue 2, March 2001, Pages 405–411.

The Bayesian self-organizing map (BSOM) is a method for estimating the probability distribution that generates a set of data points, on the basis of a Bayesian stochastic model. BSOM can be used to estimate the parameters of a Gaussian Mixture Model (GMM). In this case, BSOM estimates the GMM parameters by minimizing the Kullback-Leibler information metric between the model and the true density, and as such provides an alternative to the classical Expectation-Maximization (EM) method for estimating the GMM parameters, with better convergence speed and a greater ability to escape local minima. Since BSOM makes no assumption on the form of the component distributions (thanks to the KL metric), it can also be applied to estimate mixtures of distributions other than purely Gaussian ones.

The term SOM in BSOM comes from the fact that the update rule uses a neighborhood concept similar to that of the update rule in the self-organizing map (SOM) neural network. BSOM creates a set of K probability density functions (pdfs). Whereas EM updates the parameters of all the pdfs globally so as to maximize the total probability that the mixture explains the data well, BSOM updates each pdf separately, using only the subset of the data located in a neighborhood around the data point for which that pdf is maximal. The advantage is that the update and estimation steps are faster. The drawback is that, like the Kohonen network, the algorithm depends very much on the choice of the hyperparameters (number of pdfs, size of the neighborhood, learning rate).
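As an illustration, the following Python sketch shows how such a winner-based, neighborhood-weighted selection could look for spherical Gaussian components. It is only a sketch: the 1-D lattice over component indices, the Gaussian neighborhood kernel, and all function and parameter names are assumptions made for this example, not part of the BSOM specification.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Density of a spherical Gaussian component at point x."""
    d = x.shape[-1]
    diff = x - mean
    return (2.0 * np.pi * var) ** (-d / 2.0) * np.exp(-0.5 * np.dot(diff, diff) / var)

def neighborhood_weights(winner, K, sigma_n):
    """SOM-style Gaussian neighborhood over the component indices (assumed 1-D lattice)."""
    idx = np.arange(K)
    return np.exp(-0.5 * ((idx - winner) / sigma_n) ** 2)

def winner_and_weights(x, means, variances, sigma_n=1.0):
    """Pick the component whose pdf is maximal at x and weight its lattice neighbors."""
    densities = np.array([gaussian_pdf(x, m, v) for m, v in zip(means, variances)])
    winner = int(np.argmax(densities))   # the pdf that is maximal at this data point
    return winner, neighborhood_weights(winner, len(means), sigma_n)
```

Only the components that receive a non-negligible neighborhood weight are then updated for a given sample, which is what makes each update local and cheap.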

6.8.2.1 Stochastic Model and Parameter Estimation

BSOM proceeds as follows. Suppose that the distribution of the data points is given by p(x). BSOM builds an estimate \hat{p}(x) by constructing a mixture of K probability density functions (pdfs) p_i(x) with associated parameters \theta_i, i = 1...K, on a d-dimensional input space x. If P_1, ..., P_K are the prior probabilities of each pdf, then the joint probability density for each data sample is given by:

\hat{p}(x) = \sum_{i=1}^{K} p(x \mid \theta_i) \, P_i     (6.61)
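For concreteness, Equation (6.61) can be evaluated numerically as in the sketch below; the choice of spherical Gaussian components and the variable names are assumptions made for the example.

```python
import numpy as np

def mixture_density(x, priors, means, variances):
    """Evaluate p_hat(x) = sum_i P_i p(x | theta_i) for spherical Gaussian components (Eq. 6.61)."""
    d = x.shape[-1]
    p_hat = 0.0
    for P_i, m_i, v_i in zip(priors, means, variances):
        diff = x - m_i
        p_hat += P_i * (2.0 * np.pi * v_i) ** (-d / 2.0) * np.exp(-0.5 * np.dot(diff, diff) / v_i)
    return p_hat
```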

BSOM updates the mixture incrementally so as to minimize the divergence between the two distributions, measured by the Kullback-Leibler metric:

I = -\int p(x) \log\left( \frac{\hat{p}(x)}{p(x)} \right) dx     (6.62)
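Since the true density p(x) is unknown in practice, Equation (6.62) can only be monitored approximately. The sketch below estimates it by Monte Carlo when samples from p(x) are available and both densities can be evaluated; the function names and the sampling scheme are assumptions made for illustration.

```python
import numpy as np

def kl_divergence_mc(samples, p_true, p_hat):
    """Monte Carlo estimate of I = -E_p[log(p_hat(x) / p(x))] (Eq. 6.62).

    samples : points drawn from the true density p(x)
    p_true  : callable returning p(x) at a point
    p_hat   : callable returning the mixture estimate p_hat(x) at a point
    """
    logs = [np.log(p_hat(x) / p_true(x)) for x in samples]
    return -float(np.mean(logs))
```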

The update step (or M-step, by analogy to EM) consists of re-estimating the parameters of each pdf by minimizing I through its partial derivatives \partial I / \partial \theta_{ij} with respect to each parameter, under the constraint that the prior probabilities form a valid distribution, \sum_{i=1}^{K} P_i = 1.
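A minimal sketch of what the resulting stochastic update could look like for spherical Gaussian components is given below. It follows the general recipe of Yin and Allinson's self-organizing mixture network (posterior-weighted, neighborhood-gated updates of the winner's neighbors, and a convex prior update that preserves \sum_i P_i = 1), but the specific learning rate, the 1-D lattice, and the spherical-Gaussian assumption are choices made for this example rather than the definitive BSOM equations.

```python
import numpy as np

def bsom_update(x, priors, means, variances, lr=0.05, sigma_n=1.0):
    """One illustrative BSOM-style update for a single sample x (spherical Gaussians)."""
    K, d = means.shape

    # Component densities and posterior responsibilities P(i | x).
    dens = np.array([(2 * np.pi * v) ** (-d / 2) *
                     np.exp(-0.5 * np.dot(x - m, x - m) / v)
                     for m, v in zip(means, variances)])
    p_hat = float(np.dot(priors, dens))
    post = priors * dens / p_hat

    # Winner = component whose pdf is maximal at x; SOM-like neighborhood over indices.
    winner = int(np.argmax(dens))
    eta = np.exp(-0.5 * ((np.arange(K) - winner) / sigma_n) ** 2)

    # Local, posterior-weighted updates of means and (per-dimension) variances.
    for i in range(K):
        g = lr * eta[i] * post[i]
        means[i] += g * (x - means[i])
        variances[i] += g * (np.dot(x - means[i], x - means[i]) / d - variances[i])

    # Convex update of the priors keeps them non-negative and summing to one.
    priors[:] = priors + lr * (post - priors)
    return priors, means, variances
```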
