Target Discovery and Validation Reviews and Protocols


\[
p(D \mid G) = \int p(D, \theta \mid G)\, d\theta = \int p(D \mid \theta, G)\, p(\theta \mid G)\, d\theta. \qquad (2)
\]

Here, $p(D \mid \theta, G)$ is the likelihood function, and $p(\theta \mid G)$ is the prior distribution on the parameter $\theta$. Because the normalizing constant $p(D)$ does not depend on $G$, we can find

\[
\pi_{\mathrm{post}}(G \mid D) \propto \pi_{\mathrm{prior}}(G)\, p(D \mid G).
\]

Hence, for computing the posterior probability for structural learning of Bayesian networks, we need to compute the high-dimensional integral in Eq. 2 (see Note 5 for further information on learning Bayesian networks).
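As a minimal sketch of how this proportionality is used in practice, the snippet below scores a finite set of candidate graphs and renormalizes in log space. The functions `log_prior` and `log_marginal_likelihood` are hypothetical placeholders (evaluating the integral in Eq. 2 is a separate problem), not part of the protocol.

```python
from math import exp, log

def posterior_over_candidates(graphs, data, log_prior, log_marginal_likelihood):
    """Normalize log pi_prior(G) + log p(D | G) over a finite candidate
    set of graphs, returning pi_post(G | D) for each candidate."""
    scores = [log_prior(g) + log_marginal_likelihood(data, g) for g in graphs]
    m = max(scores)  # log-sum-exp trick avoids underflow of tiny likelihoods
    log_z = m + log(sum(exp(s - m) for s in scores))
    return [exp(s - log_z) for s in scores]
```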

3.2.2. Bayesian Networks for Discrete Data

Suppose that we have a set of random variables $\{X_1, \ldots, X_p\}$ and that a random variable takes one of $m$ values $\{u_1, \ldots, u_m\}$. Then, we put the conditional probability as

\[
\theta_{jkl} = \Pr(X_j = u_k \mid \mathrm{Pa}(X_j) = u_{jl})
\]

for $j = 1, \ldots, p$; $k = 1, \ldots, m$; $l = 1, \ldots, m^{|\mathrm{Pa}(X_j)|}$. Note that $\sum_{k=1}^{m} \theta_{jkl} = 1$ holds. Here, $u_{jl}$ is the $l$th entry of the state table of $\mathrm{Pa}(X_j)$. For example, in Fig. 4, $\mathrm{Pa}(X_1) = \{X_2, X_3\}$ contains $m^2$ entries: $u_{11} = (u_1, u_1)$, $u_{12} = (u_1, u_2), \ldots, u_{1m^2} = (u_m, u_m)$. In this case, we can assume that $X_j \mid \mathrm{Pa}(X_j) = u_{jl}$ follows the multinomial distribution with probabilities $\theta_{j1l}, \ldots, \theta_{jml}$.
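To make the indexing of the state table concrete, here is a minimal sketch (the function name `parent_state_table` is ours) that enumerates $u_{j1}, \ldots, u_{jm^{|\mathrm{Pa}(X_j)|}}$ in the lexicographic order used in the example above.

```python
from itertools import product

def parent_state_table(values, n_parents):
    """Enumerate all joint parent states; with q parents each taking one
    of the m values, the table has m**q entries."""
    return list(product(values, repeat=n_parents))

# Mirroring the text: Pa(X_1) = {X_2, X_3}, m = 3, so m**2 = 9 entries.
values = ["u1", "u2", "u3"]
table = parent_state_table(values, 2)
print(len(table))                     # 9
print(table[0], table[1], table[-1])  # ('u1', 'u1') ('u1', 'u2') ('u3', 'u3')
```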

Suppose we have $n$ independent observations $D = \{x_1, \ldots, x_n\}$ for $\{X_1, \ldots, X_p\}$, where $x_{ij}$ is one of the $m$ values $\{u_1, \ldots, u_m\}$. Based on these observations, the likelihood function of $\theta = (\theta_{jkl})_{j,k,l}$ is written as

\[
p(D \mid \theta, G) = \prod_{j} \prod_{k} \prod_{l} \theta_{jkl}^{N_{jkl}},
\]

where $N_{jkl}$ is the number of pairs satisfying $(X_j, \mathrm{Pa}(X_j)) = (u_k, u_{jl})$ in $D$.
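Because the counts $N_{jkl}$ are sufficient statistics, the log-likelihood can be accumulated in a single pass over $D$. The following is a minimal sketch under assumed data structures (observations as dicts from variable name to value, a `parents` map giving $\mathrm{Pa}(X_j)$, and `theta` keyed by $(j, u_k, u_{jl})$); the names are ours, not part of the chapter.

```python
from collections import Counter
from math import log

def log_likelihood(data, parents, theta):
    """log p(D | theta, G) = sum over (j, k, l) of N_jkl * log(theta_jkl)."""
    counts = Counter()
    for x in data:                                   # one pass over D
        for j, pa in parents.items():
            parent_state = tuple(x[v] for v in pa)   # observed u_jl
            counts[(j, x[j], parent_state)] += 1     # increment N_jkl
    return sum(n * log(theta[key]) for key, n in counts.items())
```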

In the computation of the marginal likelihood $p(D \mid G)$ given in Eq. 2 for Bayesian networks for discrete data, the Dirichlet distribution is usually used as the prior distribution on the parameter, $p(\theta \mid G)$. Suppose that the prior distribution on $\theta$ can be decomposed as

\[
p(\theta \mid G) = \prod_{j} \prod_{l} p(\theta_{jl} \mid \alpha_{jl}),
\]

where $\theta_{jl} = (\theta_{j1l}, \ldots, \theta_{jml})^t$ and $\alpha_{jl} = (\alpha_{j1l}, \ldots, \alpha_{jml})^t$. Here, $\alpha_{jl}$ is called a hyperparameter vector. We assume the Dirichlet prior for $p(\theta_{jl} \mid \alpha_{jl})$ as

\[
p(\theta_{jl} \mid \alpha_{jl}) = \frac{\Gamma\!\left(\sum_{k} \alpha_{jkl}\right)}{\prod_{k'} \Gamma(\alpha_{jk'l})} \prod_{k} \theta_{jkl}^{\alpha_{jkl} - 1}, \qquad (3)
\]
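For reference, the density in Eq. 3 is easiest to evaluate in log space via the log-gamma function; this is a minimal sketch (function name ours), not part of the chapter's protocol.

```python
from math import lgamma, log

def log_dirichlet_pdf(theta_jl, alpha_jl):
    """Log of Eq. 3 for one node j and parent state l; theta_jl must
    be positive and sum to 1."""
    norm = lgamma(sum(alpha_jl)) - sum(lgamma(a) for a in alpha_jl)
    return norm + sum((a - 1.0) * log(t) for a, t in zip(alpha_jl, theta_jl))

# With all alpha_jkl = 1 the prior is uniform over the simplex:
print(log_dirichlet_pdf([0.2, 0.3, 0.5], [1.0, 1.0, 1.0]))  # log 2 ≈ 0.693
```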
