02.03.2013 Views

Target Discovery and Validation Reviews and Protocols

Target Discovery and Validation Reviews and Protocols

Target Discovery and Validation Reviews and Protocols

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

Gene Networks 43<br />

where Γ ( . ) is the gamma function. Then, the marginal likelihood p(D|G) can be<br />

analytically computed as<br />

Γ(α j⋅l ) Γ(α jkl + N jkl )<br />

p(D | G) = ∏∏<br />

∏<br />

,<br />

(4)<br />

Γ(α j ⋅l + N j⋅l ) Γ(α jkl )<br />

where α j·l = ∑ α jkl <strong>and</strong> N By replacing P(D|G) in Eqs. 3 by 4,<br />

k<br />

j ·l = ∑ k Njkl .<br />

we obtain πpost (G|D) for the multinomial distribution <strong>and</strong> the Dirichlet prior<br />

to perform the structural learning of the Bayesian networks for discrete data.<br />

This criterion is called the BDe metric originally derived by Cooper <strong>and</strong><br />

Herskovits (18).<br />

3.2.3. Bayesian Networks for Continuous Data<br />

j<br />

l<br />

Let X j be a continuous type r<strong>and</strong>om variable. This situation is more realistic for<br />

gene network inference problem from microarray data, because microarray data<br />

essentially take continuous variables. In this sense, when we apply the Bayesian<br />

network for discrete data to microarray data, we need to discretize expression data<br />

into several categories. The number of categories is usually set to be three, i.e.,<br />

“suppressed,” “unchanged,” <strong>and</strong> “overexpressed.” However, actually the number<br />

of categories is a parameter to be estimated, <strong>and</strong> we should notice that the discretization<br />

leads to information loss. Therefore, in microarray data analysis, a computational<br />

method that can h<strong>and</strong>le microarray data as continuous data is advocated.<br />

As for discretization of gene expression data, several methods are listed in Note 6.<br />

The decomposition of the joint probability can be hold for continuous<br />

variables by using densities instead of the probabilistic measure <strong>and</strong> we have<br />

f (xi1,..., xip |θ) = ∏ f j (xij |pa(X j) i, θ j) ,<br />

j<br />

t t t<br />

θ = (θ1 , ...θp) where f, f1 ,…, fp are densities, is the parameter vector, <strong>and</strong><br />

pa(Xj ) i is the set of observations of Pa(Xj ) measured by ith microarray. The<br />

likelihood function is then given by<br />

p(D |θ,G) = ∏ ∏ f j (xij |pa(X j ) i , θ j ) .<br />

i<br />

j<br />

Therefore, in the Bayesian networks for continuous data, the construction of<br />

the conditional densities f j (xij |pa(X j ) i , θ j ) is crucial, <strong>and</strong> this is essentially<br />

the same as the regression problem.<br />

The linear regression model can be used to construct the conditional density<br />

(6) <strong>and</strong> is written as<br />

k

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!