12.07.2015 Views

View - ResearchGate


4 Bidaut

Table 1
Input File Format Used by BDrun

Gene name    Mutant1                 Mutant2                 Mutant3
Gene_1       Expression_value_1_1    Expression_value_1_2    Expression_value_1_3
Gene_2       Expression_value_2_1    Expression_value_2_2    Expression_value_2_3
Gene_3       Expression_value_3_1    Expression_value_3_2    Expression_value_3_3

Values are tab-delimited. "Expression value" is a generic term and may be an absolute expression value or a ratio of experiment/control. Log values are not acceptable because BD performs the factorization on positive, additive distributions. The uncertainties file uses the exact same format.

A snapshot of the annotations (April 2005) is also available from the supporting website (annot.txt).

• The list of experiments (conditions in the Rosetta data set) must be supplied for later visualization in ClutrFree. The list is provided on the supporting website as a tab-delimited file (expnames.txt).

3. Methods

First, the approach is discussed in Subheading 3.1. Pattern recognition is then performed in Subheading 3.2., followed by visualization, interpretation, and functional analysis in Subheading 3.3.

3.1. Introduction to the BD Algorithm

The BD algorithm is a matrix factorization algorithm that simultaneously retrieves two matrices, A and P, which, when multiplied together, reconstruct the expression data D up to the noise ε:

D = AP + ε

D is the gene-expression data matrix, and P is a set of basis vectors onto which the data are projected. The A matrix is a set of coefficients that allows the reconstruction of D through multiplication of A and P, i.e., it gives the contribution of each basis vector to each gene (Fig. 1). For more details on the underlying mathematics, see ref. 11. Briefly, BD is a Gibbs sampler that samples the solution space using an atomic prior (12) and minimizes the χ² distance between the data D and the model A · P.
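To make the data format and the misfit measure concrete, the following is a minimal sketch, not BD or BDrun itself. It reads a small table in the Table 1 layout and evaluates a χ² misfit between D and the product AP. The function names (`parse_table`, `matmul`, `chi_squared`), the toy numbers, the rejection of negative values, and the use of the uncertainties file as the per-element σ in the standard form χ² = Σ ((D − AP) / σ)² are all assumptions made for illustration; the Gibbs sampling and the atomic prior are not reproduced here.

```python
import csv
import io


def parse_table(text):
    """Parse a tab-delimited table laid out like Table 1.

    Returns (genes, conditions, values). Hypothetical helper, not part of BDrun.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    conditions = header[1:]              # the first column holds the gene names
    genes, values = [], []
    for row in reader:
        if not row:
            continue                     # skip blank lines
        numbers = [float(x) for x in row[1:]]
        if len(numbers) != len(conditions):
            raise ValueError(f"{row[0]}: expected {len(conditions)} values")
        # Assumed sanity check: negative entries suggest log-transformed data,
        # which the text says is not acceptable.
        if any(x < 0 for x in numbers):
            raise ValueError(f"{row[0]}: negative value, possibly log data")
        genes.append(row[0])
        values.append(numbers)
    return genes, conditions, values


def matmul(A, P):
    """Product of A (genes x patterns) and P (patterns x conditions)."""
    return [[sum(a * p for a, p in zip(a_row, p_col)) for p_col in zip(*P)]
            for a_row in A]


def chi_squared(D, A, P, sigma):
    """Sum over all elements of ((D - AP) / sigma) squared."""
    model = matmul(A, P)
    return sum(((d - m) / s) ** 2
               for d_row, m_row, s_row in zip(D, model, sigma)
               for d, m, s in zip(d_row, m_row, s_row))


# Toy data (invented for illustration): two genes, three conditions.
data_text = (
    "Gene name\tMutant1\tMutant2\tMutant3\n"
    "Gene_1\t2.0\t4.0\t6.0\n"
    "Gene_2\t1.0\t1.0\t2.0\n"
)
# The uncertainties file uses the same layout as the data file.
sigma_text = (
    "Gene name\tMutant1\tMutant2\tMutant3\n"
    "Gene_1\t0.5\t0.5\t0.5\n"
    "Gene_2\t0.5\t0.5\t0.5\n"
)

genes, conditions, D = parse_table(data_text)
_, _, sigma = parse_table(sigma_text)

# A single hypothetical pattern: P is 1 x 3 and A is 2 x 1.
P = [[1.0, 2.0, 3.0]]
A = [[2.0], [1.0]]

chi2 = chi_squared(D, A, P, sigma)
print(genes, conditions, chi2)
```

In this toy case the single pattern reproduces Gene_1 exactly and misses two of the Gene_2 values by one unit each, so only those two elements contribute to the misfit. A sampler such as BD would explore many candidate A and P pairs rather than score one fixed pair.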
The algorithm operates in two stages. The first is the burn-in stage, during which the Markov chain reaches an area of high probability and equilibrates. The second is the sampling stage, during which samples are taken to construct a distribution for the elements of A and P, leading to a measure of the mean and standard deviation for each element.

3.1.1. Application to the Rosetta Compendium

The two matrices P and A generated by BD from the Rosetta Compendium contain, respectively, a series of patterns and the distribution of those patterns
