12.07.2015 Views

View - ResearchGate

36 Wang and Ochs

function (9). Herein the authors present the methodology and demonstrate how to apply it to estimate gene function.

NMF aims to solve a problem in which a data matrix (D) can be decomposed as

$$D = M + \varepsilon = AP + \varepsilon \qquad (1)$$

where M represents a reconstruction of the data from two new matrices, A (amplitude) and P (pattern), and ε is the error on each element of D. For microarray data, the matrix D provides the estimates of mRNA levels for genes: each column corresponds to the estimates for a single condition, and each row holds the processed intensities for a single gene across all conditions. A and P are the decomposed matrices, which define the assignment of genes to patterns (A) and the behavior of patterns across conditions (P). Each row of P can therefore be viewed as an expression pattern, and each column of A as the amplitude distribution of every gene in the corresponding pattern. Genes linked within a column of A are thus linked to the behavior represented by the corresponding row of P, and these genes can be expected to share one or more biological behaviors. By comparing genes of unknown function in these groups to genes of known function, the function of the unknown genes can be predicted. Similarly, by noting the genes whose pattern is related to a biological process (such as expression at a specific phase of the cell cycle), the biological role of these genes can be predicted.

The key issue in applying an NMF approach to a problem is the choice of cost function that will guide the analysis to the desired result. The cost function determines how the algorithm measures the difference between the data (D) and the estimate of the data, M.
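For instance, if two genes vary together but with different amplitudes, a Pearson correlation would be more useful than a measure that takes differences in expression level into account, such as Euclidean distance. The change in cost function is the primary improvement of least squares nonnegative matrix factorization (LS-NMF) for microarray data: LS-NMF minimizes the normalized χ² measure

$$E_{\sigma^2} = \left\lVert \frac{D - M}{\sigma} \right\rVert^2 = \sum_{ij} \left( \frac{D_{ij} - M_{ij}}{\sigma_{ij}} \right)^2 = \sum_{ij} \left( \frac{D_{ij} - \sum_k A_{ik} P_{kj}}{\sigma_{ij}} \right)^2 \qquad (2)$$

where $\sigma_{ij}$ is the uncertainty of element $D_{ij}$, instead of the squared Euclidean distance

$$E_e = \lVert D - M \rVert^2 = \sum_{ij} \left( D_{ij} - M_{ij} \right)^2 = \sum_{ij} \left( D_{ij} - \sum_k A_{ik} P_{kj} \right)^2 \qquad (3)$$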
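The decomposition in Eq. (1) and the two cost functions can be made concrete with a short NumPy sketch. It is not the authors' implementation: the function names (`euclidean_cost`, `chi2_cost`, `ls_nmf`, `assign_patterns`), the toy data, and the uncertainties `sigma` are all hypothetical. We assume that Eq. (2) can be minimized with Lee and Seung style multiplicative updates in which every element is weighted by $1/\sigma_{ij}^2$. We also link each gene to the pattern with the largest amplitude in its row of A, after rescaling each row of P to a maximum of 1. Both are illustrative choices, not necessarily the chapter's procedure.

```python
import numpy as np


def euclidean_cost(D, A, P):
    """Squared Euclidean distance, Eq. (3): sum over ij of (D_ij - sum_k A_ik P_kj)^2."""
    return float(np.sum((D - A @ P) ** 2))


def chi2_cost(D, A, P, sigma):
    """Normalized chi^2 measure, Eq. (2): each residual is divided by sigma_ij before squaring."""
    return float(np.sum(((D - A @ P) / sigma) ** 2))


def ls_nmf(D, sigma, k, n_iter=2000, seed=0, eps=1e-12):
    """Assumed LS-NMF sketch: multiplicative updates weighted by 1 / sigma_ij^2.

    A (genes x k) holds amplitudes and P (k x conditions) holds patterns. Both stay
    nonnegative because each update multiplies by a ratio of nonnegative terms.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_conditions = D.shape
    A = rng.random((n_genes, k)) + eps
    P = rng.random((k, n_conditions)) + eps
    W = 1.0 / sigma ** 2  # elements with small uncertainty get large weight
    for _ in range(n_iter):
        A *= ((W * D) @ P.T) / ((W * (A @ P)) @ P.T + eps)
        P *= (A.T @ (W * D)) / (A.T @ (W * (A @ P)) + eps)
    # Remove the scale ambiguity: rescale each pattern to a maximum of 1 and move
    # the factor into the matching column of A, leaving A @ P unchanged.
    scale = P.max(axis=1, keepdims=True)
    return A * scale.T, P / scale


def assign_patterns(A):
    """Hypothetical helper: index of the largest-amplitude pattern for each gene."""
    return np.argmax(A, axis=1)


# Invented toy data: two patterns over six conditions, four genes.
P_true = np.array([[1.0, 2.0, 3.0, 3.0, 2.0, 1.0],
                   [3.0, 1.0, 0.5, 0.5, 1.0, 3.0]])
A_true = np.array([[2.0, 0.0],
                   [1.0, 0.1],
                   [0.0, 1.5],
                   [0.1, 3.0]])
D = A_true @ P_true
sigma = 0.1 * D + 0.05  # invented per-element uncertainties

A, P = ls_nmf(D, sigma, k=2)
labels = assign_patterns(A)
print("chi2 cost:", chi2_cost(D, A, P, sigma))
print("euclidean cost:", euclidean_cost(D, A, P))
print("pattern per gene:", labels)
```

Because NMF patterns are determined only up to order and scale, the toy example should be read in terms of which genes end up grouped together (genes 0 and 1 versus genes 2 and 3), not which pattern index they receive. With real data, the next step would be to compare the unknown genes in each group with genes of known function.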
