Identifying Differentially Expressed Gene Combinations 179

maximizing S crossover 2D projections, would be likely to capture planar crosses buried in the higher dimensional space.

3.1.4. The ECFs

The F-statistic originates in analysis of variance (13,14) when testing whether the variation in a response of interest, say gene expression, depends on a class label. It is proportional to the ratio of the between class variance to the within class variance.

\[
F = \frac{\dfrac{1}{K-1}\sum_{k} n_k\,(\bar{x}_k - \bar{x})^2}{\dfrac{1}{n-K}\sum_{k}\sum_{j}(x_{kj} - \bar{x}_k)^2},
\]

where $x_{kj}$ is the gene expression of the j-th individual in class k, k = 1, . . ., K; $n_k$ is the number of samples in class k and $n$ is the total number of samples; $\bar{x}_k$ is the mean expression of samples in class k and $\bar{x}$ is the overall mean expression in the pooled classes. If the between group variance is much larger than the within group variance, it can be inferred that gene expression is related to phenotype.

The ECF extends the F-statistic to test for coexpression in pairs of genes (9). If X and Y represent the expression intensities for a pair of genes, the conditional F-statistic (for outcome X conditional on Y = y) can be written as:

\[
F^{*}_{X|Y=y} = \frac{\displaystyle\sum_{k<k'} p_k p_{k'} \left[ (\mu_{X_k} - \mu_{X_{k'}}) - \left( \frac{\mu_{Y_k}\rho_k\sigma_{X_k}}{\sigma_{Y_k}} - \frac{\mu_{Y_{k'}}\rho_{k'}\sigma_{X_{k'}}}{\sigma_{Y_{k'}}} \right) + \left( \frac{\rho_k\sigma_{X_k}}{\sigma_{Y_k}} - \frac{\rho_{k'}\sigma_{X_{k'}}}{\sigma_{Y_{k'}}} \right) y \right]^2}{\displaystyle\sum_{k} p_k\,\sigma_{X_k}^2\,(1-\rho_k^2)} \tag{6}
\]

where $p_k = n_k/n$ is the proportion of samples in class k, $\rho_k$ is the class-conditional correlation of X and Y, $\mu_{X_k}$ and $\mu_{Y_k}$ are the class conditional means of X and Y, and $\sigma_{X_k}$ and $\sigma_{Y_k}$ are the class conditional standard deviations. The expression above depends on a specific value Y = y. To take the expectation over all possible values of Y, the conditional F-statistic is then weighted by the probability density of Y and integrated as follows:

\[
E_Y\!\left(F^{*}_{X|Y=y}\right) = \int F^{*}_{X|Y=y}\, f_Y(y)\, dy
\]

This leads to the final form of the ECF-statistic:

\[
E_Y\!\left(F^{*}_{X|Y=y}\right) = \left[\sum_{k} p_k\,\sigma_{X_k}^2\,(1-\rho_k^2)\right]^{-1} \sum_{k<k'}\sum_{k''} p_k p_{k'} p_{k''} \left\{ \left[ (\mu_{X_k} - \mu_{X_{k'}}) - \left( \frac{\mu_{Y_k}\rho_k\sigma_{X_k}}{\sigma_{Y_k}} - \frac{\mu_{Y_{k'}}\rho_{k'}\sigma_{X_{k'}}}{\sigma_{Y_{k'}}} \right) + \left( \frac{\rho_k\sigma_{X_k}}{\sigma_{Y_k}} - \frac{\rho_{k'}\sigma_{X_{k'}}}{\sigma_{Y_{k'}}} \right) \mu_{Y_{k''}} \right]^2 + \left( \frac{\rho_k\sigma_{X_k}}{\sigma_{Y_k}} - \frac{\rho_{k'}\sigma_{X_{k'}}}{\sigma_{Y_{k'}}} \right)^2 \sigma_{Y_{k''}}^2 \right\}. \tag{7}
\]
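As a rough illustration of the two statistics defined in this section, the following Python sketch computes the single-gene F-statistic and the ECF for a gene pair. It is not the authors' implementation. The function names `f_statistic` and `ecf` are hypothetical, and the code assumes that the class-conditional means, standard deviations, and correlations are replaced by their sample estimates (a plug-in choice the text does not specify). Inside `ecf`, the bracketed term is written as a difference of per-class regression intercepts and slopes of X on Y, which is algebraically the same expression.

```python
import numpy as np


def f_statistic(x, labels):
    """One-way ANOVA F: between-class variance over within-class variance."""
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    n, K = x.size, classes.size
    grand_mean = x.mean()
    between = 0.0
    within = 0.0
    for c in classes:
        xc = x[labels == c]
        between += xc.size * (xc.mean() - grand_mean) ** 2
        within += ((xc - xc.mean()) ** 2).sum()
    return (between / (K - 1)) / (within / (n - K))


def ecf(x, y, labels):
    """Expected conditional F for outcome X given Y (plug-in sample moments)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    K = classes.size

    # Class proportions p_k and class-conditional moments (assumed estimates).
    p = np.array([np.mean(labels == c) for c in classes])
    mu_x = np.array([x[labels == c].mean() for c in classes])
    mu_y = np.array([y[labels == c].mean() for c in classes])
    sd_x = np.array([x[labels == c].std(ddof=1) for c in classes])
    sd_y = np.array([y[labels == c].std(ddof=1) for c in classes])
    rho = np.array(
        [np.corrcoef(x[labels == c], y[labels == c])[0, 1] for c in classes]
    )

    # Per-class regression of X on Y: slope rho*sd_x/sd_y, intercept mu_x - slope*mu_y.
    slope = rho * sd_x / sd_y
    intercept = mu_x - slope * mu_y

    # Denominator: pooled conditional (within-class) variance of X given Y.
    within = np.sum(p * sd_x**2 * (1.0 - rho**2))

    # Numerator: sum over class pairs k < k' and over mixture components k''.
    total = 0.0
    for k in range(K):
        for k2 in range(k + 1, K):
            a = intercept[k] - intercept[k2]
            b = slope[k] - slope[k2]
            for k3 in range(K):
                total += p[k] * p[k2] * p[k3] * (
                    (a + b * mu_y[k3]) ** 2 + b**2 * sd_y[k3] ** 2
                )
    return total / within
```

A call such as `ecf(x, y, labels)` takes two expression vectors and a vector of class labels of the same length. Each class needs at least two samples with non-constant X and Y, and the statistic is undefined if every class has a correlation of exactly ±1, since the denominator is then zero.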

