Sample A: Cover Page of Thesis, Project, or Dissertation Proposal
transformation and quantile normalization of the data by array [18]. The additive model is as
follows:
$$Y_{ij} = \mu_i + \alpha_j + \varepsilon_{ij} \qquad (2)$$
where i indexes the samples and j indexes the ProbeSets. This model averages the (PM only) probes per
sample, accepting some random error in the model ($\varepsilon_{ij}$), and assumes that the probes have been
designed such that the accumulated probe affinities sum to zero, $\sum_j \alpha_j = 0$ [18]. The algorithm uses
median polish [16] to detect outlier probes, which violate the probe affinity assumption of the
model. In both of these algorithms, some ProbeSet value is determined for every set on the array.
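As an illustration of how median polish fits an additive model of this form, the following is a minimal sketch in plain Python. It is not the implementation used in the cited work: the function name `median_polish`, the iteration limit, and the tolerance are assumptions made for the example, and the toy matrix is invented. Rows are treated as samples (i) and columns as probes (j). Median polish centers the probe effects so that their median, rather than their sum, is zero.

```python
from statistics import median

def median_polish(y, max_iter=10, tol=1e-6):
    """Fit y[i][j] ~ mu[i] + alpha[j] + resid[i][j] by Tukey's median polish.

    y is a list of rows; rows are samples (i), columns are probes (j).
    Returns (mu, alpha, resid). Hypothetical helper for illustration only.
    """
    n, m = len(y), len(y[0])
    resid = [list(row) for row in y]
    overall = 0.0
    row_eff = [0.0] * n   # sample effects relative to the overall level
    col_eff = [0.0] * m   # probe effects (alpha_j)
    for _ in range(max_iter):
        change = 0.0
        # Sweep out the median of each row (sample).
        for i in range(n):
            r = median(resid[i])
            row_eff[i] += r
            resid[i] = [v - r for v in resid[i]]
            change += abs(r)
        # Move the median of the probe effects into the overall level.
        c = median(col_eff)
        overall += c
        col_eff = [v - c for v in col_eff]
        # Sweep out the median of each column (probe).
        for j in range(m):
            c = median(resid[i][j] for i in range(n))
            col_eff[j] += c
            for i in range(n):
                resid[i][j] -= c
            change += abs(c)
        # Move the median of the sample effects into the overall level.
        r = median(row_eff)
        overall += r
        row_eff = [v - r for v in row_eff]
        if change < tol:
            break
    mu = [overall + r for r in row_eff]
    return mu, col_eff, resid

# Toy log-scale intensities: three samples, three probes, one aberrant probe value.
y = [
    [7.0, 8.0, 9.0],
    [9.0, 10.0, 16.0],   # last value is 5 units above the additive pattern
    [8.0, 9.0, 10.0],
]
mu, alpha, resid = median_polish(y)
```

In this toy case the aberrant value does not distort the per-sample estimates; it shows up as a large residual, which is the property that makes the procedure useful for flagging outlier probes.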
In contrast, the BaFL cleansing protocol eliminates probes and, by enforcing a minimum set
size (for statistical rigor) and consistency of set members across samples, may
remove entire ProbeSets. This often results in the absence of a large fraction of the original
data set: in the case of the Bhattacharjee dataset, 66% of the original ProbeSets are removed. It
must be acknowledged that the disparity in the number of genes in the input set makes a
straightforward comparison of the output lists of the three methods problematic, but it is possible
to highlight some sources of error that the statistical methods do not identify and exclude. The
output data files are included in the Supplementary Materials, in the Data folder.
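The two constraints described above, consistency of set members across samples and a minimum set size, can be sketched as follows. This is only an illustration of the idea, not the BaFL code itself: the function name `filter_probesets`, the input layout, and the threshold `min_probes=4` are assumptions for the example, and the probe identifiers are invented.

```python
def filter_probesets(kept, min_probes=4):
    """Keep only probes that survive cleansing in every sample, then drop
    ProbeSets left with fewer than min_probes members.

    kept: {probeset_id: {sample_id: set of probe ids that passed cleansing}}
    min_probes: hypothetical threshold; the source does not state the value.
    Returns {probeset_id: sorted list of consistently retained probe ids}.
    """
    result = {}
    for probeset_id, per_sample in kept.items():
        if not per_sample:
            continue
        # Consistency: a probe must be present in all samples.
        common = set.intersection(*per_sample.values())
        # Minimum set size: otherwise the whole ProbeSet is removed.
        if len(common) >= min_probes:
            result[probeset_id] = sorted(common)
    return result

# Invented example: ps2 loses too many probes and is removed entirely.
kept = {
    "ps1": {"s1": {1, 2, 3, 4, 5}, "s2": {1, 2, 3, 4}},
    "ps2": {"s1": {1, 2, 3}, "s2": {2, 3, 4, 5, 6}},
}
filtered = filter_probesets(kept)
```

Here `ps1` keeps the four probes shared by both samples, while `ps2` retains only two shared probes and is dropped, which mirrors how whole ProbeSets can disappear from the output.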
Down-Selection<br />
A microarray experiment typically has 10,000 features to explore, at the gene level, of which
around half are expected to be expressed; generally only a small proportion of the genes will be
differentially expressed. Many of the statistical methods used to determine whether differences