02.08.2013 Views

Sample A: Cover Page of Thesis, Project, or Dissertation Proposal


transformation and quantile normalization of the data by array [18]. The additive model is as follows:

$$Y_{ij} = \mu_i + \alpha_j + \varepsilon_{ij}, \qquad (2)$$

where i indexes the samples and j the ProbeSets. This model averages the (PM only) probes per sample, accepting some random error in the model ($\varepsilon_{ij}$), and assumes that the probes have been designed such that the accumulated probe affinities sum to zero, $\sum_j \alpha_j = 0$ [18]. The algorithm implements the median polish [16] to detect outlier probes, which violate the probe affinity assumption of the model. In both of these algorithms, some ProbeSet value is determined for every set on the array.
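As a rough illustration of how a median polish fits an additive model of the form "observation = sample effect + probe effect + residual", the sketch below implements Tukey's procedure with NumPy. It is not the code of the cited algorithm: the function name `median_polish`, the iteration limit, the tolerance, and the toy 3 x 3 matrix are all assumptions made for this example. Note also that median polish centers the probe effects so that their median, rather than their sum, is zero.

```python
import numpy as np


def median_polish(y, max_iter=10, tol=1e-6):
    """Fit y[i, j] ~ overall + row[i] + col[j] by Tukey's median polish.

    Rows stand for samples and columns for probes (an assumed layout).
    Returns (overall, row, col, resid); the identity
    y == overall + row[:, None] + col[None, :] + resid holds throughout.
    """
    resid = np.asarray(y, dtype=float).copy()
    overall = 0.0
    row = np.zeros(resid.shape[0])
    col = np.zeros(resid.shape[1])

    for _ in range(max_iter):
        # Sweep row medians out of the residuals into the row effects.
        row_med = np.median(resid, axis=1)
        resid -= row_med[:, None]
        row += row_med
        # Re-center the column effects; move their median into the overall term.
        shift = np.median(col)
        col -= shift
        overall += shift

        # Sweep column medians out of the residuals into the column effects.
        col_med = np.median(resid, axis=0)
        resid -= col_med[None, :]
        col += col_med
        # Re-center the row effects in the same way.
        shift = np.median(row)
        row -= shift
        overall += shift

        # Stop once a full sweep no longer changes anything.
        if np.abs(row_med).max() < tol and np.abs(col_med).max() < tol:
            break

    return overall, row, col, resid


# Toy data (hypothetical): three samples, three probes, one outlier probe value.
mu = np.array([8.0, 9.0, 10.0])      # per-sample expression levels
alpha = np.array([-1.0, 0.0, 1.0])   # probe affinities
y = mu[:, None] + alpha[None, :]
y[0, 2] += 5.0                       # contaminate one probe in sample 0

overall, row, col, resid = median_polish(y)
print("sample estimates:", overall + row)
print("probe effects:", col)
print("residuals:")
print(resid)
```

In this toy case the contaminated value ends up as a large residual while the sample estimates stay close to the uncontaminated levels, which is the sense in which median polish is robust to outlier probes.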

In contrast, the BaFL cleansing protocol eliminates probes and, by enforcing a minimum set size (for statistical rigor) and consistency of set members across samples, may remove entire ProbeSets. This often results in the absence of a large fraction of the original data set: in the case of the Bhattacharjee dataset, 66% of the original ProbeSets are removed. It must be acknowledged that the disparity in the number of genes in the input set makes a straightforward comparison of the output lists of the three methods problematic, but it is possible to highlight some sources of error that the statistical methods do not identify and exclude. The output data files are included in the Supplementary Materials, in the Data folder.
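The two constraints described above (the same probes retained in every sample, and a minimum number of probes per ProbeSet) can be sketched as a simple set operation. This is only an illustration of the idea and not the BaFL implementation: the function name `consistent_probesets`, the nested-dictionary layout, the probe identifiers, and the threshold value are all hypothetical.

```python
def consistent_probesets(kept, min_size):
    """Keep ProbeSets whose shared probes meet a minimum set size.

    kept: {sample_id: {probeset_id: set of probe ids that survived cleansing}}
    min_size: assumed minimum number of probes a ProbeSet must retain.
    Returns {probeset_id: set of probes present in every sample}.
    """
    samples = list(kept.values())
    if not samples:
        return {}

    result = {}
    # Only ProbeSets that appear in every sample can be consistent.
    shared_ids = set.intersection(*(set(s) for s in samples))
    for probeset_id in shared_ids:
        # Consistency: keep only probes that survived in all samples.
        common = set.intersection(*(set(s[probeset_id]) for s in samples))
        # Minimum size: drop the whole ProbeSet if too few probes remain.
        if len(common) >= min_size:
            result[probeset_id] = common
    return result


# Hypothetical input: two samples, two ProbeSets, integer probe ids.
kept = {
    "sample_1": {"ps_A": {1, 2, 3, 4}, "ps_B": {1, 2}},
    "sample_2": {"ps_A": {2, 3, 4, 5}, "ps_B": {1, 2}},
}
filtered = consistent_probesets(kept, min_size=3)
print(filtered)
```

With the threshold set to 3 in this example, `ps_A` keeps its three shared probes while `ps_B` is dropped entirely, which mirrors how whole ProbeSets can disappear from the output.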

Down-Selection<br />

A microarray experiment typically has 10,000 features to explore, at the gene level, of which around half are expected to be expressed; generally only a small proportion of the genes will be differentially expressed. Many of the statistical methods used to determine whether differences
