Sample A: Cover Page of Thesis, Project, or Dissertation Proposal
Although a list of 325 candidate genes is not large by the standards of microarray experiments,
and may reflect real biological contributions to a complex phenotype, a practically
useful diagnostic test calls for the smallest possible list of genes that have strong effects.
There are many ways to down-select features further; one of them, comparative genomics [4], was used by the
investigators of the Stearman experiment. Here we chose a
more traditional statistical approach, in which features were selected from the 325 ProbeSets
by applying the Bonferroni correction [8, 9]. The Bonferroni correction is a stringent
adjustment for multiple hypothesis tests. Its underlying assumption
is that all null hypotheses are true (the mean expressions are equal) [8, 9].
Consequently, only the ProbeSets with extreme differences in mean ProbeSet intensity survive
the correction. There has been considerable debate as to whether the extreme rigor of the
Bonferroni method is appropriate for expression microarray experiments, and one of the
open questions is how much potentially valuable information is lost by applying this technique [8].
As before, the performance of the candidate gene list in classifying samples is assessed using
kNN, LDA, and RF classifiers [10-13] to build models that are then judged by AUC scores [14,
15].
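To make the stringency of the correction concrete, the following is a minimal sketch of Bonferroni-based feature selection. It is an illustration only, not the procedure used in this study: the significance level of 0.05, the function names, and the ProbeSet labels and p-values are all assumptions introduced for the example. With 325 tests, a raw p-value must fall at or below 0.05 / 325 (roughly 1.5e-4) to survive.

```python
def bonferroni_cutoff(alpha, n_tests):
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests


def bonferroni_adjust(p_value, n_tests):
    """Bonferroni-adjusted p-value, capped at 1.0."""
    return min(1.0, p_value * n_tests)


def bonferroni_select(p_values, alpha=0.05, n_tests=None):
    """Return the names of features whose raw p-value survives the correction.

    p_values: dict mapping a feature name to its raw p-value.
    n_tests:  total number of hypotheses tested; defaults to len(p_values).
    """
    m = len(p_values) if n_tests is None else n_tests
    cutoff = bonferroni_cutoff(alpha, m)
    return [name for name, p in p_values.items() if p <= cutoff]


# Hypothetical raw p-values for four of the 325 ProbeSets (made-up numbers).
example_p_values = {
    "probe_a": 1e-6,
    "probe_b": 2e-4,
    "probe_c": 0.03,
    "probe_d": 1e-4,
}

# alpha = 0.05 is an assumed significance level, not one stated in the text.
selected = bonferroni_select(example_p_values, alpha=0.05, n_tests=325)
print(selected)
```

In this sketch only the two ProbeSets with the most extreme p-values pass the cutoff, while a ProbeSet with a raw p-value of 0.03, which would be called significant in a single test, is discarded. That loss is the trade-off at issue in the debate over the method's rigor.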
Materials and Methods<br />
As previously described, the ‘Bhattacharjee’ dataset [3] consists of 17 normal and 237 diseased
samples, including 51 adenocarcinoma replicates, with disease category assigned after
histopathological examination. From this study we used 125 of the 190 adenocarcinoma array
results and 13 of the 17 normal results; the selection criteria are described below. The second
dataset, ‘Stearman’ (http://www.ncbi.nlm.nih.gov/geo/; accession number GSE2514), consists of
39 tissue samples, all replicated, from 5 male and 5 female patients (four samples were taken