02.08.2013 Views

Sample A: Cover Page of Thesis, Project, or Dissertation Proposal


Although a list of 325 candidate genes is not large by the standards of microarray experiments, and may reflect real biological contributions to a complex phenotype, a practically useful diagnostic test calls for the smallest possible list of genes that have strong effects. There are many ways to further down-select features; one of them, comparative genomics, was used by the investigators of the Stearman experiment [4]. Here we chose a more traditional statistical approach, in which feature selection from the 325 ProbeSets was accomplished by incorporating the Bonferroni correction [8, 9]. The Bonferroni correction is a stringent adjustment that accommodates multiple hypothesis tests. Its underlying assumption is that all null hypotheses are true (the mean expressions are equal) [8, 9]. Consequently, only the ProbeSets with extreme differences in mean ProbeSet intensity survive the correction. There has been considerable debate as to whether the extreme rigor of the Bonferroni method is appropriate for expression microarray experiments, and one of the questions is how much potentially valuable information is lost by applying this technique [8].
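As a minimal sketch of the thresholding step, assuming one p-value per ProbeSet is already available from some two-group test (the source does not say which test was used), the Bonferroni correction divides the significance level by the number of tests and keeps only the features at or below that per-test threshold. The function name `bonferroni_select`, the significance level of 0.05, and the example p-values are hypothetical and chosen only for illustration.

```python
def bonferroni_select(p_values, alpha=0.05):
    """Return indices of tests whose p-value survives the Bonferroni correction.

    Each of the m tests is compared against alpha / m, which bounds the
    family-wise error rate at alpha.
    """
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m
    return [i for i, p in enumerate(p_values) if p <= threshold]


# Hypothetical p-values for 325 ProbeSets (illustrative, not from the study).
p_values = [1e-6, 2e-4, 0.03, 5e-5] + [0.5] * 321

per_test_threshold = 0.05 / len(p_values)   # about 1.54e-4 for 325 tests
selected = bonferroni_select(p_values)      # indices 0 and 3 survive
print(per_test_threshold, selected)
```

With 325 tests, a ProbeSet needs a p-value of roughly 1.5e-4 or smaller to be retained, which is why only features with extreme differences in mean intensity pass.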

As before, the performance of the candidate gene list in classifying samples is assessed using kNN, LDA and RF classifiers [10-13] to build models that are then judged by AUC scores [14, 15].
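To illustrate how an AUC score judges a classifier, the sketch below computes AUC as the probability that a randomly chosen diseased sample receives a higher score than a randomly chosen normal sample, with ties counted as one half. This is one standard formulation and is not necessarily the procedure used in the study; the function name `auc_score` and the labels and scores are hypothetical.

```python
def auc_score(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for the positive (e.g. diseased) class, 0 for the negative class.
    scores: classifier outputs, where higher means "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical classifier outputs for three diseased and two normal samples.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.1]
auc = auc_score(labels, scores)   # 5 of the 6 positive/negative pairs are ordered correctly
print(auc)
```

An AUC of 1.0 means every diseased sample outranks every normal sample, while 0.5 corresponds to random ordering, so the same function can compare models built by different classifiers on the same gene list.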

Materials and Methods

As previously described, the ‘Bhattacharjee’ dataset [3] consists of 17 normal and 237 diseased samples, including 51 adenocarcinoma replicates, with disease category assigned after histopathological examination. From this study we used 125 of the 190 adenocarcinoma array results and 13 of the 17 normal results; the selection criteria are described below. The second, ‘Stearman’, dataset (http://www.ncbi.nlm.nih.gov/geo/; accession number GSE2514) consists of 39 tissue samples, all replicated, from 5 male and 5 female patients (four samples were taken
