Sample A: Cover Page of Thesis, Project, or Dissertation Proposal
Data Analysis Overview<br />
1. Each of the three probe-cleansing methods (RMA, dCHIP, BaFL) is used to generate<br />
ProbeSet values on the same sample sets from each experiment (2-state case).<br />
2. Down-selection is performed on each cleansing method’s interpretation of the data.<br />
Down-selection identifies differentially expressed (DE) genes, starting from<br />
the ProbeSet values produced by each method, based on the outcome of a Welch’s t-test<br />
of those values across the sample sets [5]. Three down-selection lists are generated:<br />
one list of DE genes from each experiment and a third that is the intersection of those two<br />
lists. The values of the genes in the lists (Stearman DE, Bhattacharjee DE, and<br />
Intersection of DE) are then used as input to three types of classifiers: kNN [6, 7], LDA<br />
[8, 9], and RF [10-12]. The resulting models are assessed, based on the area under the ROC curve (AUC)<br />
[13, 14], for their cross-experiment sample class prediction ability relative to the base<br />
model (ALL), which uses the values of the complete set of genes.<br />
3. A second type of comparison uses the two candidate gene lists proposed by<br />
Bhattacharjee et al. [3] and the candidate gene list proposed by Stearman et<br />
al. [4], sub-selected in each case for those genes passed by the BaFL pipeline (but<br />
not necessarily identified as DE). A fourth candidate gene list comprises the<br />
BaFL-passed genes that the t-test identified as DE in both BaFL datasets (the intersection); it is the same<br />
final list that was used in step 2. The four lists use the ProbeSet values<br />
generated by each cleansing method (not the values that resulted from the methods of the<br />
original papers, since the underlying sample sets have been modified), and the analysis then proceeds<br />
as in step 2 to compare classification strength across the three types of<br />
models.<br />
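The down-selection in step 2 can be illustrated with a minimal sketch. It assumes ProbeSet values are held in a plain dictionary per experiment and that a gene counts as DE when the absolute Welch t statistic exceeds a cutoff. The cutoff value, the function names, and the toy data are all hypothetical; the actual selection criterion (for example a p-value threshold) is not stated in this overview.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test on two samples."""
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def down_select(values, labels, t_cutoff):
    """Keep genes whose |t| between the two sample classes exceeds t_cutoff.

    values: dict mapping gene -> list of ProbeSet values, one per sample
    labels: list of 0/1 class labels aligned with the sample order
    t_cutoff: hypothetical stand-in for the thesis's actual DE criterion
    """
    selected = set()
    for gene, row in values.items():
        a = [v for v, y in zip(row, labels) if y == 0]
        b = [v for v, y in zip(row, labels) if y == 1]
        t, _ = welch_t(a, b)
        if abs(t) > t_cutoff:
            selected.add(gene)
    return selected


# Toy data only: two experiments measured on the same three genes.
exp1 = {
    "g1": [1.0, 1.2, 0.9, 3.0, 3.1, 2.9],
    "g2": [2.0, 2.1, 1.9, 2.0, 2.05, 1.95],
    "g3": [5.0, 5.2, 4.9, 1.0, 1.1, 0.8],
}
labels1 = [0, 0, 0, 1, 1, 1]
exp2 = {
    "g1": [1.1, 0.8, 1.0, 1.2, 2.9, 3.2, 3.0],
    "g2": [2.0, 2.2, 1.8, 2.1, 2.1, 1.9, 2.0],
    "g3": [3.0, 3.3, 2.8, 3.1, 3.1, 2.9, 3.2],
}
labels2 = [0, 0, 0, 0, 1, 1, 1]

de_exp1 = down_select(exp1, labels1, t_cutoff=4.0)
de_exp2 = down_select(exp2, labels2, t_cutoff=4.0)
de_intersection = de_exp1 & de_exp2  # third list: DE in both experiments
print(sorted(de_exp1), sorted(de_exp2), sorted(de_intersection))
```

The three resulting sets play the roles of the two per-experiment DE lists and their intersection. In practice the t statistic and degrees of freedom would be converted to a p-value before thresholding.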
These steps are discussed in more detail in the following sections.
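As a rough illustration of the cross-experiment assessment described in steps 2 and 3, the sketch below trains a kNN model on one experiment, scores the samples of the other, and summarizes the result as an AUC. It is a simplified stand-in and not the pipeline used in this work. The helper names, the choice of k, and the toy samples are assumptions, and only kNN is shown (LDA and RF are omitted).

```python
import math


def project(samples, genes):
    """Restrict each sample (dict mapping gene -> value) to a gene list."""
    return [[s[g] for g in genes] for s in samples]


def knn_scores(train_x, train_y, test_x, k=3):
    """Score each test sample as the fraction of its k nearest training
    samples (Euclidean distance) that belong to class 1."""
    scores = []
    for x in test_x:
        nearest = sorted(
            zip(train_x, train_y), key=lambda pair: math.dist(pair[0], x)
        )[:k]
        scores.append(sum(y for _, y in nearest) / k)
    return scores


def auc(scores, labels):
    """Area under the ROC curve in its rank (Mann-Whitney) form: the
    probability that a class-1 sample outscores a class-0 sample,
    with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data only: experiment A is used for training, experiment B for testing.
train = [
    {"g1": 1.0, "g2": 2.0}, {"g1": 1.2, "g2": 5.0}, {"g1": 0.9, "g2": 3.5},
    {"g1": 3.0, "g2": 2.2}, {"g1": 3.1, "g2": 4.8}, {"g1": 2.9, "g2": 3.4},
]
train_y = [0, 0, 0, 1, 1, 1]
test = [
    {"g1": 1.1, "g2": 4.9}, {"g1": 0.8, "g2": 2.1},
    {"g1": 3.2, "g2": 2.0}, {"g1": 3.0, "g2": 5.1},
]
test_y = [0, 0, 1, 1]

# Compare a down-selected list against the base model (ALL genes).
gene_lists = {"DE": ["g1"], "ALL": ["g1", "g2"]}
results = {}
for name, genes in gene_lists.items():
    scores = knn_scores(project(train, genes), train_y, project(test, genes))
    results[name] = auc(scores, test_y)
print(results)
```

Keeping the gene list as a parameter of `project` lets the same train/test split be reused for every candidate list, so that differences in AUC reflect the gene list and not the samples.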