Sample A: Cover Page of Thesis, Project, or Dissertation Proposal
for kNN were (k=3, l=2, with the Euclidean distance), and default settings in R were used for RF
and LDA [16]. We used three different methods to explore whether the
classification performance was specific to the classification algorithm; these three were
selected because they are the most commonly cited in microarray analysis papers and
because they require minimal parameter tuning [6, 11, 12, 20, 21, 23, 27, 28, 34,
35]. Linear discriminant analysis attempts to find the linear combination of features that best
separates the data into their distinct classes, weighting the features by their ability to
separate the classes [8, 9, 20]. In contrast, kNN and RF classify samples based upon the
characteristics of closely neighboring samples [6, 36]. The kNN algorithm uses the entire
ensemble of features, while RF stochastically builds forests of classification trees based upon the
strongest classifying features [10, 11]. After training with values from one experiment, the
models were tested against the other experiment and the performance was assessed: that is,
the Bhattacharjee gene lists were used for training and then the models were used to predict the
Stearman sample classes, and vice versa, for each of the types of gene lists described above [6,
36, 37]. This led to 9 comparisons in which the Bhattacharjee data were used as the training set
(RMA, dCHIP and BaFL cleansing post t-test, against 3 types of models) and 9 comparisons in
which the Stearman data were used as the training set.
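
The cross-experiment design described above can be illustrated with a minimal sketch. This is not the code used in this work, which relied on R; it is a standard-library Python illustration of kNN classification with Euclidean distance and of the 18-comparison grid. It assumes that `l` denotes the minimum number of votes required for a definite decision (the meaning it has in the `knn` function of the R `class` package, which may or may not be the function used here). The function names and the toy expression values are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def knn_predict(train_X, train_y, x, k=3, l=2):
    """Classify x by majority vote among its k nearest training samples.

    Distances are Euclidean. Returns None ("doubt") when the winning class
    receives fewer than l votes (assumed meaning of l, see lead-in).
    """
    # Keep the labels of the k training samples closest to x.
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: math.dist(pair[0], x))[:k]
    label, votes = Counter(lbl for _, lbl in nearest).most_common(1)[0]
    return label if votes >= l else None


def cross_experiment_accuracy(train, test, k=3, l=2):
    """Train on one experiment, predict the other, return the fraction correct."""
    train_X, train_y = train
    test_X, test_y = test
    preds = [knn_predict(train_X, train_y, x, k, l) for x in test_X]
    return sum(p == t for p, t in zip(preds, test_y)) / len(test_y)


# Hypothetical toy data: two features per sample, two classes per experiment.
experiment_a = ([(1.0, 1.2), (0.9, 1.0), (1.1, 0.8),
                 (3.0, 3.1), (3.2, 2.9), (2.8, 3.0)],
                ["normal"] * 3 + ["tumor"] * 3)
experiment_b = ([(1.0, 0.9), (1.2, 1.1), (3.1, 3.0), (2.9, 3.2)],
                ["normal", "normal", "tumor", "tumor"])

# Train on A and test on B, then the reverse.
acc_a_to_b = cross_experiment_accuracy(experiment_a, experiment_b)
acc_b_to_a = cross_experiment_accuracy(experiment_b, experiment_a)

# The comparison grid: 2 training sets x 3 gene-list types x 3 classifiers.
training_sets = ["Bhattacharjee", "Stearman"]
gene_lists = ["RMA", "dCHIP", "BaFL"]
classifiers = ["kNN", "RF", "LDA"]
comparisons = list(product(training_sets, gene_lists, classifiers))
```

Enumerating the grid this way makes the count explicit: each training set contributes 3 gene-list types times 3 classifiers, which gives the 9 comparisons per direction stated in the text.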
The same classification algorithms were used to compare the author’s lists to the
purely BaFL-derived list of 325 DE ProbeSets. This set of experiments is designed to be similar
to the validation of a final candidate list. Here, we compared the 325 BaFL intersecting
DE ProbeSets to the BaFL-allowed ProbeSets in the author’s published lists. Validation of a
candidate list necessitates perturbing the designed models over iterative analyses to approach a
reliable performance metric [6, 36, 37]. Perturbation of our models was done through random