02.08.2013 Views

Sample A: Cover Page of Thesis, Project, or Dissertation Proposal


for kNN were (k=3, l=2, with the Euclidean distance), and default settings in R were used for RF and LDA [16]. We chose to use three different methods in order to explore whether the classification performance was specific to the classification algorithm, and these three were selected specifically because they are the most commonly cited in microarray analysis papers and because their performance requires minimal parameter tuning [6, 11, 12, 20, 21, 23, 27, 28, 34, 35]. Linear discriminant analysis attempts to find the linear combination of features which best separates the data into their distinct classes, by weighting the features based upon their ability to separate the classes [8, 9, 20]. Conversely, kNN and RF classify samples based upon the characteristics of closely neighboring samples [6, 36]. The entire ensemble of features is utilized for the kNN algorithm, while RF stochastically builds forests of classification trees based upon the strongest classifying features [10, 11]. After training with values from one experiment, the models were used in tests against the other experiment and the performance was assessed: that is, the Bhattacharjee gene lists were used for training and then the models were used to predict the Stearman sample classes, and vice versa, for each of the types of gene lists described above [6, 36, 37]. This led to 9 comparisons in which the Bhattacharjee data were used as the training set (RMA, dCHIP and BaFL cleansing post t-test, against 3 types of models) and 9 comparisons in which the Stearman data were used as the training set.
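The cross-experiment design described above can be illustrated with a minimal sketch. It is written in Python rather than the R environment used for the analysis, and it is not the code behind the reported results. It assumes that l denotes the minimum number of neighbor votes required for a definite class call (the meaning of `l` in the `knn` function of the R `class` package), with a sample left unclassified otherwise. The function names, the toy expression values, and the class labels `class_1` and `class_2` are hypothetical placeholders.

```python
import math
from collections import Counter
from itertools import product


def knn_predict(train_x, train_y, sample, k=3, l=2):
    """Classify one sample by vote among its k nearest training samples.

    Assumption: l is the minimum number of votes needed for a definite
    call; with fewer votes the sample is left unclassified (None).
    """
    # Rank training samples by Euclidean distance to the query sample.
    neighbours = sorted(
        zip(train_x, train_y), key=lambda pair: math.dist(pair[0], sample)
    )[:k]
    votes = Counter(label for _, label in neighbours)
    label, count = votes.most_common(1)[0]
    return label if count >= l else None


def cross_experiment_accuracy(train, test, k=3, l=2):
    """Train on one experiment, predict the other, return fraction correct."""
    train_x, train_y = train
    test_x, test_y = test
    predicted = [knn_predict(train_x, train_y, s, k, l) for s in test_x]
    return sum(p == t for p, t in zip(predicted, test_y)) / len(test_y)


# Toy stand-ins for two independent experiments (invented values, two features).
experiment_a = (
    [(1.0, 1.2), (0.9, 1.0), (1.1, 0.8), (3.0, 3.1), (3.2, 2.9), (2.8, 3.0)],
    ["class_1"] * 3 + ["class_2"] * 3,
)
experiment_b = (
    [(1.2, 0.9), (0.8, 1.1), (2.9, 3.2), (3.1, 2.8)],
    ["class_1", "class_1", "class_2", "class_2"],
)

# Each experiment serves once as the training set and once as the test set.
acc_a_to_b = cross_experiment_accuracy(experiment_a, experiment_b)
acc_b_to_a = cross_experiment_accuracy(experiment_b, experiment_a)

# Enumerate the comparison grid: 2 directions x 3 cleansing methods x 3 models.
directions = [("Bhattacharjee", "Stearman"), ("Stearman", "Bhattacharjee")]
cleansing = ["RMA", "dCHIP", "BaFL"]
models = ["kNN", "RF", "LDA"]
comparisons = list(product(directions, cleansing, models))
```

Each entry of `comparisons` pairs a (training set, test set) direction with one cleansing method and one model type, which gives the 9 comparisons per training set counted above.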

The same classification algorithms were invoked for the comparison of the authors' lists to the purely BaFL-derived list of 325 DE ProbeSets. This set of experiments was designed to resemble the validation of a final candidate list. Here, we compared the 325 BaFL intersecting DE ProbeSets to the BaFL-allowed ProbeSets in the authors' published lists. Validation of a candidate list necessitates perturbing the designed models over iterative analysis to approach a reliable performance metric [6, 36, 37]. Perturbation of our models was done through random
