02.08.2013 Views

Sample A: Cover Page of Thesis, Project, or Dissertation Proposal

averaged to give a final approximation of the gain ratio; where the randomness of initial selection prevented the discovery of both solutions, the single-solution gain ratio was used. The gain ratio was calculated as follows:

$$\mathrm{gainratio}(X) = \frac{\mathrm{gain}(X)}{\mathrm{splitinfo}(X)}, \quad \text{where} \tag{5.2}$$

$$\mathrm{splitinfo}(X) = -\sum_{i=1}^{n} \frac{T_i}{T} \log_2\!\left(\frac{T_i}{T}\right). \tag{5.3}$$

Here $T_i$ represents the number of classification calls in cluster $i$ [4]. As an individual ProbeSet's inherent clustering ability increases, the information gain ratio approaches 1.0 [3, 4, 10]. Down-selection to the most informative ProbeSet features was set by a gain ratio threshold criterion. For this dataset, gain ratio limits of 0.7, 0.8, and 0.9 yield ProbeSet lists of length 43, 18, and 8, respectively. Average probe gains greater than or equal to 0.5, 0.6, and 0.7 yield ProbeSet lists of length 95, 29, and 9, respectively.
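The gain ratio and the threshold-based down-selection can be illustrated with a minimal Python sketch. It assumes the standard C4.5 definition of gain(X), the class entropy minus the cluster-weighted class entropy, which is not restated on this page. The function names, the ProbeSet identifiers, and the use of "greater than or equal to" for the gain ratio limit are hypothetical choices made for illustration, not the author's implementation.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (base 2) of a collection of non-negative counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def gain_ratio(clusters, labels):
    """Gain ratio of one feature's clustering against the class labels.

    clusters[i] is the cluster assigned to sample i, labels[i] its class.
    Assumes gain(X) = H(class) - sum_i (T_i / T) * H(class | cluster i).
    """
    n = len(labels)
    base = entropy(Counter(labels).values())
    sizes = Counter(clusters)  # T_i for each cluster
    conditional = 0.0
    for cluster, size in sizes.items():
        members = [lab for c, lab in zip(clusters, labels) if c == cluster]
        conditional += (size / n) * entropy(Counter(members).values())
    gain = base - conditional
    split_info = entropy(sizes.values())  # Eq. (5.3)
    # A single cluster carries no split information; report 0 in that case.
    return gain / split_info if split_info > 0 else 0.0  # Eq. (5.2)


def down_select(ratios, threshold):
    """Keep features whose gain ratio meets the threshold (assumed >=)."""
    return [name for name, r in ratios.items() if r >= threshold]


# Hypothetical toy data: two classes, four samples.
labels = ["a", "a", "b", "b"]
ratios = {
    "probeset_1": gain_ratio([0, 0, 1, 1], labels),  # clusters match classes
    "probeset_2": gain_ratio([0, 1, 0, 1], labels),  # clusters ignore classes
}
selected = down_select(ratios, threshold=0.7)
```

In this toy case the first feature separates the classes perfectly and the second not at all, so only the first survives a 0.7 threshold.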

Performance of the ProbeSets down-selected by the gain criterion was monitored with supervised kNN and LDA classification. Training and testing were done under 100 × 2 validations with random splitting of samples into training and test sets, with sampling done with replacement [3, 11, 12]. Sampling with replacement in such a scenario allows for the stochastic elimination and pseudo-replication of samples, although replicates may not have been confined to the same training/test set. Majority voting was implemented for the kNN algorithm, using the 3 nearest neighbors. The area under the receiver operating characteristic curve (AUC) was weighted for each of
