Sample A: Cover Page of Thesis, Project, or Dissertation Proposal

sample cleansing process, and these encompass 4,248 ProbeSets. Borrowing from machine learning practice, a gain ratio was calculated from a k-means clustering analysis of individual ProbeSets to down-select the most significant features [3, 4]. This gain ratio was also calculated at the individual probe level and then averaged across the probes belonging to an accepted set to give a ProbeSet response. We demonstrate that both approaches improve the model's performance in a supervised classification analysis, implemented using either the kNN classifier based on Euclidean distance [3, 5, 6] or Fisher's linear discriminant analysis (LDA) [7, 8].
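
The gain-ratio scoring described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the BaFL implementation: each feature's values across samples are clustered with a plain one-dimensional k-means (k = 2 by default, with evenly spaced initial centroids), and a C4.5-style gain ratio is computed as the information gain of the class labels given the cluster assignment, divided by the split information of the clusters. The function names `kmeans_1d`, `gain_ratio`, and `probeset_gain_from_probes` are hypothetical.

```python
import math
from collections import Counter
from statistics import mean


def kmeans_1d(values, k=2, iters=100):
    """Lloyd's k-means on a single feature; returns one cluster index per sample."""
    lo, hi = min(values), max(values)
    # Assumption: initial centroids spread evenly across the observed range.
    centroids = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign every sample to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(mean(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return labels


def entropy(items):
    """Shannon entropy (bits) of a sequence of discrete labels."""
    n = len(items)
    return -sum((m / n) * math.log2(m / n) for m in Counter(items).values())


def gain_ratio(values, classes, k=2):
    """Gain ratio of the class labels with respect to a k-means split of one feature."""
    clusters = kmeans_1d(values, k)
    n = len(classes)
    conditional = 0.0
    for c in set(clusters):
        subset = [y for y, lab in zip(classes, clusters) if lab == c]
        conditional += len(subset) / n * entropy(subset)
    info_gain = entropy(classes) - conditional
    split_info = entropy(clusters)  # intrinsic information of the partition
    return info_gain / split_info if split_info > 0 else 0.0


def probeset_gain_from_probes(probe_rows, classes, k=2):
    """Probe-level variant: average the gain ratios of a ProbeSet's probes."""
    return mean(gain_ratio(row, classes, k) for row in probe_rows)
```

In this sketch, the ProbeSet-level score calls `gain_ratio` on a ProbeSet's summarized expression values, while the probe-level approach passes each of the ProbeSet's probe rows to `probeset_gain_from_probes` and averages the results. A feature whose clusters line up exactly with the classes scores 1.0; one whose clusters carry no class information scores 0.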

In this case, the emphasis is not on comparing analysis methods but on discovering intriguing biological phenomena revealed by using the BaFL pipeline to select the most unambiguous signal. Eighteen significant ProbeSets were selected, each with a gain criterion greater than 0.8, and this set of genes suggests that an important mechanism underlying tumorigenesis is abnormal cytokinesis. The biological significance of these genes is validated by a literature survey in the discussion section.
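
Threshold-based down-selection followed by a kNN evaluation can be sketched together. Only the 0.8 cut-off and the use of Euclidean distance come from the text; the helper names (`select_probesets`, `to_matrix`, `knn_predict`, `loo_accuracy`), the dictionary layout (ProbeSet ID mapped to its gain ratio or to its per-sample expression values), k = 3, and leave-one-out evaluation are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def select_probesets(gain_by_probeset, threshold=0.8):
    """Keep ProbeSets whose gain ratio exceeds the threshold, highest first."""
    kept = [ps for ps, g in gain_by_probeset.items() if g > threshold]
    return sorted(kept, key=lambda ps: gain_by_probeset[ps], reverse=True)


def to_matrix(expr_by_probeset, selected):
    """Build a samples-by-features matrix from per-ProbeSet expression vectors."""
    n_samples = len(expr_by_probeset[selected[0]])
    return [[expr_by_probeset[ps][j] for ps in selected] for j in range(n_samples)]


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training samples nearest to x (Euclidean distance)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, k=3):
    """Leave-one-out accuracy: classify each sample using all the others."""
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, X[i], k) == y[i]
    return hits / len(X)
```

Comparing `loo_accuracy` on the full feature matrix against the matrix restricted to `select_probesets(...)` is one simple way to check whether down-selection helps a kNN model; it is not necessarily the evaluation protocol used here.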

Materials and Methods

A subset of samples was selected from the BaFL-cleansed Bhattacharjee dataset to construct a multiclass dataset containing NSCLC tumor biopsies and adjacent/normal biopsies. This dataset contained the 125 adenocarcinomas and 13 normals that comprised the previously considered two-state disease model, plus an additional 17 squamous samples. The BaFL cleansing pipeline was applied across all samples, with the result that 24,022 probes were found to be common to the three states; these lie in 4,248 ProbeSets (each with at least 4 acceptable probes).
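
The intersection and filtering step can be sketched as below, assuming each state's BaFL-accepted probes are available as a collection of probe IDs and that a probe-to-ProbeSet mapping is known. The function names and data layout are hypothetical; only the three-state intersection and the minimum of 4 probes per ProbeSet are taken from the text.

```python
from collections import defaultdict


def common_probes(accepted_by_state):
    """Probes accepted by the cleansing step in every disease state."""
    return set.intersection(*(set(p) for p in accepted_by_state.values()))


def probesets_with_min_probes(probes, probe_to_probeset, min_probes=4):
    """Group probes by ProbeSet; keep ProbeSets with at least min_probes probes."""
    groups = defaultdict(set)
    for probe in probes:
        groups[probe_to_probeset[probe]].add(probe)
    return {ps: members for ps, members in groups.items() if len(members) >= min_probes}
```

In the text, these two steps correspond to the 24,022 shared probes and the 4,248 retained ProbeSets, respectively.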