02.08.2013 Views

Sample A: Cover Page of Thesis, Project, or Dissertation Proposal


Chapter 4: Data Mining

Adenocarcinoma is a non-small cell lung cancer (NSCLC) subtype and the most frequent type of lung cancer found in the world today [1, 2]. Adenocarcinomas are peripherally located in the lungs and develop from Clara cells, alveoli, and mucin-producing cells [1]. While tobacco smoking has been well established as an initiating condition for lung cancer, with 80-90% of lung cancer cases arising in tobacco smokers, adenocarcinoma in particular is most common among women, non-smokers, and the young [1]. Given that the incidence rate of adenocarcinoma is increasing and affecting non-traditional patients, understanding the disease is of immediate concern [1, 2].

Using the methods described in the previous chapters, we have created two 2-class datasets, one from a subset of the Bhattacharjee dataset and the other from the original Stearman (human subset) experiments [3, 4]. In this chapter we begin with the down-selected ProbeSet list presented in Chapter 3, consisting of the 325 differentially expressed ProbeSets common to both datasets. The values that the BaFL pipeline yields for these ProbeSets lead to datasets with considerable latent structure; we will demonstrate that this latent structure is superior to that of RMA- and dCHIP-supplied values using two widely accepted dimensionality reduction methods: Principal Components Analysis [5, 6], which is linear, and a Laplacian method, which is non-linear [7]. In validating the results of these analyses, we use sample correlation to explore the gene/ProbeSet clusters.
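
As a rough illustration of the linear method named above, the following minimal sketch applies Principal Components Analysis to a samples-by-ProbeSets matrix. It is not the analysis code used in this work: the matrix is synthetic placeholder data, and the function name `pca_scores`, the sample counts, and the class shift are all assumptions chosen only to show the shape of the computation. Only the figure of 325 ProbeSets comes from the text.

```python
import numpy as np


def pca_scores(values, n_components=2):
    """Project samples (rows) onto the leading principal components.

    Hypothetical helper for illustration only.
    """
    # Center each ProbeSet (column) so components describe variance, not the mean
    centered = values - values.mean(axis=0)
    # SVD of the centered matrix: rows of vt are the principal axes
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Sample coordinates along the leading axes
    scores = u[:, :n_components] * s[:n_components]
    # Fraction of total variance carried by each component
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]


# Synthetic placeholder data, NOT the Bhattacharjee or Stearman values
rng = np.random.default_rng(0)
n_per_class, n_probesets = 20, 325  # 20 samples per class is an assumption
labels = np.array([0] * n_per_class + [1] * n_per_class)
values = rng.normal(size=(2 * n_per_class, n_probesets))
values[labels == 1, :50] += 2.0  # hypothetical class shift in 50 ProbeSets

scores, explained = pca_scores(values)
```

Plotting the two columns of `scores` colored by `labels` would show whether the two classes separate along the leading components, which is one informal way to inspect the latent structure of a set of expression values.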
