02.08.2013

Sample A: Cover Page of Thesis, Project, or Dissertation Proposal



rejecting the null hypothesis [8], and applied to the two datasets as interpreted by the BaFL pipeline. This correction enabled us to identify a small subset of clearly differentially expressed genes pertinent to lung cancer. When based on BaFL pipeline values, the candidate gene list yields reasonable classification performance across three independent algorithms, provided an appropriately sized dataset is used (Figure 4.8). Dataset size matters: when the Stearman dataset was used for training, all three models struggled, although the BaFL data performed best in every case. Statnikov et al. used the Bhattacharjee multiclass data in their pipeline for cancer diagnosis and biomarker discovery and reported a maximum prior probability of 68.5% for the dominant diagnostic category [24]. Their analysis used the data for training with 10-fold leave-one-out cross-validation and reported perfect prediction for the training model [24]. An additional analysis of the full Bhattacharjee multiclass dataset yielded binary classification accuracies of 52-56% for random forests and support vector machines, with and without gene down-selection, whereas multiclass classification, with and without gene down-selection, yielded accuracies of 77-82% for random forests and 89% for support vector machines [25].
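The multiple-testing correction mentioned at the start of this passage is only partly visible in this excerpt, and its exact procedure is given in [8]. As an illustration only, the following Python sketch assumes a Benjamini-Hochberg false discovery rate procedure; the function name `benjamini_hochberg` and the p-values are hypothetical and are not taken from the BaFL pipeline or from the Stearman or Bhattacharjee datasets.

```python
from typing import List


def benjamini_hochberg(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Return one reject flag per hypothesis under Benjamini-Hochberg FDR control.

    Assumption: BH is used here for illustration; the correction actually
    applied in this work is the one described in reference [8].
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k whose sorted p-value satisfies p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values; keep the rest.
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


# Hypothetical per-probe-set p-values (not data from this study).
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(p, alpha=0.05))
```

With these ten hypothetical p-values, five fall below 0.05 before correction, but only the two smallest survive at an FDR of 0.05. A correction of this kind narrows a candidate list to a small subset of genes.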
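The 68.5% figure reported by Statnikov et al. [24] is the accuracy a trivial classifier would reach by always predicting the dominant diagnostic category, so it is a natural baseline for reading the accuracies quoted in this passage. The Python sketch below computes that baseline from a list of class labels. The helper names (`majority_class_prior`, `lift_over_baseline`) and the label counts are hypothetical, chosen only so that the dominant class makes up 68.5% of the samples. They do not reproduce the Bhattacharjee data.

```python
from collections import Counter
from typing import Hashable, Sequence


def majority_class_prior(labels: Sequence[Hashable]) -> float:
    """Share of samples in the most frequent class.

    This equals the accuracy of a rule that always predicts the dominant
    category, so a trained classifier should be judged against it.
    """
    if len(labels) == 0:
        raise ValueError("labels must be non-empty")
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def lift_over_baseline(model_accuracy: float, labels: Sequence[Hashable]) -> float:
    """Accuracy gain of a model over the majority-class baseline (may be negative)."""
    return model_accuracy - majority_class_prior(labels)


# Hypothetical diagnostic labels (NOT the Bhattacharjee data): 137 of 200
# samples share one category, giving a 68.5% majority-class prior.
labels = ["adenocarcinoma"] * 137 + ["other"] * 63
print(f"majority-class prior: {majority_class_prior(labels):.3f}")  # prints "majority-class prior: 0.685"
print(f"lift of a hypothetical 89%-accurate model: {lift_over_baseline(0.89, labels):+.3f}")
```

Comparing against this baseline, rather than against chance, matters for imbalanced multiclass data in which one category dominates: a model must exceed the dominant-class prior before its accuracy says anything about the genes it uses.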

The relevance of these 31 genes is supported by the GO connections identified by the pathway and literature search software PaLS (http://pals.bioinfo.cnio.es/), which connects 23 of the 30 genes (one ProbeSet aligns to no defined gene). KEGG pathway connections link six of these genes, including osteopontin, through the focal adhesion and ECM-receptor interaction pathways [26]. Other extracellular matrix genes include MFAP4, SPARCL1, ENG, RAMP3, and LRRC32. SPARCL1 and SPOCK, or SPARC-like genes, have been investigated for their role in lung cancer [23]. These genes, along with SPP1, MIF, and PECAM1, have strong immunological associations and may therefore be essential for angiogenesis and tumorigenesis [19, 20, 27]. The
