02.08.2013 Views

Sample A: Cover Page of Thesis, Project, or Dissertation Proposal

violate the linear correlation relationship between transcript concentration and signal intensity.
Probes covering single nucleotide polymorphisms are identified and removed. The Ensembl database is
queried to identify probes that measure a single, specific gene transcript region; all other probes
are excluded. The final step enforces a rule that a minimum of four probes is retained, so that any
given statistical estimator of concentration has an adequate basis. Samples are subject to many
technical steps, so outlier tests are implemented that compare representative probe intensities and
probe numbers against the population mean. Samples more than 2 standard deviations from the average
probe numbers and probe intensities are removed. ProbeSet constituents at this stage may not be
identical across all samples, with differences arising from the linear range filter step. By
performing an intersection operation on the remaining probes across all samples, still enforcing a
minimum of four probes per ProbeSet, a final, common ProbeSet dataset is derived, which is used as
the basis of all further comparisons and analyses.
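
The sample outlier test and the final intersection step described above can be illustrated with a minimal Python sketch. This is not the author's implementation: the function names, the dictionary layout, and the use of the population standard deviation are assumptions made for illustration. The text also does not say whether a sample must fail one or both metrics to be removed, so the helper below handles a single metric at a time.

```python
from statistics import mean, pstdev

MIN_PROBES = 4  # minimum probes per ProbeSet, as stated in the text


def remove_outlier_samples(metric_by_sample, n_sd=2.0):
    """Return the sample names whose metric lies within n_sd standard
    deviations of the mean across all samples (hypothetical helper)."""
    values = list(metric_by_sample.values())
    mu = mean(values)
    sd = pstdev(values)  # assumption: population SD; the text does not specify
    return {s for s, v in metric_by_sample.items() if abs(v - mu) <= n_sd * sd}


def common_probesets(probes_by_sample, min_probes=MIN_PROBES):
    """Intersect the surviving probes of each ProbeSet across all samples and
    keep only ProbeSets that still contain at least min_probes probes.

    probes_by_sample maps sample name -> {probeset id -> set of probe ids}
    (an assumed layout)."""
    samples = list(probes_by_sample.values())
    if not samples:
        return {}
    # A ProbeSet can only be common if every sample still contains it.
    shared_ids = set(samples[0])
    for sample in samples[1:]:
        shared_ids &= set(sample)
    result = {}
    for probeset in shared_ids:
        common = set.intersection(*(set(s[probeset]) for s in samples))
        if len(common) >= min_probes:
            result[probeset] = common
    return result
```

In use, the outlier filter would be run once on probe counts and once on representative intensities, and `common_probesets` would then be called on the samples that survive.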

The suggested data models demonstrated improved performance across three classification algorithms,
and remarkable latent structure can be seen across the data models. When Bonferroni correction is
applied and the intersecting genes are identified, a final candidate gene list of 30 ProbeSets
results. By including on/off genes in the list, an additional ProbeSet is identified. These 31
candidate genes demonstrate notable connectivity in their GO and KEGG associations. Literature
review of the genes establishes that these associations arise from properties specific to
angiogenesis and tumorigenesis. A multiclass dataset of non-small cell lung cancer samples was
constructed and information gain calculated from the k-means clustering efficiency. A candidate
list of 18 genes is shown to possess an information gain greater than or equal to 0.8. The
literature review of these 18 genes provides evidence that abnormal cytokinesis may underlie
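
The excerpt does not define how information gain is derived from the k-means clustering efficiency. One common reading, shown here only as a hedged sketch, is the reduction in class-label entropy once samples are grouped by their cluster assignment. The function names are hypothetical, the cluster assignments are assumed to come from a separate k-means run on a gene's expression values, and the result is in bits, which may not match the scale behind the 0.8 threshold in the text.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(class_labels, cluster_ids):
    """Entropy of the class labels minus the size-weighted entropy of the
    labels within each cluster (assumed definition, not the author's)."""
    n = len(class_labels)
    by_cluster = {}
    for label, cluster in zip(class_labels, cluster_ids):
        by_cluster.setdefault(cluster, []).append(label)
    conditional = sum(len(m) / n * entropy(m) for m in by_cluster.values())
    return entropy(class_labels) - conditional
```

Under this definition, a gene whose clusters separate the classes perfectly reaches the full entropy of the class labels, while a gene whose clusters are unrelated to the classes scores near zero.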
