
Sample A: Cover Page of Thesis, Project, or Dissertation Proposal


cleansing process. This is demonstrated by an a priori extraction of reliable probes and ProbeSets from the third stage 1 dataset, shown below.

Inside the Black Box

The traditional black-box approach to microarray data analysis uses a statistical comparison of probes across samples in the classes of the experiment at hand, discards (in some cases) or weights component probes according to some 'fitness to a model' scheme, and then aggregates the measurements into a single ProbeSet value. Thereafter the ProbeSet value is the only factor used as input to machine learning and statistical algorithm development [22-24, 58]. For diagnostic purposes, if the predictive results of these methods are acceptable, then the goal has been achieved. However, biological investigators are often motivated by the desire to understand the mechanisms that cause a gene to appear on such a gene list [7, 8, 20, 21, 50]. Being able to target specific mechanisms may allow an investigator to select a 'discarded' probe for further study. Here we are thinking particularly of probes that are discarded because they respond to SNPs in the coding region: such probes may in fact be extremely important to the phenotype, provided the investigator can apply a follow-up test to qualify the samples. Despite our attempt to identify all such factors, we clearly have not done so, since the last analysis stage yields three response classes rather than two.

We propose that performing Welch's t-test at the probe level during the aggregation process produces an estimate of the presence of such factors, and that the resulting ProbeSet value can then be annotated, i.e. tagged with a numerical or categorical label (such as our 'U', 'DE' and 'S' labels) based on the agreement of the probe-level t-test results.

Uninformative (U) ProbeSets thus consist only of probes showing no difference in means between classes (for a given allowed variation), while DE ProbeSets consist only of probes that all show a difference in means between classes, as depicted in 2.8. These

