Sample A: Cover Page of Thesis, Project, or Dissertation Proposal

though in Chapter 2 we showed that these disguise what are really two classes of change, S and DE. This was done because there is no way to replicate that level of discrimination with the RMA and dCHIP pipelines, and the goal of this chapter was to compare the effects of the pipelines themselves. We suggest that, at least for within-platform reproducibility, some of the reproducibility problems with microarray data stem from how the data are integrated rather than from the data themselves. As shown in Figures 3.4 and 3.5, classification performance decreases observably when the more rigorously pruned (and shorter) gene list is used, which is counterintuitive if greater rigor really led to greater quality. It seems likely that the greater rigor is selecting for lab-specific and experiment-specific factors rather than for factors relevant to sample state. Demonstrating that the BaFL pipeline is a more effective cleansing approach than RMA and dCHIP is difficult when the underlying cause of the differences cannot be isolated exactly and when the data come from clinical samples with incomplete and variable levels of replication. Direct comparisons recapitulate a result reported by others, namely that RMA and dCHIP are neither consistent with one another within a dataset nor able to perform well across experiments, whereas BaFL shares about the same overlap with each. Nevertheless, in this chapter we have provided evidence that BaFL-pipeline ProbeSet values, followed by a simple t-test for differential expression, yield a more effective candidate gene list for training a model to classify the results of additional experiments than do the competing methods. For disease diagnosis, the advantages are the much smaller candidate gene list and the relative insensitivity to the specific type of model. As with any clinical experiment, a larger sample size leads to more robust results, and a meta-experiment comprising at least two independent experiments is most effective. It is also notable that the most stable differentially expressed genes are not necessarily the most informative for the disease phenotype.
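The evaluation summarized above (rank ProbeSets with a t-test on one experiment, keep a short candidate list, train a classifier on it, then score that classifier on an independent experiment) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in this chapter. The per-sample dictionary layout, the use of Welch's unequal-variance t statistic ranked by absolute value (the text says only "a simple t-test"), the top-k cutoff, the nearest-centroid classifier, and the Jaccard index as one possible measure of overlap between candidate lists from different pipelines are all hypothetical choices made for illustration. The sketch also assumes that both experiments share the same ProbeSet IDs, i.e. a within-platform comparison.

```python
import math
import statistics
from typing import Dict, List, Sequence

# Hypothetical data layout for this sketch: one dict per sample,
# mapping ProbeSet ID -> normalized expression value.
Sample = Dict[str, float]


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's (unequal-variance) two-sample t statistic for a vs. b."""
    diff = statistics.mean(a) - statistics.mean(b)
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    if se == 0:
        # Zero within-group spread: treat any mean difference as maximal.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def top_k_genes(samples: List[Sample], labels: List[int], k: int) -> List[str]:
    """Rank ProbeSets by |t| between class 1 and class 0; keep the k largest."""
    scores: Dict[str, float] = {}
    for gene in samples[0]:
        pos = [s[gene] for s, y in zip(samples, labels) if y == 1]
        neg = [s[gene] for s, y in zip(samples, labels) if y == 0]
        scores[gene] = abs(welch_t(pos, neg))
    return sorted(scores, key=lambda g: scores[g], reverse=True)[:k]


def fit_centroids(samples: List[Sample], labels: List[int],
                  genes: List[str]) -> Dict[int, Dict[str, float]]:
    """Nearest-centroid model: per-class mean of each selected ProbeSet."""
    model: Dict[int, Dict[str, float]] = {}
    for cls in sorted(set(labels)):
        members = [s for s, y in zip(samples, labels) if y == cls]
        model[cls] = {g: statistics.mean([m[g] for m in members]) for g in genes}
    return model


def predict(model: Dict[int, Dict[str, float]], sample: Sample,
            genes: List[str]) -> int:
    """Assign the class whose centroid is nearest in Euclidean distance."""
    def dist(centroid: Dict[str, float]) -> float:
        return math.sqrt(sum((sample[g] - centroid[g]) ** 2 for g in genes))
    return min(model, key=lambda cls: dist(model[cls]))


def cross_experiment_accuracy(train: List[Sample], train_y: List[int],
                              test: List[Sample], test_y: List[int], k: int) -> float:
    """Select genes and fit on one experiment; score on an independent one."""
    genes = top_k_genes(train, train_y, k)
    model = fit_centroids(train, train_y, genes)
    hits = sum(predict(model, s, genes) == y for s, y in zip(test, test_y))
    return hits / len(test)


def jaccard(list_a: Sequence[str], list_b: Sequence[str]) -> float:
    """Overlap of two candidate gene lists, |A & B| / |A | B| (0.0 if both empty)."""
    a, b = set(list_a), set(list_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    # Made-up toy values for illustration only (not data from this study):
    # "PS_A" separates the two classes, "PS_B" is noise.
    exp1 = [{"PS_A": 5.0, "PS_B": 1.0}, {"PS_A": 5.2, "PS_B": 3.0},
            {"PS_A": 1.0, "PS_B": 2.0}, {"PS_A": 1.3, "PS_B": 1.5}]
    y1 = [1, 1, 0, 0]
    exp2 = [{"PS_A": 4.8, "PS_B": 2.5}, {"PS_A": 0.9, "PS_B": 1.2}]
    y2 = [1, 0]
    print(top_k_genes(exp1, y1, k=1))                                   # ['PS_A']
    print(cross_experiment_accuracy(exp1, y1, exp2, y2, k=1))           # 1.0
    print(jaccard(["PS_A", "PS_B", "PS_C"], ["PS_B", "PS_C", "PS_D"]))  # 0.5
```

The nearest-centroid classifier is used here only because it is short. Since the candidate lists are reported above to be relatively insensitive to the type of model, any other classifier could be substituted at the `fit_centroids`/`predict` step without changing the train-on-one, test-on-another structure.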
