Sample A: Cover Page of Thesis, Project, or Dissertation Proposal
The process of determining a candidate gene list is often multi-step: a (relatively) simple
statistical method is used to obtain an initial down-selected list in which significant
expression change is identified, and a more sophisticated technique is then applied to suggest a
final subset of genes in which correlation to the factor or phenotype of interest is robust [52, 69-71,
77-83]. The assessment of these subsets can be done using supervised or unsupervised
methods [69]. Clustering is the most common form of the unsupervised methods, where the goal
is to achieve homogeneous clusters [84]. Supervised learning methods develop models from
training data and assess the quality of prediction on the test data [77, 85]. Performance metrics
are necessary for choosing among the learning algorithms: the most common metric is the area
under the receiver operating characteristic (ROC) curve, which incorporates the sensitivity and
specificity of the classification results [86]. Other metrics include precision-recall analysis and
cost-sensitive analysis [87, 88].
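As an illustration of the metric described above, the following is a minimal sketch of how the area under the ROC curve and a single sensitivity/specificity pair might be computed for a binary classifier. The function names (`roc_auc`, `sensitivity_specificity`) and the toy labels and scores are assumptions made for this example only; they are not taken from the proposal or from any cited method. The area is computed through the pairwise-ranking identity (the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one), which is equivalent to integrating sensitivity over 1 - specificity across all thresholds.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise-ranking identity.

    labels: iterable of 0/1 class labels (1 = positive class).
    scores: iterable of classifier scores, higher = more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: three positive and three negative test samples.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

auc = roc_auc(toy_labels, toy_scores)
sens, spec = sensitivity_specificity(toy_labels, toy_scores, threshold=0.65)
```

A single threshold yields one point on the ROC curve, whereas the area summarizes performance over all thresholds, which is why it is convenient for comparing learning algorithms.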
Specific Aims<br />
A number of confounding factors in Microarray experiments are well described in the scientific
literature [17, 18, 22, 26, 59]; the relative irreproducibility of Microarray analysis results is
attributed to these factors [35, 44, 57, 68]. While a number of investigators have reported the
effect of removing individual classes of contributions on the robustness of the results, to our
knowledge no investigation has removed the complete set of factors which we have established in
our cleansing pipeline. Of the sophisticated probe cleansing algorithms that have been developed
and are commonly used, all proceed by identifying and eliminating probes with large variance,
without exploring the underlying cause of that variance [45, 46, 48, 49, 89]. This black box
method leads to both inclusion of probes having dubious properties and exclusion of probes that