02.08.2013 Views

Sample A: Cover Page of Thesis, Project, or Dissertation Proposal

Table 2.1: Probe numbers per filter. The number of probes removed by each filter step when run independently (starting value is 409,600). Values in parentheses refer to filter effects on the 201,920 PM-only probes. Note that the same probe can be removed for multiple reasons; therefore, the per-filter counts do not simply add up to the number of probes lost over all of the filters. For example, if one also considers the probes that were removed for missing sequence information, the Biophysical Filter removes 2.47% of all probes having sequence information, and 2.44% of PM-only probes having sequence information.

Filter                        Probes removed      % Probes lost
Unidentified Target Filter    11,432 (2,836)      2.79% (1.40%)
SNP Filter                    7,286 (7,286)       1.78% (3.61%)
Cross-hybridization Filter    246,994 (39,314)    60.30% (19.47%)
Biophysical Filter            21,159 (7,747)      5.17% (3.84%)
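
The percentages in Table 2.1 can be re-derived from the counts and the two starting totals. The sketch below does that in plain Python and then uses a toy example to show why per-filter counts need not sum to the overall loss. The probe IDs in the toy sets are hypothetical and are not taken from the thesis data; the function and variable names are likewise illustrative.

```python
# Counts copied from Table 2.1: (all probes, PM-only probes) removed per filter.
TOTAL_ALL = 409_600
TOTAL_PM = 201_920

removed = {
    "Unidentified Target Filter": (11_432, 2_836),
    "SNP Filter": (7_286, 7_286),
    "Cross-hybridization Filter": (246_994, 39_314),
    "Biophysical Filter": (21_159, 7_747),
}


def percent_lost(count, total):
    """Percentage of `total` probes removed, rounded to two decimals."""
    return round(100.0 * count / total, 2)


# Per-filter percentages for (all probes, PM-only probes).
table = {
    name: (percent_lost(n_all, TOTAL_ALL), percent_lost(n_pm, TOTAL_PM))
    for name, (n_all, n_pm) in removed.items()
}

# Toy illustration with hypothetical probe IDs (not thesis data):
# two filters that flag overlapping probes.
snp_hits = {1, 2, 3, 4}
xhyb_hits = {3, 4, 5, 6, 7}

# Probes lost overall are the union of the two sets, which is smaller than
# the sum of the per-filter counts whenever the filters overlap.
lost_overall = snp_hits | xhyb_hits
sum_of_counts = len(snp_hits) + len(xhyb_hits)
```

In the toy case the two filters report 4 and 5 removals, yet only 7 distinct probes are lost, because probes 3 and 4 are flagged by both.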

Visualization is often helpful in guiding the user to a possible cause for technical problems. In Figure 2.1 we show a virtual array image, generated with the R package affy [19], for the Bhattacharjee dataset. This highlights a consistent low-intensity artifact, observed within a small region of the arrays in batch 10 (red circle), affecting ~5,600 probes (2πr²; radius = 30). To generate the data for this figure, a mock .CEL file for each batch was generated by averaging the intensity of all the probes at each (x, y) location across the samples within an individual batch. Since our methodology constrains the final dataset to consist of cleansed probes common across all samples of interest regardless of the batch, these probes will not be included in the final set of acceptable probes, and their removal may lead to the loss of the related ProbeSet as well if sufficient component probes are removed. Because these probes behave well in all of the other batches, the statistical methods retained the probes but interpreted the related signal for the samples in batch 10 as being significantly lower in expression relative to the other samples, regardless of the disease class.
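
The per-batch averaging described above can be sketched as follows. This is a minimal illustration in plain Python, not the affy-based procedure used to produce the figure: it assumes each sample is already available as a 2D grid of intensities indexed by (y, x), and the function names and the tiny 2 x 2 grids are hypothetical. The second function simply evaluates the 2πr² expression quoted in the text for a radius of 30.

```python
import math


def mock_batch_image(sample_images):
    """Element-wise mean of same-sized 2D intensity grids, one grid per sample."""
    n = len(sample_images)
    rows = len(sample_images[0])
    cols = len(sample_images[0][0])
    return [
        [sum(img[y][x] for img in sample_images) / n for x in range(cols)]
        for y in range(rows)
    ]


def affected_probe_estimate(radius):
    """Evaluate the 2*pi*r^2 expression quoted in the text."""
    return 2 * math.pi * radius ** 2


# Two hypothetical samples from one batch, each a 2 x 2 grid of intensities.
batch = [
    [[100.0, 200.0], [300.0, 400.0]],
    [[120.0, 180.0], [280.0, 420.0]],
]

mock = mock_batch_image(batch)          # one averaged grid for the batch
estimate = affected_probe_estimate(30)  # roughly 5,655
```

With a radius of 30 the expression gives about 5,655, which is consistent with the "~5,600 probes" stated in the text.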
