Sample A: Cover Page of Thesis, Project, or Dissertation Proposal

platform: only the CDF and probe sequence files are required in order to flag problematic probes. Thus the order of the first four steps is irrelevant and can be chosen to optimize computational efficiency. On our data, the cross-hybridization filter (II), implemented here only for the PM probes, shrinks the dataset the most, so applying it first makes the succeeding steps run more quickly. Once steps (I)-(IV) have been completed, the results apply to any future experiments using the same chip design and sequence files. The last two steps described above, (V) and (VI), are experiment/measurement dependent, and it is here that an investigator's choices will affect what appears in the final gene list. Scanner response limits can be reset in the code to reflect the behavior of individual instruments.
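
Because steps (I)-(IV) each flag a probe using only the CDF and sequence files, their combined result is the intersection of independent keep/drop decisions. That intersection does not depend on the order of the filters; only the amount of work does. The minimal sketch below illustrates this with hypothetical probe records. `xhyb_hits` is an assumed stand-in for the outcome of the cross-hybridization check (II). `placeholder_a` and `placeholder_b` are hypothetical filters standing in for the other platform-only steps, whose criteria are not given here. This is not the authors' code.

```python
from typing import Callable, Dict, List, Tuple

Probe = Dict[str, int]
ProbeFilter = Callable[[Probe], bool]  # True means "keep this probe"


def apply_filters(probes: List[Probe],
                  filters: List[ProbeFilter]) -> Tuple[List[Probe], int]:
    """Apply per-probe filters in sequence; return the survivors and the
    number of filter evaluations (a rough proxy for computational cost)."""
    remaining = probes
    evaluations = 0
    for keep in filters:
        evaluations += len(remaining)
        remaining = [p for p in remaining if keep(p)]
    return remaining, evaluations


def order_by_selectivity(sample: List[Probe],
                         filters: List[ProbeFilter]) -> List[ProbeFilter]:
    """Sort filters so the one keeping the smallest fraction of a sample runs first."""
    return sorted(filters, key=lambda f: sum(f(p) for p in sample) / len(sample))


# Hypothetical probe records (not real chip data).
probes = [{"id": i, "xhyb_hits": 1 if i % 4 == 0 else 2, "score": i % 7}
          for i in range(1000)]


def cross_hyb_ok(p: Probe) -> bool:
    # Hypothetical stand-in for filter (II): keep probes with a single genomic match.
    return p["xhyb_hits"] == 1


def placeholder_a(p: Probe) -> bool:
    # Hypothetical placeholder for another platform-only step.
    return p["score"] != 3


def placeholder_b(p: Probe) -> bool:
    # Hypothetical placeholder for another platform-only step.
    return p["score"] != 5


naive = [placeholder_a, placeholder_b, cross_hyb_ok]
tuned = order_by_selectivity(probes[:200], naive)  # puts cross_hyb_ok first

kept_naive, cost_naive = apply_filters(probes, naive)
kept_tuned, cost_tuned = apply_filters(probes, tuned)

print(len(kept_naive), cost_naive)  # → 179 2571
print(len(kept_tuned), cost_tuned)  # → 179 1465
```

Both orderings keep the same 179 probes. Running the most reductive filter first cuts the evaluations from 2571 to 1465, which mirrors the reasoning for applying (II) first.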

Batch and Sample Filtering

Technical processing steps cause the amount of target, the labeling of that target and the effective length of the target to vary independently of the biological factors. Similarly, biological factors may obscure the effect of interest. For example, secondary infections in cancer patients can produce dramatic gene expression differences relative to uninfected cancer patients. Technical differences tend to appear as 'batch' effects, i.e. in groups of samples processed in parallel. Biological effects, by contrast, must be screened by comparing an array to the set of all arrays in its class, which may include multiple batches [19]. The Bhattacharjee dataset was explicitly batch annotated [47]. For the Stearman dataset, the scan date was used as a proxy for batch annotation. There were four scan dates, falling in two 2-day pairs one month apart, so we assume that they reflect only two technical batches. In the following discussion, both individual probe values and aggregated ProbeSet values are used to compare each array to its batch and sample-class trends, as follows:
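
The two ideas above can be sketched together in Python: deriving batches from scan dates, and comparing each array to its batch and to its class. The sketch makes several assumptions. A gap of more than a hypothetical `max_gap_days` between consecutive scan dates starts a new batch. Each array is reduced to one hypothetical summary value, such as a median log intensity. Deviation is measured with a median/MAD robust z-score against an assumed cutoff of 3. The dates, values and class labels are invented for illustration; they are not the Stearman or Bhattacharjee data, and the scoring rule is not necessarily the authors' procedure.

```python
from datetime import date
from statistics import median
from typing import Dict, List, Tuple


def batches_from_scan_dates(scan_dates: Dict[str, date],
                            max_gap_days: int = 7) -> Dict[str, int]:
    """After sorting by scan date, a gap larger than max_gap_days between
    consecutive dates starts a new batch (max_gap_days is an assumption)."""
    labels: Dict[str, int] = {}
    batch, previous = 0, None
    for array_id, scanned in sorted(scan_dates.items(), key=lambda kv: kv[1]):
        if previous is not None and (scanned - previous).days > max_gap_days:
            batch += 1
        labels[array_id] = batch
        previous = scanned
    return labels


def robust_z(value: float, group: List[float]) -> float:
    """Median/MAD z-score; 1.4826 scales the MAD to a standard deviation
    for normally distributed data."""
    centre = median(group)
    mad = median(abs(x - centre) for x in group)
    return 0.0 if mad == 0 else (value - centre) / (1.4826 * mad)


def screen_arrays(summary: Dict[str, float], batch: Dict[str, int],
                  sample_class: Dict[str, str], cutoff: float = 3.0
                  ) -> Tuple[Dict[str, Tuple[float, float]], List[str]]:
    """Score each array against its batch (technical) and its class
    (biological); flag arrays exceeding the hypothetical cutoff in either."""
    scores: Dict[str, Tuple[float, float]] = {}
    for array_id, value in summary.items():
        same_batch = [summary[a] for a in summary if batch[a] == batch[array_id]]
        same_class = [summary[a] for a in summary
                      if sample_class[a] == sample_class[array_id]]
        scores[array_id] = (robust_z(value, same_batch),
                            robust_z(value, same_class))
    flagged = sorted(a for a, (zb, zc) in scores.items()
                     if abs(zb) > cutoff or abs(zc) > cutoff)
    return scores, flagged


# Hypothetical data: four scan dates in two 2-day pairs one month apart.
scan_dates = {"a1": date(2000, 3, 1), "a2": date(2000, 3, 1),
              "a3": date(2000, 3, 2), "a4": date(2000, 3, 2),
              "a5": date(2000, 4, 1), "a6": date(2000, 4, 1),
              "a7": date(2000, 4, 2), "a8": date(2000, 4, 2)}
summary = {"a1": 7.0, "a2": 7.1, "a3": 6.9, "a4": 7.0,
           "a5": 7.6, "a6": 7.5, "a7": 7.7, "a8": 9.5}
sample_class = {a: ("tumor" if i % 2 == 0 else "normal")
                for i, a in enumerate(sorted(summary))}

batch = batches_from_scan_dates(scan_dates)
scores, flagged = screen_arrays(summary, batch, sample_class)
print(batch)    # → {'a1': 0, 'a2': 0, 'a3': 0, 'a4': 0, 'a5': 1, 'a6': 1, 'a7': 1, 'a8': 1}
print(flagged)  # → ['a8']
```

The class comparison deliberately pools arrays across batches, as described above. As a result, a batch shift widens the class spread, while the batch comparison isolates technical outliers.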

I. Probes-per-Sample