02.08.2013 Views

Sample A: Cover Page of Thesis, Project, or Dissertation Proposal


(Carr, ms in review). This ProbeFATE system was developed for PostgreSQL 8.0.3 [38] and installed on an AMD Athlon™ 64-bit dual-core processor running SUSE Linux™ 10.0 as the operating system. Python 2.4.1 [39] scripts were developed with the psycopg2 2.0.2 [40] module to automate the cleansing process and modify the existing system. Through this module, data could be extracted, manipulated, and analyzed in the R 2.3.1 language environment [41] via the Python rpy 1.0 module [42]. Additional software and modules included Oligoarrayaux 2.3 [43] for the calculation of probe thermodynamics and the Python MySQLdb 1.2.0 [44] module to enable querying of the public-domain Ensembl MySQL database [45].
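The extract-and-filter pattern described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the ProbeFATE code: it uses Python's built-in sqlite3 module as a stand-in for psycopg2 (both follow the Python DB-API 2.0, although psycopg2 uses `%s` placeholders rather than `?`), it is written for current Python rather than Python 2.4, and the table name `probe`, its columns, the sample rows, and the threshold are all hypothetical.

```python
import sqlite3  # stand-in for psycopg2; both implement the Python DB-API 2.0


def fetch_probes(conn, min_delta_g):
    """Return (probe_id, delta_g) rows whose delta_g is at least min_delta_g."""
    cur = conn.cursor()
    # sqlite3 uses "?" placeholders; psycopg2 would use "%s" instead.
    cur.execute(
        "SELECT probe_id, delta_g FROM probe "
        "WHERE delta_g >= ? ORDER BY probe_id",
        (min_delta_g,),
    )
    rows = cur.fetchall()
    cur.close()
    return rows


def build_demo_db():
    """Create an in-memory database holding a hypothetical probe table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE probe (probe_id TEXT PRIMARY KEY, delta_g REAL)")
    # Made-up rows, used only to exercise the query.
    conn.executemany(
        "INSERT INTO probe VALUES (?, ?)",
        [("p1", -1.5), ("p2", -4.0), ("p3", -0.2)],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    # Keep only probes at or above a made-up threshold.
    for probe_id, delta_g in fetch_probes(conn, -2.0):
        print(probe_id, delta_g)
    conn.close()
```

Passing the threshold as a query parameter, rather than formatting it into the SQL string, keeps the same function usable across DB-API drivers once the placeholder style is adjusted.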

Datasets

Two independent datasets were used in testing the effects of the filtering algorithms. Both were studies of adenocarcinoma patients in which the assays were performed using the Affymetrix HG-U95Av2 GeneChip™, so consistency of probe placement along the transcripts in the samples is assured. Using this platform, samples are assayed by 409,600 probes across 12,625 defined genes [46]. The larger, or ‘Bhattacharjee’, dataset (www.genome.wi.mit.edu/MPR/lung) contains measurements taken from 203 snap-frozen lung biopsy tissue samples. The tissues, as described by Bhattacharjee et al. [47], consist of 17 normal and 237 diseased samples, including 51 adenocarcinoma replicates, with disease category assigned after histopathological examination. The diseased samples are sub-classified into 5 states: 190 adenocarcinomas, 21 squamous cell lung carcinomas, 20 pulmonary carcinomas, and 6 small-cell lung carcinomas (SCLC) [48]. From this study we used 125 of the 190 adenocarcinoma array results and 13 of the 17 normal results; the selection criteria are described below. The second, ‘Stearman’, dataset (http://www.ncbi.nlm.nih.gov/geo/; accession number GSE2514) consists of 39 tissue samples, all replicated, from 5 male and 5 female patients (four samples were taken from each patient: 2
