Chapter 2 - University of British Columbia

DEVELOPMENT AND APPLICATION OF AN INTEGRATIVE GENOMICS APPROACH TO LUNG CANCER

by

RAJAGOPAL CHARI

B.Sc., University of British Columbia, 2001
B.Sc., University of British Columbia, 2004

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

in

THE FACULTY OF GRADUATE STUDIES

(Pathology and Laboratory Medicine)

THE UNIVERSITY OF BRITISH COLUMBIA

(Vancouver)

June 2010

© Rajagopal Chari, 2010


Abstract

Lung cancer has the highest mortality rate among all diagnosed malignancies, with adenocarcinoma (AC) being the most commonly diagnosed subtype of this disease in North America. The dismal survival statistics of lung cancer patients are largely due to the detection of the disease at an advanced stage and, to a lesser extent, the limited efficacy of current front-line treatments.

Genomic approaches, namely gene expression analysis, have provided tremendous insight into lung cancer. While many gene expression changes have been identified, most are likely reactive to changes that have a primary role in cancer development. One feature that can distinguish primary from reactive changes is the presence of a concordant DNA-level alteration.

Many well-known genes involved in cancer, such as TP53 and CDKN2A, have been shown to be affected by multiple mechanisms of alteration, such as somatic mutation in or loss of DNA sequence. For a given gene, one tumor may be affected by one mechanism while another tumor may be affected by a different mechanism. Although this level of multi-dimensional analysis has been performed for specific genes, such analysis has not been done at the genome-wide level.

This thesis highlights the development and application of a multi-dimensional genetic and epigenetic approach to identify frequently aberrant genes and pathways in lung AC. I present, first, the design and implementation of the system for integrative genomic multi-dimensional analysis of cancer genomes, epigenomes and transcriptomes (SIGMA2). Next, analyzing a multi-dimensional dataset generated from ten lung AC specimens with non-malignant controls, I identified novel genes and pathways that would have been missed if a non-integrative approach were used. Finally, examining genes involved with EGFR signaling, I identified a gene, signal regulatory protein alpha (SIRPA), which had not been previously shown to be associated with lung cancer.

Taken together, these findings demonstrate the power of a multi-dimensional approach to identify important genes and pathways in lung cancer. Moreover, identifying key genes using a multi-dimensional approach on a small sample set suggests that the need for large datasets may be circumvented by using a more comprehensive approach on a smaller set of samples.



Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Acknowledgements
Dedication
Co-Authorship Statement

Chapter 1: Introduction
1.1 Lung cancer
1.2 Genomic profiling of lung cancer
1.2.1 Gene expression analysis
1.2.2 DNA copy number analysis
1.2.3 Loss of heterozygosity (LOH) and allelic imbalance
1.3 Somatic mutations in lung cancer
1.4 Epigenetic alterations in lung cancer
1.4.1 DNA methylation
1.5 Current level of integrative analysis
1.6 Need for an integrative approach to study lung cancer
1.7 Bioinformatic tools for genomic analysis
1.8 Thesis theme
1.9 Objectives and hypothesis
1.10 Specific aims and outline of thesis
1.11 Description of high throughput data in this thesis
1.12 Other relevant contributions not included as chapters in this thesis
1.12.1 Development of tools for genomic analysis
1.12.2 Baseline gene expression in non-malignant lung tissue
1.12.3 Differential gene expression analysis in lung cancer
1.12.4 Integration of gene dosage and gene expression in lung cancer
1.13 References

Chapter 2: SIGMA2: A system for the integrative genomic multi-dimensional analysis of cancer genomes, epigenomes, and transcriptomes
2.1 Introduction
2.2 Implementation
2.3 Results and discussion
2.3.1 Look and feel of SIGMA2
2.3.2 Description of application scope and functionality
2.3.3 Approach to integration between array platforms and assays
2.3.4 Format requirements of input data
2.3.5 Description of user interface
2.3.6 Analysis of data from a single assay type
2.3.7 Analysis of data from multiple assays in a given 'omics dimension
2.3.8 Combinatorial analysis of multiple 'omics dimensions - gene dosage and gene expression
2.3.9 Group comparison analysis - single 'omics dimension
2.3.10 Group comparison analysis - integrating multiple 'omics dimensions
2.3.11 Multi-dimensional analysis of a breast cancer genome
2.3.12 Exporting data and results
2.4 Conclusions
2.5 Availability and requirements
2.6 References

Chapter 3: An integrative multi-dimensional genetic and epigenetic strategy to identify aberrant genes and pathways in cancer
3.1 Background
3.2 Methods
3.2.1 Data generation and acquisition
3.2.2 Data processing and normalization
3.2.3 Strategy for integrative analysis
3.2.4 Multiple concerted disruption (MCD) analysis
3.2.5 Simulated data analysis
3.2.6 Pathway enrichment analysis
3.2.6 Survival and differential gene expression analysis in publicly available datasets
3.3 Results and discussion
3.3.1 Analysis of individual genomic dimensions
3.3.2 Multi-dimensional analysis (MDA) reveals a higher proportion of intra-sample deregulated gene expression can be explained when more dimensions are analyzed
3.3.3 MDA reveals genes are disrupted at higher frequencies when examining multiple dimensions as compared to any single dimension alone
3.3.4 MDA identifies significantly enriched cancer related pathways
3.3.5 MDA of the Neuregulin signaling pathway reveals a complex pattern of deregulation
3.3.6 Genes exhibiting multiple concerted disruption (MCD) - biological and clinical significance
3.3.7 Association of genes exhibiting MCD and triple negative breast cancers (TNBC)
3.4 Conclusions
3.5 References

Chapter 4: Uniparental disomy is a prevalent genetic mechanism of oncogene disruption in lung adenocarcinoma
4.1 Introduction
4.2 Methods
4.2.1 Genome wide profiling of clinical lung adenocarcinoma specimens
4.2.2 Determination of regions of uniparental disomy (UPD) in clinical lung tumors
4.2.3 Determining frequent regions of UPD, gain and loss
4.2.4 Determination of UPD in cancer cell lines
4.2.5 Expression analysis of genes in focal regions of UPD
4.3 Results
4.3.1 Detection of UPD using allele specific copy number analysis
4.3.2 UPD is prevalent and non-random in the lung cancer genome with comparable frequencies to gain and loss
4.3.3 Overlap of major oncogenes and tumor suppressor genes in regions of gain, loss, and UPD
4.3.4 UPD is prevalent at oncogenes across multiple cancer types
4.3.5 Identification of novel candidate oncogenes using focal regions of UPD
4.4 Discussion
4.5 Conclusion
4.6 References

Chapter 5: Integrating the multiple dimensions of genomic and epigenomic landscapes of cancer
5.1 Introduction
5.2 Genomic alterations
5.2.1 Chromosomal aberrations
5.2.2 Gene dosage, allelic imbalance, mutational status
5.2.3 Genomic landscape: Gains, losses and uniparental disomy
5.3 Epigenomic alterations
5.3.1 The cancer methylome
5.3.2 Integration of cancer genomic and epigenomic events
5.4 Relating genetic and epigenetic events to changes in the transcriptome through integrative analysis
5.4.1 Multiple mechanisms of gene disruption
5.4.2 Multiple mechanisms of disrupting non-coding RNA levels
5.4.3 Multi-dimensional integration of genome, epigenome, and transcriptome
5.4.4 Disruption of multiple components in biological pathways
5.4.5 Identification of a novel gene involved with EGFR signaling deregulated in adenocarcinoma
5.4.6 Prevalence of SIRPA deregulation and association with clinical characteristics
5.5 Tracking clonal expansion in spatial dimensions
5.6 Evaluating the biological significance of integrative genomics findings
5.5 References

Chapter 6: Conclusions
6.1 Summary
6.1.1 Development of the integrative genetic and epigenetic approach
6.1.2 Identification of a prevalent genetic alteration in lung adenocarcinoma
6.1.3 Application of the integrative approach to lung adenocarcinoma specimens
6.2 Conclusions
6.3 Future directions
6.4 References

APPENDIX I: List of publications
APPENDIX II: Description of cell lines
APPENDIX III: Sources of data
APPENDIX IV: MCD strategy and Kaplan-Meier analysis of TUSC3
APPENDIX V: Kaplan-Meier and Oncomine expression analysis of frequent MCD genes
APPENDIX VI: Summary of Kaplan-Meier survival analysis
APPENDIX VII: Copy of UBC Research Ethics Board certificate of approval



List of Tables

Table 2.1. Features required for integrative analysis
Table 2.2. Summary of Input, analysis, output for each dimension
Table 4.1. Regions of the genome exhibiting frequent UPD
Table 4.2. List of major oncogenes and tumor suppressor genes assessed
Table 4.3. Overlap of oncogenes in frequent regions of genomic alteration
Table 4.4. Overlap of tumor suppressor genes in frequent regions of genomic alteration
Table 4.5. Cell lines and oncogene loci with homozygous mutation
Table 4.6. Summary of homozygous mutation analysis in cancer cell lines
Table 4.7. RefSeq genes in focal regions of UPD
Table 4.8. Genes overexpressed in focal regions of UPD
Table 5.1. List of software for integrative analysis
Table 5.2. List of genomic resources and databases
Table 5.3. Genes interacting with SIRPA as identified by network analysis



List of Figures

Figure 1.1. Multiple mechanisms of alteration leading to same downstream consequences
Figure 2.1. Main structural components of SIGMA2.
Figure 2.2. Data structure hierarchy.
Figure 2.3. Algorithm for integrating between different array platforms
Figure 2.4. SIGMA2 interface.
Figure 2.5. Consensus calling and heterogeneous array analysis.
Figure 2.6. Integrative genetic analysis of HCC2218
Figure 2.7. Two-group two dimensional comparison of 37 NSCLC and 16 SCLC cancer cell lines.
Figure 2.8. Multi-dimensional perspective of chromosome 17 of the HCC2218 breast cancer cell line.
Figure 3.1. Genomic profiles of breast cancer cell lines.
Figure 3.2. Quantitative and qualitative benefits of integrative analyses.
Figure 3.3. Determination and application of a disruption frequency threshold.
Figure 3.4. Impact of multi-dimensional analysis on low frequency events.
Figure 3.5. Pathway analysis of the 1162 genes identified by multi-dimensional analysis.
Figure 3.6. Complex deregulation of the Neuregulin/ERBB2 signaling pathway.
Figure 3.7. Deregulation of PTEN occurs differently between samples.
Figure 3.8. Multiple concerted disruption (MCD) analysis and its application to triple negative breast cancer.
Figure 4.1. Detection of UPD using allele specific copy number.
Figure 4.2. Comparison of frequent regions of gain, loss and UPD in the lung adenocarcinoma genome
Figure 4.3. Venn diagram illustrating the amount of the genome covered by gain, loss, and UPD
Figure 4.4. Genomic profile of an individual lung adenocarcinoma sample
Figure 4.5. Examination of UPD events at the KRAS and RB1 loci
Figure 4.6. Relationship of homozygous mutation at oncogenes and genomic alteration
Figure 4.7. Identification of E2F3 in a focal region of UPD
Figure 5.1. Advances in cancer genomic landscape post Y2K.
Figure 5.2. SNP array analysis to identify areas of altered copy number and allelic composition in a clinical lung cancer specimen.
Figure 5.3. Overlay of chromosomal regions of gain, loss and UPD (copy number neutral LOH) inherent to the T47D breast cancer cell line.
Figure 5.4. Integration of copy number, allelic status, DNA methylation, and gene expression for a single lung adenocarcinoma sample.
Figure 5.5. Integration of copy number, allelic status, DNA methylation, and gene expression for a single lung adenocarcinoma sample.
Figure 5.6. Identification of multiple disrupted components in a biological pathway.
Figure 5.7. Multi-dimensional analysis of the epidermal growth factor receptor signaling pathway.
Figure 5.8. Prevalence of SIRPA underexpression and its relationship with PTPN6 and smoking status.
Figure 5.9. Kaplan-Meier analysis of SIRPA in four independent microarray datasets.
Figure 5.10. Automated detection of selected clonal populations of cells within a cancer biopsy tissue section.



List of Abbreviations

Abbreviation  Definition
AC            Adenocarcinoma
ASCN          Allele specific copy number
BRAF          v-raf murine sarcoma viral oncogene homolog B1
CDKN2A        Cyclin-dependent kinase inhibitor 2A
CGH           Comparative Genomic Hybridization
CNV           Copy number variation
DNA           Deoxyribonucleic Acid
EGFR          Epidermal Growth Factor Receptor
FISH          Fluorescence in-situ hybridization
GWAS          Genome wide association studies
KRAS          v-Ki-ras2 Kirsten rat sarcoma viral oncogene
LOH           Loss of Heterozygosity
MASI          Mutant allele specific imbalance
MCD           Multiple Concerted Disruption
MDA           Multi-Dimensional Analysis
MUC1          Mucin 1
NSCLC         Non-small cell lung cancer
PCR           Polymerase Chain Reaction
qPCR          Quantitative PCR
RB1           Retinoblastoma 1
RNA           Ribonucleic Acid
RRM2          Ribonucleotide Reductase Subunit M2
SIGMA         System for integrative genomic microarray analysis
SIGMA2        System for integrative genomic multi-dimensional analysis
SIRPA         Signal Regulatory Protein Alpha
SKY           Spectral karyotyping
SNP           Single nucleotide polymorphism
TUSC3         Tumor suppressor candidate 3
UPD           Uniparental Disomy



Acknowledgements<br />

I would like to acknowledge the contributions <strong>of</strong> many <strong>of</strong> my colleagues in the Wan Lam Lab<br />

who contributed to this work, especially the co-authors <strong>of</strong> each <strong>of</strong> the manuscript chapters<br />

presented herein. Detailed acknowledgements from the published version <strong>of</strong> <strong>Chapter</strong> 2 is listed<br />

below:<br />

<strong>Chapter</strong> 2: We thank William W. Lockwood and Timon P.H. Buys for useful discussion and<br />

critical reading <strong>of</strong> manuscript, Ashleen Shadeo for providing data for breast cancer samples,<br />

and Anna Chu, Byron Cline, Devon Macey, Andrew Thomson, Lan Wei, Reginald Sacdalan,<br />

Tiffany Chao, and Laura Aslan for help with s<strong>of</strong>tware development.<br />

I would also like to acknowledge generous scholarship support from the Canadian Institutes <strong>of</strong><br />

Health Research and Michael Smith Foundation for Health Research.<br />

The research presented in this thesis was funded by the following granting agencies: Genome<br />

Canada/ Genome <strong>British</strong> <strong>Columbia</strong>, Canadian Cancer Society Research Institute (CCS20485),<br />

Canadian Institutes of Health Research (MOP 86731, MOP 77903), National Institutes of Health<br />

(R01 DE15965-01), National Cancer Institute Early Detection Research Network (5U01<br />

CA84971-10), Canary Foundation, and Canadian Breast Cancer Research Alliance.<br />



Dedication<br />

To my family.<br />



Co-Authorship Statement<br />

<strong>Chapter</strong>s 2 to 5 were co-authored as manuscripts for publication. The following author lists<br />

apply for each chapter:<br />

Chapter 2: Chari R, Coe BP, Wedseltoft C, Benetti M, Wilson IM, Vucic EA, MacAulay C, Ng<br />

RT, Lam WL. (2008) SIGMA2: a system for the integrative genomic multi-dimensional analysis<br />

<strong>of</strong> cancer genomes, epigenomes, and transcriptomes. BMC Bioinformatics, 9(1):422, 1-12.<br />

Contribution: I am the first author <strong>of</strong> this manuscript. I designed and developed the s<strong>of</strong>tware<br />

and wrote the manuscript. The co-authors <strong>of</strong> this manuscript were either undergraduate<br />

students who I mentored on this project or were fellow graduate students who tested the<br />

s<strong>of</strong>tware and provided important user feedback.<br />

<strong>Chapter</strong> 3: Chari R, Coe BP, Vucic EA, Lockwood WW, Lam WL. (2010) An integrative multidimensional<br />

genetic and epigenetic strategy to identify aberrant genes and pathways in cancer.<br />

BMC Systems Biology, 4(1):67, 1-14.<br />

Contribution: I am the first author <strong>of</strong> this manuscript. I acquired most <strong>of</strong> the data through<br />

generating genomic pr<strong>of</strong>iles and downloaded the rest <strong>of</strong> the data from public resources. I<br />

conceived the analysis for the manuscript and wrote the manuscript.<br />

<strong>Chapter</strong> 4: Chari R, Lockwood WW, Soh J, Coe BP, Tam K, MacAulay C, Minna JD, Lam S,<br />

Gazdar AF, Lam WL. (2010) Uniparental disomy is a prevalent mechanism <strong>of</strong> genetic alteration<br />

in lung adenocarcinoma.<br />

Contribution: I am the first author <strong>of</strong> this manuscript. I generated all <strong>of</strong> the data and performed<br />

all <strong>of</strong> the analyses for this manuscript and my co-authors provided useful information through<br />

comments and other supporting data.<br />



Chapter 5: Chari R, Thu KL, Wilson IM, Lockwood WW, Lonergan KM, Coe BP, Malloff CA,<br />

Gazdar AF, Lam S, Garnis C, MacAulay CE, Alvarez CE, Lam WL. (2010) Integrating the<br />

multiple dimensions <strong>of</strong> genomic and epigenomic landscapes <strong>of</strong> cancer. Cancer and Metastasis<br />

Reviews, 29(1):73-93.<br />

Contribution: I am the first author <strong>of</strong> this manuscript. I orchestrated the study, performed all<br />

analyses and wrote the manuscript with the help <strong>of</strong> my supervisor. Other co-authors provided<br />

useful information, data, or comments.<br />



<strong>Chapter</strong> 1: Introduction<br />



1.1 Lung cancer<br />

Lung cancer has the highest mortality rate amongst all diagnosed malignancies [1]. In Canada, an<br />

estimated 24,000 individuals were expected to be diagnosed with lung cancer in 2009, with approximately<br />

21,000 individuals succumbing to this disease (Canadian Cancer Statistics 2009,<br />

www.cancer.ca). Lung cancer is classified into two main types: non-small cell lung cancer<br />

(NSCLC) and small cell lung cancer (SCLC) and within NSCLC, the two major histological<br />

subtypes are adenocarcinoma (AC) and squamous cell carcinoma (SqCC) with large cell<br />

carcinoma (LCC) being the third most common histological subtype. AC accounts for the<br />

highest percentage <strong>of</strong> all lung cancer cases, representing almost half <strong>of</strong> all NSCLCs diagnosed.<br />

The primary etiological factor associated with lung cancer is tobacco smoke exposure. While<br />

the majority <strong>of</strong> lung cancer patients have a heavy smoke exposure history, there is an<br />

increasing percentage <strong>of</strong> lung cancer patients (25%) where primary smoke exposure is not the<br />

associated cause <strong>of</strong> the disease [2]. Moreover, when examining the association <strong>of</strong> smoke<br />

history and histological subtypes diagnosed, while all subtypes have an association with smoke<br />

exposure, SCLC and SqCC show the strongest associations [3]. In addition, amongst<br />

never smokers, the majority <strong>of</strong> cases are <strong>of</strong> the adenocarcinoma subtype [2].<br />

Examining across the spectrum <strong>of</strong> all NSCLC patients, independent <strong>of</strong> stage, only 15% <strong>of</strong> all<br />

lung cancer patients will achieve five-year survival with the median survival time <strong>of</strong> lung cancer<br />

patients less than one year. Stratification by stage reveals that those individuals diagnosed<br />

early (stage IA) have a superior rate <strong>of</strong> five year survival as compared to those diagnosed late<br />

(stage IV) (50% vs. 2%) [4]. Given the overall survival independent <strong>of</strong> stage is closer to stage<br />

IV than stage IA, it is clear that the paltry survival statistics are largely due to the late diagnosis<br />

<strong>of</strong> this disease and to a lesser extent, the nominal response rate observed by conventional<br />

chemotherapies [5].<br />



While overall therapeutic strategies have provided limited benefit to prolonging patient survival,<br />

there has been moderate success in the application <strong>of</strong> targeted therapeutics. Specifically,<br />

pharmacological agents against the epidermal growth factor receptor (EGFR) tyrosine kinase<br />

have shown selective efficacy in a subset <strong>of</strong> lung AC patients [6-12]. Hence, in addition to<br />

improving early detection strategies, another main focus <strong>of</strong> lung cancer research is the<br />

identification <strong>of</strong> novel therapeutic targets. One such approach that can be used to identify<br />

targets is through the application <strong>of</strong> genomic tools to clinical lung cancer specimens.<br />

1.2 Genomic profiling of lung cancer<br />

1.2.1 Gene expression analysis<br />

One <strong>of</strong> the first applications <strong>of</strong> high throughput genome technologies was to the assessment <strong>of</strong><br />

messenger RNA (mRNA) levels [13, 14]. While the first, landmark cancer-related studies were<br />

done in breast and hematological malignancies [15-17], substantial findings were made in the<br />

analysis <strong>of</strong> lung cancer. Specifically, lung cancer gene expression studies have identified<br />

genes differentially expressed in tumors, genes associated with angiogenic potential, genes<br />

associated with chemoresistance, expression signatures defining subclasses <strong>of</strong> lung cancer,<br />

expression signatures associated with patient prognosis, and expression signatures from<br />

normal bronchial epithelium samples to detect lung cancer [18-34]. In addition, much work has<br />

also been done to understand baseline gene expression in non-malignant lung tissue as well as its<br />

changes with respect to heavy smoke exposure [35-38]. These studies are as important as<br />

studies involving lung cancer samples as they provide an important reference level <strong>of</strong> gene<br />

expression to decipher the dysregulated gene expression in tumors.<br />

However, from a given analysis <strong>of</strong> differential expression in tumors, there are typically<br />

hundreds, if not thousands, <strong>of</strong> genes which may show aberrant gene expression in tumors when<br />

compared to non-malignant tissue. Moreover, it is likely that a proportion <strong>of</strong> the genes which<br />

are aberrantly expressed are not integral or causal to tumor development as many gene<br />



expression changes are reactive to changes in expression <strong>of</strong> other genes. In addition, using<br />

gene expression alone, one cannot discern which changes are causal and which changes are<br />

reactive. One approach to assign causality with gene expression changes is to identify<br />

alterations at the DNA level such as somatic mutation, changes in gene dosage (DNA copy<br />

number), or epigenetic changes such as aberrant DNA methylation or histone modification which<br />

can explain the observed differential expression.<br />

1.2.2 DNA copy number analysis<br />

Alterations in gene dosage, whereby segments <strong>of</strong> DNA in the genome are either replicated or<br />

lost, have been shown to be important in lung cancer [39-41]. Typically, these gains and losses of<br />

DNA are detected through the comparison <strong>of</strong> a genome from a tumor sample with a genome<br />

that is normal or non-malignant. It is thought that these increases or decreases in amounts <strong>of</strong><br />

specific gene sequence could allow for increased or decreased expression <strong>of</strong> that gene.<br />

Technological advances have allowed for the high throughput assessment <strong>of</strong> DNA copy number<br />

changes in the cancer genome namely through microarray comparative genomic hybridization<br />

(CGH) [42, 43]. Briefly, this technology capitalizes on differential fluorescence labelling where<br />

DNA from the tumor sample and DNA from the normal sample, each labelled with different<br />

fluorescent dyes, are hybridized together on the same chip and differences in fluorescent<br />

intensities are measured. Moreover, array CGH pr<strong>of</strong>iling <strong>of</strong> both lung cancer cell lines and<br />

tumors has identified areas of the genome which are frequently gained or lost [44-52].<br />

Specifically, these areas <strong>of</strong> copy number alteration have targeted known oncogenes such as<br />

MYC, EGFR, MDM2, TERT, and tumor suppressor genes such as CDKN2A, TP53 and RB1.<br />

However, these alterations typically do not occur in 100% <strong>of</strong> lung tumors (e.g. the EGFR locus<br />

is gained/amplified in 10-20% <strong>of</strong> cases). In addition, the amplification and deletion events<br />

typically encompass multiple genes and as such, more <strong>of</strong>ten than not, only a subset <strong>of</strong> those<br />

genes will have a downstream consequence at the gene expression level. Hence, integration <strong>of</strong><br />



gene dosage with gene expression analysis would be useful to discern the target gene(s) <strong>of</strong> a<br />

given copy number alteration.<br />
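As a minimal illustration of the two-colour measurement described above, the following sketch computes a log2 ratio of tumor to normal intensity for each array element and labels it as a gain, loss, or neutral. The intensity values, element names, and the ±0.2 thresholds are hypothetical and chosen only for illustration; actual thresholds depend on the array platform and on sample purity.

```python
import math

# Hypothetical log2 ratio thresholds; real studies tune these per platform.
GAIN_THRESHOLD = 0.2
LOSS_THRESHOLD = -0.2


def log2_ratio(tumor_intensity, normal_intensity):
    """Return log2(tumor / normal) for a single array element."""
    return math.log2(tumor_intensity / normal_intensity)


def call_copy_number(ratio):
    """Label a log2 ratio as 'gain', 'loss', or 'neutral'."""
    if ratio >= GAIN_THRESHOLD:
        return "gain"
    if ratio <= LOSS_THRESHOLD:
        return "loss"
    return "neutral"


# Made-up fluorescence intensities: (element name, tumor, normal)
elements = [
    ("element_A", 2000.0, 1000.0),
    ("element_B", 500.0, 1000.0),
    ("element_C", 1000.0, 1000.0),
]

calls = {name: call_copy_number(log2_ratio(t, n)) for name, t, n in elements}
```

Equal intensities give a log2 ratio of zero, so gains and losses fall symmetrically on either side of the baseline.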

1.2.3 Loss <strong>of</strong> heterozygosity (LOH) and allelic imbalance<br />

Loss <strong>of</strong> heterozygosity (LOH) is a common genetic event in cancer [53]. In the normal cell,<br />

each autosome has two copies, with one copy (or allele) originating from each<br />

parent. Subsequently, in the tumor, a specific segment from one <strong>of</strong> the copies <strong>of</strong> the<br />

chromosome is lost, resulting in loss <strong>of</strong> heterozygosity.<br />

Frequent regions <strong>of</strong> LOH have also been identified in the lung cancer genome [54-58]. While<br />

initial studies involved the use <strong>of</strong> microsatellite markers placed throughout the genome and<br />

thus, the resolution of these changes was limited, the application of SNP arrays was able to<br />

refine these areas into specific chromosome arms [45, 46]. In addition to advances in SNP<br />

array technology, analysis approaches were also developed that increased the detection<br />

sensitivity <strong>of</strong> regions <strong>of</strong> LOH / allelic imbalance [59-63]. Although most areas with altered gene<br />

dosage will also be detected as LOH (in the case of copy number loss) and allelic imbalance (in the<br />

case of copy number gain), there are also areas in the genome which exhibit LOH but no<br />

change in copy number, termed copy neutral LOH or uniparental disomy (UPD). However, the role of<br />

UPD in lung cancer is not well understood.<br />
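The distinctions drawn above between LOH with copy number loss, allelic imbalance, and copy neutral LOH can be made concrete with a small sketch. It assumes that allele specific copy numbers for the tumor are already available for a region that is heterozygous (one copy of each allele) in the matched normal; the function name and category labels are hypothetical and are not taken from any particular software package.

```python
def classify_region(copies_allele_a, copies_allele_b):
    """Classify a tumor region from its allele specific copy numbers.

    Assumes the matched normal carries one copy of each allele (total of two).
    """
    total = copies_allele_a + copies_allele_b
    minor = min(copies_allele_a, copies_allele_b)

    if minor == 0:
        # One allele is absent in the tumor, so heterozygosity is lost.
        if total < 2:
            return "LOH with copy number loss"
        if total == 2:
            return "copy neutral LOH (UPD)"
        return "LOH with copy number gain"
    if copies_allele_a != copies_allele_b:
        # Both alleles are retained but in unequal amounts.
        return "allelic imbalance"
    return "no allelic imbalance"


# Illustrative regions: (allele A copies, allele B copies)
examples = {
    "normal": classify_region(1, 1),
    "deletion": classify_region(1, 0),
    "upd": classify_region(2, 0),
    "gain": classify_region(2, 1),
}
```

In this scheme a UPD region is invisible to total copy number alone, since the total remains two, which is why allelic information is needed to detect it.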

1.3 Somatic mutations in lung cancer<br />

Somatic mutations have also been shown to be important in cancer development. In addition,<br />

mutational analysis is also used for screening purposes in high risk populations (e.g. BRCA1/2<br />

and hereditary breast cancer) as well as a criterion for receiving targeted chemotherapy (e.g.<br />

EGFR mutation and EGFR inhibitors). Many studies have been undertaken to identify<br />

mutations in genes involved with important cellular processes pertinent to the cancer phenotype<br />

such as DNA repair and cellular proliferation and have successfully identified key genes to a<br />



number of different cancer types. Moreover, as a general rule, oncogenes typically<br />

harbour activating mutations, whereas tumor suppressor genes often harbour inactivating mutations.<br />

In lung adenocarcinoma, the most well known genes shown to be mutated are EGFR, KRAS,<br />

LKB1 (or STK11), TP53 and CDKN2A [30, 54, 64, 65], with some mutations such as EGFR and<br />

KRAS showing preferential mutation patterns based on smoking history. A recent study<br />

assessing other well known oncogenes and tumor suppressor genes showed there were a<br />

number <strong>of</strong> other genes also observed to be mutated in lung adenocarcinoma [64]. However,<br />

due to technological and material limitations at the time, many of these studies assessed only<br />

small numbers of genes and thus, genome wide screening for somatic<br />

mutations was unfeasible. While high throughput sequencing technologies to assess sequence<br />

mutation on a genome scale have become available, challenges associated with cost and data<br />

analysis preclude their routine use.<br />

1.4 Epigenetic alterations in lung cancer<br />

1.4.1 DNA methylation<br />

Another DNA level mechanism which can affect gene expression is through the methylation <strong>of</strong><br />

DNA at gene promoters. DNA methylation is a reversible chemical modification which has been<br />

shown to have a prominent role in the silencing of tumor suppressor genes. Specifically, this<br />

modification targets cytosines, whereby a methyl group (CH3) is added to the carbon 5 position of<br />

cytosine.<br />

It is thought that in cancer, the majority <strong>of</strong> the genome loses its methylation but small areas in<br />

the gene promoters, known as CpG islands, gain methylation [66-69]. Generally, it is thought<br />

that the acquired methylation targets tumor suppressor genes while the areas <strong>of</strong> lost<br />

methylation facilitate the activation <strong>of</strong> repetitive areas <strong>of</strong> the genome which can lead to<br />

increased genomic instability. In addition, aberrant DNA methylation of critical genes has been<br />



utilized for early detection purposes as well as a target for therapeutic intervention [70-72],<br />

emphasizing its key role in cancer.<br />

In lung cancer, a number <strong>of</strong> specific genes such as CDKN2A (or p16), RASSF1A, and MGMT<br />

have been shown to harbour increased promoter methylation [73]. While many of these methylation<br />

events were discovered using single locus assays, recent advances have allowed for the high<br />

throughput analysis of thousands of genes in a single experiment [74-79]. As such, applications of<br />

these high throughput approaches in lung cancer are likely to identify novel methylated genes.<br />

Similar to array CGH analysis, though many methylated genes are likely to be identified, it will<br />

be important to validate if these alterations affect downstream gene expression.<br />

1.5 Current level <strong>of</strong> integrative analysis<br />

At the time this thesis started, there were a small number <strong>of</strong> whole genome integrative studies<br />

which primarily focused on the integration of gene dosage and gene expression. In fact, the<br />

majority of integrative analyses were done at the single locus level, such as the examination<br />

of gene dosage and expression of the HER2 (ERBB2) oncogene in breast cancer [80]. Moreover,<br />

there were a limited number <strong>of</strong> gene dosage or gene expression studies in lung cancer.<br />

However, from recent studies involving multiple cancer types, including lung cancer, it has been<br />

shown that anywhere between 20% and 60% <strong>of</strong> genes in regions <strong>of</strong> copy number change also<br />

exhibit a concerted change in gene expression [52, 81-84]. Conversely, when the proportion <strong>of</strong><br />

differential expression associated with gene dosage alteration was examined, it was found that<br />

only 11% <strong>of</strong> the observed differential expression could be attributed to high level DNA copy<br />

number change [83]. Thus, it is clear that gene dosage alterations are responsible for only a<br />

part <strong>of</strong> the overall dysregulated gene expression and that other mechanisms are likely involved.<br />

1.6 Need for an integrative approach to study lung cancer<br />

As discussed earlier, a gene such as CDKN2A has been shown to be inactivated by both gene<br />

dosage loss and increased promoter methylation. Thus, it is very likely that when examining a<br />



large number of tumors, a given gene may be affected by one mechanism in one tumor (e.g.<br />

gene dosage increase) and another mechanism in a different tumor (e.g. DNA<br />

hypomethylation), but both leading to the same net effect (Figure 1.1). In addition, if the specific<br />

event (e.g. gene dosage increase) occurs at a low frequency, but cumulatively, the deregulation<br />

occurs at a high frequency, then examining only gene dosage or DNA methylation would<br />

preclude the identification <strong>of</strong> such potentially important genes. Hence, it should be apparent<br />

that an integrative, multi-dimensional genetic and epigenetic approach is needed to identify<br />

novel genes which would have escaped previous, single dimensional analyses.<br />
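The argument above, that a gene disrupted at low frequency by any single mechanism may be disrupted at high frequency overall, can be illustrated with a short sketch. The tumor identifiers, mechanism names, and alteration calls below are entirely made up for one hypothetical gene; the point is only the difference between the per-mechanism frequencies and the cumulative frequency.

```python
from collections import Counter


def mechanism_frequencies(alterations_by_tumor):
    """Fraction of tumors in which each individual mechanism is observed."""
    n_tumors = len(alterations_by_tumor)
    counts = Counter(
        mechanism
        for mechanisms in alterations_by_tumor.values()
        for mechanism in mechanisms
    )
    return {mechanism: count / n_tumors for mechanism, count in counts.items()}


def cumulative_frequency(alterations_by_tumor):
    """Fraction of tumors in which the gene is hit by at least one mechanism."""
    n_tumors = len(alterations_by_tumor)
    disrupted = sum(1 for mechanisms in alterations_by_tumor.values() if mechanisms)
    return disrupted / n_tumors


# Hypothetical calls for a single gene across ten tumors
gene_alterations = {
    "T01": {"copy_gain"},
    "T02": {"copy_gain"},
    "T03": {"hypomethylation"},
    "T04": {"hypomethylation"},
    "T05": {"mutation"},
    "T06": {"mutation"},
    "T07": set(),
    "T08": set(),
    "T09": set(),
    "T10": set(),
}

per_mechanism = mechanism_frequencies(gene_alterations)
overall = cumulative_frequency(gene_alterations)
```

In this toy dataset each mechanism alone is seen in 20% of tumors and might fall below a typical frequency cut-off, whereas the gene is disrupted by some mechanism in 60% of tumors.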

1.7 Bioinformatic tools for genomic analysis<br />

While many s<strong>of</strong>tware packages exist for the analysis <strong>of</strong> high throughput gene expression data<br />

[85-88], at the start <strong>of</strong> my thesis project, s<strong>of</strong>tware packages for the visualization and analysis <strong>of</strong><br />

DNA copy number data were very limited [89-95]. A summary <strong>of</strong> array CGH analysis<br />

methodologies and software packages is provided elsewhere [96]. Moreover, three of the<br />

key challenges at the time were (i) the increase in data generated from a single experiment, (ii)<br />

the effective visualization <strong>of</strong> this data for easy interpretation, and (iii) the microarray platform<br />

dependence <strong>of</strong> the majority <strong>of</strong> s<strong>of</strong>tware packages.<br />

With respect to the increase in data generation, the first generation <strong>of</strong> microarrays used for<br />

array CGH typically comprised two to three thousand data points. As such, software for both<br />

visualization and analysis was developed to effectively handle this level of data complexity.<br />

For example, since array CGH data in fact represents discrete levels <strong>of</strong> copy number<br />

throughout the genome, one <strong>of</strong> the data analysis steps required is segmentation which<br />

effectively smoothes data based on genomic position. The first version <strong>of</strong> DNACopy [97], one<br />

<strong>of</strong> the first algorithms to segment array CGH data, would need a significant amount <strong>of</strong> time to<br />

execute when applied to arrays with 100,000 data points or greater and eventually, a new<br />

version <strong>of</strong> the program was developed a few years later [98]. Similarly, in terms <strong>of</strong> visualization,<br />

most programs displayed array CGH data in an ordinal manner whereby the relative genomic<br />



position was on the x-axis and the log ratio <strong>of</strong> the data point was drawn on the y-axis. While<br />

this type <strong>of</strong> visualization can provide a quick genome summary <strong>of</strong> a single sample, it is difficult<br />

to readily link to information such as protein coding genes from this type <strong>of</strong> visualization.<br />
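To make the idea of segmentation concrete, the sketch below groups consecutive array elements that share the same discretized copy number state into segments and reports the mean log ratio of each segment. This is a deliberately naive stand-in, not the algorithm implemented in DNACopy; the ±0.2 threshold, the positions, and the log ratios are hypothetical.

```python
from itertools import groupby
from statistics import mean


def discretize(log_ratio, threshold=0.2):
    """Map a log ratio to a state: 1 (gain), -1 (loss), or 0 (neutral)."""
    if log_ratio >= threshold:
        return 1
    if log_ratio <= -threshold:
        return -1
    return 0


def segment(probes, threshold=0.2):
    """Merge adjacent probes with the same state into segments.

    probes: list of (position, log_ratio) tuples for one chromosome,
            sorted by position.
    Returns a list of (start, end, state, mean_log_ratio) tuples.
    """
    segments = []
    for state, group in groupby(probes, key=lambda p: discretize(p[1], threshold)):
        members = list(group)
        start, end = members[0][0], members[-1][0]
        segments.append((start, end, state, mean(r for _, r in members)))
    return segments


# Made-up data: (genomic position, log ratio)
example_probes = [(100, 0.0), (200, 0.05), (300, 0.5), (400, 0.6), (500, -0.4)]
example_segments = segment(example_probes)
```

Because each element is discretized independently, a single noisy measurement splits a segment in two; dedicated segmentation algorithms exist precisely to avoid this, at the computational cost described above.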

Finally, s<strong>of</strong>tware developed by microarray manufacturers such as Affymetrix, Agilent or<br />

Nimblegen were specifically tailored to handle data from their respective microarray platforms.<br />

Thus, data emanating from different microarray platforms, but derived from<br />

samples with common characteristics, could not be analyzed in a concerted manner, resulting in<br />

under-utilization of the increasingly available array CGH data in the public domain. Most<br />

importantly, no tools existed to integrate multiple dimensions <strong>of</strong> data such as global gene<br />

dosage and gene expression, let alone integration with DNA methylation. Hence, it is clear that<br />

with these apparent challenges, the development <strong>of</strong> such bioinformatic tools was needed.<br />

1.8 Thesis theme<br />

The theme <strong>of</strong> this thesis is the development and utilization <strong>of</strong> an integrative genetic and<br />

epigenetic approach to identify novel aberrant genes and pathways that may be involved in the<br />

tumorigenesis <strong>of</strong> lung adenocarcinoma. This will be achieved by employing genome wide<br />

genetic and epigenetic pr<strong>of</strong>iling experiments <strong>of</strong> lung adenocarcinoma samples and the<br />

subsequent integration <strong>of</strong> this data using novel bioinformatics tools and approaches.<br />

1.9 Objectives and hypothesis<br />

The objective <strong>of</strong> this work is to demonstrate the importance <strong>of</strong> employing an integrative<br />

approach to understand genetic and epigenetic alterations and their consequence on gene<br />

expression. The hypothesis can be broken down to three parts:<br />

(A) Genes/pathways which are important to tumorigenesis are disrupted by multiple<br />

mechanisms in lung cancer.<br />



(B) By using an integrative approach, looking at the global genetic and epigenetic regulation <strong>of</strong><br />

gene expression, changes at the DNA level which have downstream effects at the gene<br />

expression level will be identified.<br />

(C) This approach will lead to the identification <strong>of</strong> more genes that are disrupted than previously<br />

anticipated and these genes will be enriched in key pathways and functions important to lung<br />

tumorigenesis.<br />

1.10 Specific aims and outline <strong>of</strong> thesis<br />

This thesis consists <strong>of</strong> four manuscripts assembled in a non-chronological order to best address<br />

the objectives and hypothesis <strong>of</strong> this thesis.<br />

Aim 1: Development <strong>of</strong> a platform for multi 'omics data integration and analysis<br />

<strong>Chapter</strong> 2 discusses the development <strong>of</strong> an integrative analysis s<strong>of</strong>tware package called a<br />

system for the integrative genomic multi-dimensional analysis <strong>of</strong> cancer genomes, epigenomes,<br />

and transcriptomes (SIGMA2). The development of this application was necessary prior to the<br />

undertaking <strong>of</strong> the analysis <strong>of</strong> the vast amount <strong>of</strong> data generated from the utilized high<br />

throughput, genome-wide technologies.<br />

As discussed earlier in this chapter, there were very few bioinformatic tools available for the<br />

analysis of array CGH data, let alone for integrative analysis of gene dosage and gene<br />

expression. Prior to the development of SIGMA2, I developed the precursor version of this<br />

software, SIGMA [95]. SIGMA provided the basic framework in terms of the user interfaces,<br />

database communication, data structures and "look and feel" that would be utilized in SIGMA2.<br />

Moreover, one <strong>of</strong> the key challenges when SIGMA was developed was the effective<br />

visualization and analysis <strong>of</strong> large datasets generated by newer, high density array CGH<br />

platforms. At the time, the majority of data were generated on platforms comprising<br />

3000 measurements per sample, but newer technologies were being developed which<br />



generated over 500,000 data points per sample, representing a more than 100-fold increase in information<br />

obtained from each experiment [99]. Hence, the base software architecture used in SIGMA2<br />

was already capable <strong>of</strong> handling large amounts <strong>of</strong> data.<br />

Aim 2: Demonstration <strong>of</strong> an integrative approach using model systems<br />

<strong>Chapter</strong> 3 discusses the demonstration <strong>of</strong> an integrative, multi-dimensional approach on tumor<br />

cell line model systems. Using a set <strong>of</strong> breast cancer cell lines, I examine the gene dosage,<br />

allelic composition, DNA methylation, and gene expression pr<strong>of</strong>iles in an integrative manner to<br />

delineate which genes and pathways would be missed or less significant if such an approach<br />

was not used. This demonstrative study was needed to show the key advantages and benefits<br />

<strong>of</strong> an integrative approach. While cell lines are artificial systems and may have acquired<br />

alterations that are beneficial for growth in vitro, it is important that a sample source was used<br />

where material limitations did not exist. For each <strong>of</strong> the genetic or epigenetic pr<strong>of</strong>iling studies,<br />

sufficient amounts <strong>of</strong> DNA and RNA are needed and when more assays are done in a given<br />

sample, more material is required. Moreover, when whole tumor samples are microdissected to<br />

ensure high tumor cell purity, this inherently will reduce the amount <strong>of</strong> usable sample material.<br />

As such, it is important that the quantitative and qualitative benefits <strong>of</strong> utilizing an integrative<br />

approach are sufficient to warrant using clinical samples. At the time this study was initiated,<br />

SNP array and array CGH pr<strong>of</strong>iles were available for breast cancer cell lines and thus, only<br />

generation <strong>of</strong> DNA methylation and gene expression pr<strong>of</strong>iles were needed to complete this set.<br />

Given the purpose <strong>of</strong> this study was to demonstrate the effectiveness <strong>of</strong> the integrative<br />

approach, while data from lung cancer cell lines would have been optimal, the source of<br />

data has limited relevance to the purpose <strong>of</strong> this aim.<br />

Aim 3: Characterization <strong>of</strong> DNA level alterations in lung adenocarcinoma<br />

A number <strong>of</strong> studies have been done to identify gene dosage alterations in lung cancer and in<br />

lung adenocarcinoma specifically. These studies were done on a number <strong>of</strong> different array<br />



platforms, with one <strong>of</strong> the latest studies done using Affymetrix SNP arrays. One <strong>of</strong> the benefits<br />

<strong>of</strong> Affymetrix SNP arrays is the ability to simultaneously detect changes in gene dosage as well<br />

as allelic imbalance. Allelic imbalance, though ideally determined using a patient matched<br />

non-malignant sample as a control, has also been determined using a pool of unmatched<br />

non-malignant samples. While the ability to detect imbalance using unmatched control samples is<br />

important when matched control samples are not available, this approach may falsely score balanced<br />

regions as imbalanced, and vice versa. In addition, samples in these different studies<br />

were typically not microdissected and thus, tumor cell purity in the samples would be variable.<br />

Thus, those samples with low tumor cell content would make it difficult to detect genetic<br />

alterations. Moreover, there has been a recent drastic increase in resolution with the newest<br />

SNP arrays, with the ability to measure over 4X as many SNPs and over 8X as many spots for<br />

gene dosage. <strong>Chapter</strong> 4 discusses the application <strong>of</strong> a new SNP array technology to<br />

microdissected lung adenocarcinoma specimens with the goal <strong>of</strong> identifying genetic alterations<br />

at the highest resolution currently available.<br />

Aim 4: Application <strong>of</strong> an integrative approach to clinical lung adenocarcinoma<br />

specimens<br />

With the approach and necessary tools developed and now demonstrated to be beneficial using<br />

a model dataset, chapter 5 discusses the application <strong>of</strong> the integrative approach to lung<br />

adenocarcinoma specimens. While the published chapter provides an overview <strong>of</strong> cancer<br />

genome and epigenome landscapes, sections 5.4.3 to 5.4.4 present some <strong>of</strong> the quantitative<br />

and qualitative benefits <strong>of</strong> integrative analysis specific to the analysis <strong>of</strong> a lung adenocarcinoma<br />

dataset. In addition, sections 5.4.5 to 5.4.6 discuss key findings in terms of genes and<br />

pathways that were identified from this integrative analysis.<br />



1.11 Description <strong>of</strong> high throughput data in this thesis<br />

Throughout this thesis, a number <strong>of</strong> platform technologies were utilized to generate high<br />

throughput, genome wide data. Below is a summary <strong>of</strong> all platforms used in each chapter.<br />

In Chapter 3, for nine breast cancer cell lines and one control cell line (MCF10A), the following profiles were generated: Affymetrix SNP 500K for the analysis of allelic status; whole genome tiling path array CGH for the analysis of gene dosage; Illumina Infinium HumanMethylation27 for DNA methylation analysis; and Affymetrix U133 Plus 2.0 for the analysis of gene expression.

In Chapter 4, for the 46 tumors and matched non-malignant tissue as well as the cancer cell lines, the Affymetrix SNP 6.0 platform was utilized to measure total copy number and allelic imbalance. For a subset of tumors, gene expression profiles were generated using a custom Affymetrix platform designed by Rosetta Inpharmatics.

In Chapter 5, for the ten tumors and matched non-malignant tissue samples, the following profiles were generated: Affymetrix SNP 6.0 for the analysis of allelic status and gene dosage; Illumina Infinium HumanMethylation27 for DNA methylation analysis; and Affymetrix HuEx 1.0 ST array for the analysis of gene expression. Quantitative RT-PCR was performed using the Applied Biosystems TaqMan gene expression assay.

1.12 Other relevant contributions not included as chapters in this thesis

In this thesis, I have chosen to include a small portion of my overall work in order to achieve a coherent theme. However, in this section, I outline specific contributions, which I either led or participated in as second author, that I deem relevant to the theme of lung cancer and genomics.



1.12.1 Development of tools for genomic analysis

As mentioned earlier, the precursor version of SIGMA2 was SIGMA [95]. This tool was built as an interactive database of cancer cell line array CGH profiles and provided a means for effective visualization of high density array CGH data as well as sharing of data. Another problem that arose for high density array CGH data was the lack of efficient analysis algorithms to delineate gains and losses. Most algorithms for array CGH analysis were developed for arrays with 2000 to 3000 data points, and their execution times did not scale up efficiently when arrays were generating 100,000 to 1,000,000 data points. To address this problem, I contributed to the development of a segmentation and calling algorithm named FACADE [100].
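The scaling concern described above can be illustrated with a minimal sketch of a linear-time calling step. This is not the FACADE algorithm [100]; it is a simplified illustration that assumes fixed log2 ratio thresholds (the hypothetical `gain` and `loss` parameters) and merges consecutive probes with the same call into segments in a single pass, so that run time grows in proportion to the number of data points.

```python
from itertools import groupby


def call_segments(log2_ratios, gain=0.2, loss=-0.2):
    """Call each probe as gain, loss or neutral, then merge runs of
    identical calls into (start_index, end_index, state) segments.

    The thresholds are illustrative assumptions, not published values.
    """
    def label(ratio):
        if ratio >= gain:
            return "gain"
        if ratio <= loss:
            return "loss"
        return "neutral"

    segments = []
    start = 0
    # groupby walks the probes once, so the cost is linear in array size.
    for state, run in groupby(log2_ratios, key=label):
        length = sum(1 for _ in run)
        segments.append((start, start + length - 1, state))
        start += length
    return segments


if __name__ == "__main__":
    probes = [0.0, 0.05, 0.4, 0.5, 0.45, -0.1, -0.6, -0.7]
    for segment in call_segments(probes):
        print(segment)
```

With the eight example probes, the sketch reports four segments: a neutral run, a gain, a single neutral probe, and a loss. A practical algorithm would also need to handle noise within segments, which simple thresholding does not.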

1.12.2 Baseline gene expression in non-malignant lung tissue

Although gene expression studies of malignant samples are important, it is also critical to define which genes are expressed in non-malignant samples, as these are used as the reference to determine aberrant gene expression. I was involved in two studies that addressed this question. First, we examined gene expression of non-malignant, smoke-damaged bronchial epithelium using serial analysis of gene expression (SAGE) [37]. We found that specific genes showed high tissue specificity to the bronchial epithelium, with limited representation in other tissues, and that there were differences between bronchial epithelial samples and lung parenchyma. Lung parenchyma samples, which are taken adjacent to tumors and are typically used as non-malignant controls, are comprised of a mixture of different cells.

In the study described above, bronchial epithelium samples from current and former smokers were grouped together. Hence, the next logical question was to assess the effect of active smoking on the bronchial epithelium. In the second study, a group of never smoker samples was added to the groups of current and former smokers, and the gene expression profiles of the three groups were compared. We first identified a set of genes that were differentially expressed in response to smoke exposure, and found that one subset of these genes was reversible upon smoking cessation while another subset was irreversible [35]. Genes that were irreversibly altered after heavy smoke exposure may have implications for the future risk of developing lung cancer. Moreover, these findings suggest that, when cancer-specific changes are being identified and matched control samples are not available, clinical characteristics such as smoking status should be taken into consideration when comparisons are made.
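The distinction between reversible and irreversible genes can be sketched as a simple three-group comparison. The sketch below is not the statistical procedure used in [35]; it assumes normalized expression values per group and a hypothetical log2 fold change threshold (`min_log2`), and the function names are illustrative only.

```python
import math
from statistics import mean


def log2_fold(a, b, pseudo=1.0):
    """log2 ratio of a to b, with a pseudocount to avoid division by zero."""
    return math.log2((a + pseudo) / (b + pseudo))


def classify_gene(never, former, current, min_log2=1.0):
    """Classify one gene from expression values in never, former and
    current smokers. The threshold is an assumption for illustration."""
    current_change = log2_fold(mean(current), mean(never))
    if abs(current_change) < min_log2:
        # No clear difference between current and never smokers.
        return "unresponsive"
    former_change = log2_fold(mean(former), mean(never))
    same_direction = (former_change > 0) == (current_change > 0)
    if abs(former_change) >= min_log2 and same_direction:
        # Still altered in former smokers, in the same direction.
        return "irreversible"
    # Altered in current smokers but back near never smoker levels.
    return "reversible"


if __name__ == "__main__":
    print(classify_gene(never=[10, 12], former=[11, 9], current=[40, 50]))
    print(classify_gene(never=[10, 12], former=[38, 42], current=[40, 50]))
```

In this toy example the first gene returns to never smoker levels in former smokers and is labelled reversible, while the second remains elevated and is labelled irreversible. Partially reversible genes would need a finer scheme than this two-way split.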

1.12.3 Differential gene expression analysis in lung cancer

With the non-malignant, baseline gene expression defined, differential gene expression in early stages of lung cancer and in locally invasive squamous cell carcinoma was then assessed [101]. Genes associated with epidermal development were increased in expression, and genes associated with mucociliary function were decreased in expression, in carcinoma-in-situ as well as in precancerous stages. Finally, genes associated with tissue remodelling were altered in expression in locally invasive cancer and also in carcinoma-in-situ, suggesting that this function is affected early in cancer development.

The Wnt pathway has been shown to be aberrant in many cancer types. At the time the study began, two branches of the pathway were known to exist, canonical and non-canonical, whose activation resulted in different downstream consequences. While the canonical branch was the primary focus of most researchers studying this pathway, we sought to assess the role of the non-canonical branch in lung squamous cell carcinoma using semi-quantitative and quantitative PCR of genes that were part of the non-canonical branch [102]. This study found that (i) these non-canonical genes were expressed in the normal lung and (ii) some of these non-canonical genes were differentially expressed in tumors.

An important consideration in the analysis of differential gene expression in cancer is the use of suitable reference genes for data normalization. This consideration is critical to both quantitative PCR experiments and microarray experiments, where relative quantifications are typically used. To address this, SAGE, in which quantification of expression is absolute, was used to identify genes whose expression was constant across both malignant and non-malignant samples [103]. Those genes demonstrated better constancy than some genes that are typically used as controls for gene expression analysis.
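One way to picture the search for constant reference genes is to rank genes by how little their normalized expression varies across samples. The sketch below assumes the coefficient of variation as the constancy measure and tags-per-million scaling of SAGE counts; both are assumptions for illustration and may differ from the criteria used in [103]. Gene names and values are invented.

```python
from statistics import mean, pstdev


def tags_per_million(count, library_size):
    """Scale a raw SAGE tag count to a library of one million tags."""
    return count * 1_000_000 / library_size


def coefficient_of_variation(values):
    """Standard deviation divided by the mean; lower means more constant."""
    average = mean(values)
    if average <= 0:
        return float("inf")
    return pstdev(values) / average


def rank_by_constancy(expression):
    """Return gene names ordered from most to least constant.

    `expression` maps a gene name to its normalized values across
    malignant and non-malignant samples.
    """
    return sorted(expression, key=lambda gene: coefficient_of_variation(expression[gene]))


if __name__ == "__main__":
    toy = {
        "GENE_A": [100, 102, 98, 101],   # nearly constant
        "GENE_B": [100, 300, 50, 150],   # highly variable
    }
    print(rank_by_constancy(toy))
```

Genes at the top of such a ranking would be candidate reference genes, subject to validation on independent samples.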

1.12.4 Integration of gene dosage and gene expression in lung cancer

The first level of genomic integration that needed to be accomplished was the integration of gene dosage and gene expression. In one study using cancer cell lines, hot spots of DNA amplification were identified throughout the genome. When lung cancer cell lines were specifically examined and the results subsequently coupled with gene expression data, it was found that 50% of genes in these frequently amplified regions showed correlation between gene dosage and gene expression [52]. Moreover, it was also observed that different components of the EGFR signaling pathway were amplified in different cell lines, illustrating that, for a given pathway, one can underestimate the frequency of pathway disruption when only well known genes in the pathway are assessed.
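The point about underestimating pathway disruption can be made concrete with a small calculation. The sketch below uses invented data: the cell line names and the placeholder pathway members `PATHWAY_GENE_2` and `PATHWAY_GENE_3` are hypothetical and do not represent results from [52]. It compares the disruption frequency obtained from a single well known gene with that obtained from all pathway components.

```python
def disruption_frequency(amplified, genes):
    """Fraction of samples in which at least one of `genes` is amplified.

    `amplified` maps a sample name to the set of genes amplified in it.
    """
    if not amplified:
        return 0.0
    targets = set(genes)
    hits = sum(1 for sample_genes in amplified.values() if targets & set(sample_genes))
    return hits / len(amplified)


if __name__ == "__main__":
    # Invented toy data: each cell line carries a different amplified gene.
    toy = {
        "line_1": {"EGFR"},
        "line_2": {"PATHWAY_GENE_2"},
        "line_3": {"PATHWAY_GENE_3"},
        "line_4": set(),
    }
    pathway = ["EGFR", "PATHWAY_GENE_2", "PATHWAY_GENE_3"]
    print(disruption_frequency(toy, ["EGFR"]))
    print(disruption_frequency(toy, pathway))
```

In this toy case, assessing EGFR alone gives a frequency of one in four, whereas counting any pathway component gives three in four.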

In a second study involving clinical lung tumors, a genomic region that was preferentially amplified in squamous cell carcinomas as compared to adenocarcinomas was identified. Further integration with gene expression data allowed for the identification of the target gene, BRF2, in this amplified region [104]. Moreover, assessment of gene dosage and protein expression levels of BRF2 in carcinoma-in-situ (CIS) samples showed that amplification and overexpression were present, suggesting that this event occurs at an early stage of tumorigenesis.



Figure 1.1 (image not reproduced). Panels, each comparing normal and tumor: (a) copy number loss / loss of heterozygosity (LOH); (b) allelic imbalance; (c) uniparental disomy (UPD); (d) DNA hypermethylation; (e) somatic mutation (premature stop, truncated transcript).

Figure 1.1. Multiple mechanisms of alteration leading to the same downstream consequences. (a) Illustration of copy number loss: loss of a particular chromosomal region in tumors. (b) Illustration of allelic imbalance. While both alleles are present, there is a preferential increase of one of the alleles. (c) Illustration of uniparental disomy. While overall the total number of DNA copies is normal, one part of an allele is lost and replaced by a part from the other allele. (d) Promoter hypermethylation in tumors, which results in suppression of gene transcription. (e) Somatic mutation in the tumor, which can lead to the transcription of a truncated (possibly non-functional) transcript. Mechanisms shown in (a), (c), (d), and (e) can lead to the same net downstream effect: loss of gene and protein expression. For (a), (b), and (c), though whole chromosomes are shown, these events can vary in scale from a focal region of change to a whole chromosome arm. The green arrow represents the transcription start site.
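The DNA-level mechanisms in panels (a) to (c) of Figure 1.1 can be distinguished by the number of copies of each allele in the tumor. The following is a minimal sketch, assuming allele-specific copy numbers are available for a locus that is heterozygous (one copy of each allele) in the matched normal sample. The function name and the category labels are illustrative choices, not terminology taken from a specific analysis tool.

```python
def classify_locus(copies_a, copies_b):
    """Classify a tumor locus from the copy number of each allele,
    assuming the matched normal carries one copy of each allele."""
    major = max(copies_a, copies_b)
    minor = min(copies_a, copies_b)
    total = major + minor

    if minor == 0:
        # One allele is absent, so heterozygosity is lost.
        if total == 0:
            return "homozygous deletion"
        if total < 2:
            return "copy number loss with LOH"
        if total == 2:
            # Normal total copy number, but both copies from one allele.
            return "uniparental disomy"
        return "LOH with copy number gain"
    if major != minor:
        # Both alleles present, one preferentially increased.
        return "allelic imbalance"
    if total == 2:
        return "normal"
    return "balanced gain"


if __name__ == "__main__":
    for alleles in [(1, 1), (1, 0), (2, 1), (2, 0)]:
        print(alleles, classify_locus(*alleles))
```

Here (1, 0) corresponds to panel (a), (2, 1) to panel (b), and (2, 0) to panel (c). The total copy number alone would not separate (2, 0) from the normal state (1, 1), which is why allelic information is needed to detect uniparental disomy.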


1.13 References

1. Jemal A, Siegel R, Ward E, Hao Y, Xu J, Thun MJ: Cancer statistics, 2009. CA Cancer<br />

J Clin 2009, 59(4):225-249.<br />

2. Sun S, Schiller JH, Gazdar AF: Lung cancer in never smokers--a different disease.<br />

Nat Rev Cancer 2007, 7(10):778-790.<br />

3. Khuder SA: Effect <strong>of</strong> cigarette smoking on major histological types <strong>of</strong> lung cancer:<br />

a meta-analysis. Lung Cancer 2001, 31(2-3):139-148.<br />

4. Detterbeck FC, B<strong>of</strong>fa DJ, Tanoue LT: The new lung cancer staging system. Chest<br />

2009, 136(1):260-271.<br />

5. Herbst RS, Lynch TJ, Sandler AB: Beyond doublet chemotherapy for advanced non-small-cell lung cancer: combination of targeted agents with first-line chemotherapy. Clin Lung Cancer 2009, 10(1):20-27.

6. Kim KS, Jeong JY, Kim YC, Na KJ, Kim YH, Ahn SJ, Baek SM, Park CS, Park CM, Kim<br />

YI et al: Predictors <strong>of</strong> the response to gefitinib in refractory non-small cell lung<br />

cancer. Clin Cancer Res 2005, 11(6):2244-2251.<br />

7. Kim TE, Murren JR: Erlotinib OSI/Roche/Genentech. Curr Opin Investig Drugs 2002,<br />

3(9):1385-1395.<br />

8. Miller VA, Kris MG, Shah N, Patel J, Azzoli C, Gomez J, Krug LM, Pao W, Rizvi N, Pizzo<br />

B et al: Bronchioloalveolar pathologic subtype and smoking history predict<br />

sensitivity to gefitinib in advanced non-small-cell lung cancer. J Clin Oncol 2004,<br />

22(6):1103-1109.<br />

9. Mitsudomi T, Kosaka T, Endoh H, Horio Y, Hida T, Mori S, Hatooka S, Shinoda M,<br />

Takahashi T, Yatabe Y: Mutations <strong>of</strong> the epidermal growth factor receptor gene<br />

predict prolonged survival after gefitinib treatment in patients with non-small-cell<br />

lung cancer with postoperative recurrence. J Clin Oncol 2005, 23(11):2513-2520.<br />

10. Pao W, Miller V, Zakowski M, Doherty J, Politi K, Sarkaria I, Singh B, Heelan R, Rusch<br />

V, Fulton L et al: EGF receptor gene mutations are common in lung cancers from<br />

"never smokers" and are associated with sensitivity <strong>of</strong> tumors to gefitinib and<br />

erlotinib. Proc Natl Acad Sci U S A 2004, 101(36):13306-13311.<br />

11. Sirotnak FM, Zakowski MF, Miller VA, Scher HI, Kris MG: Efficacy <strong>of</strong> cytotoxic agents<br />

against human tumor xenografts is markedly enhanced by coadministration <strong>of</strong><br />

ZD1839 (Iressa), an inhibitor <strong>of</strong> EGFR tyrosine kinase. Clin Cancer Res 2000,<br />

6(12):4885-4892.<br />

12. Paez JG, Janne PA, Lee JC, Tracy S, Greulich H, Gabriel S, Herman P, Kaye FJ,<br />

Lindeman N, Boggon TJ et al: EGFR mutations in lung cancer: correlation with<br />

clinical response to gefitinib therapy. Science 2004, 304(5676):1497-1500.<br />

13. Schena M, Shalon D, Davis RW, Brown PO: Quantitative monitoring <strong>of</strong> gene<br />

expression patterns with a complementary DNA microarray. Science 1995,<br />

270(5235):467-470.<br />

14. Schena M, Shalon D, Heller R, Chai A, Brown PO, Davis RW: Parallel human genome<br />

analysis: microarray-based expression monitoring <strong>of</strong> 1000 genes. Proc Natl Acad<br />

Sci U S A 1996, 93(20):10614-10619.<br />

15. Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh<br />

ML, Downing JR, Caligiuri MA et al: Molecular classification <strong>of</strong> cancer: class<br />

discovery and class prediction by gene expression monitoring. Science 1999,<br />

286(5439):531-537.<br />

16. Perou CM, Sorlie T, Eisen MB, van de Rijn M, Jeffrey SS, Rees CA, Pollack JR, Ross<br />

DT, Johnsen H, Akslen LA et al: Molecular portraits <strong>of</strong> human breast tumours.<br />

Nature 2000, 406(6797):747-752.<br />

17. Sorlie T, Perou CM, Tibshirani R, Aas T, Geisler S, Johnsen H, Hastie T, Eisen MB, van de Rijn M, Jeffrey SS et al: Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc Natl Acad Sci U S A 2001, 98(19):10869-10874.

18. Fukumoto S, Yamauchi N, Moriguchi H, Hippo Y, Watanabe A, Shibahara J, Taniguchi<br />

H, Ishikawa S, Ito H, Yamamoto S et al: Overexpression <strong>of</strong> the aldo-keto reductase<br />

family protein AKR1B10 is highly correlated with smokers' non-small cell lung<br />

carcinomas. Clin Cancer Res 2005, 11(5):1776-1785.<br />

19. Heighway J, Knapp T, Boyce L, Brennand S, Field JK, Betticher DC, Ratschiller D,<br />

Gugger M, Donovan M, Lasek A et al: Expression pr<strong>of</strong>iling <strong>of</strong> primary non-small cell<br />

lung cancer for target identification. Oncogene 2002, 21(50):7749-7763.<br />

20. Hu J, Bianchi F, Ferguson M, Cesario A, Margaritora S, Granone P, Goldstraw P, Tetlow<br />

M, Ratcliffe C, Nicholson AG et al: Gene expression signature for angiogenic and<br />

nonangiogenic non-small-cell lung cancer. Oncogene 2005, 24(7):1212-1219.<br />

21. Larsen JE, Pavey SJ, Passmore LH, Bowman R, Clarke BE, Hayward NK, Fong KM:<br />

Expression pr<strong>of</strong>iling defines a recurrence signature in lung squamous cell<br />

carcinoma. Carcinogenesis 2007, 28(3):760-766.<br />

22. Larsen JE, Pavey SJ, Passmore LH, Bowman RV, Hayward NK, Fong KM: Gene<br />

expression signature predicts recurrence in lung adenocarcinoma. Clin Cancer<br />

Res 2007, 13(10):2946-2954.<br />

23. Lau SK, Boutros PC, Pintilie M, Blackhall FH, Zhu CQ, Strumpf D, Johnston MR, Darling<br />

G, Keshavjee S, Waddell TK et al: Three-gene prognostic classifier for early-stage<br />

non small-cell lung cancer. J Clin Oncol 2007, 25(35):5562-5569.<br />

24. Oshita F, Ikehara M, Sekiyama A, Hamanaka N, Saito H, Yamada K, Noda K, Kameda<br />

Y, Miyagi Y: Genomic-wide cDNA microarray screening to correlate gene<br />

expression pr<strong>of</strong>ile with chemoresistance in patients with advanced lung cancer. J<br />

Exp Ther Oncol 2004, 4(2):155-160.<br />

25. Potti A, Mukherjee S, Petersen R, Dressman HK, Bild A, Koontz J, Kratzke R, Watson MA, Kelley M, Ginsburg GS et al: A genomic strategy to refine prognosis in early-stage non-small-cell lung cancer. N Engl J Med 2006, 355(6):570-580.

26. Raponi M, Zhang Y, Yu J, Chen G, Lee G, Taylor JM, Macdonald J, Thomas D, Moskaluk C, Wang Y et al: Gene expression signatures for predicting prognosis of squamous cell and adenocarcinomas of the lung. Cancer Res 2006, 66(15):7466-7472.

27. Remmelink M, Mijatovic T, Gustin A, Mathieu A, Rombaut K, Kiss R, Salmon I,<br />

Decaestecker C: Identification by means <strong>of</strong> cDNA microarray analyses <strong>of</strong> gene<br />

expression modifications in squamous non-small cell lung cancers as compared<br />

to normal bronchial epithelial tissue. Int J Oncol 2005, 26(1):247-258.<br />

28. Singhal S, Amin KM, Kruklitis R, DeLong P, Friscia ME, Litzky LA, Putt ME, Kaiser LR,<br />

Albelda SM: Alterations in cell cycle genes in early stage lung adenocarcinoma<br />

identified by expression pr<strong>of</strong>iling. Cancer Biol Ther 2003, 2(3):291-298.<br />

29. Spira A, Beane JE, Shah V, Steiling K, Liu G, Schembri F, Gilman S, Dumas YM, Calner<br />

P, Sebastiani P et al: Airway epithelial gene expression in the diagnostic evaluation<br />

<strong>of</strong> smokers with suspect lung cancer. Nat Med 2007, 13(3):361-366.<br />

30. Sun Z, Wigle DA, Yang P: Non-overlapping and non-cell-type-specific gene expression signatures predict lung cancer survival. J Clin Oncol 2008, 26(6):877-883.

31. Wang T, Hopkins D, Schmidt C, Silva S, Houghton R, Takita H, Repasky E, Reed SG:<br />

Identification <strong>of</strong> genes differentially over-expressed in lung squamous cell<br />

carcinoma using combination <strong>of</strong> cDNA subtraction and microarray analysis.<br />

Oncogene 2000, 19(12):1519-1528.<br />

32. Wikman H, Seppanen JK, Sarhadi VK, Kettunen E, Salmenkivi K, Kuosma E, Vainio-Siukola K, Nagy B, Karjalainen A, Sioris T et al: Caveolins as tumour markers in lung cancer detected by combined use of cDNA and tissue microarrays. J Pathol 2004, 203(1):584-593.



33. Garber ME, Troyanskaya OG, Schluens K, Petersen S, Thaesler Z, Pacyna-Gengelbach<br />

M, van de Rijn M, Rosen GD, Perou CM, Whyte RI et al: Diversity <strong>of</strong> gene expression<br />

in adenocarcinoma <strong>of</strong> the lung. Proc Natl Acad Sci U S A 2001, 98(24):13784-13789.<br />

34. Shedden K, Taylor JM, Enkemann SA, Tsao MS, Yeatman TJ, Gerald WL, Eschrich S,<br />

Jurisica I, Giordano TJ, Misek DE et al: Gene expression-based survival prediction in<br />

lung adenocarcinoma: a multi-site, blinded validation study. Nat Med 2008,<br />

14(8):822-827.<br />

35. Chari R, Lonergan KM, Ng RT, MacAulay C, Lam WL, Lam S: Effect <strong>of</strong> active smoking<br />

on the human bronchial epithelium transcriptome. BMC Genomics 2007, 8:297.<br />

36. Spira A, Beane J, Shah V, Liu G, Schembri F, Yang X, Palma J, Brody JS: Effects <strong>of</strong><br />

cigarette smoke on the human airway epithelial cell transcriptome. Proc Natl Acad<br />

Sci U S A 2004, 101(27):10143-10148.<br />

37. Lonergan KM, Chari R, Deleeuw RJ, Shadeo A, Chi B, Tsao MS, Jones S, Marra M,<br />

Ling V, Ng R et al: Identification <strong>of</strong> novel lung genes in bronchial epithelium by<br />

serial analysis <strong>of</strong> gene expression. Am J Respir Cell Mol Biol 2006, 35(6):651-661.<br />

38. Beane J, Sebastiani P, Liu G, Brody JS, Lenburg ME, Spira A: Reversible and<br />

permanent effects <strong>of</strong> tobacco smoke exposure on airway epithelial gene<br />

expression. Genome Biol 2007, 8(9):R201.<br />

39. Balsara BR, Testa JR: Chromosomal imbalances in human lung cancer. Oncogene<br />

2002, 21(45):6877-6883.<br />

40. Sato M, Shames DS, Gazdar AF, Minna JD: A translational view <strong>of</strong> the molecular<br />

pathogenesis <strong>of</strong> lung cancer. J Thorac Oncol 2007, 2(4):327-343.<br />

41. Thomas RK, Weir B, Meyerson M: Genomic approaches to lung cancer. Clin Cancer<br />

Res 2006, 12(14 Pt 2):4384s-4391s.<br />

42. Albertson DG, Collins C, McCormick F, Gray JW: Chromosome aberrations in solid<br />

tumors. Nat Genet 2003, 34(4):369-376.<br />

43. Lockwood WW, Chari R, Chi B, Lam WL: Recent advances in array comparative<br />

genomic hybridization technologies and their applications in human genetics. Eur<br />

J Hum Genet 2006, 14(2):139-148.<br />

44. Garnis C, Lockwood WW, Vucic E, Ge Y, Girard L, Minna JD, Gazdar AF, Lam S,<br />

MacAulay C, Lam WL: High resolution analysis <strong>of</strong> non-small cell lung cancer cell<br />

lines by whole genome tiling path array CGH. Int J Cancer 2006, 118(6):1556-1564.<br />

45. Janne PA, Li C, Zhao X, Girard L, Chen TH, Minna J, Christiani DC, Johnson BE,<br />

Meyerson M: High-resolution single-nucleotide polymorphism array and clustering<br />

analysis <strong>of</strong> loss <strong>of</strong> heterozygosity in human lung cancer cell lines. Oncogene 2004,<br />

23(15):2716-2726.<br />

46. Zhao X, Li C, Paez JG, Chin K, Janne PA, Chen TH, Girard L, Minna J, Christiani D, Leo<br />

C et al: An integrated view <strong>of</strong> copy number and allelic alterations in the cancer<br />

genome using single nucleotide polymorphism arrays. Cancer Res 2004,<br />

64(9):3060-3071.<br />

47. Zhao X, Weir BA, LaFramboise T, Lin M, Beroukhim R, Garraway L, Beheshti J, Lee JC,<br />

Naoki K, Richards WG et al: Homozygous deletions and chromosome<br />

amplifications in human lung carcinomas revealed by single nucleotide<br />

polymorphism array analysis. Cancer Res 2005, 65(13):5561-5570.<br />

48. Weir BA, Woo MS, Getz G, Perner S, Ding L, Beroukhim R, Lin WM, Province MA, Kraja<br />

A, Johnson LA et al: Characterizing the cancer genome in lung adenocarcinoma.<br />

Nature 2007, 450(7171):893-898.<br />

49. Chitale D, Gong Y, Taylor BS, Broderick S, Brennan C, Somwar R, Golas B, Wang L,<br />

Motoi N, Szoke J et al: An integrated genomic analysis <strong>of</strong> lung cancer reveals loss<br />

<strong>of</strong> DUSP4 in EGFR-mutant tumors. Oncogene 2009, 28(31):2773-2783.<br />

50. Kendall J, Liu Q, Bakleh A, Krasnitz A, Nguyen KC, Lakshmi B, Gerald WL, Powers S,<br />

Mu D: Oncogenic cooperation and coamplification <strong>of</strong> developmental transcription<br />

factor genes in lung cancer. Proc Natl Acad Sci U S A 2007, 104(42):16663-16668.<br />



51. Tonon G, Wong KK, Maulik G, Brennan C, Feng B, Zhang Y, Khatry DB, Protopopov A,<br />

You MJ, Aguirre AJ et al: High-resolution genomic pr<strong>of</strong>iles <strong>of</strong> human lung cancer.<br />

Proc Natl Acad Sci U S A 2005, 102(27):9625-9630.<br />

52. Lockwood WW, Chari R, Coe BP, Girard L, Macaulay C, Lam S, Gazdar AF, Minna JD,<br />

Lam WL: DNA amplification is a ubiquitous mechanism <strong>of</strong> oncogene activation in<br />

lung and other cancers. Oncogene 2008, 27(33):4615-4624.<br />

53. Cavenee WK: Loss <strong>of</strong> heterozygosity in stages <strong>of</strong> malignancy. Clin Chem 1989,<br />

35(7 Suppl):B48-52.<br />

54. Bepler G, Garcia-Blanco MA: Three tumor-suppressor regions on chromosome 11p<br />

identified by high-resolution deletion mapping in human non-small-cell lung<br />

cancer. Proc Natl Acad Sci U S A 1994, 91(12):5513-5517.<br />

55. Fong KM, Zimmerman PV, Smith PJ: Microsatellite instability and other molecular<br />

abnormalities in non-small cell lung cancer. Cancer Res 1995, 55(1):28-30.<br />

56. Merlo A, Gabrielson E, Askin F, Sidransky D: Frequent loss <strong>of</strong> chromosome 9 in<br />

human primary non-small cell lung cancer. Cancer Res 1994, 54(3):640-642.<br />

57. Merlo A, Gabrielson E, Mabry M, Vollmer R, Baylin SB, Sidransky D: Homozygous<br />

deletion on chromosome 9p and loss <strong>of</strong> heterozygosity on 9q, 6p, and 6q in<br />

primary human small cell lung cancer. Cancer Res 1994, 54(9):2322-2326.<br />

58. Sundaresan V, Heppell-Parton A, Coleman N, Miozzo M, Sozzi G, Ball R, Cary N,<br />

Hasleton P, Fowler W, Rabbitts P: Somatic genetic changes in lung cancer and<br />

precancerous lesions. Ann Oncol 1995, 6 Suppl 1:27-31; discussion 31-22.<br />

59. Huang J, Wei W, Chen J, Zhang J, Liu G, Di X, Mei R, Ishikawa S, Aburatani H, Jones<br />

KW et al: CARAT: a novel method for allelic detection <strong>of</strong> DNA copy number<br />

changes using high density oligonucleotide arrays. BMC Bioinformatics 2006, 7:83.<br />

60. Yamamoto G, Nannya Y, Kato M, Sanada M, Levine RL, Kawamata N, Hangaishi A,<br />

Kurokawa M, Chiba S, Gilliland DG et al: Highly sensitive method for genomewide<br />

detection <strong>of</strong> allelic composition in nonpaired, primary tumor specimens by use <strong>of</strong><br />

affymetrix single-nucleotide-polymorphism genotyping microarrays. Am J Hum<br />

Genet 2007, 81(1):114-126.<br />

61. LaFramboise T, Weir BA, Zhao X, Beroukhim R, Li C, Harrington D, Sellers WR,<br />

Meyerson M: Allele-specific amplification in cancer revealed by SNP array<br />

analysis. PLoS Comput Biol 2005, 1(6):e65.<br />

62. Li C, Beroukhim R, Weir BA, Winckler W, Garraway LA, Sellers WR, Meyerson M: Major<br />

copy proportion analysis <strong>of</strong> tumor samples using SNP arrays. BMC Bioinformatics<br />

2008, 9:204.<br />

63. Lin M, Wei LJ, Sellers WR, Lieberfarb M, Wong WH, Li C: dChipSNP: significance<br />

curve and clustering <strong>of</strong> SNP-array-based loss-<strong>of</strong>-heterozygosity data.<br />

Bioinformatics 2004, 20(8):1233-1240.<br />

64. Ding L, Getz G, Wheeler DA, Mardis ER, McLellan MD, Cibulskis K, Sougnez C,<br />

Greulich H, Muzny DM, Morgan MB et al: Somatic mutations affect key pathways in<br />

lung adenocarcinoma. Nature 2008, 455(7216):1069-1075.<br />

65. Suda K, Tomizawa K, Mitsudomi T: Biological and clinical significance <strong>of</strong> KRAS<br />

mutations in lung cancer: an oncogenic driver that contrasts with EGFR mutation.<br />

Cancer Metastasis Rev 2010.<br />

66. Feinberg AP: Phenotypic plasticity and the epigenetics <strong>of</strong> human disease. Nature<br />

2007, 447(7143):433-440.<br />

67. Feinberg AP, Gehrke CW, Kuo KC, Ehrlich M: Reduced genomic 5-methylcytosine<br />

content in human colonic neoplasia. Cancer Res 1988, 48(5):1159-1161.<br />

68. Feinberg AP, Tycko B: The history <strong>of</strong> cancer epigenetics. Nat Rev Cancer 2004,<br />

4(2):143-153.<br />

69. Lo PK, Sukumar S: Epigenomics and breast cancer. Pharmacogenomics 2008,<br />

9(12):1879-1902.<br />

70. Decitabine: 2'-deoxy-5-azacytidine, Aza dC, DAC, dezocitidine, NSC 127716. Drugs<br />

R D 2003, 4(3):179-184.<br />



71. Shivapurkar N, Gazdar AF: DNA Methylation Based Biomarkers in Non-Invasive<br />

Cancer Screening. Curr Mol Med.<br />

72. Anglim PP, Alonzo TA, Laird-Offringa IA: DNA methylation-based biomarkers for<br />

early detection <strong>of</strong> non-small cell lung cancer: an update. Mol Cancer 2008, 7:81.<br />

73. Heller G, Zielinski CC, Zochbauer-Muller S: Lung cancer: From single-gene<br />

methylation to methylome pr<strong>of</strong>iling. Cancer Metastasis Rev 2010.<br />

74. Bibikova M, Lin Z, Zhou L, Chudin E, Garcia EW, Wu B, Doucet D, Thomas NJ, Wang Y,<br />

Vollmer E et al: High-throughput DNA methylation pr<strong>of</strong>iling using universal bead<br />

arrays. Genome Res 2006, 16(3):383-393.<br />

75. Weber M, Davies JJ, Wittig D, Oakeley EJ, Haase M, Lam WL, Schubeler D:<br />

Chromosome-wide and promoter-specific analyses identify sites <strong>of</strong> differential<br />

DNA methylation in normal and transformed human cells. Nat Genet 2005,<br />

37(8):853-862.<br />

76. Shames DS, Girard L, Gao B, Sato M, Lewis CM, Shivapurkar N, Jiang A, Perou CM,<br />

Kim YH, Pollack JR et al: A genome-wide screen for promoter methylation in lung<br />

cancer identifies novel methylation markers for multiple malignancies. PLoS Med<br />

2006, 3(12):e486.<br />

77. Thu KL, Pikor LA, Kennett JY, Alvarez CE, Lam WL: Methylation analysis by DNA<br />

immunoprecipitation. J Cell Physiol 2009, 222(3):522-531.<br />

78. Khulan B, Thompson RF, Ye K, Fazzari MJ, Suzuki M, Stasiek E, Figueroa ME, Glass<br />

JL, Chen Q, Montagna C et al: Comparative isoschizomer pr<strong>of</strong>iling <strong>of</strong> cytosine<br />

methylation: the HELP assay. Genome Res 2006, 16(8):1046-1055.<br />

79. Omura N, Li CP, Li A, Hong SM, Walter K, Jimeno A, Hidalgo M, Goggins M: Genome-wide profiling of methylated promoters in pancreatic adenocarcinoma. Cancer Biol Ther 2008, 7(7):1146-1156.




Chapter 2: SIGMA2: A system for the integrative genomic multi-dimensional analysis of cancer
genomes, epigenomes, and transcriptomes ¹

¹ A version of this chapter has been published. Chari R, Coe BP, Wedseltoft C, Benetti M,
Wilson IM, Vucic EA, MacAulay C, Ng RT, Lam WL. (2008) SIGMA2: A system for the integrative
genomic multi-dimensional analysis of cancer genomes, epigenomes, and transcriptomes. BMC
Bioinformatics 9:422. doi:10.1186/1471-2105-9-422. Please see the published version of this
chapter for all supplementary materials.



2.1 Introduction

Multiple mechanisms of gene disruption have been shown to be important in the development of
cancer. Genetic alterations (mutations, changes in gene dosage, allele imbalance) and
epigenetic alterations (changes in DNA methylation and histone modification states) are
responsible for changing the expression of genes. High throughput approaches have afforded
the ability to interrogate the genomic, epigenomic and gene expression (transcriptomic)
profiles at unprecedented resolution [1-6]. However, a gene can be disrupted by one mechanism
or by a combination of mechanisms; therefore, investigation in a single 'omics dimension
(genomics, epigenomics, or transcriptomics) alone cannot detect all disrupted genes in a given
tumor. Moreover, individual tumors may disrupt a given gene by different mechanisms while
achieving the same net effect on phenotype. Hence, a multi-dimensional approach is required to
identify the causal events at the DNA level and understand their downstream consequences.

Software for global profile comparison currently focuses on analyzing and displaying data from
a single dimension, for example CGH Fusion (infoQuant Ltd, London, UK) for DNA copy number
profile analysis and GeneSpring (Agilent Technologies, Santa Clara, CA, USA) for gene
expression profile analysis. Software for integrative analysis has been restricted to working
with datasets derived from a limited combination of technology platforms (Table 1) [7-10].
Though different software can analyze data generated from different platforms, the ability to
perform meta-analysis using data from multiple microarray platforms is limited to a small
number of software packages. Consequently, integrative analysis of cancer genomes typically
involves no more than two types of data, most commonly the integration of gene dosage and gene
expression data [11-16], recently expanded to include allelic information [17]. Software that
performs multi-dimensional analysis is therefore in great demand.

Here, we present SIGMA2, a novel software package which allows users to integrate data from
the various 'omics disciplines such as genomics, epigenomics and transcriptomics.
Multi-dimensional datasets can be simultaneously compared, analyzed and visualized with
respect to individual dimensions, allowing combinatorial integration of the different assays
belonging to the different 'omics. The identification of genes altered at multiple levels such
as copy number, LOH and DNA methylation, and the detection of consequential changes in gene
expression, can be performed in concert, establishing SIGMA2 as a tool to facilitate the high
throughput systems biology analysis of cancer. SIGMA2 is freely available for academic and
research use from our website, http://www.flintbox.com/technology.asp?Page=3716.

2.2 Implementation

SIGMA2 is implemented in Java and requires version 1.6 or later of the Java runtime. In
addition, the statistical package R and the database application MySQL are required. The Java
interface communicates with MySQL using a JDBC connector and with R using the JRI package by
JGR (Figure 2.1). MySQL is used for data storage and querying, while R is used for
segmentation and statistical analysis. All genomic coordinate information was obtained from
the University of California Santa Cruz (UCSC) genome databases [18].

2.3 Results and discussion

2.3.1 Look and feel of SIGMA2

The novel multi-dimensional 'omics data analysis software SIGMA2 is built on the framework of
a facile visualization tool called SIGMA, which can display alignment of genomic data from a
built-in static database [7]. The arsenal of functionalities introduced in SIGMA2 is shown in
Table 2.1.



2.3.2 Description of application scope and functionality

SIGMA2 is built to handle a variety of analysis techniques typically used in the
high-throughput study of cancer, allowing the combinatorial integration of multiple 'omics
disciplines. The hierarchy that underlies the program, which groups data into genome,
epigenome, and transcriptome, is shown in Figure 2.2a; the overall functionality map is given
in Figure 2.2b and listed in Table 2.2. Within each 'omics dimension, data sets may be
imported representing any of the major types of biological measurements being assayed, for
example, (i) DNA copy number and LOH assays within the genomic bundle, (ii) DNA methylation
and histone modification status within the epigenomic bundle, and (iii) gene expression
profiles and microRNA expression assays within the transcriptomic bundle. Each assay may
branch into data sources from a multitude of technology platforms.

2.3.3 Approach to integration between array platforms and assays

SIGMA2 treats all data in the context of genome position based on the relevant human genome
build using the UCSC genome assemblies. An interval-based approach is used to sample across
different array platforms and assays, and data from each interval are merged together.
Briefly, this is done by querying data at fixed genomic intervals for each platform and
subsequently taking an average of the measurements within each interval. The algorithm is
listed in Figure 2.3.
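
The interval-based merging described above can be illustrated with a minimal Python sketch. This is not the SIGMA2 implementation (which queries MySQL from Java); the function names, the 1 Mb interval size and the example probe values are assumptions chosen for illustration only.

```python
from collections import defaultdict
from statistics import mean


def bin_probes(probes, interval_bp):
    """Average probe values per (chromosome, interval index).

    probes: iterable of (chromosome, position_bp, value) tuples.
    """
    buckets = defaultdict(list)
    for chrom, pos, value in probes:
        buckets[(chrom, pos // interval_bp)].append(value)
    return {key: mean(values) for key, values in buckets.items()}


def merge_platforms(platforms, interval_bp=1_000_000):
    """Merge binned data from several platforms, interval by interval.

    platforms: dict mapping a platform name to its list of probes.
    Returns {(chromosome, interval index): {platform name: mean value}}.
    """
    merged = defaultdict(dict)
    for name, probes in platforms.items():
        for key, average in bin_probes(probes, interval_bp).items():
            merged[key][name] = average
    return dict(merged)


# Hypothetical log2 ratios from two platforms on chromosome 7.
example = {
    "bac": [("chr7", 55_100_000, 0.8), ("chr7", 55_900_000, 1.0)],
    "oligo": [("chr7", 55_200_000, 0.6), ("chr7", 56_400_000, 0.1)],
}
merged = merge_platforms(example)
```

Intervals covered by only one platform simply carry a single entry, so downstream steps can decide whether to require data from every platform or to accept partial coverage.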

2.3.4 Format requirements of input data

Standard tab-delimited text files are used for the input of data for all of the assay types.
For genomic data, specifically array CGH, normalization is recommended using external
algorithms such as CGH-Norm and MANOR [19, 20]. Segmentation analysis can be performed within
SIGMA2, and results from external analysis can also be imported and used in the consensus
calling feature. The algorithms which can be called within SIGMA2 currently include DNACopy
and GLAD [21, 22]. Multiple sample batch importing is available to facilitate efficient
loading of datasets. To utilize this, the user must create an information file which describes
each sample in the dataset. Formatting requirements of the information file are specified in
the manual. For Affymetrix SNP array analysis, data should likewise be pre-processed and
normalized using the appropriate software, such as CNAG, before importing into SIGMA2 [23].
Genotyping calls should be made prior to importing, using the "AA", "AB" and "BB" convention.
If the genotype call does not exist, "NC" must be specified. For epigenomic data, data from
affinity-based approaches (MeDIP [6] and ChIP [24]) should contain a value representing the
level of enrichment and the genomic coordinates for each spot. Similarly, for bisulphite-based
approaches [25], the percentage of converted CpGs should be provided along with the genomic
coordinates for each spot. Finally, for transcriptome data, gene expression data from
Affymetrix experiments can be directly imported as CEL files, which are normalized using the
MAS 5.0 algorithm implemented in the "affy" package of R. For any assay type, custom data can
be imported whereby the user provides a map of the platform based on the given genome build,
and the unique identifier for the map must be used for the data generated from those
experiments.

2.3.5 Description of user interface

The main user interface in SIGMA2 utilizes a tabbed window-pane which allows the user to open
multiple visualizations simultaneously (Figure 2.4). The left part of the window manages the
analyses and projects which belong to the current user, and button shortcuts for the main
functionality are spread along the top of the window. Using an example of an array CGH profile
from the Agilent 244K platform, we demonstrate the step-wise interrogation of a region of
interest [26]. Briefly, using the highlighting toolbar button, the user can select a region of
interest and subsequently, by clicking the right mouse button, search for annotated genes
within the specified genomic coordinates.



2.3.6 Analysis of data from a single assay type

The first, and most basic, level of analysis is from a single assay type. For array CGH,
multiple options for segmentation algorithms are available within the program, and results
from externally run segmentation can be imported as well. However, each segmentation algorithm
has its advantages and disadvantages depending on the type and quality of the data at hand. A
unique feature of SIGMA2 is the ability to take a consensus of multiple algorithms using "And"
or "Or" logic between algorithms. Moreover, a level of consensus can be specified (Figure
2.5a). For example, if an experiment is analyzed using five approaches, the user can select
areas of gain and loss which were detected by one algorithm, at least three algorithms, all
five algorithms, etc. For LOH, the number of consecutive markers that exhibit LOH is used to
determine its status. Data from affinity-based approaches for DNA methylation and histone
modification states, or bead-based percentages of CpG island methylation, are analyzed by
either direct thresholding or z-transform thresholding. For any of the different assay types,
when examining across a number of samples, a frequency of alteration can be calculated and
plotted.
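
As an illustration of consensus calling with a user-specified level of agreement, the sketch below combines per-interval calls from several segmentation algorithms. It is a simplified stand-in rather than the SIGMA2 code: the encoding of calls as +1/0/-1, the function name and the rule that conflicting gain and loss calls of equal support yield "no change" are all assumptions.

```python
def consensus_calls(calls_by_algorithm, min_agree):
    """Combine per-interval calls from several segmentation algorithms.

    calls_by_algorithm: list of equal-length lists holding +1 (gain),
    -1 (loss) or 0 (no change), one list per algorithm.
    min_agree: number of algorithms that must report the same state.
    """
    consensus = []
    for interval_calls in zip(*calls_by_algorithm):
        gains = sum(1 for call in interval_calls if call == 1)
        losses = sum(1 for call in interval_calls if call == -1)
        if gains >= min_agree and gains > losses:
            consensus.append(1)
        elif losses >= min_agree and losses > gains:
            consensus.append(-1)
        else:
            consensus.append(0)  # too little support, or a gain/loss tie
    return consensus


# Hypothetical calls from three algorithms over four genomic intervals.
calls = [
    [1, 1, 0, -1],
    [1, 0, 0, -1],
    [1, 1, 0, 0],
]
or_logic = consensus_calls(calls, min_agree=1)            # any algorithm
and_logic = consensus_calls(calls, min_agree=len(calls))  # all algorithms
majority = consensus_calls(calls, min_agree=2)            # at least two
```

Setting `min_agree` to 1 corresponds to "Or" logic and setting it to the number of algorithms corresponds to "And" logic; intermediate values give the graded consensus levels described above.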

For data from different array platforms that assay the same biological measurement, the
algorithm for integration is used to derive common data. This feature is most applicable to
DNA copy number data due to the number of array CGH platforms. It allows for better
utilization of publicly available data, thus increasing the sample size for statistical
analysis. Similar to the multiple sample analysis of data on the same platform, a frequency of
altered states can be generated and plotted. Figure 2.5b shows the concerted analysis of a
sample profiled on the Affymetrix 500K SNP array, the Agilent 244K CGH array and the whole
genome tiling path BAC array.
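
The frequency of altered states across samples can be computed directly from per-interval calls once all samples share the same intervals. The following sketch assumes calls encoded as +1 (gain), -1 (loss) and 0 (no change); the function name and the example data are hypothetical.

```python
def alteration_frequency(sample_calls):
    """Fraction of samples with a gain or a loss in each interval.

    sample_calls: list of equal-length call lists, one per sample.
    Returns (gain_frequency, loss_frequency), one value per interval.
    """
    n_samples = len(sample_calls)
    gain = [sum(1 for c in column if c == 1) / n_samples
            for column in zip(*sample_calls)]
    loss = [sum(1 for c in column if c == -1) / n_samples
            for column in zip(*sample_calls)]
    return gain, loss


# Hypothetical calls for four samples over three shared intervals.
samples = [
    [1, 0, -1],
    [1, 0, -1],
    [0, 0, -1],
    [1, 1, 0],
]
gain_freq, loss_freq = alteration_frequency(samples)
```

The two returned lists are what a frequency histogram plot would display along the genome, with gains and losses drawn in opposite directions or colors.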



2.3.7 Analysis of data from multiple assays in a given 'omics dimension

Within a given 'omics dimension, multiple assay types can be analyzed in combination. For
example, it is useful to investigate copy number together with LOH, or the interplay between
DNA methylation and different states of histone modification. Typically, in regions of copy
number loss, LOH is also observed. However, LOH can also occur in regions which are copy
number neutral, indicating a change in allelic status which is not interpretable by one
dimension alone. Here, in a sample for which both copy number and LOH information exist, we
show a region of copy number loss associated with LOH (Figure 2.6). In terms of epigenetics,
DNA methylation and states of histone methylation and acetylation are known to be biologically
relevant. With high throughput technologies available to assay these dimensions, this type of
analysis will become more prevalent.
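
To make the combination of copy number and LOH concrete, the sketch below derives LOH regions from paired normal/tumor genotype calls and labels each region by its copy number state. It is an illustrative simplification, not the SIGMA2 procedure: the minimum of three LOH markers, the treatment of non-heterozygous normal calls as uninformative, and all names and data are assumptions.

```python
def loh_states(normal, tumor):
    """Per-SNP state from paired genotype calls ("AA", "AB", "BB", "NC")."""
    states = []
    for n, t in zip(normal, tumor):
        if n != "AB" or t == "NC":
            states.append(None)    # uninformative marker
        elif t == "AB":
            states.append("RET")   # heterozygosity retained
        else:
            states.append("LOH")   # heterozygous in normal, homozygous in tumor
    return states


def loh_regions(states, min_markers=3):
    """Index ranges with at least min_markers LOH calls and no retention between them."""
    regions, start, last, count = [], None, None, 0
    for i, state in enumerate(states + ["RET"]):  # sentinel closes a trailing run
        if state == "LOH":
            start = i if start is None else start
            last, count = i, count + 1
        elif state == "RET":
            if count >= min_markers:
                regions.append((start, last))
            start, last, count = None, None, 0
    return regions


def classify(regions, copy_calls):
    """Label each LOH region by the copy number calls (-1, 0, +1) of its markers."""
    labels = []
    for start, end in regions:
        segment = copy_calls[start:end + 1]
        if all(call == -1 for call in segment):
            labels.append("copy number loss with LOH")
        elif all(call == 0 for call in segment):
            labels.append("copy-neutral LOH")
        else:
            labels.append("mixed copy number with LOH")
    return labels


# Hypothetical genotypes and copy number calls for nine consecutive SNPs.
normal = ["AB", "AB", "AA", "AB", "AB", "AB", "AB", "NC", "AB"]
tumor = ["AA", "BB", "AA", "AA", "AB", "BB", "AA", "AA", "BB"]
copy_calls = [-1, -1, -1, -1, 0, 0, 0, 0, 0]

regions = loh_regions(loh_states(normal, tumor))
labels = classify(regions, copy_calls)
```

In this toy example the first region coincides with a copy number loss, while the second shows LOH without a copy number change, the situation that neither dimension reveals on its own.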

2.3.8 Combinatorial analysis of multiple 'omics dimensions - gene dosage and gene expression

The most common analysis of multiple 'omics dimensions examines the influence of the genome on
the transcriptome. A number of software packages have started to incorporate approaches to
examining gene dosage and gene expression [8, 9, 27]. In SIGMA2, there are multiple
functionalities which allow the user to link DNA copy number to gene expression. For a single
group of samples with matching DNA copy number and gene expression profiles, the user can
determine associations through two main options: a) a correlation-based approach, correlating
the log ratios with the normalized gene expression intensities, and b) a statistical approach
comparing the expression in samples with copy number changes against those without copy number
change using the Mann Whitney U test, analogous to approaches taken in previous studies [27].
Spearman, Kendall or Pearson correlation coefficients can be calculated for option a). This
functionality is also available for correlating epigenetic profiles and gene expression.
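
The two options can be sketched in plain Python as follows. SIGMA2 delegates these calculations to R; the code here is only a self-contained illustration that computes the Spearman coefficient and the Mann Whitney U statistic (without a p-value). All function names and the example values are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def mann_whitney_u(group_a, group_b):
    """U statistic of group_a: pairs with a > b, ties counting one half."""
    ranks = average_ranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[:len(group_a)])
    return rank_sum_a - len(group_a) * (len(group_a) + 1) / 2


# Hypothetical matched data for one gene across five samples.
log_ratios = [0.9, 0.1, 0.7, -0.2, 0.4]
expression = [11.2, 8.1, 10.5, 7.9, 9.0]
rho = spearman(log_ratios, expression)

# Option b): expression in samples with a gain versus samples without.
with_gain = [11.2, 10.5, 9.0]
without_gain = [8.1, 7.9]
u_stat = mann_whitney_u(with_gain, without_gain)
```

A U statistic equal to the product of the two group sizes, as in this example, means every sample with a gain has higher expression than every sample without one. For real analyses, an established implementation that also returns p-values (for example in R or SciPy) would be the natural choice.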



In addition to single group analysis, two-dimensional genome/transcriptome analysis can be
applied to two-group comparisons. For example, if patterns of copy number alterations are
compared between two groups and a particular region is more frequently gained in one group
than in the other, the expression data can subsequently be compared between the groups of
samples to determine if there is an association between gene dosage and gene expression. That
is, we would expect the group with more frequent copy number gain to have higher expression
than the other group. Notably, this functionality does not require both copy number and
expression data to exist for the same sample, but allows the user to select an independent
dataset for expression data comparisons (Figure 2.7).

2.3.9 Group comparison analysis - single 'omics dimension

Finally, for two groups of samples, the user can compare the distribution of changes between
the groups to determine if the patterns are statistically different using Fisher's exact test.
For DNA copy number, the distribution of gains and losses is compared; for DNA methylation or
histone modification states, the proportion of samples in each group that meet the threshold
of enrichment (low or high); and for LOH, the proportion of samples in each group with LOH in
a region.

2.3.10 Group comparison analysis - integrating multiple 'omics dimensions

This type of analysis can be performed with a single sample or multiple samples, thus allowing
combinatorial ("and") analysis for large datasets. In addition, the user can also identify
"or" events, where a change in any of the dimensions can be flagged. This is more important in
multi-sample datasets as one dimension may not capture complex alterations of a particular
region.
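
The "and" and "or" events can be thought of as set operations over the genes called altered in each dimension. The following sketch shows this idea for a single sample; the dimension names, gene lists and function name are hypothetical and chosen purely for illustration.

```python
def combine_dimensions(altered_by_dimension, mode="and"):
    """Genes altered in all dimensions ("and") or in any dimension ("or").

    altered_by_dimension: dict mapping a dimension name to a set of gene symbols.
    """
    gene_sets = list(altered_by_dimension.values())
    if not gene_sets:
        return set()
    if mode == "and":
        return set.intersection(*gene_sets)
    if mode == "or":
        return set.union(*gene_sets)
    raise ValueError("mode must be 'and' or 'or'")


# Hypothetical calls for one sample in three DNA-based dimensions.
altered = {
    "copy_number": {"ERBB2", "MYC", "EGFR"},
    "loh": {"ERBB2", "TP53"},
    "methylation": {"ERBB2", "CDKN2A", "MYC"},
}
and_events = combine_dimensions(altered, mode="and")
or_events = combine_dimensions(altered, mode="or")
```

For multi-sample datasets, the same operation could be applied per sample and the results tallied, so that a gene disrupted by different mechanisms in different samples is still counted as recurrently altered.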

2.3.11 Multi-dimensional analysis of a breast cancer genome

Using the breast cancer cell line HCC2218, we show the integration of genomic, epigenomic, and
transcriptomic data. Interestingly, when we examine the ERBB2 gene on chromosome 17, we
observe concurrent amplification, LOH, loss of methylation and a drastic increase in gene
expression (Figure 2.8). ERBB2 has been shown to be an important gene in breast cancer
development and therapeutic intervention. This demonstrates the value of integrating multiple
dimensions to understand complex alteration patterns in disease samples where multiple causes
can lead to a single effect.

2.3.12 Exporting data and results

High resolution images can be exported for all types of visualizations in SIGMA2. Histogram
plots of gene expression, heatmaps with clustering of gene expression, karyogram plots and
frequency histogram plots are the main types of visualization available. The frequency
histogram data used to generate the plots can also be exported. Integrated plots with data
plotted serially or overlaid are also available for analysis involving multiple genomic and
epigenomic dimensions. Genes which are obtained from the conjunctive (And) and disjunctive
(Or) multi-dimensional analysis can be exported with their status. Results of statistical
analysis such as Fisher's exact comparisons and U-test comparisons of gene expression can be
exported against annotated gene lists based on user-specified human genome builds. Currently,
April 2003 (hg15), May 2004 (hg17) and March 2006 (hg18) are the available genome builds [18].
As new builds are released, support for those builds will be added. Finally, data from
multi-platform integration can be exported based on base pair position for additional external
analysis if necessary.

2.4 Conclusions

With the increase in high-throughput data covering multiple dimensions of the genome,
epigenome and transcriptome, the approaches and tools to analyze these data must advance
accordingly to handle, analyze and interpret them in an integrated manner. SIGMA2 meets these
requirements and provides the framework for the incorporation of data from future approaches
and technologies. Specifically, with the movement from array to sequence-based technologies,
the ability to assimilate sequence data with the various 'omics data sets will become a future
requirement of software packages.

2.5 Availability and requirements

Project name: SIGMA2

Operating system(s): Java SE V.1.6+, R Project V.2.5+, Windows XP or Vista

License: Free for academic and research use; commercial users please contact



Figure 2.1

[Diagram components: R (segmentation, statistical analysis); RMySQL; MySQL (data storage,
querying); SIGMA2; JGR / JRI; JDBC; Java (user interface, visualization); link to external
resources (biological databases: PubMed, OMIM, NCBI Gene, UCSC Genome Browser, GEO Profiles,
Database of Genomic Variants)]

Figure 2.1. Main structural components of SIGMA2. Data and genome mapping information are
stored in the MySQL database. Segmentation analysis using DNACopy and GLAD and statistical
analysis are performed using R, with results stored in the database. Java was used to program
the application, specifically for the user interface and the different types of visualization.
Base-pair positions and gene annotations are linked to other biological databases to
facilitate further interrogation by the user.


Figure 2.2

(a) Data hierarchy
'Omics: Genome; Epigenome; Transcriptome
Assay: DNA copy number, allelic imbalance (LOH) (genome); DNA methylation, histone modification (epigenome); gene and microRNA expression (transcriptome)
Platform: BAC array CGH, oligo array CGH (DNA copy number); SNP arrays, microsatellite markers (LOH); MeDIP-array CGH, bisulphite-based methods (DNA methylation); ChIP-on-chip (histone modification); SAGE, microarrays (expression)

(b) Functionality map
Single sample. Single platform / single assay: A, B, C, Q, R, S. Single 'omics (multiple assays): D, E, H. Combinatorial integration (multiple 'omics): F, G, O, P.
Multiple samples (one group). Single platform / single assay: A, B, C, L, Q, R, S. Single 'omics (multiple assays): D, E, H. Combinatorial integration (multiple 'omics): F, G, I, J, K, O, P.
Multiple samples (two groups). Single platform / single assay: A, B, C, L, M, Q, R, S. Single 'omics (multiple assays): D, E, H. Combinatorial integration (multiple 'omics): F, G, I, J, K, N, O, P.

A: Segmentation analysis for array CGH to identify regions of gain and loss
B: Moving average thresholding for affinity based approaches (MeDIP for DNA methylation, ChIP-on-chip for histone modification states)
C: Regions of loss of heterozygosity (LOH)
D: Regions of copy number change and LOH
E: Regions of copy number neutrality and LOH (e.g. UPD)
F: Regions of copy number AND methylation alteration ("two" hit)
G: Regions of copy number OR methylation alteration (compensatory change with same net effect)
H: Epigenetic interplay between DNA methylation and various modification states of histones
I: Correlation of copy number and gene expression (dataset with matched copy number and expression profiles)
J: Statistical comparison of samples with copy number change versus without copy number change (dataset with matched copy number and expression profiles) using Mann Whitney U-test
K: Correlation of DNA methylation and gene expression (dataset with matched DNA methylation and expression profiles)
L: Identify recurrent changes (copy number alterations, common enrichment patterns [MeDIP, ChIP], regions of LOH)
M: Statistical comparison of patterns of recurrent changes between two groups using Fisher's exact test
N: Two-dimensional two-group comparisons (statistical comparison of expression profiles of genes in regions of difference identified by Fisher's exact comparison)
O: Identify "And" events between three or more DNA-based dimensions (copy number, LOH, DNA methylation, histone modification states)
P: Identify "Or" events between three or more DNA-based dimensions (copy number, LOH, DNA methylation, histone modification states)
Q: Cancer gene discovery
R: Lists of genes for systems/function/pathway analysis
S: Linking to public biological databases (PubMed, NCBI Gene, OMIM, NCBI GEO Profiles, UCSC Genome Browser, Database of Genomic Variants)

Figure 2.2. Data structure hierarchy. (a) Data hierarchy describing the relationship between
platforms, assays and 'omics disciplines. (b) Functionality map of SIGMA2, listing the
functions, and the output from each function, that can be performed given the number of
samples or sample groups and dimensions. Multiple sample analyses (single group and two group)
are microarray platform independent. Functions listed in boxes are in addition to those listed
in the box preceding the arrows.


Figure 2.3<br />

numSamples


Figure 2.4

[Screenshot of the SIGMA2 interface with regions labeled a to e; annotation: "Search for
genes, link to databases"]

Figure 2.4. SIGMA2 interface. Description of the SIGMA2 user interface using a single sample
visualization as an example. (a) Customizable toolbar with shortcut buttons, (b)
Project/Analysis tree to track work within and between sessions, (c) Main display area using
tab-based navigation, (d) Information console and (e) Genome features tracks. Here, a copy
number change is displayed in the context of CpG islands (red), microRNAs (orange) and regions
annotated in the database of genomic variants (blue).


Figure 2.5

[Panels a and b; panel b tracks: Affymetrix SNP 500K, Agilent 244K CGH, BCCA WGTP 32K]

Figure 2.5. Consensus calling and heterogeneous array analysis. (a) Consensus calling using
multiple algorithms. Multiple algorithms (and different parameters) can be selected to analyze
a given array CGH sample, and this can be defined for each array platform independently as
each platform may exhibit different noise and ratio response characteristics. (b)
Heterogeneous array analysis using data from multiple array CGH platforms. Samples from the
Agilent 244K, Affymetrix SNP 500K and whole genome BAC arrays were segmented to define areas
of gain and loss. Subsequently, the results were aggregated into a frequency histogram plot
showing the common areas of gain and loss across the three samples.


Figure 2.6<br />

HCC2218 HCC2218 HCC2218BL<br />

Copy Number<br />

Figure 2.6. Integrative genetic analysis <strong>of</strong> HCC2218. Parallel visualization and analysis<br />

<strong>of</strong> the copy number and genotype pr<strong>of</strong>iles <strong>of</strong> the breast cancer cell line HCC2218. Genotype<br />

pr<strong>of</strong>ile <strong>of</strong> the matching normal blood lymphoblast line (HCC2218BL) is also provided to<br />

define regions <strong>of</strong> LOH. DNA copy number pr<strong>of</strong>ile was generated on the BCCA whole<br />

genome tiling path BAC array and genotype pr<strong>of</strong>iles are from the Affymetrix SNP 10K array<br />

{Zhao, 2004 #38}. This region <strong>of</strong> chromosome arm 3q has a defined segmental copy<br />

number loss and the boundary <strong>of</strong> the change is evident from the LOH pr<strong>of</strong>ile. In the genotype<br />

pr<strong>of</strong>ile, the horizontal blue lines indicate a SNP transition from heterozygous in normal<br />

to homozygous in the tumor, indicating LOH.<br />



Figure 2.7 (image not reproduced). Panel labels: a, b, c. Group labels: NSCLC, SCLC.<br />

Figure 2.7. Two-group, two-dimensional comparison of 37 NSCLC and 16 SCLC<br />

cancer cell lines. First, segmentation analysis is performed to delineate gains and losses<br />

in each sample. Next, a statistical comparison <strong>of</strong> the distribution <strong>of</strong> gains and losses<br />

between the two groups is performed using Fisher’s exact test. (a) Using the interactive<br />

search, one of the regions of difference identified is on chromosome 7, shown with an NSCLC and an<br />

SCLC sample aligned next to each other. The NSCLC sample has a clear segmental gain of that<br />

region, whereas the SCLC sample does not. The right-most graph is a frequency plot summary<br />

of the two sample sets (NSCLC and SCLC). NSCLC is color-coded in red and SCLC in<br />

green, and the overlap appears in yellow. The frequency of chromosome arm 7p gain is<br />

higher in the red group. (b) A heatmap is shown representing 15 NSCLC and 15 SCLC gene<br />

expression pr<strong>of</strong>iles, <strong>of</strong> the specific genes in the region highlighted in yellow. (c) When<br />

examining gene expression data <strong>of</strong> EGFR specifically, a gene in this region, we can see that<br />

the expression is drastically higher in NSCLC vs. SCLC, as predicted by the higher<br />

frequency <strong>of</strong> gain in NSCLC vs. SCLC <strong>of</strong> that region. Gene expression data are represented<br />

as log2 <strong>of</strong> the normalized intensities.<br />



Figure 2.8. Multi-dimensional perspective <strong>of</strong> chromosome 17 <strong>of</strong> the HCC2218 breast<br />

cancer cell line. Copy number, LOH, and DNA methylation profiling identifies an<br />

amplification of ERBB2 coinciding with allelic imbalance and loss of methylation. When<br />

examining gene expression, the expression of ERBB2 in HCC2218 is significantly higher than in a panel<br />

of normal luminal and myoepithelial cell lines [28].<br />

Figure 2.8 (image not reproduced). Panel labels: DNA Copy Number, Allelic imbalance (LOH), DNA Methylation. Expression sample labels: HCC2218, Luminal, Myoepithelial.<br />


Table 2.1. Features required for integrative analysis<br />

Software compared (table columns): Nexus CGH, CGH Fusion, ISA-CGH, VAMP, *CGH Analytics, MD-SeeGH, SIGMA, SIGMA 2<br />

Features required for integrative analysis (table rows), each followed by the number of the eight software packages marked as providing it:<br />

Built-in segmentation for array CGH: 7<br />

Consensus calling using multiple segmentation algorithms: 1<br />

Array platform-independent combined CGH analysis: 3<br />

Custom microarray data handling: 7<br />

Basic copy number and expression integration: 4<br />

Alignment and analysis of genetic and epigenetic data: 3<br />

Multi-dimensional visualization of genetic, epigenetic and gene expression data: 1<br />

Two group statistical comparison: 4<br />

Two group combinatorial gene dosage and gene expression comparison: 1<br />

Linking to external biological databases: 8<br />

Linking to external gene expression (GEOProfiles): 1<br />

Context-based visualization of genome features: 5<br />

Conversion of data between different genome assemblies: 4<br />

Free for academic/research use: 6<br />


Table 2.2. Summary of input, analysis, and output for each dimension<br />

'Omics classification | Assay(s) measured | Input | Functionality*** | Output<br />

Genomics | Copy number | Array CGH | Segmentation; direct thresholding; moving average-based thresholding; Z-transformation of moving average; whole genome visualization | Regions of gain and loss; gene lists for further analysis; high-resolution karyogram images; frequency histograms<br />

Genomics | LOH | SNPs* | LOH based on consecutive altered markers | Regions of LOH<br />

Genomics | LOH | Microsatellite markers | Same as above | Same as above<br />

Genomics<br />

Epigenomics<br />

Epigenomics<br />

Epigenomics<br />

Epigenomics<br />

Copy number,<br />

LOH<br />

DNA<br />

methylation<br />

DNA<br />

methylation<br />

Histone<br />

modification<br />

states<br />

DNA<br />

methylation,<br />

Histone<br />

modification<br />

states<br />

Transcriptomics Gene<br />

expression**<br />

Transcriptomics Gene<br />

expression**<br />

Genomics,<br />

Transcriptomics<br />

Genomics,<br />

Epigenomics<br />

Genomics,<br />

Epigenomics<br />

Genomics,<br />

Epigenomics,<br />

Transcriptomics<br />

Copy number,<br />

Gene<br />

expression<br />

Copy number,<br />

DNA<br />

methylation<br />

LOH, DNA<br />

methylation<br />

Copy number,<br />

LOH, DNA<br />

methylation,<br />

Histone<br />

modification<br />

Gene Expression<br />

MeDIP +<br />

array CGH<br />

Bisulphite-based<br />

ChIP-on-chip<br />

Microarrays<br />

SAGE<br />

Identify regions <strong>of</strong> uniparental disomy<br />

(UPD): LOH with no copy number change<br />

Direct thresholding<br />

Moving average-based thresholding<br />

Z-transformation <strong>of</strong> moving average<br />

Visualization against genome position<br />

Thresholding <strong>of</strong> proportion <strong>of</strong> methylated<br />

CpG’s<br />

Direct thresholding<br />

Moving average-based thresholding<br />

Z-transformation <strong>of</strong> moving average<br />

Epigenetic interplay<br />

Heatmap visualization, clustering<br />

Histograms<br />

Statistical comparisons<br />

Heatmap visualization, clustering<br />

Histograms<br />

Statistical comparisons<br />

Correlation analysis <strong>of</strong> copy number and<br />

expression<br />

Statistical comparison <strong>of</strong> expression in<br />

regions <strong>of</strong> copy number difference (two<br />

group analysis)<br />

Identify regions <strong>of</strong> concerted change in<br />

BOTH copy number and methylation ("two-hit")<br />

Identify regions with change in copy number<br />

OR DNA methylation<br />

Identify allele-specific methylation events<br />

Identify co-ordinate genetic, epigenetic and<br />

gene expression changes<br />

Regions <strong>of</strong> enrichment and<br />

lack <strong>of</strong> methylation<br />

Gene lists for further<br />

analysis<br />

Regions <strong>of</strong> enrichment and<br />

lack <strong>of</strong> enrichment<br />

Gene lists for further<br />

analysis<br />

Regions <strong>of</strong> mutually<br />

exclusive change between<br />

chromatin state and DNA<br />

methylation<br />

Expression of genes of<br />

interest based on DNA<br />

analysis<br />

Expression of genes of<br />

interest based on DNA<br />

analysis<br />

Genes whose expression<br />

is strongly regulated by copy<br />

number<br />

p-values for associations<br />

p-values for group<br />

comparison<br />

Regions <strong>of</strong> allele specific<br />

aberrant methylation<br />

Genes altered at multiple<br />

levels


2.6 References<br />

1. Garnis C, Buys TP, Lam WL: Genetic alteration and gene expression modulation<br />

during cancer progression. Mol Cancer 2004, 3:9.<br />

2. Ishkanian AS, Mall<strong>of</strong>f CA, Watson SK, DeLeeuw RJ, Chi B, Coe BP, Snijders A,<br />

Albertson DG, Pinkel D, Marra MA et al: A tiling resolution DNA microarray with<br />

complete coverage <strong>of</strong> the human genome. Nat Genet 2004, 36(3):299-303.<br />

3. Khulan B, Thompson RF, Ye K, Fazzari MJ, Suzuki M, Stasiek E, Figueroa ME, Glass<br />

JL, Chen Q, Montagna C et al: Comparative isoschizomer pr<strong>of</strong>iling <strong>of</strong> cytosine<br />

methylation: the HELP assay. Genome Res 2006, 16(8):1046-1055.<br />

4. Lockwood WW, Chari R, Chi B, Lam WL: Recent advances in array comparative<br />

genomic hybridization technologies and their applications in human genetics. Eur<br />

J Hum Genet 2006, 14(2):139-148.<br />

5. Rauch T, Li H, Wu X, Pfeifer GP: MIRA-assisted microarray analysis, a new<br />

technology for the determination <strong>of</strong> DNA methylation patterns, identifies frequent<br />

methylation <strong>of</strong> homeodomain-containing genes in lung cancer cells. Cancer Res<br />

2006, 66(16):7939-7947.<br />

6. Weber M, Davies JJ, Wittig D, Oakeley EJ, Haase M, Lam WL, Schubeler D:<br />

Chromosome-wide and promoter-specific analyses identify sites <strong>of</strong> differential<br />

DNA methylation in normal and transformed human cells. Nat Genet 2005,<br />

37(8):853-862.<br />

7. Chari R, Lockwood WW, Coe BP, Chu A, Macey D, Thomson A, Davies JJ, MacAulay<br />

C, Lam WL: SIGMA: a system for integrative genomic microarray analysis <strong>of</strong><br />

cancer genomes. BMC Genomics 2006, 7:324.<br />

8. Conde L, Montaner D, Burguet-Castell J, Tarraga J, Medina I, Al-Shahrour F, Dopazo J:<br />

ISACGH: a web-based environment for the analysis <strong>of</strong> Array CGH and gene<br />

expression which includes functional pr<strong>of</strong>iling. Nucleic Acids Res 2007, 35(Web<br />

Server issue):W81-85.<br />

9. La Rosa P, Viara E, Hupe P, Pierron G, Liva S, Neuvial P, Brito I, Lair S, Servant N,<br />

Robine N et al: VAMP: visualization and analysis <strong>of</strong> array-CGH, transcriptome and<br />

other molecular pr<strong>of</strong>iles. Bioinformatics 2006, 22(17):2066-2073.<br />

10. Chi B, deLeeuw RJ, Coe BP, Ng RT, MacAulay C, Lam WL: MD-SeeGH: a platform for<br />

integrative analysis <strong>of</strong> multi-dimensional genomic data. BMC Bioinformatics 2008,<br />

9:243.<br />

11. Carrasco DR, Tonon G, Huang Y, Zhang Y, Sinha R, Feng B, Stewart JP, Zhan F,<br />

Khatry D, Protopopova M et al: High-resolution genomic pr<strong>of</strong>iles define distinct<br />

clinico-pathogenetic subgroups <strong>of</strong> multiple myeloma patients. Cancer Cell 2006,<br />

9(4):313-325.<br />

12. Chin K, DeVries S, Fridlyand J, Spellman PT, Roydasgupta R, Kuo WL, Lapuk A, Neve<br />

RM, Qian Z, Ryder T et al: Genomic and transcriptional aberrations linked to breast<br />

cancer pathophysiologies. Cancer Cell 2006, 10(6):529-541.<br />

13. Coe BP, Lockwood WW, Girard L, Chari R, Macaulay C, Lam S, Gazdar AF, Minna JD,<br />

Lam WL: Differential disruption <strong>of</strong> cell cycle pathways in small cell and non-small<br />

cell lung cancer. Br J Cancer 2006, 94(12):1927-1935.<br />

14. Lockwood WW, Chari R, Coe BP, Girard L, Macaulay C, Lam S, Gazdar AF, Minna JD,<br />

Lam WL: DNA amplification is a ubiquitous mechanism <strong>of</strong> oncogene activation in<br />

lung and other cancers. Oncogene 2008.<br />

15. Neve RM, Chin K, Fridlyand J, Yeh J, Baehner FL, Fevr T, Clark L, Bayani N, Coppe JP,<br />

Tong F et al: A collection <strong>of</strong> breast cancer cell lines for the study <strong>of</strong> functionally<br />

distinct cancer subtypes. Cancer Cell 2006, 10(6):515-527.<br />



16. Stransky N, Vallot C, Reyal F, Bernard-Pierrot I, de Medina SG, Segraves R, de Rycke<br />

Y, Elvin P, Cassidy A, Spraggon C et al: Regional copy number-independent<br />

deregulation <strong>of</strong> transcription in cancer. Nat Genet 2006, 38(12):1386-1396.<br />

17. Sanders MA, Verhaak RG, Geertsma-Kleinekoort WM, Abbas S, Horsman S, van der<br />

Spek PJ, Lowenberg B, Valk PJ: SNPExpress: integrated visualization of genome-wide<br />

genotypes, copy numbers and gene expression levels. BMC Genomics 2008,<br />

9:41.<br />

18. Karolchik D, Kuhn RM, Baertsch R, Barber GP, Clawson H, Diekhans M, Giardine B,<br />

Harte RA, Hinrichs AS, Hsu F et al: The UCSC Genome Browser Database: 2008<br />

update. Nucleic Acids Res 2008, 36(Database issue):D773-779.<br />

19. Khojasteh M, Lam WL, Ward RK, MacAulay C: A stepwise framework for the<br />

normalization <strong>of</strong> array CGH data. BMC Bioinformatics 2005, 6:274.<br />

20. Neuvial P, Hupe P, Brito I, Liva S, Manie E, Brennetot C, Radvanyi F, Aurias A, Barillot<br />

E: Spatial normalization <strong>of</strong> array-CGH data. BMC Bioinformatics 2006, 7:264.<br />

21. Hupe P, Stransky N, Thiery JP, Radvanyi F, Barillot E: Analysis <strong>of</strong> array CGH data:<br />

from signal ratio to gain and loss <strong>of</strong> DNA regions. Bioinformatics 2004, 20(18):3413-<br />

3422.<br />

22. Venkatraman ES, Olshen AB: A faster circular binary segmentation algorithm for<br />

the analysis <strong>of</strong> array CGH data. Bioinformatics 2007, 23(6):657-663.<br />

23. Nannya Y, Sanada M, Nakazaki K, Hosoya N, Wang L, Hangaishi A, Kurokawa M,<br />

Chiba S, Bailey DK, Kennedy GC et al: A robust algorithm for copy number<br />

detection using high-density oligonucleotide single nucleotide polymorphism<br />

genotyping arrays. Cancer Res 2005, 65(14):6071-6079.<br />

24. Ballestar E, Paz MF, Valle L, Wei S, Fraga MF, Espada J, Cigudosa JC, Huang TH,<br />

Esteller M: Methyl-CpG binding proteins identify novel sites <strong>of</strong> epigenetic<br />

inactivation in human cancer. EMBO J 2003, 22(23):6335-6345.<br />

25. Bibikova M, Lin Z, Zhou L, Chudin E, Garcia EW, Wu B, Doucet D, Thomas NJ, Wang Y,<br />

Vollmer E et al: High-throughput DNA methylation pr<strong>of</strong>iling using universal bead<br />

arrays. Genome Res 2006, 16(3):383-393.<br />

26. Coe BP, Ylstra B, Carvalho B, Meijer GA, Macaulay C, Lam WL: Resolving the<br />

resolution <strong>of</strong> array CGH. Genomics 2007, 89(5):647-653.<br />

27. van Wieringen WN, Belien JA, Vosse SJ, Achame EM, Ylstra B: ACE-it: a tool for<br />

genome-wide integration <strong>of</strong> gene dosage and RNA expression data. Bioinformatics<br />

2006, 22(15):1919-1920.<br />

28. Grigoriadis A, Mackay A, Reis-Filho JS, Steele D, Iseli C, Stevenson BJ, Jongeneel CV,<br />

Valgeirsson H, Fenwick K, Iravani M et al: Establishment <strong>of</strong> the epithelial-specific<br />

transcriptome <strong>of</strong> normal and malignant human breast cells based on MPSS and<br />

array expression data. Breast Cancer Res 2006, 8(5):R56.<br />



<strong>Chapter</strong> 3: An integrative multi-dimensional genetic and<br />

epigenetic strategy to identify aberrant genes and pathways<br />

in cancer 2<br />

2 A version <strong>of</strong> this chapter has been published. Chari R, Coe BP, Vucic EA, Lockwood WW,<br />

Lam WL. (2010) An integrative multi-dimensional genetic and epigenetic strategy to identify<br />

aberrant genes and pathways in cancer. BMC Systems Biology, 4(1):67, 1-14. Please see the<br />

published version <strong>of</strong> this chapter for all supplementary materials.<br />



3.1 Background<br />

Genomic analyses have substantially improved our knowledge <strong>of</strong> cancer. Gene expression<br />

pr<strong>of</strong>iling, for example, is utilized to delineate subtypes <strong>of</strong> breast cancer, and has facilitated the<br />

derivation <strong>of</strong> predictive and prognostic signatures [1-5]. However, not all <strong>of</strong> the gene expression<br />

changes observed are causal to cancer development, and global gene expression analysis<br />

alone cannot distinguish between causal and reactive changes. Corresponding alteration at the<br />

DNA level is regarded as evidence <strong>of</strong> causality; for example, gene deletion or gene silencing by<br />

methylation. Hence, examining genetic and epigenetic events in conjunction with the changes<br />

in gene expression pattern should improve the identification <strong>of</strong> causal changes that lead to<br />

disease phenotype.<br />

Analysis <strong>of</strong> gene copy number alone has correlated breast cancer genome features with poor<br />

prognosis based on the degree <strong>of</strong> genomic instability observed [6]. In terms <strong>of</strong> gene discovery,<br />

specific genomic regions containing important loci have been shown to be frequently gained or<br />

lost [7-11]. Integrative analyses <strong>of</strong> gene dosage and gene expression in breast cancer have<br />

revealed specific genes which are deregulated at the gene expression level as a result <strong>of</strong><br />

changes in DNA copy number. From a global perspective, studies have shown a broad range<br />

in concordance between DNA amplification and overexpression <strong>of</strong> genes. This variability is<br />

attributable to the sensitivity <strong>of</strong> the methods used in detecting gene copy number and gene<br />

expression changes as well as the number of genes examined [12-15]. Conversely, when<br />

examining gene overexpression, it was found that only 10.5% of the overexpression could be<br />

attributed to gene amplification [14]. Altered gene expression can be attributed not only<br />

to disruption of regulatory/signaling cascades and downstream effects, but also to a<br />

multitude of causal genetic and epigenetic aberrations.<br />

We reason that by examining multiple genomic dimensions simultaneously, with a dimension<br />

representing a genome wide assay measuring DNA level alterations such as gene copy number<br />

or DNA methylation, we are likely to achieve the following: (i) explain a greater fraction <strong>of</strong> the<br />



observed gene expression deregulation as compared with explaining expression deregulation<br />

using only a single dimension, (ii) improve the discovery <strong>of</strong> critical oncogenes and tumor<br />

suppressor genes (TSGs) by focusing on those genes altered simultaneously at multiple<br />

genomic dimensions, and (iii) begin to understand the complex mechanisms <strong>of</strong> dysregulation <strong>of</strong><br />

oncogenic pathways. In this study, we demonstrate the power <strong>of</strong> an integrative genomics<br />

approach by performing multi-dimensional analyses (MDA) <strong>of</strong> the genome, epigenome, and<br />

transcriptome <strong>of</strong> breast cancer cell lines. We illustrate and demonstrate the need for integrative<br />

analysis <strong>of</strong> multiple genomic dimensions by showing the co-operative contribution <strong>of</strong> DNA<br />

mechanisms to explaining differential gene expression. Using a strategy to identify genes<br />

exhibiting congruent alteration in copy number, DNA methylation, and allelic (or loss <strong>of</strong><br />

heterozygosity, LOH) status, which we term multiple concerted disruption (MCD) analysis, we<br />

find genes representing key nodes in pathways as well as genes which exhibit prognostic<br />

significance. In examining the neuregulin pathway, we observe the variability among samples<br />

in the mechanism <strong>of</strong> dysregulation <strong>of</strong> this commonly altered breast cancer pathway, highlighting<br />

the importance <strong>of</strong> multi-dimensional correlative analysis <strong>of</strong> a given pathway in individual tumor<br />

samples, in addition to the conventional approach of identifying loci simply based on frequency<br />

of disruption in a cohort. Finally, examining the subset of triple negative breast cancer (TNBC)<br />

cell lines, we show that COL1A1, a downstream target of FGFR2 (an oncogene recently implicated<br />

in TNBC), is frequently affected by MCD even though FGFR2 itself is rarely affected.<br />

Notably, this is the first such in-depth genomic, epigenomic, and transcriptomic analysis of<br />

breast cancer.<br />

3.2 Methods<br />

3.2.1 Data generation and acquisition<br />

Commonly used breast cancer (HCC38, HCC1008, HCC1143, HCC1395, HCC1599, HCC1937,<br />

HCC2218, BT474, MCF-7) and non-cancer (MCF10A) cell lines were selected for analyses<br />

(Additional File 1 or Appendix II). Copy number pr<strong>of</strong>iles were obtained from the SIGMA<br />



database [11, 16]. These pr<strong>of</strong>iles were generated using a whole genome tiling path microarray<br />

CGH platform [17, 18]. Expression pr<strong>of</strong>iles for BT474 and MCF-7 were obtained from the NCI<br />

Cancer Biomedical Informatics Grid (caBIG, https://cabig.nci.nih.gov), MCF10A pr<strong>of</strong>ile from<br />

GEO (GSM254525), and the rest were generated using Affymetrix U133 Plus 2.0 platform at the<br />

McGill <strong>University</strong> and Genome Quebec Innovation Centre. Affymetrix 500K SNP array data<br />

were obtained from caBIG. DNA methylation pr<strong>of</strong>iles were generated using the Illumina<br />

Infinium methylation platform at the Genomics Lab, Wellcome Trust Centre for Human<br />

Genetics. A summary <strong>of</strong> the sources <strong>of</strong> all the data used is provided in Additional File 2 or<br />

Appendix III. Gene expression and methylation data generated were deposited in NCBI GEO<br />

(GSE17768 and GSE17769).<br />

3.2.2 Data processing and normalization<br />

Array CGH data were normalized using a stepwise normalization framework [19]. In addition,<br />

data were filtered based on a stringent standard deviation cut-<strong>of</strong>f <strong>of</strong> 0.075 between replicate<br />

spots, with those exceeding this cut-<strong>of</strong>f excluded from further analysis. To identify regions <strong>of</strong><br />

gain and loss, smoothing and segmentation analysis was performed using aCGH-Smooth [20]<br />

as previously described [21]. Copy number status for clones removed by the filter above<br />

was inferred using neighboring clones within a 1 Mb window.<br />
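The replicate-spot filter and the neighbor-based inference described above could be sketched as follows. This is only an illustration under stated assumptions: the data layout, the function names, the use of the sample standard deviation, the majority vote among neighbors, and the reading of "1 Mb window" as 1 Mb on either side of the clone are all hypothetical choices, not details given in the text.

```python
from collections import Counter
from statistics import stdev

SD_CUTOFF = 0.075       # cut-off between replicate spots, as stated in the text
WINDOW_BP = 1_000_000   # "1 Mb window"; read here as +/- 1 Mb (assumption)

def passes_replicate_filter(replicates, cutoff=SD_CUTOFF):
    """True when replicate log2 ratios agree within the SD cut-off."""
    return len(replicates) >= 2 and stdev(replicates) <= cutoff

def infer_call(clone, kept_clones, window=WINDOW_BP):
    """Majority copy number call among retained clones near a filtered clone.

    clone: (chromosome, position); kept_clones: [(chromosome, position, call)].
    Returns None when no retained clone lies inside the window.
    """
    chrom, pos = clone
    nearby = [call for c, p, call in kept_clones
              if c == chrom and abs(p - pos) <= window]
    if not nearby:
        return None
    return Counter(nearby).most_common(1)[0][0]

# Toy clones (positions and calls are invented for illustration).
kept = [("chr7", 54_000_000, "gain"),
        ("chr7", 54_600_000, "gain"),
        ("chr7", 57_000_000, "neutral")]
print(passes_replicate_filter([0.41, 0.45, 0.43]))
print(infer_call(("chr7", 54_300_000), kept))
```

A majority vote is only one way to combine neighboring clones; taking the call of the nearest retained clone would be an equally plausible reading of the text.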

Affymetrix SNP array data were normalized and genotyped using the "oligo" package in R,<br />

specifically using the crlmm algorithm for genotyping [22]. Genotype calls whose confidences<br />

were less than 0.95 were termed "No Call" (NC). Subsequently, genotype pr<strong>of</strong>iles were<br />

analyzed using dChip [23] and LOH was determined using a panel <strong>of</strong> 60 normal genotypes from<br />

the HapMap dataset [24] as provided by dChip, as matching blood lymphoblast pr<strong>of</strong>iles were<br />

not available. LOH ("L"), Retention ("R"), and No Call ("N") status was determined for every<br />

marker in each sample. Analysis parameters used were as specified in the dChip manual.<br />



Raw gene expression pr<strong>of</strong>iles from all ten cell lines were normalized using the "rma" package in<br />

R (Additional File 3). Gene expression data were further filtered using the Affymetrix MAS 5.0<br />

Call values ("P", "M", and "A"). Since differential expression was assessed by comparing one cancer<br />

line to one normal line, a probe set was retained for analysis only if it was not called "Absent" in both lines.<br />

Methylation data were normalized and processed using Illumina BeadStudio s<strong>of</strong>tware<br />

(http://www.illumina.com/s<strong>of</strong>tware/genomestudio_s<strong>of</strong>tware.ilmn, Illumina, Inc., San Diego, CA,<br />

USA). Beta-values and confidence p-values were retained for further analysis. Beta-values<br />

with associated confidence p-values > 0.05 were excluded. Data from all genomic dimensions<br />

were mapped to the hg18 (March 2006) genome assembly.<br />
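The two quality filters described above (MAS 5.0 detection calls for expression, confidence p-values for methylation) can be illustrated with a minimal sketch. The function names, dictionary layout, and CpG identifiers are hypothetical, and the expression rule encodes one reading of the text: a probe set is dropped only when it is called "A" in both the cancer line and the normal line.

```python
DETECTION_P_MAX = 0.05  # beta-values with larger confidence p-values are excluded

def keep_probe_set(cancer_call, normal_call):
    """Retain a probe set unless MAS 5.0 calls it Absent ("A") in both lines."""
    return not (cancer_call == "A" and normal_call == "A")

def filter_betas(betas, confidence_p, max_p=DETECTION_P_MAX):
    """Keep beta-values whose confidence p-value does not exceed max_p."""
    return {cpg: beta for cpg, beta in betas.items()
            if confidence_p[cpg] <= max_p}

# Toy values; the CpG identifiers are placeholders, not real probe IDs.
betas = {"cpg_example_1": 0.82, "cpg_example_2": 0.10}
pvals = {"cpg_example_1": 0.001, "cpg_example_2": 0.20}
print(keep_probe_set("P", "A"))
print(filter_betas(betas, pvals))
```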

3.2.3 Strategy for integrative analysis<br />

Copy number and LOH pr<strong>of</strong>iles were mapped to genes using the mapping <strong>of</strong> the Affymetrix<br />

U133 Plus 2.0 platform as well as the UCSC Genome Browser [25]. Methylation data were<br />

linked to the other three types <strong>of</strong> data using either the RefSeq gene symbol as specified by the<br />

Illumina mapping file (Illumina), or the RefSeq accession number. Differential expression was<br />

determined by subtracting the expression value in the non-malignant line MCF10A from the<br />

value in each cancer line. Since the obtained gene expression values after RMA normalization<br />

were represented in log2 space, a gene was considered differentially expressed if the absolute difference<br />

between the cancer line and MCF10A was greater than 1, which corresponds to a two-fold<br />

expression difference. DNA methylation status was determined by subtracting beta-values, with<br />

hypermethylation defined as a positive difference between tumor and normal (≥ 0.25) and<br />

hypomethylation defined as a negative difference between tumor and normal (≤ -0.25). Briefly,<br />

a beta value for a given CpG site ranges from 0 to 1 and represents the ratio <strong>of</strong> the methylated<br />

signal over the total signal (methylated plus unmethylated signal). These thresholds are<br />

comparable to those used in previous studies using an earlier Illumina methylation platform [26].<br />

Using this mapping strategy, 12,910 unique genes were mapped across platforms<br />

corresponding to 24,708 <strong>of</strong> the ~27,000 Illumina Infinium probes and to 27,053 probes <strong>of</strong> the<br />



Affymetrix U133 Plus 2.0 platform. Visualization <strong>of</strong> multi-dimensional data was performed using<br />

the SIGMA2 s<strong>of</strong>tware [27].<br />
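The thresholds used to call expression and methylation status can be written out as a short sketch. The function names and status labels are hypothetical, and treating underexpression symmetrically (a log2 difference below -1) is an assumption; the beta-value function follows the definition given in the text (methylated signal over total signal).

```python
def expression_status(cancer_log2, normal_log2, threshold=1.0):
    """Call over/underexpression from RMA (log2) values; 1 log2 unit = two-fold."""
    diff = cancer_log2 - normal_log2
    if diff > threshold:
        return "over"
    if diff < -threshold:   # symmetric rule for underexpression (assumption)
        return "under"
    return "unchanged"

def beta_value(methylated, unmethylated):
    """Ratio of methylated signal to total signal, ranging from 0 to 1."""
    total = methylated + unmethylated
    if total == 0:
        raise ValueError("no signal for this CpG site")
    return methylated / total

def methylation_status(tumor_beta, normal_beta, threshold=0.25):
    """Hypermethylation if the beta difference >= 0.25, hypomethylation if <= -0.25."""
    delta = tumor_beta - normal_beta
    if delta >= threshold:
        return "hyper"
    if delta <= -threshold:
        return "hypo"
    return "unchanged"

# Toy values for one gene in one cancer line versus MCF10A.
print(expression_status(9.3, 7.9))
print(methylation_status(beta_value(900, 100), beta_value(300, 700)))
```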

To determine the genetic events that caused (or could explain) gene expression status, we first<br />

identified a set <strong>of</strong> overexpressed and underexpressed genes for each cell line sample relative to<br />

MCF10A based on differential expression criteria mentioned above. Each cancer sample may<br />

have a different number <strong>of</strong> differentially expressed genes. Second, for each differentially<br />

expressed gene in each sample, we examined the copy number status, methylation status, and<br />

allelic status. A differential expression was considered "explained" when the observed<br />

expression change matched the expected change at the DNA level. If a gene was<br />

overexpressed, the causal copy number status would be a gain, DNA methylation status would<br />

be hypomethylation, or allelic status would be allelic imbalance. Conversely, if a gene was<br />

underexpressed, the causal copy number status would be a loss, DNA methylation status would<br />

be hypermethylation, or allelic status would be LOH. From this point forward, when a change in<br />

allele status with overexpression is discussed, it will be denoted as allelic imbalance (AI).<br />

Conversely, for underexpression, a change in allele status will be denoted as loss <strong>of</strong><br />

heterozygosity (LOH). While changes in methylation or changes in gene dosage leading to<br />

differential expression are more commonly discussed, previous studies have shown that<br />

changes in allele status without change in copy number (copy neutral AI or LOH) can also lead<br />

to differential gene expression due to preferential allelic expression [28-30].<br />
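The matching rules in this paragraph amount to a small lookup from expression direction to the expected DNA-level state. The sketch below is a hypothetical rendering of those rules; the status labels, dictionary layout, and function names are illustrative assumptions rather than the analysis code used in the study.

```python
# Expected DNA-level state for each direction of expression change (from the text).
EXPECTED = {
    "over":  {"copy_number": "gain", "methylation": "hypo",  "allelic": "AI"},
    "under": {"copy_number": "loss", "methylation": "hyper", "allelic": "LOH"},
}

def explaining_dimensions(expression, dna_status):
    """List the DNA dimensions whose state matches the expression change."""
    expected = EXPECTED.get(expression, {})
    return [dim for dim, state in expected.items() if dna_status.get(dim) == state]

def fraction_explained(sample):
    """Fraction of differentially expressed genes explained by >= 1 dimension."""
    changed = [g for g in sample.values() if g["expression"] in EXPECTED]
    if not changed:
        return 0.0
    explained = [g for g in changed
                 if explaining_dimensions(g["expression"], g["dna"])]
    return len(explained) / len(changed)

# Toy sample with invented gene names and calls.
sample = {
    "GENE_A": {"expression": "over",
               "dna": {"copy_number": "gain", "methylation": "hypo", "allelic": "retention"}},
    "GENE_B": {"expression": "under",
               "dna": {"copy_number": "neutral", "methylation": "unchanged", "allelic": "LOH"}},
    "GENE_C": {"expression": "over",
               "dna": {"copy_number": "neutral", "methylation": "unchanged", "allelic": "retention"}},
}
print(explaining_dimensions("over", sample["GENE_A"]["dna"]))
print(fraction_explained(sample))
```

Restricting `EXPECTED` to a single dimension gives the single-dimension baseline against which the multi-dimensional fraction can be compared.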

3.2.4 Multiple concerted disruption (MCD) analysis<br />

To identify likely key nodes in pathways and functions, we hypothesize that, in<br />

addition to being altered frequently (by one mechanism or multiple mechanisms), these genes<br />

also exhibit multiple concerted disruption (MCD) in a given sample, that is, a congruent<br />

change in gene copy number (gain or loss) accompanied by allelic imbalance and a change in<br />

DNA methylation (hypomethylation or hypermethylation), resulting in a change in gene<br />

expression (over- or underexpression). Moreover, MCD events can be used for screening in the same way as<br />

gene amplifications (multi-copy increases) or homozygous deletions:<br />

such events are expected to occur at a lower frequency than disruptions<br />

through one mechanism alone, so observing them signifies the importance of the<br />

genes in question.<br />

In this study, the MCD strategy can be broken down into four sequential steps. First, using a<br />

pre-defined frequency threshold, we identify a set <strong>of</strong> the most frequently differentially expressed<br />

genes. Second, we identify the most frequently differentially expressed genes from step 1<br />

whose expression change is frequently associated with concerted change in at least one DNA<br />

dimension (either DNA copy number, DNA methylation or allelic status) within the same sample.<br />

Next, we further refine this subset <strong>of</strong> genes from step 2 by selecting those having concerted<br />

change in all dimensions in the same sample which we term as MCD. Finally, we introduce an<br />

additional level <strong>of</strong> stringency by requiring a minimum frequency <strong>of</strong> MCD in the given cohort. At<br />

the end <strong>of</strong> the process, we identify a small subset <strong>of</strong> genes which exhibit disruption through<br />

multiple mechanisms and show consequential change in gene expression.<br />
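The four sequential steps can be expressed as successive set filters. The following sketch assumes per-sample status calls are already available; the thresholds `min_de`, `min_any`, and `min_mcd` are placeholders for the frequency cut-offs, whose values are not given in this section, and the data layout and names are hypothetical.

```python
EXPECTED = {
    "over":  {"copy_number": "gain", "methylation": "hypo",  "allelic": "AI"},
    "under": {"copy_number": "loss", "methylation": "hyper", "allelic": "LOH"},
}
DIMENSIONS = ("copy_number", "methylation", "allelic")

def n_concordant(call):
    """Number of DNA dimensions whose state matches the expression change."""
    expected = EXPECTED.get(call["expression"])
    if expected is None:  # gene is not differentially expressed in this sample
        return 0
    return sum(call.get(dim) == expected[dim] for dim in DIMENSIONS)

def mcd_genes(calls, min_de, min_any, min_mcd):
    """Return the gene sets surviving each of the four MCD steps."""
    genes = set().union(*(sample.keys() for sample in calls.values()))

    def n_samples(gene, predicate):
        return sum(1 for sample in calls.values()
                   if gene in sample and predicate(sample[gene]))

    step1 = {g for g in genes
             if n_samples(g, lambda c: c["expression"] in EXPECTED) >= min_de}
    step2 = {g for g in step1
             if n_samples(g, lambda c: n_concordant(c) >= 1) >= min_any}
    step3 = {g for g in step2
             if n_samples(g, lambda c: n_concordant(c) == 3) >= 1}
    step4 = {g for g in step3
             if n_samples(g, lambda c: n_concordant(c) == 3) >= min_mcd}
    return step1, step2, step3, step4

def make_call(expression, copy_number="neutral", methylation="unchanged",
              allelic="retention"):
    return {"expression": expression, "copy_number": copy_number,
            "methylation": methylation, "allelic": allelic}

# Toy cohort of three lines with invented gene names.
calls = {
    "line1": {"GENE_A": make_call("over", "gain", "hypo", "AI"),
              "GENE_B": make_call("under", "loss"),
              "GENE_C": make_call("over")},
    "line2": {"GENE_A": make_call("over", "gain", "hypo", "AI"),
              "GENE_B": make_call("under"),
              "GENE_C": make_call("unchanged")},
    "line3": {"GENE_A": make_call("over", "gain"),
              "GENE_B": make_call("unchanged"),
              "GENE_C": make_call("unchanged")},
}
for step in mcd_genes(calls, min_de=2, min_any=2, min_mcd=2):
    print(sorted(step))
```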

3.2.5 Simulated data analysis<br />

Using the status <strong>of</strong> DNA alteration and expression for every gene in every sample, data within<br />

each sample were shuffled and randomized ten times to create ten simulated datasets. Each<br />

dataset was analyzed for overall disruption frequency and MCD and all results were then<br />

aggregated to determine the frequency distribution <strong>of</strong> different thresholds observed in the<br />

randomized data analysis.<br />
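One way to build such randomized datasets is sketched below. The text does not specify the shuffling scheme, so this example assumes that each dimension's status values are permuted independently across genes within a sample, which preserves the per-sample frequency of each status while breaking the gene-level association between dimensions. The function names, seed, and data layout are hypothetical.

```python
import random

def shuffle_sample(sample_calls, rng):
    """Permute each dimension's status values across the genes of one sample."""
    genes = list(sample_calls)
    dims = sorted({dim for call in sample_calls.values() for dim in call})
    shuffled = {gene: {} for gene in genes}
    for dim in dims:
        values = [sample_calls[gene][dim] for gene in genes]
        rng.shuffle(values)  # in-place shuffle of a copy; input stays unchanged
        for gene, value in zip(genes, values):
            shuffled[gene][dim] = value
    return shuffled

def simulate(calls, n_datasets=10, seed=0):
    """Create n_datasets randomized copies of {sample: {gene: {dim: status}}}."""
    rng = random.Random(seed)
    return [{sample: shuffle_sample(genes, rng) for sample, genes in calls.items()}
            for _ in range(n_datasets)]

# Toy input with invented gene names; every gene carries every dimension.
toy = {
    "line1": {
        "GENE_A": {"expression": "over", "copy_number": "gain"},
        "GENE_B": {"expression": "under", "copy_number": "loss"},
        "GENE_C": {"expression": "unchanged", "copy_number": "neutral"},
    }
}
simulated = simulate(toy, n_datasets=10, seed=1)
print(len(simulated))
```

Each simulated dataset can then be passed through the same disruption-frequency and MCD analysis as the observed data, and the results pooled to estimate how often a given threshold is reached by chance.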

3.2.6 Pathway enrichment analysis<br />

For pathway analysis, Ingenuity Pathway Analysis s<strong>of</strong>tware was used (Ingenuity Systems, CA,<br />

USA). Specifically, the core and comparison analyses were used, with focus on canonical<br />

signaling pathways. Briefly, for a given function or pathway, statistical significance <strong>of</strong> pathway<br />

enrichment is calculated using a right-tailed Fisher's exact test based on the number <strong>of</strong> genes<br />



annotated, number <strong>of</strong> genes represented in the input dataset, and the total number <strong>of</strong> genes<br />

being assessed in the experiment. A pathway was deemed significant if the p-value <strong>of</strong><br />

enrichment was ≤ 0.05 (adjusted for multiple comparisons using a Benjamini-Hochberg<br />

correction).<br />
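For readers unfamiliar with the statistics, a right-tailed Fisher's exact test on a pathway overlap is equivalent to the upper tail of a hypergeometric distribution, and the Benjamini-Hochberg adjustment is a simple step-up procedure. The sketch below shows the generic calculations only; it is not the Ingenuity implementation, and the function and parameter names are hypothetical.

```python
from math import comb

def fisher_right_tail(overlap, n_input, n_pathway, n_total):
    """P(X >= overlap) when n_input genes are drawn from n_total genes,
    of which n_pathway are annotated to the pathway."""
    upper = min(n_input, n_pathway)
    favorable = sum(comb(n_pathway, i) * comb(n_total - n_pathway, n_input - i)
                    for i in range(overlap, upper + 1))
    return favorable / comb(n_total, n_input)

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value downwards
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy example: 8 of 40 input genes fall in a 100-gene pathway (12,000 genes assessed).
p = fisher_right_tail(overlap=8, n_input=40, n_pathway=100, n_total=12000)
print(p)
print(benjamini_hochberg([0.01, 0.04, 0.03]))
```

A pathway would then be called significant when its adjusted p-value is at or below 0.05.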

3.2.7 Survival and differential gene expression analysis in publicly available datasets<br />

For survival analysis, Kaplan-Meier analysis was performed using the statistical toolbox in<br />

Matlab (Mathworks). For each gene, the expression data were sorted from lowest to highest<br />

expression across the sample set and survival times were compared between the top 1/3 and<br />

bottom 1/3 <strong>of</strong> the samples. Two publicly available gene expression microarray datasets with<br />

survival data were utilized for this analysis [4, 31]. For the Sorlie et al dataset, individuals whose<br />

cause <strong>of</strong> death was not breast cancer were excluded from the analysis and missing data due to<br />

quality control issues were filled using the knn method in the “impute” package in Bioconductor<br />

[32]. Of the 23 genes selected by our MCD analysis (see Results), 17 were represented in<br />

either dataset. Survival distributions were compared using a log rank test and two-tailed p-<br />

values unadjusted for multiple comparisons were reported.<br />
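
The tertile comparison can be illustrated with a short sketch. The analysis above was run in Matlab; the Python below is only an assumed, standard-library equivalent of a two-group log rank test, and the function names and toy data are hypothetical.

```python
from math import erfc, sqrt


def tertile_groups(expression):
    """Return (low, high): sample indices in the bottom and top thirds."""
    order = sorted(range(len(expression)), key=lambda i: expression[i])
    k = len(expression) // 3
    if k == 0:
        return [], []
    return order[:k], order[-k:]


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log rank test; events are 1 (death) or 0 (censored).

    Returns (chi-square statistic with 1 d.f., two-tailed p-value).
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)  # at risk, group A
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)  # at risk, group B
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    return chi2, erfc(sqrt(chi2 / 2))


# Toy data for one gene: expression, survival time, event indicator.
expression = [5.0, 1.0, 9.0, 3.0, 7.0, 2.0]
times = [30, 12, 60, 20, 48, 15]
events = [1, 1, 0, 1, 1, 1]

low, high = tertile_groups(expression)
chi2, p_value = logrank_test(
    [times[i] for i in low], [events[i] for i in low],
    [times[i] for i in high], [events[i] for i in high],
)
print(low, high, chi2, p_value)
```

Samples in the middle third are simply left out of the comparison, which matches the top 1/3 versus bottom 1/3 design described above.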

Subsequently, these 17 genes were further evaluated for differential expression in publicly<br />

available expression datasets <strong>of</strong> clinical breast cancer samples using the Oncomine database<br />

[33].<br />

3.3 Results and discussion<br />

3.3.1 Analysis <strong>of</strong> individual genomic dimensions<br />

When examining each genomic dimension alone, we see that many <strong>of</strong> the common features<br />

identified are consistent with the current knowledge <strong>of</strong> breast cancer genomes, for example,<br />

previously reported chromosomal regions <strong>of</strong> frequent copy number gain, segmental loss and<br />

loss <strong>of</strong> heterozygosity (LOH) / allelic imbalance (AI) (Figure 3.1a) [6, 8, 11, 12, 34]. While<br />

many regions <strong>of</strong> frequent LOH/AI do overlap with regions <strong>of</strong> copy number change, others are in<br />

regions <strong>of</strong> neutral copy number. Key genes implicated in breast cancer reside in these specific<br />

regions and are altered expectedly (Figure 3.1b).<br />

3.3.2 Multi-dimensional analysis (MDA) reveals a higher proportion <strong>of</strong> intra-sample<br />

deregulated gene expression can be explained when more dimensions are analyzed<br />

The impact <strong>of</strong> integrative, multi-dimensional analysis on gene discovery is observed at two<br />

levels: (i) within an individual sample as well as (ii) across a set <strong>of</strong> samples. Within a given<br />

sample, we see that by sequentially examining more genomic dimensions at the DNA level, i.e.<br />

gene dosage, allelic status, and DNA methylation, we can explain a higher proportion <strong>of</strong> the<br />

differential gene expression changes observed. Interestingly, although this proportion may vary<br />

between samples, it always increases with every additional dimension examined (Figure 3.2a).<br />

For example, in HCC1395, a single genomic dimension alone can explain as much as 64.4% <strong>of</strong><br />

overexpression but when using all three DNA based dimensions, whereby gene overexpression<br />

can be explained by disruption at the DNA level in at least one dimension, as much as 75.7% <strong>of</strong><br />

aberrant overexpression can be explained. Similarly, in HCC1937, an increase from 56.9% to<br />

74.7% explainable underexpression is observed when moving from one to three genomic<br />

dimensions, respectively. Conversely, in HCC2218, only 44% of overexpression and 36% of<br />

underexpression can be explained even when using all three DNA dimensions. This<br />

suggests that the majority <strong>of</strong> differential expression in sample HCC2218 is most likely a result <strong>of</strong><br />

complex gene-gene trans-regulation and consequently, highlights the individual differences<br />

between samples.<br />

3.3.3 MDA reveals genes are disrupted at higher frequencies when examining multiple<br />

dimensions as compared to any single dimension alone<br />

When considering across a sample set, we see that analysis <strong>of</strong> multiple genomic dimensions<br />

leads to the discovery <strong>of</strong> more disrupted genes than what would be detected using a single<br />

dimension of analysis alone. For each identified gene, we gain insight into how multiple<br />

mechanisms are complementary in gene disruption (Figure 3.2b). For example, the tumor<br />

suppressor gene caspase 1 (CASP1) has been thought to be deactivated through DNA<br />

hypermethylation in multiple cancer types [35, 36]. The gene is underexpressed in all nine<br />

cases examined in this study. In a subset <strong>of</strong> these cases, the observed underexpression can<br />

be attributed to copy number loss. Interestingly, in the remaining cases, DNA hypermethylation<br />

and copy neutral LOH are observed. Similarly, in another example, GNAS is differentially<br />

expressed in all nine cases, with a subset <strong>of</strong> cases showing concerted copy number change<br />

while the remaining cases reveal concerted change in DNA methylation. Notably, our<br />

conclusion is supported by recent studies of glioblastoma, which also showed higher than<br />

expected disruption frequencies <strong>of</strong> specific genes when multiple genomic dimensions were<br />

analyzed [37, 38]. These examples illustrate how deregulated genes can be detected in more<br />

cases when multiple, but complementary, approaches are used.<br />

Until very recently, multi-dimensional genomic analysis typically represented the parallel<br />

examination <strong>of</strong> gene dosage and gene expression. To demonstrate the power <strong>of</strong> examining<br />

multiple dimensions, we examine the frequency <strong>of</strong> gene expression deregulation explained by<br />

congruent alteration at the DNA level. Briefly, for each gene, a sample is considered to have a<br />

DNA-explained gene expression change if either of the following criteria is met: gene<br />

overexpression is accompanied by (i) copy number gain, (ii) copy neutral allelic<br />

imbalance, or (iii) hypomethylation; or gene underexpression is accompanied by<br />

(i) copy number loss, (ii) copy neutral LOH, or (iii) hypermethylation.<br />
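
The criteria above amount to a simple per-sample rule, sketched below. The state labels (`"over"`, `"gain"`, `"hypo"` and so on) and the function names are hypothetical encodings chosen for illustration; they are not taken from the original analysis code.

```python
def explaining_mechanisms(expression, copy_number, allelic_change, methylation):
    """List the DNA dimensions that are concordant with an expression change.

    expression:     "over", "under" or "none"
    copy_number:    "gain", "loss" or "neutral"
    allelic_change: True if allelic imbalance / LOH was called
    methylation:    "hypo", "hyper" or "none"
    """
    if expression == "over":
        wanted_cn, wanted_meth = "gain", "hypo"
    elif expression == "under":
        wanted_cn, wanted_meth = "loss", "hyper"
    else:
        return []
    found = []
    if copy_number == wanted_cn:
        found.append("copy_number")
    # The allelic criterion only counts at neutral copy number.
    if allelic_change and copy_number == "neutral":
        found.append("allelic")
    if methylation == wanted_meth:
        found.append("methylation")
    return found


def disruption_frequency(samples):
    """Number of samples in which at least one DNA dimension explains the change."""
    return sum(1 for s in samples if explaining_mechanisms(*s))


# Hypothetical calls for one gene in four samples.
gene_calls = [
    ("under", "loss", False, "none"),    # explained by copy number loss
    ("under", "neutral", True, "hyper"), # copy neutral LOH and hypermethylation
    ("under", "neutral", False, "none"), # underexpressed, unexplained
    ("over", "loss", True, "none"),      # discordant: not counted
]
print(disruption_frequency(gene_calls))
```

A sample counts once no matter how many dimensions agree, so the frequency is a count of samples, not of mechanisms.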

To determine an appropriate frequency <strong>of</strong> disruption threshold, ten random, simulated datasets<br />

were generated and a distribution plot was generated for all <strong>of</strong> the observed frequencies from<br />

0/9 to 9/9 across all simulations (Figure 3.3a). The proportion <strong>of</strong> observed frequencies ≥ 5/9<br />

was 0.086 but for ≥ 6/9, the proportion was 0.020. Thus, since the 6/9 threshold was the first<br />

threshold ≤ 0.05, 6/9 was used for further analysis. Using this threshold, we found that 437<br />

differentially expressed genes have a corresponding change in gene dosage. Scaling this<br />

approach to examining the whole genome at multiple dimensions, we anticipate identifying more<br />

disrupted genes. When we added the remaining dimensions to account for differential<br />

expression, at the same frequency cut-<strong>of</strong>f, we identified the mechanism <strong>of</strong> disruption for 1162<br />

deregulated genes (Figure 3.3b, Additional File 4).<br />
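
The threshold selection can be sketched as follows. Only the two tail proportions quoted above (0.086 for ≥ 5/9 and 0.020 for ≥ 6/9) come from the text; the individual bin counts are invented so that those tails are reproduced, and the function names are hypothetical.

```python
def tail_proportions(counts):
    """tails[k] = proportion of pooled simulated genes with frequency >= k."""
    total = sum(counts)
    tails = []
    running = 0
    for c in reversed(counts):
        running += c
        tails.append(running / total)
    return tails[::-1]


def first_threshold(counts, alpha=0.05):
    """Smallest frequency k whose tail proportion is <= alpha, or None."""
    for k, proportion in enumerate(tail_proportions(counts)):
        if proportion <= alpha:
            return k
    return None


# counts[k] = number of simulated genes disrupted in exactly k of 9 samples.
# Made-up bins (per 1000 genes) chosen to match the two quoted tails.
simulated_counts = [50, 150, 250, 264, 200, 66, 15, 4, 1, 0]

tails = tail_proportions(simulated_counts)
threshold = first_threshold(simulated_counts)
print(tails[5], tails[6], threshold)
```

Pooling the simulations before taking tail proportions mirrors the aggregated distribution described above.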

The impact <strong>of</strong> multi-dimensional integrative analysis on cancer gene discovery is the enhanced<br />

detection <strong>of</strong> genes which are disrupted by multiple mechanisms but at lower frequencies for<br />

individual mechanisms. Collectively, the detection <strong>of</strong> gene dosage, allelic conversion and<br />

change in methylation status enables the identification of such genes as frequently disrupted.<br />

Using the list <strong>of</strong> 1162 genes, the distributions <strong>of</strong> alteration frequencies for each genomic<br />

dimension or combination <strong>of</strong> dimensions were assessed (Figure 3.4a). Examining the median<br />

frequencies in each box plot, there is a sequential increase in the median as more dimensions<br />

are examined. This point can be further validated using specific genes. For example, the CD70<br />

and ENG genes are underexpressed in the majority <strong>of</strong> samples. Using copy number analysis<br />

alone, the observed frequency <strong>of</strong> disruption (loss and underexpression) is 44% and 22%<br />

respectively. If we then examine the methylation status, in the remaining cases not explained<br />

by DNA copy number, we observe an additional 33% <strong>of</strong> cases exhibiting hypermethylation and<br />

underexpression for ENG (red) and 22% for CD70 (blue). Finally, when we also examine allelic<br />

status, we observe an additional 22% <strong>of</strong> cases with copy neutral LOH and gene<br />

underexpression for CD70 and 11% for ENG. In total, by using all three dimensions, the<br />

cumulative frequency <strong>of</strong> disruption is 88% for CD70 and 77% for ENG (Figure 3.4b). This<br />

example demonstrates the utility <strong>of</strong> a multi-dimensional approach to elucidate events which<br />

would escape conventional single dimensional analysis.<br />
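
The way the cumulative frequency builds up as dimensions are added can be sketched as below. The per-sample calls are hypothetical; they are arranged only so that the running counts mirror the figures quoted above for CD70 (4/9 by copy number, then 2/9 more by methylation, then 2/9 more by allelic status).

```python
def cumulative_explained(calls, order):
    """Running count of samples explained as each dimension in `order` is added.

    calls: one set of concordant DNA dimensions per sample.
    """
    explained = set()
    counts = []
    for dimension in order:
        explained |= {i for i, mechanisms in enumerate(calls) if dimension in mechanisms}
        counts.append(len(explained))
    return counts


# Hypothetical nine-sample gene; a sample may carry more than one mechanism.
calls = [
    {"copy_number"},
    {"copy_number", "methylation"},
    {"copy_number"},
    {"copy_number"},
    {"methylation"},
    {"methylation"},
    {"allelic"},
    {"allelic"},
    set(),
]
order = ("copy_number", "methylation", "allelic")
counts = cumulative_explained(calls, order)
for dimension, count in zip(order, counts):
    print(f"after adding {dimension}: {count}/9")
```

Using a set of explained samples ensures that a sample with two concordant mechanisms is counted only once.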

3.3.4 MDA identifies significantly enriched cancer related pathways<br />

Using the set <strong>of</strong> 1162 genes identified by MDA (Additional File 4) and the similar lists <strong>of</strong> genes<br />

identified from each <strong>of</strong> the simulated datasets, pathway analyses were performed with Ingenuity<br />

Pathway Analysis. From the pathway analysis <strong>of</strong> MDA genes and focusing only on canonical<br />

signaling pathways, 53 pathways were significantly enriched at a Benjamini-Hochberg<br />

corrected p-value threshold of 0.05 (Additional File 5). In contrast, using the gene lists from the 10<br />

simulated datasets, nine of the 10 pathway analyses yielded no significantly enriched pathways<br />

at the same threshold, and the remaining analysis yielded a single significant pathway.<br />

Similar results from Gene Ontology analysis were obtained using the publicly available<br />

GATHER database [39] (Additional File 6). Specific pathways involved in breast cancer,<br />

ovarian cancer, and prostate cancer were amongst the ones identified as most significant<br />

(Figure 3.5). Consequently, these results suggest that the genes identified using MDA have a<br />

high degree <strong>of</strong> biological relevance.<br />

3.3.5 MDA <strong>of</strong> the Neuregulin signaling pathway reveals a complex pattern <strong>of</strong> deregulation<br />

One of the 53 pathways that were statistically over-represented in our list of 1162 genes<br />

is the neuregulin signaling pathway. This pathway contains the well known<br />

breast cancer oncogene ERBB2 as well as other genes known to be affected in breast and<br />

other cancers [40-43]. Examining the components of this pathway, we observe that some<br />

genes are commonly altered and others infrequently altered across our sample set by multiple<br />

patterns of genomic alteration, while some genes behave oppositely in different samples<br />

(Figure 3.6).<br />

While genes such as HRAS (down), BAD (down), HSP90AB1 (up), SOS2 (up) and RPS6KB1<br />

(up) generally exhibit consistent differential expression with concerted change at the DNA level<br />

across our sample set, genes such as GRB7, PTEN, and MAP2K1 exhibit both overexpression<br />

and underexpression, with concerted DNA change, in different samples. For example, if we<br />

examine PTEN, we observe copy number loss, LOH, DNA hypermethylation and consequent<br />

underexpression in HCC1395 while HCC1008 contains copy number gain, with DNA<br />

hypomethylation and consequent overexpression (Figure 3.7). The impact <strong>of</strong> such a difference<br />

on downstream targets was recently shown in a breast cancer study where AKT and mTOR<br />

phosphorylation were higher in cases with low PTEN expression compared to those with high<br />

PTEN expression [44]. As this pathway illustrates, although average features across a<br />

sample set are important, differences between samples within the same pathway may also<br />

play an important role and thus may affect the biology of the tumor.<br />

3.3.6 Genes exhibiting multiple concerted disruption (MCD) - biological and clinical<br />

significance<br />

We have demonstrated that we can identify more disrupted genes in a given sample when<br />

considering any mechanism <strong>of</strong> disruption. On the other hand, those genes which exhibit<br />

multiple concerted disruptions (MCD) across all DNA dimensions -- i.e. overexpression <strong>of</strong> a<br />

gene due to increased gene dosage, which led to allelic imbalance, and DNA hypomethylation<br />

at the same locus relieving regulation -- may likely have strong biological significance.<br />

Likewise, underexpression due to reduced gene copy number, resulting in LOH, and<br />

complementary DNA hypermethylation, leading to gene silencing may also be significant. By<br />

employing multiple dimensions <strong>of</strong> interrogation, genes exhibiting MCD are captured.<br />

To determine what frequency <strong>of</strong> MCD was deemed significant, we performed a similar analysis<br />

<strong>of</strong> the 10 simulated datasets from before and assessed the proportion <strong>of</strong> events at each<br />

frequency <strong>of</strong> MCD from 0/9 to 1/9 (Figure 3.8a). It was found that by random chance, a gene<br />

exhibiting MCD in 1/9 would occur 0.3% <strong>of</strong> the time. Thus, using this threshold <strong>of</strong> at least one<br />

MCD event, 974 genes were identified (Additional File 7). Interestingly, the overlap <strong>of</strong> the<br />

MDA list (1162 genes) with the MCD list (974 genes) yielded 375 genes.<br />
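
In contrast to the "any dimension" rule used for MDA, an MCD call requires every DNA dimension to agree with the expression change in the same sample. The sketch below illustrates that rule and the list overlap; the state labels, function names and gene identifiers are hypothetical, not the original implementation.

```python
def has_mcd(expression, copy_number, allelic_change, methylation):
    """True if all three DNA dimensions are concordant with the expression change."""
    if expression == "over":
        return copy_number == "gain" and allelic_change and methylation == "hypo"
    if expression == "under":
        return copy_number == "loss" and allelic_change and methylation == "hyper"
    return False


def mcd_frequency(samples):
    """Number of samples in which the gene exhibits MCD."""
    return sum(1 for s in samples if has_mcd(*s))


# Hypothetical calls for one gene in three samples.
gene_calls = [
    ("under", "loss", True, "hyper"),  # MCD
    ("under", "loss", False, "hyper"), # two of three dimensions: not MCD
    ("over", "gain", True, "hypo"),    # MCD
]
print(mcd_frequency(gene_calls))

# Overlap of two gene lists, using placeholder identifiers.
mda_genes = {"geneA", "geneB", "geneC"}
mcd_genes = {"geneB", "geneC", "geneD"}
print(sorted(mda_genes & mcd_genes))
```

Because the MCD rule is stricter per sample, a gene can appear on the MCD list with a frequency of only 1/9 while being absent from the MDA list, which requires 6/9.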

The MCD strategy sequentially refines the roster <strong>of</strong> target genes with the intent <strong>of</strong> identifying<br />

critical genes for tumorigenesis (Additional File 8 or Appendix IV). Such genes which exhibit<br />

multiple mechanisms <strong>of</strong> deregulation, for example, may represent important nodes in pathways<br />

such as hub proteins [45], whereby disruption <strong>of</strong> the gene has an effect on multiple downstream<br />

targets or genes with biological and/or clinical relevance. Thus, although these genes may not<br />

be affected at a high frequency across the sample set, their disruption at multiple levels in<br />

individual samples would signify importance in tumorigenesis. As shown earlier, 375 genes<br />

were identified by both MDA and MCD. If we further employ a criterion of frequent MCD, whereby<br />

this event occurs in 4/9 <strong>of</strong> cases (signifying high recurrence), we detect 23 genes (Additional<br />

File 8 or Appendix IV). Among the 23 genes identified are TUSC3 (8p22), ELK3 (12q23), and<br />

CCNA1 (13q12.3-q13).<br />

TUSC3 resides at 8p22, a locus frequently deleted across multiple epithelial cancers [46-49].<br />

ELK3 is an ETS domain transcription factor which, in mice, acts as a transcriptional inhibitor in<br />

the absence <strong>of</strong> RAS, but is a transcriptional activator in the presence <strong>of</strong> RAS [50]. Recently,<br />

ELK3 was shown to be underexpressed in a panel of breast cancer lines as well as clinical breast<br />

tumor specimens [51]. CCNA1 was shown to be hypermethylated in multiple cancer types,<br />

including breast cancer [52].<br />

To validate the relevance <strong>of</strong> the 23 MCD genes in clinical breast cancer samples, we evaluated<br />

gene expression levels associated with survival and examined multiple publicly available<br />

microarray datasets using the Oncomine database [33]. Of these 23 genes, 17 were<br />

represented in either the van de Vijver et al or Sorlie et al datasets. Interestingly, eight <strong>of</strong> these<br />

genes demonstrated a statistically significant association with patient survival in at least one of<br />

the two independent datasets (Additional File 9 or Appendix V, Additional File 10 or<br />

Appendix VI) [4, 31]. Moreover, when comparing the percentage <strong>of</strong> survival-associated genes<br />

(8/17, 47.1%) in the MCD gene list with what was expected without pre-selection (27.1%), the<br />

increased percentage was statistically significant based on the binomial test (p = 0.04131806).<br />

To further evaluate the clinical significance <strong>of</strong> these genes, we utilized the Oncomine database<br />

(Additional File 9 or Appendix V). It should be noted that a caveat of the Oncomine analysis is<br />

that it may not detect all low levels <strong>of</strong> differential expression. TUSC3 is shown as an example <strong>of</strong><br />

one <strong>of</strong> the genes whose expression correlates with survival (Additional File 8 or Appendix IV,<br />

see Methods). Notably, in ovarian cancer, TUSC3, in conjunction with EFA6R, also correlated<br />

with poor survival [53]. The observations that TUSC3 is altered frequently by multiple<br />

mechanisms at the DNA and RNA level and shows a strong association with patient survival,<br />

highlight the use <strong>of</strong> MCD in systematically identifying biologically, and potentially clinically,<br />

relevant genes.<br />

3.3.7 Association <strong>of</strong> genes exhibiting MCD and triple negative breast cancers (TNBC)<br />

In this study, the majority <strong>of</strong> samples used (5/9) were <strong>of</strong> the triple negative subtype <strong>of</strong> breast<br />

cancer; a subtype which is estrogen receptor (ER) negative, progesterone receptor (PR)<br />

negative, and HER2 negative and represents between 10% and 20% <strong>of</strong> all diagnosed breast<br />

malignancies [54-57]. Genomic analyses of triple negative breast cancers (TNBCs) have been<br />

previously performed [58-61] and they revealed a heterogeneous and complex view <strong>of</strong> this<br />

breast cancer subtype. A recent study, however, had implicated fibroblast growth factor<br />

receptor 2 (FGFR2) as a novel therapeutic target amplified in TNBCs [57]. Interestingly, from a<br />

meta-analysis <strong>of</strong> array CGH data, this gene was found to be amplified in 4% <strong>of</strong> TNBC cases<br />

[57]. Thus, we assessed the status <strong>of</strong> FGFR2 and its downstream targets in our multi-<br />

dimensional dataset.<br />

While FGFR2 is not amplified in any <strong>of</strong> the five TNBC cell lines, all <strong>of</strong> the five cell lines showed<br />

overexpression <strong>of</strong> FGFR2 with one <strong>of</strong> the cell lines exhibiting a low level gain <strong>of</strong> a region<br />

encompassing FGFR2 (HCC1937). From this analysis, within the sample set <strong>of</strong> TNBC cell<br />

lines, though FGFR2 is overexpressed, it was not frequently associated with DNA level<br />

alterations.<br />

However, examining downstream targets <strong>of</strong> FGFR2 revealed a striking finding. Using the<br />

knowledge database <strong>of</strong> Ingenuity Pathway Analysis, one <strong>of</strong> the downstream components<br />

affected at the expression level, which was also on both the MDA (Additional File 4) and MCD<br />

(Additional File 7) lists, was COL1A1. Remarkably, <strong>of</strong> the five TNBC cell lines, four exhibited<br />

DNA alteration associated overexpression <strong>of</strong> COL1A1 with two lines exhibiting MCD at COL1A1<br />

and two other lines having DNA copy number associated overexpression. The remaining line<br />

exhibited DNA copy number associated overexpression <strong>of</strong> FGFR2 (Figure 3.8b). Hence, every<br />

TNBC line was affected at either FGFR2 or COL1A1 at both the DNA and RNA levels.<br />

Interestingly, COL1A1 has been shown to be both prognostic and predictive in multiple cancer<br />

types, including breast cancer [3, 5, 62, 63].<br />

3.4 Conclusions<br />

In conclusion, we have demonstrated that a multi-dimensional genomic approach is superior to<br />

analysis <strong>of</strong> one or two genomic dimensions alone. Each additional genomic dimension<br />

surveyed increases the amount <strong>of</strong> aberrant gene expression that can be explained within<br />

individual samples. As a by-product, when examining across a sample set, multi-dimensional<br />

genomic analysis can identify relevant genes that may be overlooked due to low frequencies <strong>of</strong><br />

disruption by the individual mechanisms. The increased frequency <strong>of</strong> gene disruption detected,<br />

due to the consideration <strong>of</strong> multiple mechanisms <strong>of</strong> disruption, could potentially reduce the<br />

sample size of the study cohort needed for gene discovery.<br />

Secondly, while the increased detection <strong>of</strong> genes disrupted using multi-dimensional analysis is<br />

useful for achieving a more comprehensive identification <strong>of</strong> deregulated pathways and gene<br />

networks, it also presents a challenge in prioritizing which genes are likely key nodes or hubs in<br />

the affected pathways and networks. Hence, one way to prioritize is to identify genes with<br />

evidence <strong>of</strong> multiple concerted disruption. The Knudson two-hit hypothesis suggests that tumor<br />

suppressor genes require two allelic hits to disrupt gene function. Bi-allelic alteration, such as<br />

homozygous deletion, or concerted genetic and epigenetic changes, are well documented<br />

causal mechanisms <strong>of</strong> gene disruption. Likewise, hypomethylation and increased gene dosage<br />

are known mechanisms for gene overexpression. The bi-allelic disruption phenomenon<br />

(leading to loss or gain <strong>of</strong> function) provides a means to identify causative genes; hence,<br />

parallel analysis <strong>of</strong> the genome and epigenome in the same tumor is <strong>of</strong> great benefit. In this<br />

study, we have developed a stepwise gene selection strategy to identify multiple concerted<br />

disruptions using an integrative genomics approach.<br />

In this study, three DNA dimensions, for which affordable high throughput assays are currently available,<br />

were examined. However, we envision that new techniques for analysis <strong>of</strong> additional aspects<br />

such as histone modification states and gene mutation status will reveal mechanisms that would<br />

explain even more gene expression changes within individual samples. The identification <strong>of</strong> a<br />

number <strong>of</strong> key cancer-related genes and pathways using a relatively small sample size<br />

suggests that limitations in requiring large sample sizes for studies to identify relevant genes<br />

and pathways may be circumvented by our comprehensive approach. Consequently, this<br />

concept can be projected to current technologies such as high throughput sequencing where it<br />

may prove more prudent to perform this analysis in multiple dimensions in a smaller number <strong>of</strong><br />

samples rather than in one dimension in many more samples at a comparable cost. Finally,<br />

observing the same gene in a given pathway being deregulated in a completely different<br />

manner between samples highlights one <strong>of</strong> the shortcomings <strong>of</strong> group-based analysis and<br />

underscores the eventual need to move to systems analysis of tumors as individual entities.<br />

[Figure 3.1 image. Panel (a): frequency (0 to 1) of CN gain, CN loss, LOH and CN neutral LOH plotted against genomic position (0 to 3 Gbp), with the positions of ESR1, BRCA2, TP53, ERBB2 and BRCA1 marked. Panel (b): copy number gain, copy number loss, LOH and retention (no LOH) at these genes in HCC38, HCC1008, HCC1143, HCC1395, HCC1599, HCC1937, HCC2218, BT474 and MCF7.]<br />

Figure 3.1. Genomic profiles of breast cancer cell lines. (a) Whole genome frequency<br />

analysis of copy number gain (red), copy number loss (green), loss of heterozygosity/allelic<br />

imbalance (AI) (top blue) and copy number neutral LOH/AI (bottom blue). Vertical lines<br />

through all four graphs represent the genomic location <strong>of</strong> key breast cancer genes, using the<br />

hg18 build <strong>of</strong> the human genome map. (b) Illustration <strong>of</strong> copy number and LOH/AI status for<br />

ESR1, BRCA1, BRCA2, ERBB2 and TP53 in each <strong>of</strong> the samples. Each <strong>of</strong> these DNA<br />

events is evident in all <strong>of</strong> these genes.<br />



Figure 3.2. Quantitative and qualitative benefits <strong>of</strong> integrative analyses. (a) Heatmap and<br />

bar plot illustration <strong>of</strong> the additive benefit <strong>of</strong> multi-dimensional DNA analysis for the explanation<br />

<strong>of</strong> consequential differential gene expression. Within a sample, when sequentially adding a DNA<br />

dimension <strong>of</strong> analysis, an increasing percentage <strong>of</strong> observed differential gene expression can<br />

be explained. For each dimension or combination <strong>of</strong> dimensions, in the bar plot, the median<br />

value is used (grey bars). Heatmaps display the percentage <strong>of</strong> differential expression explained<br />

by DNA mechanisms, with values near to 100 either dark red (overexpression) or green<br />

(underexpression) and values closer to 0 in white. (b) Two specific genes GNAS and CASP1<br />

are given as examples to show multiple and complementary mechanisms <strong>of</strong> gene disruption,<br />

illustrating the importance <strong>of</strong> multi-dimensional analysis (MDA).<br />

[Figure 3.2 image. Panel (a): heatmaps and bar plots (axis 0 to 0.8) of the proportion of overexpression and underexpression explained by each DNA dimension in each cell line; the heatmap values are tabulated below. Panel (b): gene expression, DNA copy number, DNA methylation and allelic status of GNAS and CASP1 across the nine cell lines. Legend: GE, gene expression (over, under); CN, DNA copy number (gain, loss); L, allelic status (LOH); M, DNA methylation (hypo, hyper).]<br />

Proportion of overexpression explained (columns: HCC38, HCC1008, HCC1143, HCC1395, HCC1599, HCC1937, HCC2218, BT474, MCF7)<br />

Hypo: 0.197 0.266 0.176 0.134 0.203 0.171 0.144 0.194 0.180<br />

AI: 0.319 0.325 0.325 0.337 0.215 0.421 0.132 0.271 0.122<br />

CNG: 0.708 0.401 0.372 0.644 0.440 0.464 0.321 0.458 0.500<br />

CNG Or Hypo Or AI: 0.821 0.686 0.655 0.757 0.612 0.743 0.435 0.679 0.629<br />

Proportion of underexpression explained (same column order)<br />

Hyper: 0.103 0.062 0.126 0.236 0.145 0.161 0.183 0.172 0.166<br />

LOH: 0.425 0.512 0.516 0.523 0.316 0.569 0.197 0.348 0.226<br />

CNL: 0.367 0.584 0.573 0.499 0.408 0.473 0.203 0.549 0.419<br />

CNL Or Hyper Or LOH: 0.522 0.705 0.790 0.702 0.562 0.747 0.363 0.721 0.558<br />


[Figure 3.3 image. Panel (a): proportion of genes in random simulations (0 to 0.30) at each disruption frequency (0 to 9 of 9 samples). Panel (b): number of genes at the 6/9 cut-off (0 to 1400) for CN, Meth, AI/LOH, and CN Or AI/LOH Or Meth, in simulated and experimental data.]<br />

Figure 3.3. Determination and application <strong>of</strong> a disruption frequency threshold. (a)<br />

Results <strong>of</strong> the analyses <strong>of</strong> ten simulated datasets. Aggregating the results <strong>of</strong> the simulated<br />

analyses, the proportion <strong>of</strong> random simulations at the observed frequency thresholds are<br />

shown. From these analyses, approximately 2% of the simulations were ≥ 6/9. (b) Using a<br />

frequency cut-<strong>of</strong>f <strong>of</strong> 6/9, the number <strong>of</strong> genes disrupted at that frequency using a single or<br />

combination <strong>of</strong> DNA dimensions. With a single dimension alone, we can maximally identify<br />

437 genes which are differentially expressed and exhibit a concerted change at the DNA<br />

level in a minimum <strong>of</strong> 6/9 samples. However, using all three dimensions, we find that 1162<br />

genes are in fact differentially expressed and contain at least one concerted change in one<br />

<strong>of</strong> the DNA dimensions. This represents over a two-fold increase in the number <strong>of</strong> genes<br />

identified.<br />

[Figure 3.4 image. Panels (a) and (b): disruption frequency and cumulative frequency (0 to 9 of 9 samples) for DNA methylation, copy number and LOH alone, for their pairwise combinations, and for copy number or AI/LOH or DNA methylation combined, with the frequency threshold marked.]<br />

Figure 3.4. Impact <strong>of</strong> multi-dimensional analysis on low frequency events. (a) Box<br />

plot analysis <strong>of</strong> the frequency distribution <strong>of</strong> single and multi-dimensional analyses (MDA) <strong>of</strong><br />

the 1162 genes differentially expressed with a concerted change in one <strong>of</strong> the DNA dimensions.<br />

The area in red represents the number <strong>of</strong> genes (<strong>of</strong> the 1162) that would be missed if<br />

only a single DNA dimension was examined, while the area in blue represents the genes<br />

that would be detected. Examining the median values for the three right-most boxes, we<br />

see that by even using the box with the highest median (copy number), we would not be<br />

able to detect about 50% <strong>of</strong> the 1162 genes. (b) Two specific examples highlighting the<br />

importance <strong>of</strong> multi-dimensional genomic analysis. Using single dimensional analyses<br />

(green shade) alone, disruption of CD70 (blue line graph) and ENG (red line graph) occurs at<br />

very low frequencies (44% and 33% respectively). However, when examining two (red<br />

shade) or three genomic dimensions (blue shade), the disruption <strong>of</strong> these genes occurs at<br />

very high frequencies, 88% and 77% respectively. Frequency threshold <strong>of</strong> 6/9 is denoted<br />

with a black dotted line.<br />

[Figure 3.5 image. Bar chart of -log(p-value) (0 to 5.0) for multi-dimensional analysis versus simulated datasets, with a threshold line, for the following pathways: Molecular Mechanisms of Cancer; Cell Cycle: G1/S Checkpoint Regulation; Aryl Hydrocarbon Receptor Signaling; Breast Cancer Regulation by Stathmin1; Ovarian Cancer Signaling; Prostate Cancer Signaling; p53 Signaling; Neuregulin Signaling; PI3K/AKT Signaling; Cell Cycle: G2/M DNA Damage Checkpoint Regulation.]<br />

Figure 3.5. Pathway analysis of the 1162 genes identified by multi-dimensional analysis. Ingenuity Pathway Analysis was performed on the 1162 genes identified by MDA as well as on genes meeting the same frequency criterion (6/9) from the analysis of the ten simulated datasets. In total, using the list of 1162 MDA genes, 53 canonical signaling pathways were identified as significant after Benjamini-Hochberg multiple testing correction (Additional File 5). In contrast, using the same statistical criteria, nine of the ten simulated datasets yielded no significant pathways and the remaining dataset yielded one pathway. In this figure, ten of the most well known cancer-related pathways are shown. The yellow threshold line represents a Benjamini-Hochberg corrected p-value of 0.05, with bars above that line deemed significant.
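For readers unfamiliar with the correction named in this caption, the following is a minimal Python sketch of the Benjamini-Hochberg step-up adjustment. The raw p-values are made up for illustration and are not taken from Additional File 5; the pathway analysis itself was run in Ingenuity Pathway Analysis, not with this code.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five pathways.
raw = [0.001, 0.008, 0.039, 0.041, 0.60]
adj = benjamini_hochberg(raw)
significant = [q <= 0.05 for q in adj]
print(adj, significant)
```

In this toy example the third and fourth pathways have raw p-values below 0.05 but are no longer significant after correction, which is the kind of filtering the threshold line in the figure represents.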


Figure 3.6 (pathway diagram not reproduced). Genes and nodes shown in this part of the diagram: ERBB2, HSP90AB1, GRB2, SOS2, HRAS, RAF1, MAP2K1, ERK1/2, ELK1, MYC, EREG, PRKCI, ERBB4, GRB7, ERBB2IP, STAT5, ERRFI1. Pathway and outcome labels: Mitogenic Signalling; Cell Cycle; Proliferation & Differentiation. Beside each gene, a grid gives the status of samples S1 to S9 in four rows: GE, CN, L, and M. Some genes are marked with *.

Figure 3.6. Complex deregulation of the Neuregulin/ERBB2 signaling pathway. Each gene is color-coded red or green to represent over- and underexpression respectively. Genes colored both red and green are over- and underexpressed in different samples. Beside each gene is the status for gene expression, copy number, LOH/AI and DNA methylation, with the alterations in each dimension colored as per the legend. DNA alterations are only shown when a change in gene expression is observed. It should be noted that LOH can be derived from multiple mechanisms; in this study, we do not distinguish between these mechanisms. Likewise, methylation changes may affect one or both alleles; in this study, we do not distinguish the status of the alleles individually. Genes denoted with * have one sample exhibiting multiple concerted disruption (MCD). Samples are coded as follows: S1 = HCC38, S2 = HCC1008, S3 = HCC1143, S4 = HCC1395, S5 = HCC1599, S6 = HCC1937, S7 = HCC2218, S8 = BT474, and S9 = MCF7.

Figure 3.6 (continued; diagram not reproduced). Genes and nodes shown: PDK1, AKT2, mTOR, RPS6KB1, PIK3R1, ERBB4, EREG, PIP2, PIP3, PTEN, BAD, CDKN1B, RPS6. Pathway and outcome labels: PI3K-AKT Signalling; Survival & Proliferation. Legend: GE, gene expression (over, under); CN, DNA copy number (gain, loss); L, allelic status (LOH); M, DNA methylation (hypo, hyper).


Figure 3.7 (plots not reproduced). Gene: PTEN. Panel labels: sample HCC1008 (copy number gain, retention) and sample HCC1395 (copy number loss, LOH). Each panel shows DNA methylation (beta value) and gene expression (log2 intensity) for MCF10A versus the cancer cell line.

Figure 3.7. Deregulation of PTEN occurs differently between samples. In HCC1008 (left), PTEN is overexpressed with an associated gain in copy number and hypomethylation. Conversely, in HCC1395 (right), PTEN is underexpressed, with an associated loss in copy number, LOH, and DNA hypermethylation. This illustrates how each tumor may behave differently from another.


Figure 3.8. Multiple concerted disruption (MCD) analysis and its application to triple negative breast cancer. (a) Analysis of ten simulated datasets to determine the proportion of random simulations at each observed frequency of MCD. Notably, 99.7% of random simulations had an MCD frequency of 0/9, with the remaining 0.3% at 1/9. Moreover, no simulations showed a frequency ≥ 2/9. Thus, the observation of an MCD event suggests the event is likely non-random. (b) Using the knowledge database of Ingenuity Pathway Analysis, upstream and downstream components of FGFR2 were selected to assess their role in the subset of triple negative breast cancer (TNBC) cell lines. Only components which were shown to have a direct or indirect expression level relationship were selected. Of the seven components identified (four upstream and three downstream of FGFR2), one upstream component (FGF2) and one downstream component (COL1A1) were present in both the MDA list (Additional File 4) and MCD list (Additional File 7). FGF2, colored in green, is shown to be frequently underexpressed while COL1A1, colored in red, is frequently overexpressed. Interestingly, examining FGFR2 and COL1A1, while FGFR2 overexpression is not frequently associated with DNA level alteration, COL1A1 is frequently affected at the DNA level. Moreover, in the five TNBC cell lines examined, four have DNA level alteration of COL1A1 and the remaining line has DNA level alteration of FGFR2.

Figure 3.8 (plots not reproduced). (a) Bar chart of the proportion of random simulations (y-axis, 0 to 1) against MCD frequency (x-axis, 0 to 9). (b) Network diagram showing FGFR2 with FGF2, IGF2, TP63, COL1A1, RUNX2, and TGFB1; for FGFR2 and COL1A1, gene expression, DNA copy number, DNA methylation, and allelic status are shown for HCC38, HCC1008, HCC1143, HCC1599, and HCC1937. * marks a sample with MCD. Legend: GE, gene expression (over, under); CN, DNA copy number (gain, loss); L, allelic status (LOH); M, DNA methylation (hypo, hyper).


3.5 References<br />

1. Chang JC, Wooten EC, Tsimelzon A, Hilsenbeck SG, Gutierrez MC, Elledge R, Mohsin S, Osborne CK, Chamness GC, Allred DC et al: Gene expression profiling for the prediction of therapeutic response to docetaxel in patients with breast cancer. Lancet 2003, 362(9381):362-369.

2. Coe BP, Chari R, Lockwood WW, Lam WL: Evolving strategies for global gene expression analysis of cancer. J Cell Physiol 2008.

3. Perou CM, Sorlie T, Eisen MB, van de Rijn M, Jeffrey SS, Rees CA, Pollack JR, Ross DT, Johnsen H, Akslen LA et al: Molecular portraits of human breast tumours. Nature 2000, 406(6797):747-752.

4. Sorlie T, Perou CM, Tibshirani R, Aas T, Geisler S, Johnsen H, Hastie T, Eisen MB, van de Rijn M, Jeffrey SS et al: Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc Natl Acad Sci U S A 2001, 98(19):10869-10874.

5. van 't Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AA, Mao M, Peterse HL, van der Kooy K, Marton MJ, Witteveen AT et al: Gene expression profiling predicts clinical outcome of breast cancer. Nature 2002, 415(6871):530-536.

6. Fridlyand J, Snijders AM, Ylstra B, Li H, Olshen A, Segraves R, Dairkee S, Tokuyasu T, Ljung BM, Jain AN et al: Breast tumor copy number aberration phenotypes and genomic instability. BMC Cancer 2006, 6:96.

7. Albertson DG, Ylstra B, Segraves R, Collins C, Dairkee SH, Kowbel D, Kuo WL, Gray JW, Pinkel D: Quantitative mapping of amplicon structure by array CGH identifies CYP24 as a candidate oncogene. Nat Genet 2000, 25(2):144-146.

8. Chin SF, Wang Y, Thorne NP, Teschendorff AE, Pinder SE, Vias M, Naderi A, Roberts I, Barbosa-Morais NL, Garcia MJ et al: Using array-comparative genomic hybridization to define molecular portraits of primary breast cancers. Oncogene 2007, 26(13):1959-1970.

9. Jain AN, Chin K, Borresen-Dale AL, Erikstein BK, Eynstein Lonning P, Kaaresen R, Gray JW: Quantitative analysis of chromosomal CGH in human breast tumors associates copy number abnormalities with p53 status and patient survival. Proc Natl Acad Sci U S A 2001, 98(14):7952-7957.

10. Naylor TL, Greshock J, Wang Y, Colligon T, Yu QC, Clemmer V, Zaks TZ, Weber BL: High resolution genomic analysis of sporadic breast cancer using array-based comparative genomic hybridization. Breast Cancer Res 2005, 7(6):R1186-1198.

11. Shadeo A, Lam WL: Comprehensive copy number profiles of breast cancer cell model genomes. Breast Cancer Res 2006, 8(1):R9.

12. Chin K, DeVries S, Fridlyand J, Spellman PT, Roydasgupta R, Kuo WL, Lapuk A, Neve RM, Qian Z, Ryder T et al: Genomic and transcriptional aberrations linked to breast cancer pathophysiologies. Cancer Cell 2006, 10(6):529-541.

13. Chin SF, Teschendorff AE, Marioni JC, Wang Y, Barbosa-Morais NL, Thorne NP, Costa JL, Pinder SE, van de Wiel MA, Green AR et al: High-resolution aCGH and expression profiling identifies a novel genomic subtype of ER negative breast cancer. Genome Biol 2007, 8(10):R215.

14. Hyman E, Kauraniemi P, Hautaniemi S, Wolf M, Mousses S, Rozenblum E, Ringner M, Sauter G, Monni O, Elkahloun A et al: Impact of DNA amplification on gene expression patterns in breast cancer. Cancer Res 2002, 62(21):6240-6245.

15. Pollack JR, Sorlie T, Perou CM, Rees CA, Jeffrey SS, Lonning PE, Tibshirani R, Botstein D, Borresen-Dale AL, Brown PO: Microarray analysis reveals a major direct role of DNA copy number alteration in the transcriptional program of human breast tumors. Proc Natl Acad Sci U S A 2002, 99(20):12963-12968.


16. Chari R, Lockwood WW, Coe BP, Chu A, Macey D, Thomson A, Davies JJ, MacAulay C, Lam WL: SIGMA: a system for integrative genomic microarray analysis of cancer genomes. BMC Genomics 2006, 7:324.

17. Ishkanian AS, Malloff CA, Watson SK, DeLeeuw RJ, Chi B, Coe BP, Snijders A, Albertson DG, Pinkel D, Marra MA et al: A tiling resolution DNA microarray with complete coverage of the human genome. Nat Genet 2004, 36(3):299-303.

18. Lockwood WW, Coe BP, Williams AC, MacAulay C, Lam WL: Whole genome tiling path array CGH analysis of segmental copy number alterations in cervical cancer cell lines. Int J Cancer 2007, 120(2):436-443.

19. Khojasteh M, Lam WL, Ward RK, MacAulay C: A stepwise framework for the normalization of array CGH data. BMC Bioinformatics 2005, 6:274.

20. Jong K, Marchiori E, Meijer G, Vaart AV, Ylstra B: Breakpoint identification and smoothing of array comparative genomic hybridization data. Bioinformatics 2004, 20(18):3636-3637.

21. Coe BP, Lockwood WW, Girard L, Chari R, Macaulay C, Lam S, Gazdar AF, Minna JD, Lam WL: Differential disruption of cell cycle pathways in small cell and non-small cell lung cancer. Br J Cancer 2006, 94(12):1927-1935.

22. Carvalho B, Bengtsson H, Speed TP, Irizarry RA: Exploration, normalization, and genotype calls of high-density oligonucleotide SNP array data. Biostatistics 2007, 8(2):485-499.

23. Lin M, Wei LJ, Sellers WR, Lieberfarb M, Wong WH, Li C: dChipSNP: significance curve and clustering of SNP-array-based loss-of-heterozygosity data. Bioinformatics 2004, 20(8):1233-1240.

24. Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, Fiegler H, Shapero MH, Carson AR, Chen W et al: Global variation in copy number in the human genome. Nature 2006, 444(7118):444-454.

25. Karolchik D, Kuhn RM, Baertsch R, Barber GP, Clawson H, Diekhans M, Giardine B, Harte RA, Hinrichs AS, Hsu F et al: The UCSC Genome Browser Database: 2008 update. Nucleic Acids Res 2008, 36(Database issue):D773-779.

26. Bibikova M, Lin Z, Zhou L, Chudin E, Garcia EW, Wu B, Doucet D, Thomas NJ, Wang Y, Vollmer E et al: High-throughput DNA methylation profiling using universal bead arrays. Genome Res 2006, 16(3):383-393.

27. Chari R, Coe BP, Wedseltoft C, Benetti M, Wilson IM, Vucic EA, MacAulay C, Ng RT, Lam WL: SIGMA2: a system for the integrative genomic multi-dimensional analysis of cancer genomes, epigenomes, and transcriptomes. BMC Bioinformatics 2008, 9:422.

28. Soh J, Okumura N, Lockwood WW, Yamamoto H, Shigematsu H, Zhang W, Chari R, Shames DS, Tang X, MacAulay C et al: Oncogene mutations, copy number gains and mutant allele specific imbalance (MASI) frequently occur together in tumor cells. PLoS One 2009, 4(10):e7464.

29. Tuna M, Knuutila S, Mills GB: Uniparental disomy in cancer. Trends Mol Med 2009, 15(3):120-128.

30. Yan H, Yuan W, Velculescu VE, Vogelstein B, Kinzler KW: Allelic variation in human gene expression. Science 2002, 297(5584):1143.

31. van de Vijver MJ, He YD, van't Veer LJ, Dai H, Hart AA, Voskuil DW, Schreiber GJ, Peterse JL, Roberts C, Marton MJ et al: A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med 2002, 347(25):1999-2009.

32. Troyanskaya O, Cantor M, Sherlock G, Brown P, Hastie T, Tibshirani R, Botstein D, Altman RB: Missing value estimation methods for DNA microarrays. Bioinformatics 2001, 17(6):520-525.

33. Rhodes DR, Kalyana-Sundaram S, Mahavisno V, Varambally R, Yu J, Briggs BB, Barrette TR, Anstet MJ, Kincead-Beal C, Kulkarni P et al: Oncomine 3.0: genes, pathways, and networks in a collection of 18,000 cancer gene expression profiles. Neoplasia 2007, 9(2):166-180.


34. Johnson N, Speirs V, Curtin NJ, Hall AG: A comparative study of genome-wide SNP, CGH microarray and protein expression analysis to explore genotypic and phenotypic mechanisms of acquired antiestrogen resistance in breast cancer. Breast Cancer Res Treat 2008, 111(1):55-63.

35. Jee CD, Lee HS, Bae SI, Yang HK, Lee YM, Rho MS, Kim WH: Loss of caspase-1 gene expression in human gastric carcinomas and cell lines. Int J Oncol 2005, 26(5):1265-1271.

36. Ueki T, Takeuchi T, Nishimatsu H, Kajiwara T, Moriyama N, Narita Y, Kawabe K, Ueki K, Kitamura T: Silencing of the caspase-1 gene occurs in murine and human renal cancer cells and causes solid tumor growth in vivo. Int J Cancer 2001, 91(5):673-679.

37. McLendon R, Friedman A, Bigner D, Van Meir EG, Brat DJ, Mastrogianakis M, Olson JJ, Mikkelsen T, Lehman N, Aldape K et al: Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 2008.

38. Parsons DW, Jones S, Zhang X, Lin JC, Leary RJ, Angenendt P, Mankoo P, Carter H, Siu IM, Gallia GL et al: An integrated genomic analysis of human glioblastoma multiforme. Science 2008, 321(5897):1807-1812.

39. Chang JT, Nevins JR: GATHER: a systems approach to interpreting genomic signatures. Bioinformatics 2006, 22(23):2926-2933.

40. Bachman KE, Argani P, Samuels Y, Silliman N, Ptak J, Szabo S, Konishi H, Karakas B, Blair BG, Lin C et al: The PIK3CA gene is mutated with high frequency in human breast cancers. Cancer Biol Ther 2004, 3(8):772-775.

41. Slamon DJ, Godolphin W, Jones LA, Holt JA, Wong SG, Keith DE, Levin WJ, Stuart SG, Udove J, Ullrich A et al: Studies of the HER-2/neu proto-oncogene in human breast and ovarian cancer. Science 1989, 244(4905):707-712.

42. Stein D, Wu J, Fuqua SA, Roonprapunt C, Yajnik V, D'Eustachio P, Moskow JJ, Buchberg AM, Osborne CK, Margolis B: The SH2 domain protein GRB-7 is coamplified, overexpressed and in a tight complex with HER2 in breast cancer. Embo J 1994, 13(6):1331-1340.

43. Lockwood WW, Chari R, Coe BP, Girard L, Macaulay C, Lam S, Gazdar AF, Minna JD, Lam WL: DNA amplification is a ubiquitous mechanism of oncogene activation in lung and other cancers. Oncogene 2008, 27(33):4615-4624.

44. Stemke-Hale K, Gonzalez-Angulo AM, Lluch A, Neve RM, Kuo WL, Davies M, Carey M, Hu Z, Guan Y, Sahin A et al: An integrative genomic and proteomic analysis of PIK3CA, PTEN, and AKT mutations in breast cancer. Cancer Res 2008, 68(15):6084-6091.

45. Wang E, Lenferink A, O'Connor-McCourt M: Cancer systems biology: exploring cancer-associated genes on cellular networks. Cell Mol Life Sci 2007, 64(14):1752-1762.

46. Bova GS, Carter BS, Bussemakers MJ, Emi M, Fujiwara Y, Kyprianou N, Jacobs SC, Robinson JC, Epstein JI, Walsh PC et al: Homozygous deletion and frequent allelic loss of chromosome 8p22 loci in human prostate cancer. Cancer Res 1993, 53(17):3869-3873.

47. Chinen K, Isomura M, Izawa K, Fujiwara Y, Ohata H, Iwamasa T, Nakamura Y: Isolation of 45 exon-like fragments from 8p22-->p21.3, a region that is commonly deleted in hepatocellular, colorectal, and non-small cell lung carcinomas. Cytogenet Cell Genet 1996, 75(2-3):190-196.

48. Cooke SL, Pole JC, Chin SF, Ellis IO, Caldas C, Edwards PA: High-resolution array CGH clarifies events occurring on 8p in carcinogenesis. BMC Cancer 2008, 8(1):288.

49. Yaremko ML, Recant WM, Westbrook CA: Loss of heterozygosity from the short arm of chromosome 8 is an early event in breast cancers. Genes Chromosomes Cancer 1995, 13(3):186-191.


50. Giovane A, Pintzas A, Maira SM, Sobieszczuk P, Wasylyk B: Net, a new ets transcription factor that is activated by Ras. Genes Dev 1994, 8(13):1502-1513.

51. He J, Pan Y, Hu J, Albarracin C, Wu Y, Dai JL: Profile of Ets gene expression in human breast carcinoma. Cancer Biol Ther 2007, 6(1):76-82.

52. Shames DS, Girard L, Gao B, Sato M, Lewis CM, Shivapurkar N, Jiang A, Perou CM, Kim YH, Pollack JR et al: A genome-wide screen for promoter methylation in lung cancer identifies novel methylation markers for multiple malignancies. PLoS Med 2006, 3(12):e486.

53. Pils D, Horak P, Gleiss A, Sax C, Fabjani G, Moebus VJ, Zielinski C, Reinthaller A, Zeillinger R, Krainer M: Five genes from chromosomal band 8p22 are significantly down-regulated in ovarian carcinoma: N33 and EFA6R have a potential impact on overall survival. Cancer 2005, 104(11):2417-2429.

54. Cheang MC, Voduc D, Bajdik C, Leung S, McKinney S, Chia SK, Perou CM, Nielsen TO: Basal-like breast cancer defined by five biomarkers has superior prognostic value than triple-negative phenotype. Clin Cancer Res 2008, 14(5):1368-1376.

55. Gluz O, Liedtke C, Gottschalk N, Pusztai L, Nitz U, Harbeck N: Triple-negative breast cancer--current status and future directions. Ann Oncol 2009, 20(12):1913-1927.

56. Rakha EA, El-Sayed ME, Green AR, Lee AH, Robertson JF, Ellis IO: Prognostic markers in triple-negative breast cancer. Cancer 2007, 109(1):25-32.

57. Turner N, Lambros MB, Horlings HM, Pearson A, Sharpe R, Natrajan R, Geyer FC, van Kouwenhove M, Kreike B, Mackay A et al: Integrative molecular profiling of triple negative breast cancers identifies amplicon drivers and potential therapeutic targets. Oncogene 2010.

58. Andre F, Job B, Dessen P, Tordai A, Michiels S, Liedtke C, Richon C, Yan K, Wang B, Vassal G et al: Molecular characterization of breast cancer with high-resolution oligonucleotide comparative genomic hybridization array. Clin Cancer Res 2009, 15(2):441-451.

59. Bertucci F, Finetti P, Cervera N, Esterni B, Hermitte F, Viens P, Birnbaum D: How basal are triple-negative breast cancers? Int J Cancer 2008, 123(1):236-240.

60. Han W, Jung EM, Cho J, Lee JW, Hwang KT, Yang SJ, Kang JJ, Bae JY, Jeon YK, Park IA et al: DNA copy number alterations and expression of relevant genes in triple-negative breast cancer. Genes Chromosomes Cancer 2008, 47(6):490-499.

61. Kreike B, van Kouwenhove M, Horlings H, Weigelt B, Peterse H, Bartelink H, van de Vijver MJ: Gene expression profiling and histopathological characterization of triple-negative/basal-like breast carcinomas. Breast Cancer Res 2007, 9(5):R65.

62. Ramaswamy S, Ross KN, Lander ES, Golub TR: A molecular signature of metastasis in primary solid tumors. Nat Genet 2003, 33(1):49-54.

63. Wang Y, Klijn JG, Zhang Y, Sieuwerts AM, Look MP, Yang F, Talantov D, Timmermans M, Meijer-van Gelder ME, Yu J et al: Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet 2005, 365(9460):671-679.

64. Richardson AL, Wang ZC, De Nicolo A, Lu X, Brown M, Miron A, Liao X, Iglehart JD, Livingston DM, Ganesan S: X chromosomal abnormalities in basal-like human breast cancer. Cancer Cell 2006, 9(2):121-132.

65. Radvanyi L, Singh-Sandhu D, Gallichan S, Lovitt C, Pedyczak A, Mallo G, Gish K, Kwok K, Hanna W, Zubovits J et al: The gene associated with trichorhinophalangeal syndrome in humans is overexpressed in breast cancer. Proc Natl Acad Sci U S A 2005, 102(31):11005-11010.

66. Finak G, Bertos N, Pepin F, Sadekova S, Souleimanova M, Zhao H, Chen H, Omeroglu G, Meterissian S, Omeroglu A et al: Stromal gene expression predicts clinical outcome in breast cancer. Nat Med 2008, 14(5):518-527.

67. Karnoub AE, Dash AB, Vo AP, Sullivan A, Brooks MW, Bell GW, Richardson AL, Polyak K, Tubo R, Weinberg RA: Mesenchymal stem cells within tumour stroma promote breast cancer metastasis. Nature 2007, 449(7162):557-563.


Chapter 4: Uniparental disomy is a prevalent genetic mechanism of oncogene disruption in lung adenocarcinoma 3

3 A version of this chapter will be submitted for publication with the following author list: Chari R, Lockwood WW, Soh J, Coe BP, Tam K, MacAulay CE, Minna JD, Lam S, Gazdar AF, Lam WL. (2010) Uniparental disomy is a prevalent genetic mechanism of oncogene disruption in lung adenocarcinoma.



4.1 Introduction<br />

Genetic alterations play a significant role in a variety of malignancies [1, 2]. Typically, these alterations have been represented by either changes in gene dosage (DNA copy number) or somatic mutations, such as total copy number gain or activating mutations of oncogenes and total copy number loss or inactivating mutations of tumor suppressor genes. Loss of heterozygosity is also a common alteration whereby one allele is lost, often resulting in a loss of total copy number. However, there are instances in which one allele is lost but the remaining allele is duplicated, resulting in no net change in copy number; this is termed copy neutral loss of heterozygosity or somatic uniparental disomy (UPD).

Although somatic UPD had been shown previously in malignancies such as retinoblastoma [3], recent studies have shown an increased prominence of this alteration [4]. This has largely been a result of advances in the technology used to detect somatic UPD and in the methodologies used to define it [5, 6]. Moreover, frequent regions of somatic UPD have been identified in many different cancer types such as colorectal cancer [7, 8], lymphoma [9, 10], myelodysplastic syndrome (MDS) [11-13], basal cell carcinoma [14], hepatoblastoma [15], and ovarian cancer [16]. In addition, while the target genes of some of these regions have been tumor suppressors such as RB1 and TP53, where the gene is likely mutated, the targets have also included oncogenes. For example, mutation with somatic UPD has been observed at loci such as JAK2 [6, 17], CBL [12, 18], and FLT3 [19] in hematological malignancies. However, such associations have been limited in epithelial malignancies.

Recently, we illustrated the concept of mutant allele specific imbalance (MASI) in lung cancer [20]. It was found that a highly activated state for EGFR and KRAS is achieved through either copy number amplification of the mutated allele, for EGFR, or UPD of the mutated allele, for KRAS. Given the observed frequency of UPD at KRAS, we sought to assess the impact and prevalence of UPD in the lung adenocarcinoma genome. Strikingly, we found that the amount of the genome frequently affected by UPD was comparable to that of copy number gain and loss. When examining major oncogenes and tumor suppressor genes, while most oncogenes were associated with frequent areas of gain, we found a subset of both known and novel oncogenes that were frequently affected by UPD. Finally, examining oncogenes with homozygous mutation in multiple cancer types, we observed frequent UPD at these genes, suggesting that this mechanism of oncogene activation is prevalent across multiple cancer types.

4.2 Methods<br />

4.2.1 Genome wide profiling of clinical lung adenocarcinoma specimens

Forty-six lung adenocarcinoma cases were obtained from Vancouver General Hospital under approved ethics. Cases were reviewed by a pathologist and tumors were microdissected to ensure maximal tumor cell content (≥ 70%). For each tumor and its adjacent non-malignant tissue, five hundred nanograms of extracted genomic DNA were prepared and hybridized to the Affymetrix Genome-Wide Human SNP 6.0 array platform as per the manufacturer's instructions. CEL files, the raw data files generated, were then processed using the Affymetrix Genotyping Console version 3.0.2 to generate .chp files using the Birdseed v2 genotyping algorithm.

4.2.2 Determination of regions of uniparental disomy (UPD) in clinical lung tumors

CEL files and .chp files were imported into Partek Genomics Suite (PGS) using the software's recommended default settings. First, to determine total copy number, paired copy number intensities were calculated for each sample using the intensity in the tumor vs. its matched non-malignant sample. Paired copy number intensities were then analyzed using the Genomic Segmentation method in PGS with all parameters run at default except for the number of markers, which was set to 50. Subsequently, allele specific copy number (ASCN) analysis was used to determine regions of allelic imbalance. A region was deemed imbalanced if the imbalance proportion was ≥ 0.15 (as recommended by PGS). Finally, a region was called UPD if the region was imbalanced and no change in total copy number was present.
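The final calling rule can be written out as a short Python sketch. This is not the Partek Genomics Suite implementation; the `Region` record, its field names, and the example coordinates are hypothetical, and the segmentation and allele specific copy number steps are assumed to have already produced a copy number state and an imbalance proportion for each region.

```python
from dataclasses import dataclass

IMBALANCE_CUTOFF = 0.15  # imbalance proportion threshold quoted in the text


@dataclass
class Region:
    chrom: str
    start: int
    end: int
    imbalance_proportion: float  # assumed output of the ASCN analysis
    total_cn_state: str          # "gain", "loss", or "neutral" from segmentation


def classify(region: Region) -> str:
    """Label a region as gain, loss, UPD, or unchanged."""
    imbalanced = region.imbalance_proportion >= IMBALANCE_CUTOFF
    if region.total_cn_state == "neutral":
        # Allelic imbalance without a change in total copy number is called UPD.
        return "UPD" if imbalanced else "unchanged"
    return region.total_cn_state


# Hypothetical regions, for illustration only.
upd_like = Region("chr12", 25_000_000, 26_000_000, 0.30, "neutral")
balanced = Region("chr12", 30_000_000, 31_000_000, 0.05, "neutral")
print(classify(upd_like), classify(balanced))
```

Regions with a change in total copy number keep their gain or loss label regardless of imbalance, which matches the requirement that UPD be copy neutral.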



4.2.3 Determining frequent regions <strong>of</strong> UPD, gain and loss<br />

To determine frequent regions <strong>of</strong> gain, loss, and UPD, the frequency <strong>of</strong> each alteration was<br />

determined for each SNP probe on the somatic chromosomes. A frequency threshold <strong>of</strong> 40%<br />

was used. To smooth out regions <strong>of</strong> UPD (and for gain and loss as well), a three step process<br />

was performed. First, adjacent probes with frequencies greater than the threshold were merged<br />

together. Second, to account for dips in frequency that split one region into two, the two regions<br />

were merged if the dip was less than 1 Mb in size. Finally, smoothed regions of UPD, gain, or<br />

loss that contained fewer than 100 probes were removed.<br />
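The three smoothing steps can be sketched as follows. This is an illustrative reading of the procedure rather than the code used in the study: the function name `frequent_regions`, its parameters, and the choice to measure a dip as the distance between the flanking probes are assumptions, and the input is assumed to be sorted probe positions on a single chromosome with one alteration frequency per probe.<br />

```python
from typing import List, Tuple

Region = Tuple[int, int, int]  # (start_bp, end_bp, number_of_probes)


def frequent_regions(
    positions: List[int],
    frequencies: List[float],
    freq_threshold: float = 0.40,   # 40% frequency threshold from the text
    max_gap_bp: int = 1_000_000,    # dips shorter than 1 Mb are bridged
    min_probes: int = 100,          # regions with fewer probes are dropped
) -> List[Region]:
    if len(positions) != len(frequencies):
        raise ValueError("positions and frequencies must have equal length")

    # Step 1: merge adjacent probes that meet the threshold into runs.
    # (Assumption: ">=" is used; the text also says "greater than".)
    runs = []  # (first_index, last_index)
    run_start = None
    for i, freq in enumerate(frequencies):
        if freq >= freq_threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            runs.append((run_start, i - 1))
            run_start = None
    if run_start is not None:
        runs.append((run_start, len(frequencies) - 1))

    # Step 2: merge runs separated by a dip of less than max_gap_bp.
    merged = []
    for first, last in runs:
        if merged and positions[first] - positions[merged[-1][1]] < max_gap_bp:
            merged[-1] = (merged[-1][0], last)
        else:
            merged.append((first, last))

    # Step 3: drop smoothed regions with fewer than min_probes probes.
    # (Assumption: probes inside a bridged dip count toward the total.)
    return [
        (positions[a], positions[b], b - a + 1)
        for a, b in merged
        if b - a + 1 >= min_probes
    ]


# Toy data: 400 probes spaced 10 kb apart on one chromosome.
positions = [i * 10_000 for i in range(400)]
frequencies = [0.5] * 120 + [0.1] * 10 + [0.5] * 120 + [0.1] * 130 + [0.5] * 20
regions = frequent_regions(positions, frequencies)
print(regions)
```

In the toy data the first two runs are separated by a short dip and are merged into one region of 250 probes, while the last run lies more than 1 Mb away, contains only 20 probes, and is removed.<br />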

4.2.4 Determination <strong>of</strong> UPD in cancer cell lines<br />

Raw SNP 6.0 data (.CEL files) from cancer cell lines were obtained from the Wellcome Trust<br />

Sanger CGP database. CEL files were then genotyped as described above to generate CHP<br />

files. To define a total copy number and allele specific copy number reference, SNP 6.0 data,<br />

generated from 72 CEPH HapMap samples, were obtained from Affymetrix and were also<br />

genotyped. Unpaired copy number and allele specific copy number analyses were performed,<br />

as described above, to determine regions <strong>of</strong> allelic imbalance without a change in total copy<br />

number using Partek Genomics Suite.<br />

4.2.5 Expression analysis <strong>of</strong> genes in focal regions <strong>of</strong> UPD<br />

For 16 <strong>of</strong> the 46 tumor/non-malignant tissue pairs, gene expression pr<strong>of</strong>iles on a custom<br />

Affymetrix chip were generated. The 32 samples were normalized using the RMA algorithm<br />

[21] in the Bioconductor s<strong>of</strong>tware suite in R [22]. To determine overexpression in a given<br />

sample pair for a given gene, since expression values are in log2 space, expression values in<br />

non-malignant samples were subtracted from expression values in the tumor. A two-fold<br />

expression change was deemed significant for this analysis.<br />
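Because the values are in log2 space, a two-fold change corresponds to a difference of 1 between the tumor and its matched non-malignant sample. The sketch below illustrates this for one gene; the function names and the toy expression values are hypothetical, and treating "at least two-fold" (rather than "more than two-fold") as the cutoff is an assumption.<br />

```python
import math
from typing import List

FOLD_CHANGE = 2.0
LOG2_CUTOFF = math.log2(FOLD_CHANGE)  # a two-fold change equals 1.0 in log2 space


def log2_differences(tumor: List[float], normal: List[float]) -> List[float]:
    """Tumor minus matched non-malignant log2 expression, pair by pair."""
    if len(tumor) != len(normal):
        raise ValueError("tumor and normal must be paired")
    return [t - n for t, n in zip(tumor, normal)]


def overexpression_count(tumor: List[float], normal: List[float]) -> int:
    """Number of pairs with at least a two-fold increase in the tumor."""
    return sum(d >= LOG2_CUTOFF for d in log2_differences(tumor, normal))


# Toy log2 (RMA-like) values for one gene in four pairs, for illustration only.
tumor = [9.2, 8.0, 7.5, 10.1]
normal = [7.9, 7.6, 7.5, 9.0]
n_over = overexpression_count(tumor, normal)
print(n_over, "of", len(tumor), "pairs overexpressed")
```

The same count divided by the number of pairs gives the frequency of overexpression reported for each gene.<br />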



4.3 Results<br />

4.3.1 Detection <strong>of</strong> UPD using allele specific copy number analysis<br />

To determine regions <strong>of</strong> UPD, we first determined regions <strong>of</strong> allelic imbalance using an allele<br />

specific copy number based approach. This approach has been shown to identify more regions<br />

<strong>of</strong> imbalance than previous call-based approaches [6, 12]. In the first example, where no UPD<br />

is present, we observe a chromosome exhibiting no change in total copy number as compared<br />

to its matched control and also no imbalance between the alleles, which would appear as a shift between<br />

the blue and red data points (Figure 4.1a). However, in the next two samples, we do observe<br />

large shifts between the blue and red data points. Specifically, one example illustrates a region<br />

<strong>of</strong> UPD with a region <strong>of</strong> gain on chromosome arm 12q (Figure 4.1b) and another example<br />

illustrates a whole chromosome UPD event on chromosome 14 (Figure 4.1c). The blue data<br />

points in the UPD regions are not completely at zero but slightly above due to cells that do not<br />

carry the UPD alteration.<br />

4.3.2 UPD is prevalent and non-random in the lung cancer genome with comparable<br />

frequencies to gain and loss<br />

Having shown above that UPD can be detected, and having identified UPD at the KRAS<br />

oncogene in a previous study, we then assessed the prevalence of UPD in the genome.<br />

Using a 40% frequency threshold, we determined the regions of the genome affected by UPD at<br />

this frequency. In total, 153 regions were identified (Table 4.1). Moreover, when examining<br />

areas <strong>of</strong> frequent gain and loss (at similar frequency thresholds), we observe that the amount <strong>of</strong><br />

the genome affected by frequent UPD is comparable to that <strong>of</strong> frequent gain and loss (Figure<br />

4.2). While there was some overlap with the regions <strong>of</strong> loss and UPD, there was very little<br />

overlap between gain and UPD, even though we would expect some level <strong>of</strong> overlap by random<br />

chance. Using megabases of the genome as a metric, we observe approximately 650 Mb affected by gain,<br />

500 Mb by loss and 400 Mb by UPD, with 7 Mb of overlap between gain and UPD and 58 Mb of overlap<br />

between loss and UPD (Figure 4.3). Strikingly, the three alterations together cover over 49% of the<br />

genome. It should also be noted that the observation <strong>of</strong> comparable levels <strong>of</strong> gain, loss and<br />

UPD at the frequency level is also seen when examining samples on an individual basis (Figure<br />

4.4).<br />
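The rounded totals quoted above follow from the megabase values shown in the Venn diagram of Figure 4.3. The short calculation below reproduces them; the variable names are illustrative, and it assumes, as the diagram suggests, that frequent gain and frequent loss do not overlap each other.<br />

```python
# Megabase values as shown in Figure 4.3.
gain_only = 642
loss_only = 441
upd_only = 335
gain_and_upd = 7
loss_and_upd = 58

# Per-alteration totals (assumption: no overlap between gain and loss).
gain_total = gain_only + gain_and_upd              # reported as about 650 Mb
loss_total = loss_only + loss_and_upd              # reported as about 500 Mb
upd_total = upd_only + gain_and_upd + loss_and_upd  # reported as 400 Mb

# Genome covered by at least one of the three frequent alterations.
union_mb = gain_only + loss_only + upd_only + gain_and_upd + loss_and_upd

print(gain_total, loss_total, upd_total, union_mb)
```

The union, rather than the sum of the three totals, is the quantity that corresponds to the fraction of the genome covered, because the overlapping segments are counted only once.<br />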

4.3.3 Overlap <strong>of</strong> major oncogenes and tumor suppressor genes in regions <strong>of</strong> gain, loss,<br />

and UPD<br />

We then assessed how major oncogenes and tumor suppressor genes associated with the<br />

three levels <strong>of</strong> genetic alteration. Using a list <strong>of</strong> 112 genes derived from a number <strong>of</strong> sources<br />

[23, 24] (Table 4.2), we found 52 <strong>of</strong> these genes to overlap with at least one <strong>of</strong> frequent gain,<br />

loss, or UPD. Major oncogenes such as EGFR, MYC, AKT1, MDM2, and ERBB2 are affected<br />

frequently by copy number gain, which has been shown previously [25-29] (Table 4.3).<br />

Similarly, major tumor suppressor genes such as FHIT, RARB and CDKN2A are affected by<br />

frequent copy number loss (Table 4.4). Interestingly, while expected tumor suppressor genes<br />

such as BRCA2 and RB1 are affected by UPD, a subset of seven oncogenes was also affected by<br />

UPD. These included KRAS (as shown previously), PIK3CA, BCL6 and<br />

FLT3. Moreover, examining KRAS (Figure 4.5a) and RB1 (Figure 4.5b) specifically, we see<br />

that the UPD events are <strong>of</strong> different sizes between different samples.<br />

4.3.4 UPD is prevalent at oncogenes across multiple cancer types<br />

We observed frequent UPD at oncogenes in lung cancer. We sought to assess the prevalence<br />

<strong>of</strong> UPD at oncogenes across multiple cancer types. For this analysis, we utilized SNP 6.0 array<br />

data for over 700 cancer cell lines from the Wellcome Trust Sanger database where somatic<br />

mutation data were also available. In total, 67 instances of homozygous mutation at 12<br />

oncogene loci were assessed (Table 4.5). It was found that while copy number gain was the<br />

most prevalent genetic alteration (34 of 67 instances, 51%), a substantial proportion of instances exhibited UPD (23 of 67, 34%; Figure<br />

4.6a, Table 4.6). For the genes with the most samples harbouring homozygous<br />

mutation, KRAS and BRAF, the distribution of alterations is consistent with the overall trend<br />

(Figure 4.6a). Examples of UPD at KRAS in NCI-H2030 and at BRAF in A427 are<br />

illustrated in Figure 4.6b.<br />
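The proportions of each alteration can be recomputed directly from the counts in Table 4.6. The snippet below is a small worked check; the dictionary layout and the helper name `percentages` are illustrative choices, and the counts are copied from the table.<br />

```python
# Counts from Table 4.6: gene -> (UPD, gain, loss, neutral).
table_4_6 = {
    "KRAS": (10, 18, 4, 1),
    "BRAF": (4, 6, 1, 0),
    "NRAS": (1, 3, 1, 0),
    "PIK3CA": (1, 2, 1, 0),
    "CTNNB1": (2, 0, 1, 0),
    "HRAS": (2, 1, 0, 0),
    "EGFR": (1, 1, 0, 0),
    "NOTCH1": (0, 1, 1, 0),
    "FGFR3": (0, 1, 0, 0),
    "JAK2": (0, 1, 0, 0),
    "ERBB2": (1, 0, 0, 0),
    "RUNX1": (1, 0, 0, 0),
}
LABELS = ("UPD", "gain", "loss", "neutral")


def percentages(counts):
    """Convert a tuple of counts into whole-number percentages."""
    total = sum(counts)
    return {label: round(100 * c / total) for label, c in zip(LABELS, counts)}


# Column totals across all genes.
totals = tuple(sum(row[i] for row in table_4_6.values()) for i in range(4))
overall = percentages(totals)
kras = percentages(table_4_6["KRAS"])
braf = percentages(table_4_6["BRAF"])
print(totals, overall)
print(kras, braf)
```

Run on these counts, gain and UPD are the two most common alterations overall and at both KRAS and BRAF, which is the pattern described in the text.<br />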

For this analysis, cancer cell lines were utilized as the samples represent a more homogeneous<br />

population <strong>of</strong> cells. In contrast, clinical tumors, even after microdissection, still may contain<br />

small amounts <strong>of</strong> contaminating normal cells. As such, determining if a mutation is<br />

homozygous in clinical lung tumors is challenging. With available KRAS mutation data, we<br />

assessed the frequency of gain, loss and UPD in KRAS mutant tumors and observed a<br />

distribution pattern similar to that seen in the cell lines (Figure 4.6c).<br />

4.3.5 Identification <strong>of</strong> novel candidate oncogenes using focal regions <strong>of</strong> UPD<br />

Selecting the more focal regions <strong>of</strong> UPD within the set <strong>of</strong> 153 regions, we identified 35 <strong>of</strong> the<br />

regions which contained three or fewer RefSeq annotated genes. In total, 64 RefSeq genes were<br />

identified across all 35 regions (Table 4.7) and amongst these genes was E2F3 (Figure 4.7a).<br />

Examining paired gene expression for a subset <strong>of</strong> the 46 tumor/normal pairs, it was found that<br />

10/16 pairs showed overexpression of E2F3 (Figure 4.7b). E2F3 has previously been shown to be<br />

overexpressed in lung cancer and also shown to have a role in other cancer types [30, 31].<br />
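The selection of focal regions can be sketched as an interval containment test. This is an illustration only: the function names, the toy coordinates and the placeholder gene symbols are hypothetical, genes are counted only when completely contained in a region (as described for the 6p22.3 region in Figure 4.7), and requiring at least one gene per focal region is an assumption.<br />

```python
from typing import Dict, List, Tuple

Region = Tuple[str, int, int]     # (chromosome, start_bp, end_bp)
Gene = Tuple[str, str, int, int]  # (symbol, chromosome, start_bp, end_bp)


def genes_within(region: Region, genes: List[Gene]) -> List[str]:
    """Symbols of genes completely contained in the region."""
    chrom, start, end = region
    return [
        symbol
        for symbol, g_chrom, g_start, g_end in genes
        if g_chrom == chrom and start <= g_start and g_end <= end
    ]


def focal_regions(
    regions: List[Region],
    genes: List[Gene],
    max_genes: int = 3,  # "three or fewer" genes, from the text
    min_genes: int = 1,  # assumption: gene-free regions are not of interest
) -> Dict[Region, List[str]]:
    focal = {}
    for region in regions:
        contained = genes_within(region, genes)
        if min_genes <= len(contained) <= max_genes:
            focal[region] = contained
    return focal


# Toy regions and placeholder genes (made-up coordinates).
regions = [("6", 1_000_000, 3_000_000), ("2", 5_000_000, 9_000_000)]
genes = [
    ("GENE_A", "6", 1_200_000, 1_250_000),
    ("GENE_B", "6", 2_000_000, 2_100_000),
    ("GENE_C", "6", 2_950_000, 3_050_000),  # extends past the region end
    ("GENE_D", "2", 5_100_000, 5_200_000),
    ("GENE_E", "2", 6_000_000, 6_100_000),
    ("GENE_F", "2", 7_000_000, 7_100_000),
    ("GENE_G", "2", 8_000_000, 8_100_000),
]
focal = focal_regions(regions, genes)
print(focal)
```

In the toy data only the chromosome 6 region qualifies: it fully contains two genes, whereas the chromosome 2 region contains four.<br />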

4.4 Discussion<br />

We have shown the unexpected and wide prevalence <strong>of</strong> UPD in the lung adenocarcinoma<br />

genome and have also observed a large number <strong>of</strong> both known and novel oncogenes harbored<br />

in these regions <strong>of</strong> frequent UPD. While there have been previous studies utilizing SNP arrays<br />

on lung adenocarcinoma tumors [26, 30], there are a number of reasons why these<br />

frequent regions were likely missed. First, the tumors utilized in this study were microdissected<br />

to ensure that a high proportion of tumor cells (≥ 70% was required) was analyzed. This is<br />

important as previous studies have shown the impact of tissue heterogeneity on the ability to<br />

detect alterations [32, 33]. Secondly, for every tumor used, matched non-malignant tissue was<br />



obtained, pr<strong>of</strong>iled and used as the control. While it has been shown that unmatched references<br />

can be used to detect UPD, the resulting UPD calls may not always be correct. Finally,<br />

the progression from call-based approaches to allele specific copy number-based approaches<br />

can also increase the detection <strong>of</strong> UPD [6, 12]. Taken together, these improvements could<br />

explain the observed results.<br />

While it is interesting to observe these frequent regions <strong>of</strong> UPD in the lung adenocarcinoma<br />

genome, the larger implications of these findings may not be readily apparent. In the case of<br />

somatically mutated oncogenes or tumor suppressor genes, the role of UPD is<br />

clear: UPD selects for the mutated allele, resulting in a homozygous mutation<br />

state. We have previously shown that mutant allele specific imbalance (MASI), either through<br />

allele specific amplification or UPD, is associated with a poorer prognosis [20]. To assess the<br />

prevalence <strong>of</strong> UPD at homozygously mutated oncogene sites, we analyzed cancer cell lines<br />

encompassing multiple cancer types for UPD at mutated oncogenes. While the most frequent<br />

genomic alteration observed is copy number gain, frequent UPD also occurs. The distribution<br />

<strong>of</strong> alterations observed across all genes is consistent with the most frequently mutated<br />

oncogenes, KRAS and BRAF. The result <strong>of</strong> these UPD events is preferential expression <strong>of</strong> the<br />

mutated allele.<br />

It should also be noted that with the amount <strong>of</strong> frequent UPD detected, there are regions likely<br />

selected for reasons other than somatic mutation. For example, as in the case of imprinted<br />

regions, there could be preferential selection of an unmethylated or methylated allele which,<br />

in turn, could regulate downstream gene expression. Previous studies have assessed the<br />

relationship between regions <strong>of</strong> UPD and DNA methylation patterns in cancer [8, 34, 35].<br />

Alternatively, in order to achieve downstream differential expression, in addition to preferential<br />

selection based on methylation, it has also been shown that for a given gene, transcription may<br />

involve only one <strong>of</strong> the alleles [36-39] and thus, selection may be based on transcriptional<br />

efficiency. Hence, it is important that the genetic data on UPD be integrated with methylation<br />



and gene expression data to refine these regions <strong>of</strong> UPD with many genes to a small number <strong>of</strong><br />

candidate oncogenes and tumor suppressor genes.<br />

Though many <strong>of</strong> the regions <strong>of</strong> UPD identified were large and encompassed a number <strong>of</strong><br />

genes, approximately one fifth of the regions identified (35 of 153) contained three or fewer genes. Focusing on<br />

these focal regions is one approach for narrowing down candidate gene targets. Using gene expression data on a<br />

subset <strong>of</strong> the pr<strong>of</strong>iled cases used for UPD, we assessed the gene expression pr<strong>of</strong>iles <strong>of</strong> the 64<br />

genes encompassed in the 35 focal regions. Of the 64 genes, 57 were represented on the<br />

gene expression microarray platform used. Fifteen <strong>of</strong> the 57 genes were overexpressed in at<br />

least 25% <strong>of</strong> the samples (4/16) (Table 4.8). In addition to E2F3, other genes within the set <strong>of</strong><br />

15 genes have been shown to have interesting biological functions. For example, GPR39 has been shown to<br />

activate EGFR signaling as well as protect cells from apoptosis [40, 41]; SLC7A11 has been<br />

shown to have a role in drug resistance [42] and was assessed as a therapeutic target for small<br />

cell lung cancer [43]; PDGFD has been implicated in many different cancer types [44]; and<br />

PRDM8, a histone methyltransferase, is a member <strong>of</strong> the PRDM transcription factor family and<br />

these factors have been implicated as proto-oncogenes [45].<br />

4.5 Conclusion<br />

In summary, we have shown an unexpectedly high prevalence <strong>of</strong> UPD in the lung<br />

adenocarcinoma genome, with the amount of the genome affected being comparable<br />

to that affected by copy number gain and loss. While a number of known oncogenes were shown to be in<br />

regions <strong>of</strong> frequent UPD, potentially novel lung oncogenes have also been shown to be affected<br />

by UPD with a consequent downstream change in gene expression. Further studies are needed<br />

to elucidate their roles in lung adenocarcinoma.<br />



Figure 4.1. Detection <strong>of</strong> UPD using allele specific copy number. Total copy number (top)<br />

and allele specific copy number (bottom) plots. In the allele specific copy number plot, the red<br />

data points represent the level of the major allele and the blue data points represent the level of<br />

the minor allele. The total copy number plot represents the sum of the allele specific copy<br />

numbers. (a) Sample with neutral copy number and no imbalance of chromosome 12. The<br />

total copy number is neutral and, when examining the allele specific copy number, no imbalance<br />

between the alleles is evident. (b) Sample with regions of copy number gain and UPD (in<br />

orange) on chromosome 12q. (c) Sample with whole chromosome UPD <strong>of</strong> chromosome 14.<br />



Figure 4.1 panels: (a) Chromosome 12; (b) Chromosome 12q; (c) Chromosome 14. Each panel shows total copy number (top) and allele specific copy number (bottom), with # of copies on the y-axis.<br />


Figure 4.2. Comparison <strong>of</strong> frequent regions <strong>of</strong> gain, loss and UPD in the lung<br />

adenocarcinoma genome. Frequent regions <strong>of</strong> gain (red), loss (green) and UPD (blue) in the<br />

lung adenocarcinoma genome. Only regions which were altered in at least 40% <strong>of</strong> the samples,<br />

by either gain, loss, or UPD, are shown. Frequent regions <strong>of</strong> gain (such as 5p, 7p, 8q, 17q and<br />

20q) and loss (such as 3p, 8p, 9p, 13q), which have previously been shown, are detected. The<br />

fourth column, composite ("C"), represents areas <strong>of</strong> overlap between gain and UPD (red) and<br />

loss and UPD (green).<br />



Figure 4.2 panels: one per chromosome (1 to 22), each with four columns: G (gain), L (loss), U (UPD) and C (composite).<br />


Figure 4.3. Venn diagram illustrating the amount of the genome covered by frequent<br />

gain, loss, and UPD. Numbers provided are in megabases (Mb) of genome sequence.<br />

Gain only: 642; loss only: 441; UPD only: 335; gain and UPD: 7; loss and UPD: 58.<br />


Figure 4.4. Genomic pr<strong>of</strong>ile <strong>of</strong> an individual lung adenocarcinoma sample. Regions <strong>of</strong><br />

gain (red), loss (green), and UPD (blue) are shown in this single lung adenocarcinoma pr<strong>of</strong>ile.<br />

Comparable amounts <strong>of</strong> the genome are affected by all three <strong>of</strong> these alterations.<br />

Figure 4.4 panels: one per chromosome (1 to 22); legend: gain, loss, UPD.<br />


Figure 4.5 panels: (a) KRAS UPD regions per sample on chr12 (p12.1-p11.21); (b) RB1 UPD regions per sample on chr13 (q13.3-q21.1).<br />

Figure 4.5. Examination of UPD events at the KRAS and RB1 loci. KRAS is shown in (a) and RB1 in (b). The<br />

region of UPD encompassing these loci varies in size between samples, with some samples showing larger regions of UPD<br />

than others. The existence of events of different sizes is likely a result of different underlying mechanisms of UPD.<br />


Figure 4.6 panels: (a) percent of cases with gain, loss, UPD or neutral copy number for all genes (n=67), KRAS (n=33) and BRAF (n=11); (b) allele specific and total copy number plots for A427 (chromosome 7, BRAF) and NCI-H2030 (chromosome 12, KRAS); (c) percent of cases with gain, UPD, loss or neutral copy number for KRAS (n=21).<br />

Figure 4.6. Relationship <strong>of</strong> homozygous mutation at oncogenes and genomic alteration.<br />

Using the Wellcome Trust Sanger COSMIC database for somatic mutation data and<br />

SNP 6.0 data available for over 700 cancer cell lines from their database, prevalence <strong>of</strong><br />

UPD was assessed in this dataset. Specifically, only those cell lines with oncogenes and<br />

homozygous mutation were analyzed. (a) In total, 67 instances <strong>of</strong> homozygous mutation at<br />

an oncogene locus were identified. While a large fraction of cases exhibited copy number<br />

increase (51%), the second most prominent alteration is UPD (34%). Of the 12 different<br />

genes assessed, KRAS and BRAF are the most frequently homozygously mutated oncogenes<br />

and those two genes show similar frequency distribution patterns <strong>of</strong> genomic alteration<br />

to the whole set. (b) An example <strong>of</strong> UPD at BRAF in A427 and KRAS in NCI-H2030<br />

where both BRAF and KRAS are homozygously mutated. (c) With available mutation data<br />

on KRAS from the 46 lung tumor/matched non-malignant tissue pairs, similar analysis was<br />

performed and it was found that the patterns <strong>of</strong> genomic alteration were similar to what was<br />

observed in cancer cell lines.<br />



Figure 4.7. Identification <strong>of</strong> E2F3 in a focal region <strong>of</strong> UPD. (a) One <strong>of</strong> the focal regions<br />

identified was located on chromosomal region 6p22.3. There were only three RefSeq<br />

annotated genes that were completely encompassed within this region: E2F3, ID4, and<br />

MBOAT1. The UCSC Genome Browser (genome build hg18) was used to identify genes and<br />

visualize the region [46]. (b) Analyzing gene expression amongst a subset of the tumors profiled on<br />

the SNP array, it was found that E2F3 was the most frequently overexpressed amongst the<br />

three genes assessed, with a frequency <strong>of</strong> overexpression <strong>of</strong> 62.5%.<br />



Figure 4.7 panels: (a) chromosome 6p ideogram, bands 6p25.3 to 6p11.1; (b) log2 fold change (y-axis) for samples 1 to 16 (x-axis).<br />


Table 4.1. Regions <strong>of</strong> the genome exhibiting frequent UPD<br />

Chr BPStart BPEnd # of markers Chr BPStart BPEnd # of markers<br />

1 57240523 57708781 261 6 8651004 9629747 293<br />

1 88627690 88937185 107 6 18605671 20757531 903<br />

1 213006484 213325225 153 6 71651506 72037387 152<br />

2 33915294 34575063 336 6 73271789 76714973 996<br />

2 102028604 102246459 130 6 82824129 84992961 566<br />

2 103342214 104703078 369 6 87778426 91268388 1147<br />

2 107288361 107909534 151 6 97800818 101044085 940<br />

2 113475801 113597836 110 6 105272060 114467061 2939<br />

2 123104889 123771933 262 6 116466834 119757979 966<br />

2 129285355 130086326 353 6 121042746 123214382 558<br />

2 132852444 133093232 131 6 125234328 126352419 406<br />

2 134976541 137102451 519 6 130012107 130399246 197<br />

2 138356995 142103903 1239 6 131442876 133188489 644<br />

2 148597921 149606831 175 6 134265257 144908460 3331<br />

2 150732270 153458836 918 6 147524934 170759956 9496<br />

2 154569594 157096948 576 7 109870908 110924988 272<br />

2 158283600 163854187 1560 7 119981096 121751585 455<br />

2 165114877 166441984 353 7 122940488 123963613 280<br />

2 167549158 172376141 1670 7 125810873 126573092 258<br />

2 173385819 175183422 649 8 82600012 84910522 562<br />

2 178506144 178971533 144 8 87475660 91661039 1094<br />

2 182593441 183843529 379 8 109473113 113912000 1026<br />

2 185976166 192057947 1482 9 32370194 32717136 127<br />

2 195998062 198251401 604 10 86487543 87543389 442<br />

2 214938398 217672315 1092 10 97401670 97992222 155<br />

2 222098215 223638933 514 10 99819620 100845692 389<br />

2 224915228 225683208 295 11 7088070 8626013 675<br />

2 234086493 235425331 673 11 9745400 16571856 2963<br />

3 38947562 40467146 489 11 17785506 20708448 1441<br />

3 75597086 77391013 520 11 22527476 27641093 1997<br />

3 120734475 121565472 248 11 31290899 36427391 2111<br />

3 126442524 128090913 466 11 37624038 46083889 3066<br />

3 131310768 131908449 194 11 78488179 78811363 210<br />

3 133156739 140816203 2231 11 81391754 83571743 737<br />

3 141841646 145572883 1095 11 85133890 86695809 613<br />

3 148635117 153815363 1592 11 92858003 94403193 618<br />

3 154850963 162426559 2109 11 99344667 102033351 1020<br />

3 163593089 164340331 178 11 103294671 104342367 382<br />

3 171033453 175715353 1551 11 106728968 107654049 275<br />

3 179650656 194058084 4585 11 111090113 111676300 118<br />


4 56867092 57264514 111 11 114516077 115632014 460<br />

4 59586524 62756073 878 11 121222174 122326244 431<br />

4 68014200 73981908 1418 11 127365170 128934004 622<br />

4 75048874 79252113 1473 11 131240209 132411274 555<br />

4 81069345 81632642 125 12 4901875 5859088 616<br />

4 83519602 86772381 1036 12 7251496 15972576 2966<br />

4 95266916 96342489 299 12 19068978 20077939 414<br />

4 99713479 100262528 179 12 21583225 28079788 2489<br />

4 102215835 109269168 1774 12 29252288 31395946 1005<br />

4 110431176 111696150 384 12 36144018 38218718 356<br />

4 113392931 114074009 165 12 43846459 44709526 206<br />

4 119368629 120609480 324 12 46714417 47193357 116<br />

4 122057241 123147706 383 12 49541041 50471244 235<br />

4 128685983 130719464 455 12 53439918 53858695 150<br />

4 138792901 139993006 454 12 74639231 75745944 244<br />

5 52835375 56296363 1274 12 92376456 96256397 1605<br />

5 59963217 62083765 600 12 97428291 100976394 1075<br />

5 64017546 65834248 562 12 102585961 120121516 6328<br />

5 70702961 72963048 707 12 124477764 127353782 1525<br />

5 74029627 81781885 2453 12 128730700 129400678 383<br />

5 86329552 88138619 328 13 17943628 32731810 6266<br />

5 90373713 90752434 102 13 35198318 36512766 543<br />

5 95109222 96467147 461 13 39604817 41933751 808<br />

5 98108008 98876920 190 13 43687180 52149041 2627<br />

5 106982444 108758672 584 13 80666670 81857944 333<br />

5 113412673 114060668 291 13 97080915 99917507 1076<br />

5 115177955 116152174 521 14 26463057 27074470 206<br />

5 118498618 118830120 101 14 39744666 40655458 316<br />

5 123995175 124348209 112 15 23337531 24061503 396<br />

5 130208827 131907815 390 17 1646832 4192054 695<br />

5 139359170 140952015 262 17 5443385 6771106 606<br />

5 145140525 146950721 652 17 8395366 9333459 369<br />

5 148164516 149400891 515 17 10552154 14949757 1968<br />

5 153600774 154479059 292 20 8028477 9014932 524<br />

5 156203870 159492184 1245 20 53205763 53952115 387<br />

5 162483304 163524481 416<br />

5 165711472 166223866 189<br />

5 180068185 180629495 176<br />



Table 4.2. List <strong>of</strong> major oncogenes and tumor suppressor genes assessed<br />

Gene Chr Gene Chr Gene Chr<br />

ABL1 9 EVI1 3 NF2 22<br />

ABL2 1 FBXW7 4 NKX2-1 14<br />

AKT1 14 FEV 2 NOTCH1 9<br />

AKT2 19 FGFR1 8 NRAS 1<br />

ALK 2 FGFR2 10 NTRK1 1<br />

APC 5 FGFR3 4 NTRK3 15<br />

ATM 11 FH 1 PDGFB 22<br />

BCL2 18 FHIT 3 PDGFRA 4<br />

BCL3 19 FLT3 13 PDGFRB 5<br />

BCL6 3 FOXO1A 13 PHOX2B 4<br />

BMPR1A 10 FOXO3A 6 PIK3CA 3<br />

BRAF 7 FOXP1 3 PIK3R1 5<br />

BRCA1 17 GNAS 20 PIM1 6<br />

BRCA2 13 GSTP1 11 PRKAR1A 17<br />

BUB1B 15 HRAS 11 PTCH 9<br />

CAV1 7 HRPT2 1 PTEN 10<br />

CBL 11 ITK 5 PTPN11 12<br />

CCND1 11 JAK2 9 RARB 3<br />

CCND2 12 JAK3 19 RASSF1A 3<br />

CCND3 6 KIT 4 RB1 13<br />

CD44 11 KRAS 12 REL 2<br />

CDH1 16 LCK 1 RET 10<br />

CDH11 16 MAF 16 RUNX1 21<br />

CDH13 16 MAFB 20 SEMA3B 3<br />

CDK4 12 MAML2 11 SMO 7<br />

CDK6 7 MAP2K4 17 STK11 19<br />

CDKN2A 9 MDM2 12 SUFU 10<br />

CEBPA 11 MEN1 11 SYK 9<br />

CHEK2 22 MET 7 TCF1 12<br />

CRK 17 MLH1 3 TIMP3 22<br />

CTNNB1 3 MLL 11 TP53 17<br />

CYLD 16 MPL 1 TSC1 9<br />

DAPK1 9 MSH2 2 TSC2 16<br />

EGFR 7 MSH6 2 TSHR 14<br />

ERBB2 17 MYC 8 VHL 3<br />

ERCC2 19 MYCL1 1 WT1 11<br />

ERG 21 MYCN 2<br />

ETV6 12 NF1 17<br />



Table 4.3. Overlap <strong>of</strong> oncogenes in frequent regions <strong>of</strong> genomic alteration<br />

Gene Symbol Location Gain Loss UPD<br />

ABL1 9q34.1 X<br />

ABL2 1q24-q25 X<br />

AKT1 14q32.32 X<br />

AKT2 19q13.1-q13.2 X<br />

BCL6 3q27 X<br />

CCND1 11q13 X<br />

CCND3 6p21 X<br />

CD44 11p13 X<br />

CDK4 12q14 X<br />

CEBPA 19q13.11 X<br />

CRK 17p13.3 X<br />

EGFR 7p12.3-p12.1 X<br />

ERBB2 17q21.1 X<br />

ETV6 12p13 X<br />

FEV 2q36 X<br />

FGFR3 4p16.3 X<br />

FLT3 13q12 X<br />

GNAS 20q13.2 X<br />

HRAS 11p15.5 X<br />

ITK 5q31-q32 X<br />

KRAS 12p12.1 X<br />

LCK 1p35-p34.3 X<br />

MAFB 20q11.2-q13.1 X<br />

MDM2 12q15 X<br />

MEN1 11q13 X<br />

MPL 1p34 X<br />

MYC 8q24.12-q24.13 X<br />

MYCL1 1p34.3 X<br />

NOTCH1 9q34.3 X<br />

NTRK1 1q21-q22 X<br />

PDGFB 22q12.3-q13.1 X<br />

PDGFRB 5q31-q32 X<br />

PIK3CA 3q26.3 X<br />

PIM1 6p21.2 X<br />

PRKAR1A 17q23-q24 X<br />

SMO 7q31-q32 X<br />



Table 4.4. Overlap <strong>of</strong> tumor suppressor genes in frequent regions <strong>of</strong> genomic alteration<br />

Gene Symbol Location Gain Loss UPD<br />

BRCA1 17q21 X<br />

BRCA2 13q12 X X<br />

CDH1 16q22.1 X<br />

CDKN2A 9p21 X<br />

CYLD 16q12-q13 X<br />

FH 1q42.1 X<br />

FHIT 3p14.2 X<br />

GSTP1 X<br />

MAP2K4 17p11.2 X X<br />

NF1 17q12 X<br />

PTPN11 12q24.1 X<br />

RARB 3p24.2 X<br />

RB1 13q14 X X<br />

TSC1 9q34 X<br />

TSC2 16p13.3 X<br />

WT1 11p13 X<br />



Table 4.5. Cell lines and oncogene loci with homozygous mutation<br />

Sample Primary Tissue Gene Sample Primary Tissue Gene<br />

EFM-19 breast PIK3CA NCI-H460 lung KRAS<br />

NCI-ADR-RES breast ERBB2 NCI-H727 lung KRAS<br />

OCUB-M breast PIK3CA PC-14 lung EGFR<br />

AM-38 central nervous system BRAF SHP-77 lung KRAS<br />

OMC-1 cervix PIK3CA SW1573 lung KRAS<br />

HEC-1 endometrium KRAS KYSE-450 oesophagus NOTCH1<br />

ECC4 gastrointestinal tract KRAS OVCAR-5 ovary KRAS<br />

BE-13 haematopoietic and lymphoid tissue NOTCH1 AsPC-1 pancreas KRAS<br />

HEL haematopoietic and lymphoid tissue JAK2 CAPAN-1 pancreas KRAS<br />

OPM-2 haematopoietic and lymphoid tissue FGFR3 HuP-T4 pancreas KRAS<br />

LS-174T large intestine CTNNB1 MIA-PaCa-2 pancreas KRAS<br />

LS-411N large intestine BRAF PANC-08-13 pancreas KRAS<br />

RCM-1 large intestine KRAS SW1990 pancreas KRAS<br />

SK-CO-1 large intestine KRAS YAPC pancreas KRAS<br />

SNU-C2B large intestine KRAS A375 skin BRAF<br />

SW1463 large intestine KRAS COLO-679 skin BRAF<br />

SW403 large intestine KRAS CP66-MEL skin NRAS<br />

SW620 large intestine KRAS GAK skin NRAS<br />

A427 lung CTNNB1 HT-144 skin BRAF<br />

A549 lung KRAS MEL-HO skin BRAF<br />

COLO-668 lung KRAS MEL-JUSO skin HRAS<br />

COR-L23 lung KRAS SH-4 skin BRAF<br />

COR-L23 lung RUNX1 SK-MEL-2 skin NRAS<br />

IA-LM lung KRAS SK-MEL-28 skin BRAF<br />

LCLC-97TM1 lung KRAS SK-MEL-28 skin EGFR<br />

LU-65 lung KRAS UACC-62 skin BRAF<br />

NCI-H1092 lung CTNNB1 RD soft tissue NRAS<br />

NCI-H1155 lung KRAS BCPAP thyroid BRAF<br />

NCI-H1395 lung BRAF CAL-62 thyroid KRAS<br />

NCI-H1793 lung KRAS BB49-HNC upper aerodigestive tract HRAS<br />

NCI-H2030 lung KRAS 639-V urinary tract PIK3CA<br />

NCI-H2122 lung KRAS T-24 urinary tract HRAS<br />

NCI-H2291 lung KRAS UM-UC-3 urinary tract KRAS<br />

NCI-H2347 lung NRAS<br />



Table 4.6. Summary of homozygous mutation analysis in cancer cell lines

Gene     # of Hz mutations   # UPD   # Gain   # Loss   # Neutral
KRAS     33                  10      18       4        1
BRAF     11                  4       6        1        0
NRAS     5                   1       3        1        0
PIK3CA   4                   1       2        1        0
CTNNB1   3                   2       0        1        0
HRAS     3                   2       1        0        0
EGFR     2                   1       1        0        0
NOTCH1   2                   0       1        1        0
FGFR3    1                   0       1        0        0
JAK2     1                   0       1        0        0
ERBB2    1                   1       0        0        0
RUNX1    1                   1       0        0        0
Total    67                  23      34       9        1



Table 4.7. RefSeq genes in focal regions of UPD<br />

Gene Symbol Chr Gene Symbol Chr<br />

DAB1 1 TNFAIP8 5<br />

IL1R1 2 ZNF608 5<br />

IL1RL2 2 E2F3 6<br />

GPR39 2 ID4 6<br />

EPC2 2 MBOAT1 6<br />

KIF5C 2 B3GAT2 6<br />

MBD5 2 C6orf191 6<br />

OSBPL6 2 IMMP2L 7<br />

RBM45 2 LRRN3 7<br />

CUL3 2 GRM8 7<br />

DOCK10 2 ODZ4 11<br />

FAM124B 2 CASP4 11<br />

FRG2C 3 DDI1 11<br />

ZNF717 3 PDGFD 11<br />

COL29A1 3 CADM1 11<br />

COL6A6 3 HNT 11<br />

LPHN3 4 OPCML 11<br />

ANTXR2 4 ANO2 12<br />

FGF5 4 KCNA5 12<br />

PRDM8 4 NTF3 12<br />

ADH5 4 AEBP2 12<br />

EIF4E 4 PLEKHA5 12<br />

METAP1 4 ALG10B 12<br />

SLC7A11 4 CPNE8 12<br />

ARRDC3 5 KIF21A 12<br />

CHD1 5 TMEM132B 12<br />

RGMB 5 FZD10 12<br />

FBXL17 5 ZNF10 12<br />

FER 5 ZNF140 12<br />

PJA2 5 ZNF268 12<br />

KCNN2 5 ATP10A 15<br />

DMXL1 5 PLCB1 20<br />



Table 4.8. Genes overexpressed in focal regions of UPD<br />

Probe ID Gene Symbol Frequency of Overexpression<br />

merck-NM_001508_a_at GPR39 13<br />

merck-AJ270693_at PLEKHA5 12<br />

merck-NM_014331_at SLC7A11 10<br />

merck-NM_001949_at E2F3 10<br />

merck-C17174_at CUL3 9<br />

merck-AA651853_at ARRDC3 7<br />

merck-AF336376_a_at PDGFD 6<br />

merck-CR624190_a_at KIF21A 6<br />

merck-NM_016522_at HNT 5<br />

merck-NM_003854_at IL1RL2 5<br />

merck-NM_000845_at GRM8 5<br />

merck-AY358331_s_at HNT 5<br />

merck-CR625009_at ZNF140 5<br />

merck-X52332_a_at ZNF10 5<br />

merck-AK127693_s_at PLCB1 4<br />

merck-NM_020226_at PRDM8 4


4.6 References

1. Bell DW: Our changing view of the genomic landscape of cancer. J Pathol 2010, 220(2):231-243.
2. Chari R, Thu KL, Wilson IM, Lockwood WW, Lonergan KM, Coe BP, Malloff CA, Gazdar AF, Lam S, Garnis C et al: Integrating the multiple dimensions of genomic and epigenomic landscapes of cancer. Cancer Metastasis Rev 2010.
3. Zhu X, Dunn JM, Goddard AD, Squire JA, Becker A, Phillips RA, Gallie BL: Mechanisms of loss of heterozygosity in retinoblastoma. Cytogenet Cell Genet 1992, 59(4):248-252.
4. Tuna M, Knuutila S, Mills GB: Uniparental disomy in cancer. Trends Mol Med 2009, 15(3):120-128.
5. Li C, Beroukhim R, Weir BA, Winckler W, Garraway LA, Sellers WR, Meyerson M: Major copy proportion analysis of tumor samples using SNP arrays. BMC Bioinformatics 2008, 9:204.
6. Yamamoto G, Nannya Y, Kato M, Sanada M, Levine RL, Kawamata N, Hangaishi A, Kurokawa M, Chiba S, Gilliland DG et al: Highly sensitive method for genomewide detection of allelic composition in nonpaired, primary tumor specimens by use of Affymetrix single-nucleotide-polymorphism genotyping microarrays. Am J Hum Genet 2007, 81(1):114-126.
7. Andersen CL, Wiuf C, Kruhoffer M, Korsgaard M, Laurberg S, Orntoft TF: Frequent occurrence of uniparental disomy in colorectal cancer. Carcinogenesis 2007, 28(1):38-48.
8. Darbary HK, Dutt SS, Sait SJ, Nowak NJ, Heinaman RE, Stoler DL, Anderson GR: Uniparentalism in sporadic colorectal cancer is independent of imprint status, and coordinate for chromosomes 14 and 18. Cancer Genet Cytogenet 2009, 189(2):77-86.
9. Fitzgibbon J, Iqbal S, Davies A, O'Shea D, Carlotti E, Chaplin T, Matthews J, Raghavan M, Norton A, Lister TA et al: Genome-wide detection of recurring sites of uniparental disomy in follicular and transformed follicular lymphoma. Leukemia 2007, 21(7):1514-1520.
10. Kawamata N, Ogawa S, Seeger K, Kirschner-Schwabe R, Huynh T, Chen J, Megrabian N, Harbott J, Zimmermann M, Henze G et al: Molecular allelokaryotyping of relapsed pediatric acute lymphoblastic leukemia. Int J Oncol 2009, 34(6):1603-1612.
11. Gondek LP, Tiu R, O'Keefe CL, Sekeres MA, Theil KS, Maciejewski JP: Chromosomal lesions and uniparental disomy detected by SNP arrays in MDS, MDS/MPD, and MDS-derived AML. Blood 2008, 111(3):1534-1542.
12. Sanada M, Suzuki T, Shih LY, Otsu M, Kato M, Yamazaki S, Tamura A, Honda H, Sakata-Yanagimoto M, Kumano K et al: Gain-of-function of mutated C-CBL tumour suppressor in myeloid neoplasms. Nature 2009, 460(7257):904-908.
13. Tiu RV, Gondek LP, O'Keefe CL, Huh J, Sekeres MA, Elson P, McDevitt MA, Wang XF, Levis MJ, Karp JE et al: New lesions detected by single nucleotide polymorphism array-based chromosomal analysis have important clinical impact in acute myeloid leukemia. J Clin Oncol 2009, 27(31):5219-5226.
14. Teh MT, Blaydon D, Chaplin T, Foot NJ, Skoulakis S, Raghavan M, Harwood CA, Proby CM, Philpott MP, Young BD et al: Genomewide single nucleotide polymorphism microarray mapping in basal cell carcinomas unveils uniparental disomy as a key somatic event. Cancer Res 2005, 65(19):8597-8603.
15. Suzuki M, Kato M, Yuyan C, Takita J, Sanada M, Nannya Y, Yamamoto G, Takahashi A, Ikeda H, Kuwano H et al: Whole-genome profiling of chromosomal aberrations in hepatoblastoma using high-density single-nucleotide polymorphism genotyping microarrays. Cancer Sci 2008, 99(3):564-570.



16. Walsh CS, Ogawa S, Scoles DR, Miller CW, Kawamata N, Narod SA, Koeffler HP, Karlan BY: Genome-wide loss of heterozygosity and uniparental disomy in BRCA1/2-associated ovarian carcinomas. Clin Cancer Res 2008, 14(23):7645-7651.
17. Kralovics R, Guan Y, Prchal JT: Acquired uniparental disomy of chromosome 9p is a frequent stem cell defect in polycythemia vera. Exp Hematol 2002, 30(3):229-236.
18. Grand FH, Hidalgo-Curtis CE, Ernst T, Zoi K, Zoi C, McGuire C, Kreil S, Jones A, Score J, Metzgeroth G et al: Frequent CBL mutations associated with 11q acquired uniparental disomy in myeloproliferative neoplasms. Blood 2009, 113(24):6182-6192.
19. Fitzgibbon J, Smith LL, Raghavan M, Smith ML, Debernardi S, Skoulakis S, Lillington D, Lister TA, Young BD: Association between acquired uniparental disomy and homozygous gene mutation in acute myeloid leukemias. Cancer Res 2005, 65(20):9152-9154.
20. Soh J, Okumura N, Lockwood WW, Yamamoto H, Shigematsu H, Zhang W, Chari R, Shames DS, Tang X, MacAulay C et al: Oncogene mutations, copy number gains and mutant allele specific imbalance (MASI) frequently occur together in tumor cells. PLoS One 2009, 4(10):e7464.
21. Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U, Speed TP: Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 2003, 4(2):249-264.
22. Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J et al: Bioconductor: open software development for computational biology and bioinformatics. Genome Biol 2004, 5(10):R80.
23. Ding L, Getz G, Wheeler DA, Mardis ER, McLellan MD, Cibulskis K, Sougnez C, Greulich H, Muzny DM, Morgan MB et al: Somatic mutations affect key pathways in lung adenocarcinoma. Nature 2008, 455(7216):1069-1075.
24. Futreal PA, Coin L, Marshall M, Down T, Hubbard T, Wooster R, Rahman N, Stratton MR: A census of human cancer genes. Nat Rev Cancer 2004, 4(3):177-183.
25. Lockwood WW, Chari R, Coe BP, Girard L, Macaulay C, Lam S, Gazdar AF, Minna JD, Lam WL: DNA amplification is a ubiquitous mechanism of oncogene activation in lung and other cancers. Oncogene 2008, 27(33):4615-4624.
26. Weir BA, Woo MS, Getz G, Perner S, Ding L, Beroukhim R, Lin WM, Province MA, Kraja A, Johnson LA et al: Characterizing the cancer genome in lung adenocarcinoma. Nature 2007, 450(7171):893-898.
27. Chitale D, Gong Y, Taylor BS, Broderick S, Brennan C, Somwar R, Golas B, Wang L, Motoi N, Szoke J et al: An integrated genomic analysis of lung cancer reveals loss of DUSP4 in EGFR-mutant tumors. Oncogene 2009, 28(31):2773-2783.
28. Kendall J, Liu Q, Bakleh A, Krasnitz A, Nguyen KC, Lakshmi B, Gerald WL, Powers S, Mu D: Oncogenic cooperation and coamplification of developmental transcription factor genes in lung cancer. Proc Natl Acad Sci U S A 2007, 104(42):16663-16668.
29. Garnis C, Lockwood WW, Vucic E, Ge Y, Girard L, Minna JD, Gazdar AF, Lam S, MacAulay C, Lam WL: High resolution analysis of non-small cell lung cancer cell lines by whole genome tiling path array CGH. Int J Cancer 2006, 118(6):1556-1564.
30. Borczuk AC, Gorenstein L, Walter KL, Assaad AA, Wang L, Powell CA: Non-small-cell lung cancer molecular signatures recapitulate lung developmental pathways. Am J Pathol 2003, 163(5):1949-1960.
31. Cooper CS, Nicholson AG, Foster C, Dodson A, Edwards S, Fletcher A, Roe T, Clark J, Joshi A, Norman A et al: Nuclear overexpression of the E2F3 transcription factor in human lung cancer. Lung Cancer 2006, 54(2):155-162.
32. Goransson H, Edlund K, Rydaker M, Rasmussen M, Winquist J, Ekman S, Bergqvist M, Thomas A, Lambe M, Rosenquist R et al: Quantification of normal cell fraction and copy number neutral LOH in clinical lung cancer samples using SNP array data. PLoS One 2009, 4(6):e6057.



33. Garnis C, Coe BP, Lam SL, MacAulay C, Lam WL: High-resolution array CGH increases heterogeneity tolerance in the analysis of clinical samples. Genomics 2005, 85(6):790-793.
34. Raghavan M, Lillington DM, Skoulakis S, Debernardi S, Chaplin T, Foot NJ, Lister TA, Young BD: Genome-wide single nucleotide polymorphism analysis reveals frequent partial uniparental disomy due to somatic recombination in acute myeloid leukemias. Cancer Res 2005, 65(2):375-378.
35. Haruta M, Arai Y, Sugawara W, Watanabe N, Honda S, Ohshima J, Soejima H, Nakadate H, Okita H, Hata J et al: Duplication of paternal IGF2 or loss of maternal IGF2 imprinting occurs in half of Wilms tumors with various structural WT1 abnormalities. Genes Chromosomes Cancer 2008, 47(8):712-727.
36. Bjornsson HT, Albert TJ, Ladd-Acosta CM, Green RD, Rongione MA, Middle CM, Irizarry RA, Broman KW, Feinberg AP: SNP-specific array-based allele-specific expression analysis. Genome Res 2008, 18(5):771-779.
37. Gimelbrant A, Hutchinson JN, Thompson BR, Chess A: Widespread monoallelic expression on human autosomes. Science 2007, 318(5853):1136-1140.
38. Palacios R, Gazave E, Goni J, Piedrafita G, Fernando O, Navarro A, Villoslada P: Allele-specific gene expression is widespread across the genome and biological processes. PLoS One 2009, 4(1):e4150.
39. Zhang K, Li JB, Gao Y, Egli D, Xie B, Deng J, Li Z, Lee JH, Aach J, Leproust EM et al: Digital RNA allelotyping reveals tissue-specific and allele-specific gene expression in human. Nat Methods 2009, 6(8):613-618.
40. Alvarez CJ, Lodeiro M, Theodoropoulou M, Camina JP, Casanueva FF, Pazos Y: Obestatin stimulates Akt signalling in gastric cancer cells through beta-arrestin-mediated epidermal growth factor receptor transactivation. Endocr Relat Cancer 2009, 16(2):599-611.
41. Dittmer S, Sahin M, Pantlen A, Saxena A, Toutzaris D, Pina AL, Geerts A, Golz S, Methner A: The constitutively active orphan G-protein-coupled receptor GPR39 protects from cell death by increasing secretion of pigment epithelium-derived growth factor. J Biol Chem 2008, 283(11):7074-7081.
42. Lo M, Ling V, Wang YZ, Gout PW: The xc- cystine/glutamate antiporter: a mediator of pancreatic cancer growth with a role in drug resistance. Br J Cancer 2008, 99(3):464-472.
43. Guan J, Lo M, Dockery P, Mahon S, Karp CM, Buckley AR, Lam S, Gout PW, Wang YZ: The xc- cystine/glutamate antiporter as a potential therapeutic target for small-cell lung cancer: use of sulfasalazine. Cancer Chemother Pharmacol 2009, 64(3):463-472.
44. Wang Z, Kong D, Li Y, Sarkar FH: PDGF-D signaling: a novel target in cancer therapy. Curr Drug Targets 2009, 10(1):38-41.
45. Kinameri E, Inoue T, Aruga J, Imayoshi I, Kageyama R, Shimogori T, Moore AW: Prdm proto-oncogene transcription factor family expression and interaction with the Notch-Hes pathway in mouse neurogenesis. PLoS One 2008, 3(12):e3859.
46. Rhead B, Karolchik D, Kuhn RM, Hinrichs AS, Zweig AS, Fujita PA, Diekhans M, Smith KE, Rosenbloom KR, Raney BJ et al: The UCSC Genome Browser database: update 2010. Nucleic Acids Res 2010, 38(Database issue):D613-619.



Chapter 5: Integrating the multiple dimensions of genomic and epigenomic landscapes of cancer 4

4 Sections 5.1 to 5.3, 5.4.1 to 5.4.4, 5.5 and 5.6 of this chapter have been published. Chari R, Thu KL, Wilson IM, Lockwood WW, Lonergan KM, Coe BP, Malloff CA, Gazdar AF, Lam S, Garnis C, MacAulay CE, Alvarez CE, Lam WL. (2010) Integrating the multiple dimensions of genomic and epigenomic landscapes of cancer. Cancer and Metastasis Reviews, 29(1):73-93. doi: 10.1007/s10555-010-9199-2. Sections 5.4.5 to 5.4.6 were not published previously.



5.1 Introduction

In the past decade, advancements in genome profiling technologies have greatly improved our ability to understand the landscape of cancer genomes. From the emergence of array-based comparative genomic hybridization (CGH) and spectral karyotyping (SKY) to the current state of next generation sequencing (NGS), the resolution at which the genome can be described has improved by over a million fold [1-6]. Likewise, the recent development of integrative platforms to relate multiple dimensions of DNA features (such as copy number, allelic status, sequence mutations, and DNA methylation) to gene expression patterns has dramatically improved our ability to identify causal genetic events and decipher their downstream consequences in the context of gene networks and biological functions [7, 8] (Table 5.1). Landmark events in cancer genomics, from the launch of the Cancer Genome Anatomy Project at the beginning of the decade to the recent publications of complete cancer genome sequences, are highlighted in Figure 5.1 [3-6, 8-43].

Multiple levels of genetic and epigenetic disruption are instrumental to cancer development, whereby specific genes may be altered by a variety of mechanisms. For example, the tumor suppressor CDKN2A can be inactivated through copy number loss, DNA hypermethylation, or sequence mutation. These mechanisms of disruption can occur in a tumor-specific manner, or may occur concurrently in the same tumor, i.e. a two-hit scenario. Moreover, in the former situation, if a given gene or pathway's frequency of alteration is low when examined by one mechanism or dimension, it is likely that the gene/pathway would be overlooked by the analysis. However, when multiple dimensions of disruption are considered in the analyses, alteration of the gene in question may be detected at a high frequency, albeit at low frequencies by any one mechanism. This illustrates the need for and the benefit of integrative analytical approaches. In this article, we discuss the impact of multi-dimensional genomic analyses on our view of the cancer genome landscape, and the contribution of such new knowledge to our understanding of cancer progression and metastasis.
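
The frequency argument above can be made concrete with a small sketch. The tumor identifiers, the dimension names and the calls below are hypothetical and serve only to illustrate the arithmetic; they are not data from this chapter.

```python
from typing import Dict, Set


def alteration_frequencies(calls: Dict[str, Set[str]], n_tumors: int) -> Dict[str, float]:
    """Fraction of tumors altered per dimension, plus the fraction altered by any dimension."""
    freqs = {dim: len(samples) / n_tumors for dim, samples in calls.items()}
    # A tumor counts once in the combined frequency even if it is hit in two dimensions.
    altered_by_any = set().union(*calls.values())
    freqs["any"] = len(altered_by_any) / n_tumors
    return freqs


# Hypothetical calls for one gene in a cohort of 20 tumors (illustrative only).
calls = {
    "copy_loss": {"T01", "T02", "T03"},
    "hypermethylation": {"T04", "T05", "T06", "T07"},
    "mutation": {"T03", "T08", "T09"},
}
freqs = alteration_frequencies(calls, n_tumors=20)
for dimension, frequency in freqs.items():
    print(f"{dimension}: {frequency:.2f}")
```

In this made-up cohort no single dimension exceeds 20% of tumors, yet nine of the twenty tumors carry an alteration of the gene by at least one mechanism, which is the situation an integrative analysis is meant to reveal.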



5.2 Genomic alterations

5.2.1 Chromosomal aberrations

Chromosomal aberrations and rearrangements, such as translocations and gains/losses of whole or portions of chromosome arms, are detected through direct examination using molecular cytogenetic techniques such as G-banding, SKY, fluorescence in situ hybridization (FISH) and CGH [2, 44-48]. The manifestation of such alterations is generally attributed to mitotic errors, where centrosomal aberrations and telomere dysfunction play key causative roles [49-53]. Aberrations such as gains and losses have been further refined using technologies such as microarray CGH (see below). While translocations are primarily associated with different types of leukemia and lymphoma, recent genomic studies have identified them in epithelial tumors such as prostate and lung cancer [54-61]. Cumulative cytogenetic data from three main sources (the NCI/NCBI SKY/M-FISH & CGH Database, the NCI Mitelman Database of Chromosome Aberrations in Cancer, and NCI Recurrent Aberrations in Cancer) are now integrated into NCBI's Entrez system as Cancer Chromosomes [62] (Table 5.2).

5.2.2 Gene dosage, allelic imbalance, mutational status

Gene dosage. Genomic DNA copy number alterations are a prominent mechanism of gene disruption that contributes to tumor development [63]. Segmental amplification may lead to an increase in gene and protein expression of oncogenes, while deletions may lead to haploinsufficiency or the loss of expression of tumor suppressor genes. Since the development of microarray-based CGH in the mid 1990s, advances in the technology have dramatically increased genome coverage and target density, improving both the resolution and sensitivity of detection of copy number alterations [64, 65]. The first genome-wide array CGH analysis utilized cDNA microarrays originally designed for gene expression profiling [66]. Since these first experiments, whole genome tiling path arrays with tens of thousands of bacterial artificial chromosome (BAC) clones, oligonucleotide (25-80 nucleotide probes) and single nucleotide polymorphism (SNP) arrays with over one million DNA elements, and the essential bioinformatics tools for visualization and analysis of high density array CGH data have been developed (Figure 5.1) [7, 33, 67-71]. These innovations have enabled increasingly precise mapping of the boundaries and magnitude of genetic alterations throughout the genome in a single experiment, greatly increasing our understanding of the cancer genome landscape in the context of DNA copy number [33, 72-76]. While early attempts have been made utilizing sequence-based approaches [77-80], recent studies have begun to illustrate the improvement in detection resolution through the advances in high throughput sequencing technologies [6, 11, 13, 14]. The popularity of genome sequencing will depend on further cost reduction in data generation and major advancements in analysis [81].
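
As a minimal sketch of how array CGH signal is commonly turned into a dosage call, the example below computes a log2 ratio of tumor to reference signal and applies symmetric cutoffs. The function names and the cutoffs of +0.2 and -0.2 are assumptions chosen for illustration; they are not the thresholds used in this work, and real analyses typically segment the data before calling.

```python
import math


def log2_ratio(tumor_signal: float, reference_signal: float) -> float:
    """Log2 ratio of tumor to reference hybridization signal for one array element."""
    return math.log2(tumor_signal / reference_signal)


def call_dosage(ratio: float, gain_cutoff: float = 0.2, loss_cutoff: float = -0.2) -> str:
    """Classify a log2 ratio as gain, loss or neutral (cutoffs are illustrative assumptions)."""
    if ratio >= gain_cutoff:
        return "gain"
    if ratio <= loss_cutoff:
        return "loss"
    return "neutral"


# In a pure diploid sample, three copies versus two gives log2(3/2), about 0.58,
# and one copy versus two gives log2(1/2) = -1.
for tumor_copies in (3.0, 2.0, 1.0):
    ratio = log2_ratio(tumor_copies, 2.0)
    print(tumor_copies, round(ratio, 2), call_dosage(ratio))
```

In clinical samples the observed ratios are compressed toward zero by contaminating normal cells, which is one reason fixed cutoffs such as these should be treated as a starting point only.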

Copy number variation. The discovery of a vast abundance of germ line segmental DNA copy number variation (CNV) in the normal human population has not only provided a baseline for interpretation of cancer genome data, but also highlighted the need for comparison against paired normal tissue [18, 19, 31, 32, 82-89]. Moreover, it has been shown that many of the reported CNVs overlap with loci involved with sensory perception and, more importantly, disease susceptibility. While the role of CNV in cancer is not well understood, a recent study showed that these regions are more susceptible to genomic rearrangement and may initiate subsequent alterations during tumorigenesis [90]. Moreover, CNV at 1q21.1 was recently shown to be associated with neuroblastoma and implicated NBPF23, a new member of the Neuroblastoma Breakpoint Family, in tumorigenesis [91]. A database of all known CNVs is available at http://projects.tcag.ca/variation [31]. In addition, as copy number profiles of cancer genomes accumulate, hotspots for amplification and deletion are becoming evident, and signature alterations associated with specific diseases and cancer histologic subtypes are emerging [92-96]. The manifestation of “oncogene addiction” through lineage specific DNA amplification is a case in point [38, 39, 97-100].



Allelic status. Single nucleotide polymorphism (SNP) arrays are best known for their application in genome wide association studies (GWAS), where the correlation of haplotype with phenotype implicates disease susceptibility [101, 102]. SNP array platforms have shown tremendous advances in resolution, with the number of SNPs that can be simultaneously measured increasing 1000-fold since initial development. Currently, for example, the Affymetrix SNP 6.0 array platform measures 1.8 million elements, comprising 906,600 SNP elements and more than 946,000 CNV elements. Likewise, on the Illumina HumanOmni1 platform, over 1,000,000 sites (representing a mixture of SNP and CNV elements) can be simultaneously assessed. In addition to their application in GWAS, SNP arrays can also be used to detect somatic alterations and, when applied in this context, allow for the simultaneous detection of copy number alteration and allelic imbalance in tumor genomes. In the example in Figure 5.2, when the SNP array profile of a lung cancer genome is compared against that of its paired non-cancerous lung tissue, it is possible not only to distinguish regions of allelic balanced copy neutrality (Figure 5.2a) from allelic imbalance (Figure 5.2b, 5.2c), but also to distinguish regions of allelic imbalance due to segmental DNA copy number alteration (Figure 5.2b) from those without change in total copy number (Figure 5.2c).
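
The three situations distinguished above can be expressed as a short decision rule. The sketch below assumes that allele-specific copy numbers (a major and a minor allele count per segment) have already been estimated from the tumor and its paired normal; the function name and the category labels are hypothetical and chosen only for illustration.

```python
def classify_segment(major: int, minor: int, normal_total: int = 2) -> str:
    """Classify a tumor segment from its allele-specific copy numbers.

    `major` and `minor` are the copy numbers of the two parental alleles in the
    tumor; `normal_total` is the total copy number expected in the paired normal.
    """
    total = major + minor
    if major == minor:
        if total == normal_total:
            return "allelic balance, copy neutral"
        return "allelic balance, copy number altered"
    if total != normal_total:
        return "allelic imbalance with copy number alteration"
    return "allelic imbalance without copy number change"


# One copy of each allele, a single-copy gain, and duplication of one allele
# with loss of the other (total copy number unchanged).
for alleles in [(1, 1), (2, 1), (2, 0)]:
    print(alleles, classify_segment(*alleles))
```

The last case, two copies of one allele and none of the other, is the copy neutral pattern that cannot be seen from total copy number alone and requires allelic information to detect.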

Mutational profiling and whole genome sequencing. In cancer, oncogenes are thought to harbor mutations which lead to increased protein expression or constitutive protein activation, while tumor suppressor genes are thought to harbor mutations which are inactivating, either through total loss of protein expression or expression of mutant, non-functional protein. In addition, activating and inactivating mutations can also be accompanied by changes in gene dosage or allele status (see below). Traditionally, mutation screening has been focused on specific oncogene and tumor suppressor loci. With the availability of newer and cheaper sequencing technologies [103], recent studies have expanded from single gene analyses to genome-wide screens [6, 11, 13, 14, 104]. For example, in studies using small cell lung cancer and melanoma cell lines, tens of thousands of somatic mutations were identified in each cell line, with a proportion of these mutations being attributed to cigarette smoke (G to T substitutions) and UV exposure (C to T substitutions), respectively [4, 5]. It will be interesting to see if other cancers have such mutation signatures. Another observation made in both studies was that mutations were unevenly distributed, suggesting that DNA sequence integrity is largely maintained by transcription-associated DNA repair. While these and future studies will uncover a vast number of mutations, the contribution of those mutations to tumorigenesis will need to be determined [105, 106].
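
A substitution spectrum of the kind described above can be tallied with a few lines of code. The sketch below is illustrative only: the input mutations are made up, and the function names are hypothetical. It follows the common convention of reporting each substitution relative to the pyrimidine base, so that a G to T change is counted together with its opposite-strand equivalent, C to A.

```python
from collections import Counter
from typing import Iterable, Tuple

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref: str, alt: str) -> str:
    """Label a base substitution relative to the pyrimidine (C or T) strand."""
    if ref in ("A", "G"):
        # Purine reference base: describe the same event on the opposite strand.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def substitution_spectrum(mutations: Iterable[Tuple[str, str]]) -> Counter:
    """Count substitutions by class for an iterable of (reference, variant) base pairs."""
    return Counter(substitution_class(ref, alt) for ref, alt in mutations)


# Made-up (reference, variant) pairs, for illustration only.
example = [("G", "T"), ("C", "A"), ("C", "T"), ("G", "A"), ("T", "C")]
spectrum = substitution_spectrum(example)
print(spectrum)
```

Under this convention the smoking-associated G to T changes appear in the C>A class and the UV-associated C to T changes (including G to A on the opposite strand) appear in the C>T class.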

5.2.3 Genomic landscape: Gains, losses and uniparental disomy

Individually, the study of genomic dimensions has yielded a global description of cancer genomes in terms of gene dosage, allelic status and somatic mutation. Collectively, however, the integration of these three dimensions has brought two concepts to the forefront: allele specific copy number alterations and uniparental disomy (UPD) (Figure 5.2). Typically, the relationship between somatic mutation and allele specific copy number alterations has been associated with tumor suppressor genes (e.g. RB1 and TP53), whereby mutation is combined with loss to achieve bi-allelic inactivation [107, 108]. However, recent studies have shown preferential amplification of alleles encoding mutated oncogenes as well [109-114]. In non-small cell lung cancer, mutant allele specific imbalance (MASI) is frequently present in mutant EGFR and KRAS tumor cells, and is associated with increased mutant allele transcription and gene activity [114].

UPD is the presence of two copies of a chromosome segment from one parent, and the absence of that DNA from the other parent. Somatic UPD, also known as copy neutral LOH (CNLOH), results in loss of heterozygosity (tumor versus normal) without a change in total DNA copy number [115-117]. UPD is observed at tumor suppressor gene loci whereby, upon loss of the wild type allele, the mutated allele is duplicated, resulting in a diploid state with homozygous mutation of the target gene [118]. Interestingly, UPD events are also detected at mutated oncogenes [114, 119-121]. Until recently, due to limitations in the resolution of genomic array platforms, the prevalence of this event has been widely underestimated and underappreciated.

Recent studies have shown that UPD events are frequently observed in tumor genomes, with most of the findings reported from hematological malignancies [122-131]. Our genome wide analysis of segmental gain, loss and UPD in the T47D breast cancer cell line genome showed that a significant portion of the genome exhibits UPD, rivaling the proportion of the genome affected by segmental gain and loss, and highlighting the potential of UPD as a prominent mechanism of gene disruption in epithelial cancer (Figure 5.3). Interestingly, PIK3CA and TP53 mutations in T47D are noted in the Catalogue of Somatic Mutations in Cancer [132]. Integrative analysis at these loci detected copy number increase at PIK3CA and copy number loss at TP53, illustrating the MASI concept described above (Figure 5.3).

Somatic UPD also exists at genes without mutation. The potential significance of this somatic event is not readily apparent, but it raises the intriguing possibility of allelic conversion of epigenetic status [117, 122, 133].

5.3 Epigenomic alterations

5.3.1 The cancer methylome

Abnormal DNA methylation patterns occur in cancer, whereby focal hypermethylation at many CpG islands is evident in a background of global DNA hypomethylation [134-137]. Broad hypomethylation may lead to genomic instability, while hypermethylation of CpG islands silences transcription of specific genes [136, 138-140]. Non-random methylation of multiple CpG islands observed in colon cancer led to the discovery of the CpG island methylator phenotype (CIMP), which is causally linked to microsatellite instability via silencing of the mismatch repair gene MLH1 [141-143].

The determination of DNA methylation status relies on the ability to discriminate between methylated and unmethylated cytosines. This is achieved by exploiting methylation-sensitive/insensitive isoschizomer restriction-enzyme pairs [144-150], chemical conversion of unmethylated cytosine to uracil [151-156], and the affinity for methylated DNA of specially developed antibodies and methylated-DNA binding proteins [24, 157-163]. Several computational methods have been developed for deriving approximations of actual methylation levels from the relative levels generated by most microarray and locus specific sequencing assays [147, 162, 164, 165]. However, it is important to note that CpG targets represented on microarrays may or may not be the only elements controlling gene expression. Recently, it was shown that, in the human colon cancer methylome, sequences up to 2 kb away from CpG islands, termed CpG shores, exhibited more methylation than CpG islands and had greater influence on gene expression than CpG islands [166]. Furthermore, while excess promoter methylation is typically associated with transcriptional repression, the loss of required methylation within gene bodies, proximal to promoters, can have the same effect [167]. DNA methylation of epigenetic neighborhoods in the megabase size range has also been reported [168]. Validation of methylation-mediated control of gene-specific expression, and evaluation of biological significance, can be achieved via pharmacologic manipulation of DNA methylation, for example by 5-azacytidine treatment, to relieve methylation silencing and invoke re-expression [20, 169].
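
For the chemical conversion approach, a simple estimate of the methylation level at a single cytosine follows directly from the chemistry: bisulfite treatment converts unmethylated cytosine to uracil, which is read as thymine after amplification, whereas methylated cytosine remains cytosine. The sketch below assumes that read counts at one CpG are already available; the function name is hypothetical, and the estimate ignores incomplete conversion and sequencing error.

```python
from typing import Optional


def methylation_level(c_reads: int, t_reads: int) -> Optional[float]:
    """Estimate the methylated fraction at one cytosine from bisulfite read counts.

    `c_reads` counts reads still showing C (protected, i.e. methylated) and
    `t_reads` counts reads showing T (converted, i.e. unmethylated).
    Returns None when the site has no coverage.
    """
    total = c_reads + t_reads
    if total == 0:
        return None
    return c_reads / total


# 30 unconverted and 10 converted reads suggest about 75% methylation at this site.
print(methylation_level(30, 10))
```

Array-based and affinity-based assays report relative signal instead of read counts, which is why the dedicated computational methods cited above are needed to approximate absolute levels for those platforms.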

The first single-base-resolution maps <strong>of</strong> the human methylome have recently been generated<br />

by sequencing <strong>of</strong> bisulfite converted DNA from human embryonic stem cells and fetal fibroblasts<br />

[12, 170]. This landmark study will greatly advance the analysis <strong>of</strong> DNA methylation by<br />

providing whole genome reference maps <strong>of</strong> methylation in these specific cells. However, it is<br />

well known that DNA methylation is tissue specific and that it changes throughout development;<br />

thus, methylome maps for all tissues at various stages of development may be necessary to<br />

provide adequate maps <strong>of</strong> 'normal' methylation patterns for use in deciphering aberrant<br />

methylation patterns characteristic <strong>of</strong> tumors [171-176]. In recognition <strong>of</strong> this, the Human<br />

Epigenome Project was launched in 2004 to map the methylomes <strong>of</strong> all major human tissues<br />

[177].<br />

5.3.2 Integration <strong>of</strong> cancer genomic and epigenomic events<br />

DNA methylation and genomic instability. Cancer-specific aberrant DNA methylation is<br />

associated with reduced genomic stability and subsequent copy number alterations, including<br />

preferential loss <strong>of</strong> certain imprinted alleles (LOI) [178-184]. Mechanistically, this instability may<br />

be related to the susceptibility <strong>of</strong> hypomethylated DNA to undergo inappropriate recombination<br />

events [185]. Another mechanism known to negatively impact genomic integrity in lung cancer<br />

is the relaxation <strong>of</strong> transposable element control that is mediated by DNA methylation [186-190].<br />

DNA hypomethylation and DNA amplification. Preliminary evidence <strong>of</strong> specific<br />

demethylation <strong>of</strong> somatic segmental amplifications (or amplicons) has been put forth in lung<br />

cancer, perhaps representing a novel mechanism <strong>of</strong> aberrant oncogene activation [189, 191].<br />

Further studies using large-scale sequencing <strong>of</strong> bisulfite treated DNA will help to clarify this<br />

phenomenon [12]. Hypomethylation has also been implicated in the formation <strong>of</strong> specific copy<br />

number alterations in glioblastoma multiforme [192]. One potentially interesting application for<br />

DNA methylation profiling of cancer amplicons such as these is in the discrimination between<br />

"driver" and "passenger" genes within the amplified sequence. It may be that DNA methylation<br />

within the promoters or gene bodies <strong>of</strong> these genes is responsible for the lack <strong>of</strong> uniform<br />

overexpression <strong>of</strong> genes residing within amplicons.<br />

DNA hypermethylation and copy number loss. The relationship between DNA<br />

hypermethylation and allelic loss is well documented. Tumor suppressor genes are frequently<br />

found in regions <strong>of</strong> common LOH, and these same TSGs are frequently found to be<br />

hypermethylated, perhaps best exemplified by the FHIT gene on chromosome 3p [193].<br />

Although it is unclear whether loss or hypermethylation occurs first, both are known to be very<br />

early events in tumorigenesis preceding any histologic alterations [194-196]. With the advent <strong>of</strong><br />

high resolution genome-wide technologies it has become possible to comprehensively search<br />

for genes that are inactivated by both mechanisms simultaneously [197].<br />

Histone modification states. While DNA methylation and gene dosage pr<strong>of</strong>iling technologies<br />

have become accessible, technologies for global assays <strong>of</strong> other key epigenetic marks including<br />

histone modifications are not widely available. One <strong>of</strong> the main challenges to conducting the<br />

highest quality studies <strong>of</strong> genome wide chromatin-immunoprecipitation on microarray (ChIP-<br />

chip) or on sequencing platform (ChIP-seq) experiments is the requirement <strong>of</strong> high quality DNA<br />

from pure cells – which essentially means growing cells in culture. It is thus difficult to analyze<br />

these dimensions from clinical specimens. However, much has been learned from studies <strong>of</strong><br />

the relationship between different histone modification states and transcriptional activation or<br />

repression in model systems. Such examples utilizing ChIP-chip include: cell or context<br />

specific histone modification patterns related to cell or context specific gene expression; histone<br />

3 lysine 27 (H3K27) trimethylation patterns associated with prostate, lung and breast cancers;<br />

and H3K9 and H3K79 modification patterns in leukemia [198-204]. Examples utilizing ChIP-seq<br />

include: the analysis <strong>of</strong> the growth inhibition program <strong>of</strong> the androgen receptor, and the<br />

chromatin interaction network of the estrogen receptor [205, 206].<br />

5.4 Relating genetic and epigenetic events to changes in the<br />

transcriptome through integrative analysis<br />

Aberrations in individual genetic or epigenetic dimensions are prominent across various cancer<br />

types, culminating in changes to the transcriptome. However, for a given gene, most <strong>of</strong> the<br />

events documented previously, such as copy number amplification, homozygous deletion,<br />

somatic mutation, or DNA hypermethylation, do not occur in 100% <strong>of</strong> tumors for a given cancer<br />

type. Moreover, it has been observed that the same gene may be activated or inactivated by<br />

different mechanisms. Since most <strong>of</strong> the studies described above analyzed single DNA<br />

dimensions, it is likely many genes would be overlooked due to a low frequency <strong>of</strong> alteration in<br />

a single dimension; the same gene may be detected at a high frequency when multiple<br />

dimensions are considered. Thus, analysis <strong>of</strong> more dimensions may reveal higher frequency<br />

gene-specific disruption with corresponding transcriptome aberrations for particular cancer<br />

types, as would be expected for genes causative to cancer development.<br />

5.4.1 Multiple mechanisms <strong>of</strong> gene disruption<br />

Expression pr<strong>of</strong>iling studies have been instrumental in detecting genes dysregulated in cancer<br />

[207-209]. However, aberrant expression <strong>of</strong> some genes may simply reflect incidental genome<br />

instability or secondary dysregulation. Global gene expression pr<strong>of</strong>iling alone may not<br />

distinguish causal events and bystander changes. One <strong>of</strong> the first studies to relate gene<br />

expression changes with gene dosage status on a global scale was a parallel analysis <strong>of</strong> DNA<br />

and mRNA [66, 210]. The same cDNA microarray platform was used to investigate the impact of<br />

DNA copy number alterations on the expression <strong>of</strong> over 6,500 genes. This study determined<br />

that 62% <strong>of</strong> genes located within regions <strong>of</strong> DNA amplification showed elevated expression in<br />

breast cancer. Subsequent studies in other cancer types revealed a broad range in the<br />

correlation between increased gene dosage and expression levels for protein coding genes<br />

(19% to 62%) [92, 207, 210-213]. Studies integrating gene dosage and gene expression have<br />

identified cancer subtype-specific pathway activation and signatures associated with clinical<br />

outcome [96, 214-217]. In addition, when examining known disease-relevant pathways, it has<br />

been shown that even though individual components <strong>of</strong> a pathway are disrupted at a low<br />

frequency, collectively, these alterations can result in frequent disruption <strong>of</strong> a given pathway [16,<br />

92]. Similarly, alterations in DNA methylation or histone modification status can also affect gene<br />

expression and have subsequent pathway level consequences (see above).<br />
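
The pathway-level argument above can be made concrete with a small sketch. The Python example below is an illustration under stated assumptions only: the gene names, sample identifiers and the `pathway_frequency` helper are hypothetical and are not drawn from the cited studies. A pathway is counted as disrupted in a sample if at least one of its member genes is disrupted in that sample.

```python
from typing import Dict, Set

def pathway_frequency(gene_hits: Dict[str, Set[str]], n_samples: int) -> float:
    """Fraction of samples in which at least one pathway member is disrupted."""
    hit_samples: Set[str] = set()
    for samples in gene_hits.values():
        hit_samples |= samples
    return len(hit_samples) / n_samples

# Hypothetical calls: gene -> samples in which that gene is disrupted.
n_samples = 10
hits = {
    "GENE_1": {"S1", "S2"},
    "GENE_2": {"S3", "S4"},
    "GENE_3": {"S2", "S5", "S6"},
}

per_gene = {gene: len(s) / n_samples for gene, s in hits.items()}
pathway = pathway_frequency(hits, n_samples)

print(per_gene)   # each gene alone is altered in a minority of samples
print(pathway)    # the pathway as a whole is altered more often
```

In this toy input no single gene is disrupted in more than three of ten samples, yet the pathway is hit in six of ten, which is the pattern described in the text.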

5.4.2 Multiple mechanisms <strong>of</strong> disrupting non-coding RNA levels<br />

Segmental DNA copy number alterations also affect the expression <strong>of</strong> non-coding RNAs<br />

(ncRNA) [218-222]. MicroRNAs (miRNA) have been shown to have a significant role in cancer<br />

development with specific miRNAs implicated in a number <strong>of</strong> different cancer types [26, 223-<br />

225]. Specific miRNA expression signatures are associated with critical steps in tumor initiation<br />

and development including cell hyperproliferation, angiogenesis, tumor formation and<br />

metastasis [226]. High throughput analysis <strong>of</strong> microRNAs has been <strong>of</strong> interest and microarrays<br />

have been developed to assess essentially all annotated microRNAs. To date, >700 miRNAs<br />

have been annotated in the genome (http://mirdb.org/miRDB/statistics.html, [227]), with more<br />

likely to be discovered. For example, we recently demonstrated that a deletion on chromosome<br />

5q leads to the reduced expression <strong>of</strong> two miRNAs that are abundant in hematopoietic<br />

stem/progenitor cells. This study revealed haploinsufficiency and reduced expression <strong>of</strong> miR-<br />

145 and miR-146a as mediators <strong>of</strong> a subtype <strong>of</strong> myelodysplastic syndrome [221]. Although the<br />

genomic loss and underexpression implicates a tumor-suppressive role for these specific<br />

miRNAs, others undergo activating genomic alterations and elevated expression and hence are<br />

thought to be oncogenic [228, 229].<br />

Just as copy number alterations can alter miRNA activity, epigenetic alterations have also been<br />

shown to affect miRNA expression [230-232]. Aberrant methylation <strong>of</strong> miRNAs has been<br />

reported in a variety <strong>of</strong> cancer types, and the disruption <strong>of</strong> epigenetically-mediated miRNA<br />

control has been shown to have oncogenic effects due to downstream gene deregulation [233].<br />

For example, abnormal DNA methylation <strong>of</strong> miRNAs has been associated with tumor<br />

metastasis, leading to the appreciation <strong>of</strong> a group <strong>of</strong> metastasis-related miRNAs [229].<br />

5.4.3 Multi-dimensional integration <strong>of</strong> genome, epigenome, and transcriptome<br />

Large scale initiatives. Since multiple genomic/epigenomic mechanisms can influence gene<br />

expression and lead to disruption <strong>of</strong> a given function, an integrative multi-dimensional analysis<br />

is necessary for a more comprehensive understanding of the cancer phenotype (Figure 5.4).<br />

Specific programs and initiatives such as those by The Cancer Genome Atlas (TCGA) project<br />

and the cancer Biomedical Informatics Grid (caBIG) enable parallel and multi-dimensional<br />

analysis <strong>of</strong> cancer genomes [8, 16] (Table 5.2). Recently, studies in glioblastoma and<br />

osteosarcoma have shown that integrative genomic and epigenomic approaches can indeed<br />

reveal the specific genetic pathways involved in different cancers [16, 234].<br />

Gene disruption by multiple mechanisms. One <strong>of</strong> the two key reasons for using an<br />

integrative approach is the ability to detect critical genes that are disrupted by multiple<br />

mechanisms across a sample set, but are disrupted at a low frequency by any one mechanism.<br />

These genes would have been overlooked in previous, single dimensional studies. The second<br />

key advantage <strong>of</strong> integrative approaches is the ability to identify genes that are simultaneously<br />

disrupted by multiple mechanisms -- two hits -- in a single sample. Using a dataset comprised<br />

<strong>of</strong> DNA copy number, allelic status, DNA methylation, and gene expression pr<strong>of</strong>iles from ten<br />

lung adenocarcinomas and matched non-malignant tissue controls, we illustrate these benefits<br />

below.<br />

If gene expression changes are a consequence <strong>of</strong> alterations at the DNA level, then a higher<br />

proportion <strong>of</strong> the observed expression changes can be directly attributed to a defined causal<br />

event when multiple types <strong>of</strong> DNA alterations are examined (Figure 5.5a). While some<br />

samples have over 70% of the expression changes associated with DNA level changes (Sample 7,<br />

Sample 8), other samples have only 30% (Sample 5, Sample 9). Additionally, consequential to<br />

associating more gene expression changes with DNA level changes within a sample, more<br />

disrupted genes are detected, and in turn, more disrupted pathways are identified across a<br />

sample set (Figure 5.5b, 5.5c). In fact, in our example, roughly five times as many genes<br />

(~1100 compared to ~200) are detected as disrupted in at least 50% of the samples when we<br />

account for multiple mechanisms of disruption (vs. one mechanism alone) (Figure 5.5b). This<br />

result illustrates that without using an integrative approach, many potentially important genes<br />

would be dismissed as they are disrupted by low frequency events when a single DNA<br />

dimension is analyzed. This also holds true at the pathway level when the identified genes are<br />

grouped based on their biological function (Figure 5.5d). For example, the Hepatic<br />

Fibrosis/Hepatic Stellate Cell Activation pathway and the RAR Activation pathway, which are<br />

identified when all DNA dimensions are considered, would not be detected as significantly<br />

altered when using individual DNA dimensions alone.<br />
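
The frequency argument in this paragraph can be sketched in a few lines of Python. This is a minimal illustration, not the analysis pipeline used for Figure 5.5: the gene names, the per-dimension calls and the helper functions `frequent_genes` and `merge_dimensions` are all hypothetical. Each dictionary maps a gene to the set of samples in which that gene is altered in one DNA dimension together with a matching expression change.

```python
from typing import Dict, Set

Calls = Dict[str, Set[str]]  # gene -> samples in which the gene is disrupted

def frequent_genes(calls: Calls, n_samples: int, min_freq: float = 0.5) -> Set[str]:
    """Genes disrupted in at least `min_freq` of the samples."""
    return {g for g, s in calls.items() if len(s) / n_samples >= min_freq}

def merge_dimensions(*dims: Calls) -> Calls:
    """Union the disrupted samples of each gene across all DNA dimensions."""
    merged: Calls = {}
    for dim in dims:
        for gene, samples in dim.items():
            merged.setdefault(gene, set()).update(samples)
    return merged

n_samples = 10  # ten tumors, as in the example dataset
# Hypothetical calls for two genes in three DNA dimensions.
copy_number = {"GENE_A": {"S1", "S2", "S3"},
               "GENE_B": {"S1", "S2", "S3", "S4", "S5", "S6"}}
methylation = {"GENE_A": {"S4", "S5"}}
upd = {"GENE_A": {"S5", "S6"}}

single = frequent_genes(copy_number, n_samples)
multi = frequent_genes(merge_dimensions(copy_number, methylation, upd), n_samples)

print(sorted(single))  # genes passing the 50% threshold on copy number alone
print(sorted(multi))   # genes passing the threshold when dimensions are combined
```

Here `GENE_A` is altered in only three of ten samples by copy number, but in six of ten once methylation and UPD calls are included, so it passes the 50% threshold only in the integrated analysis.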

Implications on sample size requirements. In the example above, we illustrate that a<br />

significant number <strong>of</strong> genes and pathways exhibit a low frequency <strong>of</strong> disruption when examining<br />

single dimensions (and thus would be overlooked) but indeed exhibit a high frequency of<br />

disruption when multiple dimensions are considered (Figure 5.5). Notably, these findings imply<br />

that integrative multi-dimensional analysis <strong>of</strong> individual samples may directly impact the cohort<br />

sample size required for gene discovery on the basis <strong>of</strong> frequency <strong>of</strong> disruption (Figure 5.5e).<br />

Reduction in sample size requirements means that one can extend this approach to situations<br />

involving rare specimens where accrual <strong>of</strong> hundreds <strong>of</strong> samples in a reasonable timeframe is<br />

not possible. Moreover, reduced sample sizes are particularly applicable to familial cancers or<br />

to isolated populations at increased risk for specific cancers.<br />
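
One way to reason about the sample size implication is a simple binomial model, shown below as a hedged sketch. The model is an assumption introduced here for illustration and is not part of the analysis behind Figure 5.5e: each tumor is taken to carry a disruption of a given gene independently with probability `p`, and a gene is called recurrent if it is seen in at least `k` tumors. The function names and default values are hypothetical.

```python
from math import comb

def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def min_cohort(p: float, k: int = 3, power: float = 0.9, n_max: int = 1000) -> int:
    """Smallest cohort in which a gene disrupted at frequency p is observed
    in at least k tumors with probability >= power."""
    for n in range(k, n_max + 1):
        if prob_at_least(k, n, p) >= power:
            return n
    raise ValueError("no cohort size up to n_max reaches the requested power")

# A gene seen at 10% frequency in one dimension versus 50% when all
# dimensions are combined (illustrative frequencies only).
n_single = min_cohort(0.10)
n_multi = min_cohort(0.50)
print(n_single, n_multi)
```

Under these assumptions the cohort needed to detect the gene shrinks considerably when its observed disruption frequency rises, which is the intuition behind the argument above.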

Bi-allelic gene disruption. Two-hit bi-allelic inactivation <strong>of</strong> genes and high level gene<br />

amplifications are typically considered to be causal mechanisms that inflict gene expression<br />

changes. When examining multiple DNA dimensions, concerted bi-allelic disruption <strong>of</strong> a gene in<br />

the same sample can be readily identified; copy number loss with hypermethylation resulting in<br />

underexpression, or copy number gain with hypomethylation and overexpression are examples.<br />

Indeed, we do identify genes harboring concerted disruptions using the same lung<br />

adenocarcinoma dataset mentioned above. The MUC1 locus exhibits concurrent copy number<br />

increase with hypomethylation and overexpression (Figure 5.4). MUC1 has previously been<br />

shown to be important in lung and breast cancers and is currently a target for therapeutic<br />

intervention [235-237]. Collectively, we have demonstrated how an integrative, multi-<br />

dimensional approach can be utilized for cancer gene and pathway discovery.<br />
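
The concerted, same-sample disruptions described above can be expressed as a simple rule. The sketch below is illustrative only: the `GeneCall` structure, the integer encoding of each call and the `concerted_disruption` function are assumptions made for this example, not the procedure used to generate Figure 5.4.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GeneCall:
    """Per-sample, per-gene calls (hypothetical encoding)."""
    copy_number: int  # -1 loss, 0 neutral, +1 gain
    methylation: int  # -1 hypomethylated, 0 unchanged, +1 hypermethylated
    expression: int   # -1 underexpressed, 0 unchanged, +1 overexpressed

def concerted_disruption(call: GeneCall) -> str:
    """Classify a gene as concertedly inactivated, activated, or neither."""
    if call.copy_number < 0 and call.methylation > 0 and call.expression < 0:
        return "two-hit inactivation"
    if call.copy_number > 0 and call.methylation < 0 and call.expression > 0:
        return "two-hit activation"
    return "none"

# A MUC1-like pattern: copy number gain, hypomethylation, overexpression.
print(concerted_disruption(GeneCall(copy_number=1, methylation=-1, expression=1)))
```

Requiring the expression change to agree in direction with both DNA-level events keeps the rule conservative: a gene with copy number loss and hypermethylation but unchanged expression is not called.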

5.4.4 Disruption <strong>of</strong> multiple components in biological pathways<br />

We described above how an integrative, multi-dimensional approach improves the detection <strong>of</strong><br />

disrupted genes, especially those affected by multiple low frequency mechanisms. This<br />

concept can be extended to identify biological pathways, where multiple pathway components<br />

are disrupted at low frequencies (see above; Figure 5.5d). The EGFR signaling pathway is a<br />

well documented dysregulated component of lung cancer. Using the same multi-dimensional<br />

profiling dataset from Figure 5.5 above, seven genes were detected with gene dosage alteration<br />

at a frequency ≥30%. However, when we considered alterations in gene dosage, allelic status,<br />

DNA methylation and somatic mutation collectively (for KRAS and EGFR only), 18 genes in the<br />

pathway were identified to be altered at ≥30% frequency (Figure 5.6). The detection <strong>of</strong> the<br />

additional 11 genes illustrates the benefit <strong>of</strong> employing an integrative approach and extends the<br />

sample size reduction argument to the pathway level.<br />

5.4.5 Identification <strong>of</strong> a novel gene involved with EGFR signaling deregulated in<br />

adenocarcinoma<br />

In the section above, I have shown that more <strong>of</strong> the well known components are frequently<br />

altered when we examine multiple DNA dimensions as opposed to a single DNA dimension,<br />

such as DNA copy number, alone. When this analysis was expanded to include more genes<br />

based on literature evidence, I found that the most frequently disrupted gene is signal-regulatory<br />

protein alpha (SIRPA) (Figure 5.7).<br />

SIRPA has been shown to be down-regulated when EGFR is activated in glioblastoma and up-<br />

regulated when EGFR is suppressed [238, 239]. SIRPA has also been shown to be a tumor<br />

suppressor gene in multiple cancer types including liver and breast cancer [240, 241].<br />

Moreover, in the resting lung, SIRPA has been thought to modulate the inflammatory response<br />

through SHP-1 (also known as PTPN6) and eventually, NFKB [242]. While most studies have<br />

documented the association <strong>of</strong> SIRPA with SHP-2 (also known as PTPN11) [243-245], few<br />

studies have shown the association <strong>of</strong> SIRPA with SHP-1.<br />

To discern the association <strong>of</strong> SIRPA with SHP-1 and SHP-2, mutual information network<br />

analysis was utilized [246, 247]. Briefly, using our gene expression dataset and a publicly<br />

available dataset [248], Affymetrix exon array datasets were normalized separately using the<br />

aroma.affymetrix package (Bengtsson et al 2008 Berkeley). Subsequently, each dataset was<br />

analyzed using the "minet" package in the Bioconductor suite in R [247, 249]. From these<br />

analyses, a score was calculated between each gene and every other gene. The<br />

top 5% of gene-gene interactions (based on the score) from each dataset were identified, and<br />

those interactions which were in the top 5% of both analyses were retained. Finally, gene-gene<br />

interactions involving SIRPA were extracted, resulting in a total of 310 genes found to correlate<br />

highly with SIRPA expression (Table 5.3). Within this list of genes, PTPN6 was present and<br />

PTPN11 was not, suggesting that SIRPA is more likely associated with PTPN6 than with PTPN11 in lung<br />

adenocarcinoma.<br />
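
The filtering steps described above (top 5% of gene-gene scores per dataset, intersection across datasets, extraction of SIRPA partners) can be sketched as follows. This is not the minet workflow itself: the pairwise scores are taken as given, the numeric values are invented toy data, and `top_fraction`, `partners` and `toy_scores` are hypothetical helpers. The toy example uses a 20% cutoff only because it contains ten gene pairs.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set

Edge = FrozenSet[str]  # an unordered gene pair

def top_fraction(scores: Dict[Edge, float], fraction: float = 0.05) -> Set[Edge]:
    """Keep the highest-scoring `fraction` of gene pairs (at least one)."""
    keep = max(1, int(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:keep])

def partners(gene: str, edges_a: Set[Edge], edges_b: Set[Edge]) -> Set[str]:
    """Genes paired with `gene` in the top edges of both datasets."""
    shared = edges_a & edges_b
    return {g for edge in shared if gene in edge for g in edge if g != gene}

genes = ["SIRPA", "PTPN6", "PTPN11", "GENE_X", "GENE_Y"]

def toy_scores(high: Dict[tuple, float]) -> Dict[Edge, float]:
    """Invented scores: a low baseline for every pair, raised for a few pairs."""
    scores = {frozenset(pair): 0.1 for pair in combinations(genes, 2)}
    scores.update({frozenset(pair): s for pair, s in high.items()})
    return scores

dataset_a = toy_scores({("SIRPA", "PTPN6"): 0.9, ("SIRPA", "PTPN11"): 0.8})
dataset_b = toy_scores({("SIRPA", "PTPN6"): 0.7, ("GENE_X", "GENE_Y"): 0.6})

top_a = top_fraction(dataset_a, fraction=0.2)
top_b = top_fraction(dataset_b, fraction=0.2)
sirpa_partners = partners("SIRPA", top_a, top_b)
print(sorted(sirpa_partners))
```

Requiring an interaction to rank highly in both datasets is what removes pairs that score well in only one of them, such as the SIRPA and PTPN11 pair in this toy input.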

5.4.6 Prevalence <strong>of</strong> SIRPA deregulation and association with clinical characteristics<br />

Given that the sample set we examined comprised only 10 samples, we then wanted to<br />

assess the expression in a larger panel <strong>of</strong> samples to validate the frequency <strong>of</strong> underexpression<br />

observed in the initial set. Using 59 lung adenocarcinoma and matched non-malignant sample<br />

pairs, the prevalence <strong>of</strong> SIRPA underexpression was assessed and the correlation <strong>of</strong> SIRPA<br />

and PTPN6 was re-evaluated. It was found that 47/59 pairs exhibited at least a 1.5-fold<br />

reduction <strong>of</strong> SIRPA in tumors as compared to matched non-malignant tissue, representing<br />

~80% <strong>of</strong> tumors assessed (Figure 5.8a). In addition, correlating SIRPA and PTPN6 expression<br />

using a Pearson correlation, a correlation coefficient <strong>of</strong> 0.907 was found (Figure 5.8b).<br />
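
The two summary statistics reported here, the fraction of pairs with at least a 1.5-fold reduction and the Pearson correlation coefficient, can be computed as in the sketch below. The expression values are invented and the function names are hypothetical; the sketch assumes linear-scale (not log-transformed), strictly positive expression values.

```python
from math import sqrt
from typing import Sequence

def fraction_underexpressed(tumor: Sequence[float], normal: Sequence[float],
                            fold: float = 1.5) -> float:
    """Fraction of matched pairs with normal/tumor >= `fold`."""
    pairs = list(zip(tumor, normal))
    down = sum(1 for t, n in pairs if n / t >= fold)
    return down / len(pairs)

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Invented expression values for four tumor / non-malignant pairs.
tumor = [100.0, 200.0, 400.0, 900.0]
normal = [300.0, 500.0, 450.0, 1000.0]
toy_fraction = fraction_underexpressed(tumor, normal)
print(toy_fraction)

# The prevalence reported in the text: 47 of 59 pairs.
reported = 47 / 59
print(round(reported, 3))
```

The reported figure of approximately 80% follows directly from 47/59.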

It should also be noted that there was a small number <strong>of</strong> samples which exhibited<br />

overexpression <strong>of</strong> SIRPA. This finding was somewhat unexpected given the high prevalence <strong>of</strong><br />

underexpression observed in the initial dataset. However, the initial ten tumors were from<br />

individuals who were former smokers, whereas the set of 59 tumors comprised 23 current<br />

smokers, 21 former smokers and 15 never smokers. When differential gene expression was<br />

stratified by smoking status, overexpression was observed only in a subset of never smokers<br />

and in none of the current or former smokers (Figure 5.8c).<br />

Finally, using publicly available microarray datasets with patient survival information [250-252], Kaplan-<br />

Meier analysis was performed on each of these datasets based on SIRPA expression levels.<br />

The association was deemed significant if the p-value was ≤ 0.05 based on a Mantel-<br />

Cox (log-rank) test. Two of the five datasets showed a statistically significant association<br />

between SIRPA expression levels and overall patient survival, with an additional two datasets<br />

showing a trend that did not reach significance (p-values ≤ 0.18) (Figure 5.9).<br />
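
For readers unfamiliar with the Mantel-Cox (log-rank) test, a minimal two-group implementation is sketched below. It is not the software used for Figure 5.9. It assumes that patients have already been split into two groups by SIRPA expression (for example at the median, a choice the text does not specify), and the survival times shown are invented. The function name `logrank_p` is hypothetical.

```python
from math import erfc, sqrt
from typing import List, Tuple

Obs = Tuple[float, bool]  # (follow-up time, True if death observed, False if censored)

def logrank_p(group_a: List[Obs], group_b: List[Obs]) -> float:
    """Two-sided p-value of the two-group log-rank (Mantel-Cox) test."""
    event_times = sorted({t for t, event in group_a + group_b if event})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for time, _ in group_a if time >= t)  # at risk in group A
        n_b = sum(1 for time, _ in group_b if time >= t)  # at risk in group B
        d_a = sum(1 for time, event in group_a if event and time == t)
        d_b = sum(1 for time, event in group_b if event and time == t)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the number of events in group A.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with one degree of freedom.
    return erfc(sqrt(chi2 / 2))

# Invented follow-up data (months, event flag) for two expression groups.
low_expression = [(1, True), (2, True), (3, True), (4, True), (5, True)]
high_expression = [(10, True), (11, True), (12, True), (13, True), (14, False)]
p_value = logrank_p(low_expression, high_expression)
print(p_value <= 0.05)
```

A dataset would be called significant under the criterion in the text when the returned p-value is at most 0.05.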

5.5 Tracking clonal expansion in spatial dimensions<br />

Delineating the clonal relationship between multiple tumors in the same patient is relevant not<br />

only to clinical management <strong>of</strong> disease but also to the understanding <strong>of</strong> metastasis. Multiple<br />

tumors in the same patient may not necessarily share an identical genomic pr<strong>of</strong>ile. The<br />

similarities and differences in genomic landscape between tumors are quantifiable and therefore<br />

can be used for delineating relatedness. Whole genome comparison based on array CGH<br />

pr<strong>of</strong>iles is a new tool for distinguishing metastatic from primary synchronous carcinomas. A<br />

multitude <strong>of</strong> genomic features, for example the boundaries <strong>of</strong> segmental deletions, are used to<br />

delineate the presence and the sequence <strong>of</strong> events in clonal evolution [253-261].<br />
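
As an illustration of how shared segment boundaries can be quantified, the sketch below computes a Jaccard-style similarity between the breakpoints of two tumor profiles. This is a simplified stand-in, not the method of the cited studies: the coordinates are invented, the function names are hypothetical, and breakpoints are matched by binning positions at an assumed array resolution, so two boundaries that fall on either side of a bin edge will not match.

```python
from typing import Iterable, Set, Tuple

Breakpoint = Tuple[str, int]  # (chromosome, base-pair position)

def binned(breakpoints: Iterable[Breakpoint], resolution: int) -> Set[Tuple[str, int]]:
    """Map each breakpoint to a (chromosome, bin index) pair."""
    return {(chrom, pos // resolution) for chrom, pos in breakpoints}

def breakpoint_similarity(a: Iterable[Breakpoint], b: Iterable[Breakpoint],
                          resolution: int = 100_000) -> float:
    """Shared breakpoint bins divided by all breakpoint bins (Jaccard index)."""
    bins_a, bins_b = binned(a, resolution), binned(b, resolution)
    if not bins_a and not bins_b:
        return 0.0
    return len(bins_a & bins_b) / len(bins_a | bins_b)

# Invented deletion boundaries for two tumors from the same patient.
tumor_1 = [("3", 60_250_000), ("9", 21_900_000), ("17", 7_500_000)]
tumor_2 = [("3", 60_280_000), ("9", 21_950_000), ("5", 1_200_000)]
similarity = breakpoint_similarity(tumor_1, tumor_2)
print(similarity)
```

A high proportion of shared boundaries would be more consistent with a common clonal origin, whereas independent primary tumors would be expected to share few, if any, exact boundaries.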

Furthermore, signature genetic alterations can be used to track clonality in a cell population,<br />

putting genetic events in the context <strong>of</strong> tumor tissue architecture. By assessing the appearance<br />

<strong>of</strong> pre-selected markers in individual nuclei on a tissue section by FISH, the clustering and the<br />

expansion <strong>of</strong> clonally related cells can be delineated by analyzing the marker patterns <strong>of</strong><br />

neighboring cells (Figure 5.10).<br />

5.6 Evaluating the biological significance <strong>of</strong> integrative genomics<br />

findings<br />

The utilization <strong>of</strong> an integrative genomic, epigenomic and transcriptomic approach will<br />

undoubtedly improve our ability to identify gene disruptions and their effects on gene<br />

expression. The next challenge is to develop approaches for the determination <strong>of</strong> functional<br />

and phenotypic evidence <strong>of</strong> the biological relevance <strong>of</strong> such gene disruptions in a high<br />

throughput manner -- for example, functional genomic screens by RNAi, proteomic pr<strong>of</strong>iling and<br />

metabolite pr<strong>of</strong>iling. Forced expression <strong>of</strong> genes and RNAi knockdown <strong>of</strong> gene expression are<br />

commonly used methods for assessing growth and invasion phenotypes in cell models.<br />

Genome wide RNAi screens, comprising large libraries of short hairpin RNA sequences<br />

redundantly targeting thousands <strong>of</strong> genes, have been used to identify genes essential to<br />

tumorigenesis, including tumor suppressor genes as well as cooperative genes with oncogenic<br />

mutation in several malignancies [22, 28, 29, 262-270]. Animal models are also instrumental to<br />

functional validation <strong>of</strong> genes singly or in combination, but this topic is beyond the scope <strong>of</strong> this<br />

article. Cross referencing genomic findings with proteomic pr<strong>of</strong>iles will determine the functional<br />

consequences yielding information on expression levels, post-translational modification, and<br />

protein-protein interactions [271-275]. As recent studies have highlighted the importance <strong>of</strong> the<br />

metabolome in cancer, the genomic landscape can also be integrated with metabolome pr<strong>of</strong>iles<br />

to determine the role <strong>of</strong> genetic and epigenetic alterations in cellular physiology relevant to<br />

cancer development [276-278].<br />

The progress made in the development <strong>of</strong> technologies and approaches to analyze the<br />

genome, epigenome, and transcriptome has allowed for much improved understanding of<br />

cancer landscapes. With the increased application <strong>of</strong> sequence based approaches to analyze<br />

genetic and epigenetic dimensions and the additional complexity with the proteome and<br />

metabolome to follow, an unprecedented definition <strong>of</strong> the cancer cell can be achieved. The next<br />

key challenge will be the synthesis <strong>of</strong> this information to better understand fundamental cancer<br />

processes such as progression, metastasis and drug resistance.<br />

Figure 5.1<br />

[Timeline figure. Year markers shown: 2009, 2007, 2006, 2005, 2004, 2002. Events, listed from most recent to earliest:]<br />

DNA nanoballs sequencing technology [3]<br />
Breast, lung & skin cancer genomes sequenced [4-6,11]<br />
Human methylomes sequenced [12]<br />
Acute myeloid leukemia genome sequenced [13,14]<br />
International Cancer Genome Consortium initiated<br />
1000 genomes project launched [15]<br />
Integrative study of glioblastoma [16]<br />
Genome RNAi database established [9,10]<br />
2nd generation human haplotype map with >3M SNPs [17]<br />
Next generation, massively parallel sequencing technologies<br />
Copy number variation maps [18,19]<br />
5-Azacytidine re-expression of methylated cancer genes [20]<br />
Exome sequencing mutation detection [21]<br />
The RNAi Consortium (TRC) [22]<br />
Bead Arrays for bisulfite DNA methylation [23]<br />
NIH Cancer Genome Atlas (CGA) initiated<br />
Methylome map by MeDNA immunoprecipitation [24]<br />
First human genome haplotype map [25]<br />
MicroRNA expression profiles classify cancers [26]<br />
Catalogue of somatic mutations in cancer (COSMIC) [27]<br />
Large scale RNAi-based screens [28,29]<br />
Cancer Gene Census published [30]<br />
Large scale copy number variation in humans [31,32]<br />
Whole genome tiling path CGH microarrays [33]<br />
Tiling path analysis of human transcribed sequences [34]<br />
Cancer Biomedical Informatics Grid (caBIG) launched [8]<br />
The Ensembl genome database project [35]<br />
The human genome browser at UCSC [36]<br />
BeadArray genotyping platforms [37]<br />
Concept of oncogene addiction [38,39]<br />
First human genome sequences [40,41]<br />
CGAP launched [42,43]<br />

Figure 5.1. Advances in cancer genomic landscape post Y2K. The timeframe of each event<br />

is estimated based on time of publication.<br />

Figure 5.2<br />

[Figure: panels (a) Neutral, (b) Gain, (c) UPD; each panel plots total copy number and allele-specific copy number; y-axis: # of copies]<br />

Figure 5.2. SNP array analysis to identify areas <strong>of</strong> altered copy number and allelic<br />

composition in a clinical lung cancer specimen. Shown here are (a) a region that is copy<br />

neutral with no observed allelic imbalance and regions containing a (b) segmental gain and<br />

(c) UPD. Examining the allele specific copy number plot, the gain (in b) is likely a single<br />

copy change and the UPD event (in c) is signified by the shift in allele levels while maintaining<br />

total copy number neutral status.


Figure 5.3<br />

[Figure: whole-genome overlay across chromosomes 1-22 with the TP53 and PIK3CA loci marked; legend: Gain, Loss, UPD]<br />

Figure 5.3. Overlay <strong>of</strong> chromosomal regions <strong>of</strong> gain, loss and UPD (copy number<br />

neutral LOH) inherent to the T47D breast cancer cell line. The chromosomal loci for<br />

PIK3CA and TP53 (modified by activating and inactivating mutations, respectively, in this cell<br />

line), are indicated. The majority <strong>of</strong> the genome is affected by any one <strong>of</strong> the three genomic<br />

alterations. Raw SNP 6.0 array data was obtained from the Sanger database with mutation<br />

status obtained from the COSMIC database [132]. Copy number and allelic status changes<br />

were determined using Partek Genomics Suite and reference genomes used were 72<br />

individuals from the HapMap collection. Data was visualized using the SIGMA2 s<strong>of</strong>tware<br />

[7].


Figure 5.4. Integration <strong>of</strong> copy number, allelic status, DNA methylation, and gene<br />

expression for a single lung adenocarcinoma sample. (a) Copy number and (b) allele<br />

status analyses revealed a high level allele-specific DNA amplification (highlighted in yellow,<br />

image generated with Partek Genomics Suite); (c) individual CpG loci within this region were<br />

assessed for differential methylation between tumor and non-malignant tissue.<br />

Hypomethylation at the indicated CpG locus, which corresponds to the MUC1 gene, is observed<br />

(visualized with Genesis). (d) Expression analysis revealed four-fold overexpression <strong>of</strong> the<br />

MUC1 transcript when a tumor sample was compared to matched, adjacent non-malignant<br />

tissue. Copy number and allele status pr<strong>of</strong>iling was performed using the Affymetrix SNP 6.0<br />

array; DNA methylation pr<strong>of</strong>iling using the Illumina Infinium HM27 platform; and gene<br />

expression using the Affymetrix Human Exon 1.0 ST array.<br />

Figure 5.4<br />

[Figure: panels (a) total copy number: amplification; (b) allele-specific copy number; (c) DNA hypomethylation, normal vs. tumor; (d) MUC1 overexpression, relative expression in normal vs. tumor; y-axis for (a) and (b): # of copies]<br />


Figure 5.5. Enhanced analysis of the cancer phenotype using an integrative and multi-dimensional approach. (a) On average, a higher proportion of differential gene expression can be associated with genomic alterations when examining multiple DNA dimensions than when examining any single dimension. (b) Using a fixed frequency threshold of 50%, more genes are revealed to be frequently disrupted when multiple mechanisms of genomic alteration (e.g. altered copy number, DNA methylation, or copy number neutral LOH) are considered (more than 1000 genes versus ~200 genes). (c) Pathway analysis performed using gene lists derived from a multi-dimensional approach identifies more aberrant pathways than analysis using gene lists from a uni-dimensional approach. (d) Functional pathways identified using the integrated gene list are of relatively high significance; the top 10 such pathways are shown. This suggests that the additional identified genes associate with specific pathways rather than with random functions. The four bars represent, from left to right: all dimensions, copy number, DNA methylation, and UPD. Ingenuity Pathway Analysis was used for the analyses in (c) and (d). (e) Example of two genes that are missed when a single DNA dimension is studied, but captured when multiple DNA dimensions are examined. Both ribonucleotide reductase M2 (RRM2) [279, 280] and retinoic acid receptor responder (tazarotene induced) 2 (RARRES2) [281, 282] are known to be deregulated in multiple cancer types.
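The comparison summarized in panel (a) can be illustrated with a minimal sketch. It is not the pipeline used to generate the figure: the function name, the dimension keys and the gene labels are hypothetical, and each input set is assumed to already contain only genes whose DNA-level alteration is concordant with the direction of the expression change.

```python
from typing import Dict, Set


def explained_fraction(de_genes: Set[str],
                       altered: Dict[str, Set[str]]) -> Dict[str, float]:
    """Fraction of differentially expressed genes explained per DNA dimension.

    `altered[dim]` is assumed to hold only genes whose alteration in that
    dimension is concordant with the direction of the expression change.
    """
    if not de_genes:
        raise ValueError("de_genes must not be empty")
    fractions = {dim: len(de_genes & genes) / len(de_genes)
                 for dim, genes in altered.items()}
    # A gene counts once in the combined view if any dimension explains it.
    combined: Set[str] = set()
    for genes in altered.values():
        combined |= genes
    fractions["all_dimensions"] = len(de_genes & combined) / len(de_genes)
    return fractions


# Hypothetical single-sample input; gene labels are placeholders.
de_genes = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
altered = {
    "copy_number": {"GENE_A"},
    "dna_methylation": {"GENE_A", "GENE_B"},
    "cnn_loh": {"GENE_C"},
}
fractions = explained_fraction(de_genes, altered)
```

Because the combined view takes the union of the per-dimension gene sets, its fraction can never be lower than that of any single dimension, which is the pattern panel (a) reports on average.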

[Figure 5.5 image. Panels: (a) proportion of differentially expressed genes for Samples 1 to 10 and their average, shown for copy number, DNA methylation, copy number neutral LOH (CNNLOH), and all dimensions combined; (b) # of genes identified for CNNLOH, copy number, DNA methylation, and all dimensions; (c) # of significant pathways for copy number, DNA methylation, CNNLOH, and all dimensions; (d) -log(p-value) for Hepatic Fibrosis / Hepatic Stellate Cell Activation, RAR Activation, Macropinocytosis, Complement System, Leukocyte Extravasation Signaling, Reelin Signaling in Neurons, Oncostatin M Signaling, IL-8 Signaling, Acute Phase Response Signaling, and CXCR4 Signaling; (e) frequency of disruption (%) of RARRES2 and RRM2 for copy number, DNA methylation, CNNLOH, and all dimensions.]


[Figure 5.6 image. EGFR signaling pathway diagram. Nodes: EGF, TGFA, EGFR, ERBB2, MUC1, SHC1, GRB2, SOS2, RRAS, KRAS, RAF1, MAP2K1, MAPK1, DUSP4, PLCG1, IP3, DAG, ITPR1, ER, Ca2+, PRKCA, PIK3R1, PIP2, PIP3, PDK1, AKT1, AKT2, FOXO3, BAD, CASP9, RASSF1, RASSF5, MST1, MYC, CCND1. Outcomes: Proliferation & Differentiation, Proliferation, Cell Cycle, Apoptosis.]

Figure 5.6. Identification of multiple disrupted components in a biological pathway. Integrative analysis identifies more genes affected in the EGFR signaling pathway than single-dimensional analysis alone. In this example, multi-dimensional profiling data were generated from ten lung adenocarcinomas and their paired non-cancerous lung tissue. Analysis of DNA copy number (gene dosage) alterations that affected expression identified 7 genes (in green) that are disrupted at ≥ 30% frequency. However, when alterations in copy number, DNA methylation, sequence mutation and/or copy neutral LOH were considered, 17 genes disrupted at ≥ 30% frequency were found to be associated with a change in expression, with an additional gene, KRAS, harboring frequent mutation. The 11 additional genes are indicated in red. Genes in gray did not meet the frequency criterion and are therefore not considered significant in this dataset.


[Figure 5.7 image. EGFR signaling pathway diagram in which each gene is annotated with the samples (S1 to S10) in which it is altered. Legend: GE, gene expression (over or under); CN, DNA copy number (gain or loss); L, allelic status (LOH); M, DNA methylation (hypo or hyper).]

Figure 5.7. Multi-dimensional analysis of the epidermal growth factor receptor signaling pathway. Integrative analysis identifies more genes affected in the EGFR signaling pathway than single-dimensional analysis alone. In this example, multi-dimensional profiling data were generated from ten lung adenocarcinomas and their paired non-cancerous lung tissue. Analysis of DNA copy number (gene dosage) alterations that affected expression identified 7 genes (in green) that are disrupted at ≥ 30% frequency. However, when alterations in copy number, DNA methylation, sequence mutation and/or copy neutral LOH were considered, 17 genes disrupted at ≥ 30% frequency were found to be associated with a change in expression, with an additional gene, KRAS, harboring frequent mutation. The 11 additional genes are indicated in red. Genes in gray did not meet the frequency criterion and are therefore not considered significant in this dataset. Genome profiles were generated using the Affymetrix SNP 6.0 platform, DNA methylation data were generated using the Illumina Infinium HM27 platform, and gene expression profiles were generated using the Affymetrix Exon Array.
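The ≥ 30% frequency criterion described in this caption can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the analysis code used for the figure: the function name, the dimension keys, and the gene and sample labels are hypothetical, and every recorded call is assumed to be a DNA-level alteration already matched to a concordant expression change in that sample.

```python
from typing import Dict, List, Set

# calls[gene][dimension] -> IDs of samples carrying an alteration in that
# dimension that is accompanied by a concordant expression change.
Calls = Dict[str, Dict[str, Set[str]]]


def frequently_disrupted(calls: Calls, dimensions: List[str],
                         n_samples: int, min_freq: float = 0.30) -> Set[str]:
    """Genes disrupted in at least `min_freq` of samples via the given dimensions."""
    hits: Set[str] = set()
    for gene, by_dimension in calls.items():
        samples: Set[str] = set()
        for dim in dimensions:
            # A sample is counted once, whichever mechanism disrupts the gene.
            samples |= by_dimension.get(dim, set())
        if len(samples) / n_samples >= min_freq:
            hits.add(gene)
    return hits


# Hypothetical calls for a ten-sample cohort.
calls: Calls = {
    "GENE_A": {"copy_number": {"S1", "S2", "S3"}},
    "GENE_B": {"copy_number": {"S1"}, "methylation": {"S2", "S3"}},
    "GENE_C": {"copy_number": {"S1"}},
}
single = frequently_disrupted(calls, ["copy_number"], n_samples=10)
multi = frequently_disrupted(calls, ["copy_number", "methylation", "loh"],
                             n_samples=10)
```

In this toy input, GENE_B passes the threshold only when copy number and methylation calls are pooled, which mirrors how the multi-dimensional view adds genes that copy number analysis alone misses.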



[Figure 5.8 image. Panels: (a) log2 fold change (T vs. N) for SIRPA and PTPN6, with threshold lines; (b) SIRPA versus PTPN6 fold changes, r = 0.907426; (c) % of underexpressing cases and % of overexpressing cases for SIRPA and PTPN6 in CS, FS and NS.]

Figure 5.8. Prevalence of SIRPA underexpression and its relationship with PTPN6 and smoking status. (a) Analysis of SIRPA and PTPN6 expression in 59 lung adenocarcinoma tumor/non-malignant pairs using quantitative PCR. Plotted are the log2 fold changes of each tumor versus its matched non-malignant sample. PCR data were normalized to beta-actin, and all samples were run in triplicate. Threshold lines denote a 1.5-fold change. (b) Pairwise comparison of SIRPA and PTPN6 fold changes in the 59 sample pairs. The Spearman correlation coefficient was calculated. (c) Stratification of qPCR results by smoking status. While the majority of current smokers (CS, n=22) and former smokers (FS, n=22) show underexpression, a subset of never smokers (NS, n=15) exhibit overexpression.
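The quantities named in this caption (log2 fold change, a 1.5-fold threshold, and the Spearman correlation) can be written out in a short sketch. The function names and example values are hypothetical, the inputs are assumed to be expression values already normalized to the reference gene, and treating the threshold as inclusive is an assumption, since the caption does not say.

```python
import math
from typing import List


def log2_fold_change(tumor: float, normal: float) -> float:
    """log2 ratio of normalized tumor expression to its matched normal."""
    return math.log2(tumor / normal)


def call_expression(log2_fc: float, fold: float = 1.5) -> str:
    """Classify a pair as over, under or unchanged at a given fold threshold."""
    threshold = math.log2(fold)
    if log2_fc >= threshold:
        return "over"
    if log2_fc <= -threshold:
        return "under"
    return "unchanged"


def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; assumes neither input is constant."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman correlation computed as the Pearson correlation of ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical log2 fold changes for four tumor/normal pairs.
gene_1 = [-2.0, -1.0, 0.2, 1.0]
gene_2 = [-1.5, -0.8, 0.1, 0.9]
rho = spearman(gene_1, gene_2)
```

Working on the log2 scale makes the threshold symmetric: a 1.5-fold increase and a 1.5-fold decrease sit at the same distance from zero.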



[Figure 5.9 image. Four Kaplan-Meier panels (Duke, H. Lee Moffitt, MSKCC, Michigan) plotting survival ratio against time for the low-expression and high-expression groups. Annotated p-values: 0.009 and 0.009 (Duke and H. Lee Moffitt); 0.150 and 0.180 (MSKCC and Michigan).]

Figure 5.9. Kaplan-Meier analysis of SIRPA in four independent microarray datasets. Using publicly available gene expression microarray data, Kaplan-Meier analysis was performed to assess the association between SIRPA expression levels and overall patient survival. Briefly, for each dataset, samples were sorted by ascending SIRPA expression, and the survival distributions of the top third and the bottom third of samples were compared. In total, five datasets were tested, with two of the datasets (Duke, H. Lee Moffitt) showing a statistically significant association. In an additional two datasets (MSKCC, Michigan), the association did not reach statistical significance (p = 0.150 and p = 0.180). All expression data were normalized using RMA. P-values were calculated using a Mantel-Cox log-rank test.
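The tertile split and two-group log-rank comparison described in this caption can be sketched in plain Python. This is a minimal illustration, not the software used for the figure: the function names and the toy follow-up data are hypothetical, and each sample is assumed to be a tuple of expression value, follow-up time, and a flag that is true when death was observed and false when the patient was censored.

```python
import math
from typing import List, Tuple

# (expression, follow-up time, event observed?); False means censored.
Sample = Tuple[float, float, bool]


def split_tertiles(samples: List[Sample]) -> Tuple[List[Sample], List[Sample]]:
    """Return the bottom third and top third of samples by expression."""
    ordered = sorted(samples, key=lambda s: s[0])
    k = len(ordered) // 3
    if k == 0:
        raise ValueError("need at least three samples")
    return ordered[:k], ordered[-k:]


def logrank_p(group_a: List[Sample], group_b: List[Sample]) -> float:
    """Two-group log-rank (Mantel-Cox) p-value, chi-square with 1 df."""
    data = ([(t, e, 0) for _, t, e in group_a] +
            [(t, e, 1) for _, t, e in group_b])
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({time for time, event, _ in data if event}):
        risk_a = sum(1 for time, _, g in data if time >= t and g == 0)
        risk_b = sum(1 for time, _, g in data if time >= t and g == 1)
        n = risk_a + risk_b
        deaths = sum(1 for time, event, _ in data if time == t and event)
        deaths_a = sum(1 for time, event, g in data
                       if time == t and event and g == 0)
        observed_minus_expected += deaths_a - deaths * risk_a / n
        if n > 1:
            variance += (deaths * (risk_a / n) * (risk_b / n)
                         * (n - deaths) / (n - 1))
    if variance == 0.0:
        return 1.0
    chi_square = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square variable with one degree of freedom.
    return math.erfc(math.sqrt(chi_square / 2.0))


# Hypothetical cohort of 15 patients: low expressers die early, high late.
cohort: List[Sample] = (
    [(float(i), float(i), True) for i in range(1, 6)] +
    [(float(i), 7.0, False) for i in range(6, 11)] +
    [(float(i), float(i - 1), True) for i in range(11, 16)]
)
low, high = split_tertiles(cohort)
p_value = logrank_p(low, high)
```

Comparing only the outer thirds discards the middle of the expression distribution, which sharpens the contrast between groups at the cost of using fewer samples in the test.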




Figure 5.10. Automated detection of selected clonal populations of cells within a cancer biopsy tissue section. All nuclei (~150,000 in this example) are detected and FISH probe signal counts are enumerated for each nucleus. The FISH signal pattern of each cell is compared against those of its neighbors in order to define spatial association (or neighborhood). A mathematical model is then applied to determine clonal cell relationships. (a) Mapping cancer cells on a tissue section. A gain or loss of any one of three FISH markers indicates a cancer cell. This image shows the density of cancer cells (so defined) in neighborhoods as a color overlay. Red indicates a high fraction of cancer cells, yellow a medium fraction, and blue a low fraction or none (see scale bar). Most of the section is highlighted except for the surrounding normal stromal infiltrates. (b) Mapping clonal cells. The same image data were analyzed for concurrent gains of all three markers. The two clusters of cells magnified within the white boxes harbor gains of all three markers.
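The two cell definitions in this caption, and the idea of a neighborhood fraction, can be illustrated with a small sketch. The caption does not describe the mathematical model used to determine clonal relationships, so the fixed-radius neighborhood below is a stand-in assumption, as are the function names, the normal count of two signals per probe, and the toy coordinates.

```python
import math
from typing import Callable, List, Tuple

Counts = Tuple[int, int, int]          # signal counts for three FISH probes
Nucleus = Tuple[float, float, Counts]  # x, y, per-probe signal counts


def is_cancer_cell(counts: Counts, normal: int = 2) -> bool:
    """Gain or loss of any one marker relative to the assumed normal count."""
    return any(c != normal for c in counts)


def has_all_gains(counts: Counts, normal: int = 2) -> bool:
    """Concurrent gain of all three markers."""
    return all(c > normal for c in counts)


def neighborhood_fraction(nuclei: List[Nucleus], index: int, radius: float,
                          predicate: Callable[[Counts], bool]) -> float:
    """Fraction of nuclei within `radius` of nucleus `index` meeting `predicate`.

    The nucleus itself is included, so the neighborhood is never empty.
    """
    x0, y0, _ = nuclei[index]
    neighbors = [n for n in nuclei
                 if math.hypot(n[0] - x0, n[1] - y0) <= radius]
    return sum(1 for n in neighbors if predicate(n[2])) / len(neighbors)


# Hypothetical nuclei: three close together, one far away.
nuclei: List[Nucleus] = [
    (0.0, 0.0, (2, 2, 2)),
    (1.0, 0.0, (3, 2, 2)),
    (0.0, 1.0, (3, 3, 3)),
    (10.0, 10.0, (2, 2, 2)),
]
cancer_density = neighborhood_fraction(nuclei, 0, 2.0, is_cancer_cell)
clonal_density = neighborhood_fraction(nuclei, 0, 2.0, has_all_gains)
```

This brute-force scan compares every nucleus with every other one; for a section with on the order of 150,000 nuclei, a spatial index such as a grid or k-d tree would be the more practical choice.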



Table 5.1. List of software for integrative analysis

| Software | Source: Commercial (C) or Academic (A) | Genome / Epigenome / Transcriptome / Integrative | Citation | Website (http://www.) |
| --- | --- | --- | --- | --- |
| Agilent Genomic Workbench 5.0 | C | X X X X | N/A | chem.agilent.com/enus/products/instruments/dnamicroarrays/dnaanalyticssoftware/pages/default.aspx |
| SIGMA2 | A | X X X X | [7] | flintbox.com/technology.asp?page=3716 |
| Integrative Genomics Viewer | A | X X X | N/A | broadinstitute.org/igv/ |
| Nexus Copy Number | C | X X X | N/A | biodiscovery.com/index/nexus |
| CGH Fusion | C | X X | N/A | infoquant.com/index/cghfusion |
| ISA-CGH | A | X X X | [283] | isacgh.bioinfo.cipf.es |
| VAMP | A | X X X X | [284] | bioinfo-out.curie.fr/projects/vamp/ |
| Partek Genomics Suite | C | X X X X | N/A | partek.com/partekgs |


Table 5.2. List of genomic resources and databases

| Name | Description | Citation | Website (http://www.) |
| --- | --- | --- | --- |
| ArrayExpress Gene Expression Atlas | Gene expression analysis of public datasets | [285] | ebi.ac.uk/gxa |
| BioDrugScreen | Protein/Small molecule interaction database | [286] | biodrugscreen.org |
| Catalogue of Somatic Mutations in Cancer (COSMIC) | Listing of somatic mutations in cancer | [132] | sanger.ac.uk/cosmic |
| Cancer Gene Expression Database (CGED) | Gene expression analysis of cancer | [287] | cged.hgc.jp |
| Database of Differentially Expressed Proteins in human Cancers (dbDEPC) | Differentially expressed proteins in cancer | [288] | dbdepc.biosino.org/index |
| Database of Genomic Variants | Reported normal copy number variations | [31] | projects.tcag.ca/variation |
| European Bioinformatics Institute (EBI) | Integrated database of multiple biological resources | [289] | ebi.ac.uk |
| GeneCards | Integrated database of multiple biological resources | [290] | genecards.org |
| GenomeRNAi | RNAi experiment results | [10] | rnai2.dkfz.de/GenomeRNAi |
| Human DNA Methylome | Whole genome methylation sequences of multiple individuals | [12] | neomorph.salk.edu/human_methylome |
| Human Histone Modification Database (HHMD) | Histone modification database | [291] | bioinfo.hrbmu.edu.cn/hhmd |
| microRNA.org | Annotated microRNAs and their targets | [292] | microRNA.org |
| miR2Disease | Deregulated microRNAs in cancer | [293] | miR2Disease.org |
| miRDB | Annotated microRNAs and their targets | [227] | mirdb.org |
| miRGen | Annotated microRNAs and their targets | [294] | diana.cslab.ece.ntua.gr/mirgen |
| National Center for Biotechnology Information (NCBI) | Integrated database of multiple biological resources | [295] | ncbi.nlm.nih.gov |
| NCBI Cancer Chromosomes | Curated cytogenetic alterations in cancer | [295] | ncbi.nlm.nih.gov/sites/entrez?db=cancerchromosomes |
| NCBI GEO Profiles | Gene expression analysis of public datasets | [296] | ncbi.nlm.nih.gov/sites/entrez?db=geo |
| Oncomine | Gene expression analysis of public datasets | [297] | oncomine.org |
| PROGENETIX | Copy number aberrations in cancer by CGH | [298] | progenetix.net |
| PRoteomics IDentifications Database (PRIDE) | Mass spectrometry results | [299] | ebi.ac.uk/pride |
| Sanger CGP LOH And Copy Number Analysis | Copy number and LOH profiles of cancer cell lines | - | sanger.ac.uk/cgi-bin/genetics/CGP/cghviewer/CghHome.cgi |
| siRecords | RNAi experiment results | [300] | siRecords.umn.edu/siRecords |
| System for Integrative Genomic Microarray Analysis (SIGMA) | Array CGH profiles of cancer cell lines | [301] | sigma.bccrc.ca |
| The Cancer Genome Anatomy Project (CGAP) | Gene expression analysis of cancer | [43] | cgap.nci.nih.gov/ |
| The Cancer Genome Atlas (TCGA) | Multi-dimensional description of cancer genomes | [16] | cancergenome.nih.gov/dataportal/data/about/ |
| UCSC Genome Browser | Integrated database of multiple biological resources | [302] | genome.ucsc.edu/cgi-bin/hgNear |


Table 5.3. Genes interacting with SIRPA as identified by network analysis

| Gene | Gene | Gene | Gene | Gene | Gene | Gene | Gene |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ABCG1 | C5AR1 | DOK2 | GPR65 | LILRA1 | NTNG1 | RECK | STK10 |
| ABI3BP | C7orf44 | DPEP2 | GPR85 | LILRA5 | NTRK3 | RHOG | STK33 |
| ACOT1 | CANT1 | DSE | GPX3 | LILRB1 | NUP62CL | RHOJ | STX11 |
| ACP5 | CCND2 | EMP3 | GSPT2 | LILRB2 | OGN | RNASEK | SULF1 |
| ACSL4 | CD14 | EMR1 | GTDC1 | LILRB5 | OLFML1 | RTN1 | TACSTD1 |
| ACVRL1 | CD163 | ETS1 | GYPC | LOC440295 | OR1J1 | RUNX1T1 | TARP |
| ACY1 | CD300C | EVI2B | HCK | LOXL2 | PARVB | SAMHD1 | TARS2 |
| ADAMTSL4 | CD300LF | FAAH2 | HERPUD1 | LPAR1 | PCDH15 | SELPLG | TCEAL2 |
| ADARB1 | CD33 | FAM107A | HIST2H4A | LPIN1 | PDCD1LG2 | SH2B3 | TCF21 |
| ADC | CD34 | FAM65A | HSD11B1 | LPXN | PDE3B | SIGLEC7 | TDRKH |
| ADCY4 | CD4 | FBLN5 | HSPB7 | LRCH2 | PHEX | SIP1 | TFE3 |
| ADCY7 | CD53 | FBXL17 | HVCN1 | LRRC25 | PHKA1 | SIRPB1 | TGFBR3 |
| ADPRH | CD86 | FBXL2 | IFI30 | LRRC33 | PIK3AP1 | SLA | TLN1 |
| ADRA1A | CD93 | FCER1G | IGSF10 | LRRC37A | PIK3R5 | SLC15A3 | TLR4 |
| ADRBK1 | CD97 | FCGR1A | IGSF2 | LRRC8C | PILRA | SLC16A2 | TLR8 |
| AGER | CDH1 | FCGR3A | IL17RA | LST1 | PLEK | SLC22A25 | TM6SF1 |
| AKR7A3 | CDKL3 | FERMT3 | IL8RA | LTBP2 | PLEKHA8 | SLC25A10 | TMED3 |
| AKT1 | CFD | FGD2 | IRF6 | MAF | PLEKHO2 | SLC25A29 | TMEM183B |
| ALS2CR12 | CFP | FGF2 | IRF8 | MAN1C1 | PMP22 | SLC2A9 | TMEM184A |
| ANGPTL1 | CLEC4E | FGFR4 | ITGA5 | MAP3K3 | PNPLA6 | SLC31A2 | TMEM47 |
| ANKRD36 | CLN3 | FGL2 | ITGAL | MARCH1 | PPAP2B | SLC7A11 | TMTC1 |
| AOC3 | CMKLR1 | FGR | ITPR3 | MARCO | PPM1F | SLC7A7 | TNFRSF1B |
| ARHGAP30 | COASY | FHL1 | ITPRIP | MCOLN1 | PPP1R14B | SLC8A1 | TPK1 |
| ARRB2 | COG1 | FIBIN | JAM2 | MFNG | PRCP | SLCO2B1 | TPRG1 |
| ATP6V1B2 | COMMD3 | FIGF | JUNB | MMP19 | PREX1 | SLFN13 | TRPV2 |
| BAALC | CPA3 | FLI1 | KCNJ5 | MORC2 | PROM1 | SMARCA2 | TSPAN18 |
| BHLHB3 | CPVL | FLJ22662 | KCNK1 | MRAS | PRUNE2 | SMYD3 | TTC13 |
| BRCC3 | CSF1R | FPR1 | KCTD12 | MRC2 | PTGDS | SNN | TTC30B |
| BTK | CSF2RB | FRMD4A | KIF26B | MS4A15 | PTGER4 | SORBS1 | TYROBP |
| BVES | CSGALNACT1 | FXYD6 | KLF13 | MSRB3 | PTGIS | SORD | TYW1 |
| BZW2 | CYP4Z1 | GFRA2 | KLF4 | MTMR10 | PTPN6 | SPARCL1 | USP48 |
| C10orf54 | CYTH4 | GIMAP1 | KMO | MYCT1 | PTPRG | SPI1 | VAMP7 |
| C10orf72 | CYYR1 | GIMAP4 | LAIR1 | MYD88 | PVRL4 | SPN | VASH1 |
| C14orf49 | DAB2 | GIMAP5 | LAMB3 | NCF1 | QKI | SPOCK2 | VAT1 |
| C1orf38 | DNAH7 | GIMAP6 | LAMC2 | NCKAP1L | QSER1 | SPON1 | VSIG4 |
| C1QA | DNAJB4 | GIMAP8 | LAPTM5 | NEK4 | RAD54B | SRGN | VWF |
| C1QB | DNASE1L3 | GLIPR2 | LAT2 | NEXN | RASSF2 | SRPX | WDR60 |
| C1QC | DOCK11 | GMFG | LCP1 | NLRC4 | RBM17 | STAP2 | |
| C1RL | DOCK8 | GNAI2 | LDB2 | NTAN1 | RBM35A | STAT5A | |


5.5 References<br />

1. Pinkel D, Segraves R, Sudar D, Clark S, Poole I, Kowbel D, Collins C, Kuo WL, Chen C, Zhai Y et al: High resolution analysis of DNA copy number variation using comparative genomic hybridization to microarrays. Nat Genet 1998, 20(2):207-211.

2. Schrock E, du Manoir S, Veldman T, Schoell B, Wienberg J, Ferguson-Smith MA, Ning Y, Ledbetter DH, Bar-Am I, Soenksen D et al: Multicolor spectral karyotyping of human chromosomes. Science 1996, 273(5274):494-497.

3. Drmanac R, Sparks AB, Callow MJ, Halpern AL, Burns NL, Kermani BG, Carnevali P, Nazarenko I, Nilsen GB, Yeung G et al: Human Genome Sequencing Using Unchained Base Reads on Self-Assembling DNA Nanoarrays. Science 2009.

4. Pleasance ED, Cheetham RK, Stephens PJ, McBride DJ, Humphray SJ, Greenman CD, Varela I, Lin ML, Ordonez GR, Bignell GR et al: A comprehensive catalogue of somatic mutations from a human cancer genome. Nature 2009.

5. Pleasance ED, Stephens PJ, O'Meara S, McBride DJ, Meynert A, Jones D, Lin ML, Beare D, Lau KW, Greenman C et al: A small-cell lung cancer genome with complex signatures of tobacco exposure. Nature 2009.

6. Stephens PJ, McBride DJ, Lin ML, Varela I, Pleasance ED, Simpson JT, Stebbings LA, Leroy C, Edkins S, Mudie LJ et al: Complex landscapes of somatic rearrangement in human breast cancer genomes. Nature 2009, 462(7276):1005-1010.

7. Chari R, Coe BP, Wedseltoft C, Benetti M, Wilson IM, Vucic EA, MacAulay C, Ng RT, Lam WL: SIGMA2: a system for the integrative genomic multi-dimensional analysis of cancer genomes, epigenomes, and transcriptomes. BMC Bioinformatics 2008, 9:422.

8. von Eschenbach AC, Buetow K: Cancer Informatics Vision: caBIG. Cancer Inform 2007, 2:22-24.

9. Horn T, Arziman Z, Berger J, Boutros M: GenomeRNAi: a database for cell-based RNAi phenotypes. Nucleic Acids Res 2007, 35(Database issue):D492-497.

10. Gilsdorf M, Horn T, Arziman Z, Pelz O, Kiner E, Boutros M: GenomeRNAi: a database for cell-based RNAi phenotypes. 2009 update. Nucleic Acids Res 2010, 38(Database issue):D448-452.

11. Shah SP, Morin RD, Khattra J, Prentice L, Pugh T, Burleigh A, Delaney A, Gelmon K, Guliany R, Senz J et al: Mutational evolution in a lobular breast tumour profiled at single nucleotide resolution. Nature 2009, 461(7265):809-813.

12. Lister R, Pelizzola M, Dowen RH, Hawkins RD, Hon G, Tonti-Filippini J, Nery JR, Lee L, Ye Z, Ngo QM et al: Human DNA methylomes at base resolution show widespread epigenomic differences. Nature 2009.

13. Ley TJ, Mardis ER, Ding L, Fulton B, McLellan MD, Chen K, Dooling D, Dunford-Shore BH, McGrath S, Hickenbotham M et al: DNA sequencing of a cytogenetically normal acute myeloid leukaemia genome. Nature 2008, 456(7218):66-72.

14. Mardis ER, Ding L, Dooling DJ, Larson DE, McLellan MD, Chen K, Koboldt DC, Fulton RS, Delehaunty KD, McGrath SD et al: Recurring mutations found by sequencing an acute myeloid leukemia genome. N Engl J Med 2009, 361(11):1058-1066.

15. Wise J: Consortium hopes to sequence genome of 1000 volunteers. BMJ 2008, 336(7638):237.

16. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 2008, 455(7216):1061-1068.

17. Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, Gibbs RA, Belmont JW, Boudreau A, Hardenbol P, Leal SM et al: A second generation human haplotype map of over 3.1 million SNPs. Nature 2007, 449(7164):851-861.


18. Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, Fiegler H, Shapero MH, Carson AR, Chen W et al: Global variation in copy number in the human genome. Nature 2006, 444(7118):444-454.

19. Wong KK, deLeeuw RJ, Dosanjh NS, Kimm LR, Cheng Z, Horsman DE, MacAulay C, Ng RT, Brown CJ, Eichler EE et al: A comprehensive analysis of common copy-number variations in the human genome. Am J Hum Genet 2007, 80(1):91-104.

20. Shames DS, Girard L, Gao B, Sato M, Lewis CM, Shivapurkar N, Jiang A, Perou CM, Kim YH, Pollack JR et al: A genome-wide screen for promoter methylation in lung cancer identifies novel methylation markers for multiple malignancies. PLoS Med 2006, 3(12):e486.

21. Sjoblom T, Jones S, Wood LD, Parsons DW, Lin J, Barber TD, Mandelker D, Leary RJ, Ptak J, Silliman N et al: The consensus coding sequences of human breast and colorectal cancers. Science 2006, 314(5797):268-274.

22. Root DE, Hacohen N, Hahn WC, Lander ES, Sabatini DM: Genome-scale loss-of-function screening with a lentiviral RNAi library. Nat Methods 2006, 3(9):715-719.

23. Bibikova M, Lin Z, Zhou L, Chudin E, Garcia EW, Wu B, Doucet D, Thomas NJ, Wang Y, Vollmer E et al: High-throughput DNA methylation profiling using universal bead arrays. Genome Res 2006, 16(3):383-393.

24. Weber M, Davies JJ, Wittig D, Oakeley EJ, Haase M, Lam WL, Schubeler D: Chromosome-wide and promoter-specific analyses identify sites of differential DNA methylation in normal and transformed human cells. Nat Genet 2005, 37(8):853-862.

25. A haplotype map of the human genome. Nature 2005, 437(7063):1299-1320.

26. Lu J, Getz G, Miska EA, Alvarez-Saavedra E, Lamb J, Peck D, Sweet-Cordero A, Ebert BL, Mak RH, Ferrando AA et al: MicroRNA expression profiles classify human cancers. Nature 2005, 435(7043):834-838.

27. Bamford S, Dawson E, Forbes S, Clements J, Pettett R, Dogan A, Flanagan A, Teague J, Futreal PA, Stratton MR et al: The COSMIC (Catalogue of Somatic Mutations in Cancer) database and website. Br J Cancer 2004, 91(2):355-358.

28. Paddison PJ, Silva JM, Conklin DS, Schlabach M, Li M, Aruleba S, Balija V, O'Shaughnessy A, Gnoj L, Scobie K et al: A resource for large-scale RNA-interference-based screens in mammals. Nature 2004, 428(6981):427-431.

29. Schlabach MR, Luo J, Solimini NL, Hu G, Xu Q, Li MZ, Zhao Z, Smogorzewska A, Sowa ME, Ang XL et al: Cancer proliferation gene discovery through functional genomics. Science 2008, 319(5863):620-624.

30. Futreal PA, Coin L, Marshall M, Down T, Hubbard T, Wooster R, Rahman N, Stratton MR: A census of human cancer genes. Nat Rev Cancer 2004, 4(3):177-183.

31. Iafrate AJ, Feuk L, Rivera MN, Listewnik ML, Donahoe PK, Qi Y, Scherer SW, Lee C: Detection of large-scale variation in the human genome. Nat Genet 2004, 36(9):949-951.

32. Sebat J, Lakshmi B, Troge J, Alexander J, Young J, Lundin P, Maner S, Massa H, Walker M, Chi M et al: Large-scale copy number polymorphism in the human genome. Science 2004, 305(5683):525-528.

33. Ishkanian AS, Malloff CA, Watson SK, DeLeeuw RJ, Chi B, Coe BP, Snijders A, Albertson DG, Pinkel D, Marra MA et al: A tiling resolution DNA microarray with complete coverage of the human genome. Nat Genet 2004, 36(3):299-303.

34. Bertone P, Stolc V, Royce TE, Rozowsky JS, Urban AE, Zhu X, Rinn JL, Tongprasit W, Samanta M, Weissman S et al: Global identification of human transcribed sequences with genome tiling arrays. Science 2004, 306(5705):2242-2246.

35. Hubbard T, Barker D, Birney E, Cameron G, Chen Y, Clark L, Cox T, Cuff J, Curwen V, Down T et al: The Ensembl genome database project. Nucleic Acids Res 2002, 30(1):38-41.

36. Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D: The human genome browser at UCSC. Genome Res 2002, 12(6):996-1006.


37. Oliphant A, Barker DL, Stuelpnagel JR, Chee MS: BeadArray technology: enabling an<br />

accurate, cost-effective approach to high-throughput genotyping. Biotechniques<br />

2002, Suppl:56-58, 60-51.<br />

38. Weinstein IB: Cancer. Addiction to oncogenes--the Achilles heal <strong>of</strong> cancer. Science<br />

2002, 297(5578):63-64.<br />

39. Weinstein IB, Joe A: Oncogene addiction. Cancer Res 2008, 68(9):3077-3080;<br />

discussion 3080.<br />

40. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K,<br />

Doyle M, FitzHugh W et al: Initial sequencing and analysis <strong>of</strong> the human genome.<br />

Nature 2001, 409(6822):860-921.<br />

41. Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M,<br />

Evans CA, Holt RA et al: The sequence <strong>of</strong> the human genome. Science 2001,<br />

291(5507):1304-1351.<br />

42. Riggins GJ, Strausberg RL: Genome and genetic resources from the Cancer<br />

Genome Anatomy Project. Hum Mol Genet 2001, 10(7):663-667.<br />

43. Strausberg RL, Buetow KH, Emmert-Buck MR, Klausner RD: The cancer genome<br />

anatomy project: building an annotated gene index. Trends Genet 2000, 16(3):103-<br />

106.<br />

44. Bayani JM, Squire JA: Applications <strong>of</strong> SKY in cancer cytogenetics. Cancer Invest<br />

2002, 20(3):373-386.<br />

45. Kallioniemi A, Kallioniemi OP, Sudar D, Rutovitz D, Gray JW, Waldman F, Pinkel D:<br />

Comparative genomic hybridization for molecular cytogenetic analysis <strong>of</strong> solid<br />

tumors. Science 1992, 258(5083):818-821.<br />

46. Garnis C, Buys TP, Lam WL: Genetic alteration and gene expression modulation<br />

during cancer progression. Mol Cancer 2004, 3:9.<br />

47. Gebhart E: Genomic imbalances in human leukemia and lymphoma detected by<br />

comparative genomic hybridization (Review). Int J Oncol 2005, 27(3):593-606.<br />

48. Gebhart E, Liehr T: Patterns <strong>of</strong> genomic imbalances in human solid tumors<br />

(Review). Int J Oncol 2000, 16(2):383-399.<br />

49. Cahill DP, Lengauer C, Yu J, Riggins GJ, Willson JK, Markowitz SD, Kinzler KW,<br />

Vogelstein B: Mutations of mitotic checkpoint genes in human cancers. Nature<br />

1998, 392(6673):300-303.<br />

50. Fukasawa K: Centrosome amplification, chromosome instability and cancer<br />

development. Cancer Lett 2005, 230(1):6-19.<br />

51. Lingle WL, Lukasiewicz K, Salisbury JL: Deregulation of the centrosome cycle and<br />

the origin of chromosomal instability in cancer. Adv Exp Med Biol 2005, 570:393-<br />

421.<br />

52. Chin K, de Solorzano CO, Knowles D, Jones A, Chou W, Rodriguez EG, Kuo WL, Ljung<br />

BM, Chew K, Myambo K et al: In situ analyses of genome instability in breast<br />

cancer. Nat Genet 2004, 36(9):984-988.<br />

53. O'Hagan RC, Chang S, Maser RS, Mohan R, Artandi SE, Chin L, DePinho RA:<br />

Telomere dysfunction provokes regional amplification and deletion in cancer<br />

genomes. Cancer Cell 2002, 2(2):149-155.<br />

54. Green AR: Transcription factors, translocations and haematological malignancies.<br />

Blood Rev 1992, 6(2):118-124.<br />

55. Rowley JD: Chromosomal translocations: revisited yet again. Blood 2008,<br />

112(6):2183-2189.<br />

56. Watson SK, deLeeuw RJ, Horsman DE, Squire JA, Lam WL: Cytogenetically balanced<br />

translocations are associated with focal copy number alterations. Hum Genet<br />

2007, 120(6):795-805.<br />

57. Brenner JC, Chinnaiyan AM: Translocations in epithelial cancers. Biochim Biophys<br />

Acta 2009, 1796(2):201-215.<br />

58. Mani RS, Tomlins SA, Callahan K, Ghosh A, Nyati MK, Varambally S, Palanisamy N,<br />

Chinnaiyan AM: Induced chromosomal proximity and gene fusions in prostate<br />

cancer. Science 2009, 326(5957):1230.<br />

59. Tomlins SA, Rhodes DR, Perner S, Dhanasekaran SM, Mehra R, Sun XW, Varambally<br />

S, Cao X, Tchinda J, Kuefer R et al: Recurrent fusion of TMPRSS2 and ETS<br />

transcription factor genes in prostate cancer. Science 2005, 310(5748):644-648.<br />

60. Dang TP, Gazdar AF, Virmani AK, Sepetavec T, Hande KR, Minna JD, Roberts JR,<br />

Carbone DP: Chromosome 19 translocation, overexpression of Notch3, and human<br />

lung cancer. J Natl Cancer Inst 2000, 92(16):1355-1357.<br />

61. Soda M, Choi YL, Enomoto M, Takada S, Yamashita Y, Ishikawa S, Fujiwara S,<br />

Watanabe H, Kurashina K, Hatanaka H et al: Identification of the transforming EML4-<br />

ALK fusion gene in non-small-cell lung cancer. Nature 2007, 448(7153):561-566.<br />

62. Knutsen T, Gobu V, Knaus R, Padilla-Nash H, Augustus M, Strausberg RL, Kirsch IR,<br />

Sirotkin K, Ried T: The interactive online SKY/M-FISH & CGH database and the<br />

Entrez cancer chromosomes search database: linkage of chromosomal<br />

aberrations with the genome sequence. Genes Chromosomes Cancer 2005,<br />

44(1):52-64.<br />

63. Albertson DG, Collins C, McCormick F, Gray JW: Chromosome aberrations in solid<br />

tumors. Nat Genet 2003, 34(4):369-376.<br />

64. Coe BP, Ylstra B, Carvalho B, Meijer GA, Macaulay C, Lam WL: Resolving the<br />

resolution of array CGH. Genomics 2007, 89(5):647-653.<br />

65. Lockwood WW, Chari R, Chi B, Lam WL: Recent advances in array comparative<br />

genomic hybridization technologies and their applications in human genetics. Eur<br />

J Hum Genet 2006, 14(2):139-148.<br />

66. Pollack JR, Perou CM, Alizadeh AA, Eisen MB, Pergamenschikov A, Williams CF,<br />

Jeffrey SS, Botstein D, Brown PO: Genome-wide analysis of DNA copy-number<br />

changes using cDNA microarrays. Nat Genet 1999, 23(1):41-46.<br />

67. Almagro-Garcia J, Manske M, Carret C, Campino S, Auburn S, Macinnis BL, Maslen G,<br />

Pain A, Newbold CI, Kwiatkowski DP et al: SnoopCGH: software for visualizing<br />

comparative genomic hybridization data. Bioinformatics 2009, 25(20):2732-2733.<br />

68. Chari R, Lockwood WW, Lam WL: Computational methods for the analysis of array<br />

comparative genomic hybridization. Cancer Inform 2007, 2:48-58.<br />

69. Chi B, DeLeeuw RJ, Coe BP, MacAulay C, Lam WL: SeeGH--a software tool for<br />

visualization of whole genome array comparative genomic hybridization data.<br />

BMC Bioinformatics 2004, 5:13.<br />

70. Chi B, deLeeuw RJ, Coe BP, Ng RT, MacAulay C, Lam WL: MD-SeeGH: a platform for<br />

integrative analysis of multi-dimensional genomic data. BMC Bioinformatics 2008,<br />

9:243.<br />

71. Venkatraman ES, Olshen AB: A faster circular binary segmentation algorithm for<br />

the analysis of array CGH data. Bioinformatics 2007, 23(6):657-663.<br />

72. Bignell GR, Huang J, Greshock J, Watt S, Butler A, West S, Grigorova M, Jones KW,<br />

Wei W, Stratton MR et al: High-resolution analysis of DNA copy number using<br />

oligonucleotide microarrays. Genome Res 2004, 14(2):287-295.<br />

73. Iacobucci I, Storlazzi CT, Cilloni D, Lonetti A, Ottaviani E, Soverini S, Astolfi A, Chiaretti<br />

S, Vitale A, Messa F et al: Identification and molecular characterization of recurrent<br />

genomic deletions on 7p12 in the IKZF1 gene in a large cohort of BCR-ABL1-positive<br />

acute lymphoblastic leukemia patients: on behalf of Gruppo Italiano<br />

Malattie Ematologiche dell'Adulto Acute Leukemia Working Party (GIMEMA AL<br />

WP). Blood 2009, 114(10):2159-2167.<br />

74. Niini T, Lopez-Guerrero JA, Ninomiya S, Guled M, Hattinger CM, Michelacci F, Bohling<br />

T, Llombart-Bosch A, Picci P, Serra M et al: Frequent deletion of CDKN2A and<br />

recurrent coamplification of KIT, PDGFRA, and KDR in fibrosarcoma of bone-An<br />

array comparative genomic hybridization study. Genes Chromosomes Cancer 2010,<br />

49(2):132-143.<br />

75. Selzer RR, Richmond TA, Pofahl NJ, Green RD, Eis PS, Nair P, Brothman AR, Stallings<br />

RL: Analysis of chromosome breakpoints in neuroblastoma at sub-kilobase<br />

resolution using fine-tiling oligonucleotide array CGH. Genes Chromosomes<br />

Cancer 2005, 44(3):305-319.<br />

76. Zhao X, Li C, Paez JG, Chin K, Janne PA, Chen TH, Girard L, Minna J, Christiani D, Leo<br />

C et al: An integrated view of copy number and allelic alterations in the cancer<br />

genome using single nucleotide polymorphism arrays. Cancer Res 2004,<br />

64(9):3060-3071.<br />

77. Wang TL, Maierhofer C, Speicher MR, Lengauer C, Vogelstein B, Kinzler KW,<br />

Velculescu VE: Digital karyotyping. Proc Natl Acad Sci U S A 2002, 99(25):16156-<br />

16161.<br />

78. Tuzun E, Sharp AJ, Bailey JA, Kaul R, Morrison VA, Pertz LM, Haugen E, Hayden H,<br />

Albertson D, Pinkel D et al: Fine-scale structural variation of the human genome.<br />

Nat Genet 2005, 37(7):727-732.<br />

79. Volik S, Raphael BJ, Huang G, Stratton MR, Bignel G, Murnane J, Brebner JH,<br />

Bajsarowicz K, Paris PL, Tao Q et al: Decoding the fine-scale structure of a breast<br />

cancer genome and transcriptome. Genome Res 2006, 16(3):394-404.<br />

80. Volik S, Zhao S, Chin K, Brebner JH, Herndon DR, Tao Q, Kowbel D, Huang G, Lapuk<br />

A, Kuo WL et al: End-sequence profiling: sequence-based analysis of aberrant<br />

genomes. Proc Natl Acad Sci U S A 2003, 100(13):7696-7701.<br />

81. McPherson JD: Next-generation gap. Nat Methods 2009, 6(11 Suppl):S2-5.<br />

82. Alkan C, Kidd JM, Marques-Bonet T, Aksay G, Antonacci F, Hormozdiari F, Kitzman JO,<br />

Baker C, Malig M, Mutlu O et al: Personalized copy number and segmental<br />

duplication maps using next-generation sequencing. Nat Genet 2009, 41(10):1061-<br />

1067.<br />

83. Conrad DF, Andrews TD, Carter NP, Hurles ME, Pritchard JK: A high-resolution<br />

survey of deletion polymorphism in the human genome. Nat Genet 2006, 38(1):75-<br />

81.<br />

84. Conrad DF, Pinto D, Redon R, Feuk L, Gokcumen O, Zhang Y, Aerts J, Andrews TD,<br />

Barnes C, Campbell P et al: Origins and functional impact of copy number variation<br />

in the human genome. Nature 2009.<br />

85. Fiegler H, Redon R, Andrews D, Scott C, Andrews R, Carder C, Clark R, Dovey O, Ellis<br />

P, Feuk L et al: Accurate and reliable high-throughput detection of copy number<br />

variation in the human genome. Genome Res 2006, 16(12):1566-1574.<br />

86. Jakobsson M, Scholz SW, Scheet P, Gibbs JR, VanLiere JM, Fung HC, Szpiech ZA,<br />

Degnan JH, Wang K, Guerreiro R et al: Genotype, haplotype and copy-number<br />

variation in worldwide human populations. Nature 2008, 451(7181):998-1003.<br />

87. Kidd JM, Cooper GM, Donahue WF, Hayden HS, Sampas N, Graves T, Hansen N,<br />

Teague B, Alkan C, Antonacci F et al: Mapping and sequencing of structural<br />

variation from eight human genomes. Nature 2008, 453(7191):56-64.<br />

88. McCarroll SA, Kuruvilla FG, Korn JM, Cawley S, Nemesh J, Wysoker A, Shapero MH,<br />

de Bakker PI, Maller JB, Kirby A et al: Integrated detection and population-genetic<br />

analysis of SNPs and copy number variation. Nat Genet 2008, 40(10):1166-1174.<br />

89. Shaikh TH, Gai X, Perin JC, Glessner JT, Xie H, Murphy K, O'Hara R, Casalunovo T,<br />

Conlin LK, D'Arcy M et al: High-resolution mapping and analysis of copy number<br />

variations in the human genome: a data resource for clinical and research<br />

applications. Genome Res 2009, 19(9):1682-1690.<br />

90. Hastings PJ, Ira G, Lupski JR: A microhomology-mediated break-induced<br />

replication model for the origin of human copy number variation. PLoS Genet<br />

2009, 5(1):e1000327.<br />

91. Diskin SJ, Hou C, Glessner JT, Attiyeh EF, Laudenslager M, Bosse K, Cole K, Mosse<br />

YP, Wood A, Lynch JE et al: Copy number variation at 1q21.1 associated with<br />

neuroblastoma. Nature 2009, 459(7249):987-991.<br />

92. Lockwood WW, Chari R, Coe BP, Girard L, Macaulay C, Lam S, Gazdar AF, Minna JD,<br />

Lam WL: DNA amplification is a ubiquitous mechanism of oncogene activation in<br />

lung and other cancers. Oncogene 2008, 27(33):4615-4624.<br />

93. Myllykangas S, Himberg J, Bohling T, Nagy B, Hollmen J, Knuutila S: DNA copy<br />

number amplification profiling of human neoplasms. Oncogene 2006, 25(55):7324-<br />

7332.<br />

94. Teschendorff AE, Caldas C: The breast cancer somatic 'muta-ome': tackling the<br />

complexity. Breast Cancer Res 2009, 11(2):301.<br />

95. Chin SF, Teschendorff AE, Marioni JC, Wang Y, Barbosa-Morais NL, Thorne NP, Costa<br />

JL, Pinder SE, van de Wiel MA, Green AR et al: High-resolution aCGH and<br />

expression profiling identifies a novel genomic subtype of ER negative breast<br />

cancer. Genome Biol 2007, 8(10):R215.<br />

96. Coe BP, Lockwood WW, Girard L, Chari R, Macaulay C, Lam S, Gazdar AF, Minna JD,<br />

Lam WL: Differential disruption of cell cycle pathways in small cell and non-small<br />

cell lung cancer. Br J Cancer 2006, 94(12):1927-1935.<br />

97. Bass AJ, Watanabe H, Mermel CH, Yu S, Perner S, Verhaak RG, Kim SY, Wardwell L,<br />

Tamayo P, Gat-Viks I et al: SOX2 is an amplified lineage-survival oncogene in lung<br />

and esophageal squamous cell carcinomas. Nat Genet 2009, 41(11):1238-1242.<br />

98. Garraway LA, Widlund HR, Rubin MA, Getz G, Berger AJ, Ramaswamy S, Beroukhim<br />

R, Milner DA, Granter SR, Du J et al: Integrative genomic analyses identify MITF as<br />

a lineage survival oncogene amplified in malignant melanoma. Nature 2005,<br />

436(7047):117-122.<br />

99. Weir BA, Woo MS, Getz G, Perner S, Ding L, Beroukhim R, Lin WM, Province MA, Kraja<br />

A, Johnson LA et al: Characterizing the cancer genome in lung adenocarcinoma.<br />

Nature 2007, 450(7171):893-898.<br />

100. Kwei KA, Kim YH, Girard L, Kao J, Pacyna-Gengelbach M, Salari K, Lee J, Choi YL,<br />

Sato M, Wang P et al: Genomic profiling identifies TITF1 as a lineage-specific<br />

oncogene amplified in lung cancer. Oncogene 2008, 27(25):3635-3640.<br />

101. Plomin R, Haworth CM, Davis OS: Common disorders are quantitative traits. Nat<br />

Rev Genet 2009, 10(12):872-878.<br />

102. Savas S, Liu G: Genetic variations as cancer prognostic markers: review and<br />

update. Hum Mutat 2009, 30(10):1369-1377.<br />

103. Ansorge WJ: Next-generation DNA sequencing techniques. N Biotechnol 2009,<br />

25(4):195-203.<br />

104. Shah SP, Kobel M, Senz J, Morin RD, Clarke BA, Wiegand KC, Leung G, Zayed A, Mehl<br />

E, Kalloger SE et al: Mutation of FOXL2 in granulosa-cell tumors of the ovary. N<br />

Engl J Med 2009, 360(26):2719-2729.<br />

105. Greenman C, Stephens P, Smith R, Dalgliesh GL, Hunter C, Bignell G, Davies H,<br />

Teague J, Butler A, Stevens C et al: Patterns of somatic mutation in human cancer<br />

genomes. Nature 2007, 446(7132):153-158.<br />

106. Stratton MR, Campbell PJ, Futreal PA: The cancer genome. Nature 2009,<br />

458(7239):719-724.<br />

107. Cavenee WK, Hansen MF, Nordenskjold M, Kock E, Maumenee I, Squire JA, Phillips<br />

RA, Gallie BL: Genetic origin of mutations predisposing to retinoblastoma. Science<br />

1985, 228(4698):501-503.<br />

108. Knudson AG, Jr.: Mutation and cancer: statistical study of retinoblastoma. Proc Natl<br />

Acad Sci U S A 1971, 68(4):820-823.<br />

109. Benz CC, Fedele V, Xu F, Ylstra B, Ginzinger D, Yu M, Moore D, Hall RK, Wolff DJ,<br />

Disis ML et al: Altered promoter usage characterizes monoallelic transcription<br />

arising with ERBB2 amplification in human breast cancers. Genes Chromosomes<br />

Cancer 2006, 45(11):983-994.<br />

110. LaFramboise T, Weir BA, Zhao X, Beroukhim R, Li C, Harrington D, Sellers WR,<br />

Meyerson M: Allele-specific amplification in cancer revealed by SNP array<br />

analysis. PLoS Comput Biol 2005, 1(6):e65.<br />

111. Melcher R, Al-Taie O, Kudlich T, Hartmann E, Maisch S, Steinlein C, Schmid M,<br />

Rosenwald A, Menzel T, Scheppach W et al: SNP-Array genotyping and spectral<br />

karyotyping reveal uniparental disomy as early mutational event in MSS- and MSI-colorectal<br />

cancer cell lines. Cytogenet Genome Res 2007, 118(2-4):214-221.<br />

112. Nomura M, Shigematsu H, Li L, Suzuki M, Takahashi T, Estess P, Siegelman M, Feng<br />

Z, Kato H, Marchetti A et al: Polymorphisms, mutations, and amplification of the<br />

EGFR gene in non-small cell lung cancers. PLoS Med 2007, 4(4):e125.<br />

113. Sholl LM, Yeap BY, Iafrate AJ, Holmes-Tisch AJ, Chou YP, Wu MT, Goan YG, Su L,<br />

Benedettini E, Yu J et al: Lung adenocarcinoma with EGFR amplification has<br />

distinct clinicopathologic and molecular features in never-smokers. Cancer Res<br />

2009, 69(21):8341-8348.<br />

114. Soh J, Okumura N, Lockwood WW, Yamamoto H, Shigematsu H, Zhang W, Chari R,<br />

Shames DS, Tang X, MacAulay C et al: Oncogene mutations, copy number gains<br />

and mutant allele specific imbalance (MASI) frequently occur together in tumor<br />

cells. PLoS One 2009, 4(10):e7464.<br />

115. Bacolod MD, Schemmann GS, Giardina SF, Paty P, Notterman DA, Barany F:<br />

Emerging paradigms in cancer genetics: some important findings from high-density<br />

single nucleotide polymorphism array studies. Cancer Res 2009, 69(3):723-<br />

727.<br />

116. Robinson WP: Mechanisms leading to uniparental disomy and their clinical<br />

consequences. Bioessays 2000, 22(5):452-459.<br />

117. Tuna M, Knuutila S, Mills GB: Uniparental disomy in cancer. Trends Mol Med 2009,<br />

15(3):120-128.<br />

118. Zhu X, Dunn JM, Goddard AD, Squire JA, Becker A, Phillips RA, Gallie BL:<br />

Mechanisms of loss of heterozygosity in retinoblastoma. Cytogenet Cell Genet<br />

1992, 59(4):248-252.<br />

119. Gondek LP, Dunbar AJ, Szpurka H, McDevitt MA, Maciejewski JP: SNP array<br />

karyotyping allows for the detection of uniparental disomy and cryptic<br />

chromosomal abnormalities in MDS/MPD-U and MPD. PLoS One 2007, 2(11):e1225.<br />

120. Tiu RV, Gondek LP, O'Keefe CL, Huh J, Sekeres MA, Elson P, McDevitt MA, Wang XF,<br />

Levis MJ, Karp JE et al: New lesions detected by single nucleotide polymorphism<br />

array-based chromosomal analysis have important clinical impact in acute<br />

myeloid leukemia. J Clin Oncol 2009, 27(31):5219-5226.<br />

121. Yamamoto G, Nannya Y, Kato M, Sanada M, Levine RL, Kawamata N, Hangaishi A,<br />

Kurokawa M, Chiba S, Gilliland DG et al: Highly sensitive method for genomewide<br />

detection of allelic composition in nonpaired, primary tumor specimens by use of<br />

Affymetrix single-nucleotide-polymorphism genotyping microarrays. Am J Hum<br />

Genet 2007, 81(1):114-126.<br />

122. Darbary HK, Dutt SS, Sait SJ, Nowak NJ, Heinaman RE, Stoler DL, Anderson GR:<br />

Uniparentalism in sporadic colorectal cancer is independent of imprint status, and<br />

coordinate for chromosomes 14 and 18. Cancer Genet Cytogenet 2009, 189(2):77-<br />

86.<br />

123. Grand FH, Hidalgo-Curtis CE, Ernst T, Zoi K, Zoi C, McGuire C, Kreil S, Jones A, Score<br />

J, Metzgeroth G et al: Frequent CBL mutations associated with 11q acquired<br />

uniparental disomy in myeloproliferative neoplasms. Blood 2009, 113(24):6182-<br />

6192.<br />

124. Gupta M, Raghavan M, Gale RE, Chelala C, Allen C, Molloy G, Chaplin T, Linch DC,<br />

Cazier JB, Young BD: Novel regions of acquired uniparental disomy discovered in<br />

acute myeloid leukemia. Genes Chromosomes Cancer 2008, 47(9):729-739.<br />

125. Kawamata N, Ogawa S, Gueller S, Ross SH, Huynh T, Chen J, Chang A, Nabavi-Nouis<br />

S, Megrabian N, Siebert R et al: Identified hidden genomic changes in mantle cell<br />

lymphoma using high-resolution single nucleotide polymorphism genomic array.<br />

Exp Hematol 2009, 37(8):937-946.<br />

126. Makishima H, Cazzolli H, Szpurka H, Dunbar A, Tiu R, Huh J, Muramatsu H, O'Keefe C,<br />

Hsi E, Paquette RL et al: Mutations of e3 ubiquitin ligase cbl family members<br />

constitute a novel common pathogenic lesion in myeloid malignancies. J Clin<br />

Oncol 2009, 27(36):6109-6116.<br />

127. Walter MJ, Payton JE, Ries RE, Shannon WD, Deshmukh H, Zhao Y, Baty J, Heath S,<br />

Westervelt P, Watson MA et al: Acquired copy number alterations in adult acute<br />

myeloid leukemia genomes. Proc Natl Acad Sci U S A 2009, 106(31):12950-12955.<br />

128. Yin D, Ogawa S, Kawamata N, Tunici P, Finocchiaro G, Eoli M, Ruckert C, Huynh T, Liu<br />

G, Kato M et al: High-resolution genomic copy number profiling of glioblastoma<br />

multiforme by single nucleotide polymorphism DNA microarray. Mol Cancer Res<br />

2009, 7(5):665-677.<br />

129. Purdie KJ, Lambert SR, Teh MT, Chaplin T, Molloy G, Raghavan M, Kelsell DP, Leigh<br />

IM, Harwood CA, Proby CM et al: Allelic imbalances and microdeletions affecting<br />

the PTPRD gene in cutaneous squamous cell carcinomas detected using single<br />

nucleotide polymorphism microarray analysis. Genes Chromosomes Cancer 2007,<br />

46(7):661-669.<br />

130. Akagi T, Ito T, Kato M, Jin Z, Cheng Y, Kan T, Yamamoto G, Olaru A, Kawamata N,<br />

Boult J et al: Chromosomal abnormalities and novel disease-related regions in<br />

progression from Barrett's esophagus to esophageal adenocarcinoma. Int J<br />

Cancer 2009, 125(10):2349-2359.<br />

131. Andersen CL, Wiuf C, Kruhoffer M, Korsgaard M, Laurberg S, Orntoft TF: Frequent<br />

occurrence of uniparental disomy in colorectal cancer. Carcinogenesis 2007,<br />

28(1):38-48.<br />

132. Forbes SA, Tang G, Bindal N, Bamford S, Dawson E, Cole C, Kok CY, Jia M, Ewing R,<br />

Menzies A et al: COSMIC (the Catalogue of Somatic Mutations in Cancer): a<br />

resource to investigate acquired mutations in human cancer. Nucleic Acids Res<br />

2010, 38(Database issue):D652-657.<br />

133. Kerkel K, Spadola A, Yuan E, Kosek J, Jiang L, Hod E, Li K, Murty VV, Schupf N, Vilain<br />

E et al: Genomic surveys by methylation-sensitive SNP analysis identify<br />

sequence-dependent allele-specific DNA methylation. Nat Genet 2008, 40(7):904-<br />

908.<br />

134. Jones PA, Baylin SB: The epigenomics of cancer. Cell 2007, 128(4):683-692.<br />

135. Esteller M: Epigenetics in cancer. N Engl J Med 2008, 358(11):1148-1159.<br />

136. Feinberg AP: Phenotypic plasticity and the epigenetics of human disease. Nature<br />

2007, 447(7143):433-440.<br />

137. Vucic EA, Brown CJ, Lam WL: Epigenetics of cancer progression.<br />

Pharmacogenomics 2008, 9(2):215-234.<br />

138. Feinberg AP, Gehrke CW, Kuo KC, Ehrlich M: Reduced genomic 5-methylcytosine<br />

content in human colonic neoplasia. Cancer Res 1988, 48(5):1159-1161.<br />

139. Feinberg AP, Tycko B: The history of cancer epigenetics. Nat Rev Cancer 2004,<br />

4(2):143-153.<br />

140. Lo PK, Sukumar S: Epigenomics and breast cancer. Pharmacogenomics 2008,<br />

9(12):1879-1902.<br />

141. Toyota M, Ahuja N, Ohe-Toyota M, Herman JG, Baylin SB, Issa JP: CpG island<br />

methylator phenotype in colorectal cancer. Proc Natl Acad Sci U S A 1999,<br />

96(15):8681-8686.<br />

142. Issa JP: CpG island methylator phenotype in cancer. Nat Rev Cancer 2004,<br />

4(12):988-993.<br />

143. Tanemura A, Terando AM, Sim MS, van Hoesel AQ, de Maat MF, Morton DL, Hoon DS:<br />

CpG island methylator phenotype predicts progression of malignant melanoma.<br />

Clin Cancer Res 2009, 15(5):1801-1807.<br />

144. Dai Z, Lakshmanan RR, Zhu WG, Smiraglia DJ, Rush LJ, Fruhwald MC, Brena RM, Li<br />

B, Wright FA, Ross P et al: Global methylation profiling of lung cancer identifies<br />

novel methylated genes. Neoplasia 2001, 3(4):314-323.<br />

145. Takai D, Yagi Y, Wakazono K, Ohishi N, Morita Y, Sugimura T, Ushijima T: Silencing of<br />

HTR1B and reduced expression of EDN1 in human lung cancers, revealed by<br />

methylation-sensitive representational difference analysis. Oncogene 2001,<br />

20(51):7505-7513.<br />

146. Hu M, Yao J, Cai L, Bachman KE, van den Brule F, Velculescu V, Polyak K: Distinct<br />

epigenetic changes in the stromal cells of breast cancers. Nat Genet 2005,<br />

37(8):899-905.<br />

147. Irizarry RA, Ladd-Acosta C, Carvalho B, Wu H, Brandenburg SA, Jeddeloh JA, Wen B,<br />

Feinberg AP: Comprehensive high-throughput arrays for relative methylation<br />

(CHARM). Genome Res 2008, 18(5):780-790.<br />

148. Yan PS, Chen CM, Shi H, Rahmatpanah F, Wei SH, Caldwell CW, Huang TH:<br />

Dissecting complex epigenetic alterations in breast cancer using CpG island<br />

microarrays. Cancer Res 2001, 61(23):8375-8380.<br />

149. Yamamoto F, Yamamoto M: A DNA microarray-based methylation-sensitive (MS)-<br />

AFLP hybridization method for genetic and epigenetic analyses. Mol Genet<br />

Genomics 2004, 271(6):678-686.<br />

150. Omura N, Li CP, Li A, Hong SM, Walter K, Jimeno A, Hidalgo M, Goggins M: Genome-wide<br />

profiling of methylated promoters in pancreatic adenocarcinoma. Cancer Biol<br />

Ther 2008, 7(7):1146-1156.<br />

151. Trinh BN, Long TI, Laird PW: DNA methylation analysis by MethyLight technology.<br />

Methods 2001, 25(4):456-462.<br />

152. Fan JB, Gunderson KL, Bibikova M, Yeakley JM, Chen J, Wickham Garcia E, Lebruska<br />

LL, Laurent M, Shen R, Barker D: Illumina universal bead arrays. Methods Enzymol<br />

2006, 410:57-73.<br />

153. Houshdaran S, Cortessis VK, Siegmund K, Yang A, Laird PW, Sokol RZ: Widespread<br />

epigenetic abnormalities suggest a broad DNA methylation erasure defect in<br />

abnormal human sperm. PLoS One 2007, 2(12):e1289.<br />

154. Houseman EA, Christensen BC, Karagas MR, Wrensch MR, Nelson HH, Wiemels JL,<br />

Zheng S, Wiencke JK, Kelsey KT, Marsit CJ: Copy number variation has little impact<br />

on bead-array-based measures of DNA methylation. Bioinformatics 2009,<br />

25(16):1999-2005.<br />

155. Breton CV, Byun HM, Wenten M, Pan F, Yang A, Gilliland FD: Prenatal tobacco<br />

smoke exposure affects global and gene-specific DNA methylation. Am J Respir<br />

Crit Care Med 2009, 180(5):462-467.<br />

156. Taylor KH, Pena-Hernandez KE, Davis JW, Arthur GL, Duff DJ, Shi H, Rahmatpanah<br />

FB, Sjahputera O, Caldwell CW: Large-scale CpG methylation analysis identifies<br />

novel candidate genes and reveals methylation hotspots in acute lymphoblastic<br />

leukemia. Cancer Res 2007, 67(6):2617-2625.<br />

157. Weber M, Hellmann I, Stadler MB, Ramos L, Paabo S, Rebhan M, Schubeler D:<br />

Distribution, silencing potential and evolutionary impact of promoter DNA<br />

methylation in the human genome. Nat Genet 2007, 39(4):457-466.<br />

158. Rauch T, Pfeifer GP: Methylated-CpG island recovery assay: a new technique for<br />

the rapid detection of methylated-CpG islands in cancer. Lab Invest 2005,<br />

85(9):1172-1180.<br />

159. Jacinto FV, Ballestar E, Ropero S, Esteller M: Discovery of epigenetically silenced<br />

genes by methylated DNA immunoprecipitation in colon cancer cells. Cancer Res<br />

2007, 67(24):11481-11486.<br />

160. Ballestar E, Paz MF, Valle L, Wei S, Fraga MF, Espada J, Cigudosa JC, Huang TH,<br />

Esteller M: Methyl-CpG binding proteins identify novel sites of epigenetic<br />

inactivation in human cancer. EMBO J 2003, 22(23):6335-6345.<br />

161. Serre D, Lee BH, Ting AH: MBD-isolated Genome Sequencing provides a high-throughput<br />

and comprehensive survey of DNA methylation in the human genome.<br />

Nucleic Acids Res 2009.<br />

162. Down TA, Rakyan VK, Turner DJ, Flicek P, Li H, Kulesha E, Graf S, Johnson N, Herrero<br />

J, Tomazou EM et al: A Bayesian deconvolution strategy for immunoprecipitation-based<br />

DNA methylome analysis. Nat Biotechnol 2008, 26(7):779-785.<br />

163. Thu KL, Pikor LA, Kennett JY, Alvarez CE, Lam WL: Methylation analysis by DNA<br />

immunoprecipitation. J Cell Physiol 2009, 222(3):522-531.<br />

164. Pelizzola M, Koga Y, Urban AE, Krauthammer M, Weissman S, Halaban R, Molinaro<br />

AM: MEDME: an experimental and analytical methodology for the estimation of<br />

DNA methylation levels based on microarray derived MeDIP-enrichment. Genome<br />

Res 2008, 18(10):1652-1659.<br />

165. Yamashita S, Hosoya K, Gyobu K, Takeshima H, Ushijima T: Development of a Novel<br />

Output Value for Quantitative Assessment in Methylated DNA<br />

Immunoprecipitation-CpG Island Microarray Analysis. DNA Res 2009.<br />

166. Irizarry RA, Ladd-Acosta C, Wen B, Wu Z, Montano C, Onyango P, Cui H, Gabo K,<br />

Rongione M, Webster M et al: The human colon cancer methylome shows similar<br />

hypo- and hypermethylation at conserved tissue-specific CpG island shores. Nat<br />

Genet 2009, 41(2):178-186.<br />

167. Lorincz MC, Dickerson DR, Schmitt M, Groudine M: Intragenic DNA methylation alters<br />

chromatin structure and elongation efficiency in mammalian cells. Nat Struct Mol<br />

Biol 2004, 11(11):1068-1075.<br />

168. Frigola J, Song J, Stirzaker C, Hinshelwood RA, Peinado MA, Clark SJ: Epigenetic<br />

remodeling in colorectal cancer results in coordinate gene suppression across an<br />

entire chromosome band. Nat Genet 2006, 38(5):540-549.<br />

169. Zhong S, Fields CR, Su N, Pan YX, Robertson KD: Pharmacologic inhibition of<br />

epigenetic modifications, coupled with gene expression pr<strong>of</strong>iling, reveals novel<br />

targets of aberrant DNA methylation and histone deacetylation in lung cancer.<br />

Oncogene 2007, 26(18):2621-2634.<br />

170. Lister R, Ecker JR: Finding the fifth base: genome-wide sequencing of cytosine<br />

methylation. Genome Res 2009, 19(6):959-966.<br />

171. Byun HM, Siegmund KD, Pan F, Weisenberger DJ, Kanel G, Laird PW, Yang AS:<br />

Epigenetic profiling of somatic tissues from human autopsy specimens identifies<br />

tissue- and individual-specific DNA methylation patterns. Hum Mol Genet 2009,<br />

18(24):4808-4817.<br />

172. Fraga MF, Ballestar E, Paz MF, Ropero S, Setien F, Ballestar ML, Heine-Suner D,<br />

Cigudosa JC, Urioste M, Benitez J et al: Epigenetic differences arise during the<br />

lifetime of monozygotic twins. Proc Natl Acad Sci U S A 2005, 102(30):10604-10609.<br />

173. Deng J, Shoemaker R, Xie B, Gore A, LeProust EM, Antosiewicz-Bourget J, Egli D,<br />

Maherali N, Park IH, Yu J et al: Targeted bisulfite sequencing reveals changes in<br />

DNA methylation associated with nuclear reprogramming. Nat Biotechnol 2009,<br />

27(4):353-360.<br />

174. Costello JF, Fruhwald MC, Smiraglia DJ, Rush LJ, Robertson GP, Gao X, Wright FA,<br />

Feramisco JD, Peltomaki P, Lang JC et al: Aberrant CpG-island methylation has non-random<br />

and tumour-type-specific patterns. Nat Genet 2000, 24(2):132-138.<br />

175. Gama-Sosa MA, Midgett RM, Slagel VA, Githens S, Kuo KC, Gehrke CW, Ehrlich M:<br />

Tissue-specific differences in DNA methylation in various mammals. Biochim<br />

Biophys Acta 1983, 740(2):212-219.<br />

176. Richardson B: Impact of aging on DNA methylation. Ageing Res Rev 2003, 2(3):245-<br />

261.<br />

177. Eckhardt F, Beck S, Gut IG, Berlin K: Future potential of the Human Epigenome<br />

Project. Expert Rev Mol Diagn 2004, 4(5):609-618.<br />

178. Kohda M, Hoshiya H, Katoh M, Tanaka I, Masuda R, Takemura T, Fujiwara M,<br />

Oshimura M: Frequent loss of imprinting of IGF2 and MEST in lung<br />

adenocarcinoma. Mol Carcinog 2001, 31(4):184-191.<br />

179. Kondo M, Suzuki H, Ueda R, Osada H, Takagi K, Takahashi T: Frequent loss of<br />

imprinting of the H19 gene is often associated with its overexpression in human<br />

lung cancers. Oncogene 1995, 10(6):1193-1198.<br />

180. Rainier S, Johnson LA, Dobry CJ, Ping AJ, Grundy PE, Feinberg AP: Relaxation of<br />

imprinted genes in human cancer. Nature 1993, 362(6422):747-749.<br />

181. Pal N, Wadey RB, Buckle B, Yeomans E, Pritchard J, Cowell JK: Preferential loss of<br />

maternal alleles in sporadic Wilms' tumour. Oncogene 1990, 5(11):1665-1668.<br />

182. Schroeder WT, Chao LY, Dao DD, Strong LC, Pathak S, Riccardi V, Lewis WH,<br />

Saunders GF: Nonrandom loss of maternal chromosome 11 alleles in Wilms<br />

tumors. Am J Hum Genet 1987, 40(5):413-420.<br />

183. Scrable H, Cavenee W, Ghavimi F, Lovell M, Morgan K, Sapienza C: A model for<br />

embryonal rhabdomyosarcoma tumorigenesis that involves genome imprinting.<br />

Proc Natl Acad Sci U S A 1989, 86(19):7480-7484.<br />

184. Gaudet F, Hodgson JG, Eden A, Jackson-Grusby L, Dausman J, Gray JW, Leonhardt H,<br />

Jaenisch R: Induction of tumors in mice by genomic hypomethylation. Science<br />

2003, 300(5618):489-492.<br />

185. Rizwana R, Hahn PJ: CpG methylation reduces genomic instability. J Cell Sci 1999,<br />

112 ( Pt 24):4513-4519.<br />

186. Daskalos A, Nikolaidis G, Xinarianos G, Savvari P, Cassidy A, Zakopoulou R, Kotsinas<br />

A, Gorgoulis V, Field JK, Liloglou T: Hypomethylation of retrotransposable elements<br />

correlates with genomic instability in non-small cell lung cancer. Int J Cancer 2009,<br />

124(1):81-87.<br />

187. Walsh CP, Chaillet JR, Bestor TH: Transcription of IAP endogenous retroviruses is<br />

constrained by cytosine methylation. Nat Genet 1998, 20(2):116-117.<br />

188. Chalitchagorn K, Shuangshoti S, Hourpai N, Kongruttanachok N, Tangkijvanich P,<br />

Thong-ngam D, Voravud N, Sriuranpong V, Mutirangura A: Distinctive pattern of LINE-<br />

1 methylation level in normal tissues and the association with carcinogenesis.<br />

Oncogene 2004, 23(54):8841-8846.<br />

189. Rauch TA, Zhong X, Wu X, Wang M, Kernstine KH, Wang Z, Riggs AD, Pfeifer GP:<br />

High-resolution mapping of DNA hypermethylation and hypomethylation in lung<br />

cancer. Proc Natl Acad Sci U S A 2008, 105(1):252-257.<br />

190. Groudine M, Eisenman R, Weintraub H: Chromatin structure of endogenous<br />

retroviral genes and activation by an inhibitor of DNA methylation. Nature 1981,<br />

292(5821):311-317.<br />

191. Wilson IM, Davies JJ, Weber M, Brown CJ, Alvarez CE, MacAulay C, Schubeler D, Lam<br />

WL: Epigenomics: mapping the methylome. Cell Cycle 2006, 5(2):155-158.<br />

192. Cadieux B, Ching TT, VandenBerg SR, Costello JF: Genome-wide hypomethylation in<br />

human glioblastomas associated with specific copy number alteration,<br />

methylenetetrahydrofolate reductase allele status, and increased proliferation.<br />

Cancer Res 2006, 66(17):8469-8476.<br />

193. Zabarovsky ER, Lerman MI, Minna JD: Tumor suppressor genes on chromosome 3p<br />

involved in the pathogenesis of lung and other cancers. Oncogene 2002,<br />

21(45):6915-6935.<br />

194. Belinsky SA, Palmisano WA, Gilliland FD, Crooks LA, Divine KK, Winters SA, Grimes<br />

MJ, Harms HJ, Tellez CS, Smith TM et al: Aberrant promoter methylation in<br />

bronchial epithelium and sputum from current and former smokers. Cancer Res<br />

2002, 62(8):2370-2377.<br />

195. Palmisano WA, Divine KK, Saccomanno G, Gilliland FD, Baylin SB, Herman JG,<br />

Belinsky SA: Predicting lung cancer by detecting aberrant promoter methylation in<br />

sputum. Cancer Res 2000, 60(21):5954-5958.<br />

196. Belinsky SA: Gene-promoter hypermethylation as a biomarker in lung cancer. Nat<br />

Rev Cancer 2004, 4(9):707-717.<br />

197. Tessema M, Willink R, Do K, Yu YY, Yu W, Machida EO, Brock M, Van Neste L, Stidley<br />

CA, Baylin SB et al: Promoter methylation of genes in and around the candidate<br />

lung cancer susceptibility locus 6q23-25. Cancer Res 2008, 68(6):1707-1714.<br />

198. Heintzman ND, Hon GC, Hawkins RD, Kheradpour P, Stark A, Harp LF, Ye Z, Lee LK,<br />

Stuart RK, Ching CW et al: Histone modifications at human enhancers reflect global<br />

cell-type-specific gene expression. Nature 2009, 459(7243):108-112.<br />

199. Komashko VM, Acevedo LG, Squazzo SL, Iyengar SS, Rabinovich A, O'Geen H, Green<br />

R, Farnham PJ: Using ChIP-chip technology to reveal common principles of<br />

transcriptional repression in normal and cancer cells. Genome Res 2008,<br />

18(4):521-532.<br />

200. Ke XS, Qu Y, Rostad K, Li WC, Lin B, Halvorsen OJ, Haukaas SA, Jonassen I, Petersen<br />

K, Goldfinger N et al: Genome-wide profiling of histone h3 lysine 4 and lysine 27<br />

trimethylation reveals an epigenetic signature in prostate carcinogenesis. PLoS<br />

One 2009, 4(3):e4687.<br />

201. Kondo Y, Shen L, Cheng AS, Ahmed S, Boumber Y, Charo C, Yamochi T, Urano T,<br />

Furukawa K, Kwabi-Addo B et al: Gene silencing in cancer by histone H3 lysine 27<br />

trimethylation independent of promoter DNA methylation. Nat Genet 2008,<br />

40(6):741-750.<br />

202. Yu J, Rhodes DR, Tomlins SA, Cao X, Chen G, Mehra R, Wang X, Ghosh D, Shah RB,<br />

Varambally S et al: A polycomb repression signature in metastatic prostate cancer<br />

predicts cancer outcome. Cancer Res 2007, 67(22):10657-10663.<br />

203. Wu J, Wang SH, Potter D, Liu JC, Smith LT, Wu YZ, Huang TH, Plass C: Diverse<br />

histone modifications on histone 3 lysine 9 and their relation to DNA methylation<br />

in specifying gene silencing. BMC Genomics 2007, 8:131.<br />

204. Krivtsov AV, Feng Z, Lemieux ME, Faber J, Vempati S, Sinha AU, Xia X, Jesneck J,<br />

Bracken AP, Silverman LB et al: H3K79 methylation profiles define murine and<br />

human MLL-AF4 leukemias. Cancer Cell 2008, 14(5):355-368.<br />

205. Lin B, Wang J, Hong X, Yan X, Hwang D, Cho JH, Yi D, Utleg AG, Fang X, Schones DE<br />

et al: Integrated expression profiling and ChIP-seq analyses of the growth<br />

inhibition response program of the androgen receptor. PLoS One 2009, 4(8):e6589.<br />

206. Fullwood MJ, Liu MH, Pan YF, Liu J, Xu H, Mohamed YB, Orlov YL, Velkov S, Ho A, Mei<br />

PH et al: An oestrogen-receptor-alpha-bound human chromatin interactome.<br />

Nature 2009, 462(7269):58-64.<br />

207. Coe BP, Chari R, Lockwood WW, Lam WL: Evolving strategies for global gene<br />

expression analysis of cancer. J Cell Physiol 2008, 217(3):590-597.<br />

208. Liang P, Pardee AB: Analysing differential gene expression in cancer. Nat Rev<br />

Cancer 2003, 3(11):869-876.<br />

209. Nevins JR, Potti A: Mining gene expression profiles: expression signatures as<br />

cancer phenotypes. Nat Rev Genet 2007, 8(8):601-609.<br />

210. Pollack JR, Sorlie T, Perou CM, Rees CA, Jeffrey SS, Lonning PE, Tibshirani R,<br />

Botstein D, Borresen-Dale AL, Brown PO: Microarray analysis reveals a major direct<br />

role of DNA copy number alteration in the transcriptional program of human<br />

breast tumors. Proc Natl Acad Sci U S A 2002, 99(20):12963-12968.<br />

211. Heidenblad M, Lindgren D, Veltman JA, Jonson T, Mahlamaki EH, Gorunova L, van<br />

Kessel AG, Schoenmakers EF, Hoglund M: Microarray analyses reveal strong<br />

influence of DNA copy number alterations on the transcriptional patterns in<br />

pancreatic cancer: implications for the interpretation of genomic amplifications.<br />

Oncogene 2005, 24(10):1794-1801.<br />

212. Hyman E, Kauraniemi P, Hautaniemi S, Wolf M, Mousses S, Rozenblum E, Ringner M,<br />

Sauter G, Monni O, Elkahloun A et al: Impact of DNA amplification on gene<br />

expression patterns in breast cancer. Cancer Res 2002, 62(21):6240-6245.<br />

213. Wolf M, Mousses S, Hautaniemi S, Karhu R, Huusko P, Allinen M, Elkahloun A, Monni<br />

O, Chen Y, Kallioniemi A et al: High-resolution analysis of gene copy number<br />

alterations in human prostate cancer using CGH on cDNA microarrays: impact of<br />

copy number on gene expression. Neoplasia 2004, 6(3):240-247.<br />

214. Adelaide J, Finetti P, Bekhouche I, Repellini L, Geneix J, Sircoulomb F, Charafe-Jauffret<br />

E, Cervera N, Desplans J, Parzy D et al: Integrated profiling of basal and luminal<br />

breast cancers. Cancer Res 2007, 67(24):11565-11575.<br />

215. Broet P, Camilleri-Broet S, Zhang S, Alifano M, Bangarusamy D, Battistella M, Wu Y,<br />

Tuefferd M, Regnard JF, Lim E et al: Prediction of clinical outcome in multiple lung<br />

cancer cohorts by integrative genomics: implications for chemotherapy selection.<br />

Cancer Res 2009, 69(3):1055-1062.<br />

216. Chin K, DeVries S, Fridlyand J, Spellman PT, Roydasgupta R, Kuo WL, Lapuk A, Neve<br />

RM, Qian Z, Ryder T et al: Genomic and transcriptional aberrations linked to breast<br />

cancer pathophysiologies. Cancer Cell 2006, 10(6):529-541.<br />

217. Natrajan R, Weigelt B, Mackay A, Geyer FC, Grigoriadis A, Tan DS, Jones C, Lord CJ,<br />

Vatcheva R, Rodriguez-Pinilla SM et al: An integrative genomic and transcriptomic<br />

analysis reveals molecular pathways and networks regulated by copy number<br />

aberrations in basal-like, HER2 and luminal cancers. Breast Cancer Res Treat 2009.<br />

218. Deng S, Calin GA, Croce CM, Coukos G, Zhang L: Mechanisms of microRNA<br />

deregulation in human cancer. Cell Cycle 2008, 7(17):2643-2646.<br />

219. Kuo KT, Guan B, Feng Y, Mao TL, Chen X, Jinawath N, Wang Y, Kurman RJ, Shih Ie M,<br />

Wang TL: Analysis of DNA copy number alterations in ovarian serous tumors<br />

identifies new molecular genetic changes in low-grade and high-grade<br />

carcinomas. Cancer Res 2009, 69(9):4036-4042.<br />

220. Lionetti M, Agnelli L, Mosca L, Fabris S, Andronache A, Todoerti K, Ronchetti D,<br />

Deliliers GL, Neri A: Integrative high-resolution microarray analysis of human<br />

myeloma cell lines reveals deregulated miRNA expression associated with allelic<br />

imbalances and gene expression profiles. Genes Chromosomes Cancer 2009,<br />

48(6):521-531.<br />

221. Starczynowski DT, Kuchenbauer F, Argiropoulos B, Sung S, Morin R, Muranyi A, Hirst<br />

M, Hogge D, Marra M, Wells RA et al: Identification of miR-145 and miR-146a as<br />

mediators of the 5q- syndrome phenotype. Nat Med 2009.<br />

222. Zhang L, Volinia S, Bonome T, Calin GA, Greshock J, Yang N, Liu CG, Giannakakis A,<br />

Alexiou P, Hasegawa K et al: Genomic and epigenetic alterations deregulate<br />

microRNA expression in human epithelial ovarian cancer. Proc Natl Acad Sci U S A<br />

2008, 105(19):7004-7009.<br />

223. Calin GA, Croce CM: MicroRNA signatures in human cancers. Nat Rev Cancer 2006,<br />

6(11):857-866.<br />

224. Nicoloso MS, Spizzo R, Shimizu M, Rossi S, Calin GA: MicroRNAs--the micro<br />

steering wheel of tumour metastases. Nat Rev Cancer 2009, 9(4):293-302.<br />

225. Wolf NG, Farver C, Abdul-Karim FW, Schwartz S: Analysis of microsatellite<br />

instability and X-inactivation in ovarian borderline tumors lacking numerical<br />

abnormalities by comparative genomic hybridization. Cancer Genet Cytogenet<br />

2003, 145(2):133-138.<br />

226. Olson P, Lu J, Zhang H, Shai A, Chun MG, Wang Y, Libutti SK, Nakakura EK, Golub<br />

TR, Hanahan D: MicroRNA dynamics in the stages of tumorigenesis correlate with<br />

hallmark capabilities of cancer. Genes Dev 2009, 23(18):2152-2165.<br />

227. Wang X: miRDB: a microRNA target prediction and functional annotation database<br />

with a wiki interface. RNA 2008, 14(6):1012-1017.<br />

228. Garzon R, Calin GA, Croce CM: MicroRNAs in Cancer. Annu Rev Med 2009, 60:167-<br />

179.<br />

229. Lujambio A, Esteller M: How epigenetics can explain human metastasis: a new role<br />

for microRNAs. Cell Cycle 2009, 8(3):377-382.<br />

230. Iorio MV, Visone R, Di Leva G, Donati V, Petrocca F, Casalini P, Taccioli C, Volinia S,<br />

Liu CG, Alder H et al: MicroRNA signatures in human ovarian cancer. Cancer Res<br />

2007, 67(18):8699-8707.<br />

231. Lujambio A, Esteller M: CpG island hypermethylation of tumor suppressor<br />

microRNAs in human cancer. Cell Cycle 2007, 6(12):1455-1459.<br />

232. Lujambio A, Ropero S, Ballestar E, Fraga MF, Cerrato C, Setien F, Casado S, Suarez-<br />

Gauthier A, Sanchez-Cespedes M, Git A et al: Genetic unmasking of an<br />

epigenetically silenced microRNA in human cancer cells. Cancer Res 2007,<br />

67(4):1424-1429.<br />

233. Guil S, Esteller M: DNA methylomes, histone codes and miRNAs: tying it all<br />

together. Int J Biochem Cell Biol 2009, 41(1):87-95.<br />

234. Sadikovic B, Yoshimoto M, Chilton-MacNeill S, Thorner P, Squire JA, Zielenska M:<br />

Identification of interactive networks of gene expression associated with<br />

osteosarcoma oncogenesis by integrated molecular profiling. Hum Mol Genet<br />

2009, 18(11):1962-1975.<br />

235. Joshi MD, Ahmad R, Yin L, Raina D, Rajabi H, Bubley G, Kharbanda S, Kufe D: MUC1<br />

oncoprotein is a druggable target in human prostate cancer cells. Mol Cancer Ther<br />

2009, 8(11):3056-3065.<br />

236. Khodarev NN, Pitroda SP, Beckett MA, MacDermed DM, Huang L, Kufe DW,<br />

Weichselbaum RR: MUC1-induced transcriptional programs associated with<br />

tumorigenesis predict outcome in breast and lung cancer. Cancer Res 2009,<br />

69(7):2833-2837.<br />

237. Senapati S, Das S, Batra SK: Mucin-interacting proteins: from function to<br />

therapeutics. Trends Biochem Sci 2009.<br />

238. Wu CJ, Chen Z, Ullrich A, Greene MI, O'Rourke DM: Inhibition of EGFR-mediated<br />

phosphoinositide-3-OH kinase (PI3-K) signaling and glioblastoma phenotype by<br />

signal-regulatory proteins (SIRPs). Oncogene 2000, 19(35):3999-4010.<br />

239. Kapoor GS, Kapitonov D, O'Rourke DM: Transcriptional regulation of signal<br />

regulatory protein alpha1 inhibitory receptors by epidermal growth factor receptor<br />

signaling. Cancer Res 2004, 64(18):6444-6452.<br />

240. Yamasaki Y, Ito S, Tsunoda N, Kokuryo T, Hara K, Senga T, Kannagi R, Yamamoto T,<br />

Oda K, Nagino M et al: SIRPalpha1 and SIRPalpha2: their role as tumor<br />

suppressors in breast carcinoma cells. Biochem Biophys Res Commun 2007,<br />

361(1):7-13.<br />

241. Qin JM, Wan XW, Zeng JZ, Wu MC: Effect of Sirpalpha1 on the expression of<br />

nuclear factor-kappa B in hepatocellular carcinoma. Hepatobiliary Pancreat Dis Int<br />

2007, 6(3):276-283.<br />

242. Gardai SJ, Xiao YQ, Dickinson M, Nick JA, Voelker DR, Greene KE, Henson PM: By<br />

binding SIRPalpha or calreticulin/CD91, lung collectins act as dual function<br />

surveillance molecules to suppress or enhance inflammation. Cell 2003, 115(1):13-<br />

23.<br />

243. Takada T, Matozaki T, Takeda H, Fukunaga K, Noguchi T, Fujioka Y, Okazaki I, Tsuda<br />

M, Yamao T, Ochi F et al: Roles of the complex formation of SHPS-1 with SHP-2 in<br />

insulin-stimulated mitogen-activated protein kinase activation. J Biol Chem 1998,<br />

273(15):9234-9242.<br />

244. Motegi S, Okazawa H, Ohnishi H, Sato R, Kaneko Y, Kobayashi H, Tomizawa K, Ito T,<br />

Honma N, Buhring HJ et al: Role of the CD47-SHPS-1 system in regulation of cell<br />

migration. EMBO J 2003, 22(11):2634-2644.<br />

245. Kharitonenkov A, Chen Z, Sures I, Wang H, Schilling J, Ullrich A: A family of proteins<br />

that inhibit signalling through tyrosine kinase receptors. Nature 1997,<br />

386(6621):181-186.<br />

246. Meyer PE, Kontos K, Lafitte F, Bontempi G: Information-theoretic inference of large<br />

transcriptional regulatory networks. EURASIP J Bioinform Syst Biol 2007:79879.<br />

247. Meyer PE, Lafitte F, Bontempi G: minet: A R/Bioconductor package for inferring<br />

large transcriptional networks using mutual information. BMC Bioinformatics 2008,<br />

9:461.<br />

248. Xi L, Feber A, Gupta V, Wu M, Bergemann AD, Landreneau RJ, Litle VR, Pennathur A,<br />

Luketich JD, Godfrey TE: Whole genome exon arrays identify differential<br />

expression of alternatively spliced, cancer-related genes in lung cancer. Nucleic<br />

Acids Res 2008, 36(20):6535-6547.<br />

249. Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L,<br />

Ge Y, Gentry J et al: Bioconductor: open software development for computational<br />

biology and bioinformatics. Genome Biol 2004, 5(10):R80.<br />

250. Bild AH, Yao G, Chang JT, Wang Q, Potti A, Chasse D, Joshi MB, Harpole D, Lancaster<br />

JM, Berchuck A et al: Oncogenic pathway signatures in human cancers as a guide<br />

to targeted therapies. Nature 2006, 439(7074):353-357.<br />

251. Shedden K, Taylor JM, Enkemann SA, Tsao MS, Yeatman TJ, Gerald WL, Eschrich S,<br />

Jurisica I, Giordano TJ, Misek DE et al: Gene expression-based survival prediction in<br />

lung adenocarcinoma: a multi-site, blinded validation study. Nat Med 2008,<br />

14(8):822-827.<br />

252. Chitale D, Gong Y, Taylor BS, Broderick S, Brennan C, Somwar R, Golas B, Wang L,<br />

Motoi N, Szoke J et al: An integrated genomic analysis of lung cancer reveals loss<br />

of DUSP4 in EGFR-mutant tumors. Oncogene 2009, 28(31):2773-2783.<br />

253. Buys TP, Aviel-Ronen S, Waddell TK, Lam WL, Tsao MS: Defining genomic alteration<br />

boundaries for a combined small cell and non-small cell lung carcinoma. J Thorac<br />

Oncol 2009, 4(2):227-239.<br />

254. Brommesson S, Jonsson G, Strand C, Grabau D, Malmstrom P, Ringner M, Ferno M,<br />

Hedenfalk I: Tiling array-CGH for the assessment of genomic similarities among<br />

synchronous unilateral and bilateral invasive breast cancer tumor pairs. BMC Clin<br />

Pathol 2008, 8:6.<br />

255. Kawanishi H, Takahashi T, Ito M, Matsui Y, Watanabe J, Ito N, Kamoto T, Kadowaki T,<br />

Tsujimoto G, Imoto I et al: Genetic analysis of multifocal superficial urothelial<br />

cancers by array-based comparative genomic hybridisation. Br J Cancer 2007,<br />

97(2):260-266.<br />

256. Mhawech-Fauceglia P, Rai H, Nowak N, Cheney RT, Rodabaugh K, Lele S, Odunsi K:<br />

The use of array-based comparative genomic hybridization (a-CGH) to distinguish<br />

metastatic from primary synchronous carcinomas of the ovary and the uterus.<br />

Histopathology 2008, 53(4):490-495.<br />

257. Nakano H, Soda H, Nakamura Y, Uchida K, Takasu M, Nakatomi K, Izumikawa K,<br />

Hayashi T, Nagayasu T, Tsukamoto K et al: Different epidermal growth factor<br />

receptor gene mutations in a patient with 2 synchronous lung cancers. Clin Lung<br />

Cancer 2007, 8(9):562-564.<br />

258. Ryoo BY, Na, II, Yang SH, Koh JS, Kim CH, Lee JC: Synchronous multiple primary<br />

lung cancers with different response to gefitinib. Lung Cancer 2006, 53(2):245-248.<br />

259. Speel EJ, van de Wouw AJ, Claessen SM, Haesevoets A, Hopman AH, van der Wurff<br />

AA, Osieka R, Buettner R, Hillen HF, Ramaekers FC: Molecular evidence for a clonal<br />

relationship between multiple lesions in patients with unknown primary<br />

adenocarcinoma. Int J Cancer 2008, 123(6):1292-1300.<br />

260. Wa CV, DeVries S, Chen YY, Waldman FM, Hwang ES: Clinical application of array-based<br />

comparative genomic hybridization to define the relationship between<br />

multiple synchronous tumors. Mod Pathol 2005, 18(4):591-597.<br />

261. Agelopoulos K, Tidow N, Korsching E, Voss R, Hinrichs B, Brandt B, Boecker W,<br />

Buerger H: Molecular cytogenetic investigations of synchronous bilateral breast<br />

cancer. J Clin Pathol 2003, 56(9):660-665.<br />

262. Whitehurst AW, Bodemann BO, Cardenas J, Ferguson D, Girard L, Peyton M, Minna<br />

JD, Michnoff C, Hao W, Roth MG et al: Synthetic lethal screen identification of<br />

chemosensitizer loci in cancer cells. Nature 2007, 446(7137):815-819.<br />

263. Barbie DA, Tamayo P, Boehm JS, Kim SY, Moody SE, Dunn IF, Schinzel AC, Sandy P,<br />

Meylan E, Scholl C et al: Systematic RNA interference reveals that oncogenic<br />

KRAS-driven cancers require TBK1. Nature 2009, 462(7269):108-112.<br />

264. Berns K, Hijmans EM, Mullenders J, Brummelkamp TR, Velds A, Heimerikx M,<br />

Kerkhoven RM, Madiredjo M, Nijkamp W, Weigelt B et al: A large-scale RNAi screen<br />

in human cells identifies new components of the p53 pathway. Nature 2004,<br />

428(6981):431-437.<br />

265. Gobeil S, Zhu X, Doillon CJ, Green MR: A genome-wide shRNA screen identifies<br />

GAS1 as a novel melanoma metastasis suppressor gene. Genes Dev 2008,<br />

22(21):2932-2940.<br />

266. Luo B, Cheung HW, Subramanian A, Sharifnia T, Okamoto M, Yang X, Hinkle G, Boehm<br />

JS, Beroukhim R, Weir BA et al: Highly parallel identification of essential genes in<br />

cancer cells. Proc Natl Acad Sci U S A 2008, 105(51):20380-20385.<br />

267. Luo J, Emanuele MJ, Li D, Creighton CJ, Schlabach MR, Westbrook TF, Wong KK,<br />

Elledge SJ: A genome-wide RNAi screen identifies multiple synthetic lethal<br />

interactions with the Ras oncogene. Cell 2009, 137(5):835-848.<br />

268. Moffat J, Grueneberg DA, Yang X, Kim SY, Kloepfer AM, Hinkle G, Piqani B, Eisenhaure<br />

TM, Luo B, Grenier JK et al: A lentiviral RNAi library for human and mouse genes<br />

applied to an arrayed viral high-content screen. Cell 2006, 124(6):1283-1298.<br />

269. Scholl C, Frohling S, Dunn IF, Schinzel AC, Barbie DA, Kim SY, Silver SJ, Tamayo P,<br />

Wadlow RC, Ramaswamy S et al: Synthetic lethal interaction between oncogenic<br />

KRAS dependency and STK33 suppression in human cancer cells. Cell 2009,<br />

137(5):821-834.<br />

270. Silva JM, Marran K, Parker JS, Silva J, Golding M, Schlabach MR, Elledge SJ, Hannon<br />

GJ, Chang K: Profiling essential genes in human mammary cells by multiplex RNAi<br />

screening. Science 2008, 319(5863):617-620.<br />

271. Apweiler R, Aslanidis C, Deufel T, Gerstner A, Hansen J, Hochstrasser D, Kellner R,<br />

Kubicek M, Lottspeich F, Maser E et al: Approaching clinical proteomics: current<br />

state and future fields of application in cellular proteomics. Cytometry A 2009,<br />

75(10):816-832.<br />

272. Apweiler R, Aslanidis C, Deufel T, Gerstner A, Hansen J, Hochstrasser D, Kellner R,<br />

Kubicek M, Lottspeich F, Maser E et al: Approaching clinical proteomics: current<br />

state and future fields of application in fluid proteomics. Clin Chem Lab Med 2009,<br />

47(6):724-744.<br />

273. Peng XQ, Wang F, Geng X, Zhang WM: Current advances in tumor proteomics and<br />

candidate biomarkers for hepatic cancer. Expert Rev Proteomics 2009, 6(5):551-561.<br />

274. Tainsky MA: Genomic and proteomic biomarkers for cancer: a multitude of<br />

opportunities. Biochim Biophys Acta 2009, 1796(2):176-193.<br />

275. Zamo A, Cecconi D: Proteomic analysis of lymphoid and haematopoietic<br />

neoplasms: There's more than biomarker discovery. J Proteomics 2009.<br />

276. Griffin JL, Kauppinen RA: A metabolomics perspective of human brain tumours.<br />

FEBS J 2007, 274(5):1132-1139.<br />

277. Spratlin JL, Serkova NJ, Eckhardt SG: Clinical applications of metabolomics in<br />

oncology: a review. Clin Cancer Res 2009, 15(2):431-440.<br />

278. Sreekumar A, Poisson LM, Rajendiran TM, Khan AP, Cao Q, Yu J, Laxman B, Mehra R,<br />

Lonigro RJ, Li Y et al: Metabolomic profiles delineate potential role for sarcosine in<br />

prostate cancer progression. Nature 2009, 457(7231):910-914.<br />

279. Adamovic T, Trosso F, Roshani L, Andersson L, Petersen G, Rajaei S, Helou K, Levan<br />

G: Oncogene amplification in the proximal part of chromosome 6 in rat<br />

endometrial adenocarcinoma as revealed by combined BAC/PAC FISH,<br />

chromosome painting, zoo-FISH, and allelotyping. Genes Chromosomes Cancer<br />

2005, 44(2):139-153.<br />

280. Ferrandina G, Mey V, Nannizzi S, Ricciardi S, Petrillo M, Ferlini C, Danesi R, Scambia<br />

G, Del Tacca M: Expression of nucleoside transporters, deoxycitidine kinase,<br />

ribonucleotide reductase regulatory subunits, and gemcitabine catabolic enzymes<br />

in primary ovarian cancer. Cancer Chemother Pharmacol 2009.<br />

281. Fernandez-Ranvier GG, Weng J, Yeh RF, Khanafshar E, Suh I, Barker C, Duh QY,<br />

Clark OH, Kebebew E: Identification of biomarkers of adrenocortical carcinoma<br />

using genomewide gene expression profiling. Arch Surg 2008, 143(9):841-846;<br />

discussion 846.<br />

282. Segditsas S, Sieber O, Deheragoda M, East P, Rowan A, Jeffery R, Nye E, Clark S,<br />

Spencer-Dene B, Stamp G et al: Putative direct and indirect Wnt targets identified<br />

through consistent gene expression changes in APC-mutant intestinal adenomas<br />

from humans and mice. Hum Mol Genet 2008, 17(24):3864-3875.<br />

283. Conde L, Montaner D, Burguet-Castell J, Tarraga J, Medina I, Al-Shahrour F, Dopazo J:<br />

ISACGH: a web-based environment for the analysis of Array CGH and gene<br />

expression which includes functional profiling. Nucleic Acids Res 2007, 35(Web<br />

Server issue):W81-85.<br />

284. La Rosa P, Viara E, Hupe P, Pierron G, Liva S, Neuvial P, Brito I, Lair S, Servant N,<br />

Robine N et al: VAMP: visualization and analysis of array-CGH, transcriptome and<br />

other molecular profiles. Bioinformatics 2006, 22(17):2066-2073.<br />

285. Kapushesky M, Emam I, Holloway E, Kurnosov P, Zorin A, Malone J, Rustici G, Williams<br />

E, Parkinson H, Brazma A: Gene expression atlas at the European bioinformatics<br />

institute. Nucleic Acids Res 2010, 38(Database issue):D690-698.<br />

286. Li L, Bum-Erdene K, Baenziger PH, Rosen JJ, Hemmert JR, Nellis JA, Pierce ME,<br />

Meroueh SO: BioDrugScreen: a computational drug design resource for ranking<br />

molecules docked to the human proteome. Nucleic Acids Res 2010, 38(Database<br />

issue):D765-773.<br />

287. Kato K, Yamashita R, Matoba R, Monden M, Noguchi S, Takagi T, Nakai K: Cancer<br />

gene expression database (CGED): a database for gene expression profiling with<br />

accompanying clinical information of human cancer tissues. Nucleic Acids Res<br />

2005, 33(Database issue):D533-536.<br />

288. Li H, He Y, Ding G, Wang C, Xie L, Li Y: dbDEPC: a database of Differentially<br />

Expressed Proteins in human Cancers. Nucleic Acids Res 2010, 38(Database<br />

issue):D658-664.<br />

289. Brooksbank C, Cameron G, Thornton J: The European Bioinformatics Institute's data<br />

resources. Nucleic Acids Res 2010, 38(Database issue):D17-25.<br />

290. Safran M, Chalifa-Caspi V, Shmueli O, Olender T, Lapidot M, Rosen N, Shmoish M,<br />

Peter Y, Glusman G, Feldmesser E et al: Human Gene-Centric Databases at the<br />

Weizmann Institute of Science: GeneCards, UDB, CroW 21 and HORDE. Nucleic<br />

Acids Res 2003, 31(1):142-146.<br />

291. Zhang Y, Lv J, Liu H, Zhu J, Su J, Wu Q, Qi Y, Wang F, Li X: HHMD: the human<br />

histone modification database. Nucleic Acids Res 2010, 38(Database issue):D149-<br />

154.<br />

292. Betel D, Wilson M, Gabow A, Marks DS, Sander C: The microRNA.org resource:<br />

targets and expression. Nucleic Acids Res 2008, 36(Database issue):D149-153.<br />

293. Jiang Q, Wang Y, Hao Y, Juan L, Teng M, Zhang X, Li M, Wang G, Liu Y:<br />

miR2Disease: a manually curated database for microRNA deregulation in human<br />

disease. Nucleic Acids Res 2009, 37(Database issue):D98-104.<br />

294. Alexiou P, Vergoulis T, Gleditzsch M, Prekas G, Dalamagas T, Megraw M, Grosse I,<br />

Sellis T, Hatzigeorgiou AG: miRGen 2.0: a database of microRNA genomic<br />

information and regulation. Nucleic Acids Res 2010, 38(Database issue).<br />

295. Sayers EW, Barrett T, Benson DA, Bolton E, Bryant SH, Canese K, Chetvernin V,<br />

Church DM, Dicuccio M, Federhen S et al: Database resources of the National Center<br />

for Biotechnology Information. Nucleic Acids Res 2010, 38(Database issue):D5-D16.<br />

296. Barrett T, Troup DB, Wilhite SE, Ledoux P, Rudnev D, Evangelista C, Kim IF, Soboleva<br />

A, Tomashevsky M, Marshall KA et al: NCBI GEO: archive for high-throughput<br />

functional genomic data. Nucleic Acids Res 2009, 37(Database issue):D885-890.<br />

297. Rhodes DR, Kalyana-Sundaram S, Mahavisno V, Varambally R, Yu J, Briggs BB,<br />

Barrette TR, Anstet MJ, Kincead-Beal C, Kulkarni P et al: Oncomine 3.0: genes,<br />

pathways, and networks in a collection of 18,000 cancer gene expression profiles.<br />

Neoplasia 2007, 9(2):166-180.<br />

298. Baudis M: Genomic imbalances in 5918 malignant epithelial tumors: an explorative<br />

meta-analysis of chromosomal CGH data. BMC Cancer 2007, 7:226.<br />

299. Vizcaino JA, Cote R, Reisinger F, Barsnes H, Foster JM, Rameseder J, Hermjakob H,<br />

Martens L: The Proteomics Identifications database: 2010 update. Nucleic Acids Res<br />

2010, 38(Database issue):D736-742.<br />

300. Ren Y, Gong W, Zhou H, Wang Y, Xiao F, Li T: siRecords: a database of mammalian<br />

RNAi experiments and efficacies. Nucleic Acids Res 2009, 37(Database issue):D146-<br />

149.<br />

301. Chari R, Lockwood WW, Coe BP, Chu A, Macey D, Thomson A, Davies JJ, MacAulay<br />

C, Lam WL: SIGMA: a system for integrative genomic microarray analysis of<br />

cancer genomes. BMC Genomics 2006, 7:324.<br />

302. Rhead B, Karolchik D, Kuhn RM, Hinrichs AS, Zweig AS, Fujita PA, Diekhans M, Smith<br />

KE, Rosenbloom KR, Raney BJ et al: The UCSC Genome Browser database: update<br />

2010. Nucleic Acids Res 2010, 38(Database issue):D613-619.<br />

Chapter 6: Conclusions<br />

6.1 Summary<br />

Lung adenocarcinoma is the most commonly diagnosed form of lung cancer today, and a large<br />

percentage of patients have poor overall survival and prognosis. Genomic analysis has<br />

provided much insight into this disease through the identification of specific differentially expressed<br />

genes, somatically mutated genes, hyper- and hypomethylated genes, and genes that are<br />

amplified or deleted at the DNA copy number level. While tools and platforms for whole<br />

genome analysis of gene dosage and gene expression are widely accessible and have<br />

improved in resolution, only recently have technologies to assess DNA methylation in a<br />

high throughput manner become available. Hence, the logical next step with these vast amounts<br />

of data is to integrate the information from these different assays in parallel to gain a<br />

better understanding of the biology of lung adenocarcinoma.<br />

6.1.1 Development of the integrative genetic and epigenetic approach<br />

In chapter 2, the development of the SIGMA2 software package is discussed [1]. At the time<br />

the package was developed, there were no tools for integrative genetic and epigenetic<br />

analysis, or even for integrating gene dosage and gene expression data, which came from two well<br />

established high throughput platforms. Moreover, for array CGH data alone, only a limited number of<br />

tools existed. Hence, SIGMA [2] was developed first, as a precursor to SIGMA2, and used as the<br />

framework for SIGMA2.<br />

In chapter 3, I demonstrated that when this integrative approach is applied to model systems,<br />

we learn much more from multiple dimensions than from any single dimension<br />

alone. Specifically, we show that we can associate more of the dysregulated<br />

gene expression with alterations at the DNA level; in some cell lines, as much as<br />

80% of the observed gene expression changes could be associated with DNA level<br />

changes. In addition, I also illustrated two key concepts: (i) the “Or” (multiple alternate hit<br />

mechanisms) concept where across a sample set and using a fixed frequency <strong>of</strong> disruption, we<br />

can identify nearly three times as many genes when we can account for multiple types <strong>of</strong><br />

disruption as opposed to accounting for only a single type <strong>of</strong> alteration and (ii) the “And”<br />

(coupled multiple hits) concept where we identify genes which are targeted by multiple<br />

mechanisms in the sample and show that these genes can have significant biological and/or<br />

clinical relevance.<br />
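
As an illustration only, the "Or" and "And" concepts can be sketched in a few lines of Python. This is not the SIGMA2 implementation; the function name, the data layout (one dictionary per dimension mapping a gene to the set of samples in which it is disrupted), and the gene and sample labels are all hypothetical and chosen for this sketch.

```python
# Hypothetical illustration of the "Or" and "And" concepts.
# All names and values are invented for this sketch; none are thesis data.

def frequently_disrupted(calls, n_samples, min_freq, mode="or"):
    """Return the genes disrupted in at least min_freq of the samples.

    calls: dict mapping dimension name -> dict mapping gene -> set of sample ids
    mode "or":  a sample counts if the gene is hit in ANY dimension.
    mode "and": a sample counts if the gene is hit in TWO OR MORE dimensions.
    """
    needed = 1 if mode == "or" else 2
    genes = set()
    for per_gene in calls.values():
        genes.update(per_gene)

    selected = set()
    for gene in genes:
        # Count, per sample, how many dimensions disrupt this gene.
        hits_per_sample = {}
        for per_gene in calls.values():
            for sample in per_gene.get(gene, set()):
                hits_per_sample[sample] = hits_per_sample.get(sample, 0) + 1
        n_hit = sum(1 for count in hits_per_sample.values() if count >= needed)
        if n_hit / n_samples >= min_freq:
            selected.add(gene)
    return selected


calls = {
    "copy_number": {"GENE_A": {"s1", "s2"}, "GENE_B": {"s1"}},
    "methylation": {"GENE_A": {"s2"}, "GENE_B": {"s2", "s3"}},
}

# Single dimension versus "Or" at the same frequency threshold (4 samples, 50%).
single = frequently_disrupted({"copy_number": calls["copy_number"]}, 4, 0.5)
either = frequently_disrupted(calls, 4, 0.5, mode="or")
# "And": genes with two or more concurrent hits in at least 25% of samples.
both = frequently_disrupted(calls, 4, 0.25, mode="and")
```

In this toy input, GENE_B falls below the threshold in either dimension alone but passes it once both dimensions are pooled, which is the situation the "Or" concept is meant to capture.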

6.1.2 Identification <strong>of</strong> a prevalent genetic alteration in lung adenocarcinoma<br />

Genetic and epigenetic alterations have been shown to be prominent in lung adenocarcinoma.<br />

Within genetic alterations, the majority <strong>of</strong> documented alterations have involved alterations in<br />

gene dosage, somatic mutation, and loss <strong>of</strong> heterozygosity (LOH) or allelic imbalance (whereby<br />

an allele or a portion <strong>of</strong> the allele is lost or gained in the tumor). In terms <strong>of</strong> allelic imbalance,<br />

the majority <strong>of</strong> the time this event is captured as a decrease or increase in gene dosage.<br />

However, there are also cases where allelic imbalance exists but there is no net change in gene<br />

dosage, termed copy neutral LOH or somatic uniparental disomy (UPD). <strong>Chapter</strong> 4 discusses<br />

the unexpected prevalence <strong>of</strong> UPD in the lung adenocarcinoma genome.<br />

Though previous studies were done using SNP arrays on lung adenocarcinoma tumors, the<br />

prevalence of UPD was likely underestimated for a number of reasons. These<br />

reasons include the use of heterogeneous samples with high normal cell contamination due to<br />

lack of microdissection, the lower resolution of alterations identifiable by previous SNP arrays, the use<br />

of non patient-matched controls as reference, and reliance on call-based algorithms rather than<br />

algorithms which use allele specific copy number [3].<br />
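
To make the distinction concrete, the following is a minimal sketch of how a genomic segment might be labelled once allele specific copy numbers are available. It is not the algorithm used in chapter 4 or in [3]; the function name, the labels, and the assumption of a diploid baseline with integer major and minor allele copy numbers are all hypothetical simplifications.

```python
# Hypothetical sketch: labelling a segment from allele specific copy number.
# Assumes integer copy numbers and a diploid baseline; real data are noisier.

def classify_segment(major_cn, minor_cn, ploidy=2):
    """Label a segment given its major and minor allele copy numbers."""
    if minor_cn < 0 or major_cn < minor_cn:
        raise ValueError("expected major_cn >= minor_cn >= 0")
    total = major_cn + minor_cn
    if total == 0:
        return "homozygous_deletion"
    if minor_cn == 0:
        # One allele is absent, so heterozygosity is lost.
        if total == ploidy:
            return "copy_neutral_LOH"  # no net dosage change (UPD)
        return "LOH_with_loss" if total < ploidy else "LOH_with_gain"
    if total > ploidy:
        return "gain"
    return "no_change"


labels = [classify_segment(2, 0), classify_segment(1, 1), classify_segment(1, 0)]
```

A method that looks at total copy number alone would not separate the first two segments in `labels`, since both have a total of two copies; only the allele specific view exposes the first as copy neutral LOH.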

In addition to the prevalence <strong>of</strong> UPD, the other key finding from this chapter is the presence <strong>of</strong><br />

frequent UPD at both known and novel oncogenes. While UPD has previously been shown to<br />

affect tumor suppressor genes such as RB1 [4], the association <strong>of</strong> UPD at oncogene loci has<br />

not been reported as often in solid tumors. Moreover, in the previous studies in hematological<br />

malignancies such as leukemias, lymphomas, or myelodysplastic syndromes, the observed<br />

UPD at oncogenes was typically accompanied by an acquired homozygous mutation at the<br />

locus [3, 5-8]. From our data, though UPD was also observed at mutated KRAS, as shown<br />

previously [9], there was also frequent UPD in cases where KRAS was not mutated. This<br />

finding suggests that in cases where UPD occurs without somatic mutation, the UPD<br />

event may in fact be used as a mechanism for preferential allele selection. Specifically, this could<br />

be a preference for the methylated allele for tumor suppressor genes or the unmethylated allele for<br />

oncogenes [10, 11]. Alternatively, the preference could be for a more transcriptionally active (or<br />

inactive) allele as it has been shown that for specific genes, the two alleles may differ in rates <strong>of</strong><br />

transcription [12-16]. Thus, integration <strong>of</strong> genetic data with epigenetic and gene expression<br />

data would help decipher the target <strong>of</strong> these frequently observed UPD events.<br />

6.1.3 Application <strong>of</strong> the integrative approach to lung adenocarcinoma specimens<br />

Thus far, I have shown that the integrative genetic and epigenetic approach is beneficial in<br />

identifying important genes, both those which would have been missed if single assays alone were<br />

used and those which have concurrent alteration at multiple levels. <strong>Chapter</strong> 5 discusses how<br />

upon application <strong>of</strong> this approach to lung adenocarcinoma specimens, we see that this trend<br />

also holds true in clinical samples. Specifically, I show that novel canonical signaling pathways<br />

are significantly enriched when multiple DNA-based dimensions are used but are missed<br />

(not statistically significant) when a single DNA-based dimension is used. In addition, when we<br />

examined the well-documented EGFR signaling pathway, a pathway known to be involved in<br />

lung adenocarcinoma, the most frequently disrupted gene was signal-regulatory protein alpha<br />

(SIRPA).<br />
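
The effect of gene list size on pathway significance can be illustrated with a generic over-representation test. The sketch below uses a one-sided hypergeometric tail probability; it is an assumption made for illustration and is not the implementation used by Ingenuity Pathway Analysis. The function name and all counts are hypothetical.

```python
from math import comb

# Hypothetical sketch of a one-sided over-representation (hypergeometric) test.
# This is a generic test for illustration, not the Ingenuity implementation.

def enrichment_p_value(n_background, n_pathway, n_selected, n_overlap):
    """Probability of seeing at least n_overlap pathway genes among
    n_selected genes drawn without replacement from n_background genes,
    of which n_pathway belong to the pathway."""
    upper = min(n_pathway, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Invented counts: a 50-gene pathway in a background of 1000 genes.
# Single dimension: 20 altered genes, 2 of them in the pathway.
p_single = enrichment_p_value(1000, 50, 20, 2)
# Multiple dimensions: 60 altered genes, 10 of them in the pathway.
p_multi = enrichment_p_value(1000, 50, 60, 10)
```

With these invented counts, the pathway overlap is unremarkable for the single-dimension gene list but becomes unlikely by chance once genes altered through other mechanisms are added to the list.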

SIRPA has been shown to be a direct downstream component <strong>of</strong> EGFR, and has been shown<br />

to be suppressed in expression by EGFR activation [17, 18]. In the resting lung, SIRPA has<br />

been postulated to control the inflammatory response through SHP-1 and eventually, NFKB<br />

[19]. While there are likely multiple components between SIRPA and NFKB, we wanted to<br />

assess expression <strong>of</strong> components directly associated with SIRPA. In addition, since this gene<br />

was identified in a small set <strong>of</strong> samples, we wanted to see if this prevalence <strong>of</strong> disruption was<br />

maintained in an additional, larger set <strong>of</strong> tumors. Hence, we evaluated expression <strong>of</strong> SHP-1<br />

and SIRPA in a panel <strong>of</strong> approximately 60 lung adenocarcinoma tumors and found (i) a high<br />

prevalence of underexpression of SIRPA and (ii) a strong correlation between SIRPA and<br />

SHP-1 expression levels. It is interesting to observe this strong relationship between SIRPA and<br />

SHP-1 as most cancer studies have focused on SIRPA’s relationship with SHP-2 instead <strong>of</strong><br />

SHP-1.<br />
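
For readers who wish to reproduce this kind of comparison on their own data, a correlation between two expression vectors can be computed as sketched below. The choice of the Pearson coefficient is an assumption made for this example, since the correlation measure is not restated here, and the function name and expression values are hypothetical.

```python
from math import sqrt

# Hypothetical sketch: Pearson correlation between two expression vectors.
# The values below are invented and are not measurements from the thesis.

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


sirpa_expr = [1.0, 2.0, 3.0, 4.0]  # invented relative expression values
shp1_expr = [2.1, 3.9, 6.2, 8.0]   # invented relative expression values
r = pearson_r(sirpa_expr, shp1_expr)
```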

6.2 Conclusions<br />

I have demonstrated the power <strong>of</strong> an integrative genetic and epigenetic approach to decipher<br />

resultant gene expression changes in lung adenocarcinoma. The development <strong>of</strong> an<br />

application such as SIGMA2 was integral as it represented one <strong>of</strong> the first academic/research<br />

applications with the ability to integrate multiple dimensions of data. To date, a<br />

few other applications with similar functionality have been developed, but<br />

most of these were developed by commercial entities. Moreover, the software is not yet<br />

outdated and, because of the way it was built, can be extended to handle newer high throughput<br />

platforms including sequence-based platforms.<br />

From both the demonstration dataset (Chapter 3) and the clinical<br />

tumor dataset (Chapter 5), we learn that an integrative, multi-dimensional approach<br />

detects genes as disrupted at a much higher frequency when multiple dimensions<br />

are examined than when single dimensions are examined alone. Moreover, a gene may fall<br />

below a given detection frequency in any single dimension, yet when multiple<br />

dimensions are accounted for, the frequency is in fact high. In Figure 5.5, I illustrate how well<br />

known lung cancer genes such as RRM2 are altered at both the genetic and epigenetic level<br />

and illustrate how more pathways are deemed significant when multiple dimensions are<br />

analyzed. The latter finding is likely a result <strong>of</strong> the fact that within a given pathway, not only can<br />

different genes be affected in different samples by one mechanism (e.g. DNA copy number<br />

amplification), but they also can be affected by different, but complementary, mechanisms (e.g.<br />

DNA methylation). These findings validate part A <strong>of</strong> the hypothesis. In addition, when aligning<br />

genetic and epigenetic profiles, I show that a high proportion of the observed differential<br />

expression can be attributed to genetic and epigenetic changes, validating part B <strong>of</strong> the<br />

hypothesis (genetic and epigenetic changes resulting in aberrant gene expression). Finally,<br />

when examining the EGFR signaling pathway, we observe that a number <strong>of</strong> key genes are<br />

frequently altered, while other genes are not altered as often. The most frequently affected<br />

gene, SIRPA, exhibited both genetic and epigenetic alteration and in some cases, this occurred<br />

concurrently within the same sample. Moreover, it was also found in the analyses of both<br />

chapter 3 (breast cancer) and chapter 5 (lung cancer) that over three times as many genes are<br />

identified as frequently aberrant when using multiple dimensions as compared to any single<br />

dimension. These findings validate part C <strong>of</strong> the hypothesis, whereby a gene within a<br />

commonly deregulated lung cancer pathway was identified using the integrative genetic and<br />

epigenetic approach.<br />

This finding has potential implications for the sample sizes necessary to discover important<br />

alterations as one can look at a small number <strong>of</strong> samples in more detail rather than a large<br />

sample set using a single assay. Specifically, these would be the genes that are disrupted at a<br />

low frequency in one dimension but high frequency across multiple dimensions as large sample<br />

sets would be needed to be confident <strong>of</strong> the low single dimension alteration frequency. Such<br />

considerations exist in situations where large sample sets are not feasible due to rarity and<br />

preciousness <strong>of</strong> samples.<br />
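
The sample size argument can be made concrete with a simple binomial calculation. The sketch below assumes that samples are independent and that each is altered with a fixed true frequency; both the function name and the example frequencies are hypothetical and are not estimates from this thesis.

```python
from math import comb

# Hypothetical sketch: chance of observing a gene altered in at least k of
# n samples, assuming independent samples and a fixed true frequency p.

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Invented frequencies: 10% in one dimension, 40% across all dimensions.
# Criterion: the gene must be altered in at least 3 of 10 samples.
p_detect_single = prob_at_least(3, 10, 0.10)
p_detect_multi = prob_at_least(3, 10, 0.40)
```

Under these invented frequencies, a gene altered in 10% of tumors in any one dimension would usually be missed in a ten-sample study, whereas the same gene would usually be detected if its combined frequency across dimensions were 40%.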

In terms <strong>of</strong> the multi-dimensional perspective on lung adenocarcinoma, in addition to finding<br />

additional genes and pathways that are disrupted when we look at multiple dimensions, when<br />

examining known genes and pathways, we see a complex pattern <strong>of</strong> deregulation with some<br />

components altered more frequently than others. This added complexity highlights how each<br />

tumor differs from the others and suggests that the rational approach to identifying therapeutic targets<br />

will be at the pathway level as opposed to the gene level. Within these<br />

pathways, key nodes or “choke points” will likely serve as the best targets for therapeutic<br />

intervention.<br />

6.3 Future directions<br />

There are three key future directions which should be pursued at this point: (i) further evaluation<br />

<strong>of</strong> SIRPA as a novel tumor suppressor gene in lung adenocarcinoma, (ii) evaluation <strong>of</strong> the novel<br />

signaling pathways implicated through multi-dimensional analysis, and (iii) incorporation <strong>of</strong> data<br />

from other dimensions not evaluated in this thesis.<br />

In terms <strong>of</strong> SIRPA, the first experiments that need to be done are to assess whether the<br />

deregulated mRNA expression <strong>of</strong> SIRPA is also observed at the protein level. One such<br />

approach would be using immunohistochemistry on a tissue microarray comprised <strong>of</strong> hundreds<br />

<strong>of</strong> samples with well annotated clinical information. Subsequently, the frequency <strong>of</strong><br />

underexpression, the correlation with overall patient survival, and the overexpression<br />

associated with a subset of tumors from patients who have never smoked could be validated.<br />

In addition, depending on what clinical information is available, one could correlate to other<br />

parameters that were not available to me from the public gene expression microarray datasets<br />

to uncover other interesting clinical associations.<br />

Secondly, with the amount <strong>of</strong> literature suggesting that SIRPA could be a tumor suppressor<br />

gene in adenocarcinoma, the next set <strong>of</strong> experiments would be designed to test the tumor<br />

suppressor role <strong>of</strong> SIRPA. This would require the silencing <strong>of</strong> the gene in normal cells, using<br />

RNAi based techniques for example, and assessing tumorigenic phenotypes such as anchorage<br />

independent growth, reduction <strong>of</strong> apoptosis, and increase in proliferation. In addition, parallel<br />

gene introduction experiments would also need to be done in cancer cell lines which exhibit little<br />

or no expression <strong>of</strong> SIRPA and the level <strong>of</strong> suppression <strong>of</strong> the above listed tumor phenotypes<br />

would then be assessed.<br />

One <strong>of</strong> the canonical signaling pathways that was identified as the most statistically significant<br />

by Ingenuity Pathway Analysis is the Hepatic Fibrosis/Hepatic Stellate Cell Activation pathway.<br />

While the existence and role <strong>of</strong> stellate cells have been well documented in the liver and<br />

pancreas, there have been a limited number <strong>of</strong> reports <strong>of</strong> stellate cells in the lung [20]. From<br />

what is known in the liver and pancreas, stellate cells are involved in tissue fibrosis and<br />

inflammation in chronic diseases such as pancreatitis and hepatitis [21-25]. In pancreatic<br />

tumors, activated stellate cells promote an increase in connective tissue surrounding the tumors<br />

(termed the desmoplastic process) and have been shown to be proliferative in the presence <strong>of</strong><br />

tumor secreted factors [25]. In addition, stellate cells also have implications in drug resistance<br />

[26]. In the lung, it is plausible to envision a role <strong>of</strong> stellate cells in diseases such as chronic<br />

obstructive pulmonary disease (COPD) where tissue fibrosis and inflammation are prominent<br />

[27]. One <strong>of</strong> the challenges to testing this function in vitro is that it would be important to<br />

recapitulate the tumor microenvironment. Hence, this function would have to be tested in vivo<br />

using inducible mouse models where expression <strong>of</strong> secreted factors associated with stellate cell<br />

activation, which were identified from our analysis, can be assessed. Phenotypes such as<br />

cellular proliferation, apoptosis, and drug resistance could then be assayed and compared<br />

between pre- and post-induction of these secreted factors.<br />

Finally, although multiple DNA dimensions were analyzed in this thesis, recent advances in<br />

technology have allowed for other dimensions that could be incorporated. For example,<br />

genome sequencing technologies allow for the detection <strong>of</strong> novel somatic mutations in a high<br />

throughput manner. While performing this at the whole genome level is financially and<br />

computationally challenging, this effort can be focused on examining the "exome" (DNA from<br />

gene coding exons only) using sequence capture based techniques [28, 29]. MicroRNAs have<br />

also been shown to be important in lung cancer, with specific microRNAs shown to be differentially<br />

expressed [30-36]. MicroRNAs can affect downstream protein expression through a number <strong>of</strong><br />

different mechanisms [37-40]. Integration <strong>of</strong> microRNA and sequence mutation data with the<br />

previously described genetic and epigenetic data would further increase our understanding <strong>of</strong><br />

the biology <strong>of</strong> lung adenocarcinoma.<br />

6.4 References<br />

1. Chari R, Coe BP, Wedseltoft C, Benetti M, Wilson IM, Vucic EA, MacAulay C, Ng RT,<br />

Lam WL: SIGMA2: a system for the integrative genomic multi-dimensional analysis<br />

<strong>of</strong> cancer genomes, epigenomes, and transcriptomes. BMC Bioinformatics 2008,<br />

9:422.<br />

2. Chari R, Lockwood WW, Coe BP, Chu A, Macey D, Thomson A, Davies JJ, MacAulay<br />

C, Lam WL: SIGMA: a system for integrative genomic microarray analysis <strong>of</strong><br />

cancer genomes. BMC Genomics 2006, 7:324.<br />

3. Sanada M, Suzuki T, Shih LY, Otsu M, Kato M, Yamazaki S, Tamura A, Honda H,<br />

Sakata-Yanagimoto M, Kumano K et al: Gain-<strong>of</strong>-function <strong>of</strong> mutated C-CBL tumour<br />

suppressor in myeloid neoplasms. Nature 2009, 460(7257):904-908.<br />

4. Zhu X, Dunn JM, Goddard AD, Squire JA, Becker A, Phillips RA, Gallie BL:<br />

Mechanisms <strong>of</strong> loss <strong>of</strong> heterozygosity in retinoblastoma. Cytogenet Cell Genet<br />

1992, 59(4):248-252.<br />

5. Grand FH, Hidalgo-Curtis CE, Ernst T, Zoi K, Zoi C, McGuire C, Kreil S, Jones A, Score<br />

J, Metzgeroth G et al: Frequent CBL mutations associated with 11q acquired<br />

uniparental disomy in myeloproliferative neoplasms. Blood 2009, 113(24):6182-<br />

6192.<br />

6. Kralovics R, Guan Y, Prchal JT: Acquired uniparental disomy <strong>of</strong> chromosome 9p is<br />

a frequent stem cell defect in polycythemia vera. Exp Hematol 2002, 30(3):229-236.<br />

7. Tiu RV, Gondek LP, O'Keefe CL, Huh J, Sekeres MA, Elson P, McDevitt MA, Wang XF,<br />

Levis MJ, Karp JE et al: New lesions detected by single nucleotide polymorphism<br />

array-based chromosomal analysis have important clinical impact in acute<br />

myeloid leukemia. J Clin Oncol 2009, 27(31):5219-5226.<br />

8. Yamamoto G, Nannya Y, Kato M, Sanada M, Levine RL, Kawamata N, Hangaishi A,<br />

Kurokawa M, Chiba S, Gilliland DG et al: Highly sensitive method for genomewide<br />

detection <strong>of</strong> allelic composition in nonpaired, primary tumor specimens by use <strong>of</strong><br />

Affymetrix single-nucleotide-polymorphism genotyping microarrays. Am J Hum<br />

Genet 2007, 81(1):114-126.<br />

9. Soh J, Okumura N, Lockwood WW, Yamamoto H, Shigematsu H, Zhang W, Chari R,<br />

Shames DS, Tang X, MacAulay C et al: Oncogene mutations, copy number gains<br />

and mutant allele specific imbalance (MASI) frequently occur together in tumor<br />

cells. PLoS One 2009, 4(10):e7464.<br />

10. Darbary HK, Dutt SS, Sait SJ, Nowak NJ, Heinaman RE, Stoler DL, Anderson GR:<br />

Uniparentalism in sporadic colorectal cancer is independent <strong>of</strong> imprint status, and<br />

coordinate for chromosomes 14 and 18. Cancer Genet Cytogenet 2009, 189(2):77-<br />

86.<br />

11. Tuna M, Knuutila S, Mills GB: Uniparental disomy in cancer. Trends Mol Med 2009,<br />

15(3):120-128.<br />

12. Bjornsson HT, Albert TJ, Ladd-Acosta CM, Green RD, Rongione MA, Middle CM,<br />

Irizarry RA, Broman KW, Feinberg AP: SNP-specific array-based allele-specific<br />

expression analysis. Genome Res 2008, 18(5):771-779.<br />

13. Gimelbrant A, Hutchinson JN, Thompson BR, Chess A: Widespread monoallelic<br />

expression on human autosomes. Science 2007, 318(5853):1136-1140.<br />

14. Milani L, Lundmark A, Nordlund J, Kiialainen A, Flaegstad T, Jonmundsson G, Kanerva<br />

J, Schmiegelow K, Gunderson KL, Lonnerholm G et al: Allele-specific gene<br />

expression patterns in primary leukemic cells reveal regulation <strong>of</strong> gene<br />

expression by CpG site methylation. Genome Res 2009, 19(1):1-11.<br />

15. Palacios R, Gazave E, Goni J, Piedrafita G, Fernando O, Navarro A, Villoslada P:<br />

Allele-specific gene expression is widespread across the genome and biological<br />

processes. PLoS One 2009, 4(1):e4150.<br />

16. Zhang K, Li JB, Gao Y, Egli D, Xie B, Deng J, Li Z, Lee JH, Aach J, Leproust EM et al:<br />

Digital RNA allelotyping reveals tissue-specific and allele-specific gene<br />

expression in human. Nat Methods 2009, 6(8):613-618.<br />

17. Kapoor GS, Kapitonov D, O'Rourke DM: Transcriptional regulation <strong>of</strong> signal<br />

regulatory protein alpha1 inhibitory receptors by epidermal growth factor receptor<br />

signaling. Cancer Res 2004, 64(18):6444-6452.<br />

18. Wu CJ, Chen Z, Ullrich A, Greene MI, O'Rourke DM: Inhibition <strong>of</strong> EGFR-mediated<br />

phosphoinositide-3-OH kinase (PI3-K) signaling and glioblastoma phenotype by<br />

signal-regulatory proteins (SIRPs). Oncogene 2000, 19(35):3999-4010.<br />

19. Gardai SJ, Xiao YQ, Dickinson M, Nick JA, Voelker DR, Greene KE, Henson PM: By<br />

binding SIRPalpha or calreticulin/CD91, lung collectins act as dual function<br />

surveillance molecules to suppress or enhance inflammation. Cell 2003, 115(1):13-<br />

23.<br />

20. Keane MP, Strieter RM, Belperio JA: Mechanisms and mediators <strong>of</strong> pulmonary<br />

fibrosis. Crit Rev Immunol 2005, 25(6):429-463.<br />

21. Geerts A: History, heterogeneity, developmental biology, and functions <strong>of</strong><br />

quiescent hepatic stellate cells. Semin Liver Dis 2001, 21(3):311-335.<br />

22. Hautekeete ML, Geerts A: The hepatic stellate (Ito) cell: its role in human liver<br />

disease. Virchows Arch 1997, 430(3):195-207.<br />

23. Masamune A, Shimosegawa T: Signal transduction in pancreatic stellate cells. J<br />

Gastroenterol 2009, 44(4):249-260.<br />

24. Masamune A, Watanabe T, Kikuta K, Shimosegawa T: Roles <strong>of</strong> pancreatic stellate<br />

cells in pancreatic inflammation and fibrosis. Clin Gastroenterol Hepatol 2009, 7(11<br />

Suppl):S48-54.<br />

25. Omary MB, Lugea A, Lowe AW, Pandol SJ: The pancreatic stellate cell: a star on the<br />

rise in pancreatic diseases. J Clin Invest 2007, 117(1):50-59.<br />

26. Mahadevan D, Von Hoff DD: Tumor-stroma interactions in pancreatic ductal<br />

adenocarcinoma. Mol Cancer Ther 2007, 6(4):1186-1197.<br />

27. Chung KF, Adcock IM: Multifaceted mechanisms in COPD: inflammation, immunity,<br />

and tissue repair and destruction. Eur Respir J 2008, 31(6):1334-1356.<br />

28. Ng SB, Buckingham KJ, Lee C, Bigham AW, Tabor HK, Dent KM, Huff CD, Shannon<br />

PT, Jabs EW, Nickerson DA et al: Exome sequencing identifies the cause <strong>of</strong> a<br />

mendelian disorder. Nat Genet 2010, 42(1):30-35.<br />

29. Ng SB, Turner EH, Robertson PD, Flygare SD, Bigham AW, Lee C, Shaffer T, Wong M,<br />

Bhattacharjee A, Eichler EE et al: Targeted capture and massively parallel<br />

sequencing <strong>of</strong> 12 human exomes. Nature 2009, 461(7261):272-276.<br />

30. Du L, Pertsemlidis A: microRNAs and lung cancer: tumors and 22-mers. Cancer<br />

Metastasis Rev 2010.<br />

31. Garofalo M, Di Leva G, Romano G, Nuovo G, Suh SS, Ngankeu A, Taccioli C, Pichiorri<br />

F, Alder H, Secchiero P et al: miR-221&222 regulate TRAIL resistance and enhance<br />

tumorigenicity through PTEN and TIMP3 downregulation. Cancer Cell 2009,<br />

16(6):498-509.<br />

32. Johnson SM, Grosshans H, Shingara J, Byrom M, Jarvis R, Cheng A, Labourier E,<br />

Reinert KL, Brown D, Slack FJ: RAS is regulated by the let-7 microRNA family. Cell<br />

2005, 120(5):635-647.<br />

33. Kumar MS, Erkeland SJ, Pester RE, Chen CY, Ebert MS, Sharp PA, Jacks T:<br />

Suppression <strong>of</strong> non-small cell lung tumor development by the let-7 microRNA<br />

family. Proc Natl Acad Sci U S A 2008, 105(10):3903-3908.<br />

34. Talotta F, Cimmino A, Matarazzo MR, Casalino L, De Vita G, D'Esposito M, Di Lauro R,<br />

Verde P: An autoregulatory loop mediated by miR-21 and PDCD4 controls the AP-1<br />

activity in RAS transformation. Oncogene 2009, 28(1):73-84.<br />

35. Weiss GJ, Bemis LT, Nakajima E, Sugita M, Birks DK, Robinson WA, Varella-Garcia M,<br />

Bunn PA, Jr., Haney J, Helfrich BA et al: EGFR regulation by microRNA in lung<br />

cancer: correlation with clinical response and survival to gefitinib and EGFR<br />

expression in cell lines. Ann Oncol 2008, 19(6):1053-1059.<br />

36. Xiao C, Srinivasan L, Calado DP, Patterson HC, Zhang B, Wang J, Henderson JM,<br />

Kutok JL, Rajewsky K: Lymphoproliferative disease and autoimmunity in mice with<br />

increased miR-17-92 expression in lymphocytes. Nat Immunol 2008, 9(4):405-414.<br />

37. Lee RC, Ambros V: An extensive class <strong>of</strong> small RNAs in Caenorhabditis elegans.<br />

Science 2001, 294(5543):862-864.<br />

38. Mattick JS, Makunin IV: Small regulatory RNAs in mammals. Hum Mol Genet 2005,<br />

14 Spec No 1:R121-132.<br />

39. McManus MT: MicroRNAs and cancer. Semin Cancer Biol 2003, 13(4):253-258.<br />

40. Vasudevan S, Tong Y, Steitz JA: Switching from repression to activation:<br />

microRNAs can up-regulate translation. Science 2007, 318(5858):1931-1934.<br />

APPENDIX I: List <strong>of</strong> publications<br />

This appendix details all <strong>of</strong> the publications that I was a part <strong>of</strong> that were either published,<br />

accepted, currently in submission, or prepared for submission. In total, 29 publications are<br />

listed below with four <strong>of</strong> them represented as chapters and an additional nine discussed in<br />

section 1.11. The remaining 16 publications are listed below with a brief description<br />

accompanying each publication.<br />

Publications included as thesis chapters<br />

1. Chari R, Coe BP, Wedseltoft C, Benetti M, Wilson IM, Vucic E, MacAulay C, Ng RT, Lam<br />

WL. (2008) SIGMA2: A system for the integrative genomic multi-dimensional analysis <strong>of</strong> cancer<br />

genomes, epigenomes, and transcriptomes. BMC Bioinformatics, 9(1):422, 1-12.<br />

This publication is included in the thesis as chapter 2.<br />

2. Chari R, Coe BP, Vucic EA, Lockwood WW, Lam WL. (2010) An integrative multi-<br />

dimensional genetic and epigenetic strategy to identify aberrant genes and pathways in cancer.<br />

BMC Systems Biology. Submitted.<br />

This manuscript submitted for publication is included in the thesis as chapter 3.<br />

3. Chari R, Lockwood WW, Coe BP, Soh J, MacAulay C, Lam S, Gazdar AF, Lam WL. (2010)<br />

UPD is a frequent mechanism <strong>of</strong> gene disruption in lung adenocarcinoma.<br />

This manuscript in preparation is included in the thesis as chapter 4.<br />

4. Chari R, Thu KL, Wilson IM, Lockwood WW, Lonergan KM, Coe BP, Malloff CA, Gazdar AF,<br />

Lam S, Garnis C, MacAulay CE, Alvarez CE, Lam WL. (2010) Integrating the multiple<br />

dimensions <strong>of</strong> genomic and epigenomic landscapes <strong>of</strong> cancer. Cancer and Metastasis<br />

Reviews, 29(1):73-93.<br />

This publication is included in the thesis as chapter 5.<br />

Publications discussed in section 1.11 (9 listed)<br />

5. Chari R, Lockwood WW, Coe BP, Chu A, Macey D, Thomson A, Davies JJ, MacAulay C,<br />

Lam WL. (2006) SIGMA: A system for the integrative genomic microarray analysis <strong>of</strong> cancer<br />

genomes. BMC Genomics, 7(1):324, 1-11.<br />

This publication is described in section 1.11.1.<br />

6. Coe BP, Chari R, MacAulay C, Lam WL. (2010) FACADE: A fast and sensitive algorithm for<br />

the segmentation and calling <strong>of</strong> high resolution array CGH data. Nucleic Acids Research.<br />

Submitted.<br />

This publication is described in section 1.11.1.<br />

7. Lonergan KM, Chari R, deLeeuw RJ, Shadeo A, Chi B, Tsao M, Jones S, Marra M, Ng R,<br />

MacAulay C, Lam S, Lam WL. (2006) Identification <strong>of</strong> novel lung genes in bronchial epithelium<br />

by serial analysis <strong>of</strong> gene expression. American Journal <strong>of</strong> Respiratory Cell and Molecular<br />

Biology, 35(6):651-61.<br />

This publication is described in section 1.11.2.<br />

8. Chari R, Lonergan KM, Ng RT, MacAulay C, Lam S, Lam WL. (2007) Effect <strong>of</strong> active<br />

smoking on the bronchial epithelial transcriptome. BMC Genomics, 8(1):297, 1-13.<br />

This publication is described in section 1.11.2.<br />

9. Lee EHL*, Chari R*, Lam A, Ng RT, Yee J, English J, Evans KG, MacAulay C, Lam S, Lam<br />

WL. (2008) Disruption <strong>of</strong> the non-canonical Wnt pathway in lung squamous cell carcinoma.<br />

Clinical Medicine: Oncology, 2:169-179. *These authors contributed equally<br />

This publication is described in section 1.11.3.<br />

10. Lonergan KM, Chari R, Coe BP, Wilson IM, Tsao M-S, Ng RT, MacAulay C, Lam S, Lam<br />

WL. (2010) Transcriptome profiles of carcinoma-in-situ and invasive non-small cell lung cancer<br />

as revealed by SAGE. PLoS One, 5(2):e9162, 1-22.<br />

This publication is described in section 1.11.3.<br />

11. Chari R, Lonergan KM, Pikor LA, Coe BP, Zhu CQ, Chan THW, MacAulay C, Tsao M-S,<br />

Lam S, Ng RT, Lam WL. (2010) A sequence-based approach to identify reference genes for<br />

gene expression analysis. BMC Medical Genomics. Submitted.<br />

This publication is described in section 1.11.3.<br />

12. Lockwood WW, Chari R, Coe BP, Girard L, MacAulay C, Lam S, Gazdar AF, Minna JD,<br />

Lam WL. (2008) DNA amplification is a ubiquitous mechanism <strong>of</strong> oncogene activation in lung<br />

and other cancers. Oncogene, 27(33):4615-4624.<br />

This publication is described in section 1.11.4.<br />

13. Lockwood WW, Chari R, Coe BP, Thu KL, Garnis C, Campbell J, Williams AC, Hwang D,<br />

Zhu CQ, Yee J, English J, Tsao M-S, Gazdar AF, MacAulay C, Minna JD, Lam S, Lam WL.<br />

(2010) BRF2 is a lineage specific oncogene amplified early in squamous cell lung cancer<br />

development. PLoS Medicine. Submitted.<br />

This publication is described in section 1.11.4.<br />

Other publications not discussed in this thesis (16 listed)<br />

Array comparative genomic hybridization and its application to multiple cancer types<br />

14. Garnis C, Chari R, Buys TP, Zhang L, Ng RT, Rosin MP, Lam WL. (2009) Genomic<br />

imbalances in precancerous tissues signal oral cancer risk. Molecular Cancer, 8:50, 1-7.<br />

The development <strong>of</strong> oral cancer is thought to occur through the progression <strong>of</strong> histopathological<br />

stages, from different stages of dysplasia (mild, moderate, and severe) to carcinoma in situ to<br />

invasive disease. Similar to many cancer types, early detection <strong>of</strong> this disease is critical for<br />

good prognosis. As such, it is important to be able to determine which cases will and will not<br />

progress at early stages <strong>of</strong> the disease such as mild dysplasia. This manuscript describes the<br />

use <strong>of</strong> array CGH as a tool to predict progression in genomes <strong>of</strong> mild dysplasia patients and it<br />

was shown that the level <strong>of</strong> genomic alteration had high concordance with disease progression.<br />

15. Coe BP, Lockwood WW, Chari R, Lam WL. (2009) Comparative genomic hybridization on<br />

BAC arrays. Methods in Molecular Biology, 556:7-19.<br />

This publication is a chapter in the Methods in Molecular Biology textbook and describes the<br />

process <strong>of</strong> developing, using and analyzing data from bacterial artificial chromosome CGH<br />

arrays.<br />

16. deLeeuw RJ, Zettl A, Klinker E, Haralambieva E, Trottier M, Chari R, Ge Y, Gascoyne RD,<br />

Chott A, Muller-Hermelink HK, Lam WL. (2007) Whole genome analysis and HLA haplotyping<br />

<strong>of</strong> enteropathy-type T-cell lymphoma reveals two distinct lymphoma subtypes.<br />

Gastroenterology, 132(5):1902-11.<br />

Enteropathy-type T-cell lymphoma (ETL) is an aggressive non-Hodgkin lymphoma and the<br />

genetic alterations underlying this disease were not well understood. In this publication, array<br />

CGH was applied to samples from patients with ETL and based on the genetic alterations and<br />

HLA genotyping, it was found that two distinct subtypes <strong>of</strong> this disease existed, which was<br />

contrary to the clinical classification used at the time.<br />

17. Buys TPH, Wilson IM, Coe BP, Lee EHL, Kennett JY, Lockwood WW, Tsui IFL, Shadeo A,<br />

Chari R, Garnis C, Lam WL. (2006) “Detailed Comparisons <strong>of</strong> Cancer Genomes” in<br />

Comparative Genomics: Fundamental and Applied Perspectives (Brown JR, ed.), CRC Press /<br />

Taylor & Francis, LLC, Boca Raton, FLA, pp. 245-259.<br />

This chapter details the technologies used for cancer genome comparisons as well as the<br />

different types <strong>of</strong> comparisons that are currently undertaken in research today such as the<br />

comparison <strong>of</strong> cancer subtypes, clonal versus multiple primary tumors, cancer susceptibility and<br />

drug sensitivity.<br />

18. Lockwood WW*, Chari R*, Chi B, Lam WL. (2006) Recent advances in array comparative<br />

genomic hybridization technologies and their applications in human genetics. European Journal<br />

<strong>of</strong> Human Genetics, 14(2):139-48.<br />

This publication is a review <strong>of</strong> literature describing the advances in array CGH technology and<br />

its application to many genetic diseases, including cancer.<br />

19. Buys TPH, Wilson IM, Coe BP, Lockwood WW, Davies JJ, Chari R, DeLeeuw RJ, Shadeo<br />

A, MacAulay C, Lam WL. (2005) “Key Features <strong>of</strong> BAC Array Production and Usage” in DNA<br />

Microarrays (Methods Express Series) (Schena M, ed), Scion Publishing, Ltd., Bloxham, pp.<br />

115-145.<br />

This chapter describes the production and use <strong>of</strong> bacterial artificial chromosome microarray-<br />

based CGH and the analysis <strong>of</strong> the data generated by this platform. In addition, protocols for<br />

array CGH experiments are also provided.<br />

Gene expression based studies<br />

20. Coe BP, Chari R, Lockwood WW, Lam WL. (2008) Evolving strategies for global gene<br />

expression analysis <strong>of</strong> cancer. Journal <strong>of</strong> Cellular Physiology, 217(3):590-597.<br />

This publication is a review <strong>of</strong> literature describing the advancement in technology to analyze<br />

gene expression in cancer and the movement <strong>of</strong> the field towards integrative genomics.<br />

21. Shadeo A, Chari R, Lonergan KM, Pusic A, Miller D, Ehlen T, Van Niekerk D, Matisic J,<br />

Richards-Kortum R, Follen M, Guillaud M, Lam WL, MacAulay C. (2008) Up regulation in gene<br />

expression <strong>of</strong> chromatin remodelling factors in cervical intraepithelial neoplasia. BMC<br />

Genomics, 9(1):64, 1-14.<br />

Cervical cancer is a major problem in developing countries. Similar to oral cancer, it is thought<br />

to go through a progression <strong>of</strong> histopathological stages, and thus identifying markers at stages<br />

<strong>of</strong> intervention is crucial to the prognoses <strong>of</strong> patients with this disease. In this publication, a<br />

comparison <strong>of</strong> normal cervical tissue with cervical intraepithelial neoplasia (CIN) was performed<br />

to identify genes upregulated in CIN. It was found that genes involved in chromatin remodelling<br />

were upregulated in CIN.<br />

22. Shadeo A, Chari R, Vatcher G, Campbell J, Lonergan KM, Matisic J, van NieKerk D, Ehlen<br />

T, Miller D, Follen M, Lam WL, MacAulay C. (2007) Comprehensive serial analysis <strong>of</strong> gene<br />

expression <strong>of</strong> the cervical transcriptome. BMC Genomics, 8(1):142, 1-11.<br />

This publication describes the transcriptome <strong>of</strong> normal cervix tissue using serial analysis <strong>of</strong><br />

gene expression.<br />

Integrative analysis <strong>of</strong> multiple DNA and RNA dimensions<br />

23. Wilson IM, Vucic EA, Chari R, Zhang Y-A, Starczynowski DT, Lonergan KM, Enfield KSS,<br />

Buys TPH, Yee J, Laird-Offringa I, Karsan A, Liu P, You M, Anderson M, MacAulay C, Lam S,<br />

Gazdar AF, Lam WL. (2010) EYA4 is a non-small cell lung cancer tumor suppressor located in<br />

the susceptibility locus on chromosome 6q.<br />

Chromosome arm 6q has been shown to harbor a region associated with lung cancer<br />

susceptibility based on the analysis <strong>of</strong> familial lung cancer datasets. Moreover, this specific<br />

region is also frequently lost in sporadic, non-familial lung cancers. Hence, many<br />

studies have been undertaken to identify the gene(s) in this region which may be critical to lung<br />

tumorigenesis. In this manuscript, we detail the use <strong>of</strong> a genetic and epigenetic approach to<br />

identify key genes in this region which are frequently deregulated by concerted genetic and<br />

epigenetic alteration. This led to the identification <strong>of</strong> the gene EYA4, which we further<br />

demonstrate to have tumor suppressive activity.<br />

24. Soh J, Okumura N, Lockwood WW, Yamamoto H, Shigematsu H, Zhang W, Chari R,<br />

Shames D, Tang X, MacAulay C, Varella-Garcia M, Vooder T, Wistuba II, Lam S, Brekken R,<br />

Toyooka S, Minna JD, Lam WL, Gazdar AF. (2009) Oncogene mutations, copy number gains<br />

and mutant allele specific imbalance (MASI) frequently occur together in tumor cells. PLoS<br />

One, 4(10):e7464, 1-13.<br />

Somatic mutation <strong>of</strong> both oncogenes and tumor suppressor genes has been shown to be<br />

important in cancer. While tumor suppressor genes typically are recessive and require both<br />

alleles to harbor mutation, activating mutations <strong>of</strong> oncogenes generally only require one <strong>of</strong> the<br />

alleles to be mutated. However, it has been shown that for specific activating mutations,<br />

multiple mutated copies can exist. In this study, using some <strong>of</strong> the most commonly mutated<br />

genes in multiple cancer types, the prevalence <strong>of</strong> this phenomenon was assessed in a set <strong>of</strong> cell<br />

lines and tumors representing lung, pancreatic and colorectal cancers. It was found that for the<br />

EGFR locus, mutation <strong>of</strong> the gene is accompanied by copy number increase whereby there is<br />

preferential gain <strong>of</strong> the mutated copy and for KRAS, the event observed is acquired uniparental<br />

disomy where the wild type copy is lost and the mutant copy is duplicated.<br />

25. Campbell JM, Lockwood WW, Buys TP, Chari R, Coe BP, Lam S, Lam WL. (2008)<br />

Integrative genomic and gene expression analysis <strong>of</strong> chromosome 7 identified novel oncogene<br />

loci in non-small cell lung cancer. Genome, 51(12): 1032–1039.<br />

Genomic alteration <strong>of</strong> chromosome 7 is a frequent event in non-small cell lung cancer. While<br />

the most commonly known oncogenes on this chromosome include EGFR, MET and BRAF,<br />

there are likely other candidate genes which may have a role in lung tumorigenesis. In this<br />

manuscript, utilizing an integrative genetic and gene expression approach, novel oncogene loci<br />

are identified.<br />

26. Buys TPH, Chari R, Lee E, Zhang M, MacAulay C, Lam S, Lam WL, Ling V. (2007)<br />

Genetic changes in the evolution <strong>of</strong> multidrug resistance for cultured human ovarian cancer<br />

cells. Genes, Chromosomes and Cancer, 46(12):1069-79.<br />

Drug resistance is a common problem for cancer patients treated with chemotherapeutics. One<br />

<strong>of</strong> the mechanisms <strong>of</strong> resistance is through the multi-drug resistance phenotype which is often<br />

associated with the activity <strong>of</strong> ATP-binding cassette (ABC) transporters. In this study, using an<br />

ovarian cancer cell line exposed to increasing concentrations <strong>of</strong> vincristine to derive drug<br />

resistant derivatives, the genetic and gene expression profiles were compared between these<br />

resistant derivatives and the original cancer cell line. It was found that while initial resistant<br />

derivatives (lines exposed to lower concentrations <strong>of</strong> drug) harbored copy number and gene<br />

expression increases <strong>of</strong> ABCC1 and ABCC6, later resistant derivatives (lines exposed to higher<br />

concentration <strong>of</strong> drug) did not have the increase in ABCC1 and ABCC6, but had an increase <strong>of</strong><br />

ABCB, suggesting the drug resistance phenotype may be a dynamic process.<br />

27. Coe BP, Lockwood WW, Girard L, Chari R, Minna JD, MacAulay C, Lam S, Gazdar AF,<br />

Lam WL. (2006) Differential regulation <strong>of</strong> cell cycle pathways in small cell and non-small cell<br />

lung cancer. <strong>British</strong> Journal <strong>of</strong> Cancer, (12):1927-35.<br />

Small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC) are the two major cell<br />

types <strong>of</strong> lung cancer. While pathologically they can be distinguished, the molecular basis <strong>of</strong><br />

these two cancer types is not well understood. In this study, a whole genome integrative<br />

genetic and gene expression comparison <strong>of</strong> NSCLC and SCLC was performed and differential<br />

regulation <strong>of</strong> cell cycle pathways was identified. Specifically, NSCLC is primarily deregulated at<br />

the receptor level while SCLC is primarily deregulated at the nuclear transcription factor level.<br />

Software, analysis approaches, and databases<br />

28. Tsui IFL, Chari R, Buys TPH, Lam WL. (2007) Public databases and software for the<br />

pathway analysis <strong>of</strong> cancer genomes. Cancer Informatics, 3:389-407.<br />

This manuscript describes the currently available computational resources for the analysis <strong>of</strong><br />

pathways in cancer. Specifically, it covers the use <strong>of</strong> these resources to analyze results from high<br />

throughput studies examining genetic, epigenetic or gene expression alterations.<br />

29. Chari R*, Lockwood WW*, Lam WL. (2006) Computational methods for the analysis <strong>of</strong><br />

array comparative genomic hybridization. Cancer Informatics, (2):48-58.<br />

This manuscript describes the most commonly used analysis strategies for array CGH data and<br />

compares and contrasts these approaches. In addition, the specific features <strong>of</strong> currently<br />

available software suites are also compared.<br />



APPENDIX II: Description <strong>of</strong> cell lines<br />

Sample ER Status PR Status HER2 Status TP53 Mutation Status**<br />

HCC38 - - +<br />

HCC1008 - - N/A<br />

HCC1143 - - +<br />

HCC1395 + - +<br />

HCC1599 - - +<br />

HCC1937 - - + (heterozygous mutation)<br />

HCC2218 - + + -<br />

BT474 + - + +<br />

MCF7 + + +<br />

MCF10A N/A N/A N/A N/A<br />

** mutation status obtained from the Sanger Cancer Cell Line Project<br />

(http://www.sanger.ac.uk/genetics/CGP/CellLines/)<br />



APPENDIX III: Sources <strong>of</strong> data<br />

Sample | DNA Copy Number - Array CGH | Allelic Status - Affymetrix SNP 500K | DNA Methylation - Illumina Infinium | Gene expression - Affymetrix U133 Plus 2.0 (NCBI GEO Accession number provided)<br />

HCC38 | GSE21540 | https://cabig.nci.nih.gov (GSE21347) | new data for this publication (GSE17769) | new data for this publication (GSE17768)<br />

HCC1008 | GSE21540 | https://cabig.nci.nih.gov (GSE21347) | new data for this publication (GSE17769) | new data for this publication (GSE17768)<br />

HCC1143 | GSE21540 | https://cabig.nci.nih.gov (GSE21347) | new data for this publication (GSE17769) | new data for this publication (GSE17768)<br />

HCC1395 | GSE21540 | https://cabig.nci.nih.gov (GSE21347) | new data for this publication (GSE17769) | new data for this publication (GSE17768)<br />

HCC1599 | GSE21540 | https://cabig.nci.nih.gov (GSE21347) | new data for this publication (GSE17769) | new data for this publication (GSE17768)<br />

HCC1937 | GSE21540 | https://cabig.nci.nih.gov (GSE21347) | new data for this publication (GSE17769) | new data for this publication (GSE17768)<br />

HCC2218 | GSE21540 | https://cabig.nci.nih.gov (GSE21347) | new data for this publication (GSE17769) | new data for this publication (GSE17768)<br />

BT474 | GSE21540 | https://cabig.nci.nih.gov (GSE21347) | new data for this publication (GSE17769) | https://cabig.nci.nih.gov (GSE17768)<br />

MCF7 | GSE21540 | https://cabig.nci.nih.gov (GSE21347) | new data for this publication (GSE17769) | https://cabig.nci.nih.gov (GSE17768)<br />

MCF10A | N/A | N/A | new data for this publication (GSE17769) | GSM254525 (GSE17768)


APPENDIX IV: MCD strategy and Kaplan-Meier analysis <strong>of</strong><br />

TUSC3<br />



APPENDIX V: Kaplan-Meier and Oncomine expression<br />

analysis <strong>of</strong> frequent MCD genes<br />

Symbol | (+) | (-) | Total | Survival Associated* | **Status in Tumors (p-value)<br />

SH3TC1 | 0 | 6 | 6 | No | -<br />

CCNA1 | 0 | 5 | 5 | Yes | -<br />

COL7A1 | 0 | 5 | 5 | No | -<br />

KCTD4 | 0 | 5 | 5 | N/A | Not tested<br />

LMCD1 | 5 | 0 | 5 | Yes | O3(6.3E-4), O5(3.1E-10)<br />

LYAR | 0 | 5 | 5 | Yes | U5(3.8E-4), O3(2.6E-4)<br />

MTMR9 | 0 | 5 | 5 | No | U3(1.8E-7)<br />

SYT8 | 0 | 5 | 5 | N/A | Not tested<br />

TUSC3 | 0 | 5 | 5 | Yes | U5(7.4E-5)<br />

ASAM | 0 | 4 | 4 | N/A | Not tested<br />

B3GALNT1 | 4 | 0 | 4 | No | O3(1.1E-7)<br />

COL17A1 | 0 | 4 | 4 | No | U1(6.8E-5), U3(1.4E-8)<br />

ELK3 | 0 | 4 | 4 | Yes | -<br />

FGFR1 | 0 | 4 | 4 | Yes | U1(2.6E-8), U3(4.2E-6), U4(1.3E-7)<br />

KRT17 | 0 | 4 | 4 | No | U1(2.3E-11), U2(1.1E-7), U3(3.9E-7)<br />

LCP1 | 0 | 4 | 4 | Yes | -<br />

OSBPL5 | 0 | 4 | 4 | N/A | Not tested<br />

PSD3 | 0 | 4 | 4 | Yes | -<br />

SFXN3 | 0 | 4 | 4 | N/A | Not tested<br />

SH3BGRL3 | 0 | 4 | 4 | No | O2(2.8E-4)<br />

SNRPN | 0 | 4 | 4 | No | U3(5.4E-10), U5(2.7E-5)<br />

TNFRSF10D | 0 | 4 | 4 | No | O5(5.2E-4), U3(9.7E-4)<br />

TNS4 | 0 | 4 | 4 | N/A | Not tested<br />

*Survival associated if gene expression was significantly associated with survival in at least one<br />

<strong>of</strong> the two datasets tested (based on p < 0.05 using the log rank test).<br />

**U = underexpressed in tumor relative to normal, O = overexpressed in tumor relative to normal<br />

in the particular dataset. The numbers 1-6 indicate the reports from which the data originated:<br />

1=[1], 2=[2], 3=[3], 4=[4], 5=[5], 6=[6]. "-" indicates the gene was either not represented or not<br />

statistically differentially expressed based on group-wise analysis. (+) represents two-fold<br />

overexpression, copy number gain, hypomethylation and allelic imbalance in the same sample; (-) represents two-fold<br />

underexpression, copy number loss, hypermethylation, and LOH in the same sample. The (+) and (-) columns give<br />

the number <strong>of</strong> samples in our dataset which met these criteria.
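
The (+), (-) and Total columns described in the footnote above can be illustrated with a minimal sketch. It assumes that all four dimensions (expression, copy number, methylation, allelic status) must agree in the same sample, as the footnote's wording suggests. The field names, the expression ratio relative to a reference, and the example calls are hypothetical and are not taken from the thesis pipeline.<br />

```python
# Illustrative only: field names and input layout are assumptions, not the thesis pipeline.
POSITIVE = {"copy_number": "gain", "methylation": "hypo", "allelic": "imbalance"}
NEGATIVE = {"copy_number": "loss", "methylation": "hyper", "allelic": "LOH"}


def classify_sample(calls):
    """Return '+', '-', or None for one gene in one sample.

    `calls` holds an expression ratio (sample / reference) plus three DNA-level calls.
    Assumes all four dimensions must agree in the same sample.
    """
    ratio = calls["expression_ratio"]
    if ratio >= 2.0 and all(calls.get(k) == v for k, v in POSITIVE.items()):
        return "+"
    if ratio <= 0.5 and all(calls.get(k) == v for k, v in NEGATIVE.items()):
        return "-"
    return None


def tally_gene(samples):
    """Count (+), (-) and total samples for one gene, mirroring the table columns."""
    labels = [classify_sample(calls) for calls in samples.values()]
    plus, minus = labels.count("+"), labels.count("-")
    return plus, minus, plus + minus


# Made-up calls for a single hypothetical gene across three samples.
example = {
    "sample_A": {"expression_ratio": 0.3, "copy_number": "loss", "methylation": "hyper", "allelic": "LOH"},
    "sample_B": {"expression_ratio": 0.4, "copy_number": "loss", "methylation": "none", "allelic": "LOH"},
    "sample_C": {"expression_ratio": 2.5, "copy_number": "gain", "methylation": "hypo", "allelic": "imbalance"},
}
print(tally_gene(example))  # (1, 1, 2)
```

In this sketch, sample_B is not counted because its methylation call does not agree with the other three dimensions.<br />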


REFERENCES<br />

1. Sorlie T, Perou CM, Tibshirani R, Aas T, Geisler S, Johnsen H, Hastie T, Eisen MB, van<br />

de Rijn M, Jeffrey SS et al: Gene expression patterns <strong>of</strong> breast carcinomas<br />

distinguish tumor subclasses with clinical implications. Proc Natl Acad Sci U S A<br />

2001, 98(19):10869-10874.<br />

2. Perou CM, Sorlie T, Eisen MB, van de Rijn M, Jeffrey SS, Rees CA, Pollack JR, Ross<br />

DT, Johnsen H, Akslen LA et al: Molecular portraits <strong>of</strong> human breast tumours.<br />

Nature 2000, 406(6797):747-752.<br />

3. Richardson AL, Wang ZC, De Nicolo A, Lu X, Brown M, Miron A, Liao X, Iglehart JD,<br />

Livingston DM, Ganesan S: X chromosomal abnormalities in basal-like human<br />

breast cancer. Cancer Cell 2006, 9(2):121-132.<br />

4. Radvanyi L, Singh-Sandhu D, Gallichan S, Lovitt C, Pedyczak A, Mallo G, Gish K, Kwok<br />

K, Hanna W, Zubovits J et al: The gene associated with trichorhinophalangeal<br />

syndrome in humans is overexpressed in breast cancer. Proc Natl Acad Sci U S A<br />

2005, 102(31):11005-11010.<br />

5. Finak G, Bertos N, Pepin F, Sadekova S, Souleimanova M, Zhao H, Chen H, Omeroglu<br />

G, Meterissian S, Omeroglu A et al: Stromal gene expression predicts clinical<br />

outcome in breast cancer. Nat Med 2008, 14(5):518-527.<br />

6. Karnoub AE, Dash AB, Vo AP, Sullivan A, Brooks MW, Bell GW, Richardson AL, Polyak<br />

K, Tubo R, Weinberg RA: Mesenchymal stem cells within tumour stroma promote<br />

breast cancer metastasis. Nature 2007, 449(7162):557-563.<br />



APPENDIX VI: Summary <strong>of</strong> Kaplan-Meier survival analysis<br />

GeneSymbol | Alternative Names | van de Vijver - P-value | Sorlie - P-value<br />

SH3TC1 | FLJ20356 | Fail | N/A<br />

CCNA1 |  | 0.01484628 | N/A<br />

COL7A1 |  | Fail | N/A<br />

KCTD4 |  | N/A | N/A<br />

LMCD1 |  | Fail | 0.00261366<br />

LYAR | FLJ20425 | 0.00551113 | Fail<br />

MTMR9 | DKFZP434K171 | Fail | N/A<br />

SYT8 | DKFZp434K0322 | N/A | N/A<br />

TUSC3 | N33 | 0.01696356 | Fail<br />

ASAM |  | N/A | N/A<br />

B3GALNT1 | B3GALT3 | Fail | N/A<br />

COL17A1 |  | Fail | Fail<br />

ELK3 |  | 0.04816902 | N/A<br />

FGFR1 |  | Fail | 0.0147898<br />

KRT17 |  | Fail | Fail<br />

LCP1 |  | 0.01132949 | 0.04024164<br />

OSBPL5 |  | N/A | N/A<br />

PSD3 | DKFZp761K1423 | 0.00205916 | N/A<br />

SFXN3 |  | N/A | N/A<br />

SH3BGRL3 |  | N/A | Fail<br />

SNRPN |  | Fail | Fail<br />

TNFRSF10D |  | Fail | N/A<br />

TNS4 |  | N/A | N/A<br />

Fail = p-value > 0.05; N/A = not represented on array platform
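
The relationship between these p-values and the "Survival Associated" column of Appendix V can be sketched as follows. This is a minimal illustration rather than the analysis code used in the thesis: the function names are hypothetical, None stands for a gene not represented on the array platform, and 0.2 is a made-up stand-in for an unreported p-value above 0.05.<br />

```python
ALPHA = 0.05  # significance threshold stated in the table footnotes


def table_cell(p_value):
    """Render one log-rank p-value the way the table does.

    None means the gene was not represented on that array platform.
    """
    if p_value is None:
        return "N/A"
    # A p-value of exactly 0.05 is not addressed by the footnotes; it is treated as Fail here.
    return f"{p_value:.8f}" if p_value < ALPHA else "Fail"


def survival_associated(p_van_de_vijver, p_sorlie):
    """Return 'Yes', 'No', or 'N/A' from the two dataset p-values."""
    tested = [p for p in (p_van_de_vijver, p_sorlie) if p is not None]
    if not tested:
        return "N/A"
    return "Yes" if any(p < ALPHA for p in tested) else "No"


# TUSC3: 0.01696356 in van de Vijver; "Fail" in Sorlie (0.2 is a made-up stand-in).
print(table_cell(0.01696356), table_cell(0.2), survival_associated(0.01696356, 0.2))
```

Applied to the rows above, this rule reproduces the Yes / No / N/A entries listed in Appendix V, for example "Yes" for TUSC3 and "N/A" for KCTD4.<br />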


APPENDIX VII: Copy <strong>of</strong> UBC Research Ethics Board<br />

certificate <strong>of</strong> approval<br />



<strong>University</strong> <strong>of</strong> <strong>British</strong> <strong>Columbia</strong> - <strong>British</strong> <strong>Columbia</strong> Cancer Agency<br />

Research Ethics Board (UBC BCCA REB)<br />

UBC BCCA Research Ethics Board<br />

Fairmont Medical Building (6th Floor)<br />

614 - 750 West Broadway<br />

Vancouver, BC V5Z 1H5<br />

Tel: (604) 877-6284 Fax: (604) 708-2132<br />

E-mail: reb@bccancer.bc.ca<br />

Website: http://www.bccancer.bc.ca > Research Ethics<br />

RISe: http://rise.ubc.ca<br />

Certificate <strong>of</strong> Expedited Approval: Annual Renewal<br />

PRINCIPAL INVESTIGATOR: Wan Lam<br />

INSTITUTION / DEPARTMENT: BCCA/BCCA/Cancer Genetics & Development (BCCA)<br />

REB NUMBER: H08-01392<br />

INSTITUTION(S) WHERE RESEARCH WILL BE CARRIED OUT:<br />

Institution: BC Cancer Agency; Site: Vancouver BCCA<br />

Other locations where the research will be conducted: N/A<br />

PRINCIPAL INVESTIGATOR FOR EACH ADDITIONAL PARTICIPATING BCCA CENTRE:<br />

Vancouver: Wan Lam Vancouver Island: N/A<br />

Fraser Valley: N/A Southern Interior: N/A<br />

Abbotsford Centre: N/A<br />

SPONSORING AGENCIES AND COORDINATING GROUPS:<br />

Canadian Institutes <strong>of</strong> Health Research (CIHR)<br />

PROJECT TITLE:<br />

Development <strong>of</strong> a multi-spectral platform for integrated analysis <strong>of</strong> clinical and research samples.<br />

APPROVAL DATE: August 4, 2009<br />

EXPIRY DATE OF THIS APPROVAL: August 4, 2010<br />

PAA#: H08-01392-A003<br />

CERTIFICATION:<br />

1. The membership <strong>of</strong> the UBC BCCA REB complies with the membership requirements for research ethics<br />

boards defined in Division 5 <strong>of</strong> the Food and Drug Regulations <strong>of</strong> Canada.<br />

2. The UBC BCCA REB carries out its functions in a manner fully consistent with Good Clinical Practices.<br />

3. The UBC BCCA REB has reviewed and approved the research project named on this Certificate <strong>of</strong> Approval<br />

including any associated consent form and taken the action noted above. This research project is to be<br />

conducted by the provincial investigator named above. This review and the associated minutes <strong>of</strong> the UBC<br />

BCCA REB have been documented electronically and in writing.<br />

The UBC BCCA Research Ethics Board has reviewed the documentation for the above named project. The research<br />

study as presented in documentation, was found to be acceptable on ethical grounds for research involving human<br />

subjects and was approved for renewal by the UBC BCCA REB.<br />

UBC BCCA Ethics Board Approval <strong>of</strong> the above has been verified by one <strong>of</strong> the following:<br />

Dr. George Browman, Chair<br />

Dr. Lynne Nakashima, Second Vice-Chair<br />

If you have any questions, please call:<br />

Bonnie Shields, Manager, BCCA Research Ethics Board: 604-877-6284 or e-mail: reb@bccancer.bc.ca<br />

Dr. George Browman, Chair: 604-877-6284 or e-mail: gbrowman@bccancer.bc.ca<br />

Dr. Lynne Nakashima, Second Vice-Chair: 604-707-5989 or e-mail: lnakas@bccancer.bc.ca<br />

https://rise.ubc.ca/rise/Doc/0/JKQ8088GG9RKN55VLAL9OHM869/fromString.html<br />


15/04/2010
