
DEVELOPMENT AND APPLICATION OF AN INTEGRATIVE GENOMICS APPROACH TO 

LUNG CANCER 

by 

RAJAGOPAL CHARI 

B.Sc., University of British Columbia, 2001 

B.Sc., University of British Columbia, 2004 

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF 

THE REQUIREMENTS FOR THE DEGREE OF 

DOCTOR OF PHILOSOPHY 

in 

THE FACULTY OF GRADUATE STUDIES 

(Pathology and Laboratory Medicine) 

THE UNIVERSITY OF BRITISH COLUMBIA 

(Vancouver) 

June 2010 

© Rajagopal Chari, 2010

Abstract 

Lung cancer has the highest mortality rate amongst all diagnosed malignancies, with

adenocarcinoma (AC) being the most commonly diagnosed subtype of this disease in North

America. The dismal survival statistics of lung cancer patients are largely due to the detection

of the disease at an advanced stage and, to a lesser extent, the limited efficacy of current

front-line treatments.

Genomic approaches, namely gene expression analysis, have provided tremendous insight into

lung cancer. While many gene expression changes have been identified, most are likely

reactive to the changes that have a primary role in cancer development. One feature

that can distinguish primary from reactive changes is the presence of a concordant

DNA-level alteration.

Many well-known genes involved in cancer, such as TP53 and CDKN2A, have been shown to be

affected by multiple mechanisms of alteration, such as somatic mutation or loss of DNA

sequence. For a given gene, one tumor may be affected by one mechanism while another 

tumor may be affected by a different mechanism. Although this level of multi-dimensional 

analysis has been performed for specific genes, such analysis has not been done at the 

genome-wide level. 
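
To make this idea concrete, the following minimal Python sketch uses entirely hypothetical calls. The tumor labels, dimension names, and the `disruption_frequency` helper are illustrative assumptions, not data or code from this thesis. It shows how a gene's disruption frequency can be underestimated when only one mechanism is examined, and how counting a gene as disrupted when any examined mechanism is altered recovers tumors that a single-dimension analysis would miss.

```python
# Hypothetical per-tumor calls: tumor -> gene -> set of altered dimensions.
# These values are invented for illustration only.
calls = {
    "tumor_A": {"CDKN2A": {"copy_number_loss"}, "TP53": {"mutation"}},
    "tumor_B": {"CDKN2A": {"hypermethylation"}},
    "tumor_C": {"CDKN2A": {"copy_number_loss", "hypermethylation"}, "TP53": set()},
}


def disruption_frequency(calls, dimensions=None):
    """Return, per gene, the fraction of tumors in which the gene is altered
    in at least one of `dimensions` (all dimensions when None)."""
    wanted = None if dimensions is None else set(dimensions)
    counts = {}
    for genes in calls.values():
        for gene, hits in genes.items():
            # Keep only the alterations in the dimensions being examined.
            relevant = hits if wanted is None else hits & wanted
            counts[gene] = counts.get(gene, 0) + (1 if relevant else 0)
    return {gene: count / len(calls) for gene, count in counts.items()}


single = disruption_frequency(calls, ["copy_number_loss"])
combined = disruption_frequency(calls)
```

With these toy calls, CDKN2A appears disrupted in two of three tumors when copy number loss alone is examined, but in all three when every dimension is considered.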

This thesis highlights the development and application of a multi-dimensional genetic and 

epigenetic approach to identify frequently aberrant genes and pathways in lung AC. I present, 

first, the design and implementation of the system for integrative genomic multi-dimensional 

analysis of cancer genomes, epigenomes and transcriptomes (SIGMA2). Next, analyzing a

multi-dimensional dataset generated from ten lung AC specimens with non-malignant controls, I 

identified novel genes and pathways that would have been missed if a non-integrative approach 

were used. Finally, examining genes involved with EGFR signaling, I identified a gene, signal

regulatory protein alpha (SIRPA), which had not been previously shown to be associated with

lung cancer. 

Taken together, these findings demonstrate the power of a multi-dimensional approach to 

identify important genes and pathways in lung cancer. Moreover, identifying key genes using a 

multi-dimensional approach on a small sample set suggests that the need for large datasets may be

circumvented by using a more comprehensive approach on a smaller set of samples. 


Table of Contents 

Abstract ......................................................................................................................................... ii 

Table of Contents ......................................................................................................................... iii 

List of Tables ............................................................................................................................... vii 

List of Figures ............................................................................................................................. viii 

List of Abbreviations ...................................................................................................................... x 

Acknowledgements ..................................................................................................................... xii 

Dedication ................................................................................................................................... xiii 

Co-Authorship Statement ........................................................................................................... xiv 

Chapter 1: Introduction ................................................................................................................. 1 

1.1 Lung cancer ......................................................................................................................... 2 

1.2 Genomic profiling of lung cancer ......................................................................................... 3 

1.2.1 Gene expression analysis ............................................................................................. 3 

1.2.2 DNA copy number analysis .......................................................................................... 4 

1.2.3 Loss of heterozygosity (LOH) and allelic imbalance ..................................................... 5 

1.3 Somatic mutations in lung cancer ....................................................................................... 5 

1.4 Epigenetic alterations in lung cancer ................................................................................... 6 

1.4.1 DNA methylation ........................................................................................................... 6 

1.5 Current level of integrative analysis .................................................................................... 7 

1.6 Need for an integrative approach to study lung cancer ....................................................... 7 

1.7 Bioinformatic tools for genomic analysis ............................................................................. 8 

1.8 Thesis theme ....................................................................................................................... 9 

1.9 Objectives and hypothesis .................................................................................................. 9 

1.10 Specific aims and outline of thesis .................................................................................. 10 

1.11 Description of high throughput data in this thesis ............................................................ 13 

1.12 Other relevant contributions not included as chapters in this thesis ............................... 13 

1.12.1 Development of tools for genomic analysis .............................................................. 14 

1.12.2 Baseline gene expression in non-malignant lung tissue ........................................... 14 

1.12.3 Differential gene expression analysis in lung cancer ................................................ 15 

1.12.4 Integration of gene dosage and gene expression in lung cancer ............................. 16 

1.13 References ...................................................................................................................... 18 


Chapter 2: SIGMA2: A system for the integrative genomic multi-dimensional analysis of cancer

genomes, epigenomes, and transcriptomes 1 .............................................................................. 24 

2.1 Introduction ........................................................................................................................ 25 

2.2 Implementation .................................................................................................................. 26 

2.3 Results and discussion ...................................................................................................... 26 

2.3.1 Look and feel of SIGMA2 ............................................................................................ 26

2.3.2 Description of application scope and functionality ...................................................... 27 

2.3.3 Approach to integration between array platforms and assays .................................... 27 

2.3.4 Format requirements of input data .............................................................................. 27 

2.3.5 Description of user interface ....................................................................................... 28 

2.3.6 Analysis of data from a single assay type ................................................................... 29 

2.3.7 Analysis of data from multiple assays in a given 'omics dimension ............................ 30 

2.3.8 Combinatorial analysis of multiple 'omics dimensions - gene dosage and gene 

expression ........................................................................................................................... 30 

2.3.9 Group comparison analysis - single 'omics dimension ............................................... 31

2.3.10 Group comparison analysis - integrating multiple 'omics dimensions ....................... 31 

2.3.11 Multi-dimensional analysis of a breast cancer genome ............................................ 31 

2.3.12 Exporting data and results ........................................................................................ 32 

2.4 Conclusions ....................................................................................................................... 32 

2.5 Availability and requirements ............................................................................................ 33 

2.6 References ........................................................................................................................ 46 

Chapter 3: An integrative multi-dimensional genetic and epigenetic strategy to identify aberrant 

genes and pathways in cancer 2 .................................................................................................. 48 

3.1 Background ....................................................................................................................... 49 

3.2 Methods ............................................................................................................................. 50 

3.2.1 Data generation and acquisition ................................................................................. 50 

3.2.2 Data processing and normalization ............................................................................ 51 

3.2.3 Strategy for integrative analysis .................................................................................. 52 

3.2.4 Multiple concerted disruption (MCD) analysis ............................................................ 53 

3.2.5 Simulated data analysis .............................................................................................. 54 

3.2.6 Pathway enrichment analysis ..................................................................................... 54 

3.2.7 Survival and differential gene expression analysis in publicly available datasets ....... 55

3.3 Results and discussion ...................................................................................................... 55 

3.3.1 Analysis of individual genomic dimensions ................................................................. 55 


3.3.2 Multi-dimensional analysis (MDA) reveals a higher proportion of intra-sample 

deregulated gene expression can be explained when more dimensions are analyzed ....... 56 

3.3.3 MDA reveals genes are disrupted at higher frequencies when examining multiple 

dimensions as compared to any single dimension alone .................................................... 56 

3.3.4 MDA identifies significantly enriched cancer related pathways .................................. 58 

3.3.5 MDA of the Neuregulin signaling pathway reveals a complex pattern of deregulation 

............................................................................................................................................. 59 

3.3.6 Genes exhibiting multiple concerted disruption (MCD) - biological and clinical 

significance .......................................................................................................................... 60 

3.3.7 Association of genes exhibiting MCD and triple negative breast cancers (TNBC) ..... 62 

3.4 Conclusions ....................................................................................................................... 63 

3.5 References ........................................................................................................................ 75 

Chapter 4: Uniparental disomy is a prevalent genetic mechanism of oncogene disruption in lung 

adenocarcinoma 3 ........................................................................................................................ 79 

4.1 Introduction ........................................................................................................................ 80 

4.2 Methods ............................................................................................................................. 81 

4.2.1 Genome wide profiling of clinical lung adenocarcinoma specimens ........................... 81 

4.2.2 Determination of regions of uniparental disomy (UPD) in clinical lung tumors ........... 81 

4.2.3 Determining frequent regions of UPD, gain and loss .................................................. 82 

4.2.4 Determination of UPD in cancer cell lines .................................................................. 82 

4.2.5 Expression analysis of genes in focal regions of UPD ............................................... 82 

4.3 Results .............................................................................................................................. 83 

4.3.1 Detection of UPD using allele specific copy number analysis .................................... 83 

4.3.2 UPD is prevalent and non-random in the lung cancer genome with comparable 

frequencies to gain and loss ................................................................................................ 83 

4.3.3 Overlap of major oncogenes and tumor suppressor genes in regions of gain, loss, and 

UPD ..................................................................................................................................... 84 

4.3.4 UPD is prevalent at oncogenes across multiple cancer types .................................... 84 

4.3.5 Identification of novel candidate oncogenes using focal regions of UPD ................... 85 

4.4 Discussion ......................................................................................................................... 85 

4.5 Conclusion ......................................................................................................................... 87 

4.6 References ...................................................................................................................... 108 

Chapter 5: Integrating the multiple dimensions of genomic and epigenomic landscapes of 

cancer 4 ...................................................................................................................................... 111 

5.1 Introduction ...................................................................................................................... 112 


5.2 Genomic alterations ........................................................................................................ 113 

5.2.1 Chromosomal aberrations ........................................................................................ 113 

5.2.2 Gene dosage, allelic imbalance, mutational status ................................................... 113 

5.2.3 Genomic landscape: Gains, losses and uniparental disomy .................................... 116 

5.3 Epigenomic alterations .................................................................................................... 117 

5.3.1 The cancer methylome ............................................................................................. 117 

5.3.2 Integration of cancer genomic and epigenomic events ............................................ 119 

5.4 Relating genetic and epigenetic events to changes in the transcriptome through 

integrative analysis ................................................................................................................ 120 

5.4.1 Multiple mechanisms of gene disruption ................................................................... 121 

5.4.2 Multiple mechanisms of disrupting non-coding RNA levels ...................................... 121 

5.4.3 Multi-dimensional integration of genome, epigenome, and transcriptome ............... 122 

5.4.4 Disruption of multiple components in biological pathways ........................................ 124 

5.4.5 Identification of a novel gene involved with EGFR signaling deregulated in 

adenocarcinoma ................................................................................................................ 125 

5.4.6 Prevalence of SIRPA deregulation and association with clinical characteristics ...... 126 

5.5 Tracking clonal expansion in spatial dimensions ............................................................ 127 

5.6 Evaluating the biological significance of integrative genomics findings .......................... 127 

5.7 References ...................................................................................................................... 144

Chapter 6: Conclusions ............................................................................................................. 162 

6.1 Summary ......................................................................................................................... 163 

6.1.1 Development of the integrative genetic and epigenetic approach ............................ 163 

6.1.2 Identification of a prevalent genetic alteration in lung adenocarcinoma ................... 164 

6.1.3 Application of the integrative approach to lung adenocarcinoma specimens ........... 165 

6.2 Conclusions ..................................................................................................................... 166 

6.3 Future directions .............................................................................................................. 168 

6.4 References ...................................................................................................................... 171 

APPENDIX I: List of publications .............................................................................................. 174 

APPENDIX II: Description of cell lines ...................................................................................... 183 

APPENDIX III: Sources of data ................................................................................................. 184 

APPENDIX IV: MCD strategy and Kaplan-Meier analysis of TUSC3 ....................................... 185 

APPENDIX V: Kaplan-Meier and Oncomine expression analysis of frequent MCD genes ...... 186 

APPENDIX VI: Summary of Kaplan-Meier survival analysis .................................................... 188 

APPENDIX VII: Copy of UBC Research Ethics Board certificate of approval........................... 189 


List of Tables 

Table 2.1. Features required for integrative analysis .................................................................. 44 

Table 2.2. Summary of Input, analysis, output for each dimension ............................................ 45 

Table 4.1. Regions of the genome exhibiting frequent UPD ....................................................... 99 

Table 4.2. List of major oncogenes and tumor suppressor genes assessed ............................ 101 

Table 4.3. Overlap of oncogenes in frequent regions of genomic alteration ............................. 102 

Table 4.4. Overlap of tumor suppressor genes in frequent regions of genomic alteration ....... 103 

Table 4.5. Cell lines and oncogene loci with homozygous mutation ......................................... 104 

Table 4.6. Summary of homozygous mutation analysis in cancer cell lines ............................. 105 

Table 4.7. RefSeq genes in focal regions of UPD .................................................................... 106 

Table 4.8. Genes overexpressed in focal regions of UPD ........................................................ 107 

Table 5.1. List of software for integrative analysis .................................................................... 141 

Table 5.2. List of genomic resources and databases ............................................................... 142 

Table 5.3. Genes interacting with SIRPA as identified by network analysis ............................. 143 


List of Figures 

Figure 1.1. Multiple mechanisms of alteration leading to same downstream consequences ..... 17 

Figure 2.1. Main structural components of SIGMA2. .................................................................. 34 

Figure 2.2. Data structure hierarchy. .......................................................................................... 35 

Figure 2.3. Algorithm for integrating between different array platforms ...................................... 36 

Figure 2.4. SIGMA2 interface. .................................................................................................... 37 

Figure 2.5. Consensus calling and heterogeneous array analysis. ............................................ 38 

Figure 2.6. Integrative genetic analysis of HCC2218 .................................................................. 40 

Figure 2.7. Two-group two dimensional comparison of 37 NSCLC and 16 SCLC cancer cell 

lines. ............................................................................................................................................ 41 

Figure 2.8. Multi-dimensional perspective of chromosome 17 of the HCC2218 breast cancer cell 

line. ............................................................................................................................................. 42 

Figure 3.1. Genomic profiles of breast cancer cell lines. ............................................................ 65 

Figure 3.2. Quantitative and qualitative benefits of integrative analyses. ................................... 66 

Figure 3.3. Determination and application of a disruption frequency threshold. ......................... 68 

Figure 3.4. Impact of multi-dimensional analysis on low frequency events. ............................... 69 

Figure 3.5. Pathway analysis of the 1162 genes identified by multi-dimensional analysis. ........ 70 

Figure 3.6. Complex deregulation of the Neuregulin/ERBB2 signaling pathway. ....................... 71 

Figure 3.7. Deregulation of PTEN occurs differently between samples. ..................................... 72 

Figure 3.8. Multiple concerted disruption (MCD) analysis and its application to triple negative 

breast cancer. ............................................................................................................................. 73 

Figure 4.1. Detection of UPD using allele specific copy number. ............................................... 88 

Figure 4.2. Comparison of frequent regions of gain, loss and UPD in the lung adenocarcinoma 

genome ....................................................................................................................................... 90 

Figure 4.3. Venn diagram illustrating the amount of the genome covered by gain, loss, and UPD 

.................................................................................................................................................... 92 

Figure 4.4. Genomic profile of an individual lung adenocarcinoma sample ............................... 93 

Figure 4.5. Examination of UPD events at the KRAS and RB1 loci ............................................ 95 

Figure 4.6. Relationship of homozygous mutation at oncogenes and genomic alteration .......... 96 

Figure 4.7. Identification of E2F3 in a focal region of UPD ......................................................... 97 

Figure 5.1. Advances in cancer genomic landscape post Y2K. ................................................ 129 

Figure 5.2. SNP array analysis to identify areas of altered copy number and allelic composition 

in a clinical lung cancer specimen. ........................................................................................... 130 


Figure 5.3. Overlay of chromosomal regions of gain, loss and UPD (copy number neutral LOH) 

inherent to the T47D breast cancer cell line. ............................................................................ 131 

Figure 5.4. Integration of copy number, allelic status, DNA methylation, and gene expression for 

a single lung adenocarcinoma sample. ..................................................................................... 132 

Figure 5.5. Integration of copy number, allelic status, DNA methylation, and gene expression for 

a single lung adenocarcinoma sample. ..................................................................................... 134 

Figure 5.6. Identification of multiple disrupted components in a biological pathway. ................ 136 

Figure 5.7. Multi-dimensional analysis of the epidermal growth factor receptor signaling 

pathway. .................................................................................................................................... 137 

Figure 5.8. Prevalence of SIRPA underexpression and its relationship with PTPN6 and smoking 

status. ....................................................................................................................................... 138 

Figure 5.9. Kaplan-Meier analysis of SIRPA in four independent microarray datasets. ........... 139 

Figure 5.10. Automated detection of selected clonal populations of cells within a cancer biopsy 

tissue section. ........................................................................................................................... 140 


List of Abbreviations 

Abbreviation Definition 

AC Adenocarcinoma 

ASCN Allele specific copy number 

BRAF v-raf murine sarcoma viral oncogene homolog B1 

CDKN2A Cyclin-dependent kinase inhibitor 2A 

CGH Comparative Genomic Hybridization 

CNV Copy number variation 

DNA Deoxyribonucleic Acid 

EGFR Epidermal Growth Factor Receptor 

FISH Fluorescence in-situ hybridization 

GWAS Genome wide association studies 

KRAS v-Ki-ras2 Kirsten rat sarcoma viral oncogene 

LOH Loss of Heterozygosity 

MASI Mutant allele specific imbalance 

MCD Multiple Concerted Disruption 

MDA Multi-Dimensional Analysis 

MUC1 Mucin 1 

NSCLC Non-small cell lung cancer 

PCR Polymerase Chain Reaction 

qPCR Quantitative PCR 


RB1 Retinoblastoma 1 

RNA Ribonucleic Acid 

RRM2 Ribonucleotide Reductase Subunit M2 

SIGMA System for integrative genomic microarray analysis 

SIGMA2 System for integrative genomic multi-dimensional analysis 

SIRPA Signal Regulatory Protein Alpha 

SKY Spectral karyotyping 

SNP Single nucleotide polymorphism 

TUSC3 Tumor suppressor candidate 3 

UPD Uniparental Disomy 


Acknowledgements 

I would like to acknowledge the contributions of many of my colleagues in the Wan Lam Lab 

who contributed to this work, especially the co-authors of each of the manuscript chapters 

presented herein. Detailed acknowledgements from the published version of Chapter 2 are listed

below: 

Chapter 2: We thank William W. Lockwood and Timon P.H. Buys for useful discussion and 

critical reading of the manuscript, Ashleen Shadeo for providing data for breast cancer samples,

and Anna Chu, Byron Cline, Devon Macey, Andrew Thomson, Lan Wei, Reginald Sacdalan, 

Tiffany Chao, and Laura Aslan for help with software development. 

I would also like to acknowledge generous scholarship support from the Canadian Institutes of 

Health Research and Michael Smith Foundation for Health Research. 

The research presented in this thesis was funded by the following granting agencies: Genome 

Canada/Genome British Columbia, Canadian Cancer Society Research Institute (CCS20485),

Canadian Institutes of Health Research (MOP 86731, MOP 77903), National Institutes of Health

(R01 DE15965-01), National Cancer Institute Early Detection Research Network (5U01 

CA84971-10), Canary Foundation, and Canadian Breast Cancer Research Alliance. 


Dedication 

To my family. 


Co-Authorship Statement 

Chapters 2 to 5 were co-authored as manuscripts for publication. The following author lists 

apply for each chapter: 

Chapter 2: Chari R, Coe BP, Wedseltoft C, Benetti M, Wilson IM, Vucic EA, MacAulay C, Ng 

RT, Lam WL. (2008) SIGMA2: a system for the integrative genomic multi-dimensional analysis 

of cancer genomes, epigenomes, and transcriptomes. BMC Bioinformatics, 9(1):422, 1-12. 

Contribution: I am the first author of this manuscript. I designed and developed the software 

and wrote the manuscript. The co-authors of this manuscript were either undergraduate 

students whom I mentored on this project or were fellow graduate students who tested the

software and provided important user feedback. 

Chapter 3: Chari R, Coe BP, Vucic EA, Lockwood WW, Lam WL. (2010) An integrative multi-dimensional

genetic and epigenetic strategy to identify aberrant genes and pathways in cancer. 

BMC Systems Biology, 4(1):67, 1-14. 

Contribution: I am the first author of this manuscript. I acquired most of the data through 

generating genomic profiles and downloaded the rest of the data from public resources. I 

conceived the analysis for the manuscript and wrote the manuscript. 

Chapter 4: Chari R, Lockwood WW, Soh J, Coe BP, Tam K, MacAulay C, Minna JD, Lam S, 

Gazdar AF, Lam WL. (2010) Uniparental disomy is a prevalent mechanism of genetic alteration 

in lung adenocarcinoma. 

Contribution: I am the first author of this manuscript. I generated all of the data and performed 

all of the analyses for this manuscript and my co-authors provided useful information through 

comments and other supporting data. 


Chapter 5: Chari R, Thu KL, Wilson IM, Lockwood WW, Lonergan KM, Coe BP, Malloff CA, 

Gazdar AF, Lam S, Garnis C, MacAulay CE, Alvarez CE, Lam WL. (2010) Integrating the 

multiple dimensions of genomic and epigenomic landscapes of cancer. Cancer and Metastasis 

Reviews, 29(1):73-93. 

Contribution: I am the first author of this manuscript. I orchestrated the study, performed all 

analyses and wrote the manuscript with the help of my supervisor. Other co-authors provided 

useful information, data, or comments. 


Chapter 1: Introduction 


1.1 Lung cancer 

Lung cancer has the highest mortality rate amongst all diagnosed malignancies [1]. It was

estimated that in 2009, 24,000 individuals in Canada would be diagnosed with lung cancer and

approximately 21,000 would succumb to this disease (Canadian Cancer Statistics 2009,

www.cancer.ca). Lung cancer is classified into two main types: non-small cell lung cancer 

(NSCLC) and small cell lung cancer (SCLC). Within NSCLC, the two major histological

subtypes are adenocarcinoma (AC) and squamous cell carcinoma (SqCC), with large cell

carcinoma (LCC) being the third most common histological subtype. AC accounts for the

highest percentage of all lung cancer cases, representing almost half of all NSCLCs diagnosed. 

The primary etiological factor associated with lung cancer is tobacco smoke exposure. While 

the majority of lung cancer patients have a heavy smoke exposure history, there is an 

increasing percentage of lung cancer patients (25%) where primary smoke exposure is not the 

associated cause of the disease [2]. Moreover, when the association between smoking history

and histological subtype is examined, all subtypes are associated with smoke exposure, but

SCLC and SqCC show the strongest associations [3]. In addition, amongst

never smokers, the majority of cases are of the adenocarcinoma subtype [2]. 

Examining across the spectrum of all NSCLC patients, independent of stage, only 15% of all 

lung cancer patients will achieve five-year survival with the median survival time of lung cancer 

patients less than one year. Stratification by stage reveals that those individuals diagnosed 

early (stage IA) have a superior rate of five year survival as compared to those diagnosed late 

(stage IV) (50% vs. 2%) [4]. Given that overall survival independent of stage is closer to that of stage 

IV than of stage IA, it is clear that the poor survival statistics are largely due to the late diagnosis 

of this disease and, to a lesser extent, the limited response rate observed with conventional 

chemotherapies [5]. 

While overall therapeutic strategies have provided limited benefit to prolonging patient survival, 

there has been moderate success in the application of targeted therapeutics. Specifically, 

pharmacological agents against the epidermal growth factor receptor (EGFR) tyrosine kinase 

have shown selective efficacy in a subset of lung AC patients [6-12]. Hence, in addition to 

improving early detection strategies, another main focus of lung cancer research is the 

identification of novel therapeutic targets. One such approach that can be used to identify 

targets is through the application of genomic tools to clinical lung cancer specimens. 

1.2 Genomic profiling of lung cancer 

1.2.1 Gene expression analysis 

One of the first applications of high throughput genome technologies was to the assessment of 

messenger RNA (mRNA) levels [13, 14]. While the first, landmark cancer-related studies were 

done in breast and hematological malignancies [15-17], substantial findings were made in the 

analysis of lung cancer. Specifically, lung cancer gene expression studies have identified 

genes differentially expressed in tumors, genes associated with angiogenic potential, genes 

associated with chemoresistance, expression signatures defining subclasses of lung cancer, 

expression signatures associated with patient prognosis, and expression signatures from 

normal bronchial epithelium samples to detect lung cancer [18-34]. In addition, much work has 

also been done to understand baseline gene expression in non-malignant lung tissue as well as its 

changes with respect to heavy smoke exposure [35-38]. These studies are as important as 

studies involving lung cancer samples as they provide an important reference level of gene 

expression to decipher the dysregulated gene expression in tumors. 

However, from a given analysis of differential expression in tumors, there are typically 

hundreds, if not thousands, of genes which may show aberrant gene expression in tumors when 

compared to non-malignant tissue. Moreover, it is likely that a proportion of the genes which 

are aberrantly expressed are not integral or causal to tumor development as many gene 

expression changes are reactive to changes in expression of other genes. In addition, using 

gene expression alone, one cannot discern which changes are causal and which changes are 

reactive. One approach to assigning causality to gene expression changes is to identify 

alterations at the DNA level such as somatic mutation, changes in gene dosage (DNA copy 

number), or epigenetic changes such as aberrant DNA methylation or histone modification which 

can explain the observed differential expression. 

1.2.2 DNA copy number analysis 

Alterations in gene dosage, whereby segments of DNA in the genome are either replicated or 

lost, have been shown to be important in lung cancer [39-41]. Typically, these gains and losses of 

DNA are detected through the comparison of a genome from a tumor sample with a genome 

that is normal or non-malignant. It is thought that these increases or decreases in amounts of 

specific gene sequence could allow for increased or decreased expression of that gene. 

Technological advances have allowed for the high throughput assessment of DNA copy number 

changes in the cancer genome namely through microarray comparative genomic hybridization 

(CGH) [42, 43]. Briefly, this technology capitalizes on differential fluorescence labelling where 

DNA from the tumor sample and DNA from the normal sample, each labelled with different 

fluorescent dyes, are hybridized together on the same chip and differences in fluorescent 

intensities are measured. Moreover, array CGH profiling of both lung cancer cell lines and 

tumors has identified areas of the genome which are frequently gained or lost [44-52]. 

Specifically, these areas of copy number alteration have targeted known oncogenes such as 

MYC, EGFR, MDM2, TERT, and tumor suppressor genes such as CDKN2A, TP53 and RB1. 

However, these alterations typically do not occur in 100% of lung tumors (e.g. the EGFR locus 

is gained/amplified in 10-20% of cases). In addition, the amplification and deletion events 

typically encompass multiple genes and as such, more often than not, only a subset of those 

genes will have a downstream consequence at the gene expression level. Hence, integration of 

gene dosage with gene expression analysis would be useful to discern the target gene(s) of a 

given copy number alteration. 
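
The log ratio measurement that underlies array CGH can be illustrated with a minimal Python sketch. It assumes hypothetical fluorescence intensities for three array elements and illustrative cutoffs of +0.2 and -0.2 on the log2 scale; the element names, intensities, function names, and thresholds are assumptions made for illustration and are not taken from the cited studies.

```python
import math

def log2_ratio(tumor_intensity, normal_intensity):
    """Log2 ratio of tumor to normal fluorescence intensity for one array element."""
    return math.log2(tumor_intensity / normal_intensity)

def call_copy_number(ratio, gain_threshold=0.2, loss_threshold=-0.2):
    """Label a log2 ratio as gain, loss, or no change using illustrative cutoffs."""
    if ratio >= gain_threshold:
        return "gain"
    if ratio <= loss_threshold:
        return "loss"
    return "no change"

# Hypothetical (tumor, normal) intensities for three array elements.
spots = {
    "element_1": (1500.0, 1000.0),
    "element_2": (1000.0, 1000.0),
    "element_3": (600.0, 1000.0),
}

calls = {name: call_copy_number(log2_ratio(t, n)) for name, (t, n) in spots.items()}
print(calls)
```

In practice the cutoffs would depend on the platform and on the noise level of each experiment, so the fixed values used here should be read only as placeholders.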

1.2.3 Loss of heterozygosity (LOH) and allelic imbalance 

Loss of heterozygosity (LOH) is a common genetic event in cancer [53]. In the normal cell, 

each somatic chromosome has two copies, with one copy (or allele) originating from each 

parent. Subsequently, in the tumor, a specific segment from one of the copies of the 

chromosome is lost, resulting in loss of heterozygosity. 

Frequent regions of LOH have also been identified in the lung cancer genome [54-58]. While 

initial studies involved the use of microsatellite markers placed throughout the genome and 

thus, the resolution of these changes was limited, the application of SNP arrays was able to 

refine these areas into specific chromosome arms [45, 46]. In addition to advances in SNP 

array technology, analysis approaches were also developed that increased the detection 

sensitivity of regions of LOH / allelic imbalance [59-63]. Although most areas with altered gene 

dosage will also be detected as LOH (in case of copy number loss) and allelic imbalance (in 

case of copy number gain), there are also areas in the genome which exhibit LOH but no 

change in copy number, termed copy neutral LOH or uniparental disomy (UPD). However, the role of 

UPD in lung cancer is not well understood. 
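
The relationship between copy number change, LOH, allelic imbalance, and UPD described in this section can be summarized in a short Python sketch. It assumes that allele-specific copy numbers have already been estimated for a locus that is heterozygous in the germline; the function name and category labels are hypothetical and do not correspond to any of the cited analysis tools.

```python
def classify_allelic_state(major_copies, minor_copies):
    """Classify a germline-heterozygous locus from tumor allele-specific copy numbers."""
    if minor_copies > major_copies:
        major_copies, minor_copies = minor_copies, major_copies
    total = major_copies + minor_copies
    if total == 0:
        return "homozygous deletion"
    if minor_copies == 0:
        # Only one parental allele remains, so heterozygosity is lost.
        if total < 2:
            return "LOH with copy number loss"
        if total == 2:
            return "copy neutral LOH (UPD)"
        return "LOH with copy number gain"
    if major_copies != minor_copies:
        # Both alleles are present but in unequal amounts.
        return "allelic imbalance"
    return "balanced"

# Hypothetical (major, minor) allele copy numbers.
examples = [(1, 1), (1, 0), (2, 0), (2, 1)]
states = [classify_allelic_state(a, b) for a, b in examples]
print(states)
```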

1.3 Somatic mutations in lung cancer 

Somatic mutations have also been shown to be important in cancer development. In addition, 

mutational analysis is also used for screening purposes in high risk populations (e.g. BRCA1/2 

and hereditary breast cancer) as well as a criterion for receiving targeted chemotherapy (e.g. 

EGFR mutation and EGFR inhibitors). Many studies have been undertaken to identify 

mutations in genes involved with important cellular processes pertinent to the cancer phenotype 

such as DNA repair and cellular proliferation and have successfully identified key genes in a 

number of different cancer types. Moreover, as a general rule, oncogenes typically 

harbour activating mutations, while tumor suppressor genes often harbour inactivating mutations. 

In lung adenocarcinoma, the most well known genes shown to be mutated are EGFR, KRAS, 

LKB1 (or STK11), TP53 and CDKN2A [30, 54, 64, 65], with some genes such as EGFR and 

KRAS showing preferential mutation patterns based on smoking history. A recent study 

assessing other well known oncogenes and tumor suppressor genes showed there were a 

number of other genes also observed to be mutated in lung adenocarcinoma [64]. However, 

due to technological and material limitations at the time, many of these studies assessed only 

small numbers of genes and thus, genome wide screening for somatic 

mutations was not feasible. While high throughput sequencing technologies to assess sequence 

mutation on a genome scale have become available, challenges associated with cost and data 

analysis preclude their routine use. 

1.4 Epigenetic alterations in lung cancer 

1.4.1 DNA methylation 

Another DNA level mechanism which can affect gene expression is the methylation of 

DNA at gene promoters. DNA methylation is a reversible chemical modification which has been 

shown to have a prominent role in the silencing of tumor suppressor genes. Specifically, this 

modification targets cytosines, whereby a methyl group (CH3) is added to the carbon 5 position of 

cytosine. 

It is thought that in cancer, the majority of the genome loses its methylation but small areas in 

the gene promoters, known as CpG islands, gain methylation [66-69]. Generally, it is thought 

that the acquired methylation targets tumor suppressor genes while the areas of lost 

methylation facilitate the activation of repetitive areas of the genome which can lead to 

increased genomic instability. In addition, aberrant DNA methylation of critical genes has been 

utilized for early detection purposes as well as a target for therapeutic intervention [70-72], 

emphasizing its key role in cancer. 

In lung cancer, a number of specific genes such as CDKN2A (or p16), RASSF1A, and MGMT 

have been shown to harbour increased promoter methylation [73]. While many of these methylation 

events were discovered using single locus assays, recent advances have allowed for the high 

throughput analysis of thousands of genes in a single experiment [74-79]. As such, applications of 

these high throughput approaches in lung cancer are likely to identify novel methylated genes. 

Similar to array CGH analysis, though many methylated genes are likely to be identified, it will 

be important to validate whether these alterations affect downstream gene expression. 

1.5 Current level of integrative analysis 

At the time this thesis started, there were a small number of whole genome integrative studies 

which primarily focused on the integration of gene dosage and gene expression. In fact, the 

majority of integrative analyses were done at the single locus level, such as the examination 

of gene dosage and expression of the HER2 (ERBB2) oncogene in breast cancer [80]. Moreover, 

there were a limited number of gene dosage or gene expression studies in lung cancer. 

However, from recent studies involving multiple cancer types, including lung cancer, it has been 

shown that anywhere between 20% and 60% of genes in regions of copy number change also 

exhibit a concerted change in gene expression [52, 81-84]. Conversely, when the proportion of 

differential expression associated with gene dosage alteration was examined, it was found that 

only 11% of the observed differential expression could be attributed to high level DNA copy 

number change [83]. Thus, it is clear that gene dosage alterations are responsible for only a 

part of the overall dysregulated gene expression and that other mechanisms are likely involved. 
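
The two proportions discussed above answer different questions, and a small Python sketch may help to keep them apart. It assumes per-gene calls coded as +1 (gain or overexpression), -1 (loss or underexpression), and 0 (no change); the gene names, calls, and function names are hypothetical and are not data from the cited studies.

```python
def concordant_fraction(copy_calls, expression_calls):
    """Of the genes with a copy number change, the fraction with expression changed in the same direction."""
    altered = [g for g, c in copy_calls.items() if c != 0]
    if not altered:
        return 0.0
    hits = [g for g in altered if expression_calls.get(g, 0) == copy_calls[g]]
    return len(hits) / len(altered)

def explained_fraction(copy_calls, expression_calls):
    """Of the differentially expressed genes, the fraction with a copy number change in the same direction."""
    changed = [g for g, e in expression_calls.items() if e != 0]
    if not changed:
        return 0.0
    hits = [g for g in changed if copy_calls.get(g, 0) == expression_calls[g]]
    return len(hits) / len(changed)

# Hypothetical calls for six genes in one tumor.
copy_calls = {"GENE_A": 1, "GENE_B": 1, "GENE_C": -1, "GENE_D": 0, "GENE_E": -1, "GENE_F": 0}
expression_calls = {"GENE_A": 1, "GENE_B": 0, "GENE_C": -1, "GENE_D": 1, "GENE_E": 1, "GENE_F": -1}

concordance = concordant_fraction(copy_calls, expression_calls)
explained = explained_fraction(copy_calls, expression_calls)
print(concordance, explained)
```

In this toy example, half of the copy number altered genes show concordant expression, yet a smaller share of all differentially expressed genes is accounted for by copy number, which mirrors the asymmetry noted in the text.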

1.6 Need for an integrative approach to study lung cancer 

As discussed earlier, a gene such as CDKN2A has been shown to be inactivated by both gene 

dosage loss and increased promoter methylation. Thus, it is very likely that when examining a 

large number of tumors, a given gene may be affected by one mechanism in one tumor (e.g. 

gene dosage increase) and another mechanism in a different tumor (e.g. DNA 

hypomethylation), but both leading to the same net effect (Figure 1.1). In addition, if the specific 

event (e.g. gene dosage increase) occurs at a low frequency, but cumulatively, the deregulation 

occurs at a high frequency, then examining only gene dosage or DNA methylation would 

preclude the identification of such potentially important genes. Hence, it should be apparent 

that an integrative, multi-dimensional genetic and epigenetic approach is needed to identify 

novel genes which would have escaped previous, single dimensional analyses. 
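
The argument above, that a gene can be disrupted infrequently by each individual mechanism yet frequently overall, can be made concrete with a short Python sketch. It assumes a hypothetical gene assessed in ten tumors, with the set of affected tumors recorded per mechanism; the tumor identifiers, mechanism names, and function name are illustrative assumptions only.

```python
def disruption_frequencies(alterations, n_tumors):
    """Return per-mechanism frequencies and the cumulative (any mechanism) frequency for one gene.

    `alterations` maps a mechanism name to the set of tumor IDs altered by that mechanism.
    """
    per_mechanism = {name: len(tumors) / n_tumors for name, tumors in alterations.items()}
    # A tumor counts once in the cumulative frequency even if hit by several mechanisms.
    any_mechanism = set().union(*alterations.values())
    return per_mechanism, len(any_mechanism) / n_tumors

# Hypothetical alterations of one gene across ten tumors (T1 to T10).
alterations = {
    "copy_number_loss": {"T1", "T2"},
    "hypermethylation": {"T3", "T4", "T5"},
    "somatic_mutation": {"T2", "T6"},
}

per_mechanism, cumulative = disruption_frequencies(alterations, n_tumors=10)
print(per_mechanism, cumulative)
```

Here no single mechanism affects more than 30% of the tumors, whereas 60% of the tumors carry at least one alteration of the gene, so an analysis restricted to one dimension would understate how often the gene is disrupted.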

1.7 Bioinformatic tools for genomic analysis 

While many software packages exist for the analysis of high throughput gene expression data 

[85-88], at the start of my thesis project, software packages for the visualization and analysis of 

DNA copy number data were very limited [89-95]. A summary of array CGH analysis 

methodologies and software packages is provided in a review [96]. Moreover, three of the 

key challenges at the time were (i) the increase in data generated from a single experiment, (ii) 

the effective visualization of this data for easy interpretation, and (iii) the microarray platform 

dependence of the majority of software packages. 

With respect to the increase in data generation, the first generation of microarrays used for 

array CGH typically comprised two to three thousand data points. As such, software for both 

visualization and analysis was developed to effectively handle this level of data complexity. 

For example, since array CGH data in fact represents discrete levels of copy number 

throughout the genome, one of the data analysis steps required is segmentation which 

effectively smooths data based on genomic position. The first version of DNACopy [97], one 

of the first algorithms to segment array CGH data, would need a significant amount of time to 

execute when applied to arrays with 100,000 data points or greater, and a new 

version of the program was developed a few years later [98]. Similarly, in terms of visualization, 

most programs displayed array CGH data in an ordinal manner whereby the relative genomic 

position was on the x-axis and the log ratio of the data point was drawn on the y-axis. While 

this type of visualization can provide a quick genome summary of a single sample, it is difficult 

to readily link the data to information such as protein coding genes. 

Finally, software developed by microarray manufacturers such as Affymetrix, Agilent or 

Nimblegen was specifically tailored to handle data from their respective microarray platforms. 

Thus, data emanating from different microarray platforms, but derived from 

samples with common characteristics, could not be analyzed in a concerted manner, resulting in 

under-utilization of the increasingly available array CGH data in the public domain. Most 

importantly, no tools existed to integrate multiple dimensions of data such as global gene 

dosage and gene expression, let alone integration with DNA methylation. Hence, it is clear that 

with these apparent challenges, the development of such bioinformatic tools was needed. 
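
The segmentation step mentioned in this section can be sketched in Python as follows. This is a deliberately simple recursive binary segmentation with a fixed mean-shift stopping rule, written only to illustrate the idea of partitioning position-ordered log ratios into runs of constant level; it is not the algorithm implemented in DNACopy [97] or FACADE [100], and the function name, parameters, and data are hypothetical.

```python
def segment(values, min_size=3, min_shift=0.3):
    """Split position-ordered log2 ratios into (start, end, mean) segments; end is exclusive."""
    def mean(lo, hi):
        return sum(values[lo:hi]) / (hi - lo)

    def sse(lo, hi):
        m = mean(lo, hi)
        return sum((x - m) ** 2 for x in values[lo:hi])

    def best_split(lo, hi):
        # Candidate breakpoints leave at least min_size points on each side.
        candidates = range(lo + min_size, hi - min_size + 1)
        return min(candidates, key=lambda k: sse(lo, k) + sse(k, hi), default=None)

    def recurse(lo, hi):
        k = best_split(lo, hi)
        # Stop when no breakpoint is possible or the two sides differ too little.
        if k is None or abs(mean(lo, k) - mean(k, hi)) < min_shift:
            return [(lo, hi, mean(lo, hi))]
        return recurse(lo, k) + recurse(k, hi)

    return recurse(0, len(values)) if values else []

# Hypothetical log2 ratios for twelve consecutive array elements, with a gained region in the middle.
ratios = [0.0, 0.1, -0.1, 0.0, 0.9, 1.0, 1.1, 1.0, 0.0, -0.1, 0.1, 0.0]
segments = segment(ratios)
print(segments)
```

Because every candidate breakpoint is re-scored from scratch at each level of recursion, a naive approach of this kind slows down quickly as the number of data points grows, which is the scaling concern raised for high density arrays.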

1.8 Thesis theme 

The theme of this thesis is the development and utilization of an integrative genetic and 

epigenetic approach to identify novel aberrant genes and pathways that may be involved in the 

tumorigenesis of lung adenocarcinoma. This will be achieved by employing genome wide 

genetic and epigenetic profiling experiments of lung adenocarcinoma samples and the 

subsequent integration of this data using novel bioinformatics tools and approaches. 

1.9 Objectives and hypothesis 

The objective of this work is to demonstrate the importance of employing an integrative 

approach to understand genetic and epigenetic alterations and their consequence on gene 

expression. The hypothesis can be broken down to three parts: 

(A) Genes/pathways which are important to tumorigenesis are disrupted by multiple 

mechanisms in lung cancer. 

(B) By using an integrative approach, looking at the global genetic and epigenetic regulation of 

gene expression, changes at the DNA level which have downstream effects at the gene 

expression level will be identified. 

(C) This approach will lead to the identification of more genes that are disrupted than previously 

anticipated and these genes will be enriched in key pathways and functions important to lung 

tumorigenesis. 

1.10 Specific aims and outline of thesis 

This thesis consists of four manuscripts assembled in a non-chronological order to best address 

the objectives and hypothesis of this thesis. 

Aim 1: Development of a platform for multi 'omics data integration and analysis 

Chapter 2 discusses the development of an integrative analysis software package called a 

system for the integrative genomic multi-dimensional analysis of cancer genomes, epigenomes, 

and transcriptomes (SIGMA2). The development of this application was necessary prior to the 

undertaking of the analysis of the vast amount of data generated from the utilized high 

throughput, genome-wide technologies. 

As discussed earlier in this chapter, there were very few bioinformatic tools available for the 

analysis of array CGH data, let alone for integrative analysis of gene dosage and gene 

expression. Prior to the development of SIGMA2, I developed the precursor version of this 

software, SIGMA [95]. SIGMA provided the basic framework in terms of the user interfaces, 

database communication, data structures and "look and feel" that would be utilized in SIGMA2. 

Moreover, one of the key challenges when SIGMA was developed was the effective 

visualization and analysis of large datasets generated by newer, high density array CGH 

platforms. At the time, the majority of data were generated on platforms comprising 

3000 measurements per sample, but newer technologies were being developed which 

generated over 500,000 data points per sample, representing a more than 100-fold increase in information 

obtained from each experiment [99]. Because SIGMA was designed to meet this challenge, the base software architecture used in SIGMA2 

was already capable of handling large amounts of data. 

Aim 2: Demonstration of an integrative approach using model systems 

Chapter 3 discusses the demonstration of an integrative, multi-dimensional approach on tumor 

cell line model systems. Using a set of breast cancer cell lines, I examine the gene dosage, 

allelic composition, DNA methylation, and gene expression profiles in an integrative manner to 

delineate which genes and pathways would be missed or less significant if such an approach 

was not used. This demonstrative study was needed to show the key advantages and benefits 

of an integrative approach. While cell lines are artificial systems and may have acquired 

alterations that are beneficial for growth in vitro, it was important to use a sample source 

for which material limitations did not exist. For each of the genetic or epigenetic profiling studies, 

sufficient amounts of DNA and RNA are needed and when more assays are done in a given 

sample, more material is required. Moreover, when whole tumor samples are microdissected to 

ensure high tumor cell purity, this inherently will reduce the amount of usable sample material. 

As such, it is important that the quantitative and qualitative benefits of utilizing an integrative 

approach are sufficient to warrant using clinical samples. At the time this study was initiated, 

SNP array and array CGH profiles were available for breast cancer cell lines and thus, only 

generation of DNA methylation and gene expression profiles was needed to complete this set. 

Given the purpose of this study was to demonstrate the effectiveness of the integrative 

approach, while data from lung cancer cell lines would have been optimal, the source of 

data has limited relevance to the purpose of this aim. 

Aim 3: Characterization of DNA level alterations in lung adenocarcinoma 

A number of studies have been done to identify gene dosage alterations in lung cancer and in 

lung adenocarcinoma specifically. These studies were done on a number of different array 

platforms, with one of the latest studies done using Affymetrix SNP arrays. One of the benefits 

of Affymetrix SNP arrays is the ability to simultaneously detect changes in gene dosage as well 

as allelic imbalance. Although allelic imbalance should be determined using a patient matched 

non-malignant sample as a control, it has also been determined using a pool of unmatched non-

malignant samples. While the ability to detect imbalance using unmatched control samples is 

important when matched control samples are not available, this approach may falsely score balanced regions as 

imbalanced, and vice versa. In addition, samples in these different studies 

were typically not microdissected and thus, tumor cell purity in the samples would be variable. 

Thus, genetic alterations would be difficult to detect in those samples with low tumor cell 

content. Moreover, there has been a recent drastic increase in resolution with the newest 

SNP arrays, with the ability to measure over 4X as many SNPs and over 8X as many spots for 

gene dosage. Chapter 4 discusses the application of a new SNP array technology to 

microdissected lung adenocarcinoma specimens with the goal of identifying genetic alterations 

at the highest resolution currently available. 
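
The effect of tumor cell purity on the detection of gene dosage changes, noted above as a motivation for microdissection, can be illustrated with a short Python sketch. It assumes a simple two-population mixture of tumor cells and diploid non-malignant cells and ignores measurement noise and tumor heterogeneity; the function name and the purity values are hypothetical.

```python
import math

def expected_log2_ratio(tumor_copies, purity, normal_copies=2):
    """Expected log2 ratio when a fraction `purity` of cells are tumor cells and the rest are normal."""
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    # Average copy number across the mixed cell population.
    mixed = purity * tumor_copies + (1.0 - purity) * normal_copies
    if mixed == 0:
        return float("-inf")
    return math.log2(mixed / normal_copies)

# Single copy loss and single copy gain at full purity and at 30% tumor content.
loss_pure = expected_log2_ratio(1, 1.0)
loss_mixed = expected_log2_ratio(1, 0.3)
gain_pure = expected_log2_ratio(3, 1.0)
gain_mixed = expected_log2_ratio(3, 0.3)
print(loss_pure, loss_mixed, gain_pure, gain_mixed)
```

Under these assumptions, a single copy loss that would give a log2 ratio of -1.0 in a pure tumor sample shrinks to about -0.23 at 30% tumor content, which brings it much closer to the level of experimental noise.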

Aim 4: Application of an integrative approach to clinical lung adenocarcinoma specimens 

With the approach and necessary tools developed and now demonstrated to be beneficial using 

a model dataset, Chapter 5 discusses the application of the integrative approach to lung 

adenocarcinoma specimens. While the published chapter provides an overview of cancer 

genome and epigenome landscapes, sections 5.4.3 to 5.4.4 present some of the quantitative 

and qualitative benefits of integrative analysis specific to the analysis of a lung adenocarcinoma 

dataset. In addition, sections 5.4.5 to 5.4.6 discuss key findings in terms of genes and 

pathways that were identified from this integrative analysis. 

1.11 Description of high throughput data in this thesis 

Throughout this thesis, a number of platform technologies were utilized to generate high 

throughput, genome wide data. Below is a summary of all platforms used in each chapter. 

In Chapter 3, for nine breast cancer cell lines and one control cell line (MCF10A), the following 

profiles were generated: Affymetrix SNP 500K for the analysis of allelic status; whole genome 

tiling path array CGH for the analysis of gene dosage; Illumina Infinium HumanMethylation27 for 

DNA methylation analysis; and Affymetrix U133 Plus 2.0 for the analysis of gene expression. 

In Chapter 4, for the 46 tumors and matched non-malignant tissue as well as the cancer cell 

lines, the Affymetrix SNP 6.0 platform was utilized to measure total copy number and allelic 

imbalance. For a subset of tumors, gene expression profiles were generated using a custom 

Affymetrix platform designed by Rosetta Inpharmatics. 

In Chapter 5, for the ten tumors and matched non-malignant tissue samples, the following 

profiles were generated: Affymetrix SNP 6.0 for the analysis of allelic status and gene dosage; 

Illumina Infinium HumanMethylation27 for DNA methylation analysis; and Affymetrix HuEx 1.0 

ST array for the analysis of gene expression. Quantitative RT-PCR was performed using the 

Applied Biosystems TaqMan gene expression assay. 

1.12 Other relevant contributions not included as chapters in this thesis 

In this thesis, I have chosen to include a small portion of my overall work in order to achieve a 

coherent theme. However, in this section, I outline specific contributions, which I have 

either led or participated in as second author, that I deem relevant to the theme of lung 

cancer and genomics. 

1.12.1 Development of tools for genomic analysis 

As mentioned earlier, the precursor version of SIGMA2 was SIGMA [95]. This tool was built as 

an interactive database of cancer cell line array CGH profiles and provided a means for 

effective visualization of high density array CGH data as well as sharing of data. One of the 

other problems that arose for high density array CGH data was the lack of efficient analysis 

algorithms to delineate gains and losses. Most algorithms that were developed for array CGH 

analysis were developed for arrays with 2000 to 3000 data points and their execution times did 

not scale up efficiently when the arrays were generating 100,000 to 1,000,000 data points. To 

address this problem, I contributed to the development of a segmentation and calling algorithm 

named FACADE [100]. 

1.12.2 Baseline gene expression in non-malignant lung tissue 

Though gene expression studies of malignant samples are important, it is also critical to 

define which genes are expressed in non-malignant samples as these are used as a reference to 

determine aberrant gene expression. There were two studies I was involved with which 

addressed this question. First, we examined gene expression of non-malignant, smoke 

damaged bronchial epithelium using serial analysis of gene expression (SAGE) [37]. We found 

that there were specific genes that showed high tissue specificity to the bronchial epithelium 

with limited representation in other tissues and that there were differences between bronchial 

epithelial samples and lung parenchyma, the tissue adjacent to tumors that is typically used 

as a non-malignant control and that comprises a mixture of different cell types. 

In the study described above, bronchial epithelium samples from current and former smokers 

were grouped together. Hence, the next logical question was to assess the effect of active 

smoking on the bronchial epithelium. In the second study, a group of never smoker samples 

was added to the groups of current and former smokers and the gene expression profiles of 

the three groups were compared. We first identified a set of genes which were differentially 

expressed in response to smoke exposure and found a subset of genes that were reversible 

upon smoking cessation and another subset of genes irreversible upon smoking cessation [35]. 

Those genes which were irreversibly altered after heavy smoke exposure may have implications 

in affecting future risk of developing lung cancer. Moreover, these findings also suggest that 

when trying to identify cancer-specific changes when matched control samples are not 

available, clinical characteristics such as smoking status should be taken into consideration 

when comparisons are made. 

1.12.3 Differential gene expression analysis in lung cancer 

With the non-malignant, baseline gene expression defined, differential gene expression in early 

stages of lung cancer and locally invasive squamous cell carcinoma was then assessed [101]. 

It was found that genes associated with epidermal development were increased in expression 

and genes associated with mucociliary function were decreased in expression in carcinoma-in-situ as well as in 

precancerous stages. Finally, genes associated with tissue re-modelling were also altered in 

expression in locally invasive cancer and also showed altered expression in carcinoma-in-situ, 

suggesting this function is affected early in cancer development. 

The Wnt pathway has been shown to be aberrant in many cancer types. At the time the study 

began, there were two branches of the pathway that were known to exist, canonical and non- 

canonical, whose activation resulted in different downstream consequences. While the 

canonical branch was the primary focus of most researchers studying this pathway, we sought 

to assess the role of the non-canonical branch in lung squamous cell carcinoma using semi- 

quantitative and quantitative PCR of genes which were a part of the non-canonical branch [102]. 

From this study, it was found that (i) these non-canonical genes were expressed in the normal 

lung and (ii) some of these non-canonical genes were differentially expressed in tumors. 

An important consideration in the analysis of differential gene expression in cancer is the use of 

suitable reference genes for data normalization. This consideration is critical to both 

quantitative PCR experiments and microarray experiments, where relative quantifications 

are typically used. To address this, using SAGE, where quantification of expression is absolute, 

genes whose expression was constant across both malignant and non-malignant samples were 

identified [103]. Those genes demonstrated better constancy than some genes which are 

typically used as controls for gene expression analysis. 
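
One simple way to express the constancy of a candidate reference gene is the coefficient of variation of its expression across samples, as sketched below in Python. The coefficient of variation is used here as an assumed measure for illustration; the selection criteria actually applied in [103] are not described in this chapter, and the gene names and tag counts are hypothetical.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """Population standard deviation divided by the mean; lower values indicate more constant expression."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean expression is zero")
    return pstdev(values) / m

def rank_by_constancy(expression):
    """Order genes from most to least constant across the supplied samples."""
    return sorted(expression, key=lambda gene: coefficient_of_variation(expression[gene]))

# Hypothetical absolute tag counts across four samples (malignant and non-malignant).
expression = {
    "GENE_X": [100, 102, 98, 101],
    "GENE_Y": [100, 300, 50, 150],
    "GENE_Z": [40, 44, 36, 41],
}

ranking = rank_by_constancy(expression)
print(ranking)
```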

1.12.4 Integration of gene dosage and gene expression in lung cancer 

The first level of genomic integration that needed to be accomplished was the integration of 

gene dosage and gene expression. In one study using cancer cell lines, hot spots of DNA 

amplification were identified throughout the genome. When specifically examining lung cancer 

cell lines and subsequently coupling this with gene expression data, it was found that 50% of 

genes in these frequently amplified regions show correlation between gene dosage and gene 

expression [52]. Moreover, it was also observed that different components of the EGFR 

signaling pathway were amplified in different cell lines illustrating that for a given pathway, one 

can underestimate the frequency of pathway disruption when only well known genes in the 

pathway are assessed. 

In a second study involving clinical lung tumors, a genomic region which was preferentially 

amplified in squamous cell carcinomas as compared to adenocarcinomas was identified. 

Further integration with gene expression data allowed for the identification of the target gene, 

BRF2, in this amplified region [104]. Moreover, gene dosage and protein expression level 

assessment of carcinoma-in-situ (CIS) samples for BRF2 showed that amplification and overexpression were 

present, suggesting that this event is occurring at an early stage of tumorigenesis. 

Figure 1.1 panels (each comparing normal and tumor): (a) copy number loss / loss of heterozygosity (LOH); (b) allelic imbalance; (c) uniparental disomy (UPD); (d) DNA hypermethylation; (e) somatic mutation (premature stop, truncated transcript). 

Figure 1.1. Multiple mechanisms of alteration leading to the same downstream consequences. 

(a) Illustration of copy number loss. Loss of a particular chromosomal region 

in tumors. (b) Illustration of allelic imbalance. While both alleles are present, there is a 

preferential increase of one of the alleles. (c) Illustration of uniparental disomy. While 

overall the total number of DNA copies is normal, one part of an allele is lost and replaced 

by a part from the other allele. (d) Promoter hypermethylation in tumors which results in 

suppression of gene transcription. (e) Somatic mutation in the tumor which can lead to the 

transcription of a truncated (possibly non-functional) transcript. Mechanisms shown in (a), 

(c), (d), and (e) can lead to the same net downstream effect in loss of gene and protein 

expression. For (a), (b), and (c), though whole chromosomes are shown, these events can 

vary in scale from a focal region of change to a whole chromosome arm. The green arrow 

represents the transcription start site. 

1.13 References 

1. Jemal A, Siegel R, Ward E, Hao Y, Xu J, Thun MJ: Cancer statistics, 2009. CA Cancer 

J Clin 2009, 59(4):225-249. 

2. Sun S, Schiller JH, Gazdar AF: Lung cancer in never smokers--a different disease. 

Nat Rev Cancer 2007, 7(10):778-790. 

3. Khuder SA: Effect of cigarette smoking on major histological types of lung cancer: 

a meta-analysis. Lung Cancer 2001, 31(2-3):139-148. 

4. Detterbeck FC, Boffa DJ, Tanoue LT: The new lung cancer staging system. Chest 

2009, 136(1):260-271. 

5. Herbst RS, Lynch TJ, Sandler AB: Beyond doublet chemotherapy for advanced non-small-cell 

lung cancer: combination of targeted agents with first-line 

chemotherapy. Clin Lung Cancer 2009, 10(1):20-27. 

6. Kim KS, Jeong JY, Kim YC, Na KJ, Kim YH, Ahn SJ, Baek SM, Park CS, Park CM, Kim 

YI et al: Predictors of the response to gefitinib in refractory non-small cell lung 

cancer. Clin Cancer Res 2005, 11(6):2244-2251. 

7. Kim TE, Murren JR: Erlotinib OSI/Roche/Genentech. Curr Opin Investig Drugs 2002, 

3(9):1385-1395. 

8. Miller VA, Kris MG, Shah N, Patel J, Azzoli C, Gomez J, Krug LM, Pao W, Rizvi N, Pizzo 

B et al: Bronchioloalveolar pathologic subtype and smoking history predict 

sensitivity to gefitinib in advanced non-small-cell lung cancer. J Clin Oncol 2004, 

22(6):1103-1109. 

9. Mitsudomi T, Kosaka T, Endoh H, Horio Y, Hida T, Mori S, Hatooka S, Shinoda M, 

Takahashi T, Yatabe Y: Mutations of the epidermal growth factor receptor gene 

predict prolonged survival after gefitinib treatment in patients with non-small-cell 

lung cancer with postoperative recurrence. J Clin Oncol 2005, 23(11):2513-2520. 

10. Pao W, Miller V, Zakowski M, Doherty J, Politi K, Sarkaria I, Singh B, Heelan R, Rusch 

V, Fulton L et al: EGF receptor gene mutations are common in lung cancers from 

"never smokers" and are associated with sensitivity of tumors to gefitinib and 

erlotinib. Proc Natl Acad Sci U S A 2004, 101(36):13306-13311. 

11. Sirotnak FM, Zakowski MF, Miller VA, Scher HI, Kris MG: Efficacy of cytotoxic agents 

against human tumor xenografts is markedly enhanced by coadministration of 

ZD1839 (Iressa), an inhibitor of EGFR tyrosine kinase. Clin Cancer Res 2000, 

6(12):4885-4892. 

12. Paez JG, Janne PA, Lee JC, Tracy S, Greulich H, Gabriel S, Herman P, Kaye FJ, 

Lindeman N, Boggon TJ et al: EGFR mutations in lung cancer: correlation with 

clinical response to gefitinib therapy. Science 2004, 304(5676):1497-1500. 

13. Schena M, Shalon D, Davis RW, Brown PO: Quantitative monitoring of gene 

expression patterns with a complementary DNA microarray. Science 1995, 

270(5235):467-470. 

14. Schena M, Shalon D, Heller R, Chai A, Brown PO, Davis RW: Parallel human genome 

analysis: microarray-based expression monitoring of 1000 genes. Proc Natl Acad 

Sci U S A 1996, 93(20):10614-10619. 

15. Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh 

ML, Downing JR, Caligiuri MA et al: Molecular classification of cancer: class 

discovery and class prediction by gene expression monitoring. Science 1999, 

286(5439):531-537. 

16. Perou CM, Sorlie T, Eisen MB, van de Rijn M, Jeffrey SS, Rees CA, Pollack JR, Ross 

DT, Johnsen H, Akslen LA et al: Molecular portraits of human breast tumours. 

Nature 2000, 406(6797):747-752. 

17. Sorlie T, Perou CM, Tibshirani R, Aas T, Geisler S, Johnsen H, Hastie T, Eisen MB, van 

de Rijn M, Jeffrey SS et al: Gene expression patterns of breast carcinomas 

18

distinguish tumor subclasses with clinical implications. Proc Natl Acad Sci U S A 

2001, 98(19):10869-10874. 

18. Fukumoto S, Yamauchi N, Moriguchi H, Hippo Y, Watanabe A, Shibahara J, Taniguchi 

H, Ishikawa S, Ito H, Yamamoto S et al: Overexpression of the aldo-keto reductase 

family protein AKR1B10 is highly correlated with smokers' non-small cell lung 

carcinomas. Clin Cancer Res 2005, 11(5):1776-1785. 

19. Heighway J, Knapp T, Boyce L, Brennand S, Field JK, Betticher DC, Ratschiller D, 

Gugger M, Donovan M, Lasek A et al: Expression profiling of primary non-small cell 

lung cancer for target identification. Oncogene 2002, 21(50):7749-7763. 

20. Hu J, Bianchi F, Ferguson M, Cesario A, Margaritora S, Granone P, Goldstraw P, Tetlow 

M, Ratcliffe C, Nicholson AG et al: Gene expression signature for angiogenic and 

nonangiogenic non-small-cell lung cancer. Oncogene 2005, 24(7):1212-1219. 

21. Larsen JE, Pavey SJ, Passmore LH, Bowman R, Clarke BE, Hayward NK, Fong KM: 

Expression profiling defines a recurrence signature in lung squamous cell 

carcinoma. Carcinogenesis 2007, 28(3):760-766. 

22. Larsen JE, Pavey SJ, Passmore LH, Bowman RV, Hayward NK, Fong KM: Gene 

expression signature predicts recurrence in lung adenocarcinoma. Clin Cancer 

Res 2007, 13(10):2946-2954. 

23. Lau SK, Boutros PC, Pintilie M, Blackhall FH, Zhu CQ, Strumpf D, Johnston MR, Darling 

G, Keshavjee S, Waddell TK et al: Three-gene prognostic classifier for early-stage 

non small-cell lung cancer. J Clin Oncol 2007, 25(35):5562-5569. 

24. Oshita F, Ikehara M, Sekiyama A, Hamanaka N, Saito H, Yamada K, Noda K, Kameda 

Y, Miyagi Y: Genomic-wide cDNA microarray screening to correlate gene 

expression profile with chemoresistance in patients with advanced lung cancer. J 

Exp Ther Oncol 2004, 4(2):155-160. 

25. Potti A, Mukherjee S, Petersen R, Dressman HK, Bild A, Koontz J, Kratzke R, Watson 

MA, Kelley M, Ginsburg GS et al: A genomic strategy to refine prognosis in earlystage 

non-small-cell lung cancer. N Engl J Med 2006, 355(6):570-580. 

26. Raponi M, Zhang Y, Yu J, Chen G, Lee G, Taylor JM, Macdonald J, Thomas D, 

Moskaluk C, Wang Y et al: Gene expression signatures for predicting prognosis of 

squamous cell and adenocarcinomas of the lung. Cancer Res 2006, 66(15):7466- 

7472. 

27. Remmelink M, Mijatovic T, Gustin A, Mathieu A, Rombaut K, Kiss R, Salmon I, 

Decaestecker C: Identification by means of cDNA microarray analyses of gene 

expression modifications in squamous non-small cell lung cancers as compared 

to normal bronchial epithelial tissue. Int J Oncol 2005, 26(1):247-258. 

28. Singhal S, Amin KM, Kruklitis R, DeLong P, Friscia ME, Litzky LA, Putt ME, Kaiser LR, 

Albelda SM: Alterations in cell cycle genes in early stage lung adenocarcinoma 

identified by expression profiling. Cancer Biol Ther 2003, 2(3):291-298. 

29. Spira A, Beane JE, Shah V, Steiling K, Liu G, Schembri F, Gilman S, Dumas YM, Calner 

P, Sebastiani P et al: Airway epithelial gene expression in the diagnostic evaluation 

of smokers with suspect lung cancer. Nat Med 2007, 13(3):361-366. 

30. Sun Z, Wigle DA, Yang P: Non-overlapping and non-cell-type-specific gene 

expression signatures predict lung cancer survival. J Clin Oncol 2008, 26(6):877- 

883. 

31. Wang T, Hopkins D, Schmidt C, Silva S, Houghton R, Takita H, Repasky E, Reed SG: 

Identification of genes differentially over-expressed in lung squamous cell 

carcinoma using combination of cDNA subtraction and microarray analysis. 

Oncogene 2000, 19(12):1519-1528. 

32. Wikman H, Seppanen JK, Sarhadi VK, Kettunen E, Salmenkivi K, Kuosma E, Vainio- 

Siukola K, Nagy B, Karjalainen A, Sioris T et al: Caveolins as tumour markers in lung 

cancer detected by combined use of cDNA and tissue microarrays. J Pathol 2004, 

203(1):584-593. 

19

33. Garber ME, Troyanskaya OG, Schluens K, Petersen S, Thaesler Z, Pacyna-Gengelbach 

M, van de Rijn M, Rosen GD, Perou CM, Whyte RI et al: Diversity of gene expression 

in adenocarcinoma of the lung. Proc Natl Acad Sci U S A 2001, 98(24):13784-13789. 

34. Shedden K, Taylor JM, Enkemann SA, Tsao MS, Yeatman TJ, Gerald WL, Eschrich S, 

Jurisica I, Giordano TJ, Misek DE et al: Gene expression-based survival prediction in 

lung adenocarcinoma: a multi-site, blinded validation study. Nat Med 2008, 

14(8):822-827. 

35. Chari R, Lonergan KM, Ng RT, MacAulay C, Lam WL, Lam S: Effect of active smoking 

on the human bronchial epithelium transcriptome. BMC Genomics 2007, 8:297. 

36. Spira A, Beane J, Shah V, Liu G, Schembri F, Yang X, Palma J, Brody JS: Effects of 

cigarette smoke on the human airway epithelial cell transcriptome. Proc Natl Acad 

Sci U S A 2004, 101(27):10143-10148. 

37. Lonergan KM, Chari R, Deleeuw RJ, Shadeo A, Chi B, Tsao MS, Jones S, Marra M, 

Ling V, Ng R et al: Identification of novel lung genes in bronchial epithelium by 

serial analysis of gene expression. Am J Respir Cell Mol Biol 2006, 35(6):651-661. 

38. Beane J, Sebastiani P, Liu G, Brody JS, Lenburg ME, Spira A: Reversible and 

permanent effects of tobacco smoke exposure on airway epithelial gene 

expression. Genome Biol 2007, 8(9):R201. 

39. Balsara BR, Testa JR: Chromosomal imbalances in human lung cancer. Oncogene 

2002, 21(45):6877-6883. 

40. Sato M, Shames DS, Gazdar AF, Minna JD: A translational view of the molecular 

pathogenesis of lung cancer. J Thorac Oncol 2007, 2(4):327-343. 

41. Thomas RK, Weir B, Meyerson M: Genomic approaches to lung cancer. Clin Cancer 

Res 2006, 12(14 Pt 2):4384s-4391s. 

42. Albertson DG, Collins C, McCormick F, Gray JW: Chromosome aberrations in solid 

tumors. Nat Genet 2003, 34(4):369-376. 

43. Lockwood WW, Chari R, Chi B, Lam WL: Recent advances in array comparative 

genomic hybridization technologies and their applications in human genetics. Eur 

J Hum Genet 2006, 14(2):139-148. 

44. Garnis C, Lockwood WW, Vucic E, Ge Y, Girard L, Minna JD, Gazdar AF, Lam S, 

MacAulay C, Lam WL: High resolution analysis of non-small cell lung cancer cell 

lines by whole genome tiling path array CGH. Int J Cancer 2006, 118(6):1556-1564. 

45. Janne PA, Li C, Zhao X, Girard L, Chen TH, Minna J, Christiani DC, Johnson BE, 

Meyerson M: High-resolution single-nucleotide polymorphism array and clustering 

analysis of loss of heterozygosity in human lung cancer cell lines. Oncogene 2004, 

23(15):2716-2726. 

46. Zhao X, Li C, Paez JG, Chin K, Janne PA, Chen TH, Girard L, Minna J, Christiani D, Leo 

C et al: An integrated view of copy number and allelic alterations in the cancer 

genome using single nucleotide polymorphism arrays. Cancer Res 2004, 

64(9):3060-3071. 

47. Zhao X, Weir BA, LaFramboise T, Lin M, Beroukhim R, Garraway L, Beheshti J, Lee JC, 

Naoki K, Richards WG et al: Homozygous deletions and chromosome 

amplifications in human lung carcinomas revealed by single nucleotide 

polymorphism array analysis. Cancer Res 2005, 65(13):5561-5570. 

48. Weir BA, Woo MS, Getz G, Perner S, Ding L, Beroukhim R, Lin WM, Province MA, Kraja 

A, Johnson LA et al: Characterizing the cancer genome in lung adenocarcinoma. 

Nature 2007, 450(7171):893-898. 

49. Chitale D, Gong Y, Taylor BS, Broderick S, Brennan C, Somwar R, Golas B, Wang L, 

Motoi N, Szoke J et al: An integrated genomic analysis of lung cancer reveals loss 

of DUSP4 in EGFR-mutant tumors. Oncogene 2009, 28(31):2773-2783. 

50. Kendall J, Liu Q, Bakleh A, Krasnitz A, Nguyen KC, Lakshmi B, Gerald WL, Powers S, 

Mu D: Oncogenic cooperation and coamplification of developmental transcription 

factor genes in lung cancer. Proc Natl Acad Sci U S A 2007, 104(42):16663-16668. 

20

51. Tonon G, Wong KK, Maulik G, Brennan C, Feng B, Zhang Y, Khatry DB, Protopopov A, 

You MJ, Aguirre AJ et al: High-resolution genomic profiles of human lung cancer. 

Proc Natl Acad Sci U S A 2005, 102(27):9625-9630. 

52. Lockwood WW, Chari R, Coe BP, Girard L, Macaulay C, Lam S, Gazdar AF, Minna JD, 

Lam WL: DNA amplification is a ubiquitous mechanism of oncogene activation in 

lung and other cancers. Oncogene 2008, 27(33):4615-4624. 

53. Cavenee WK: Loss of heterozygosity in stages of malignancy. Clin Chem 1989, 

35(7 Suppl):B48-52. 

54. Bepler G, Garcia-Blanco MA: Three tumor-suppressor regions on chromosome 11p 

identified by high-resolution deletion mapping in human non-small-cell lung 

cancer. Proc Natl Acad Sci U S A 1994, 91(12):5513-5517. 

55. Fong KM, Zimmerman PV, Smith PJ: Microsatellite instability and other molecular 

abnormalities in non-small cell lung cancer. Cancer Res 1995, 55(1):28-30. 

56. Merlo A, Gabrielson E, Askin F, Sidransky D: Frequent loss of chromosome 9 in 

human primary non-small cell lung cancer. Cancer Res 1994, 54(3):640-642. 

57. Merlo A, Gabrielson E, Mabry M, Vollmer R, Baylin SB, Sidransky D: Homozygous 

deletion on chromosome 9p and loss of heterozygosity on 9q, 6p, and 6q in 

primary human small cell lung cancer. Cancer Res 1994, 54(9):2322-2326. 

58. Sundaresan V, Heppell-Parton A, Coleman N, Miozzo M, Sozzi G, Ball R, Cary N, 

Hasleton P, Fowler W, Rabbitts P: Somatic genetic changes in lung cancer and 

precancerous lesions. Ann Oncol 1995, 6 Suppl 1:27-31; discussion 31-22. 

59. Huang J, Wei W, Chen J, Zhang J, Liu G, Di X, Mei R, Ishikawa S, Aburatani H, Jones 

KW et al: CARAT: a novel method for allelic detection of DNA copy number 

changes using high density oligonucleotide arrays. BMC Bioinformatics 2006, 7:83. 

60. Yamamoto G, Nannya Y, Kato M, Sanada M, Levine RL, Kawamata N, Hangaishi A, 

Kurokawa M, Chiba S, Gilliland DG et al: Highly sensitive method for genomewide 

detection of allelic composition in nonpaired, primary tumor specimens by use of 

affymetrix single-nucleotide-polymorphism genotyping microarrays. Am J Hum 

Genet 2007, 81(1):114-126. 

61. LaFramboise T, Weir BA, Zhao X, Beroukhim R, Li C, Harrington D, Sellers WR, 

Meyerson M: Allele-specific amplification in cancer revealed by SNP array 

analysis. PLoS Comput Biol 2005, 1(6):e65. 

62. Li C, Beroukhim R, Weir BA, Winckler W, Garraway LA, Sellers WR, Meyerson M: Major 

copy proportion analysis of tumor samples using SNP arrays. BMC Bioinformatics 

2008, 9:204. 

63. Lin M, Wei LJ, Sellers WR, Lieberfarb M, Wong WH, Li C: dChipSNP: significance 

curve and clustering of SNP-array-based loss-of-heterozygosity data. 

Bioinformatics 2004, 20(8):1233-1240. 

64. Ding L, Getz G, Wheeler DA, Mardis ER, McLellan MD, Cibulskis K, Sougnez C, 

Greulich H, Muzny DM, Morgan MB et al: Somatic mutations affect key pathways in 

lung adenocarcinoma. Nature 2008, 455(7216):1069-1075. 

65. Suda K, Tomizawa K, Mitsudomi T: Biological and clinical significance of KRAS 

mutations in lung cancer: an oncogenic driver that contrasts with EGFR mutation. 

Cancer Metastasis Rev 2010. 

66. Feinberg AP: Phenotypic plasticity and the epigenetics of human disease. Nature 

2007, 447(7143):433-440. 

67. Feinberg AP, Gehrke CW, Kuo KC, Ehrlich M: Reduced genomic 5-methylcytosine 

content in human colonic neoplasia. Cancer Res 1988, 48(5):1159-1161. 

68. Feinberg AP, Tycko B: The history of cancer epigenetics. Nat Rev Cancer 2004, 

4(2):143-153. 

69. Lo PK, Sukumar S: Epigenomics and breast cancer. Pharmacogenomics 2008, 

9(12):1879-1902. 

70. Decitabine: 2'-deoxy-5-azacytidine, Aza dC, DAC, dezocitidine, NSC 127716. Drugs 

R D 2003, 4(3):179-184. 

21

71. Shivapurkar N, Gazdar AF: DNA Methylation Based Biomarkers in Non-Invasive 

Cancer Screening. Curr Mol Med. 

72. Anglim PP, Alonzo TA, Laird-Offringa IA: DNA methylation-based biomarkers for 

early detection of non-small cell lung cancer: an update. Mol Cancer 2008, 7:81. 

73. Heller G, Zielinski CC, Zochbauer-Muller S: Lung cancer: From single-gene 

methylation to methylome profiling. Cancer Metastasis Rev 2010. 

74. Bibikova M, Lin Z, Zhou L, Chudin E, Garcia EW, Wu B, Doucet D, Thomas NJ, Wang Y, 

Vollmer E et al: High-throughput DNA methylation profiling using universal bead 

arrays. Genome Res 2006, 16(3):383-393. 

75. Weber M, Davies JJ, Wittig D, Oakeley EJ, Haase M, Lam WL, Schubeler D: 

Chromosome-wide and promoter-specific analyses identify sites of differential 

DNA methylation in normal and transformed human cells. Nat Genet 2005, 

37(8):853-862. 

76. Shames DS, Girard L, Gao B, Sato M, Lewis CM, Shivapurkar N, Jiang A, Perou CM, 

Kim YH, Pollack JR et al: A genome-wide screen for promoter methylation in lung 

cancer identifies novel methylation markers for multiple malignancies. PLoS Med 

2006, 3(12):e486. 

77. Thu KL, Pikor LA, Kennett JY, Alvarez CE, Lam WL: Methylation analysis by DNA 

immunoprecipitation. J Cell Physiol 2009, 222(3):522-531. 

78. Khulan B, Thompson RF, Ye K, Fazzari MJ, Suzuki M, Stasiek E, Figueroa ME, Glass 

JL, Chen Q, Montagna C et al: Comparative isoschizomer profiling of cytosine 

methylation: the HELP assay. Genome Res 2006, 16(8):1046-1055. 

79. Omura N, Li CP, Li A, Hong SM, Walter K, Jimeno A, Hidalgo M, Goggins M: Genomewide 

profiling of methylated promoters in pancreatic adenocarcinoma. Cancer Biol 

Ther 2008, 7(7):1146-1156. 

80. Slamon DJ, Godolphin W, Jones LA, Holt JA, Wong SG, Keith DE, Levin WJ, Stuart SG, 

Udove J, Ullrich A et al: Studies of the HER-2/neu proto-oncogene in human breast 

and ovarian cancer. Science 1989, 244(4905):707-712. 

81. Coe BP, Chari R, Lockwood WW, Lam WL: Evolving strategies for global gene 

expression analysis of cancer. J Cell Physiol 2008, 217(3):590-597. 

82. Heidenblad M, Lindgren D, Veltman JA, Jonson T, Mahlamaki EH, Gorunova L, van 

Kessel AG, Schoenmakers EF, Hoglund M: Microarray analyses reveal strong 

influence of DNA copy number alterations on the transcriptional patterns in 

pancreatic cancer: implications for the interpretation of genomic amplifications. 

Oncogene 2005, 24(10):1794-1801. 

83. Hyman E, Kauraniemi P, Hautaniemi S, Wolf M, Mousses S, Rozenblum E, Ringner M, 

Sauter G, Monni O, Elkahloun A et al: Impact of DNA amplification on gene 

expression patterns in breast cancer. Cancer Res 2002, 62(21):6240-6245. 

84. Pollack JR, Sorlie T, Perou CM, Rees CA, Jeffrey SS, Lonning PE, Tibshirani R, 

Botstein D, Borresen-Dale AL, Brown PO: Microarray analysis reveals a major direct 

role of DNA copy number alteration in the transcriptional program of human 

breast tumors. Proc Natl Acad Sci U S A 2002, 99(20):12963-12968. 

85. Brazma A, Robinson A, Cameron G, Ashburner M: One-stop shop for microarray 

data. Nature 2000, 403(6771):699-700. 

86. Tusher VG, Tibshirani R, Chu G: Significance analysis of microarrays applied to the 

ionizing radiation response. Proc Natl Acad Sci U S A 2001, 98(9):5116-5121. 

87. Rajagopalan D: A comparison of statistical methods for analysis of high density 

oligonucleotide array data. Bioinformatics 2003, 19(12):1469-1476. 

88. Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U, Speed TP: 

Exploration, normalization, and summaries of high density oligonucleotide array 

probe level data. Biostatistics 2003, 4(2):249-264. 

89. Myers CL, Chen X, Troyanskaya OG: Visualization-based discovery and analysis of 

genomic aberrations in microarray data. BMC Bioinformatics 2005, 6:146. 

22

90. Chen W, Erdogan F, Ropers HH, Lenzner S, Ullmann R: CGHPRO -- a comprehensive 

data analysis tool for array CGH. BMC Bioinformatics 2005, 6:85. 

91. Kim SY, Nam SW, Lee SH, Park WS, Yoo NJ, Lee JY, Chung YJ: ArrayCyGHt: a web 

application for analysis and visualization of array-CGH data. Bioinformatics 2005, 

21(10):2554-2555. 

92. Autio R, Hautaniemi S, Kauraniemi P, Yli-Harja O, Astola J, Wolf M, Kallioniemi A: CGH- 

Plotter: MATLAB toolbox for CGH-data analysis. Bioinformatics 2003, 19(13):1714- 

1715. 

93. Lingjaerde OC, Baumbusch LO, Liestol K, Glad IK, Borresen-Dale AL: CGH-Explorer: a 

program for analysis of array-CGH data. Bioinformatics 2005, 21(6):821-822. 

94. Chi B, DeLeeuw RJ, Coe BP, MacAulay C, Lam WL: SeeGH--a software tool for 

visualization of whole genome array comparative genomic hybridization data. 

BMC Bioinformatics 2004, 5:13. 

95. Chari R, Lockwood WW, Coe BP, Chu A, Macey D, Thomson A, Davies JJ, MacAulay 

C, Lam WL: SIGMA: a system for integrative genomic microarray analysis of 

cancer genomes. BMC Genomics 2006, 7:324. 

96. Chari R, Lockwood WW, Lam WL: Computational methods for the analysis of array 

comparative genomic hybridization. Cancer Inform 2007, 2:48-58. 

97. Olshen AB, Venkatraman ES, Lucito R, Wigler M: Circular binary segmentation for 

the analysis of array-based DNA copy number data. Biostatistics 2004, 5(4):557-572. 

98. Venkatraman ES, Olshen AB: A faster circular binary segmentation algorithm for 

the analysis of array CGH data. Bioinformatics 2007, 23(6):657-663. 

99. Coe BP, Ylstra B, Carvalho B, Meijer GA, Macaulay C, Lam WL: Resolving the 

resolution of array CGH. Genomics 2007, 89(5):647-653. 

100. Coe BP, Chari R, MacAulay C, Lam WL: FACADE: A fast and sensitive algorithm for 

the segmentation and calling of high resolution array CGH data. Nucleic Acids Res 

2010, Revision. 

101. Lonergan KM, Chari R, Coe BP, Wilson IM, Tsao MS, Ng RT, MacAulay C, Lam S, Lam 

WL: Transcriptome profiles of carcinoma-in-situ and invasive non-small cell lung 

cancer as revealed by SAGE. PLoS One 2010, Accepted. 

102. Lee EHL, Chari R, Lam A, Ng RT, Yee J, English J, Evans KG, MacAulay C, Lam S, 

Lam WL: Disruption of the non-canonical WNT pathway in lung squamous cell 

carcinoma. Clinical Medicine: Oncology 2008, 2:169-179. 

103. Chari R, Lonergan KM, Pikor LA, Coe BP, Zhu CQ, Chan THW, MacAulay C, Tsao MS, 

Lam S, Ng RT et al: A sequence-based approach to identify reference genes for 

gene expression analysis. BMC Medical Genomics 2010, Submitted. 

104. Lockwood WW, Chari R, Coe BP, Thu KL, Garnis C, Malloff CA, Campbell J, Williams 

AC, Hwang D, Zhu CQ et al: BRF2 – A Novel Lineage Specific Oncogene in Lung 

Squamous Cell Carcinoma. PLoS Med 2010, Revisions. 

23

Chapter 2: SIGMA2: A system for the integrative genomic

multi-dimensional analysis of cancer genomes, epigenomes,

and transcriptomes¹

¹ A version of this chapter has been published. Chari R, Coe BP, Wedseltoft C, Benetti M,

Wilson IM, Vucic EA, MacAulay C, Ng RT, Lam WL. (2008) SIGMA2: A system for the 

integrative genomic multi-dimensional analysis of cancer genomes, epigenomes, and 

transcriptomes. BMC Bioinformatics 9:422. doi:10.1186/1471-2105-9-422. Please see the 

published version of this chapter for all supplementary materials. 


2.1 Introduction 

Multiple mechanisms of gene disruption have been shown to be important in the development of 

cancer. Genetic alterations (mutations, changes in gene dosage, allele imbalance) and 

epigenetic alterations (changes in DNA methylation and histone modification states) are 

responsible for changing the expression of genes. High throughput approaches have afforded 

the ability to interrogate the genomic, epigenomic and gene expression (transcriptomic) profiles 

at unprecedented resolution [1-6]. However, because a gene can be disrupted by one mechanism or

by a combination of mechanisms, investigation in a single 'omics dimension (genomics,

epigenomics, or transcriptomics) cannot detect all disrupted genes in a given tumor.

Moreover, different tumors may disrupt a given gene through different mechanisms

while achieving the same net effect on phenotype. Hence, a

multi-dimensional approach is required to identify the causal events at the DNA level and 

understand their downstream consequences. 

The current state of software for global profile comparison typically focuses on analyzing and 

displaying data from a single dimension, for example CGH Fusion (infoQuant Ltd, London, UK) 

for DNA copy number profile analysis and GeneSpring (Agilent Technologies, Santa Clara, CA, 

USA) for gene expression profile analysis. Software for integrative analysis has been

restricted to datasets derived from a limited combination of technology platforms

(Table 2.1) [7-10]. Although different software packages can analyze data generated on different

platforms, the ability to perform meta-analysis using data from multiple microarray platforms is

limited to a small number of packages. Consequently, integrative analysis of cancer

genomes typically involves no more than two types of data, most commonly the integration of

gene dosage and gene expression data [11-16], and only recently has it expanded to include allelic

information [17]. Software that performs multi-dimensional analysis is therefore greatly in

demand.


Here, we present SIGMA2, a novel software package that allows users to integrate data from

the various 'omics disciplines such as genomics, epigenomics and transcriptomics.

Multi-dimensional datasets can be simultaneously compared, analyzed and visualized with respect to

individual dimensions, allowing combinatorial integration of the different assays belonging to the

different 'omics. The identification of genes altered at multiple levels such as copy number,

LOH, DNA methylation and the detection of consequential changes in gene expression can be

performed in concert, establishing SIGMA2 as a tool to facilitate the high throughput systems

biology analysis of cancer. SIGMA2 is freely available for academic and research use from our

website, http://www.flintbox.com/technology.asp?Page=3716. 

2.2 Implementation 

SIGMA2 is implemented in Java and requires version 1.6 or later of the Java runtime environment. In addition,

the statistical package R and the database application MySQL are required. The Java interface

communicates with MySQL using a JDBC connector and with R using the JRI package by JGR

(Figure 2.1). MySQL is used for data storage and querying, while R is used for

segmentation and statistical analysis. All genomic coordinate information was obtained from the

University of California Santa Cruz (UCSC) genome databases [18]. 

2.3 Results and discussion 

2.3.1 Look and feel of SIGMA2

The novel multi-dimensional 'omics data analysis software SIGMA2 is built on the framework of

a facile visualization tool called SIGMA, which can display alignment of genomic data from a

built-in static database [7]. The functionality introduced in SIGMA2 is shown in

Table 2.1. 


2.3.2 Description of application scope and functionality 

SIGMA2 is built to handle a variety of analysis techniques typically used in the high-throughput

study of cancer, allowing the combinatorial integration of multiple 'omics disciplines. The

hierarchy that underlies the program, which groups data into genome, epigenome, and

transcriptome, is shown in Figure 2.2a; the overall functionality map is given in Figure 2.2b

and listed in Table 2.2. Within each 'omics dimension, data sets may be imported representing

any of the major types of biological measurements being assayed, for example, (i)

both DNA copy number and LOH assays within the genomic bundle, (ii) both DNA

methylation and histone modification status within the epigenomic bundle, and (iii)

both gene expression profiles and microRNA expression assays within the transcriptomic

bundle. Each assay may branch into data sources from a multitude of technology platforms.
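
As an illustration of the hierarchy described above, the grouping of assays under 'omics dimensions can be sketched as a small lookup structure. This is a hypothetical sketch and not SIGMA2's internal representation: the names `HIERARCHY`, `Dataset` and `group_by_dimension`, and the platform labels used when calling them, are assumptions made for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass

# Dimension -> assay types, following the grouping described in the text.
HIERARCHY = {
    "genome": ("DNA copy number", "LOH"),
    "epigenome": ("DNA methylation", "histone modification"),
    "transcriptome": ("gene expression", "microRNA expression"),
}


@dataclass(frozen=True)
class Dataset:
    """One imported data set (hypothetical record, for illustration only)."""
    sample: str
    assay: str     # expected to be one of the assay types in HIERARCHY
    platform: str  # free text; each assay may come from many platforms

    @property
    def dimension(self) -> str:
        """Look up the 'omics dimension that the assay belongs to."""
        for dimension, assays in HIERARCHY.items():
            if self.assay in assays:
                return dimension
        raise ValueError(f"unknown assay type: {self.assay!r}")


def group_by_dimension(datasets):
    """Group data sets as dimension -> assay -> list of platforms."""
    grouped = defaultdict(lambda: defaultdict(list))
    for d in datasets:
        grouped[d.dimension][d.assay].append(d.platform)
    return {dim: dict(assays) for dim, assays in grouped.items()}
```

Keeping the dimension as a derived property, rather than a stored field, means that an assay can never be filed under the wrong dimension.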

2.3.3 Approach to integration between array platforms and assays 

SIGMA2 treats all data in the context of genome position based on the relevant human genome

build using the UCSC genome assemblies. An interval-based approach is used to sample 

across different array platforms and assays and data from each interval are merged together. 

Briefly, this is done by querying data at fixed genomic intervals for each platform and 

subsequently taking an average of the measurements within each interval. The algorithm is 

listed in Figure 2.3. 
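
The interval-based sampling described above can be sketched as follows. This is a minimal illustration and not a reproduction of the algorithm in Figure 2.3: it assumes fixed-size intervals counted from position 0 of each chromosome, an arithmetic mean within each interval, and that only intervals covered by every platform are kept. The function names and the `(chromosome, position, value)` input layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def bin_platform(probes, interval_size):
    """Average probe values within fixed genomic intervals.

    probes: iterable of (chromosome, position, value) tuples.
    Returns {(chromosome, interval_index): mean value}.
    """
    bins = defaultdict(list)
    for chrom, pos, value in probes:
        bins[(chrom, pos // interval_size)].append(value)
    return {key: mean(values) for key, values in bins.items()}


def merge_platforms(platforms, interval_size):
    """Merge several platforms onto a common set of intervals.

    platforms: {platform_name: probes}.
    Returns {(chromosome, interval_index): {platform_name: mean value}},
    restricted (by assumption) to intervals present on every platform.
    """
    binned = {name: bin_platform(p, interval_size) for name, p in platforms.items()}
    if not binned:
        return {}
    common = set.intersection(*(set(b) for b in binned.values()))
    return {key: {name: binned[name][key] for name in binned} for key in sorted(common)}
```

Restricting the result to shared intervals keeps every merged row complete; a union with missing values would be an equally reasonable choice when platform coverage differs widely.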

2.3.4 Format requirements of input data 

Standard tab-delimited text files are used for the input of data for all of the assay types. For 

genomic data, specifically array CGH, normalization is recommended using external algorithms 

such as CGH-Norm and MANOR [19, 20]. Segmentation analysis can be performed within

SIGMA2, and results from external analysis can also be imported and used in the consensus calling

feature. The algorithms that can be called within SIGMA2 currently include DNACopy and

GLAD [21, 22]. Batch importing of multiple samples is available to facilitate efficient loading of

datasets. To use it, the user must create an information file that describes each sample

in the dataset. Formatting requirements of the information file are specified in the manual.

For Affymetrix SNP array analysis, data should likewise be pre-processed and

normalized using appropriate software, such as CNAG [23], before importing into SIGMA2.

Genotype calls should be made prior to importing, using the "AA", "AB" and "BB" convention.

If a genotype call does not exist, "NC" must be specified. For epigenomic data, data from

affinity-based approaches (MeDIP [6] and ChIP [24]) should contain a value representing the

level of enrichment and the genomic coordinates for each spot. Similarly, for bisulphite-based

approaches [25], the percentage of converted CpGs should be provided along with the genomic

coordinates for each spot. Finally, for transcriptome data, gene expression data from Affymetrix 

experiments can be directly imported and processed as CEL files and are normalized using the 

MAS 5.0 algorithm implemented in the "affy" package of R. For any assay type, custom data 

can be imported whereby the user provides a map of the platform based on the given genome 

build, and the unique identifier for the map must be used for the data generated from those 

experiments. 

2.3.5 Description of user interface 

The main user interface in SIGMA2 uses a tabbed window pane which allows the user to

open multiple visualizations simultaneously (Figure 2.4). The left part of the window manages 

the analyses and projects which belong to the current user and button shortcuts for the main 

functionality are spread along the top of the window. Using an example of an array CGH profile 

from the Agilent 244K platform, we demonstrate the step-wise interrogation of a region of 

interest [26]. Briefly, using the highlighting toolbar button, the user can select a region of 

interest and subsequently, by clicking the right mouse button, the user can search for annotated 

genes within the specified genomic coordinates. 


2.3.6 Analysis of data from a single assay type 

The first, and most basic, level of analysis is from a single assay type. For array CGH, multiple 

options for segmentation algorithms are available within the program and results from externally 

run segmentation can be imported as well. However, each segmentation algorithm has its 

advantages and disadvantages depending on the type of data used and the quality of data at 

hand. A unique feature of SIGMA 2 is the ability to take a consensus of multiple algorithms using 

"And" or "Or" logic between algorithms. Moreover, a level of consensus can be specified 

(Figure 2.5a). For example, if an experiment is analyzed using five approaches, the user can 

select areas of gain and loss which were detected by one algorithm, at least three algorithms, 

all five algorithms, etc. For LOH, status is determined by a basic analysis of the number of consecutive markers that

exhibit LOH. Data from affinity-based approaches for DNA methylation and

histone modification states, or bead-based percentages of CpG island methylation, are analyzed by

either direct thresholding or z-transform thresholding. For any of the different assay types, 

when examining across a number of samples, a frequency of alteration can be calculated and 

plotted. 
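
The consensus logic described above can be sketched as a per-interval vote. This is a hedged illustration rather than SIGMA2's implementation: it assumes each algorithm reports one call per interval, encoded as +1 (gain), 0 (no change) or -1 (loss), and that an interval with equal numbers of gain and loss calls is left uncalled. The function name and encoding are hypothetical. Under these assumptions, "Or" logic corresponds to `min_agree=1` and "And" logic to `min_agree` equal to the number of algorithms.

```python
def consensus_calls(calls_by_algorithm, min_agree):
    """Combine per-interval calls from several segmentation algorithms.

    calls_by_algorithm: {algorithm_name: [call per interval]}, with calls
        encoded as +1 (gain), 0 (no change) or -1 (loss).
    min_agree: number of algorithms that must agree on a gain or a loss.
    Returns one consensus call per interval.
    """
    call_lists = list(calls_by_algorithm.values())
    if not call_lists:
        return []
    if any(len(calls) != len(call_lists[0]) for calls in call_lists):
        raise ValueError("all algorithms must report the same number of intervals")

    consensus = []
    for column in zip(*call_lists):
        gains = sum(1 for call in column if call > 0)
        losses = sum(1 for call in column if call < 0)
        if gains >= min_agree and gains > losses:
            consensus.append(1)
        elif losses >= min_agree and losses > gains:
            consensus.append(-1)
        else:
            # Too few algorithms agree, or gains and losses are tied (assumption).
            consensus.append(0)
    return consensus
```

With three algorithms, for example, `min_agree=2` keeps only the alterations that a majority of the algorithms detect.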

For data from different array platforms, but assaying the same biological measurement, the 

algorithm for integration is used to derive common data. This feature is most applicable to DNA 

copy number data due to the number of array CGH platforms. This allows for better utilization

of publicly available data and thus increases the sample size available for statistical analysis. As with the

multiple sample analysis of data on the same platform, a frequency of altered states can be

generated and plotted. Figure 2.5b shows the concerted analysis of a sample profiled on the

Affymetrix 500K SNP array, Agilent 244K CGH array and the whole genome tiling path BAC

array.
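
The frequency of altered states across samples can be illustrated with a short sketch. It assumes, hypothetically, that every sample has already been reduced to one call per common interval, encoded as +1 (gain), 0 (no change) or -1 (loss); the function name is an assumption, and plotting is left out.

```python
def alteration_frequency(sample_calls):
    """Compute per-interval gain and loss frequencies across samples.

    sample_calls: list of per-sample call lists, all of the same length,
        with calls encoded as +1 (gain), 0 (no change) or -1 (loss).
    Returns (gain_frequency, loss_frequency), each a list of fractions.
    """
    n_samples = len(sample_calls)
    if n_samples == 0:
        raise ValueError("at least one sample is required")
    if any(len(calls) != len(sample_calls[0]) for calls in sample_calls):
        raise ValueError("all samples must cover the same intervals")

    gain_frequency = []
    loss_frequency = []
    for column in zip(*sample_calls):
        gain_frequency.append(sum(1 for call in column if call > 0) / n_samples)
        loss_frequency.append(sum(1 for call in column if call < 0) / n_samples)
    return gain_frequency, loss_frequency
```

Gains and losses are tallied separately so that a region gained in some samples and lost in others is not averaged out to zero.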


2.3.7 Analysis of data from multiple assays in a given 'omics dimension 

Within a given 'omics dimension, multiple assay types can be analyzed in combination. For 

example, it is useful to investigate copy number and LOH and the interplay between DNA 

methylation and different states of histone modification. Typically, in regions of copy number 

loss, LOH is also observed. However, LOH can also occur in regions which are copy number 

neutral, indicating a change in allelic status which is not interpretable by one dimension alone. 

Here, for a sample with both copy number and LOH information, we show a region of copy 

number loss associated with LOH (Figure 2.6). In terms of epigenetics, DNA methylation and 

states of histone methylation and acetylation are known to be biologically relevant. With 

high throughput technologies available to assay these dimensions, this type of analysis will 

become more prevalent. 
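The interplay between copy number and LOH described above amounts to a small decision table. The following sketch is illustrative only: the function name `classify_region`, the category labels, and the encoding of copy number calls as +1, 0 and -1 are assumptions for the example.

```python
def classify_region(copy_number_call, has_loh):
    """Combine a copy number call (+1 gain, 0 neutral, -1 loss) with the
    LOH status of the same region."""
    if has_loh and copy_number_call == 0:
        # Allelic change that copy number alone cannot reveal
        return "copy-neutral LOH"
    if has_loh:
        return "copy number change with LOH"
    if copy_number_call != 0:
        return "copy number change only"
    return "no change"
```

A region of copy number loss with LOH, as in Figure 2.6, would be reported as "copy number change with LOH", whereas a region with LOH but no copy number change would be flagged as "copy-neutral LOH".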

2.3.8 Combinatorial analysis of multiple 'omics dimensions - gene dosage and gene 

expression 

The most common analysis of multiple 'omics dimensions is the influence of the genome on the 

transcriptome. A number of software packages have started to incorporate approaches to 

examining gene dosage and gene expression [8, 9, 27]. In SIGMA2, there are multiple 

functionalities which allow the user to link DNA copy number to gene expression. For a single 

group of samples, with matching DNA copy number and gene expression profiles, the user can 

determine associations through two main options: a) using a correlation-based approach, 

correlating the log ratios with the normalized gene expression intensities and b) using a 

statistical-based approach comparing the expression in samples with copy number changes 

against those without copy number change utilizing the Mann Whitney U test, analogous to 

approaches taken in previous studies [27]. Spearman, Kendall or Pearson correlation 

coefficients can be calculated for option a). Similarly, this functionality is also available for 

correlating epigenetic profiles and gene expression. 
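The two options above can be illustrated with plain Python. This is a hedged sketch rather than the code used in SIGMA2 (which performs its statistical analysis in R): the helper names are hypothetical, only the Spearman coefficient is shown for option a), and for option b) only the Mann Whitney U statistic is computed, without the p-value.

```python
from math import sqrt


def ranks(values):
    """Average ranks (1-based); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(log_ratios, expression):
    """Option a): rank correlation of copy number log ratios with expression."""
    return pearson(ranks(log_ratios), ranks(expression))


def mann_whitney_u(altered, unaltered):
    """Option b): U statistic for the altered group, i.e. the number of
    (altered, unaltered) pairs in which the altered sample has the higher
    expression, counting ties as one half."""
    u = 0.0
    for a in altered:
        for b in unaltered:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u
```

A U statistic close to the product of the two group sizes indicates that samples with the copy number change almost always show higher expression than samples without it.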

In addition to single group analysis, two-dimensional genome/transcriptome analysis can be 

applied to two-group comparison analysis. For example, if patterns of copy number alterations 

are compared between two groups and a particular region is more frequently gained in one 

group than another, the expression data can subsequently be compared between the groups of 

samples to determine if there is an association between gene dosage and gene expression. 

That is, we would expect the group with more frequent copy number gain to have higher 

expression than the other group. Notably, this functionality does not require both copy number 

and expression data to exist for the same sample, but allows the user to select an independent 

dataset for expression data comparisons (Figure 2.7). 

2.3.9 Group comparison analysis - single ‘omics dimension 

Finally, for two groups of samples, the user can compare the distribution of changes between 

two groups to determine if the patterns are statistically different using a Fisher's Exact test. For 

DNA copy number, it is the distribution of gain and losses; for DNA methylation or histone 

modification states, the proportion of samples that meet the threshold of enrichment for each 

group (low or high); and for LOH, proportion of samples with LOH for a region for each group. 
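For a single region, the group comparison reduces to a 2x2 table (group by altered or not altered) tested with Fisher's exact test. The sketch below computes a two-sided p-value directly from the hypergeometric distribution. It is an illustration under stated assumptions: the function name is hypothetical, the two-sided rule used here (summing all tables no more probable than the observed one) is one common convention, and the counts in the usage note are invented for the example.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]],
    e.g. rows = group 1 / group 2, columns = altered / not altered."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x altered samples in group 1
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables at most as likely as the observed
    # one; the small tolerance guards against floating point ties.
    total = sum(prob(x) for x in range(lowest, highest + 1)
                if prob(x) <= p_observed * (1 + 1e-9))
    return min(1.0, total)
```

With hypothetical counts of 3 of 4 samples gained in one group and 1 of 4 in the other, `fisher_exact_two_sided(3, 1, 1, 3)` gives 34/70, so such a small dataset would not support a difference between the groups.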

2.3.10 Group comparison analysis - integrating multiple 'omics dimensions 

This type of analysis can be performed with a single sample or multiple samples, thus allowing 

combinatorial (“and”) analysis for large datasets. In addition, the user can also identify "or" 

events, where a change in any of the dimensions can be flagged. This is more important in 

multi-sample datasets as one dimension may not capture complex alterations of a particular 

region. 
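The "and" and "or" logic above maps naturally onto set intersection and union. The following is a minimal sketch, assuming each dimension has already been reduced to a set of altered gene symbols; the function names and the placeholder symbols `GENE_A` and `GENE_B` are hypothetical.

```python
def and_events(altered_by_dimension):
    """Genes altered in every dimension (conjunctive, "And")."""
    gene_sets = list(altered_by_dimension.values())
    if not gene_sets:
        return []
    return sorted(set.intersection(*gene_sets))


def or_events(altered_by_dimension):
    """Genes altered in at least one dimension (disjunctive, "Or")."""
    return sorted(set().union(*altered_by_dimension.values()))


# Hypothetical input: one set of altered genes per DNA-based dimension
altered = {
    "copy_number": {"ERBB2", "GENE_A"},
    "loh": {"ERBB2"},
    "methylation": {"ERBB2", "GENE_B"},
}
```

Here `and_events(altered)` returns only `ERBB2`, while `or_events(altered)` returns all three symbols, which reflects why "or" events matter when no single dimension captures every alteration of a region.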

2.3.11 Multi-dimensional analysis of a breast cancer genome 

Using the breast cancer cell line HCC2218, we show the integration of genomic, epigenomic, 

and transcriptomic data. Interestingly, when we examine the ERBB2 gene on chromosome 17, 

we show concurrent amplification, LOH, loss of methylation and drastic increase in gene 

expression (Figure 2.8). ERBB2 has been shown to be an important gene in breast cancer 

development and therapeutic intervention. This demonstrates the value in integrating multiple 

dimensions to understand complex alteration patterns in disease samples where multiple 

causes can lead to a single effect. 

2.3.12 Exporting data and results 

High resolution images can be exported for all types of visualizations in SIGMA2. Histogram 

plots of gene expression, heatmaps with clustering of gene expression, karyogram plots and 

frequency histogram plots are the main types of visualization available. Frequency histogram 

data which is used to generate the plots can also be exported. Integrated plots with data plotted 

serially or overlaid are also available for analysis involving multiple genomic and epigenomic 

dimensions. Genes which are obtained from the conjunctive (And) and disjunctive (Or) multi- 

dimensional analysis can be exported with their status. Results of statistical analysis such as 

Fisher's exact comparisons and U-test comparisons of gene expression can be exported 

against annotated gene lists based on user-specified human genome builds. Currently, April 

2003 (hg15), May 2004 (hg17) and March 2006 (hg18) are the available genome builds [18]. 

As new builds are released, support for those builds will be available. Finally, data from multi- 

platform integration can be exported based on base pair position for additional external 

analysis if necessary. 

2.4 Conclusions 

With the increase in high-throughput data covering multiple dimensions of the genome, 

epigenome and transcriptome, analysis approaches and tools must advance accordingly 

to handle, analyze and interpret these data in an integrated manner. SIGMA2 meets 

these requirements and provides the framework for the incorporation of data from future 

approaches and technologies. Specifically, with the movement from array to sequence-based 

technologies, the ability to assimilate sequence data with the various 'omics data sets will 

become a future requirement of software packages. 

2.5 Availability and requirements 

Project name: SIGMA2 

Operating system(s): Java SE V.1.6+, R Project V.2.5+, Windows XP or Vista 

License: Free for academic and research use; commercial users please contact 

Figure 2.1 

R 

-Segmentation 

-Statistical analysis 

RMySQL 

MySQL 

-Data storage 

-Querying 

SIGMA 2 

JGR / JRI 

JDBC 

Java 

-User interface 

-Visualization 

Link to 

external 

resources 

Biological Databases 

• PubMed 

• OMIM 

• NCBI Gene 

• UCSC Genome Browser 

• GEO Profiles 

• Database of Genomic Variants 

Figure 2.1. Main structural components of SIGMA2. Data and genome mapping information 

is stored in the MySQL database. Segmentation analysis using DNACopy and 

GLAD and statistical analysis is performed using R, with results stored in database. Java 

was used to program the application, specifically for the user interface and the different 

types of visualization. Base-pair positions and gene annotations are linked to other biological 

databases to facilitate further interrogation by the user.

Figure 2.2 

a 

Omics: Genome; Epigenome; Transcriptome 

Assay: DNA Copy Number; Allelic Imbalance (LOH); DNA Methylation; Histone modification; Gene & MicroRNA Expression 

Platform: BAC array CGH; Oligo array CGH; SNP arrays; Microsatellite markers; MeDIP-array CGH; Bisulphite-based methods; ChIP-on-chip; SAGE; Microarrays 

b 

Single sample: Single Platform / Single Assay: A,B,C,Q,R,S; Single 'omics (Multiple assays): D,E,H; Combinatorial Integration (Multiple 'omics): F,G,O,P 

Multiple samples (one group): Single Platform / Single Assay: A,B,C,L,Q,R,S; Single 'omics (Multiple assays): D,E,H; Combinatorial Integration (Multiple 'omics): F,G,I,J,K,O,P 

Multiple samples (two groups): Single Platform / Single Assay: A,B,C,L,M,Q,R,S; Single 'omics (Multiple assays): D,E,H; Combinatorial Integration (Multiple 'omics): F,G,I,J,K,N,O,P 

A Segmentation analysis for array CGH to identify regions of gain and loss 

B Moving average thresholding for affinity based approaches (MeDIP for DNA methylation, ChIP-on-chip for histone modification states) 

C Regions of loss of heterozygosity (LOH) 

D Regions of copy number change and LOH 

E Regions of copy number neutrality and LOH (e.g. UPD) 

F Regions of copy number AND methylation alteration ("two" hit) 

G Regions of copy number OR methylation alteration (compensatory change with same net effect) 

H Epigenetic interplay between DNA methylation and various modification states of histones 

I Correlation of copy number and gene expression (dataset with matched copy number and expression profiles) 

J Statistical comparison of samples with copy number change versus without copy number change (dataset with matched copy number and expression profiles) using Mann Whitney U-test 

K Correlation of DNA methylation and gene expression (dataset with matched DNA methylation and expression profiles) 

L Identify recurrent changes (copy number alterations, common enrichment patterns [MeDIP, ChIP], regions of LOH) 

M Statistical comparison of patterns of recurrent changes between two groups using Fisher's exact test 

N Two-dimensional two-group comparisons (statistical comparison of expression profiles of genes in regions of difference identified by Fisher's exact comparison) 

O Identify "And" events between three or more DNA-based dimensions (copy number, LOH, DNA methylation, histone …) 

P Identify "Or" events between three or more DNA-based dimensions (copy number, LOH, DNA methylation, histone …) 

Q Cancer gene discovery 

R Lists of genes for systems/function/pathway analysis 

S Linking to public biological databases (PubMed, NCBI Gene, OMIM, NCBI GEO Profiles, UCSC Genome Browser, Database of Genomic Variants) 

Figure 2.2. Data structure hierarchy. (a) Data hierarchy describing the relationship 

between platforms, assays and 'omics disciplines. (b) Functionality map of SIGMA2. List of 

the various functions and the output from that function that can be performed given the 

number of samples or sample groups and dimensions. Multiple sample analysis (single 

group and two group) are microarray platform independent. Functions listed in boxes are in 

addition to those listed in the box preceding the arrows.

Figure 2.3 

numSamples

Figure 2.4 

Panels: a, b, c, d, e 

Search for genes, link to databases 

Figure 2.4. SIGMA2 interface. Description of the SIGMA2 user interface using a single sample visualization as an 

example. (a) Customizable toolbar with shortcut buttons, (b) Project/Analysis tree to track work within and between 

sessions, (c) Main display area using tab-based navigation, (d) Information console and (e) Genome features tracks. Here, a 

copy number change is displayed in the context of CpG islands (red), microRNAs (orange) and regions annotated in the 

database of genomic variants (blue).

Figure 2.5. Consensus calling and heterogeneous array analysis. (a) Consensus calling 

using multiple algorithms. Multiple algorithms (and different parameters) can be selected to 

analyze a given array CGH sample and this can be defined for each array platform 

independently, as each platform may exhibit different noise and ratio response 

characteristics. (b) Heterogeneous array analysis using data from multiple array CGH 

platforms. Samples from the Agilent 244K, Affymetrix SNP 500K and whole genome BAC arrays 

were segmented to define areas of gain and loss. Subsequently, the results were aggregated 

into a frequency histogram plot showing the common areas of gain and loss across the three 

samples. 

Figure 2.5 

a 

b 

Affymetrix SNP 500K; Agilent 244K CGH; BCCA WGTP 32K

Figure 2.6 

HCC2218 HCC2218 HCC2218BL 

Copy Number 

Figure 2.6. Integrative genetic analysis of HCC2218. Parallel visualization and analysis 

of the copy number and genotype profiles of the breast cancer cell line HCC2218. Genotype 

profile of the matching normal blood lymphoblast line (HCC2218BL) is also provided to 

define regions of LOH. DNA copy number profile was generated on the BCCA whole 

genome tiling path BAC array and genotype profiles are from the Affymetrix SNP 10K array 

(Zhao, 2004). This region of chromosome arm 3q has a defined segmental copy 

number loss and the boundary of the change is evident from the LOH profile. In the genotype 

profile, the horizontal blue lines indicate a SNP transition from heterozygous in normal 

to homozygous in the tumor, indicating LOH. 

LOH

Figure 2.7 

a b 

NSCLC SCLC 

c 

Figure 2.7. Two-group two dimensional comparison of 37 NSCLC and 16 SCLC 

cancer cell lines. First, segmentation analysis is performed to delineate gains and losses 

in each sample. Next, a statistical comparison of the distribution of gains and losses 

between the two groups is done using the Fisher’s exact test. (a) Using the interactive 

search, one of the regions of difference identified is on chromosome 7, with a NSCLC and 

SCLC sample aligned next to each other. The NSCLC has a clear segmental gain of that 

region, with the SCLC not having the gain. The right-most graph is a frequency plot summary 

of two sample sets (NSCLC and SCLC). NSCLC is color-coded in red while SCLC in 

green, and the overlap appears in yellow. The frequency of chromosome arm 7p gain is 

higher in the red group. (b) A heatmap is shown representing 15 NSCLC and 15 SCLC gene 

expression profiles, of the specific genes in the region highlighted in yellow. (c) When 

examining gene expression data of EGFR specifically, a gene in this region, we can see that 

the expression is drastically higher in NSCLC vs. SCLC, as predicted by the higher 

frequency of gain in NSCLC vs. SCLC of that region. Gene expression data are represented 

as log2 of the normalized intensities. 

Figure 2.8. Multi-dimensional perspective of chromosome 17 of the HCC2218 breast 

cancer cell line. Copy number, LOH, and DNA methylation profiling identifies an 

amplification of ERBB2 coinciding with allelic imbalance and loss of methylation. When 

examining the gene expression, the expression of HCC2218 is significantly higher than a panel 

of normal luminal and myoepithelial cell lines [28]. 

Figure 2.8 

DNA Copy Number Allelic imbalance (LOH) DNA Methylation 

HCC2218 Luminal Myoepithelial

Table 2.1. Features required for integrative analysis 

Software packages compared (columns): Nexus CGH; CGH Fusion; ISA-CGH; VAMP; *CGH Analytics; MD-SeeGH; SIGMA; SIGMA2 

Features required for integrative analysis (rows): 

• Built-in segmentation for array CGH 

• Consensus calling using multiple segmentation algorithms 

• Array platform-independent combined CGH analysis 

• Custom microarray data handling 

• Basic copy number and expression integration 

• Alignment and analysis of genetic and epigenetic data 

• Multi-dimensional visualization of genetic, epigenetic and gene expression data 

• Two group statistical comparison 

• Two group combinatorial gene dosage and gene expression comparison 

• Linking to external biological databases 

• Linking to external gene expression (GEOProfiles) 

• Context-based visualization of genome features 

• Conversion of data between different genome assemblies 

• Free for academic/research use 

Table 2.2. Summary of Input, analysis, output for each dimension 

'Omics 

classification 

Assay(s) 

measured 

Genomics Copy number Array CGH 

Input Functionality*** Output 

Segmentation 

Direct thresholding 

Moving average-based thresholding 

Z-transformation of moving average 

Whole genome visualization 

Regions of gain and loss 

Gene lists for further 

analysis 

High-resolution karyogram 

images 

Frequency histograms 

Genomics LOH SNPs* LOH based on consecutive altered markers Regions of LOH 

Genomics LOH 

Microsatellite 

markers 

Same as above Same as above 

Genomics 

Epigenomics 

Epigenomics 

Epigenomics 

Epigenomics 

Copy number, 

LOH 

DNA 

methylation 

DNA 

methylation 

Histone 

modification 

states 

DNA 

methylation, 

Histone 

modification 

states 

Transcriptomics Gene 

expression** 

Transcriptomics Gene 

expression** 

Genomics, 

Transcriptomics 

Genomics, 

Epigenomics 

Genomics, 

Epigenomics 

Genomics, 

Epigenomics, 

Transcriptomics 

Copy number, 

Gene 

expression 

Copy number, 

DNA 

methylation 

LOH, DNA 

methylation 

Copy number, 

LOH, DNA 

methylation, 

Histone 

modification 

Gene Expression 

MeDIP + 

array CGH 

Bisulphite-based 

ChIP-on-chip 

Microarrays 

SAGE 

Identify regions of uniparental disomy 

(UPD): LOH with no copy number change 




Visualization against genome position 

Thresholding of proportion of methylated 

CpG’s 




Epigenetic interplay 

Heatmap visualization, clustering 

Histograms 

Statistical comparisons 

Heatmap visualization, clustering 

Histograms 

Statistical comparisons 

Correlation analysis of copy number and 

expression 

Statistical comparison of expression in 

regions of copy number difference (two 

group analysis) 

Identify regions of concerted change in 

BOTH copy number and methylation ("two-hit") 

Identify regions with change in copy number 

OR DNA methylation 

Identify allele-specific methylation events 

Identify co-ordinate genetic, epigenetic and 

gene expression changes 

Regions of enrichment and 

lack of methylation 


analysis 

Regions of enrichment and 

lack of enrichment 


analysis 

Regions of mutually 

exclusive change between 

chromatin state and DNA 

methylation 

Expression of genes of 

interest based on DNA 

analysis 

Expression of genes of 

interest based on DNA 

analysis 

Genes whose expression 

is strongly regulated by copy 

number 

p-values for associations 

p-values for group 

comparison 

Regions of allele specific 

aberrant methylation 

Genes altered at multiple 

levels


1. Garnis C, Buys TP, Lam WL: Genetic alteration and gene expression modulation 

during cancer progression. Mol Cancer 2004, 3:9. 

2. Ishkanian AS, Malloff CA, Watson SK, DeLeeuw RJ, Chi B, Coe BP, Snijders A, 

Albertson DG, Pinkel D, Marra MA et al: A tiling resolution DNA microarray with 

complete coverage of the human genome. Nat Genet 2004, 36(3):299-303. 

3. Khulan B, Thompson RF, Ye K, Fazzari MJ, Suzuki M, Stasiek E, Figueroa ME, Glass 

JL, Chen Q, Montagna C et al: Comparative isoschizomer profiling of cytosine 

methylation: the HELP assay. Genome Res 2006, 16(8):1046-1055. 



J Hum Genet 2006, 14(2):139-148. 

5. Rauch T, Li H, Wu X, Pfeifer GP: MIRA-assisted microarray analysis, a new 

technology for the determination of DNA methylation patterns, identifies frequent 

methylation of homeodomain-containing genes in lung cancer cells. Cancer Res 

2006, 66(16):7939-7947. 




37(8):853-862. 




8. Conde L, Montaner D, Burguet-Castell J, Tarraga J, Medina I, Al-Shahrour F, Dopazo J: 

ISACGH: a web-based environment for the analysis of Array CGH and gene 

expression which includes functional profiling. Nucleic Acids Res 2007, 35(Web 

Server issue):W81-85. 

9. La Rosa P, Viara E, Hupe P, Pierron G, Liva S, Neuvial P, Brito I, Lair S, Servant N, 

Robine N et al: VAMP: visualization and analysis of array-CGH, transcriptome and 

other molecular profiles. Bioinformatics 2006, 22(17):2066-2073. 

10. Chi B, deLeeuw RJ, Coe BP, Ng RT, MacAulay C, Lam WL: MD-SeeGH: a platform for 

integrative analysis of multi-dimensional genomic data. BMC Bioinformatics 2008, 

9:243. 

11. Carrasco DR, Tonon G, Huang Y, Zhang Y, Sinha R, Feng B, Stewart JP, Zhan F, 

Khatry D, Protopopova M et al: High-resolution genomic profiles define distinct 

clinico-pathogenetic subgroups of multiple myeloma patients. Cancer Cell 2006, 

9(4):313-325. 

12. Chin K, DeVries S, Fridlyand J, Spellman PT, Roydasgupta R, Kuo WL, Lapuk A, Neve 

RM, Qian Z, Ryder T et al: Genomic and transcriptional aberrations linked to breast 

cancer pathophysiologies. Cancer Cell 2006, 10(6):529-541. 

13. Coe BP, Lockwood WW, Girard L, Chari R, Macaulay C, Lam S, Gazdar AF, Minna JD, 

Lam WL: Differential disruption of cell cycle pathways in small cell and non-small 

cell lung cancer. Br J Cancer 2006, 94(12):1927-1935. 



lung and other cancers. Oncogene 2008. 

15. Neve RM, Chin K, Fridlyand J, Yeh J, Baehner FL, Fevr T, Clark L, Bayani N, Coppe JP, 

Tong F et al: A collection of breast cancer cell lines for the study of functionally 

distinct cancer subtypes. Cancer Cell 2006, 10(6):515-527. 

16. Stransky N, Vallot C, Reyal F, Bernard-Pierrot I, de Medina SG, Segraves R, de Rycke 

Y, Elvin P, Cassidy A, Spraggon C et al: Regional copy number-independent 

deregulation of transcription in cancer. Nat Genet 2006, 38(12):1386-1396. 

17. Sanders MA, Verhaak RG, Geertsma-Kleinekoort WM, Abbas S, Horsman S, van der 

Spek PJ, Lowenberg B, Valk PJ: SNPExpress: integrated visualization of genome-wide 

genotypes, copy numbers and gene expression levels. BMC Genomics 2008, 

9:41. 

18. Karolchik D, Kuhn RM, Baertsch R, Barber GP, Clawson H, Diekhans M, Giardine B, 

Harte RA, Hinrichs AS, Hsu F et al: The UCSC Genome Browser Database: 2008 

update. Nucleic Acids Res 2008, 36(Database issue):D773-779. 

19. Khojasteh M, Lam WL, Ward RK, MacAulay C: A stepwise framework for the 

normalization of array CGH data. BMC Bioinformatics 2005, 6:274. 

20. Neuvial P, Hupe P, Brito I, Liva S, Manie E, Brennetot C, Radvanyi F, Aurias A, Barillot 

E: Spatial normalization of array-CGH data. BMC Bioinformatics 2006, 7:264. 

21. Hupe P, Stransky N, Thiery JP, Radvanyi F, Barillot E: Analysis of array CGH data: 

from signal ratio to gain and loss of DNA regions. Bioinformatics 2004, 20(18):3413- 

3422. 



23. Nannya Y, Sanada M, Nakazaki K, Hosoya N, Wang L, Hangaishi A, Kurokawa M, 

Chiba S, Bailey DK, Kennedy GC et al: A robust algorithm for copy number 

detection using high-density oligonucleotide single nucleotide polymorphism 

genotyping arrays. Cancer Res 2005, 65(14):6071-6079. 

24. Ballestar E, Paz MF, Valle L, Wei S, Fraga MF, Espada J, Cigudosa JC, Huang TH, 

Esteller M: Methyl-CpG binding proteins identify novel sites of epigenetic 

inactivation in human cancer. EMBO J 2003, 22(23):6335-6345. 






27. van Wieringen WN, Belien JA, Vosse SJ, Achame EM, Ylstra B: ACE-it: a tool for 

genome-wide integration of gene dosage and RNA expression data. Bioinformatics 

2006, 22(15):1919-1920. 

28. Grigoriadis A, Mackay A, Reis-Filho JS, Steele D, Iseli C, Stevenson BJ, Jongeneel CV, 

Valgeirsson H, Fenwick K, Iravani M et al: Establishment of the epithelial-specific 

transcriptome of normal and malignant human breast cells based on MPSS and 

array expression data. Breast Cancer Res 2006, 8(5):R56. 

Chapter 3: An integrative multi-dimensional genetic and 

epigenetic strategy to identify aberrant genes and pathways 

in cancer 2 

2 A version of this chapter has been published. Chari R, Coe BP, Vucic EA, Lockwood WW, 

Lam WL. (2010) An integrative multi-dimensional genetic and epigenetic strategy to identify 

aberrant genes and pathways in cancer. BMC Systems Biology, 4(1):67, 1-14. Please see the 

published version of this chapter for all supplementary materials. 

3.1 Background 

Genomic analyses have substantially improved our knowledge of cancer. Gene expression 

profiling, for example, is utilized to delineate subtypes of breast cancer, and has facilitated the 

derivation of predictive and prognostic signatures [1-5]. However, not all of the gene expression 

changes observed are causal to cancer development, and global gene expression analysis 

alone cannot distinguish between causal and reactive changes. Corresponding alteration at the 

DNA level is regarded as evidence of causality; for example, gene deletion or gene silencing by 

methylation. Hence, examining genetic and epigenetic events in conjunction with the changes 

in gene expression pattern should improve the identification of causal changes that lead to 

disease phenotype. 

Analysis of gene copy number alone has correlated breast cancer genome features with poor 

prognosis based on the degree of genomic instability observed [6]. In terms of gene discovery, 

specific genomic regions containing important loci have been shown to be frequently gained or 

lost [7-11]. Integrative analyses of gene dosage and gene expression in breast cancer have 

revealed specific genes which are deregulated at the gene expression level as a result of 

changes in DNA copy number. From a global perspective, studies have shown a broad range 

in concordance between DNA amplification and overexpression of genes. This variability is 

attributable to the sensitivity of the methods used in detecting gene copy number and gene 

expression changes as well as the number of genes examined [12-15]. Conversely, when 

examining gene overexpression, it was found that only 10.5% of the overexpression could be 

attributable to gene amplification [14]. It is clear that altered gene expression can be attributed 

not only to disruption of regulatory/signaling cascades and downstream effects, but also to a 

multitude of causal genetic and epigenetic aberrations. 

We reason that by examining multiple genomic dimensions simultaneously, with a dimension 

representing a genome wide assay measuring DNA level alterations such as gene copy number 

or DNA methylation, we are likely to achieve the following: (i) explain a greater fraction of the 

observed gene expression deregulation as compared with explaining expression deregulation 

using only a single dimension, (ii) improve the discovery of critical oncogenes and tumor 

suppressor genes (TSGs) by focusing on those genes altered simultaneously at multiple 

genomic dimensions, and (iii) begin to understand the complex mechanisms of dysregulation of 

oncogenic pathways. In this study, we demonstrate the power of an integrative genomics 

approach by performing multi-dimensional analyses (MDA) of the genome, epigenome, and 

transcriptome of breast cancer cell lines. We illustrate and demonstrate the need for integrative 

analysis of multiple genomic dimensions by showing the co-operative contribution of DNA 

mechanisms to explaining differential gene expression. Using a strategy to identify genes 

exhibiting congruent alteration in copy number, DNA methylation, and allelic (or loss of 

heterozygosity, LOH) status, which we term multiple concerted disruption (MCD) analysis, we 

find genes representing key nodes in pathways as well as genes which exhibit prognostic 

significance. In examining the neuregulin pathway, we observe the variability among samples 

in the mechanism of dysregulation of this commonly altered breast cancer pathway, highlighting 

the importance of multi-dimensional correlative analysis of a given pathway in individual tumor 

samples -- in addition to the conventional approach of identifying loci simply based on frequency 

of disruption in a cohort. Finally, examining the subset of triple negative breast cancer (TNBC) 

cell lines, we show that COL1A1, a downstream target of FGFR2 (a recently implicated oncogene 

in TNBC), is frequently affected by MCD even though FGFR2 itself is rarely affected. 

Notably, this is the first such in-depth genomic, epigenomic, and transcriptomic analysis of 

breast cancer. 

3.2 Methods 

3.2.1 Data generation and acquisition 

Commonly used breast cancer (HCC38, HCC1008, HCC1143, HCC1395, HCC1599, HCC1937, 

HCC2218, BT474, MCF-7) and non-cancer (MCF10A) cell lines were selected for analyses 

(Additional File 1 or Appendix II). Copy number profiles were obtained from the SIGMA 

database [11, 16]. These profiles were generated using a whole genome tiling path microarray 

CGH platform [17, 18]. Expression profiles for BT474 and MCF-7 were obtained from the NCI 

Cancer Biomedical Informatics Grid (caBIG, https://cabig.nci.nih.gov), MCF10A profile from 

GEO (GSM254525), and the rest were generated using Affymetrix U133 Plus 2.0 platform at the 

McGill University and Genome Quebec Innovation Centre. Affymetrix 500K SNP array data 

were obtained from caBIG. DNA methylation profiles were generated using the Illumina 

Infinium methylation platform at the Genomics Lab, Wellcome Trust Centre for Human 

Genetics. A summary of the sources of all the data used is provided in Additional File 2 or 

Appendix III. Gene expression and methylation data generated were deposited in NCBI GEO 

(GSE17768 and GSE17769). 

3.2.2 Data processing and normalization 

Array CGH data were normalized using a stepwise normalization framework [19]. In addition, 

data were filtered based on a stringent standard deviation cut-off of 0.075 between replicate 

spots, with those exceeding this cut-off excluded from further analysis. To identify regions of 

gain and loss, smoothing and segmentation analysis was performed using aCGH-Smooth [20] 

as previously described [21]. Copy number status for clones removed by the filter above 

was inferred using neighboring clones within a 1 Mb window. 

Affymetrix SNP array data were normalized and genotyped using the "oligo" package in R, 

specifically using the crlmm algorithm for genotyping [22]. Genotype calls whose confidences 

were less than 0.95 were termed "No Call" (NC). Subsequently, genotype profiles were 

analyzed using dChip [23] and LOH was determined using a panel of 60 normal genotypes from 

the HapMap dataset [24] as provided by dChip, as matching blood lymphoblast profiles were 

not available. LOH ("L"), Retention ("R"), and No Call ("N") status was determined for every 

marker in each sample. Analysis parameters used were as specified in the dChip manual. 

Raw gene expression profiles from all ten cell lines were normalized using the "rma" package in 

R (Additional File 3). Gene expression data were further filtered using the Affymetrix MAS 5.0 

Call values ("P", "M", and "A"). Since differential expression was assessed by comparing one cancer 

line to one normal line, data were retained for analysis only if the two call values were not both "Absent". 

Methylation data were normalized and processed using Illumina BeadStudio software 

(http://www.illumina.com/software/genomestudio_software.ilmn, Illumina, Inc., San Diego, CA, 

USA). Beta-values and confidence p-values were retained for further analysis. Beta-values 

with associated confidence p-values > 0.05 were excluded. Data from all genomic dimensions 

were mapped to the hg18 (March 2006) genome assembly. 

3.2.3 Strategy for integrative analysis 

Copy number and LOH profiles were mapped to genes using the mapping of the Affymetrix 

U133 Plus 2.0 platform as well as the UCSC Genome Browser [25]. Methylation data were 

linked to the other three types of data using either the RefSeq gene symbol as specified by the 

Illumina mapping file (Illumina), or the RefSeq accession number. Differential expression was 

determined by subtracting the expression value in the non-malignant line MCF10A from the 

value in each cancer line. Since the obtained gene expression values after RMA normalization 

were represented in log2 space, a gene was considered differentially expressed if the absolute difference 

between the cancer line and MCF10A was greater than 1, which corresponded to a two-fold 

expression difference. DNA methylation status was determined by subtracting beta-values, with 

hypermethylation defined as a positive difference between tumor and normal (≥ 0.25) and 

hypomethylation defined as a negative difference between tumor and normal (≤ -0.25). Briefly, 

a beta value for a given CpG site ranges from 0 to 1 and represents the ratio of the methylated 

signal over the total signal (methylated plus unmethylated signal). These thresholds are 

comparable to those used in previous studies using an earlier Illumina methylation platform [26]. 

Using this mapping strategy, 12,910 unique genes were mapped across platforms 

corresponding to 24,708 of the ~27,000 Illumina Infinium probes and to 27,053 probes of the 

Affymetrix U133 Plus 2.0 platform. Visualization of multi-dimensional data was performed using 

the SIGMA2 software [27]. 
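The expression and methylation thresholds described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function names are hypothetical, and treating underexpression as a log2 difference below -1 is an assumed mirror image of the stated overexpression rule.

```python
def expression_status(log2_cancer, log2_reference, threshold=1.0):
    """Classify a gene from log2 (RMA) values in a cancer line and the reference line."""
    diff = log2_cancer - log2_reference
    if diff > threshold:       # more than a two-fold increase
        return "over"
    if diff < -threshold:      # assumption: symmetric rule for underexpression
        return "under"
    return "unchanged"


def methylation_status(beta_tumor, beta_normal, threshold=0.25):
    """Classify a CpG site from beta-values, each between 0 and 1."""
    delta = beta_tumor - beta_normal
    if delta >= threshold:
        return "hyper"
    if delta <= -threshold:
        return "hypo"
    return "unchanged"
```

Because expression values are in log2 space, a difference of 1 corresponds to a two-fold change, which is why a single subtraction suffices.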

To determine the genetic events that caused (or could explain) gene expression status, we first 

identified a set of overexpressed and underexpressed genes for each cell line sample relative to 

MCF10A based on differential expression criteria mentioned above. Each cancer sample may 

have a different number of differentially expressed genes. Second, for each differentially 

expressed gene in each sample, we examined the copy number status, methylation status, and 

allelic status. A differential expression was considered "explained" when the observed 

expression change matched the expected change at the DNA level. If a gene was 

overexpressed, the causal copy number status would be a gain, DNA methylation status would 

be hypomethylation, or allelic status would be allelic imbalance. Conversely, if a gene was 

underexpressed, the causal copy number status would be a loss, DNA methylation status would 

be hypermethylation, or allelic status would be LOH. From this point forward, when a change in 

allele status with overexpression is discussed, it will be denoted as allelic imbalance (AI). 

Conversely, for underexpression, a change in allele status will be denoted as loss of 

heterozygosity (LOH). While changes in methylation or changes in gene dosage leading to 

differential expression are more commonly discussed, previous studies have shown that 

changes in allele status without change in copy number (copy neutral AI or LOH) can also lead 

to differential gene expression due to preferential allelic expression [28-30]. 
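The matching rules above can be written as a small lookup. The sketch below is illustrative only; the dictionary keys and status labels are assumed names, not identifiers from the study.

```python
# DNA-level status expected to accompany each direction of expression change.
EXPECTED = {
    "over": {"copy_number": "gain", "methylation": "hypo", "allelic": "AI"},
    "under": {"copy_number": "loss", "methylation": "hyper", "allelic": "LOH"},
}


def explaining_dimensions(expression, dna_status):
    """Return the DNA dimensions whose status matches the expression change."""
    expected = EXPECTED.get(expression, {})
    return [dim for dim, state in expected.items() if dna_status.get(dim) == state]


def is_explained(expression, dna_status):
    """A differential expression is 'explained' if at least one dimension matches."""
    return bool(explaining_dimensions(expression, dna_status))
```

For instance, an underexpressed gene with copy number loss and LOH but unchanged methylation would be explained by two of the three dimensions.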

3.2.4 Multiple concerted disruption (MCD) analysis 

To determine what are likely key nodes in pathways and functions, we hypothesize that, in 

addition to being altered frequently (by one mechanism or multiple mechanisms), these genes 

also exhibit multiple concerted disruption (MCD) in a given sample: that is, a congruent

change in gene copy number (gain or loss) accompanied by allelic imbalance and change in 

DNA methylation (hypomethylation or hypermethylation) resulting in a change in gene 

expression (over or underexpression). Moreover, MCD events would be used as a screening

approach similar to that for gene amplifications (multi-copy increases) or homozygous deletions,

whereby the expectation is that these events would occur at a lower frequency than disruptions 

through one mechanism alone and observation of these events would signify importance to the 

genes in question. 

In this study, the MCD strategy can be broken down into four sequential steps. First, using a 

pre-defined frequency threshold, we identify a set of the most frequently differentially expressed 

genes. Second, we identify the most frequently differentially expressed genes from step 1 

whose expression change is frequently associated with concerted change in at least one DNA 

dimension (either DNA copy number, DNA methylation or allelic status) within the same sample. 

Next, we further refine this subset of genes from step 2 by selecting those having concerted 

change in all dimensions in the same sample which we term as MCD. Finally, we introduce an 

additional level of stringency by requiring a minimum frequency of MCD in the given cohort. At 

the end of the process, we identify a small subset of genes which exhibit disruption through 

multiple mechanisms and show consequential change in gene expression. 
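The four steps above can be sketched as a sequence of filters. This is a simplified illustration under stated assumptions: the data layout (`calls[gene][sample]` as a dictionary of status labels), the label names, and the three threshold parameters are all hypothetical, and the study's actual implementation may differ.

```python
EXPECTED = {
    "over": {"copy_number": "gain", "methylation": "hypo", "allelic": "AI"},
    "under": {"copy_number": "loss", "methylation": "hyper", "allelic": "LOH"},
}


def n_concordant(call):
    """Number of DNA dimensions concordant with the expression change in one sample."""
    expected = EXPECTED.get(call["expression"])
    if expected is None:
        return 0
    return sum(call.get(dim) == state for dim, state in expected.items())


def mcd_screen(calls, min_de, min_explained, min_mcd):
    """Return genes passing the four sequential MCD filters."""
    selected = []
    for gene, samples in calls.items():
        de = [c for c in samples.values() if c["expression"] in EXPECTED]
        if len(de) < min_de:                              # step 1: frequent differential expression
            continue
        counts = [n_concordant(c) for c in de]
        if sum(n >= 1 for n in counts) < min_explained:   # step 2: concerted change in >= 1 dimension
            continue
        mcd = sum(n == len(EXPECTED["over"]) for n in counts)
        if mcd == 0:                                      # step 3: MCD in at least one sample
            continue
        if mcd < min_mcd:                                 # step 4: minimum MCD frequency
            continue
        selected.append(gene)
    return selected
```

Each filter only ever narrows the list produced by the previous one, which mirrors the stepwise refinement described in the text.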

3.2.5 Simulated data analysis 

Using the status of DNA alteration and expression for every gene in every sample, data within 

each sample were shuffled and randomized ten times to create ten simulated datasets. Each 

dataset was analyzed for overall disruption frequency and MCD and all results were then 

aggregated to determine the frequency distribution of different thresholds observed in the 

randomized data analysis. 
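One way to generate such simulated datasets is sketched below. The text does not specify exactly how the shuffling was done, so this sketch makes an assumption: within each sample, the per-gene status calls of each dimension are permuted independently across genes. The function name, data layout and seed are hypothetical.

```python
import random


def simulate(calls_by_sample, n_datasets=10, seed=0):
    """Create shuffled copies of the data.

    calls_by_sample[sample][dimension] is a list of per-gene status calls.
    Assumption: each dimension is permuted independently within each sample.
    """
    rng = random.Random(seed)
    datasets = []
    for _ in range(n_datasets):
        shuffled = {}
        for sample, dims in calls_by_sample.items():
            # rng.sample returns a new permuted list and leaves the input untouched
            shuffled[sample] = {dim: rng.sample(values, len(values))
                                for dim, values in dims.items()}
        datasets.append(shuffled)
    return datasets
```

Shuffling within a sample preserves the number of gains, losses and expression changes in that sample while breaking their association with specific genes.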

3.2.6 Pathway enrichment analysis 

For pathway analysis, Ingenuity Pathway Analysis software was used (Ingenuity Systems, CA, 

USA). Specifically, the core and comparison analyses were used, with focus on canonical 

signaling pathways. Briefly, for a given function or pathway, statistical significance of pathway 

enrichment is calculated using a right-tailed Fisher's exact test based on the number of genes 

annotated, number of genes represented in the input dataset, and the total number of genes 

being assessed in the experiment. A pathway was deemed significant if the p-value of 

enrichment was ≤ 0.05 (adjusted for multiple comparisons using a Benjamini-Hochberg 

correction). 
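A right-tailed Fisher's exact test on a pathway overlap is equivalent to an upper-tail hypergeometric probability, and the Benjamini-Hochberg adjustment can be applied to the resulting p-values. The following standard-library sketch illustrates both calculations; it is not the Ingenuity implementation, and the function and argument names are assumptions.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """Right-tailed p-value for observing at least k pathway genes.

    k: input genes in the pathway, n: genes in the input list,
    K: genes annotated to the pathway, N: total genes assessed.
    """
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A pathway would then be called significant when its adjusted p-value is at most 0.05.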

3.2.7 Survival and differential gene expression analysis in publicly available datasets

For survival analysis, Kaplan-Meier analysis was performed using the statistical toolbox in 

Matlab (Mathworks). For each gene, the expression data were sorted from lowest to highest 

expression across the sample set and survival times were compared between the top 1/3 and 

bottom 1/3 of the samples. Two publicly available gene expression microarray datasets with 

survival data were utilized for this analysis [4, 31]. For the Sorlie et al dataset, individuals whose 

cause of death was not breast cancer were excluded from the analysis and missing data due to 

quality control issues were filled using the knn method in the “impute” package in Bioconductor 

[32]. Of the 23 genes selected by our MCD analysis (see Results), 17 were represented in 

either dataset. Survival distributions were compared using a log rank test and two-tailed p- 

values unadjusted for multiple comparisons were reported. 
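The tertile comparison can be illustrated with a plain-Python log rank test. The analysis itself was run in Matlab, so the sketch below is only an assumed equivalent: it uses the standard two-group log rank chi-square statistic with one degree of freedom, and all names are hypothetical.

```python
from math import erfc, sqrt


def tertile_groups(expression):
    """Indices of the bottom and top thirds of samples ranked by expression."""
    order = sorted(range(len(expression)), key=lambda i: expression[i])
    k = len(order) // 3
    if k == 0:
        return [], []
    return order[:k], order[-k:]


def logrank_p(times_a, events_a, times_b, events_b):
    """Two-sided log rank p-value; events are 1 (death) or 0 (censored)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    observed_a = expected_a = variance = 0.0
    for t in sorted({t for t, e, _ in data if e}):
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)   # at risk in group a
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)   # at risk in group b
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += n_a * n_b * d * (n - d) / (n * n * (n - 1))
    if variance == 0:
        return 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    return erfc(sqrt(chi2 / 2))   # chi-square survival function with 1 degree of freedom
```

For each gene, the survival times and event indicators of the two tertile groups would be passed to `logrank_p`.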

Subsequently, these 17 genes were further evaluated for differential expression in publicly 

available expression datasets of clinical breast cancer samples using the Oncomine database 

[33]. 

3.3 Results and discussion 

3.3.1 Analysis of individual genomic dimensions 

When examining each genomic dimension alone, we see that many of the common features 

identified are consistent with the current knowledge of breast cancer genomes, for example, 

previously reported chromosomal regions of frequent copy number gain, segmental loss and 

loss of heterozygosity (LOH) / allelic imbalance (AI) (Figure 3.1a) [6, 8, 11, 12, 34]. While 

many regions of frequent LOH/AI do overlap with regions of copy number change, others are in 

regions of neutral copy number. Key genes implicated in breast cancer reside in these specific 

regions and are altered expectedly (Figure 3.1b). 

3.3.2 Multi-dimensional analysis (MDA) reveals a higher proportion of intra-sample 

deregulated gene expression can be explained when more dimensions are analyzed 

The impact of integrative, multi-dimensional analysis on gene discovery is observed at two 

levels: (i) within an individual sample as well as (ii) across a set of samples. Within a given 

sample, we see that by sequentially examining more genomic dimensions at the DNA level, i.e. 

gene dosage, allelic status, and DNA methylation, we can explain a higher proportion of the 

differential gene expression changes observed. Interestingly, although this proportion may vary 

between samples, it always increases with every additional dimension examined (Figure 3.2a). 

For example, in HCC1395, a single genomic dimension alone can explain as much as 64.4% of 

overexpression but when using all three DNA based dimensions, whereby gene overexpression 

can be explained by disruption at the DNA level in at least one dimension, as much as 75.7% of 

aberrant overexpression can be explained. Similarly, in HCC1937, an increase from 56.9% to 

74.7% explainable underexpression is observed when moving from one to three genomic 

dimensions respectively. Conversely, in HCC2218, only 44% of overexpression and 36% of

underexpression can be explained even when using all three DNA dimensions. This

suggests that the majority of differential expression in sample HCC2218 is most likely a result of 

complex gene-gene trans-regulation and consequently, highlights the individual differences 

between samples. 
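The per-sample calculation can be sketched as the fraction of differentially expressed genes with a concordant change in any of the dimensions under consideration. The data layout and names below are assumptions made for illustration.

```python
def proportion_explained(genes, dimensions):
    """Fraction of differentially expressed genes explained by the given dimensions.

    genes: one set per differentially expressed gene, holding the DNA dimensions
           that are concordant with its expression change in this sample.
    dimensions: the DNA dimensions being considered.
    """
    if not genes:
        return 0.0
    wanted = set(dimensions)
    return sum(1 for concordant in genes if concordant & wanted) / len(genes)
```

Because a gene counts as explained when any considered dimension matches, adding a dimension can leave the proportion unchanged or raise it, but can never lower it.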

3.3.3 MDA reveals genes are disrupted at higher frequencies when examining multiple 

dimensions as compared to any single dimension alone 

Across a sample set, we see that analysis of multiple genomic dimensions

leads to the discovery of more disrupted genes than what would be detected using a single 

dimension of analysis alone. For each identified gene, we gain insight in how multiple 

mechanisms are complementary in gene disruption (Figure 3.2b). For example, the tumor 

suppressor gene caspase 1 (CASP1) has been thought to be deactivated through DNA 

hypermethylation in multiple cancer types [35, 36]. The gene is underexpressed in all nine 

cases examined in this study. In a subset of these cases, the observed underexpression can 

be attributed to copy number loss. Interestingly, in the remaining cases, DNA hypermethylation 

and copy neutral LOH are observed. Similarly, in another example, GNAS is differentially 

expressed in all nine cases, with a subset of cases showing concerted copy number change 

while the remaining cases reveal concerted change in DNA methylation. Notably, our 

conclusion is supported by recent studies of glioblastoma, that also showed higher than 

expected disruption frequencies of specific genes when multiple genomic dimensions were 

analyzed [37, 38]. These examples illustrate how deregulated genes can be detected in more 

cases when multiple, but complementary, approaches are used. 

Until very recently, multi-dimensional genomic analysis typically represented the parallel 

examination of gene dosage and gene expression. To demonstrate the power of examining 

multiple dimensions, we examine the frequency of gene expression deregulation explained by 

congruent alteration at the DNA level. Briefly, for each gene, a sample is determined to have a 

DNA explained gene expression change if any of the following criteria are met: gene

overexpression should be accompanied by either (i) copy number gain, (ii) copy neutral allelic

imbalance, or (iii) hypomethylation, and gene underexpression should be accompanied by

either (i) copy number loss, (ii) copy neutral LOH, or (iii) hypermethylation. 

To determine an appropriate frequency of disruption threshold, ten random, simulated datasets 

were generated and a distribution plot was generated for all of the observed frequencies from 

0/9 to 9/9 across all simulations (Figure 3.3a). The proportion of observed frequencies ≥ 5/9 

was 0.086 but for ≥ 6/9, the proportion was 0.020. Thus, since the 6/9 threshold was the first 

threshold ≤ 0.05, 6/9 was used for further analysis. Using this threshold, we found that 437 

differentially expressed genes have a corresponding change in gene dosage. Scaling this 

approach to examining the whole genome at multiple dimensions, we anticipate identifying more 

disrupted genes. When we added the remaining dimensions to account for differential 

expression, at the same frequency cut-off, we identified the mechanism of disruption for 1162 

deregulated genes (Figure 3.3b, Additional File 4). 
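The threshold selection described above amounts to finding the smallest frequency whose upper-tail proportion in the simulated data is at most 0.05. A minimal sketch follows; the function name is hypothetical, and the counts in the usage note are invented solely so that the tail proportions match the two values quoted in the text.

```python
def first_threshold(frequency_counts, alpha=0.05):
    """Return (k, tail) for the smallest k whose upper-tail proportion is <= alpha.

    frequency_counts[k] is the number of simulated genes disrupted in exactly
    k samples. Returns None if no threshold qualifies.
    """
    total = sum(frequency_counts)
    if total == 0:
        return None
    for k in range(len(frequency_counts)):
        tail = sum(frequency_counts[k:]) / total
        if tail <= alpha:
            return k, tail
    return None
```

With made-up counts `[300, 250, 200, 100, 64, 66, 15, 4, 1, 0]` for frequencies 0/9 to 9/9, the tail proportion is 0.086 at 5/9 and 0.020 at 6/9, so the function returns a threshold of 6.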

The impact of multi-dimensional integrative analysis on cancer gene discovery is the enhanced 

detection of genes which are disrupted by multiple mechanisms but at lower frequencies for 

individual mechanisms. Collectively, the detection of gene dosage, allelic conversion and 

change in methylation status enables the identification of such genes as frequently disrupted.

Using the list of 1162 genes, the distributions of alteration frequencies for each genomic 

dimension or combination of dimensions were assessed (Figure 3.4a). Examining the median 

frequencies in each box plot, there is a sequential increase in the median as more dimensions 

are examined. This point can be further validated using specific genes. For example, the CD70 

and ENG genes are underexpressed in the majority of samples. Using copy number analysis 

alone, the observed frequency of disruption (loss and underexpression) is 44% and 22% 

respectively. If we then examine the methylation status, in the remaining cases not explained 

by DNA copy number, we observe an additional 33% of cases exhibiting hypermethylation and 

underexpression for ENG (red) and 22% for CD70 (blue). Finally, when we also examine allelic 

status, we observe an additional 22% of cases with copy neutral LOH and gene 

underexpression for CD70 and 11% for ENG. In total, by using all three dimensions, the 

cumulative frequency of disruption is 88% for CD70 and 77% for ENG (Figure 3.4b). This 

example demonstrates the utility of a multi-dimensional approach to elucidate events which 

would escape conventional single dimensional analysis. 
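The cumulative calculation can be sketched as follows, counting each sample once at the first dimension that explains it. The names and the data layout are assumptions, and the example in the usage note is a hypothetical gene rather than data from the study.

```python
def cumulative_frequency(per_sample_dims, order):
    """Cumulative number of samples explained as dimensions are added in order.

    per_sample_dims: one set per sample listing the DNA dimensions concordant
                     with the expression change in that sample.
    order: DNA dimensions in the order in which they are examined.
    """
    explained = set()
    counts = []
    for dim in order:
        explained |= {i for i, dims in enumerate(per_sample_dims) if dim in dims}
        counts.append(len(explained))
    return counts
```

For a hypothetical gene in nine samples with four cases of copy number loss, two of hypermethylation, two of copy neutral LOH and one unexplained case, the cumulative counts are 4, 6 and 8 of 9.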

3.3.4 MDA identifies significantly enriched cancer related pathways 

Using the set of 1162 genes identified by MDA (Additional File 4) and the similar lists of genes 

identified from each of the simulated datasets, pathway analyses were performed with Ingenuity 

Pathway Analysis. From the pathway analysis of MDA genes and focusing only on canonical 

signaling pathways, 53 pathways were significantly enriched at a Benjamini-Hochberg

corrected p-value of 0.05 (Additional File 5). In contrast, using the gene lists from the 10 

simulated datasets, nine of the 10 pathway analyses yielded no significantly enriched pathways

at the same p-value, and the remaining analysis yielded one significant pathway.

Similar results from Gene Ontology analysis were obtained using the publicly available 

GATHER database [39] (Additional File 6). Specific pathways involved in breast cancer, 

ovarian cancer, and prostate cancer were amongst the ones identified as most significant 

(Figure 3.5). Consequently, these results suggest that the genes identified using MDA have a 

high degree of biological relevance. 

3.3.5 MDA of the Neuregulin signaling pathway reveals a complex pattern of deregulation 

Among the 53 pathways which were statistically over-represented from our list of 1162 genes, 

one of the pathways identified is the neuregulin pathway. This pathway contains the well known 

breast cancer oncogene ERBB2 as well as other genes known to be affected in breast and 

other cancers [40-43]. Examining the components of this pathway, we observe that some genes

are commonly altered while others are infrequently altered across our sample set by multiple

patterns of genomic alteration, and that some genes behave oppositely in different samples

(Figure 3.6). 

While genes such as HRAS (down), BAD (down), HSP90AB1 (up), SOS2 (up) and RPS6KB1 

(up) generally exhibit consistent differential expression with concerted change at the DNA level 

across our sample set, genes such as GRB7, PTEN, and MAP2K1 exhibit both overexpression 

and underexpression, with concerted DNA change, in different samples. For example, if we 

examine PTEN, we observe copy number loss, LOH, DNA hypermethylation and consequent 

underexpression in HCC1395 while HCC1008 contains copy number gain, with DNA 

hypomethylation and consequent overexpression (Figure 3.7). The impact of such a difference 

on downstream targets was recently shown in a breast cancer study where AKT and mTOR

phosphorylation were higher in cases with low PTEN expression compared to those with high 

PTEN expression [44]. This pathway illustrates that, although average features across a

sample set are important, differences between samples in the same pathway may also

play an important role and thus may affect the biology of the tumor.

3.3.6 Genes exhibiting multiple concerted disruption (MCD) - biological and clinical 

significance 

We have demonstrated that we can identify more disrupted genes in a given sample when 

considering any mechanism of disruption. On the other hand, those genes which exhibit 

multiple concerted disruptions (MCD) across all DNA dimensions -- i.e. overexpression of a 

gene due to increased gene dosage, which led to allelic imbalance, and DNA hypomethylation 

at the same locus relieving regulation -- are likely to have strong biological significance.

Likewise, underexpression due to reduced gene copy number, resulting in LOH, and 

complementary DNA hypermethylation, leading to gene silencing may also be significant. By 

employing multiple dimensions of interrogation, genes exhibiting MCD are captured. 

To determine what frequency of MCD was deemed significant, we performed a similar analysis 

of the 10 simulated datasets from before and assessed the proportion of events at each 

frequency of MCD from 0/9 to 1/9 (Figure 3.8a). It was found that by random chance, a gene 

exhibiting MCD in 1/9 would occur 0.3% of the time. Thus, using this threshold of at least one 

MCD event, 974 genes were identified (Additional File 7). Interestingly, the overlap of the 

MDA list (1162 genes) with the MCD list (974 genes) yielded 375 genes. 

The MCD strategy sequentially refines the roster of target genes with the intent of identifying 

critical genes for tumorigenesis (Additional File 8 or Appendix IV). Such genes which exhibit 

multiple mechanisms of deregulation, for example, may represent important nodes in pathways 

such as hub proteins [45], whereby disruption of the gene has an effect on multiple downstream 

targets or genes with biological and/or clinical relevance. Thus, although these genes may not 

be affected at a high frequency across the sample set, their disruption at multiple levels in 

individual samples would signify importance in tumorigenesis. As shown earlier, 375 genes

were identified by both MDA and MCD. If we further employ a criterion of frequent MCD, whereby

this event occurs in 4/9 of cases (signifying high recurrence), we detect 23 genes (Additional

File 8 or Appendix IV). Among the 23 genes identified are TUSC3 (8p22), ELK3 (12q23), and 

CCNA1 (13q12.3-q13). 

TUSC3 resides at 8p22, a locus frequently deleted across multiple epithelial cancers [46-49]. 

ELK3 is an ETS domain transcription factor which, in mice, acts as a transcriptional inhibitor in 

the absence of RAS, but is a transcriptional activator in the presence of RAS [50]. Recently, 

ELK3 was shown to be underexpressed in a panel of breast cancer lines as well as clinical breast

tumor specimens [51]. CCNA1 was shown to be hypermethylated in multiple cancer types, 

including breast cancer [52]. 

To validate the relevance of the 23 MCD genes in clinical breast cancer samples, we evaluated 

gene expression levels associated with survival and examined multiple publicly available 

microarray datasets using the Oncomine database [33]. Of these 23 genes, 17 were 

represented in either the van de Vijver et al or Sorlie et al datasets. Interestingly, eight of these 

genes demonstrated a statistically significant association with patient survival in at least one of

the two independent datasets (Additional File 9 or Appendix V, Additional File 10 or 

Appendix VI) [4, 31]. Moreover, when comparing the percentage of survival-associated genes 

(8/17, 47.1%) in the MCD gene list with what was expected without pre-selection (27.1%), the 

increased percentage was statistically significant based on the binomial test (p = 0.04131806). 

To further evaluate the clinical significance of these genes, we utilized the Oncomine database 

(Additional File 9 or Appendix V). One caveat of the Oncomine analysis is

that it may not detect all low levels of differential expression. TUSC3 is shown as an example of 

one of the genes whose expression correlates with survival (Additional File 8 or Appendix IV, 

see Methods). Notably, in ovarian cancer, TUSC3, in conjunction with EFA6R, also correlated 

with poor survival [53]. The observations that TUSC3 is altered frequently by multiple 

mechanisms at the DNA and RNA level and shows a strong association with patient survival

highlight the use of MCD in systematically identifying biologically, and potentially clinically, 

relevant genes. 

3.3.7 Association of genes exhibiting MCD and triple negative breast cancers (TNBC) 

In this study, the majority of samples used (5/9) were of the triple negative subtype of breast 

cancer; a subtype which is estrogen receptor (ER) negative, progesterone receptor (PR) 

negative, and HER2 negative and represents between 10% and 20% of all diagnosed breast 

malignancies [54-57]. Genomic analyses of triple negative breast cancers (TNBCs) have been

previously performed [58-61] and they revealed a heterogeneous and complex view of this

breast cancer subtype. A recent study, however, implicated fibroblast growth factor

receptor 2 (FGFR2) as a novel therapeutic target amplified in TNBCs [57]. Interestingly, from a

meta-analysis of array CGH data, this gene was found to be amplified in 4% of TNBC cases 

[57]. Thus, we assessed the status of FGFR2 and its downstream targets in our multi- 

dimensional dataset. 

While FGFR2 is not amplified in any of the five TNBC cell lines, all of the five cell lines showed 

overexpression of FGFR2 with one of the cell lines exhibiting a low level gain of a region 

encompassing FGFR2 (HCC1937). From this analysis, within the sample set of TNBC cell 

lines, though FGFR2 is overexpressed, it was not frequently associated with DNA level 

alterations. 

However, examining downstream targets of FGFR2 revealed a striking finding. Using the 

knowledge database of Ingenuity Pathway Analysis, one of the downstream components 

affected at the expression level, which was also on both the MDA (Additional File 4) and MCD 

(Additional File 7) lists, was COL1A1. Remarkably, of the five TNBC cell lines, four exhibited 

DNA alteration associated overexpression of COL1A1 with two lines exhibiting MCD at COL1A1 

and two other lines having DNA copy number associated overexpression. The remaining line 

exhibited DNA copy number associated overexpression of FGFR2 (Figure 3.8b). Hence, every 

TNBC line was affected at either FGFR2 or COL1A1 at both the DNA and RNA levels. 

Interestingly, COL1A1 has been shown to be both prognostic and predictive in multiple cancer 

types, including breast cancer [3, 5, 62, 63]. 


In conclusion, we have demonstrated that a multi-dimensional genomic approach is superior to 

analysis of one or two genomic dimensions alone. Each additional genomic dimension 

surveyed increases the amount of aberrant gene expression that can be explained within 

individual samples. As a by-product, when examining across a sample set, multi-dimensional 

genomic analysis can identify relevant genes that may be overlooked due to low frequencies of 

disruption by the individual mechanisms. The increased frequency of gene disruption detected, 

due to the consideration of multiple mechanisms of disruption, could potentially reduce the 

sample size of the study cohort needed for gene discovery.

Secondly, while the increased detection of genes disrupted using multi-dimensional analysis is 

useful for achieving a more comprehensive identification of deregulated pathways and gene 

networks, it also presents a challenge in prioritizing which genes are likely key nodes or hubs in 

the affected pathways and networks. Hence, one way to prioritize is to identify genes with 

evidence of multiple concerted disruption. The Knudson two-hit hypothesis suggests that tumor 

suppressor genes require two allelic hits to disrupt gene function. Bi-allelic alteration, such as 

homozygous deletion, or concerted genetic and epigenetic changes, are well documented 

causal mechanisms of gene disruption. Likewise, hypomethylation and increased gene dosage 

are known mechanisms for gene overexpression. The bi-allelic disruption phenomenon 

(leading to loss or gain of function) provides a means to identify causative genes; hence, 

parallel analysis of the genome and epigenome in the same tumor is of great benefit. In this 

study, we have developed a stepwise gene selection strategy to identify multiple concerted 

disruptions using an integrative genomics approach. 

In this study, three DNA dimensions, which have current affordable high throughput assays, 

were examined. However, we envision that new techniques for analysis of additional aspects 

such as histone modification states and gene mutation status will reveal mechanisms that would 

explain even more gene expression changes within individual samples. The identification of a 

number of key cancer-related genes and pathways using a relatively small sample size 

suggests that limitations in requiring large sample sizes for studies to identify relevant genes 

and pathways may be circumvented by our comprehensive approach. Consequently, this 

concept can be projected to current technologies such as high throughput sequencing where it 

may prove more prudent to perform this analysis in multiple dimensions in a smaller number of 

samples rather than in one dimension in many more samples at a comparable cost. Finally, 

observing the same gene in a given pathway being deregulated in a completely different 

manner between samples highlights one of the shortcomings of group-based analysis and 

highlights the eventual need to move to systems analysis of tumors as individual entities. 

Figure 3.1

[Panel (a): frequency (0 to 1) of CN gain, CN loss, LOH, and CN neutral LOH plotted against genomic position (0 to 3 Gbp), with the locations of BRCA2, ESR1, ERBB2, BRCA1 and TP53 marked.]

[Panel (b): copy number gain, copy number loss, LOH, or retention (no LOH) status of these genes in HCC38, HCC1008, HCC1143, HCC1395, HCC1599, HCC1937, HCC2218, BT474 and MCF7.]

Figure 3.1. Genomic profiles of breast cancer cell lines. (a) Whole genome frequency 

analysis of copy number gain (red), copy number loss (green), loss of heterozygosity/allelic

imbalance (AI) (top blue) and copy number neutral LOH/AI (bottom blue). Vertical lines 

through all four graphs represent the genomic location of key breast cancer genes, using the 

hg18 build of the human genome map. (b) Illustration of copy number and LOH/AI status for 

ESR1, BRCA1, BRCA2, ERBB2 and TP53 in each of the samples. Each of these DNA 

events is evident in all of these genes. 

Figure 3.2. Quantitative and qualitative benefits of integrative analyses. (a) Heatmap and 

bar plot illustration of the additive benefit of multi-dimensional DNA analysis for the explanation 

of consequential differential gene expression. Within a sample, when sequentially adding a DNA 

dimension of analysis, an increasing percentage of observed differential gene expression can 

be explained. For each dimension or combination of dimensions, in the bar plot, the median 

value is used (grey bars). Heatmaps display the percentage of differential expression explained 

by DNA mechanisms, with values near to 100 either dark red (overexpression) or green 

(underexpression) and values closer to 0 in white. (b) Two specific genes GNAS and CASP1 

are given as examples to show multiple and complementary mechanisms of gene disruption, 

illustrating the importance of multi-dimensional analysis (MDA). 

Figure 3.2

[Panel (a): heatmap values, with bar plots of the proportion of overexpression and underexpression explained (axis 0 to 0.8).]

Proportion of overexpression explained:

Dimension            HCC38    HCC1008  HCC1143  HCC1395  HCC1599  HCC1937  HCC2218  BT474    MCF7
Hypo                 0.197    0.266    0.176    0.134    0.203    0.171    0.144    0.194    0.180
AI                   0.319    0.325    0.325    0.337    0.215    0.421    0.132    0.271    0.122
CNG                  0.708    0.401    0.372    0.644    0.440    0.464    0.321    0.458    0.500
CNG Or Hypo Or AI    0.821    0.686    0.655    0.757    0.612    0.743    0.435    0.679    0.629

Proportion of underexpression explained:

Dimension            HCC38    HCC1008  HCC1143  HCC1395  HCC1599  HCC1937  HCC2218  BT474    MCF7
Hyper                0.103    0.062    0.126    0.236    0.145    0.161    0.183    0.172    0.166
LOH                  0.425    0.512    0.516    0.523    0.316    0.569    0.197    0.348    0.226
CNL                  0.367    0.584    0.573    0.499    0.408    0.473    0.203    0.549    0.419
CNL Or Hyper Or LOH  0.522    0.705    0.790    0.702    0.562    0.747    0.363    0.721    0.558

[Panel (b): DNA copy number, DNA methylation and allelic status of GNAS and CASP1 in each of the nine cell lines.]

Legend: GE: Gene Expression (Over, Under); CN: DNA Copy Number (Gain, Loss); L: Allelic Status (LOH); M: DNA Methylation (Hypo, Hyper)

Figure 3.3

[Panel (a): proportion of genes in random simulations (0 to 0.30) plotted against disruption frequency (0 to 9). Panel (b): number of genes at the 6/9 cut-off (0 to 1400) for CN, Meth, AI/LOH, and CN Or AI/LOH Or Meth. Legend: Simulated Data, Experimental Data.]

Figure 3.3. Determination and application of a disruption frequency threshold. (a) 

Results of the analyses of ten simulated datasets. Aggregating the results of the simulated 

analyses, the proportion of random simulations at the observed frequency thresholds are 

shown. From these analyses, approximately 2% of the simulations were ≥ 6/9. (b) Using a

frequency cut-off of 6/9, the number of genes disrupted at that frequency using a single DNA

dimension or a combination of DNA dimensions is shown. With a single dimension alone, we can maximally identify

437 genes which are differentially expressed and exhibit a concerted change at the DNA 

level in a minimum of 6/9 samples. However, using all three dimensions, we find that 1162 

genes are in fact differentially expressed and contain at least one concerted change in one 

of the DNA dimensions. This represents over a two-fold increase in the number of genes 

identified. 

Figure 3.4

[Panel (a): box plots of disruption frequency (0 to 9) for single DNA dimensions (Copy Number, AI/LOH, DNA Methylation) and their combinations. Panel (b): cumulative frequency (0 to 9) as dimensions are added, with the frequency threshold marked.]

Figure 3.4. Impact of multi-dimensional analysis on low frequency events. (a) Box 

plot analysis of the frequency distribution of single and multi-dimensional analyses (MDA) of 

the 1162 genes differentially expressed with a concerted change in one of the DNA dimensions. 

The area in red represents the number of genes (of the 1162) that would be missed if 

only a single DNA dimension was examined, while the area in blue represents the genes 

that would be detected. Examining the median values for the three right-most boxes, we 

see that by even using the box with the highest median (copy number), we would not be 

able to detect about 50% of the 1162 genes. (b) Two specific examples highlighting the 

importance of multi-dimensional genomic analysis. Using single dimensional analyses 

(green shade) alone, CD70 (blue line graph) and ENG (red line graph) disruption occur at 

very low frequencies (44% and 33% respectively). However, when examining two (red 

shade) or three genomic dimensions (blue shade), the disruption of these genes occurs at 

very high frequencies, 88% and 77% respectively. Frequency threshold of 6/9 is denoted 

with a black dotted line. 

Figure 3.5

[Bar chart of -log(p-value) (0.0 to 5.0) for: Molecular Mechanisms of Cancer; Cell Cycle: G1/S Checkpoint Regulation; Aryl Hydrocarbon Receptor Signaling; Breast Cancer Regulation by Stathmin1; Ovarian Cancer Signaling; Prostate Cancer Signaling; p53 Signaling; Neuregulin Signaling; PI3K/AKT Signaling; Cell Cycle: G2/M DNA Damage Checkpoint Regulation. Legend: Multi-Dimensional Analysis, Simulated Data Sets, Threshold.]

Figure 3.5. Pathway analysis of the 1162 genes identified by multi-dimensional analysis. Ingenuity Pathway 

Analysis of the 1162 genes identified by MDA as well as genes meeting the same frequency criteria (6/9) from the 

analysis of the ten simulated datasets. In total, using the list of 1162 MDA genes, 53 canonical signaling pathways 

were identified as significant after multiple testing correction using a Benjamini-Hochberg correction (Additional File 

5). In contrast, using the same statistical criteria, nine of the 10 simulated datasets yielded no significant pathways 

with one of the datasets yielding one pathway. In this figure, ten of the most well known, cancer-related pathways are 

shown. The yellow threshold line represents a Benjamini-Hochberg corrected p-value of 0.05 with bars above that 

line deemed significant.

[Figure 3.6 image not reproduced. Pathway diagram with regions labeled Mitogenic Signalling, Proliferation & Differentiation, and Cell Cycle. Genes shown include ERBB2, HSP90AB1, GRB2, SOS2, HRAS, RAF1, MAP2K1, ERK1/2, ELK1, MYC, EREG, PRKCI, ERBB4, GRB7, ERBB2IP, STAT5 and ERRFI1. Beside each gene, a grid gives the status of samples S1 to S9 in four rows: GE (gene expression), CN (copy number), L (LOH/AI) and M (DNA methylation).]

Figure 3.6. Complex deregulation of the Neuregulin/ERBB2 signaling pathway. Each 

gene is color-coded red and green to represent over and underexpression respectively. 

Genes colored both represent genes which are over and underexpressed in different 

samples. Beside each gene is the status for gene expression, copy number, LOH/AI and 

DNA methylation, with the alterations in each dimension colored as per the legend. DNA 

alterations are only shown when a change in gene expression is observed. It should be 

noted that LOH can be derived from multiple mechanisms. In this study, we do not distinguish

between these mechanisms. Likewise, methylation changes may affect one or

both alleles. In this study, we do not distinguish the status of the alleles individually. Genes 

denoted with * have one sample exhibiting multiple concerted disruption (MCD). Samples 

are coded as follows: S1 = HCC38, S2 = HCC1008, S3 = HCC1143, S4 = HCC1395, S5 = 

HCC1599, S6 = HCC1937, S7 = HCC2218, S8 = BT474, and S9 = MCF7. 

[Figure 3.6, continued (image not reproduced). Region labeled PI3K-AKT Signalling, Survival & Proliferation. Genes and molecules shown include PDK1, AKT2, mTOR, RPS6KB1, PIK3R1, ERBB4, EREG, PTEN, BAD, CDKN1B, RPS6, PIP2 and PIP3, each gene with a grid of samples S1 to S9 in rows GE, CN, L and M.]

[Figure 3.7 image not reproduced. Two panels for PTEN: Sample HCC1008 (labels: Copy Number Gain, Retention) and Sample HCC1395 (labels: Copy Number Loss, LOH). Plotted axes: Beta value and Log2 Intensity; bars are labeled MCF10A and HCC1008 in the first panel.]


Figure 3.7. Deregulation of PTEN occurs differently between samples. In HCC1008 (left), PTEN is overexpressed with an 

associated gain in copy number and hypomethylation. Conversely, in HCC1395 (right), PTEN is underexpressed, with an 

associated loss in copy number, LOH, and DNA hypermethylation. This illustrates how each tumor may behave differently 

from another.

Figure 3.8. Multiple concerted disruption (MCD) analysis and its application to triple 

negative breast cancer. (a) Analysis of ten simulated datasets to determine the proportion of 

random simulations at each observed frequency of MCD. Notably, 99.7% of random 

simulations had a MCD frequency of 0/9, with the remaining 0.3% at 1/9. Moreover, no 

simulations showed a frequency ≥ 2/9. Thus, the observation of an MCD event suggests the 

event is likely non-random. (b) Using the knowledge database of Ingenuity Pathway Analysis, 

upstream and downstream components of FGFR2 were selected to assess their role in the 

subset of triple negative breast cancer (TNBC) cell lines. Only components which were shown 

to have a direct or indirect expression level relationship were selected. Of the seven 

components identified (four upstream and three downstream of FGFR2), one upstream 

component (FGF2) and one downstream component (COL1A1) were present in both the MDA 

list (Additional File 4) and MCD list (Additional File 7). FGF2, colored in green, is shown to be 

frequently underexpressed while COL1A1, colored in red, is frequently overexpressed. 

Interestingly, examining FGFR2 and COL1A1, while FGFR2 overexpression is not frequently 

associated with DNA level alteration, COL1A1 is frequently affected at DNA level. Moreover, in 

the five TNBC cell lines examined, four have DNA level alteration of COL1A1 and the remaining 

line has DNA level alteration of FGFR2. 

[Figure 3.8 image not reproduced. (a) Bar chart of the proportion of random simulations (0 to 1) against MCD frequency (0 to 9). (b) Network diagram around FGFR2 with the components FGF2, IGF2, TP63, RUNX2, TGFB1 and COL1A1, and the status of FGFR2 and COL1A1 in HCC38, HCC1008, HCC1143, HCC1599 and HCC1937; * marks a sample with MCD.]

1. Chang JC, Wooten EC, Tsimelzon A, Hilsenbeck SG, Gutierrez MC, Elledge R, Mohsin 

S, Osborne CK, Chamness GC, Allred DC et al: Gene expression profiling for the 

prediction of therapeutic response to docetaxel in patients with breast cancer. 

Lancet 2003, 362(9381):362-369. 


expression analysis of cancer. J Cell Physiol 2008. 



Nature 2000, 406(6797):747-752. 




2001, 98(19):10869-10874. 

5. van 't Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AA, Mao M, Peterse HL, van der 

Kooy K, Marton MJ, Witteveen AT et al: Gene expression profiling predicts clinical 

outcome of breast cancer. Nature 2002, 415(6871):530-536. 

6. Fridlyand J, Snijders AM, Ylstra B, Li H, Olshen A, Segraves R, Dairkee S, Tokuyasu T, 

Ljung BM, Jain AN et al: Breast tumor copy number aberration phenotypes and 

genomic instability. BMC Cancer 2006, 6:96. 

7. Albertson DG, Ylstra B, Segraves R, Collins C, Dairkee SH, Kowbel D, Kuo WL, Gray 

JW, Pinkel D: Quantitative mapping of amplicon structure by array CGH identifies 

CYP24 as a candidate oncogene. Nat Genet 2000, 25(2):144-146. 

8. Chin SF, Wang Y, Thorne NP, Teschendorff AE, Pinder SE, Vias M, Naderi A, Roberts I, 

Barbosa-Morais NL, Garcia MJ et al: Using array-comparative genomic hybridization 

to define molecular portraits of primary breast cancers. Oncogene 2007, 

26(13):1959-1970. 

9. Jain AN, Chin K, Borresen-Dale AL, Erikstein BK, Eynstein Lonning P, Kaaresen R, 

Gray JW: Quantitative analysis of chromosomal CGH in human breast tumors 

associates copy number abnormalities with p53 status and patient survival. Proc 

Natl Acad Sci U S A 2001, 98(14):7952-7957. 

10. Naylor TL, Greshock J, Wang Y, Colligon T, Yu QC, Clemmer V, Zaks TZ, Weber BL: 

High resolution genomic analysis of sporadic breast cancer using array-based 

comparative genomic hybridization. Breast Cancer Res 2005, 7(6):R1186-1198. 

11. Shadeo A, Lam WL: Comprehensive copy number profiles of breast cancer cell 

model genomes. Breast Cancer Res 2006, 8(1):R9. 




13. Chin SF, Teschendorff AE, Marioni JC, Wang Y, Barbosa-Morais NL, Thorne NP, Costa 

JL, Pinder SE, van de Wiel MA, Green AR et al: High-resolution aCGH and 

expression profiling identifies a novel genomic subtype of ER negative breast 

cancer. Genome Biol 2007, 8(10):R215. 















18. Lockwood WW, Coe BP, Williams AC, MacAulay C, Lam WL: Whole genome tiling 

path array CGH analysis of segmental copy number alterations in cervical cancer 

cell lines. Int J Cancer 2007, 120(2):436-443. 

19. Khojasteh M, Lam WL, Ward RK, MacAulay C: A stepwise framework for the 

normalization of array CGH data. BMC Bioinformatics 2005, 6:274. 

20. Jong K, Marchiori E, Meijer G, Vaart AV, Ylstra B: Breakpoint identification and 

smoothing of array comparative genomic hybridization data. Bioinformatics 2004, 

20(18):3636-3637. 




22. Carvalho B, Bengtsson H, Speed TP, Irizarry RA: Exploration, normalization, and 

genotype calls of high-density oligonucleotide SNP array data. Biostatistics 2007, 

8(2):485-499. 

23. Lin M, Wei LJ, Sellers WR, Lieberfarb M, Wong WH, Li C: dChipSNP: significance 

curve and clustering of SNP-array-based loss-of-heterozygosity data. 

Bioinformatics 2004, 20(8):1233-1240. 

24. Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, Fiegler H, Shapero MH, 

Carson AR, Chen W et al: Global variation in copy number in the human genome. 

Nature 2006, 444(7118):444-454. 

25. Karolchik D, Kuhn RM, Baertsch R, Barber GP, Clawson H, Diekhans M, Giardine B, 

Harte RA, Hinrichs AS, Hsu F et al: The UCSC Genome Browser Database: 2008 

update. Nucleic Acids Res 2008, 36(Database issue):D773-779. 




27. Chari R, Coe BP, Wedseltoft C, Benetti M, Wilson IM, Vucic EA, MacAulay C, Ng RT, 

Lam WL: SIGMA2: a system for the integrative genomic multi-dimensional analysis 

of cancer genomes, epigenomes, and transcriptomes. BMC Bioinformatics 2008, 

9:422. 

28. Soh J, Okumura N, Lockwood WW, Yamamoto H, Shigematsu H, Zhang W, Chari R, 

Shames DS, Tang X, MacAulay C et al: Oncogene mutations, copy number gains 

and mutant allele specific imbalance (MASI) frequently occur together in tumor 

cells. PLoS One 2009, 4(10):e7464. 

29. Tuna M, Knuutila S, Mills GB: Uniparental disomy in cancer. Trends Mol Med 2009, 

15(3):120-128. 

30. Yan H, Yuan W, Velculescu VE, Vogelstein B, Kinzler KW: Allelic variation in human 

gene expression. Science 2002, 297(5584):1143. 

31. van de Vijver MJ, He YD, van't Veer LJ, Dai H, Hart AA, Voskuil DW, Schreiber GJ, 

Peterse JL, Roberts C, Marton MJ et al: A gene-expression signature as a predictor 

of survival in breast cancer. N Engl J Med 2002, 347(25):1999-2009. 

32. Troyanskaya O, Cantor M, Sherlock G, Brown P, Hastie T, Tibshirani R, Botstein D, 

Altman RB: Missing value estimation methods for DNA microarrays. Bioinformatics 

2001, 17(6):520-525. 

33. Rhodes DR, Kalyana-Sundaram S, Mahavisno V, Varambally R, Yu J, Briggs BB, 

Barrette TR, Anstet MJ, Kincead-Beal C, Kulkarni P et al: Oncomine 3.0: genes, 

pathways, and networks in a collection of 18,000 cancer gene expression profiles. 

Neoplasia 2007, 9(2):166-180. 


34. Johnson N, Speirs V, Curtin NJ, Hall AG: A comparative study of genome-wide SNP, 

CGH microarray and protein expression analysis to explore genotypic and 

phenotypic mechanisms of acquired antiestrogen resistance in breast cancer. 

Breast Cancer Res Treat 2008, 111(1):55-63. 

35. Jee CD, Lee HS, Bae SI, Yang HK, Lee YM, Rho MS, Kim WH: Loss of caspase-1 

gene expression in human gastric carcinomas and cell lines. Int J Oncol 2005, 

26(5):1265-1271. 

36. Ueki T, Takeuchi T, Nishimatsu H, Kajiwara T, Moriyama N, Narita Y, Kawabe K, Ueki K, 

Kitamura T: Silencing of the caspase-1 gene occurs in murine and human renal 

cancer cells and causes solid tumor growth in vivo. Int J Cancer 2001, 91(5):673- 

679. 

37. McLendon R, Friedman A, Bigner D, Van Meir EG, Brat DJ, Mastrogianakis M, Olson JJ, 

Mikkelsen T, Lehman N, Aldape K et al: Comprehensive genomic characterization 

defines human glioblastoma genes and core pathways. Nature 2008. 

38. Parsons DW, Jones S, Zhang X, Lin JC, Leary RJ, Angenendt P, Mankoo P, Carter H, 

Siu IM, Gallia GL et al: An integrated genomic analysis of human glioblastoma 

multiforme. Science 2008, 321(5897):1807-1812. 

39. Chang JT, Nevins JR: GATHER: a systems approach to interpreting genomic 

signatures. Bioinformatics 2006, 22(23):2926-2933. 

40. Bachman KE, Argani P, Samuels Y, Silliman N, Ptak J, Szabo S, Konishi H, Karakas B, 

Blair BG, Lin C et al: The PIK3CA gene is mutated with high frequency in human 

breast cancers. Cancer Biol Ther 2004, 3(8):772-775. 

41. Slamon DJ, Godolphin W, Jones LA, Holt JA, Wong SG, Keith DE, Levin WJ, Stuart SG, 

Udove J, Ullrich A et al: Studies of the HER-2/neu proto-oncogene in human breast 

and ovarian cancer. Science 1989, 244(4905):707-712. 

42. Stein D, Wu J, Fuqua SA, Roonprapunt C, Yajnik V, D'Eustachio P, Moskow JJ, 

Buchberg AM, Osborne CK, Margolis B: The SH2 domain protein GRB-7 is co-amplified, 

overexpressed and in a tight complex with HER2 in breast cancer. Embo 

J 1994, 13(6):1331-1340. 




44. Stemke-Hale K, Gonzalez-Angulo AM, Lluch A, Neve RM, Kuo WL, Davies M, Carey M, 

Hu Z, Guan Y, Sahin A et al: An integrative genomic and proteomic analysis of 

PIK3CA, PTEN, and AKT mutations in breast cancer. Cancer Res 2008, 68(15):6084- 

6091. 

45. Wang E, Lenferink A, O'Connor-McCourt M: Cancer systems biology: exploring 

cancer-associated genes on cellular networks. Cell Mol Life Sci 2007, 64(14):1752- 

1762. 

46. Bova GS, Carter BS, Bussemakers MJ, Emi M, Fujiwara Y, Kyprianou N, Jacobs SC, 

Robinson JC, Epstein JI, Walsh PC et al: Homozygous deletion and frequent allelic 

loss of chromosome 8p22 loci in human prostate cancer. Cancer Res 1993, 

53(17):3869-3873. 

47. Chinen K, Isomura M, Izawa K, Fujiwara Y, Ohata H, Iwamasa T, Nakamura Y: 

Isolation of 45 exon-like fragments from 8p22-->p21.3, a region that is commonly 

deleted in hepatocellular, colorectal, and non-small cell lung carcinomas. 

Cytogenet Cell Genet 1996, 75(2-3):190-196. 

48. Cooke SL, Pole JC, Chin SF, Ellis IO, Caldas C, Edwards PA: High-resolution array 

CGH clarifies events occurring on 8p in carcinogenesis. BMC Cancer 2008, 

8(1):288. 

49. Yaremko ML, Recant WM, Westbrook CA: Loss of heterozygosity from the short arm 

of chromosome 8 is an early event in breast cancers. Genes Chromosomes Cancer 

1995, 13(3):186-191. 


50. Giovane A, Pintzas A, Maira SM, Sobieszczuk P, Wasylyk B: Net, a new ets 

transcription factor that is activated by Ras. Genes Dev 1994, 8(13):1502-1513. 

51. He J, Pan Y, Hu J, Albarracin C, Wu Y, Dai JL: Profile of Ets gene expression in 

human breast carcinoma. Cancer Biol Ther 2007, 6(1):76-82. 




2006, 3(12):e486. 

53. Pils D, Horak P, Gleiss A, Sax C, Fabjani G, Moebus VJ, Zielinski C, Reinthaller A, 

Zeillinger R, Krainer M: Five genes from chromosomal band 8p22 are significantly 

down-regulated in ovarian carcinoma: N33 and EFA6R have a potential impact on 

overall survival. Cancer 2005, 104(11):2417-2429. 

54. Cheang MC, Voduc D, Bajdik C, Leung S, McKinney S, Chia SK, Perou CM, Nielsen 

TO: Basal-like breast cancer defined by five biomarkers has superior prognostic 

value than triple-negative phenotype. Clin Cancer Res 2008, 14(5):1368-1376. 

55. Gluz O, Liedtke C, Gottschalk N, Pusztai L, Nitz U, Harbeck N: Triple-negative breast 

cancer--current status and future directions. Ann Oncol 2009, 20(12):1913-1927. 

56. Rakha EA, El-Sayed ME, Green AR, Lee AH, Robertson JF, Ellis IO: Prognostic 

markers in triple-negative breast cancer. Cancer 2007, 109(1):25-32. 

57. Turner N, Lambros MB, Horlings HM, Pearson A, Sharpe R, Natrajan R, Geyer FC, van 

Kouwenhove M, Kreike B, Mackay A et al: Integrative molecular profiling of triple 

negative breast cancers identifies amplicon drivers and potential therapeutic 

targets. Oncogene 2010. 

58. Andre F, Job B, Dessen P, Tordai A, Michiels S, Liedtke C, Richon C, Yan K, Wang B, 

Vassal G et al: Molecular characterization of breast cancer with high-resolution 

oligonucleotide comparative genomic hybridization array. Clin Cancer Res 2009, 

15(2):441-451. 

59. Bertucci F, Finetti P, Cervera N, Esterni B, Hermitte F, Viens P, Birnbaum D: How basal 

are triple-negative breast cancers? Int J Cancer 2008, 123(1):236-240. 

60. Han W, Jung EM, Cho J, Lee JW, Hwang KT, Yang SJ, Kang JJ, Bae JY, Jeon YK, Park 

IA et al: DNA copy number alterations and expression of relevant genes in triple-negative 

breast cancer. Genes Chromosomes Cancer 2008, 47(6):490-499. 

61. Kreike B, van Kouwenhove M, Horlings H, Weigelt B, Peterse H, Bartelink H, van de 

Vijver MJ: Gene expression profiling and histopathological characterization of 

triple-negative/basal-like breast carcinomas. Breast Cancer Res 2007, 9(5):R65. 

62. Ramaswamy S, Ross KN, Lander ES, Golub TR: A molecular signature of metastasis 

in primary solid tumors. Nat Genet 2003, 33(1):49-54. 

63. Wang Y, Klijn JG, Zhang Y, Sieuwerts AM, Look MP, Yang F, Talantov D, Timmermans 

M, Meijer-van Gelder ME, Yu J et al: Gene-expression profiles to predict distant 

metastasis of lymph-node-negative primary breast cancer. Lancet 2005, 

365(9460):671-679. 

64. Richardson AL, Wang ZC, De Nicolo A, Lu X, Brown M, Miron A, Liao X, Iglehart JD, 

Livingston DM, Ganesan S: X chromosomal abnormalities in basal-like human 

breast cancer. Cancer Cell 2006, 9(2):121-132. 

65. Radvanyi L, Singh-Sandhu D, Gallichan S, Lovitt C, Pedyczak A, Mallo G, Gish K, Kwok 

K, Hanna W, Zubovits J et al: The gene associated with trichorhinophalangeal 

syndrome in humans is overexpressed in breast cancer. Proc Natl Acad Sci U S A 

2005, 102(31):11005-11010. 

66. Finak G, Bertos N, Pepin F, Sadekova S, Souleimanova M, Zhao H, Chen H, Omeroglu 

G, Meterissian S, Omeroglu A et al: Stromal gene expression predicts clinical 

outcome in breast cancer. Nat Med 2008, 14(5):518-527. 

67. Karnoub AE, Dash AB, Vo AP, Sullivan A, Brooks MW, Bell GW, Richardson AL, Polyak 

K, Tubo R, Weinberg RA: Mesenchymal stem cells within tumour stroma promote 

breast cancer metastasis. Nature 2007, 449(7162):557-563. 


Chapter 4: Uniparental disomy is a prevalent genetic 

mechanism of oncogene disruption in lung adenocarcinoma 3 

3 A version of this chapter will be submitted for publication with the following author list: Chari 

R, Lockwood WW, Soh J, Coe BP, Tam K, MacAulay CE, Minna JD, Lam S, Gazdar AF, Lam 

WL. (2010) Uniparental disomy is a prevalent genetic mechanism of oncogene disruption in 

lung adenocarcinoma. 

4.1 Introduction 


Genetic alterations play a significant role in a variety of malignancies [1, 2]. Typically, these
alterations have been described as either changes in gene dosage (DNA copy number) or somatic
mutations: total copy number gain or activating mutation of oncogenes, and total copy number
loss or inactivating mutation of tumor suppressor genes. Loss of heterozygosity (LOH) is also a
common alteration, whereby one allele is lost, often resulting in a loss of total copy number.
However, there are instances in which one allele is lost but the remaining allele is duplicated,
resulting in no net change in copy number; this is termed copy neutral loss of heterozygosity or
somatic uniparental disomy (UPD). 

Although somatic UPD had been shown previously in malignancies such as retinoblastoma [3],
recent studies have shown an increased prominence of this alteration [4]. This has largely been a
result of advances in the technology to detect somatic UPD and in the methodologies to define
it [5, 6]. Moreover, frequent regions of somatic UPD have been identified in many different
cancer types, such as colorectal cancer [7, 8], lymphoma [9, 10], myelodysplastic syndrome
(MDS) [11-13], basal cell carcinoma [14], hepatoblastoma [15], and ovarian cancer [16]. In
addition, while the target genes of some of these regions are tumor suppressors such as RB1 and
TP53, where the gene is likely mutated, the targets have also included oncogenes. For example,
mutation with somatic UPD has been observed at loci such as JAK2 [6, 17], CBL [12, 18], and
FLT3 [19] in hematological malignancies. However, such associations have been limited in
epithelial malignancies. 

Recently, we illustrated the concept of mutant allele specific imbalance (MASI) in lung
cancer [20]. It was found that a highly activated state for EGFR and KRAS is achieved through
either copy number amplification of the mutated allele, for EGFR, or UPD of the mutated allele,
for KRAS. Given the observed frequency of UPD at KRAS, we sought to assess the impact and
prevalence of UPD in the lung adenocarcinoma genome. Strikingly, we found that the amount of
the genome frequently affected by UPD was comparable to that affected by copy number gain and
loss. When examining major oncogenes and tumor suppressor genes, while most oncogenes were
associated with frequent areas of gain, we found a subset of both known and novel oncogenes
that were frequently affected by UPD. Finally, examining oncogenes with homozygous mutation
in multiple cancer types, we observed frequent UPD at these genes, suggesting that this
mechanism of oncogene activation is prevalent across multiple cancer types. 

4.2 Methods 

4.2.1 Genome wide profiling of clinical lung adenocarcinoma specimens 

Forty-six lung adenocarcinoma cases were obtained from Vancouver General Hospital under
approved ethics. Cases were reviewed by a pathologist and tumors were microdissected to
ensure maximal tumor cell content (≥ 70%). Genomic DNA was extracted from each tumor and its
adjacent non-malignant tissue, and five hundred nanograms of each were prepared and hybridized
to the Affymetrix Genome-Wide Human SNP 6.0 array platform as per the manufacturer's
instructions. CEL files, the raw data files generated, were then processed using the Affymetrix
Genotyping Console version 3.0.2 to generate .chp files using the Birdseed v2 genotyping
algorithm. 

4.2.2 Determination of regions of uniparental disomy (UPD) in clinical lung tumors 

CEL files and .chp files were imported into Partek Genomics Suite (PGS) using the software's
recommended default settings. First, to determine total copy number, paired copy number
intensities were calculated for each sample using the intensity in the tumor vs. its matched
non-malignant sample. Paired copy number intensities were then analyzed using the Genomic
Segmentation method in PGS with all parameters run at default except for the number of
markers, which was set to 50. Subsequently, allele specific copy number (ASCN) analysis was
used to determine regions of allelic imbalance. A region was deemed imbalanced if the
imbalance proportion was ≥ 0.15 (as recommended by PGS). Finally, a region was called UPD
if it was imbalanced and no change in total copy number was present. 
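
The UPD calling rule described above can be sketched in a few lines of Python. This is only an illustration, not the Partek Genomics Suite implementation: the `Segment` class, its field names, and the "gain"/"loss"/"neutral" labels are hypothetical, and the sketch assumes that total copy number and allelic imbalance have already been summarized on the same segments. Only the 0.15 imbalance cutoff comes from the text.

```python
from dataclasses import dataclass

# Imbalance proportion cutoff stated in the text (recommended by PGS).
IMBALANCE_THRESHOLD = 0.15


@dataclass
class Segment:
    """Hypothetical per-segment summary; field names are assumptions."""
    chrom: str
    start: int
    end: int
    total_cn_state: str          # "gain", "loss" or "neutral" (illustrative labels)
    imbalance_proportion: float  # from allele specific copy number analysis


def classify_segment(seg: Segment) -> str:
    """Call UPD when a segment is allelically imbalanced but copy neutral."""
    imbalanced = seg.imbalance_proportion >= IMBALANCE_THRESHOLD
    if seg.total_cn_state == "neutral":
        return "UPD" if imbalanced else "no change"
    # Segments with a change in total copy number keep their gain/loss call.
    return seg.total_cn_state
```

For example, a copy neutral segment with an imbalance proportion of 0.30 would be called UPD under this sketch, whereas the same imbalance on a segment with copy number loss would remain a loss.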


4.2.3 Determining frequent regions of UPD, gain and loss 

To determine frequent regions of gain, loss, and UPD, the frequency of each alteration was
determined for each SNP probe on the somatic chromosomes. A frequency threshold of 40%
was used. To smooth out regions of UPD (and of gain and loss as well), a three-step process
was performed. First, adjacent probes with frequencies greater than the threshold were merged
together. Second, to account for dips in frequency where one region is split into two, the two
regions were merged if the dip was less than 1 Mb in size. Finally, smoothed regions of UPD,
gain, or loss which were less than 100 probes in size were removed. 
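
The three smoothing steps above can be sketched as follows for the probes of a single chromosome. This is a minimal illustration under stated assumptions rather than the code used in the study: the function name and the `(position, frequency)` input format are hypothetical, the dip size is measured as the distance between the last probe of one run and the first probe of the next, and the probe count of a merged region includes the probes inside the dip.

```python
from typing import List, Tuple

Probe = Tuple[int, float]  # (genomic position in bp, alteration frequency in 0..1)


def smooth_regions(
    probes: List[Probe],
    threshold: float = 0.40,       # frequency threshold from the text
    max_gap_bp: int = 1_000_000,   # dips smaller than 1 Mb are bridged
    min_probes: int = 100,         # regions under 100 probes are dropped
) -> List[Tuple[int, int, int]]:
    """Return (start_bp, end_bp, n_probes) for each smoothed region.

    `probes` must be sorted by position and come from one chromosome.
    """
    # Step 1: merge adjacent probes whose frequency exceeds the threshold.
    runs: List[Tuple[int, int]] = []  # (first index, last index)
    run_start = None
    for i, (_, freq) in enumerate(probes):
        if freq > threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            runs.append((run_start, i - 1))
            run_start = None
    if run_start is not None:
        runs.append((run_start, len(probes) - 1))

    # Step 2: bridge two runs when the dip between them is under max_gap_bp.
    merged: List[Tuple[int, int]] = []
    for first, last in runs:
        if merged:
            prev_first, prev_last = merged[-1]
            gap = probes[first][0] - probes[prev_last][0]
            if gap < max_gap_bp:
                merged[-1] = (prev_first, last)
                continue
        merged.append((first, last))

    # Step 3: remove smoothed regions spanning fewer than min_probes probes.
    return [
        (probes[first][0], probes[last][0], last - first + 1)
        for first, last in merged
        if last - first + 1 >= min_probes
    ]
```

The same function would be run separately for gain, loss, and UPD frequencies, since the text applies the procedure to each alteration type.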

4.2.4 Determination of UPD in cancer cell lines 

Raw SNP 6.0 data (.CEL files) from cancer cell lines were obtained from the Wellcome Trust
Sanger CGP database. CEL files were then genotyped as above to generate CHP files. To define
a total copy number and allele specific copy number reference, SNP 6.0 data generated from 72
CEPH HapMap samples were obtained from Affymetrix and were also genotyped. Unpaired copy
number and allele specific copy number analyses were performed in Partek Genomics Suite, as
described above, to determine regions of allelic imbalance without a change in total copy
number. 

4.2.5 Expression analysis of genes in focal regions of UPD 

For 16 of the 46 tumor/non-malignant tissue pairs, gene expression profiles were generated on a
custom Affymetrix chip. The 32 samples were normalized using the RMA algorithm [21] in the
Bioconductor software suite in R [22]. Because expression values are in log2 space,
overexpression of a given gene in a given sample pair was determined by subtracting the
expression value in the non-malignant sample from the expression value in the tumor. A two-fold
expression change was deemed significant for this analysis. 
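
Because the values are in log2 space, a two-fold change corresponds to a difference of 1.0 between tumor and non-malignant expression. The sketch below illustrates that comparison; the function names are hypothetical, and treating the cutoff as inclusive is an assumption, since the text does not say whether exactly two-fold counts.

```python
import math
from typing import Iterable, Tuple

# A two-fold change equals log2(2) = 1.0 in log2 space.
LOG2_FOLD_CUTOFF = math.log2(2.0)


def expression_call(tumor_log2: float, normal_log2: float,
                    cutoff: float = LOG2_FOLD_CUTOFF) -> str:
    """Classify one tumor/non-malignant pair for one gene."""
    diff = tumor_log2 - normal_log2  # log2(tumor / normal)
    if diff >= cutoff:
        return "overexpressed"
    if diff <= -cutoff:
        return "underexpressed"
    return "unchanged"


def count_overexpressed(pairs: Iterable[Tuple[float, float]]) -> int:
    """Count pairs (tumor_log2, normal_log2) showing overexpression."""
    return sum(expression_call(t, n) == "overexpressed" for t, n in pairs)
```

A tally such as "10/16 pairs overexpressed" for a gene would correspond to `count_overexpressed` returning 10 for that gene's 16 pairs.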


4.3 Results 

4.3.1 Detection of UPD using allele specific copy number analysis 

To determine regions of UPD, we first determined regions of allelic imbalance using an allele
specific copy number based approach. This approach has been shown to identify more regions
of imbalance than previous call-based approaches [6, 12]. In the first example, where no UPD
is present, we observe a chromosome exhibiting no change in total copy number as compared
to its matched control and also no imbalance between the alleles, which would be represented
by a shift between the blue and red data points (Figure 4.1a). However, in the next two samples,
we do observe large shifts between the blue and red data points. Specifically, one example
illustrates a region of UPD with a region of gain on chromosome arm 12q (Figure 4.1b) and
another illustrates a whole chromosome UPD event on chromosome 14 (Figure 4.1c). The blue
data points in the UPD regions are not completely at zero but slightly above it, due to cells that
do not carry the UPD alteration. 

4.3.2 UPD is prevalent and non-random in the lung cancer genome with comparable frequencies to gain and loss 

With the ability to detect UPD as shown above, and having identified UPD at the KRAS
oncogene in a previous study, we then assessed the prevalence of UPD in the genome.
Using a 40% frequency threshold, we determined the regions of the genome affected by UPD at
this frequency. In total, 153 regions were identified (Table 4.1). Moreover, when examining
areas of frequent gain and loss (at similar frequency thresholds), we observed that the amount
of the genome affected by frequent UPD is comparable to that affected by frequent gain and
loss (Figure 4.2). While there was some overlap between the regions of loss and UPD, there was
very little overlap between gain and UPD, even though we would expect some level of overlap by
random chance. Using megabases of the genome as a metric, we observed 650 Mb affected by
gain, 500 Mb by loss and 400 Mb by UPD, with 7 Mb of overlap between gain and UPD and 58 Mb
of overlap between loss and UPD (Figure 4.3). Strikingly, the three alterations together cover
over 49% of the genome. It should also be noted that the comparable levels of gain, loss and
UPD seen at the frequency level are also seen when examining samples on an individual basis
(Figure 4.4). 
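
The combined coverage figure can be checked with simple inclusion-exclusion arithmetic, sketched below. Two inputs are assumptions not stated in the text: that regions of frequent gain and frequent loss do not overlap each other, and that the genome length used as the denominator is roughly 3,000 Mb. Under those assumptions the reported values give a combined footprint of 1,485 Mb, or about 49.5% of the genome, which is consistent with "over 49%".

```python
def combined_coverage_mb(gain: float, loss: float, upd: float,
                         gain_upd_overlap: float, loss_upd_overlap: float,
                         gain_loss_overlap: float = 0.0) -> float:
    """Union of three region sets by inclusion-exclusion.

    With gain_loss_overlap == 0 there can be no three-way overlap,
    so no further correction term is needed.
    """
    return (gain + loss + upd
            - gain_upd_overlap - loss_upd_overlap - gain_loss_overlap)


# Values reported in the text (Mb).
covered_mb = combined_coverage_mb(650, 500, 400, 7, 58)

# Assumed genome length in Mb; illustrative only, not given in the text.
ASSUMED_GENOME_MB = 3000
covered_fraction = covered_mb / ASSUMED_GENOME_MB
```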

4.3.3 Overlap of major oncogenes and tumor suppressor genes in regions of gain, loss, and UPD 

We then assessed how major oncogenes and tumor suppressor genes associated with the
three types of genetic alteration. Using a list of 112 genes derived from a number of sources
[23, 24] (Table 4.2), we found 52 of these genes to overlap with at least one of frequent gain,
loss, or UPD. Major oncogenes such as EGFR, MYC, AKT1, MDM2, and ERBB2 are affected
frequently by copy number gain, which has been shown previously [25-29] (Table 4.3).
Similarly, major tumor suppressor genes such as FHIT, RARB and CDKN2A are affected by
frequent copy number loss (Table 4.4). Interestingly, while tumor suppressor genes such as
BRCA2 and RB1 are, as expected, affected by UPD, a subset of seven oncogenes was also
affected by UPD. These included KRAS, as shown previously, as well as PIK3CA, BCL6 and
FLT3. Moreover, examining KRAS (Figure 4.5a) and RB1 (Figure 4.5b) specifically, we see
that the UPD events differ in size between samples. 

4.3.4 UPD is prevalent at oncogenes across multiple cancer types 

Having observed frequent UPD at oncogenes in lung cancer, we sought to assess the prevalence
of UPD at oncogenes across multiple cancer types. For this analysis, we utilized SNP 6.0 array
data for over 700 cancer cell lines from the Wellcome Trust Sanger database for which somatic
mutation data were also available. In total, 67 instances of homozygous mutation at 13
oncogene loci were assessed (Table 4.5). It was found that while copy number gain was the
most prevalent genetic alteration, a significant proportion of samples exhibited UPD (Figure
4.6a, Table 4.6). For the genes with the most samples harbouring homozygous mutation, KRAS
and BRAF, the pattern at these two loci is consistent with the overall trend (Figure 4.6a).
Examples of UPD at KRAS in NCI-H2030 and at BRAF in A427 are illustrated in Figure 4.6b. 

For this analysis, cancer cell lines were utilized as they represent a more homogeneous
population of cells. In contrast, clinical tumors, even after microdissection, may still contain
small amounts of contaminating normal cells. As such, determining whether a mutation is
homozygous in clinical lung tumors is challenging. With available KRAS mutation data, we
assessed the frequency of gain, loss and UPD in KRAS mutant tumors and observed a
distribution pattern similar to that in the cell lines (Figure 4.6c). 

4.3.5 Identification of novel candidate oncogenes using focal regions of UPD 

Selecting the more focal regions of UPD within the set of 153 regions, we identified 35
regions which contained three or fewer RefSeq annotated genes. In total, 64 RefSeq genes were
identified across all 35 regions (Table 4.7), and amongst these genes was E2F3 (Figure 4.7a).
Examining paired gene expression for a subset of the 46 tumor/normal pairs, it was found that
10/16 pairs showed overexpression of E2F3 (Figure 4.7b). E2F3 has previously been shown to be
overexpressed in lung cancer and to have a role in other cancer types [30, 31]. 
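
Selecting focal regions by gene content, as described above, amounts to an interval overlap count. The following is a minimal sketch under stated assumptions: the function names, the tuple layout, and all coordinates and gene names in any usage are hypothetical, and regions containing no genes are excluded here, which the text does not specify.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start bp, end bp), inclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two intervals share a chromosome and at least one base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def focal_regions(
    regions: List[Interval],
    genes: Dict[str, Interval],
    max_genes: int = 3,   # "three or fewer" annotated genes, per the text
    min_genes: int = 1,   # assumption: gene-free regions are not of interest
) -> Dict[Interval, List[str]]:
    """Map each focal region to the sorted names of the genes it overlaps."""
    focal: Dict[Interval, List[str]] = {}
    for region in regions:
        hits = sorted(
            name for name, locus in genes.items() if overlaps(region, locus)
        )
        if min_genes <= len(hits) <= max_genes:
            focal[region] = hits
    return focal
```

In practice the gene intervals would come from a RefSeq annotation table; the candidate genes are then the union of the lists returned for all focal regions.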

4.4 Discussion 

We have shown the unexpected and wide prevalence of UPD in the lung adenocarcinoma
genome and have also observed a large number of both known and novel oncogenes harbored
in these regions of frequent UPD. While there have been previous studies utilizing SNP arrays
on lung adenocarcinoma tumors [26, 30], there are a number of likely reasons why these
frequent regions were missed. First, the tumors utilized in this study were microdissected
to ensure that a high proportion of tumor cells (≥ 70% was required) was analyzed. This is
important as previous studies have shown the impact of tissue heterogeneity on the ability to
detect alterations [32, 33]. Second, for every tumor used, matched non-malignant tissue was
obtained, profiled and used as the control. While it has been shown that unmatched references
can be used to detect UPD, the resultant UPD may not always be called correctly. Finally,
the progression from call-based approaches to allele specific copy number-based approaches
can also increase the detection of UPD [6, 12]. Taken together, these improvements could
explain the observed results. 

While it is interesting to observe these frequent regions of UPD in the lung adenocarcinoma 

genome, the larger implications of these findings may not be readily apparent. In the cases of 

somatically mutated oncogenes or tumor suppressor genes, the existence of UPD in these 

cases is clear as UPD is used to select the mutated allele to result in a homozygous mutation 

state. We have previously shown that mutant allelic specific imbalance (MASI), either through 

allele specific amplification or UPD, is associated with a poorer prognosis [20]. To assess the 

prevalence of UPD at homozygously mutated oncogene sites, we analyzed cancer cell lines 

encompassing multiple cancer types for UPD at mutated oncogenes. While the most frequent 

genomic alteration observed is copy number gain (51% of cases), UPD is also frequent (34%). The distribution

of alterations observed across all genes is consistent with that of the two most frequently mutated

oncogenes, KRAS and BRAF. The result of these UPD events is preferential expression of the

mutated allele. 

It should also be noted that, given the amount of frequent UPD detected, some regions are likely

selected for reasons other than somatic mutation. For example, as in the case of imprinted

regions, there could be preferential selection of an unmethylated or methylated allele which, in

turn, could regulate downstream gene expression. Previous studies have assessed the 

relationship between regions of UPD and DNA methylation patterns in cancer [8, 34, 35]. 

Alternatively, in addition to preferential selection based on methylation, downstream differential

expression could arise because, for a given gene, transcription may

involve only one of the alleles [36-39]; selection may thus be based on transcriptional

efficiency. Hence, it is important that the genetic data on UPD be integrated with methylation

and gene expression data to refine these regions of UPD with many genes to a small number of 

candidate oncogenes and tumor suppressor genes. 

Though many of the regions of UPD identified were large and encompassed a number of 

genes, approximately one fifth of the regions identified contained three or fewer genes. Focusing on these focal regions is

one approach for narrowing down candidate gene targets. Using gene expression data on a

subset of the profiled cases used for UPD, we assessed the gene expression profiles of the 64 

genes encompassed in the 35 focal regions. Of the 64 genes, 57 were represented on the 

gene expression microarray platform used. Fifteen of the 57 genes were overexpressed in at 

least 25% of the samples (4/16) (Table 4.8). In addition to E2F3, other genes within the set of

15 have biological functions of interest. For example, GPR39 has been shown to

activate EGFR signaling as well as protect cells from apoptosis [40, 41]; SLC7A11 has been

shown to have a role in drug resistance [42] and was assessed as a therapeutic target for small 

cell lung cancer [43]; PDGFD has been implicated in many different cancer types [44]; and 

PRDM8, a histone methyltransferase, is a member of the PRDM transcription factor family and 

these factors have been implicated as proto-oncogenes [45]. 
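The frequency filter described above (overexpression in at least 25% of samples, i.e. 4 of 16) can be sketched as follows. This is an illustration only: the gene names and values are hypothetical, and the two-fold cutoff (log2 fold change of 1) used to call overexpression is an assumption, as the cutoff is not restated in this section.

```python
from typing import Dict, List


def overexpression_count(log2_fold_changes: List[float],
                         log2_cutoff: float = 1.0) -> int:
    """Number of tumor/normal pairs with log2 fold change at or above the cutoff.

    The default cutoff (two-fold) is an assumption made for this sketch.
    """
    return sum(1 for value in log2_fold_changes if value >= log2_cutoff)


def frequently_overexpressed(profiles: Dict[str, List[float]],
                             min_fraction: float = 0.25,
                             log2_cutoff: float = 1.0) -> Dict[str, int]:
    """Return genes overexpressed in at least `min_fraction` of samples."""
    selected = {}
    for gene, values in profiles.items():
        count = overexpression_count(values, log2_cutoff)
        if values and count / len(values) >= min_fraction:
            selected[gene] = count
    return selected


# Hypothetical paired log2 fold changes for 16 tumor/normal pairs.
example_profiles = {
    "GENE_A": [1.5] * 10 + [0.2] * 6,   # overexpressed in 10/16 pairs
    "GENE_B": [1.2] * 3 + [0.0] * 13,   # overexpressed in 3/16 pairs
    "GENE_C": [2.0] * 4 + [-0.3] * 12,  # overexpressed in 4/16 pairs
}

if __name__ == "__main__":
    for gene, count in frequently_overexpressed(example_profiles).items():
        print(f"{gene}: overexpressed in {count}/16 pairs")
```

With these hypothetical values, GENE_A and GENE_C pass the filter, while GENE_B (3/16) falls below the 25% threshold.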

4.5 Conclusion 

In summary, we have shown an unexpectedly high prevalence of UPD in the lung 

adenocarcinoma genome, with the amount of the genome affected being comparable

to that affected by copy number gain and loss. While a number of known oncogenes were shown to be in

regions of frequent UPD, potentially novel lung oncogenes have also been shown to be affected

by UPD with a consequent downstream change in gene expression. Further studies are needed

to elucidate their roles in lung adenocarcinoma.

Figure 4.1. Detection of UPD using allele specific copy number. Total copy number (top) 

and allelic specific copy number (bottom) plots. In the allele specific copy number plot, the red 

data points represent the level of the major allele and the blue data points represent the level of 

the minor allele. The total copy number plot represents the sum of the allele specific copy

number. (a) Sample with neutral copy number and no imbalance of chromosome 12. While the 

total copy number is neutral, when examining the allele specific copy number, imbalance 

between the alleles is evident. (b) Sample with regions of copy number gains and UPD (in 

orange) on chromosome 12q. (c) Sample with whole chromosome UPD of chromosome 14. 
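The relationship between total and allele specific copy number described in this caption can be expressed as a small classification rule. The sketch below is a simplification and not the calling method used in this study. It assumes that allele specific copy numbers have already been segmented and rounded to integers, and that the baseline is diploid. The function name is hypothetical.

```python
def classify_segment(major_copies: int, minor_copies: int) -> str:
    """Classify a segment from rounded allele specific copy numbers.

    Total copy number is the sum of the two alleles. A segment with two
    copies in total but no copy of the minor allele is called UPD.
    """
    if minor_copies > major_copies:
        major_copies, minor_copies = minor_copies, major_copies
    total = major_copies + minor_copies
    if total > 2:
        return "gain"
    if total < 2:
        return "loss"
    if minor_copies == 0:
        return "UPD"
    return "neutral"


if __name__ == "__main__":
    examples = {
        "balanced diploid": (1, 1),
        "copy neutral, one allele only": (2, 0),
        "single copy gain": (2, 1),
        "single copy loss": (1, 0),
    }
    for label, (major, minor) in examples.items():
        print(f"{label}: {classify_segment(major, minor)}")
```

Total copy number alone cannot separate the first two examples, since both sum to two copies. Only the allele specific values distinguish a balanced segment from UPD.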

Figure 4.1 [Image not reproduced. Panels: (a) Chromosome 12, (b) Chromosome 12q, (c) Chromosome 14. Each panel shows total copy number (top) and allele specific copy number (bottom); y-axis: # of copies.]

Figure 4.2. Comparison of frequent regions of gain, loss and UPD in the lung 

adenocarcinoma genome. Frequent regions of gain (red), loss (green) and UPD (blue) in the 

lung adenocarcinoma genome. Only regions which were altered in at least 40% of the samples, 

by either gain, loss, or UPD, are shown. Frequent regions of gain (such as 5p, 7p, 8q, 17q and 

20q) and loss (such as 3p, 8p, 9p, 13q), which have previously been shown, are detected. The 

fourth column, composite ("C"), represents areas of overlap between gain and UPD (red) and 

loss and UPD (green). 
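The frequency threshold and the composite column described in this caption can be sketched as follows. The per-sample calls, the bin layout and the function names are hypothetical. Applying the 40% threshold separately to each alteration type is an assumption about how the caption should be read.

```python
from typing import List


def frequent_bins(calls: List[List[str]], alteration: str,
                  min_fraction: float = 0.40) -> List[int]:
    """Indices of genomic bins carrying `alteration` in at least
    `min_fraction` of samples. calls[s][i] is the call for sample s, bin i."""
    n_samples = len(calls)
    n_bins = len(calls[0])
    frequent = []
    for i in range(n_bins):
        count = sum(1 for sample in calls if sample[i] == alteration)
        if count / n_samples >= min_fraction:
            frequent.append(i)
    return frequent


def composite(bins_a: List[int], bins_b: List[int]) -> List[int]:
    """Bins that are frequent for both alteration types (the overlap)."""
    return sorted(set(bins_a) & set(bins_b))


# Hypothetical calls for five samples across four genomic bins.
example_calls = [
    ["gain", "UPD", "neutral", "loss"],
    ["gain", "UPD", "neutral", "neutral"],
    ["UPD", "gain", "neutral", "loss"],
    ["UPD", "neutral", "gain", "neutral"],
    ["neutral", "gain", "neutral", "neutral"],
]

if __name__ == "__main__":
    gain = frequent_bins(example_calls, "gain")
    loss = frequent_bins(example_calls, "loss")
    upd = frequent_bins(example_calls, "UPD")
    print("frequent gain bins:", gain)
    print("frequent loss bins:", loss)
    print("frequent UPD bins:", upd)
    print("gain/UPD composite bins:", composite(gain, upd))
```

In this toy example the first two bins are frequently altered by both gain and UPD in different samples, which is the situation the composite column is meant to highlight.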

Figure 4.2 [Image not reproduced. Chromosomes 1 to 22, each drawn with four columns: G - Gain, L - Loss, U - UPD, C - Composite.]

Figure 4.3 [Image not reproduced. Venn diagram of three sets (Gain, Loss, UPD); values shown in the diagram: 642, 441, 7, 335, 58.]

Figure 4.3. Venn diagram illustrating the amount of the genome covered by frequent

gain, loss, and UPD. Numbers provided are in megabases (Mb) of genome sequence.

Figure 4.4. Genomic profile of an individual lung adenocarcinoma sample. Regions of 

gain (red), loss (green), and UPD (blue) are shown in this single lung adenocarcinoma profile. 

Comparable amounts of the genome are affected by all three of these alterations. 

Figure 4.4 [Image not reproduced. Chromosomes 1 to 22 of a single sample; legend: Gain, Loss, UPD.]

Figure 4.5 [Image not reproduced. Panel (a): chr12 (p12.1-p11.21), KRAS locus, with KRAS UPD regions drawn per sample. Panel (b): chr13 (q13.3-q21.1), RB1 locus, with RB1 UPD regions drawn per sample. Rows are labeled with sample identifiers.]

Figure 4.5. Examination of UPD events at the KRAS and RB1 loci. KRAS shown in (a) and RB1 shown in (b). The

region of UPD encompassing these loci varies in size between samples, with some samples showing larger regions of UPD

than others. The existence of events of different sizes is likely a result of different underlying mechanisms of UPD.

Figure 4.6 [Image not reproduced. Panel (a): percent of cases with Gain, Loss, UPD or Neutral status for All Genes (n=67), KRAS (n=33) and BRAF (n=11). Panel (b): total copy number (# of copies) and allele specific copy number (# of alleles) for A427 (chromosome 7, BRAF) and NCI-H2030 (chromosome 12, KRAS). Panel (c): percent of cases (axis 0 to 50) with Gain, UPD, Loss or Neutral status for KRAS (n=21).]

Figure 4.6. Relationship of homozygous mutation at oncogenes and genomic alteration. 

Using the Wellcome Trust Sanger COSMIC database for somatic mutation data and 

SNP 6.0 data available for over 700 cancer cell lines from their database, prevalence of 

UPD was assessed in this dataset. Specifically, only those cell lines with homozygously

mutated oncogenes were analyzed. (a) In total, 67 instances of homozygous mutation at

an oncogene locus were identified. While a large fraction of cases exhibited copy number

increase (51%), the second most prominent alteration is UPD (34%). Of the 12 different 

genes assessed, KRAS and BRAF are the most frequently homozygously mutated oncogenes 

and those two genes show similar frequency distribution patterns of genomic alteration 

to the whole set. (b) An example of UPD at BRAF in A427 and KRAS in NCI-H2030 

where both BRAF and KRAS are homozygously mutated. (c) With available mutation data 

on KRAS from the 46 lung tumor/matched non-malignant tissue pairs, similar analysis was 

performed and it was found that the patterns of genomic alteration were similar to what was 

observed in cancer cell lines. 

Figure 4.7. Identification of E2F3 in a focal region of UPD. (a) One of the focal regions 

identified was located on chromosomal region 6p22.3. There were only three RefSeq 

annotated genes that were completely encompassed within this region: E2F3, ID4, and 

MBOAT1. The UCSC Genome Browser (genome build hg18) was used to identify genes and 

visualize the region [46]. (b) Analyzing gene expression amongst a subset of the tumors profiled on

the SNP array, it was found that E2F3 was the most frequently overexpressed amongst the 

three genes assessed, with a frequency of overexpression of 62.5%. 

Figure 4.7 [Image not reproduced. Panel (a): chromosome 6p ideogram with cytobands 6p25.3 to 6p11.1. Panel (b): log2 fold change (axis -0.5 to 2.5) for samples 1 to 16.]

Table 4.1. Regions of the genome exhibiting frequent UPD 

Chr BPStart BPEnd # of markers Chr BPStart BPEnd # of markers

1 57240523 57708781 261 6 8651004 9629747 293 

1 88627690 88937185 107 6 18605671 20757531 903 

1 213006484 213325225 153 6 71651506 72037387 152 

2 33915294 34575063 336 6 73271789 76714973 996 

2 102028604 102246459 130 6 82824129 84992961 566 

2 103342214 104703078 369 6 87778426 91268388 1147 

2 107288361 107909534 151 6 97800818 101044085 940 

2 113475801 113597836 110 6 105272060 114467061 2939 

2 123104889 123771933 262 6 116466834 119757979 966 

2 129285355 130086326 353 6 121042746 123214382 558 

2 132852444 133093232 131 6 125234328 126352419 406 

2 134976541 137102451 519 6 130012107 130399246 197 

2 138356995 142103903 1239 6 131442876 133188489 644 

2 148597921 149606831 175 6 134265257 144908460 3331 

2 150732270 153458836 918 6 147524934 170759956 9496 

2 154569594 157096948 576 7 109870908 110924988 272 

2 158283600 163854187 1560 7 119981096 121751585 455 

2 165114877 166441984 353 7 122940488 123963613 280 

2 167549158 172376141 1670 7 125810873 126573092 258 

2 173385819 175183422 649 8 82600012 84910522 562 

2 178506144 178971533 144 8 87475660 91661039 1094 

2 182593441 183843529 379 8 109473113 113912000 1026 

2 185976166 192057947 1482 9 32370194 32717136 127 

2 195998062 198251401 604 10 86487543 87543389 442 

2 214938398 217672315 1092 10 97401670 97992222 155 

2 222098215 223638933 514 10 99819620 100845692 389 

2 224915228 225683208 295 11 7088070 8626013 675 

2 234086493 235425331 673 11 9745400 16571856 2963 

3 38947562 40467146 489 11 17785506 20708448 1441 

3 75597086 77391013 520 11 22527476 27641093 1997 

3 120734475 121565472 248 11 31290899 36427391 2111 

3 126442524 128090913 466 11 37624038 46083889 3066 

3 131310768 131908449 194 11 78488179 78811363 210 

3 133156739 140816203 2231 11 81391754 83571743 737 

3 141841646 145572883 1095 11 85133890 86695809 613 

3 148635117 153815363 1592 11 92858003 94403193 618 

3 154850963 162426559 2109 11 99344667 102033351 1020 

3 163593089 164340331 178 11 103294671 104342367 382 

3 171033453 175715353 1551 11 106728968 107654049 275 

3 179650656 194058084 4585 11 111090113 111676300 118 

4 56867092 57264514 111 11 114516077 115632014 460 

4 59586524 62756073 878 11 121222174 122326244 431 

4 68014200 73981908 1418 11 127365170 128934004 622 

4 75048874 79252113 1473 11 131240209 132411274 555 

4 81069345 81632642 125 12 4901875 5859088 616 

4 83519602 86772381 1036 12 7251496 15972576 2966 

4 95266916 96342489 299 12 19068978 20077939 414 

4 99713479 100262528 179 12 21583225 28079788 2489 

4 102215835 109269168 1774 12 29252288 31395946 1005 

4 110431176 111696150 384 12 36144018 38218718 356 

4 113392931 114074009 165 12 43846459 44709526 206 

4 119368629 120609480 324 12 46714417 47193357 116 

4 122057241 123147706 383 12 49541041 50471244 235 

4 128685983 130719464 455 12 53439918 53858695 150 

4 138792901 139993006 454 12 74639231 75745944 244 

5 52835375 56296363 1274 12 92376456 96256397 1605 

5 59963217 62083765 600 12 97428291 100976394 1075 

5 64017546 65834248 562 12 102585961 120121516 6328 

5 70702961 72963048 707 12 124477764 127353782 1525 

5 74029627 81781885 2453 12 128730700 129400678 383 

5 86329552 88138619 328 13 17943628 32731810 6266 

5 90373713 90752434 102 13 35198318 36512766 543 

5 95109222 96467147 461 13 39604817 41933751 808 

5 98108008 98876920 190 13 43687180 52149041 2627 

5 106982444 108758672 584 13 80666670 81857944 333 

5 113412673 114060668 291 13 97080915 99917507 1076 

5 115177955 116152174 521 14 26463057 27074470 206 

5 118498618 118830120 101 14 39744666 40655458 316 

5 123995175 124348209 112 15 23337531 24061503 396 

5 130208827 131907815 390 17 1646832 4192054 695 

5 139359170 140952015 262 17 5443385 6771106 606 

5 145140525 146950721 652 17 8395366 9333459 369 

5 148164516 149400891 515 17 10552154 14949757 1968 

5 153600774 154479059 292 20 8028477 9014932 524 

5 156203870 159492184 1245 20 53205763 53952115 387 

5 162483304 163524481 416 

5 165711472 166223866 189 

5 180068185 180629495 176 

Table 4.2. List of major oncogenes and tumor suppressor genes assessed 

Gene Chr Gene Chr Gene Chr 

ABL1 9 EVI1 3 NF2 22 

ABL2 1 FBXW7 4 NKX2-1 14 

AKT1 14 FEV 2 NOTCH1 9 

AKT2 19 FGFR1 8 NRAS 1 

ALK 2 FGFR2 10 NTRK1 1 

APC 5 FGFR3 4 NTRK3 15 

ATM 11 FH 1 PDGFB 22 

BCL2 18 FHIT 3 PDGFRA 4 

BCL3 19 FLT3 13 PDGFRB 5 

BCL6 3 FOXO1A 13 PHOX2B 4 

BMPR1A 10 FOXO3A 6 PIK3CA 3 

BRAF 7 FOXP1 3 PIK3R1 5 

BRCA1 17 GNAS 20 PIM1 6 

BRCA2 13 GSTP1 11 PRKAR1A 17 

BUB1B 15 HRAS 11 PTCH 9 

CAV1 7 HRPT2 1 PTEN 10 

CBL 11 ITK 5 PTPN11 12 

CCND1 11 JAK2 9 RARB 3 

CCND2 12 JAK3 19 RASSF1A 3 

CCND3 6 KIT 4 RB1 13 

CD44 11 KRAS 12 REL 2 

CDH1 16 LCK 1 RET 10 

CDH11 16 MAF 16 RUNX1 21 

CDH13 16 MAFB 20 SEMA3B 3 

CDK4 12 MAML2 11 SMO 7 

CDK6 7 MAP2K4 17 STK11 19 

CDKN2A 9 MDM2 12 SUFU 10 

CEBPA 11 MEN1 11 SYK 9 

CHEK2 22 MET 7 TCF1 12 

CRK 17 MLH1 3 TIMP3 22 

CTNNB1 3 MLL 11 TP53 17 

CYLD 16 MPL 1 TSC1 9 

DAPK1 9 MSH2 2 TSC2 16 

EGFR 7 MSH6 2 TSHR 14 

ERBB2 17 MYC 8 VHL 3 

ERCC2 19 MYCL1 1 WT1 11 

ERG 21 MYCN 2 

ETV6 12 NF1 17 

Table 4.3. Overlap of oncogenes in frequent regions of genomic alteration 

Gene Symbol Location Gain Loss UPD

ABL1 9q34.1 X 

ABL2 1q24-q25 X 

AKT1 14q32.32 X 

AKT2 19q13.1-q13.2 X

BCL6 3q27 X 

CCND1 11q13 X 

CCND3 6p21 X 

CD44 11p13 X 

CDK4 12q14 X 

CEBPA 19q13.11 X 

CRK 17p13.3 X 

EGFR 7p12.3-p12.1 X 

ERBB2 17q21.1 X 

ETV6 12p13 X 

FEV 2q36 X 

FGFR3 4p16.3 X 

FLT3 13q12 X 

GNAS 20q13.2 X 

HRAS 11p15.5 X 

ITK 5q31-q32 X 

KRAS 12p12.1 X 

LCK 1p35-p34.3 X 

MAFB 20q11.2-q13.1 X

MDM2 12q15 X 

MEN1 11q13 X 

MPL 1p34 X 

MYC 8q24.12-q24.13 X

MYCL1 1p34.3 X 

NOTCH1 9q34.3 X 

NTRK1 1q21-q22 X 

PDGFB 22q12.3-q13.1 X

PDGFRB 5q31-q32 X 

PIK3CA 3q26.3 X 

PIM1 6p21.2 X 

PRKAR1A 17q23-q24 X 

SMO 7q31-q32 X 

Table 4.4. Overlap of tumor suppressor genes in frequent regions of genomic alteration 

Gene Symbol Location Gain Loss UPD

BRCA1 17q21 X 

BRCA2 13q12 X X 

CDH1 16q22.1 X 

CDKN2A 9p21 X 

CYLD 16q12-q13 X 

FH 1q42.1 X 

FHIT 3p14.2 X 

GSTP1 X 

MAP2K4 17p11.2 X X 

NF1 17q12 X 

PTPN11 12q24.1 X 

RARB 3p24.2 X 

RB1 13q14 X X 

TSC1 9q34 X 

TSC2 16p13.3 X 

WT1 11p13 X 

Table 4.5. Cell lines and oncogene loci with homozygous mutation 

Sample Primary Tissue Gene Sample Primary Tissue Gene 

EFM-19 breast PIK3CA NCI-H460 lung KRAS 

NCI-ADR-RES breast ERBB2 NCI-H727 lung KRAS 

OCUB-M breast PIK3CA PC-14 lung EGFR 

AM-38 central nervous system BRAF SHP-77 lung KRAS 

OMC-1 cervix PIK3CA SW1573 lung KRAS 

HEC-1 endometrium KRAS KYSE-450 oesophagus NOTCH1 

ECC4 gastrointestinal tract KRAS OVCAR-5 ovary KRAS 

BE-13 haematopoietic and lymphoid tissue NOTCH1 AsPC-1 pancreas KRAS

HEL haematopoietic and lymphoid tissue JAK2 CAPAN-1 pancreas KRAS

OPM-2 haematopoietic and lymphoid tissue FGFR3 HuP-T4 pancreas KRAS

LS-174T large intestine CTNNB1 MIA-PaCa-2 pancreas KRAS 

LS-411N large intestine BRAF PANC-08-13 pancreas KRAS 

RCM-1 large intestine KRAS SW1990 pancreas KRAS 

SK-CO-1 large intestine KRAS YAPC pancreas KRAS 

SNU-C2B large intestine KRAS A375 skin BRAF 

SW1463 large intestine KRAS COLO-679 skin BRAF 

SW403 large intestine KRAS CP66-MEL skin NRAS 

SW620 large intestine KRAS GAK skin NRAS 

A427 lung CTNNB1 HT-144 skin BRAF

A549 lung KRAS MEL-HO skin BRAF 

COLO-668 lung KRAS MEL-JUSO skin HRAS 

COR-L23 lung KRAS SH-4 skin BRAF 

COR-L23 lung RUNX1 SK-MEL-2 skin NRAS 

IA-LM lung KRAS SK-MEL-28 skin BRAF 

LCLC-97TM1 lung KRAS SK-MEL-28 skin EGFR 

LU-65 lung KRAS UACC-62 skin BRAF 

NCI-H1092 lung CTNNB1 RD soft tissue NRAS

NCI-H1155 lung KRAS BCPAP thyroid BRAF 

NCI-H1395 lung BRAF CAL-62 thyroid KRAS 

NCI-H1793 lung KRAS BB49-HNC upper aerodigestive tract HRAS

NCI-H2030 lung KRAS 639-V urinary tract PIK3CA 

NCI-H2122 lung KRAS T-24 urinary tract HRAS 

NCI-H2291 lung KRAS UM-UC-3 urinary tract KRAS 

NCI-H2347 lung NRAS 

Table 4.6. Summary of homozygous mutation analysis in cancer cell lines 

Gene # of Hz mutations # UPD # Gain # Loss # Neutral 

KRAS 33 10 18 4 1 

BRAF 11 4 6 1 0 

NRAS 5 1 3 1 0 

PIK3CA 4 1 2 1 0 

CTNNB1 3 2 0 1 0 

HRAS 3 2 1 0 0 

EGFR 2 1 1 0 0 

NOTCH1 2 0 1 1 0 

FGFR3 1 0 1 0 0 

JAK2 1 0 1 0 0 

ERBB2 1 1 0 0 0 

RUNX1 1 1 0 0 0 

Total 67 23 34 9 1 
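The percentages quoted for this analysis follow directly from the counts in Table 4.6. The short sketch below transcribes the table and recomputes the column totals and percentages. The variable and function names are chosen for illustration only.

```python
from typing import Dict, Tuple

COLUMNS = ("UPD", "Gain", "Loss", "Neutral")

# Counts transcribed from Table 4.6: gene -> (UPD, Gain, Loss, Neutral).
TABLE_4_6: Dict[str, Tuple[int, int, int, int]] = {
    "KRAS": (10, 18, 4, 1),
    "BRAF": (4, 6, 1, 0),
    "NRAS": (1, 3, 1, 0),
    "PIK3CA": (1, 2, 1, 0),
    "CTNNB1": (2, 0, 1, 0),
    "HRAS": (2, 1, 0, 0),
    "EGFR": (1, 1, 0, 0),
    "NOTCH1": (0, 1, 1, 0),
    "FGFR3": (0, 1, 0, 0),
    "JAK2": (0, 1, 0, 0),
    "ERBB2": (1, 0, 0, 0),
    "RUNX1": (1, 0, 0, 0),
}


def column_totals(table: Dict[str, Tuple[int, ...]]) -> Tuple[int, ...]:
    """Sum each alteration column across all genes."""
    return tuple(sum(column) for column in zip(*table.values()))


def percentages(counts: Tuple[int, ...]) -> Tuple[float, ...]:
    """Convert counts to percentages of their total, rounded to one decimal."""
    total = sum(counts)
    return tuple(round(100.0 * count / total, 1) for count in counts)


if __name__ == "__main__":
    totals = column_totals(TABLE_4_6)
    print("All genes:", dict(zip(COLUMNS, percentages(totals))))
    for gene in ("KRAS", "BRAF"):
        print(f"{gene}:", dict(zip(COLUMNS, percentages(TABLE_4_6[gene]))))
```

The totals reproduce the final row of the table (23 UPD, 34 gain, 9 loss and 1 neutral, out of 67), corresponding to approximately 34% UPD and 51% gain.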

Table 4.7. RefSeq genes in focal regions of UPD 

Gene Symbol Chr Gene Symbol Chr 

DAB1 1 TNFAIP8 5 

IL1R1 2 ZNF608 5 

IL1RL2 2 E2F3 6 

GPR39 2 ID4 6 

EPC2 2 MBOAT1 6 

KIF5C 2 B3GAT2 6 

MBD5 2 C6orf191 6 

OSBPL6 2 IMMP2L 7 

RBM45 2 LRRN3 7 

CUL3 2 GRM8 7 

DOCK10 2 ODZ4 11 

FAM124B 2 CASP4 11 

FRG2C 3 DDI1 11 

ZNF717 3 PDGFD 11 

COL29A1 3 CADM1 11 

COL6A6 3 HNT 11 

LPHN3 4 OPCML 11 

ANTXR2 4 ANO2 12 

FGF5 4 KCNA5 12 

PRDM8 4 NTF3 12 

ADH5 4 AEBP2 12 

EIF4E 4 PLEKHA5 12 

METAP1 4 ALG10B 12 

SLC7A11 4 CPNE8 12 

ARRDC3 5 KIF21A 12 

CHD1 5 TMEM132B 12 

RGMB 5 FZD10 12 

FBXL17 5 ZNF10 12 

FER 5 ZNF140 12 

PJA2 5 ZNF268 12 

KCNN2 5 ATP10A 15 

DMXL1 5 PLCB1 20 

Table 4.8. Genes overexpressed in focal regions of UPD 

Probe ID Gene Symbol Frequency of Overexpression

merck-NM_001508_a_at GPR39 13 

merck-AJ270693_at PLEKHA5 12 

merck-NM_014331_at SLC7A11 10 

merck-NM_001949_at E2F3 10 

merck-C17174_at CUL3 9 

merck-AA651853_at ARRDC3 7 

merck-AF336376_a_at PDGFD 6 

merck-CR624190_a_at KIF21A 6 

merck-NM_016522_at HNT 5 

merck-NM_003854_at IL1RL2 5 

merck-NM_000845_at GRM8 5 

merck-AY358331_s_at HNT 5 

merck-CR625009_at ZNF140 5 

merck-X52332_a_at ZNF10 5 

merck-AK127693_s_at PLCB1 4 

merck-NM_020226_at PRDM8 4


1. Bell DW: Our changing view of the genomic landscape of cancer. J Pathol 2010, 

220(2):231-243. 

2. Chari R, Thu KL, Wilson IM, Lockwood WW, Lonergan KM, Coe BP, Malloff CA, Gazdar 

AF, Lam S, Garnis C et al: Integrating the multiple dimensions of genomic and 

epigenomic landscapes of cancer. Cancer Metastasis Rev 2010. 

3. Zhu X, Dunn JM, Goddard AD, Squire JA, Becker A, Phillips RA, Gallie BL: 

Mechanisms of loss of heterozygosity in retinoblastoma. Cytogenet Cell Genet 

1992, 59(4):248-252. 


15(3):120-128. 

5. Li C, Beroukhim R, Weir BA, Winckler W, Garraway LA, Sellers WR, Meyerson M: Major 

copy proportion analysis of tumor samples using SNP arrays. BMC Bioinformatics 

2008, 9:204. 





Genet 2007, 81(1):114-126. 

7. Andersen CL, Wiuf C, Kruhoffer M, Korsgaard M, Laurberg S, Orntoft TF: Frequent 

occurrence of uniparental disomy in colorectal cancer. Carcinogenesis 2007, 

28(1):38-48. 

8. Darbary HK, Dutt SS, Sait SJ, Nowak NJ, Heinaman RE, Stoler DL, Anderson GR: 

Uniparentalism in sporadic colorectal cancer is independent of imprint status, and 

coordinate for chromosomes 14 and 18. Cancer Genet Cytogenet 2009, 189(2):77- 

86. 

9. Fitzgibbon J, Iqbal S, Davies A, O'Shea D, Carlotti E, Chaplin T, Matthews J, Raghavan 

M, Norton A, Lister TA et al: Genome-wide detection of recurring sites of 

uniparental disomy in follicular and transformed follicular lymphoma. Leukemia 

2007, 21(7):1514-1520. 

10. Kawamata N, Ogawa S, Seeger K, Kirschner-Schwabe R, Huynh T, Chen J, Megrabian 

N, Harbott J, Zimmermann M, Henze G et al: Molecular allelokaryotyping of relapsed 

pediatric acute lymphoblastic leukemia. Int J Oncol 2009, 34(6):1603-1612. 

11. Gondek LP, Tiu R, O'Keefe CL, Sekeres MA, Theil KS, Maciejewski JP: Chromosomal 

lesions and uniparental disomy detected by SNP arrays in MDS, MDS/MPD, and 

MDS-derived AML. Blood 2008, 111(3):1534-1542. 

12. Sanada M, Suzuki T, Shih LY, Otsu M, Kato M, Yamazaki S, Tamura A, Honda H, 

Sakata-Yanagimoto M, Kumano K et al: Gain-of-function of mutated C-CBL tumour 

suppressor in myeloid neoplasms. Nature 2009, 460(7257):904-908. 

13. Tiu RV, Gondek LP, O'Keefe CL, Huh J, Sekeres MA, Elson P, McDevitt MA, Wang XF, 

Levis MJ, Karp JE et al: New lesions detected by single nucleotide polymorphism 

array-based chromosomal analysis have important clinical impact in acute 

myeloid leukemia. J Clin Oncol 2009, 27(31):5219-5226. 

14. Teh MT, Blaydon D, Chaplin T, Foot NJ, Skoulakis S, Raghavan M, Harwood CA, Proby 

CM, Philpott MP, Young BD et al: Genomewide single nucleotide polymorphism 

microarray mapping in basal cell carcinomas unveils uniparental disomy as a key 

somatic event. Cancer Res 2005, 65(19):8597-8603. 

15. Suzuki M, Kato M, Yuyan C, Takita J, Sanada M, Nannya Y, Yamamoto G, Takahashi A, 

Ikeda H, Kuwano H et al: Whole-genome profiling of chromosomal aberrations in 

hepatoblastoma using high-density single-nucleotide polymorphism genotyping 

microarrays. Cancer Sci 2008, 99(3):564-570. 

16. Walsh CS, Ogawa S, Scoles DR, Miller CW, Kawamata N, Narod SA, Koeffler HP, 

Karlan BY: Genome-wide loss of heterozygosity and uniparental disomy in 

BRCA1/2-associated ovarian carcinomas. Clin Cancer Res 2008, 14(23):7645-7651. 

17. Kralovics R, Guan Y, Prchal JT: Acquired uniparental disomy of chromosome 9p is 

a frequent stem cell defect in polycythemia vera. Exp Hematol 2002, 30(3):229-236. 

18. Grand FH, Hidalgo-Curtis CE, Ernst T, Zoi K, Zoi C, McGuire C, Kreil S, Jones A, Score 

J, Metzgeroth G et al: Frequent CBL mutations associated with 11q acquired 

uniparental disomy in myeloproliferative neoplasms. Blood 2009, 113(24):6182- 

6192. 

19. Fitzgibbon J, Smith LL, Raghavan M, Smith ML, Debernardi S, Skoulakis S, Lillington D, 

Lister TA, Young BD: Association between acquired uniparental disomy and 

homozygous gene mutation in acute myeloid leukemias. Cancer Res 2005, 

65(20):9152-9154. 





21. Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U, Speed TP: 

Exploration, normalization, and summaries of high density oligonucleotide array 

probe level data. Biostatistics 2003, 4(2):249-264. 

22. Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, 

Ge Y, Gentry J et al: Bioconductor: open software development for computational 

biology and bioinformatics. Genome Biol 2004, 5(10):R80. 

23. Ding L, Getz G, Wheeler DA, Mardis ER, McLellan MD, Cibulskis K, Sougnez C, 

Greulich H, Muzny DM, Morgan MB et al: Somatic mutations affect key pathways in 

lung adenocarcinoma. Nature 2008, 455(7216):1069-1075. 

24. Futreal PA, Coin L, Marshall M, Down T, Hubbard T, Wooster R, Rahman N, Stratton 

MR: A census of human cancer genes. Nat Rev Cancer 2004, 4(3):177-183. 






Nature 2007, 450(7171):893-898. 




28. Kendall J, Liu Q, Bakleh A, Krasnitz A, Nguyen KC, Lakshmi B, Gerald WL, Powers S, 

Mu D: Oncogenic cooperation and coamplification of developmental transcription 

factor genes in lung cancer. Proc Natl Acad Sci U S A 2007, 104(42):16663-16668. 

29. Garnis C, Lockwood WW, Vucic E, Ge Y, Girard L, Minna JD, Gazdar AF, Lam S, 

MacAulay C, Lam WL: High resolution analysis of non-small cell lung cancer cell 

lines by whole genome tiling path array CGH. Int J Cancer 2006, 118(6):1556-1564. 

30. Borczuk AC, Gorenstein L, Walter KL, Assaad AA, Wang L, Powell CA: Non-small-cell 

lung cancer molecular signatures recapitulate lung developmental pathways. Am J 

Pathol 2003, 163(5):1949-1960. 

31. Cooper CS, Nicholson AG, Foster C, Dodson A, Edwards S, Fletcher A, Roe T, Clark J, 

Joshi A, Norman A et al: Nuclear overexpression of the E2F3 transcription factor in 

human lung cancer. Lung Cancer 2006, 54(2):155-162. 

32. Goransson H, Edlund K, Rydaker M, Rasmussen M, Winquist J, Ekman S, Bergqvist M, 

Thomas A, Lambe M, Rosenquist R et al: Quantification of normal cell fraction and 

copy number neutral LOH in clinical lung cancer samples using SNP array data. 

PLoS One 2009, 4(6):e6057. 

33. Garnis C, Coe BP, Lam SL, MacAulay C, Lam WL: High-resolution array CGH 

increases heterogeneity tolerance in the analysis of clinical samples. Genomics 

2005, 85(6):790-793. 

34. Raghavan M, Lillington DM, Skoulakis S, Debernardi S, Chaplin T, Foot NJ, Lister TA, 

Young BD: Genome-wide single nucleotide polymorphism analysis reveals 

frequent partial uniparental disomy due to somatic recombination in acute 

myeloid leukemias. Cancer Res 2005, 65(2):375-378. 

35. Haruta M, Arai Y, Sugawara W, Watanabe N, Honda S, Ohshima J, Soejima H, 

Nakadate H, Okita H, Hata J et al: Duplication of paternal IGF2 or loss of maternal 

IGF2 imprinting occurs in half of Wilms tumors with various structural WT1 

abnormalities. Genes Chromosomes Cancer 2008, 47(8):712-727. 

36. Bjornsson HT, Albert TJ, Ladd-Acosta CM, Green RD, Rongione MA, Middle CM, 

Irizarry RA, Broman KW, Feinberg AP: SNP-specific array-based allele-specific 

expression analysis. Genome Res 2008, 18(5):771-779. 

37. Gimelbrant A, Hutchinson JN, Thompson BR, Chess A: Widespread monoallelic 

expression on human autosomes. Science 2007, 318(5853):1136-1140. 

38. Palacios R, Gazave E, Goni J, Piedrafita G, Fernando O, Navarro A, Villoslada P: 

Allele-specific gene expression is widespread across the genome and biological 

processes. PLoS One 2009, 4(1):e4150. 

39. Zhang K, Li JB, Gao Y, Egli D, Xie B, Deng J, Li Z, Lee JH, Aach J, Leproust EM et al: 

Digital RNA allelotyping reveals tissue-specific and allele-specific gene 

expression in human. Nat Methods 2009, 6(8):613-618. 

40. Alvarez CJ, Lodeiro M, Theodoropoulou M, Camina JP, Casanueva FF, Pazos Y: 

Obestatin stimulates Akt signalling in gastric cancer cells through beta-arrestin-mediated

epidermal growth factor receptor transactivation. Endocr Relat Cancer 

2009, 16(2):599-611. 

41. Dittmer S, Sahin M, Pantlen A, Saxena A, Toutzaris D, Pina AL, Geerts A, Golz S, 

Methner A: The constitutively active orphan G-protein-coupled receptor GPR39 

protects from cell death by increasing secretion of pigment epithelium-derived 

growth factor. J Biol Chem 2008, 283(11):7074-7081. 

42. Lo M, Ling V, Wang YZ, Gout PW: The xc- cystine/glutamate antiporter: a mediator 

of pancreatic cancer growth with a role in drug resistance. Br J Cancer 2008, 

99(3):464-472. 

43. Guan J, Lo M, Dockery P, Mahon S, Karp CM, Buckley AR, Lam S, Gout PW, Wang YZ: 

The xc- cystine/glutamate antiporter as a potential therapeutic target for small-cell 

lung cancer: use of sulfasalazine. Cancer Chemother Pharmacol 2009, 64(3):463- 

472. 

44. Wang Z, Kong D, Li Y, Sarkar FH: PDGF-D signaling: a novel target in cancer 

therapy. Curr Drug Targets 2009, 10(1):38-41. 

45. Kinameri E, Inoue T, Aruga J, Imayoshi I, Kageyama R, Shimogori T, Moore AW: Prdm 

proto-oncogene transcription factor family expression and interaction with the 

Notch-Hes pathway in mouse neurogenesis. PLoS One 2008, 3(12):e3859. 

46. Rhead B, Karolchik D, Kuhn RM, Hinrichs AS, Zweig AS, Fujita PA, Diekhans M, Smith 

KE, Rosenbloom KR, Raney BJ et al: The UCSC Genome Browser database: update 

2010. Nucleic Acids Res 2010, 38(Database issue):D613-619. 

Chapter 5: Integrating the multiple dimensions of genomic and 

epigenomic landscapes of cancer 4 

4 Sections 5.1 to 5.3, 5.4.1 to 5.4.4, 5.5 and 5.6 of this chapter have been published. Chari R,

Thu KL, Wilson IM, Lockwood WW, Lonergan KM, Coe BP, Malloff CA, Gazdar AF, Lam S, 

Garnis C, MacAulay CE, Alvarez CE, Lam WL. (2010) Integrating the multiple dimensions of 

genomic and epigenomic landscapes of cancer. Cancer and Metastasis Reviews, 29(1):73-93. 

doi: 10.1007/s10555-010-9199-2. Sections 5.4.5 to 5.4.6 were not published previously. 


In the past decade, advancements in genome profiling technologies have greatly improved our 

ability to understand the landscape of cancer genomes. From the emergence of array based 

comparative genomic hybridization (CGH) and spectral karyotyping (SKY) to the current state of 

next generation sequencing (NGS), the improvement in resolution at which the genome can be 

described has been over a million fold [1-6]. Likewise, the recent development of integrative 

platforms to relate multiple dimensions of DNA features (such as copy number, allelic status, 

sequence mutations, and DNA methylation) to gene expression pattern, has dramatically 

improved our ability to identify causal genetic events and decipher their downstream 

consequences in the context of gene networks and biological functions [7, 8] (Table 5.1). 

Landmark events in cancer genomics, from the launch of Cancer Genome Anatomy Project at 

the beginning of the decade to the recent publications of complete cancer genome sequences, 

are highlighted in Figure 5.1 [3-6, 8-43]. 

Multiple levels of genetic and epigenetic disruption are instrumental to cancer development, 

whereby specific genes may be altered by a variety of mechanisms. For example, the tumor 

suppressor CDKN2A can be inactivated through copy number loss, DNA hypermethylation, or 

sequence mutation. These mechanisms of disruption can occur in a tumor-specific manner or

may occur concurrently in the same tumor, i.e. a two-hit scenario. Moreover, in the former

situation, if the frequency of alteration of a given gene or pathway is low when examined by one

mechanism or dimension, the gene or pathway would likely be overlooked by the analysis.

However, when multiple dimensions of disruption are considered in the analyses, alteration of 

the gene in question may be detected at a high frequency, albeit at low frequencies by any one 

mechanism. This illustrates the need for and the benefit of integrative analytical approaches. In 

this article, we discuss the impact of multi-dimensional genomic analyses on our view of the 

cancer genome landscape, and the contribution of such new knowledge to our understanding of 

cancer progression and metastasis. 
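The argument for integration can be made concrete with a small calculation. In the sketch below, the sample identifiers, the cohort size and the alteration calls are hypothetical and serve only to illustrate how a gene altered at a low frequency by each individual mechanism can be altered at a high frequency when all mechanisms are considered together.

```python
from typing import Dict, Set


def alteration_frequencies(alterations: Dict[str, Set[str]],
                           n_samples: int) -> Dict[str, float]:
    """Frequency of alteration per dimension, plus the frequency of samples
    altered by at least one mechanism (the union across dimensions)."""
    frequencies = {
        dimension: len(samples) / n_samples
        for dimension, samples in alterations.items()
    }
    altered_by_any = set().union(*alterations.values())
    frequencies["any mechanism"] = len(altered_by_any) / n_samples
    return frequencies


# Hypothetical calls for one tumor suppressor gene in a cohort of 20 tumors.
example_gene = {
    "copy number loss": {"T01", "T02", "T03"},
    "DNA hypermethylation": {"T04", "T05", "T06", "T07"},
    "sequence mutation": {"T03", "T08", "T09"},  # T03 carries two hits
}

if __name__ == "__main__":
    for dimension, frequency in alteration_frequencies(example_gene, 20).items():
        print(f"{dimension}: {frequency:.0%}")
```

Here no single dimension exceeds 20%, yet 9 of the 20 hypothetical tumors (45%) carry at least one alteration of the gene. The union is used rather than a sum so that a tumor with two hits, such as T03, is counted once.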

5.2 Genomic alterations 

5.2.1 Chromosomal aberrations 

Chromosomal aberrations and rearrangements, such as translocations and gains/losses of 

whole or portions of chromosome arms are detected through direct examination using molecular 

cytogenetic techniques such as G-banding, SKY, fluorescence in situ hybridization (FISH) and 

CGH [2, 44-48]. The manifestation of such alterations is generally attributed to mitotic errors,

where centrosomal aberrations and telomere dysfunction play key causative roles [49-53]. 

Aberrations such as gains and losses have been further refined using technologies such as 

microarray CGH (see below). While translocations are primarily associated with different types of leukemia and

lymphoma, recent genomic studies have identified them in epithelial tumors such as

prostate and lung cancer [54-61]. A compilation of cumulative cytogenetic data from three main

sources (NCI/NCBI SKY/M-FISH & CGH Database, NCI Mitelman Database of Chromosome

Aberrations in Cancer, and NCI Recurrent Aberrations in Cancer) is now integrated into

NCBI's Entrez system as Cancer Chromosomes [62] (Table 5.2). 

5.2.2 Gene dosage, allelic imbalance, mutational status 

Gene dosage. Genomic DNA copy number alterations are a prominent mechanism of gene 

disruption that contributes to tumor development [63]. Segmental amplification may lead to an 

increase in gene and protein expression of oncogenes, while deletions may lead to 

haploinsufficiency or the loss of expression of tumor suppressor genes. Since the development of 

microarray-based CGH in the mid-1990s, advances in the technology have dramatically increased 

genome coverage and target density, improving both the resolution and sensitivity of detection 

of copy number alterations [64, 65]. The first genome-wide array CGH analysis utilized cDNA 

microarrays originally designed for gene expression profiling [66]. Since these first 

experiments, whole genome tiling path arrays with tens of thousands of bacterial artificial 

chromosome (BAC) clones, oligonucleotide (25-80 bp nucleotide probes) and single nucleotide 


polymorphism (SNP) arrays with over one million DNA elements and the essential 

bioinformatics tools for visualization and analysis of high density array CGH data have been 

developed (Figure 5.1) [7, 33, 67-71]. These innovations have enabled increasingly precise 

mapping of the boundaries and magnitude of genetic alterations throughout the genome in a 

single experiment, greatly increasing our understanding of the cancer genome landscape in the 

context of DNA copy number [33, 72-76]. While early attempts have been made utilizing 

sequence-based approaches [77-80], recent studies have begun to illustrate the improvement in 

detection resolution through the advances in high throughput sequencing technologies [6, 11, 

13, 14]. The popularity of genome sequencing will depend on further cost reduction in data 

generation and major advancements in analysis [81]. 

Copy number variation. The discovery of a vast abundance of germ line segmental DNA copy 

number variation (CNV) in the normal human population has not only provided a baseline for 

interpretation of cancer genome data, but also highlighted the need for comparison against 

paired normal tissue [18, 19, 31, 32, 82-89]. Moreover, it has been shown that many of the 

reported CNVs overlap with loci involved with sensory perception and more importantly, disease 

susceptibility. While the role of CNV in cancer is not well understood, a recent study showed 

that these regions are more susceptible to genomic rearrangement and may initiate subsequent 

alterations during tumorigenesis [90]. Moreover, CNV at 1q21.1 was recently shown to be 

associated with neuroblastoma and implicated NBPF23, a new member of the Neuroblastoma 

Breakpoint Family, in tumorigenesis [91]. A database of all known CNVs is available at 

http://projects.tcag.ca/variation [31]. In addition, as copy number profiles of cancer genomes 

accumulate, hotspots for amplification and deletion are becoming evident, and signature 

alterations associated with specific diseases and cancer histologic subtypes are emerging [92- 

96]. The manifestation of “oncogene addiction” through lineage specific DNA amplification is a 

case in point [38, 39, 97-100]. 


Allelic status. Single nucleotide polymorphism (SNP) arrays are best known for their application 

in genome wide association studies (GWAS), where the correlation of haplotype with phenotype 

implicates disease susceptibility [101, 102]. SNP array platforms have shown tremendous 

advances in resolution, with the number of SNPs that can be simultaneously measured 

increased by 1000-fold since initial development. Currently, for example, the Affymetrix SNP 

6.0 array platform measures 1.8 million elements representing 906,600 SNP elements and > 

946,000 CNV elements. Likewise, on the Illumina HumanOmni1 platform, over 1,000,000 sites 

(representing a mixture of SNP and CNV elements) can be simultaneously assessed. In 

addition to their application in GWAS, SNP arrays can also be used to detect somatic 

alterations and when applied in this context, can allow for the simultaneous detection of copy 

number alteration and allele imbalance in tumor genomes. In the example in Figure 5.2, when 

the SNP array profile of a lung cancer genome is compared against that of its paired non- 

cancerous lung tissue, it is not only possible to distinguish regions of allelic balanced copy 

neutrality (Figure 5.2a) from allelic imbalance (Figure 5.2b, 5.2c), but also regions of allelic 

imbalance due to segmental DNA copy number alteration (Figure 5.2b) from those without 

change in total copy number (Figure 5.2c). 
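
The three states distinguished in Figure 5.2 can be expressed as a simple decision rule. The sketch below is illustrative only: it assumes each segment has already been summarized as a pair of allele-specific copy numbers (the hypothetical inputs `major` and `minor`) relative to a diploid paired normal, which is not necessarily how the SNP array software used here reports its results.

```python
def classify_segment(major: int, minor: int) -> str:
    """Classify a tumor segment from allele-specific copy numbers.

    `major` and `minor` are hypothetical integer copy numbers of the two
    parental alleles (major >= minor); the paired normal is assumed to be
    diploid (1 + 1).
    """
    if major < minor or minor < 0:
        raise ValueError("expected major >= minor >= 0")
    total = major + minor
    if major == minor:
        # Both alleles equally represented: no allelic imbalance.
        return "balanced, copy neutral" if total == 2 else "balanced, copy number change"
    if total == 2:
        # Imbalance without a change in total copy number (e.g. 2 + 0).
        return "allelic imbalance, copy neutral"
    return "allelic imbalance with copy number " + ("gain" if total > 2 else "loss")


for major, minor in [(1, 1), (2, 1), (1, 0), (2, 0)]:
    print((major, minor), "->", classify_segment(major, minor))
```

The last case (2 + 0) corresponds to imbalance without a change in total copy number, the situation shown in Figure 5.2c.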

Mutational profiling and whole genome sequencing. In cancer, oncogenes are thought to 

harbor mutations which lead to increased protein expression or constitutive protein activation 

while tumor suppressor genes are thought to harbor mutations which are inactivating, either 

through total loss of protein expression or expression of mutant, non-functional protein. In 

addition, activating and inactivating mutations can also be accompanied by changes in gene 

dosage or allele status (see below). Traditionally, mutation screening has been focused on 

specific oncogene and tumor suppressor loci. With the availability of newer and cheaper 

sequencing technologies [103], recent studies have expanded from single gene analyses to 

genome-wide screens [6, 11, 13, 14, 104]. For example, in studies using small cell lung cancer 

and melanoma cell lines, tens of thousands of somatic mutations were identified in each cell 

line, with a proportion of these mutations being attributed to cigarette smoke (G to T 


substitutions) and UV exposure (C to T), respectively [4, 5]. It will be interesting to see if other 

cancers have such mutation signatures. Another observation made in both studies was that the 

uneven distribution of mutations suggests that DNA sequence integrity is largely maintained by 

transcription-associated DNA repair. While these and future studies will uncover a vast number 

of mutations, the contribution of those mutations to tumorigenesis will need to be determined 

[105, 106]. 
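
Mutation signatures such as the G to T and C to T patterns described above are typically summarized by tallying substitution classes. The following sketch assumes a hypothetical list of (reference, variant) base pairs and counts each substitution together with its reverse-complement equivalent, so that G>T and C>A fall in the same class. It is an illustration, not the pipeline used in the cited studies.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref: str, alt: str) -> str:
    """Return a strand-independent class label, e.g. 'C>A/G>T'."""
    if ref == alt or ref not in COMPLEMENT or alt not in COMPLEMENT:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    forward = f"{ref}>{alt}"
    reverse = f"{COMPLEMENT[ref]}>{COMPLEMENT[alt]}"
    # Sort so that both strands map to the same label.
    return "/".join(sorted([forward, reverse]))


def mutation_spectrum(calls):
    """Fraction of mutations falling in each observed substitution class."""
    counts = Counter(substitution_class(ref, alt) for ref, alt in calls)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


# Hypothetical somatic calls, for illustration only.
calls = [("G", "T"), ("C", "A"), ("G", "T"), ("C", "T"), ("G", "A")]
print(mutation_spectrum(calls))
```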

5.2.3 Genomic landscape: Gains, losses and uniparental disomy 

Individually, the study of genomic dimensions has yielded a global description of cancer 

genomes in terms of gene dosage, allelic status and somatic mutation. Collectively, however, 

the integration of these three dimensions has brought two concepts to the forefront: allele 

specific copy number alterations and uniparental disomy (UPD) (Figure 5.2). Typically, the 

relationship between somatic mutation and allele specific copy number alterations has been 

associated with tumor suppressor genes (e.g. RB1 and TP53) whereby mutation is combined 

with loss to achieve bi-allelic inactivation [107, 108]. However, recent studies have shown 

preferential amplification of alleles encoding mutated oncogenes as well [109-114]. In non- 

small cell lung cancer, mutant allele specific imbalance (MASI) is frequently present in mutant 

EGFR and KRAS tumor cells, and is associated with increased mutant allele transcription and 

gene activity [114]. 

UPD is the presence of two copies of a chromosome segment from one parent, and the 

absence of that DNA from the other parent. Somatic UPD, also known as copy neutral LOH 

(CNLOH), results in loss of heterozygosity (tumor versus normal), without a change in total DNA 

copy number [115-117]. UPD is observed at tumor suppressor gene loci whereby upon loss of 

the wild type allele, the mutated allele is duplicated resulting in a diploid state with homozygous 

mutation of the target gene [118]. Interestingly, UPD events are also detected at mutated 

oncogenes [114, 119-121]. Until recently, due to limitations in the resolution of genomic array 

platforms, the prevalence of this event has been widely underestimated and underappreciated. 


Recent studies have shown that UPD events are frequently observed in tumor genomes, with 

most of the findings reported from hematological malignancies [122-131]. Our genome wide 

analysis of segmental gain, loss and UPD in the T47D breast cancer cell line genome showed 

that a significant portion of the genome exhibits UPD, rivaling the proportion of the genome 

affected by segmental gain and loss, and highlighting the potential of UPD as a prominent 

mechanism of gene disruption in epithelial cancer (Figure 3). Interestingly, PIK3CA and TP53 

mutations in T47D are noted in the Catalogue of Somatic Mutations in Cancer [132]. Integrative 

analysis at these loci detected copy number increase at PIK3CA and copy number loss at TP53 

illustrating the MASI concept described above (Figure 3). 

Somatic UPD also exists at genes without mutation. The potential significance of this somatic 

event is not readily apparent, but it raises the intriguing possibility of allelic conversion of 

epigenetic status [117, 122, 133]. 

5.3 Epigenomic alterations 

5.3.1 The cancer methylome 

Abnormal DNA methylation patterns occur in cancer, whereby focal hypermethylation at many 

CpG islands is evident in a background of global DNA hypomethylation [134-137]. Broad 

hypomethylation may lead to genomic instability, while hypermethylation of CpG islands 

silences transcription of specific genes [136, 138-140]. Non-random methylation of multiple 

CpG islands observed in colon cancer led to the discovery of CpG island methylator phenotype 

(CIMP), which is causally linked to microsatellite instability via silencing of the mismatch repair 

gene, MLH1 [141-143]. 

The determination of DNA methylation status relies on the ability to discriminate between 

methylated and unmethylated cytosines. This is achieved by exploiting methylation- 

sensitive/insensitive isoschizomer restriction-enzyme pairs [144-150], chemical conversion of 

unmethylated cytosine to uracil [151-156], and the affinity for methylated DNA of specially 


developed antibodies and methylated-DNA binding proteins [24, 157-163]. Several 

computational methods have been developed for deriving approximations of actual methylation 

levels from the relative levels generated by most microarray and locus specific sequencing 

assays [147, 162, 164, 165]. However, it is important to note that CpG targets represented on 

microarrays may or may not be the only elements controlling gene expression. Recently, it was 

shown that in the human colon cancer methylome sequences up to 2 kb away from CpG 

islands, termed CpG shores, exhibited more methylation than CpG islands and had greater 

influence on gene expression than CpG islands [166]. Furthermore, while excess promoter 

methylation is typically associated with transcriptional repression, the loss of required 

methylation within gene bodies, proximal to promoters, can have the same effect [167]. DNA 

methylation of epigenetic neighborhoods in the megabase size range has also been reported 

[168]. Validation of methylation-mediated control of gene-specific expression, and evaluation of 

biological significance, can be achieved via pharmacologic manipulation of DNA methylation, for 

example by 5-azacytidine treatment, to relieve methylation silencing and invoke re-expression 

[20, 169]. 
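
For bisulfite-based assays, where unmethylated cytosines are converted to uracil and subsequently read as thymine, the methylation level at a site is commonly estimated as the fraction of reads that retain a cytosine. A minimal sketch with hypothetical read counts follows; the `min_depth` cutoff is an illustrative assumption, not a recommended value.

```python
def methylation_fraction(c_reads: int, t_reads: int, min_depth: int = 10):
    """Estimate methylation at one cytosine from bisulfite read counts.

    c_reads: reads showing C (protected from conversion, i.e. methylated)
    t_reads: reads showing T (converted, i.e. unmethylated)
    Returns None when coverage is below min_depth.
    """
    depth = c_reads + t_reads
    if depth < min_depth:
        return None
    return c_reads / depth


# Hypothetical CpG sites: name -> ((tumor C, tumor T), (normal C, normal T)).
sites = {"cpg_1": ((45, 5), (6, 44)), "cpg_2": ((3, 2), (10, 10))}
for name, (tumor, normal) in sites.items():
    t = methylation_fraction(*tumor)
    n = methylation_fraction(*normal)
    delta = None if t is None or n is None else round(t - n, 2)
    print(name, "tumor:", t, "normal:", n, "difference:", delta)
```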

The first single-base-resolution maps of the human methylome have recently been generated 

by sequencing of bisulfite converted DNA from human embryonic stem cells and fetal fibroblasts 

[12, 170]. This landmark study will greatly advance the analysis of DNA methylation by 

providing whole genome reference maps of methylation in these specific cells. However, it is 

well known that DNA methylation is tissue specific and that it changes throughout development; 

thus, methylome maps for all tissues at various stages of development may be necessary to 

provide adequate maps of 'normal' methylation patterns for use in deciphering aberrant 

methylation patterns characteristic of tumors [171-176]. In recognition of this, the Human 

Epigenome Project was launched in 2004 to map the methylomes of all major human tissues 

[177]. 


5.3.2 Integration of cancer genomic and epigenomic events 

DNA methylation and genomic instability. Cancer-specific aberrant DNA methylation is 

associated with reduced genomic stability and subsequent copy number alterations, including 

preferential loss of certain imprinted alleles (LOI) [178-184]. Mechanistically, this instability may 

be related to the susceptibility of hypomethylated DNA to undergo inappropriate recombination 

events [185]. Another mechanism known to negatively impact genomic integrity in lung cancer 

is the relaxation of transposable element control that is mediated by DNA methylation [186-190]. 

DNA hypomethylation and DNA amplification. Preliminary evidence of specific 

demethylation of somatic segmental amplifications (or amplicons) has been put forth in lung 

cancer, perhaps representing a novel mechanism of aberrant oncogene activation [189, 191]. 

Further studies using large-scale sequencing of bisulfite treated DNA will help to clarify this 

phenomenon [12]. Hypomethylation has also been implicated in the formation of specific copy 

number alterations in glioblastoma multiforme [192]. One potentially interesting application for 

DNA methylation profiling of cancer amplicons such as these, is in the discrimination between 

"driver" and "passenger" genes within the amplified sequence. It may be that DNA methylation 

within the promoters or gene bodies of these genes is responsible for the lack of uniform 

overexpression of genes residing within amplicons. 

DNA hypermethylation and copy number loss. The relationship between DNA 

hypermethylation and allelic loss is well documented. Tumor suppressor genes are frequently 

found in regions of common LOH, and these same TSGs are frequently found to be 

hypermethylated, perhaps best exemplified by the FHIT gene on chromosome 3p [193]. 

Although it is unclear whether loss or hypermethylation occurs first, both are known to be very 

early events in tumorigenesis preceding any histologic alterations [194-196]. With the advent of 

high resolution genome-wide technologies it has become possible to comprehensively search 

for genes that are inactivated by both mechanisms simultaneously [197]. 


Histone modification states. While DNA methylation and gene dosage profiling technologies 

have become accessible, technologies for global assays of other key epigenetic marks including 

histone modifications are not widely available. One of the main challenges to conducting the 

highest quality studies of genome wide chromatin-immunoprecipitation on microarray (ChIP- 

chip) or on sequencing platform (ChIP-seq) experiments is the requirement of high quality DNA 

from pure cells – which essentially means growing cells in culture. It is thus difficult to analyze 

these dimensions from clinical specimens. However, much has been learned from studies of 

the relationship between different histone modification states and transcriptional activation or 

repression in model systems. Such examples utilizing ChIP-chip include: cell or context 

specific histone modification patterns related to cell or context specific gene expression; histone 

3 lysine 27 (H3K27) trimethylation patterns associated with prostate, lung and breast cancers; 

and H3K9 and H3K79 modification patterns in leukemia [198-204]. Examples utilizing ChIP-seq 

include: the analysis of the growth inhibition program of the androgen receptor, and the 

chromatin interaction network of the estrogen receptor [205, 206]. 

5.4 Relating genetic and epigenetic events to changes in the 

transcriptome through integrative analysis 

Aberrations in individual genetic or epigenetic dimensions are prominent across various cancer 

types, culminating in changes to the transcriptome. However, for a given gene, most of the 

events documented previously, such as copy number amplification, homozygous deletion, 

somatic mutation, or DNA hypermethylation, do not occur in 100% of tumors for a given cancer 

type. Moreover, it has been observed that the same gene may be activated or inactivated by 

different mechanisms. Since most of the studies described above analyzed single DNA 

dimensions, it is likely many genes would be overlooked due to a low frequency of alteration in 

a single dimension; the same gene may be detected at a high frequency when multiple 

dimensions are considered. Thus, analysis of more dimensions may reveal higher frequency 


gene-specific disruption with corresponding transcriptome aberrations for particular cancer 

types, as would be expected for genes causative to cancer development. 

5.4.1 Multiple mechanisms of gene disruption 

Expression profiling studies have been instrumental in detecting genes dysregulated in cancer 

[207-209]. However, aberrant expression of some genes may simply reflect incidental genome 

instability or secondary dysregulation. Global gene expression profiling alone may not 

distinguish causal events and bystander changes. One of the first studies to relate gene 

expression changes with gene dosage status on a global scale was a parallel analysis of DNA 

and mRNA [66, 210]. The same cDNA microarray platform was used to investigate the impact of 

DNA copy number alterations on the expression of over 6,500 genes. This study determined 

that 62% of genes located within regions of DNA amplification showed elevated expression in 

breast cancer. Subsequent studies in other cancer types revealed a broad range in the 

correlation between increased gene dosage and expression levels for protein coding genes 

(19% to 62%) [92, 207, 210-213]. Studies integrating gene dosage and gene expression have 

identified cancer subtype-specific pathway activation and signatures associated with clinical 

outcome [96, 214-217]. In addition, when examining known disease-relevant pathways, it has 

been shown that even though individual components of a pathway are disrupted at a low 

frequency, collectively, these alterations can result in frequent disruption of a given pathway [16, 

92]. Similarly, alterations in DNA methylation or histone modification status can also affect gene 

expression and have subsequent pathway level consequences (see above). 

5.4.2 Multiple mechanisms of disrupting non-coding RNA levels 

Segmental DNA copy number alterations also affect the expression of non-coding RNAs 

(ncRNA) [218-222]. MicroRNAs (miRNA) have been shown to have a significant role in cancer 

development with specific miRNAs implicated in a number of different cancer types [26, 223- 

225]. Specific miRNA expression signatures are associated with critical steps in tumor initiation 

and development including cell hyperproliferation, angiogenesis, tumor formation and 


metastasis [226]. High throughput analysis of microRNAs has been of interest and microarrays 

have been developed to assess essentially all annotated microRNAs. To date, >700 miRNAs 

have been annotated in the genome (http://mirdb.org/miRDB/statistics.html, [227]), with more 

likely to be discovered. For example, we recently demonstrated that a deletion on chromosome 

5q leads to the reduced expression of two miRNAs that are abundant in hematopoietic 

stem/progenitor cells. This study revealed haploinsufficiency and reduced expression of miR- 

145 and miR-146a as mediators of a subtype of myelodysplastic syndrome [221]. Although the 

genomic loss and underexpression implicates a tumor-suppressive role for these specific 

miRNAs, others undergo activating genomic alterations and elevated expression and hence are 

thought to be oncogenic [228, 229]. 

Just as copy number alterations can alter miRNA activity, epigenetic alterations have also been 

shown to affect miRNA expression [230-232]. Aberrant methylation of miRNAs has been 

reported in a variety of cancer types, and the disruption of epigenetically-mediated miRNA 

control has been shown to have oncogenic effects due to downstream gene deregulation [233]. 

For example, abnormal DNA methylation of miRNAs has been associated with tumor 

metastasis, leading to the appreciation of a group of metastasis-related miRNAs [229]. 

5.4.3 Multi-dimensional integration of genome, epigenome, and transcriptome 

Large scale initiatives. Since multiple genomic/epigenomic mechanisms can influence gene 

expression and lead to disruption of a given function, an integrative multi-dimensional analysis 

is necessary for a more comprehensive understanding of the cancer phenotype (Figure 4). 

Specific programs and initiatives such as those by The Cancer Genome Atlas (TCGA) project 

and the cancer Biomedical Informatics Grid (caBIG) enable parallel and multi-dimensional 

analysis of cancer genomes [8, 16] (Table 5.2). Recently, studies in glioblastoma and 

osteosarcoma have shown that integrative genomic and epigenomic approaches can indeed 

reveal the specific genetic pathways involved in different cancers [16, 234]. 


Gene disruption by multiple mechanisms. One of the two key reasons for using an 

integrative approach is the ability to detect critical genes that are disrupted by multiple 

mechanisms across a sample set, but are disrupted at a low frequency by any one mechanism. 

These genes would have been overlooked in previous, single dimensional studies. The second 

key advantage of integrative approaches is the ability to identify genes that are simultaneously 

disrupted by multiple mechanisms -- two hits -- in a single sample. Using a dataset composed 

of DNA copy number, allelic status, DNA methylation, and gene expression profiles from ten 

lung adenocarcinomas and matched non-malignant tissue controls, we illustrate these benefits 

below. 

If gene expression changes are a consequence of alterations at the DNA level, then a higher 

proportion of the observed expression changes can be directly attributed to a defined causal 

event when multiple types of DNA alterations are examined (Figure 5.5a). While some 

samples have over 70% of their expression changes associated with DNA level changes (Sample 7, 

Sample 8), other samples have only 30% (Sample 5, Sample 9). Additionally, consequential to 

associating more gene expression changes with DNA level changes within a sample, more 

disrupted genes are detected, and in turn, more disrupted pathways are identified across a 

sample set (Figure 5.5b, 5.5c). In fact, in our example, nearly five times as many genes 

(~1100 compared to ~200) are detected as disrupted in at least 50% of the samples when we 

account for multiple mechanisms of disruption (vs. one mechanism alone) (Figure 5.5c). This 

result illustrates that without using an integrative approach, many potentially important genes 

would be dismissed as they are disrupted by low frequency events when a single DNA 

dimension is analyzed. This also holds true at the pathway level when the identified genes are 

grouped based on their biological function (Figure 5.5d). For example, the Hepatic 

Fibrosis/Hepatic Stellate Cell Activation pathway and the RAR Activation pathway, which are 

identified when all DNA dimensions are considered, would not be detected as significantly 

altered when using individual DNA dimensions alone. 
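
The gain in detection frequency can be made concrete with a small calculation. In the sketch below, each DNA dimension is represented as the set of samples in which a hypothetical gene is altered; the data are invented for illustration and are not drawn from the ten adenocarcinomas analyzed here.

```python
def alteration_frequencies(calls_by_dimension, n_samples):
    """Per-dimension and combined alteration frequency for one gene.

    calls_by_dimension maps a dimension name to the set of samples in which
    the gene is altered in that dimension.
    """
    per_dimension = {
        dim: len(samples) / n_samples for dim, samples in calls_by_dimension.items()
    }
    # A sample counts once in the combined tally, however many hits it carries.
    altered_in_any = set().union(*calls_by_dimension.values())
    return per_dimension, len(altered_in_any) / n_samples


# Hypothetical gene in a ten-sample cohort.
gene_calls = {
    "copy_number": {"S1", "S2"},
    "methylation": {"S2", "S3", "S4"},
    "allelic_status": {"S5", "S6"},
}
per_dimension, combined = alteration_frequencies(gene_calls, n_samples=10)
print(per_dimension)  # no single dimension reaches 50%
print(combined)       # the union of dimensions does
```

Sample S2 carries two hits but is counted once, so the combined frequency reflects the number of affected tumors rather than the number of events.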


Implications for sample size requirements. In the example above, we illustrate that a 

significant number of genes and pathways exhibit a low frequency of disruption when examining 

single dimensions (and thus would be overlooked) but indeed exhibit a high frequency of 

disruption when multiple dimensions are considered (Figure 5.5). Notably, these findings imply 

that integrative multi-dimensional analysis of individual samples may directly impact the cohort 

sample size required for gene discovery on the basis of frequency of disruption (Figure 5.5e). 

Reduction in sample size requirements means that one can extend this approach to situations 

involving rare specimens where accrual of hundreds of samples in a reasonable timeframe is 

not possible. Moreover, reduced sample sizes are particularly applicable to familial cancers or 

to isolated populations at increased risk for specific cancers. 
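
One way to see why sample size requirements fall is to ask how likely a gene is to pass a frequency-based discovery threshold in a small cohort. The sketch below assumes, purely for illustration, that samples are independent and that a gene is altered with a fixed probability in each sample. The probabilities 0.2 and 0.6 and the threshold of 5 of 10 samples are hypothetical values, not estimates from our data.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the chance that a gene altered with
    per-sample probability p is observed in at least k of n samples."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n, k = 10, 5  # cohort size and discovery threshold (at least 50% of samples)
single = prob_at_least(k, n, 0.2)    # frequency seen in one dimension alone
combined = prob_at_least(k, n, 0.6)  # frequency after combining dimensions
print(f"single dimension: {single:.3f}")
print(f"all dimensions:   {combined:.3f}")
```

Under these assumptions, a gene that would rarely reach the threshold in ten samples on the basis of one dimension reaches it in most cohorts once all dimensions are combined.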

Bi-allelic gene disruption. Two-hit bi-allelic inactivation of genes and high level gene 

amplifications are typically considered to be causal mechanisms that inflict gene expression 

changes. When examining multiple DNA dimensions, concerted bi-allelic disruption of a gene in 

the same sample can be readily identified; copy number loss with hypermethylation resulting in 

underexpression, or copy number gain with hypomethylation and overexpression are examples. 

Indeed, we do identify genes harboring concerted disruptions using the same lung 

adenocarcinoma dataset mentioned above. The MUC1 locus exhibits concurrent copy number 

increase with hypomethylation and overexpression (Figure 5.4). MUC1 has previously been 

shown to be important in lung and breast cancers and is currently a target for therapeutic 

intervention [235-237]. Collectively, we have demonstrated how an integrative, multi- 

dimensional approach can be utilized for cancer gene and pathway discovery. 
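
The concerted disruption described for MUC1 can be screened for by requiring that the DNA-level and expression-level calls for a gene agree in direction within the same sample. The sketch assumes each dimension has already been discretized to -1, 0 or +1 per gene and sample; this encoding, the gene names and the example calls are hypothetical.

```python
def concerted_disruption(copy_number: int, methylation: int, expression: int):
    """Return the type of two-hit disruption for one gene in one sample.

    copy_number: -1 loss, 0 neutral, +1 gain
    methylation: -1 hypomethylated, 0 unchanged, +1 hypermethylated
    expression:  -1 underexpressed, 0 unchanged, +1 overexpressed
    Returns None when the three calls are not concordant.
    """
    if copy_number == 1 and methylation == -1 and expression == 1:
        return "activating (gain + hypomethylation + overexpression)"
    if copy_number == -1 and methylation == 1 and expression == -1:
        return "inactivating (loss + hypermethylation + underexpression)"
    return None


# Hypothetical discretized calls: gene -> (copy number, methylation, expression).
sample_calls = {
    "GENE_A": (1, -1, 1),
    "GENE_B": (-1, 1, -1),
    "GENE_C": (1, 0, 1),   # only one DNA-level hit
    "GENE_D": (-1, 1, 1),  # DNA-level hits not reflected in expression
}
hits = {gene: concerted_disruption(*calls) for gene, calls in sample_calls.items()}
print({gene: hit for gene, hit in hits.items() if hit is not None})
```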

5.4.4 Disruption of multiple components in biological pathways 

We described above how an integrative, multi-dimensional approach improves the detection of 

disrupted genes, especially those affected by multiple low frequency mechanisms. This 

concept can be extended to identify biological pathways, where multiple pathway components 

are disrupted at low frequencies (see above; Figure 5.5d). The EGFR signaling pathway is a 


well documented dysregulated component of lung cancer. Using the same multi-dimensional 

profiling dataset from Figure 5 above, seven genes were detected with gene dosage alteration 

at a frequency ≥30%. However, when we considered alterations in gene dosage, allelic status, 

DNA methylation and somatic mutation collectively (for KRAS and EGFR only), 18 genes in the 

pathway were identified to be altered at ≥30% frequency (Figure 5.6). The detection of the 

additional 11 genes illustrates the benefit of employing an integrative approach and extends the 

sample size reduction argument to the pathway level. 

5.4.5 Identification of a novel gene involved with EGFR signaling deregulated in 

adenocarcinoma 

In the section above, I have shown that more of the well known components are frequently 

altered when we examine multiple DNA dimensions as opposed to a single DNA dimension, 

such as DNA copy number, alone. When this analysis is expanded to include more genes 

based on literature evidence, I found that the most frequently disrupted gene is signal-regulatory 

protein alpha (SIRPA) (Figure 5.7). 

SIRPA has been shown to be down-regulated when EGFR is activated in glioblastoma and up- 

regulated when EGFR is suppressed [238, 239]. SIRPA has also been shown to be a tumor 

suppressor gene in multiple cancer types including liver and breast cancer [240, 241]. 

Moreover, in the resting lung, SIRPA has been thought to modulate the inflammatory response 

through SHP-1 (also known as PTPN6) and eventually, NFKB [242]. While most studies have 

documented the association of SIRPA with SHP-2 (also known as PTPN11) [243-245], few 

studies have shown the association of SIRPA with SHP-1. 

To discern the association of SIRPA with SHP-1 and SHP-2, mutual information network 

analysis was utilized [246, 247]. Briefly, using our gene expression dataset and a publicly 

available dataset [248], Affymetrix exon array datasets were normalized separately using the 

aroma.affymetrix package (Bengtsson et al 2008 Berkeley). Subsequently, each dataset was 


analyzed using the "minet" package in the Bioconductor suite in R [247, 249]. From these 

analyses, a score was calculated between each gene and every other gene. The 

top 5% of gene-gene interactions (based on the score) from each dataset were selected, and 

only those interactions which were in the top 5% of both analyses were retained. Finally, gene-gene 

interactions involving SIRPA were extracted, resulting in a total of 310 genes found to highly 

correlate with SIRPA expression (Table 5.3). Within this list of genes, PTPN6 was present and 

PTPN11 was not, suggesting that SIRPA more likely acts through PTPN6 than PTPN11 in lung 

adenocarcinoma. 
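
The filtering steps above (score every gene pair, keep the top 5% in each dataset, intersect, then extract the partners of SIRPA) can be sketched independently of how the scores are computed. The analysis itself used the minet package in R; the Python below is only an illustration of the filtering logic, and the scores and the gene names other than SIRPA and PTPN6 are hypothetical.

```python
import math


def top_fraction(scores, fraction=0.05):
    """Return the highest-scoring `fraction` of gene pairs (at least one pair).

    `scores` maps frozenset({gene_a, gene_b}) to an association score.
    """
    keep = max(1, math.floor(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:keep])


def partners(gene, dataset_scores, fraction=0.05):
    """Genes paired with `gene` in the top fraction of every dataset."""
    shared = set.intersection(*(top_fraction(s, fraction) for s in dataset_scores))
    return sorted(other for pair in shared if gene in pair for other in pair - {gene})


# Hypothetical scores for two datasets; the fraction is raised to 0.5 only
# because this toy example has four gene pairs.
pairs = [("SIRPA", "PTPN6"), ("SIRPA", "GENE_X"), ("GENE_X", "GENE_Y"), ("PTPN6", "GENE_Y")]
ours = dict(zip(map(frozenset, pairs), [0.9, 0.8, 0.2, 0.1]))
public = dict(zip(map(frozenset, pairs), [0.7, 0.1, 0.6, 0.2]))
print(partners("SIRPA", [ours, public], fraction=0.5))
```

Requiring an interaction to rank highly in both datasets discards the SIRPA and GENE_X pair, which scores well in only one of them.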

5.4.6 Prevalence of SIRPA deregulation and association with clinical characteristics 

Given that the sample set we examined comprised only 10 samples, we next sought to 

assess the expression in a larger panel of samples to validate the frequency of underexpression 

observed in the initial set. Using 59 lung adenocarcinoma and matched non-malignant sample 

pairs, the prevalence of SIRPA underexpression was assessed and the correlation of SIRPA 

and PTPN6 was re-evaluated. It was found that 47/59 pairs exhibited at least a 1.5-fold 

reduction of SIRPA in tumors as compared to matched non-malignant tissue, representing 

~80% of tumors assessed (Figure 5.8a). In addition, correlating SIRPA and PTPN6 expression 

using a Pearson correlation, a correlation coefficient of 0.907 was found (Figure 5.8b). 
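
Both quantities reported here are straightforward to compute from paired expression values. The sketch below assumes positive, linear-scale (not log-transformed) expression values and uses invented numbers; only the 1.5-fold threshold is taken from the text.

```python
from math import sqrt


def fraction_underexpressed(tumor, normal, fold=1.5):
    """Fraction of matched pairs with at least `fold` reduction in the tumor."""
    reduced = sum(1 for t, n in zip(tumor, normal) if n / t >= fold)
    return reduced / len(tumor)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical linear-scale expression values for five tumor/normal pairs.
sirpa_tumor = [100.0, 120.0, 400.0, 90.0, 300.0]
sirpa_normal = [400.0, 360.0, 420.0, 450.0, 330.0]
ptpn6_tumor = [80.0, 100.0, 310.0, 70.0, 260.0]

print(fraction_underexpressed(sirpa_tumor, sirpa_normal))
print(round(pearson(sirpa_tumor, ptpn6_tumor), 3))
```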

It should also be noted that there was a small number of samples which exhibited 

overexpression of SIRPA. This finding was somewhat unexpected given the high prevalence of 

underexpression observed in the initial dataset. However, the initial ten tumors were from 

individuals who were former smokers and the set of 59 tumors was comprised of 23 current 

smokers, 21 former smokers and 15 never smokers. When stratifying the differential gene 

expression based on smoking status, overexpression was not observed in any of the current or 

former smokers; it was observed only in a subset of never smokers (Figure 

5.8c). 


Finally, using publicly available microarray datasets with patient survival information [250-252], Kaplan- 

Meier analysis was performed on each of these datasets based on SIRPA expression levels. 

The association was deemed significant if the p-value was ≤ 0.05 based on a Mantel- 

Cox (log-rank) test. Two of the five datasets showed a statistically significant association 

between SIRPA expression levels and overall patient survival with an additional two datasets 

close to significance with p values ≤ 0.18 (Figure 5.9). 
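
The Mantel-Cox comparison can be sketched in a few lines for two groups, for example tumors with high versus low SIRPA expression. How samples were divided into groups is not stated here, so the grouping, follow-up times and event indicators below are hypothetical. In practice an established survival analysis package would be used rather than this hand-written version.

```python
from math import erfc, sqrt


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank (Mantel-Cox) test; returns (chi-square, p-value).

    times_*: follow-up times; events_*: 1 if death was observed, 0 if censored.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    observed_a = expected_a = variance = 0.0
    # Walk through the distinct times at which at least one event occurred.
    for t in sorted({u for u, e, _ in data if e == 1}):
        n = sum(1 for u, _, _ in data if u >= t)                 # at risk, both groups
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)    # at risk, group A
        d = sum(1 for u, e, _ in data if u == t and e == 1)      # events at t
        d_a = sum(1 for u, e, g in data if u == t and e == 1 and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the group A event count at time t.
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    return chi2, erfc(sqrt(chi2 / 2))


# Hypothetical follow-up (months) for low- and high-expression groups.
low_times, low_events = [5, 8, 12, 14, 20, 22], [1, 1, 1, 1, 1, 0]
high_times, high_events = [18, 25, 30, 34, 40, 42], [1, 0, 1, 0, 0, 0]
chi2, p_value = logrank_test(low_times, low_events, high_times, high_events)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
```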

5.5 Tracking clonal expansion in spatial dimensions 

Delineating the clonal relationship between multiple tumors in the same patient is relevant not 

only to clinical management of disease but also to the understanding of metastasis. Multiple 

tumors in the same patient may not necessarily share an identical genomic profile. The 

similarities and differences in genomic landscape between tumors are quantifiable and therefore 

can be used for delineating relatedness. Whole genome comparison based on array CGH 

profiles is a new tool for distinguishing metastatic from primary synchronous carcinomas. A 

multitude of genomic features, for example the boundaries of segmental deletions, are used to 

delineate the presence and the sequence of events in clonal evolution [253-261]. 
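
As a simple illustration of how relatedness can be quantified from copy number profiles, the sketch below scores two tumors by the proportion of segmental alteration boundaries they share. The Jaccard-style index, the matching tolerance and the breakpoint coordinates are all assumptions chosen for illustration; the cited studies use their own measures.

```python
def shared_breakpoint_score(profile_a, profile_b, tolerance=50_000):
    """Jaccard-style similarity of two breakpoint lists.

    Each profile is a list of (chromosome arm, position) breakpoints. Two
    breakpoints match if they lie on the same arm and within `tolerance`
    base pairs; each breakpoint is matched at most once.
    """
    unmatched_b = list(profile_b)
    shared = 0
    for arm, pos in profile_a:
        for other in unmatched_b:
            if other[0] == arm and abs(other[1] - pos) <= tolerance:
                unmatched_b.remove(other)
                shared += 1
                break
    union = len(profile_a) + len(profile_b) - shared
    return shared / union if union else 0.0


# Hypothetical breakpoints (chromosome arm, base-pair position).
primary = [("3p", 60_200_000), ("9p", 21_900_000), ("17p", 7_500_000)]
second_tumor = [("3p", 60_210_000), ("9p", 21_950_000), ("5q", 148_000_000)]
unrelated = [("8q", 128_700_000), ("1q", 145_000_000)]

print(shared_breakpoint_score(primary, second_tumor))
print(shared_breakpoint_score(primary, unrelated))
```

A high score suggests a shared clonal origin, whereas a score near zero is more consistent with independent primary tumors.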

Furthermore, signature genetic alterations can be used to track clonality in a cell population, 

putting genetic events in the context of tumor tissue architecture. By assessing the appearance 

of pre-selected markers in individual nuclei on a tissue section by FISH, the clustering and the 

expansion of clonally related cells can be delineated by analyzing the marker patterns of 

neighboring cells (Figure 5.10). 

5.6 Evaluating the biological significance of integrative genomics 

findings 

The utilization of an integrative genomic, epigenomic and transcriptomic approach will 

undoubtedly improve our ability to identify gene disruptions and their effects on gene 

expression. The next challenge is to develop approaches for the determination of functional 

and phenotypic evidence of the biological relevance of such gene disruptions in a high 

throughput manner -- for example, functional genomic screens by RNAi, proteomic profiling and 

metabolite profiling. Forced expression of genes and RNAi knockdown of gene expression are 

commonly used methods for assessing growth and invasion phenotypes in cell models. 

Genome wide RNAi screens, comprising large libraries of short hairpin RNA sequences

redundantly targeting thousands of genes, have been used to identify genes essential to

tumorigenesis, including tumor suppressor genes as well as genes that cooperate with oncogenic

mutations in several malignancies [22, 28, 29, 262-270]. Animal models are also instrumental to

functional validation of genes singly or in combination, but this topic is beyond the scope of this

chapter. Cross referencing genomic findings with proteomic profiles can reveal functional

consequences, yielding information on expression levels, post-translational modification, and

protein-protein interactions [271-275]. As recent studies have highlighted the importance of the 

metabolome in cancer, the genomic landscape can also be integrated with metabolome profiles 

to determine the role of genetic and epigenetic alterations in cellular physiology relevant to 

cancer development [276-278]. 

The progress made in the development of technologies and approaches to analyze the

genome, epigenome, and transcriptome has allowed for much improved understanding of

cancer landscapes. With the increased application of sequence based approaches to analyze 

genetic and epigenetic dimensions and the additional complexity with the proteome and 

metabolome to follow, an unprecedented definition of the cancer cell can be achieved. The next 

key challenge will be the synthesis of this information to better understand fundamental cancer 

processes such as progression, metastasis and drug resistance. 

Figure 5.1. Advances in cancer genomic landscape post Y2K. The timeframe of events is

estimated based on time of publication.

[Figure 5.1 image: timeline. Year markers shown: 2009, 2007, 2006, 2005, 2004, 2002. Events, listed from most recent to earliest:]

DNA nanoballs sequencing technology [3]
Breast, lung & skin cancer genomes sequenced [4-6,11]
Human methylomes sequenced [12]
Acute myeloid leukemia genome sequenced [13,14]
International Cancer Genome Consortium initiated
1000 genomes project launched [15]
Integrative study of glioblastoma [16]
Genome RNAi database established [9,10]
2nd generation human haplotype map with >3M SNPs [17]
Next generation, massively parallel sequencing technologies
Copy number variation maps [18,19]
5-Azacytidine re-expression of methylated cancer genes [20]
Exome sequencing mutation detection [21]
The RNAi Consortium (TRC) [22]
Bead Arrays for bisulfite DNA methylation [23]
NIH Cancer Genome Atlas (CGA) initiated
Methylome map by MeDNA immunoprecipitation [24]
First human genome haplotype map [25]
MicroRNA expression profiles classify cancers [26]
Catalogue of somatic mutations in cancer (COSMIC) [27]
Large scale RNAi-based screens [28,29]
Cancer Gene Census published [30]
Large scale copy number variation in humans [31,32]
Whole genome tiling path CGH microarrays [33]
Tiling path analysis of human transcribed sequences [34]
Cancer Biomedical Informatics Grid (caBIG) launched [8]
The Ensembl genome database project [35]
The human genome browser at UCSC [36]
BeadArray genotyping platforms [37]
Concept of oncogene addiction [38,39]
First human genome sequences [40,41]
CGAP launched [42,43]

[Figure 5.2 image. Panels: (a) Neutral, (b) Gain, (c) UPD. Plotted axis: allele specific copy number.]

Figure 5.2. SNP array analysis to identify areas of altered copy number and allelic 

composition in a clinical lung cancer specimen. Shown here are (a) a region that is copy 

neutral with no observed allelic imbalance and regions containing a (b) segmental gain and 

(c) UPD. Examining the allele specific copy number plot, the gain (in b) is likely a single 

copy change and the UPD event (in c) is signified by the shift in allele levels while maintaining 

total copy number neutral status.
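
The distinction drawn in this caption can be expressed as a simple rule on allele specific copy numbers. The sketch below is illustrative only: it assumes an autosomal region with a diploid baseline and integer (already rounded) copy numbers for the two alleles, and the function name `classify_region` is hypothetical rather than part of any SNP array software.

```python
def classify_region(allele_a, allele_b):
    """Classify a region from the copy number of each allele.
    Assumes a diploid baseline (one copy of each allele in normal tissue)."""
    total = allele_a + allele_b
    if total > 2:
        return "gain"
    if total < 2:
        return "loss"
    if min(allele_a, allele_b) == 0:
        # Two copies in total, but both from the same allele:
        # uniparental disomy (copy number neutral LOH).
        return "UPD"
    return "neutral"
```

For example, allele copy numbers of (1, 1) give "neutral", (2, 1) give a single copy "gain", and (2, 0) give "UPD" because the total remains two while heterozygosity is lost.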

[Figure 5.3 image. Chromosomes 1-22 with the TP53 and PIK3CA loci marked. Legend: Gain, Loss, UPD.]

Figure 5.3. Overlay of chromosomal regions of gain, loss and UPD (copy number 

neutral LOH) inherent to the T47D breast cancer cell line. The chromosomal loci for 

PIK3CA and TP53 (modified by activating and inactivating mutations, respectively, in this cell 

line), are indicated. The majority of the genome is affected by at least one of the three types of

genomic alteration. Raw SNP 6.0 array data were obtained from the Sanger database with mutation

status obtained from the COSMIC database [132]. Copy number and allelic status changes

were determined using Partek Genomics Suite and reference genomes used were 72

individuals from the HapMap collection. Data were visualized using the SIGMA2 software [7].

Figure 5.4. Integration of copy number, allelic status, DNA methylation, and gene 

expression for a single lung adenocarcinoma sample. (a) Copy number and (b) allele 

status analyses revealed a high level allele-specific DNA amplification (highlighted in yellow, 

image generated with Partek Genomics Suite); (c) individual CpG loci within this region were 

assessed for differential methylation between tumor and non-malignant tissue. 

Hypomethylation at the indicated CpG locus, which corresponds to the MUC1 gene, is observed 

(visualized with Genesis). (d) Expression analysis revealed four-fold overexpression of the 

MUC1 transcript when a tumor sample was compared to matched, adjacent non-malignant 

tissue. Copy number and allele status profiling was performed using the Affymetrix SNP 6.0 

array; DNA methylation profiling using the Illumina Infinium HM27 platform; and gene 

expression using the Affymetrix Human Exon 1.0 ST array. 

[Figure 5.4 image. Panels (a)-(d). Panel labels: Total copy number: Amplification; DNA hypomethylation (Normal, Tumor); Overexpression of MUC1 (relative expression, Normal versus Tumor).]

Figure 5.5. Enhanced analysis of the cancer

phenotype using an integrative and multi-dimensional approach. (a) On average, a higher

proportion of differential gene expression can be associated with genomic alterations when

examining multiple DNA dimensions relative to single dimensions. (b) Using a fixed frequency

threshold of 50%, more genes are revealed to be frequently disrupted when multiple

mechanisms of genomic alteration (e.g. altered copy number, DNA methylation, or copy number

neutral LOH) are considered (~200 genes versus more than 1000 genes). (c) Pathway

analyses performed using gene lists derived from a multi-dimensional approach identify an

enhanced number of aberrant pathways relative to those identified from a uni-dimensional 

approach. (d) Functional pathways identified using the integrated gene list are of relatively high 

significance; the top 10 such pathways are shown. This suggests that the additional identified 

genes associate with specific pathways rather than with random functions. The four bars 

represent, from left to right: all dimensions, copy number, DNA methylation, and UPD. 

Ingenuity Pathway Analysis was used for analyses in (c) and (d). (e) Example of two genes that 

are missed when a single DNA dimension is studied, but captured when multiple DNA 

dimensions are examined. Both ribonucleotide reductase M2 (RRM2) [279, 280] and retinoic 

acid receptor responder (tazarotene induced) 2 (RARRES2) [281, 282] are known to be 

deregulated in multiple cancer types. 

[Figure 5.5 image. Panels (a)-(e). Axis labels: Proportion of differentially expressed genes; # of genes identified; # of Significant Pathways; -log(pvalue); Frequency of Disruption (%). Series: Copy Number, DNA Methylation, Copy Number Neutral LOH (CNNLOH), All Dimensions. Samples: Sample 1 to Sample 10 and Average. Pathways named in the figure: Hepatic Fibrosis / Hepatic Stellate Cell Activation, RAR Activation, Macropinocytosis, Complement System, Leukocyte Extravasation Signaling, Reelin Signaling in Neurons, Oncostatin M Signaling, IL-8 Signaling, Acute Phase Response Signaling, CXCR4 Signaling. Genes named in the figure: RRM2, RARRES2.]

[Figure 5.6 image: EGFR signaling pathway diagram. Nodes shown: EGF, TGFA, EGFR, ERBB2, MUC1, SHC1, GRB2, SOS2, RRAS, KRAS, RAF1, MAP2K1, MAPK1, DUSP4, PLCG1, IP3, DAG, ITPR1, ER, Ca2+, PRKCA, PIK3R1, PIP2, PIP3, PDK1, AKT1, AKT2, FOXO3, BAD, CASP9, MYC, CCND1, RASSF1, RASSF5, MST1. Outcomes shown: Cell Cycle, Proliferation, Apoptosis.]

Figure 5.6. Identification of multiple disrupted components in a biological pathway. 

Integrative analysis identifies more genes affected in the EGFR signaling pathway than a 

single dimensional analysis alone. In this example, multi-dimensional profiling data were 

generated from ten lung adenocarcinomas and their paired non-cancerous lung tissue. 

Analysis of DNA copy number (gene dosage) alterations that affected expression identified

7 genes (in green) that are disrupted at ≥ 30% frequency. However, when alterations in

copy number, DNA methylation, sequence mutation and/or copy neutral LOH were considered, 

17 genes disrupted at ≥ 30% frequency were identified to be associated with a change 

in expression, with an additional gene, KRAS, harboring frequent mutation. The 11 additional 

genes are indicated in red. Genes in gray are not significant in this dataset as they 

did not meet the frequency criteria.
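
The frequency criterion described in this caption can be sketched as follows. A gene is counted as disrupted in a sample when its expression change coincides with an alteration in at least one of the DNA dimensions being considered. The concordance rules in `CONCORDANT`, the treatment of copy neutral LOH as direction-independent, and the data layout are all assumptions for illustration; they are not taken from the analysis pipeline used here.

```python
# Assumed concordance rules: which DNA-level states can explain which expression change.
CONCORDANT = {
    "over": {("copy_number", "gain"), ("methylation", "hypo")},
    "under": {("copy_number", "loss"), ("methylation", "hyper")},
}

def is_disrupted(sample, dimensions):
    """sample: dict with key 'expression' ('over', 'under' or None) and optional
    keys 'copy_number', 'methylation', 'cnn_loh' holding the observed state."""
    expression = sample.get("expression")
    if expression is None:
        return False
    for dim in dimensions:
        state = sample.get(dim)
        if state is None:
            continue
        # Assumption: copy neutral LOH may accompany either direction of change.
        if dim == "cnn_loh" or (dim, state) in CONCORDANT[expression]:
            return True
    return False

def disruption_frequency(samples, dimensions):
    """Fraction of samples in which the gene is disrupted via the given dimensions."""
    return sum(is_disrupted(s, dimensions) for s in samples) / len(samples)

def passes_threshold(samples, dimensions, threshold=0.30):
    return disruption_frequency(samples, dimensions) >= threshold
```

With ten samples, a gene altered by copy number gain in two tumors, hypomethylation in two others and copy neutral LOH in a fifth would fall below the 30% threshold on copy number alone (20%) but exceed it when all dimensions are considered (50%).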

[Figure 5.7 image: EGFR signaling pathway diagram in which each gene is annotated with the samples affected (S1 to S10) and rows labelled GE, CN, M and L. Legend: M: DNA Methylation (Hypo, Hyper). Additional markers in the figure: µ and *.]

Figure 5.7. Multi-dimensional analysis of the epidermal growth factor receptor signaling 

pathway. Integrative analysis identifies more genes affected in the EGFR signaling 

pathway than a single dimensional analysis alone. In this example, multi-dimensional 

profiling data were generated from ten lung adenocarcinomas and their paired noncancerous 

lung tissue. Analysis of DNA copy number (gene dosage) alterations that 

affected expression, identified 7 genes (in green) that are disrupted at ≥ 30% frequency. 

However, when alterations in copy number, DNA methylation, sequence mutation and/or 

copy neutral LOH were considered, 17 genes disrupted at ≥ 30% frequency were identified 

to be associated with a change in expression, with an additional gene, KRAS, harboring 

frequent mutation. The 11 additional genes are indicated in red. Genes in gray are not 

significant in this dataset as they did not meet the frequency criteria. Genome profiles were 

generated using the Affymetrix SNP 6.0 platform, DNA methylation data were generated

using the Illumina Infinium HM27 platform and gene expression profiles were generated 

using the Affymetrix Exon Array. 


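[Figure 5.8 image. Panels (a)-(c). Axis labels: Log2 Fold Change (T vs. N); SIRPA versus PTPN6 (r = 0.907426); % of underexpressing cases and % of overexpressing cases for SIRPA and PTPN6 by smoking status (CS, FS, NS). Threshold lines are marked.]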

Figure 5.8. Prevalence of SIRPA underexpression and its relationship with PTPN6 

and smoking status. (a) Analysis of SIRPA and PTPN6 expression in 59 lung adenocarcinoma 

tumor/non-malignant pairs using quantitative PCR. Plotted are the log2 fold changes 

of each tumor versus its matched non-malignant sample. PCR data were normalized with

Beta-Actin. All samples were done in triplicate. Threshold lines denote a 1.5-fold change. 

(b) Pairwise comparison of SIRPA and PTPN6 fold changes in the 59 sample pairs. Spearman 

correlation coefficient was calculated. (c) Stratification of qPCR results based on 

smoking status. While the majority of current smokers (CS, n=22) and former smokers 

(n=22) show underexpression, a subset of never smokers (NS, n=15) exhibit overexpression. 
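
The quantities in this caption can be reproduced with a short sketch: a log2 fold change per tumor/non-malignant pair, a call against the 1.5-fold threshold, and a Spearman correlation between two genes. The inputs are assumed to be expression values already normalized to the reference gene, and all function names are hypothetical.

```python
import math

THRESHOLD = math.log2(1.5)  # 1.5-fold change on the log2 scale

def log2_fold_change(tumor, normal):
    """Log2 ratio of normalized tumor expression to its matched non-malignant sample."""
    return math.log2(tumor / normal)

def classify(lfc):
    if lfc <= -THRESHOLD:
        return "under"
    if lfc >= THRESHOLD:
        return "over"
    return "unchanged"

def ranks(values):
    """1-based ranks, with tied values receiving their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

Applying `classify` to the fold changes within each smoking-status group and dividing by the group size would give the percentages of under- and overexpressing cases.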


[Figure 5.9 image: Kaplan-Meier curves (survival ratio versus time, in months) comparing LowExpression and HighExpression groups in the Duke, H. Lee Moffitt, MSKCC and Michigan datasets. P-values printed on the panels: 0.009, 0.009, 0.150 and 0.180.]

Figure 5.9. Kaplan-Meier analysis of SIRPA in four independent microarray datasets. 

Using publicly available gene expression microarray data, Kaplan-Meier analysis was 

performed to assess the association of SIRPA expression levels and overall patient survival. 

Briefly, for each dataset, samples were sorted based on ascending SIRPA expression and 

survival distributions of the top 1/3 of samples expressing SIRPA and bottom 1/3 of samples

expressing SIRPA were compared. In total, five datasets were tested with two of the datasets

(Duke, H. Lee Moffitt) showing a statistically significant association. In an additional two

datasets (MSKCC, Michigan), the p-values were close to statistical significance. All expression 

data were normalized using RMA. P-values were calculated using a Mantel-Cox log 

rank test. 





Figure 5.10. Automated detection of selected clonal populations of cells within a 

cancer biopsy tissue section. All nuclei (~150,000 in this example) are detected and 

FISH probe signal counts are enumerated for each nucleus. FISH signal pattern for each 

cell is compared against its neighbor in order to define spatial association (or neighborhood). 

A mathematical model is then applied to determine clonal cell relationships. (a) 

Mapping cancer cells on a tissue section. A gain or loss of any one of three FISH markers 

indicates a cancer cell. This image shows the density of cancer cells (so defined) in neighborhoods 

as a color overlay. Red indicates high fraction of cancer cells, yellow indicates 

medium fraction of cancer cells and blue indicates low to none (see scale bar). Most of the 

section is highlighted except for the surrounding normal stromal infiltrates. (b) Mapping 

clonal cells. The same image data were analyzed for concurrent gains of each of the three 

of the markers. The two clusters of cells, magnified within the white boxes, are cells harboring 

gain of all three markers. 
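
The neighborhood mapping described in this caption can be approximated by the sketch below. It assumes each nucleus is recorded as an (x, y) position with a tuple of FISH signal counts for the three markers, that two signals per marker is the normal state, and that a neighborhood is every nucleus within a fixed radius. The mathematical model actually used to determine clonal relationships is not specified here, so this is only a simplified stand-in with hypothetical names.

```python
import math

NORMAL_COPIES = 2  # assumption: two signals per marker in a normal diploid nucleus

def is_cancer_cell(counts):
    """A gain or loss of any one marker indicates a cancer cell."""
    return any(c != NORMAL_COPIES for c in counts)

def has_concurrent_gains(counts):
    """True when every marker shows a gain."""
    return all(c > NORMAL_COPIES for c in counts)

def neighborhood_fraction(nuclei, index, radius, predicate):
    """Fraction of nuclei within `radius` of nuclei[index] (itself included)
    whose marker counts satisfy `predicate`."""
    x0, y0, _ = nuclei[index]
    neighbors = [
        counts for x, y, counts in nuclei
        if math.hypot(x - x0, y - y0) <= radius
    ]
    return sum(1 for counts in neighbors if predicate(counts)) / len(neighbors)
```

Computing `neighborhood_fraction` with `is_cancer_cell` for every nucleus gives a density that could be rendered as a color overlay, while using `has_concurrent_gains` highlights clusters of cells sharing all three gains. A linear scan per nucleus is quadratic overall; for sections with on the order of 150,000 nuclei a spatial index would be needed in practice.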

Table 5.1. List of software for integrative analysis

Software | Source: Commercial (C) or Academic (A) | Genome, Epigenome, Transcriptome, Integrative | Citation | Website (http://www.)
Agilent Genomic Workbench 5.0 | C | X X X X | N/A | chem.agilent.com/enus/products/instruments/dnamicroarrays/dnaanalyticssoftware/pages/default.aspx
SIGMA2 | A | X X X X | [7] | flintbox.com/technology.asp?page=3716
Integrative Genomics Viewer | A | X X X | N/A | broadinstitute.org/igv/
Nexus Copy Number | C | X X X | N/A | biodiscovery.com/index/nexus
CGH Fusion | C | X X | N/A | infoquant.com/index/cghfusion
ISA-CGH | A | X X X | [283] | isacgh.bioinfo.cipf.es
VAMP | A | X X X X | [284] | bioinfo-out.curie.fr/projects/vamp/
Partek Genomics Suite | C | X X X X | N/A | partek.com/partekgs

Where fewer than four marks are listed, the column positions of the individual marks are not preserved.

Table 5.2. List of genomic resources and databases

Name | Description | Citation | Website (http://www.)
ArrayExpress Gene Expression Atlas | Gene expression analysis of public datasets | [285] | ebi.ac.uk/gxa
BioDrugScreen | Protein/Small molecule interaction database | [286] | biodrugscreen.org
Catalogue of Somatic Mutations in Cancer (COSMIC) | Listing of somatic mutations in cancer | [132] | sanger.ac.uk/cosmic
Cancer Gene Expression Database (CGED) | Gene expression analysis of cancer | [287] | cged.hgc.jp
Database of Differentially Expressed Proteins in human Cancers (dbDEPC) | Differentially expressed proteins in cancer | [288] | dbdepc.biosino.org/index
Database of Genomic Variants | Reported normal copy number variations | [31] | projects.tcag.ca/variation
European Bioinformatics Institute (EBI) | Integrated database of multiple biological resources | [289] | ebi.ac.uk
GeneCards | | [290] | genecards.org
GenomeRNAi | RNAi experiment results | [10] | rnai2.dkfz.de/GenomeRNAi
Human DNA Methylome | Whole genome methylation sequences of multiple individuals | [12] | neomorph.salk.edu/human_methylome
Human Histone Modification Database (HHMD) | Histone modification database | [291] | bioinfo.hrbmu.edu.cn/hhmd
microRNA.org | Annotated microRNAs and their targets | [292] | microRNA.org
miR2Disease | Deregulated microRNAs in cancer | [293] | miR2Disease.org
miRDB | Annotated microRNAs and their targets | [227] | mirdb.org
miRGen | Annotated microRNAs and their targets | [294] | diana.cslab.ece.ntua.gr/mirgen
National Center for Biotechnology Information (NCBI) | | [295] | ncbi.nlm.nih.gov
NCBI Cancer Chromosomes | Curated cytogenetic alterations in cancer | [295] | ncbi.nlm.nih.gov/sites/entrez?db=cancerchromosomes
NCBI GEO Profiles | datasets | [296] | ncbi.nlm.nih.gov/sites/entrez?db=geo
Oncomine | datasets | [297] | oncomine.org
PROGENETIX | Copy number aberrations in cancer by CGH | [298] | progenetix.net
PRoteomics IDentifications Database (PRIDE) | Mass spectrometry results | [299] | ebi.ac.uk/pride
Sanger CGP LOH And Copy Number Analysis | Copy number and LOH profiles of cancer cell lines | - | sanger.ac.uk/cgibin/genetics/CGP/cghviewer/CghHome.cgi
siRecords | RNAi experiment results | [300] | siRecords.umn.edu/siRecords
System for Integrative Genomic Microarray Analysis (SIGMA) | Array CGH profiles of cancer cell lines | [301] | sigma.bccrc.ca
The Cancer Genome Anatomy Project (CGAP) | Gene expression analysis of cancer | [43] | cgap.nci.nih.gov/
The Cancer Genome Atlas (TCGA) | Multi-dimensional description of cancer genomes | [16] | cancergenome.nih.gov/dataportal/data/about/
UCSC Genome Browser | | [302] | genome.ucsc.edu/cgibin/hgNear

Table 5.3. Genes interacting with SIRPA as identified by network analysis

Gene Gene Gene Gene Gene Gene Gene Gene
ABCG1 C5AR1 DOK2 GPR65 LILRA1 NTNG1 RECK STK10
ABI3BP C7orf44 DPEP2 GPR85 LILRA5 NTRK3 RHOG STK33
ACOT1 CANT1 DSE GPX3 LILRB1 NUP62CL RHOJ STX11
ACP5 CCND2 EMP3 GSPT2 LILRB2 OGN RNASEK SULF1
ACSL4 CD14 EMR1 GTDC1 LILRB5 OLFML1 RTN1 TACSTD1
ACVRL1 CD163 ETS1 GYPC LOC440295 OR1J1 RUNX1T1 TARP
ACY1 CD300C EVI2B HCK LOXL2 PARVB SAMHD1 TARS2
ADAMTSL4 CD300LF FAAH2 HERPUD1 LPAR1 PCDH15 SELPLG TCEAL2
ADARB1 CD33 FAM107A HIST2H4A LPIN1 PDCD1LG2 SH2B3 TCF21
ADC CD34 FAM65A HSD11B1 LPXN PDE3B SIGLEC7 TDRKH
ADCY4 CD4 FBLN5 HSPB7 LRCH2 PHEX SIP1 TFE3
ADCY7 CD53 FBXL17 HVCN1 LRRC25 PHKA1 SIRPB1 TGFBR3
ADPRH CD86 FBXL2 IFI30 LRRC33 PIK3AP1 SLA TLN1
ADRA1A CD93 FCER1G IGSF10 LRRC37A PIK3R5 SLC15A3 TLR4
ADRBK1 CD97 FCGR1A IGSF2 LRRC8C PILRA SLC16A2 TLR8
AGER CDH1 FCGR3A IL17RA LST1 PLEK SLC22A25 TM6SF1
AKR7A3 CDKL3 FERMT3 IL8RA LTBP2 PLEKHA8 SLC25A10 TMED3
AKT1 CFD FGD2 IRF6 MAF PLEKHO2 SLC25A29 TMEM183B
ALS2CR12 CFP FGF2 IRF8 MAN1C1 PMP22 SLC2A9 TMEM184A
ANGPTL1 CLEC4E FGFR4 ITGA5 MAP3K3 PNPLA6 SLC31A2 TMEM47
ANKRD36 CLN3 FGL2 ITGAL MARCH1 PPAP2B SLC7A11 TMTC1
AOC3 CMKLR1 FGR ITPR3 MARCO PPM1F SLC7A7 TNFRSF1B
ARHGAP30 COASY FHL1 ITPRIP MCOLN1 PPP1R14B SLC8A1 TPK1
ARRB2 COG1 FIBIN JAM2 MFNG PRCP SLCO2B1 TPRG1
ATP6V1B2 COMMD3 FIGF JUNB MMP19 PREX1 SLFN13 TRPV2
BAALC CPA3 FLI1 KCNJ5 MORC2 PROM1 SMARCA2 TSPAN18
BHLHB3 CPVL FLJ22662 KCNK1 MRAS PRUNE2 SMYD3 TTC13
BRCC3 CSF1R FPR1 KCTD12 MRC2 PTGDS SNN TTC30B
BTK CSF2RB FRMD4A KIF26B MS4A15 PTGER4 SORBS1 TYROBP
BVES CSGALNACT1 FXYD6 KLF13 MSRB3 PTGIS SORD TYW1
BZW2 CYP4Z1 GFRA2 KLF4 MTMR10 PTPN6 SPARCL1 USP48
C10orf54 CYTH4 GIMAP1 KMO MYCT1 PTPRG SPI1 VAMP7
C10orf72 CYYR1 GIMAP4 LAIR1 MYD88 PVRL4 SPN VASH1
C14orf49 DAB2 GIMAP5 LAMB3 NCF1 QKI SPOCK2 VAT1
C1orf38 DNAH7 GIMAP6 LAMC2 NCKAP1L QSER1 SPON1 VSIG4
C1QA DNAJB4 GIMAP8 LAPTM5 NEK4 RAD54B SRGN VWF
C1QB DNASE1L3 GLIPR2 LAT2 NEXN RASSF2 SRPX WDR60
C1QC DOCK11 GMFG LCP1 NLRC4 RBM17 STAP2
C1RL DOCK8 GNAI2 LDB2 NTAN1 RBM35A STAT5A


1. Pinkel D, Segraves R, Sudar D, Clark S, Poole I, Kowbel D, Collins C, Kuo WL, Chen C, 

Zhai Y et al: High resolution analysis of DNA copy number variation using 

comparative genomic hybridization to microarrays. Nat Genet 1998, 20(2):207-211. 

2. Schrock E, du Manoir S, Veldman T, Schoell B, Wienberg J, Ferguson-Smith MA, Ning 

Y, Ledbetter DH, Bar-Am I, Soenksen D et al: Multicolor spectral karyotyping of 

human chromosomes. Science 1996, 273(5274):494-497. 

3. Drmanac R, Sparks AB, Callow MJ, Halpern AL, Burns NL, Kermani BG, Carnevali P, 

Nazarenko I, Nilsen GB, Yeung G et al: Human Genome Sequencing Using 

Unchained Base Reads on Self-Assembling DNA Nanoarrays. Science 2009. 

4. Pleasance ED, Cheetham RK, Stephens PJ, McBride DJ, Humphray SJ, Greenman CD, 

Varela I, Lin ML, Ordonez GR, Bignell GR et al: A comprehensive catalogue of 

somatic mutations from a human cancer genome. Nature 2009. 

5. Pleasance ED, Stephens PJ, O'Meara S, McBride DJ, Meynert A, Jones D, Lin ML, 

Beare D, Lau KW, Greenman C et al: A small-cell lung cancer genome with complex 

signatures of tobacco exposure. Nature 2009. 

6. Stephens PJ, McBride DJ, Lin ML, Varela I, Pleasance ED, Simpson JT, Stebbings LA, 

Leroy C, Edkins S, Mudie LJ et al: Complex landscapes of somatic rearrangement in 

human breast cancer genomes. Nature 2009, 462(7276):1005-1010. 




9:422. 

8. von Eschenbach AC, Buetow K: Cancer Informatics Vision: caBIG. Cancer Inform 

2007, 2:22-24. 

9. Horn T, Arziman Z, Berger J, Boutros M: GenomeRNAi: a database for cell-based 

RNAi phenotypes. Nucleic Acids Res 2007, 35(Database issue):D492-497. 

10. Gilsdorf M, Horn T, Arziman Z, Pelz O, Kiner E, Boutros M: GenomeRNAi: a database 

for cell-based RNAi phenotypes. 2009 update. Nucleic Acids Res 2010, 38(Database 

issue):D448-452. 

11. Shah SP, Morin RD, Khattra J, Prentice L, Pugh T, Burleigh A, Delaney A, Gelmon K, 

Guliany R, Senz J et al: Mutational evolution in a lobular breast tumour profiled at 

single nucleotide resolution. Nature 2009, 461(7265):809-813. 

12. Lister R, Pelizzola M, Dowen RH, Hawkins RD, Hon G, Tonti-Filippini J, Nery JR, Lee L, 

Ye Z, Ngo QM et al: Human DNA methylomes at base resolution show widespread 

epigenomic differences. Nature 2009. 

13. Ley TJ, Mardis ER, Ding L, Fulton B, McLellan MD, Chen K, Dooling D, Dunford-Shore 

BH, McGrath S, Hickenbotham M et al: DNA sequencing of a cytogenetically normal 

acute myeloid leukaemia genome. Nature 2008, 456(7218):66-72. 

14. Mardis ER, Ding L, Dooling DJ, Larson DE, McLellan MD, Chen K, Koboldt DC, Fulton 

RS, Delehaunty KD, McGrath SD et al: Recurring mutations found by sequencing an 

acute myeloid leukemia genome. N Engl J Med 2009, 361(11):1058-1066. 

15. Wise J: Consortium hopes to sequence genome of 1000 volunteers. BMJ 2008, 

336(7638):237. 

16. Comprehensive genomic characterization defines human glioblastoma genes and 

core pathways. Nature 2008, 455(7216):1061-1068. 

17. Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, Gibbs RA, Belmont JW, 

Boudreau A, Hardenbol P, Leal SM et al: A second generation human haplotype map 

of over 3.1 million SNPs. Nature 2007, 449(7164):851-861. 

18. Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, Fiegler H, Shapero MH, 

Carson AR, Chen W et al: Global variation in copy number in the human genome. 

Nature 2006, 444(7118):444-454. 

19. Wong KK, deLeeuw RJ, Dosanjh NS, Kimm LR, Cheng Z, Horsman DE, MacAulay C, 

Ng RT, Brown CJ, Eichler EE et al: A comprehensive analysis of common copy-number

variations in the human genome. Am J Hum Genet 2007, 80(1):91-104. 




2006, 3(12):e486. 

21. Sjoblom T, Jones S, Wood LD, Parsons DW, Lin J, Barber TD, Mandelker D, Leary RJ, 

Ptak J, Silliman N et al: The consensus coding sequences of human breast and 

colorectal cancers. Science 2006, 314(5797):268-274. 

22. Root DE, Hacohen N, Hahn WC, Lander ES, Sabatini DM: Genome-scale loss-of-function

screening with a lentiviral RNAi library. Nat Methods 2006, 3(9):715-719. 







37(8):853-862. 

25. A haplotype map of the human genome. Nature 2005, 437(7063):1299-1320. 

26. Lu J, Getz G, Miska EA, Alvarez-Saavedra E, Lamb J, Peck D, Sweet-Cordero A, Ebert 

BL, Mak RH, Ferrando AA et al: MicroRNA expression profiles classify human 

cancers. Nature 2005, 435(7043):834-838. 

27. Bamford S, Dawson E, Forbes S, Clements J, Pettett R, Dogan A, Flanagan A, Teague 

J, Futreal PA, Stratton MR et al: The COSMIC (Catalogue of Somatic Mutations in 

Cancer) database and website. Br J Cancer 2004, 91(2):355-358. 

28. Paddison PJ, Silva JM, Conklin DS, Schlabach M, Li M, Aruleba S, Balija V, 

O'Shaughnessy A, Gnoj L, Scobie K et al: A resource for large-scale RNA-interference-based

screens in mammals. Nature 2004, 428(6981):427-431. 

29. Schlabach MR, Luo J, Solimini NL, Hu G, Xu Q, Li MZ, Zhao Z, Smogorzewska A, Sowa 

ME, Ang XL et al: Cancer proliferation gene discovery through functional 

genomics. Science 2008, 319(5863):620-624. 

30. Futreal PA, Coin L, Marshall M, Down T, Hubbard T, Wooster R, Rahman N, Stratton 

MR: A census of human cancer genes. Nat Rev Cancer 2004, 4(3):177-183. 

31. Iafrate AJ, Feuk L, Rivera MN, Listewnik ML, Donahoe PK, Qi Y, Scherer SW, Lee C: 

Detection of large-scale variation in the human genome. Nat Genet 2004, 36(9):949- 

951. 

32. Sebat J, Lakshmi B, Troge J, Alexander J, Young J, Lundin P, Maner S, Massa H, 

Walker M, Chi M et al: Large-scale copy number polymorphism in the human 

genome. Science 2004, 305(5683):525-528. 




34. Bertone P, Stolc V, Royce TE, Rozowsky JS, Urban AE, Zhu X, Rinn JL, Tongprasit W, 

Samanta M, Weissman S et al: Global identification of human transcribed 

sequences with genome tiling arrays. Science 2004, 306(5705):2242-2246. 

35. Hubbard T, Barker D, Birney E, Cameron G, Chen Y, Clark L, Cox T, Cuff J, Curwen V, 

Down T et al: The Ensembl genome database project. Nucleic Acids Res 2002, 

30(1):38-41. 

36. Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D: The 

human genome browser at UCSC. Genome Res 2002, 12(6):996-1006. 

37. Oliphant A, Barker DL, Stuelpnagel JR, Chee MS: BeadArray technology: enabling an 

accurate, cost-effective approach to high-throughput genotyping. Biotechniques 

2002, Suppl:56-58, 60-51. 

38. Weinstein IB: Cancer. Addiction to oncogenes--the Achilles heal of cancer. Science 

2002, 297(5578):63-64. 

39. Weinstein IB, Joe A: Oncogene addiction. Cancer Res 2008, 68(9):3077-3080; 

discussion 3080. 

40. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, 

Doyle M, FitzHugh W et al: Initial sequencing and analysis of the human genome. 

Nature 2001, 409(6822):860-921. 

41. Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M, 

Evans CA, Holt RA et al: The sequence of the human genome. Science 2001, 

291(5507):1304-1351. 

42. Riggins GJ, Strausberg RL: Genome and genetic resources from the Cancer 

Genome Anatomy Project. Hum Mol Genet 2001, 10(7):663-667. 

43. Strausberg RL, Buetow KH, Emmert-Buck MR, Klausner RD: The cancer genome 

anatomy project: building an annotated gene index. Trends Genet 2000, 16(3):103- 

106. 

44. Bayani JM, Squire JA: Applications of SKY in cancer cytogenetics. Cancer Invest 

2002, 20(3):373-386. 

45. Kallioniemi A, Kallioniemi OP, Sudar D, Rutovitz D, Gray JW, Waldman F, Pinkel D: 

Comparative genomic hybridization for molecular cytogenetic analysis of solid 

tumors. Science 1992, 258(5083):818-821. 

46. Garnis C, Buys TP, Lam WL: Genetic alteration and gene expression modulation 

during cancer progression. Mol Cancer 2004, 3:9. 

47. Gebhart E: Genomic imbalances in human leukemia and lymphoma detected by 

comparative genomic hybridization (Review). Int J Oncol 2005, 27(3):593-606. 

48. Gebhart E, Liehr T: Patterns of genomic imbalances in human solid tumors 

(Review). Int J Oncol 2000, 16(2):383-399. 

49. Cahill DP, Lengauer C, Yu J, Riggins GJ, Willson JK, Markowitz SD, Kinzler KW, 

Vogelstein B: Mutations of mitotic checkpoint genes in human cancers. Nature 

1998, 392(6673):300-303. 

50. Fukasawa K: Centrosome amplification, chromosome instability and cancer 

development. Cancer Lett 2005, 230(1):6-19. 

51. Lingle WL, Lukasiewicz K, Salisbury JL: Deregulation of the centrosome cycle and 

the origin of chromosomal instability in cancer. Adv Exp Med Biol 2005, 570:393- 

421. 

52. Chin K, de Solorzano CO, Knowles D, Jones A, Chou W, Rodriguez EG, Kuo WL, Ljung 

BM, Chew K, Myambo K et al: In situ analyses of genome instability in breast 

cancer. Nat Genet 2004, 36(9):984-988. 

53. O'Hagan RC, Chang S, Maser RS, Mohan R, Artandi SE, Chin L, DePinho RA: 

Telomere dysfunction provokes regional amplification and deletion in cancer 

genomes. Cancer Cell 2002, 2(2):149-155. 

54. Green AR: Transcription factors, translocations and haematological malignancies. 

Blood Rev 1992, 6(2):118-124. 

55. Rowley JD: Chromosomal translocations: revisited yet again. Blood 2008, 

112(6):2183-2189. 

56. Watson SK, deLeeuw RJ, Horsman DE, Squire JA, Lam WL: Cytogenetically balanced 

translocations are associated with focal copy number alterations. Hum Genet 

2007, 120(6):795-805. 

57. Brenner JC, Chinnaiyan AM: Translocations in epithelial cancers. Biochim Biophys 

Acta 2009, 1796(2):201-215. 

58. Mani RS, Tomlins SA, Callahan K, Ghosh A, Nyati MK, Varambally S, Palanisamy N, 

Chinnaiyan AM: Induced chromosomal proximity and gene fusions in prostate 

cancer. Science 2009, 326(5957):1230. 

59. Tomlins SA, Rhodes DR, Perner S, Dhanasekaran SM, Mehra R, Sun XW, Varambally 

S, Cao X, Tchinda J, Kuefer R et al: Recurrent fusion of TMPRSS2 and ETS 

transcription factor genes in prostate cancer. Science 2005, 310(5748):644-648. 

60. Dang TP, Gazdar AF, Virmani AK, Sepetavec T, Hande KR, Minna JD, Roberts JR, 

Carbone DP: Chromosome 19 translocation, overexpression of Notch3, and human 

lung cancer. J Natl Cancer Inst 2000, 92(16):1355-1357. 

61. Soda M, Choi YL, Enomoto M, Takada S, Yamashita Y, Ishikawa S, Fujiwara S, 

Watanabe H, Kurashina K, Hatanaka H et al: Identification of the transforming EML4- 

ALK fusion gene in non-small-cell lung cancer. Nature 2007, 448(7153):561-566. 

62. Knutsen T, Gobu V, Knaus R, Padilla-Nash H, Augustus M, Strausberg RL, Kirsch IR, 

Sirotkin K, Ried T: The interactive online SKY/M-FISH & CGH database and the 

Entrez cancer chromosomes search database: linkage of chromosomal 

aberrations with the genome sequence. Genes Chromosomes Cancer 2005, 

44(1):52-64. 

63. Albertson DG, Collins C, McCormick F, Gray JW: Chromosome aberrations in solid 

tumors. Nat Genet 2003, 34(4):369-376. 





J Hum Genet 2006, 14(2):139-148. 

66. Pollack JR, Perou CM, Alizadeh AA, Eisen MB, Pergamenschikov A, Williams CF, 

Jeffrey SS, Botstein D, Brown PO: Genome-wide analysis of DNA copy-number 

changes using cDNA microarrays. Nat Genet 1999, 23(1):41-46. 

67. Almagro-Garcia J, Manske M, Carret C, Campino S, Auburn S, Macinnis BL, Maslen G, 

Pain A, Newbold CI, Kwiatkowski DP et al: SnoopCGH: software for visualizing 

comparative genomic hybridization data. Bioinformatics 2009, 25(20):2732-2733. 

68. Chari R, Lockwood WW, Lam WL: Computational methods for the analysis of array 

comparative genomic hybridization. Cancer Inform 2007, 2:48-58. 

69. Chi B, DeLeeuw RJ, Coe BP, MacAulay C, Lam WL: SeeGH--a software tool for 

visualization of whole genome array comparative genomic hybridization data. 

BMC Bioinformatics 2004, 5:13. 

70. Chi B, deLeeuw RJ, Coe BP, Ng RT, MacAulay C, Lam WL: MD-SeeGH: a platform for 

integrative analysis of multi-dimensional genomic data. BMC Bioinformatics 2008, 

9:243. 



72. Bignell GR, Huang J, Greshock J, Watt S, Butler A, West S, Grigorova M, Jones KW, 

Wei W, Stratton MR et al: High-resolution analysis of DNA copy number using 

oligonucleotide microarrays. Genome Res 2004, 14(2):287-295. 

73. Iacobucci I, Storlazzi CT, Cilloni D, Lonetti A, Ottaviani E, Soverini S, Astolfi A, Chiaretti 

S, Vitale A, Messa F et al: Identification and molecular characterization of recurrent 

genomic deletions on 7p12 in the IKZF1 gene in a large cohort of BCR-ABL1-positive 

acute lymphoblastic leukemia patients: on behalf of Gruppo Italiano 

Malattie Ematologiche dell'Adulto Acute Leukemia Working Party (GIMEMA AL 

WP). Blood 2009, 114(10):2159-2167. 

74. Niini T, Lopez-Guerrero JA, Ninomiya S, Guled M, Hattinger CM, Michelacci F, Bohling 

T, Llombart-Bosch A, Picci P, Serra M et al: Frequent deletion of CDKN2A and 

recurrent coamplification of KIT, PDGFRA, and KDR in fibrosarcoma of bone-An 

array comparative genomic hybridization study. Genes Chromosomes Cancer 2010, 

49(2):132-143. 

75. Selzer RR, Richmond TA, Pofahl NJ, Green RD, Eis PS, Nair P, Brothman AR, Stallings 

RL: Analysis of chromosome breakpoints in neuroblastoma at sub-kilobase 

resolution using fine-tiling oligonucleotide array CGH. Genes Chromosomes 

Cancer 2005, 44(3):305-319. 

76. Zhao X, Li C, Paez JG, Chin K, Janne PA, Chen TH, Girard L, Minna J, Christiani D, Leo 

C et al: An integrated view of copy number and allelic alterations in the cancer 

genome using single nucleotide polymorphism arrays. Cancer Res 2004, 

64(9):3060-3071. 

77. Wang TL, Maierhofer C, Speicher MR, Lengauer C, Vogelstein B, Kinzler KW, 

Velculescu VE: Digital karyotyping. Proc Natl Acad Sci U S A 2002, 99(25):16156- 

16161. 

78. Tuzun E, Sharp AJ, Bailey JA, Kaul R, Morrison VA, Pertz LM, Haugen E, Hayden H, 

Albertson D, Pinkel D et al: Fine-scale structural variation of the human genome. 

Nat Genet 2005, 37(7):727-732. 

79. Volik S, Raphael BJ, Huang G, Stratton MR, Bignel G, Murnane J, Brebner JH, 

Bajsarowicz K, Paris PL, Tao Q et al: Decoding the fine-scale structure of a breast 

cancer genome and transcriptome. Genome Res 2006, 16(3):394-404. 

80. Volik S, Zhao S, Chin K, Brebner JH, Herndon DR, Tao Q, Kowbel D, Huang G, Lapuk 

A, Kuo WL et al: End-sequence profiling: sequence-based analysis of aberrant 

genomes. Proc Natl Acad Sci U S A 2003, 100(13):7696-7701. 

81. McPherson JD: Next-generation gap. Nat Methods 2009, 6(11 Suppl):S2-5. 

82. Alkan C, Kidd JM, Marques-Bonet T, Aksay G, Antonacci F, Hormozdiari F, Kitzman JO, 

Baker C, Malig M, Mutlu O et al: Personalized copy number and segmental 

duplication maps using next-generation sequencing. Nat Genet 2009, 41(10):1061- 

1067. 

83. Conrad DF, Andrews TD, Carter NP, Hurles ME, Pritchard JK: A high-resolution 

survey of deletion polymorphism in the human genome. Nat Genet 2006, 38(1):75- 

81. 

84. Conrad DF, Pinto D, Redon R, Feuk L, Gokcumen O, Zhang Y, Aerts J, Andrews TD, 

Barnes C, Campbell P et al: Origins and functional impact of copy number variation 

in the human genome. Nature 2009. 

85. Fiegler H, Redon R, Andrews D, Scott C, Andrews R, Carder C, Clark R, Dovey O, Ellis 

P, Feuk L et al: Accurate and reliable high-throughput detection of copy number 

variation in the human genome. Genome Res 2006, 16(12):1566-1574. 

86. Jakobsson M, Scholz SW, Scheet P, Gibbs JR, VanLiere JM, Fung HC, Szpiech ZA, 

Degnan JH, Wang K, Guerreiro R et al: Genotype, haplotype and copy-number 

variation in worldwide human populations. Nature 2008, 451(7181):998-1003. 

87. Kidd JM, Cooper GM, Donahue WF, Hayden HS, Sampas N, Graves T, Hansen N, 

Teague B, Alkan C, Antonacci F et al: Mapping and sequencing of structural 

variation from eight human genomes. Nature 2008, 453(7191):56-64. 

88. McCarroll SA, Kuruvilla FG, Korn JM, Cawley S, Nemesh J, Wysoker A, Shapero MH, 

de Bakker PI, Maller JB, Kirby A et al: Integrated detection and population-genetic 

analysis of SNPs and copy number variation. Nat Genet 2008, 40(10):1166-1174. 

89. Shaikh TH, Gai X, Perin JC, Glessner JT, Xie H, Murphy K, O'Hara R, Casalunovo T, 

Conlin LK, D'Arcy M et al: High-resolution mapping and analysis of copy number 

variations in the human genome: a data resource for clinical and research 

applications. Genome Res 2009, 19(9):1682-1690. 

90. Hastings PJ, Ira G, Lupski JR: A microhomology-mediated break-induced 

replication model for the origin of human copy number variation. PLoS Genet 

2009, 5(1):e1000327. 

91. Diskin SJ, Hou C, Glessner JT, Attiyeh EF, Laudenslager M, Bosse K, Cole K, Mosse 

YP, Wood A, Lynch JE et al: Copy number variation at 1q21.1 associated with 

neuroblastoma. Nature 2009, 459(7249):987-991. 




93. Myllykangas S, Himberg J, Bohling T, Nagy B, Hollmen J, Knuutila S: DNA copy 

number amplification profiling of human neoplasms. Oncogene 2006, 25(55):7324- 

7332. 

94. Teschendorff AE, Caldas C: The breast cancer somatic 'muta-ome': tackling the 

complexity. Breast Cancer Res 2009, 11(2):301. 

95. Chin SF, Teschendorff AE, Marioni JC, Wang Y, Barbosa-Morais NL, Thorne NP, Costa 

JL, Pinder SE, van de Wiel MA, Green AR et al: High-resolution aCGH and 

expression profiling identifies a novel genomic subtype of ER negative breast 

cancer. Genome Biol 2007, 8(10):R215. 




97. Bass AJ, Watanabe H, Mermel CH, Yu S, Perner S, Verhaak RG, Kim SY, Wardwell L, 

Tamayo P, Gat-Viks I et al: SOX2 is an amplified lineage-survival oncogene in lung 

and esophageal squamous cell carcinomas. Nat Genet 2009, 41(11):1238-1242. 

98. Garraway LA, Widlund HR, Rubin MA, Getz G, Berger AJ, Ramaswamy S, Beroukhim 

R, Milner DA, Granter SR, Du J et al: Integrative genomic analyses identify MITF as 

a lineage survival oncogene amplified in malignant melanoma. Nature 2005, 

436(7047):117-122. 



Nature 2007, 450(7171):893-898. 

100. Kwei KA, Kim YH, Girard L, Kao J, Pacyna-Gengelbach M, Salari K, Lee J, Choi YL, 

Sato M, Wang P et al: Genomic profiling identifies TITF1 as a lineage-specific 

oncogene amplified in lung cancer. Oncogene 2008, 27(25):3635-3640. 

101. Plomin R, Haworth CM, Davis OS: Common disorders are quantitative traits. Nat 

Rev Genet 2009, 10(12):872-878. 

102. Savas S, Liu G: Genetic variations as cancer prognostic markers: review and 

update. Hum Mutat 2009, 30(10):1369-1377. 

103. Ansorge WJ: Next-generation DNA sequencing techniques. N Biotechnol 2009, 

25(4):195-203. 

104. Shah SP, Kobel M, Senz J, Morin RD, Clarke BA, Wiegand KC, Leung G, Zayed A, Mehl 

E, Kalloger SE et al: Mutation of FOXL2 in granulosa-cell tumors of the ovary. N 

Engl J Med 2009, 360(26):2719-2729. 

105. Greenman C, Stephens P, Smith R, Dalgliesh GL, Hunter C, Bignell G, Davies H, 

Teague J, Butler A, Stevens C et al: Patterns of somatic mutation in human cancer 

genomes. Nature 2007, 446(7132):153-158. 

106. Stratton MR, Campbell PJ, Futreal PA: The cancer genome. Nature 2009, 

458(7239):719-724. 

107. Cavenee WK, Hansen MF, Nordenskjold M, Kock E, Maumenee I, Squire JA, Phillips 

RA, Gallie BL: Genetic origin of mutations predisposing to retinoblastoma. Science 

1985, 228(4698):501-503. 

108. Knudson AG, Jr.: Mutation and cancer: statistical study of retinoblastoma. Proc Natl 

Acad Sci U S A 1971, 68(4):820-823. 

109. Benz CC, Fedele V, Xu F, Ylstra B, Ginzinger D, Yu M, Moore D, Hall RK, Wolff DJ, 

Disis ML et al: Altered promoter usage characterizes monoallelic transcription 

arising with ERBB2 amplification in human breast cancers. Genes Chromosomes 

Cancer 2006, 45(11):983-994. 

110. LaFramboise T, Weir BA, Zhao X, Beroukhim R, Li C, Harrington D, Sellers WR, 

Meyerson M: Allele-specific amplification in cancer revealed by SNP array 

analysis. PLoS Comput Biol 2005, 1(6):e65. 

111. Melcher R, Al-Taie O, Kudlich T, Hartmann E, Maisch S, Steinlein C, Schmid M, 

Rosenwald A, Menzel T, Scheppach W et al: SNP-Array genotyping and spectral 

karyotyping reveal uniparental disomy as early mutational event in MSS- and MSI-colorectal 

cancer cell lines. Cytogenet Genome Res 2007, 118(2-4):214-221. 

112. Nomura M, Shigematsu H, Li L, Suzuki M, Takahashi T, Estess P, Siegelman M, Feng 

Z, Kato H, Marchetti A et al: Polymorphisms, mutations, and amplification of the 

EGFR gene in non-small cell lung cancers. PLoS Med 2007, 4(4):e125. 

113. Sholl LM, Yeap BY, Iafrate AJ, Holmes-Tisch AJ, Chou YP, Wu MT, Goan YG, Su L, 

Benedettini E, Yu J et al: Lung adenocarcinoma with EGFR amplification has 

distinct clinicopathologic and molecular features in never-smokers. Cancer Res 

2009, 69(21):8341-8348. 





115. Bacolod MD, Schemmann GS, Giardina SF, Paty P, Notterman DA, Barany F: 

Emerging paradigms in cancer genetics: some important findings from high-density 

single nucleotide polymorphism array studies. Cancer Res 2009, 69(3):723- 

727. 

116. Robinson WP: Mechanisms leading to uniparental disomy and their clinical 

consequences. Bioessays 2000, 22(5):452-459. 


15(3):120-128. 



1992, 59(4):248-252. 

119. Gondek LP, Dunbar AJ, Szpurka H, McDevitt MA, Maciejewski JP: SNP array 

karyotyping allows for the detection of uniparental disomy and cryptic 

chromosomal abnormalities in MDS/MPD-U and MPD. PLoS One 2007, 2(11):e1225. 









Genet 2007, 81(1):114-126. 




86. 




6192. 

124. Gupta M, Raghavan M, Gale RE, Chelala C, Allen C, Molloy G, Chaplin T, Linch DC, 

Cazier JB, Young BD: Novel regions of acquired uniparental disomy discovered in 

acute myeloid leukemia. Genes Chromosomes Cancer 2008, 47(9):729-739. 

125. Kawamata N, Ogawa S, Gueller S, Ross SH, Huynh T, Chen J, Chang A, Nabavi-Nouis 

S, Megrabian N, Siebert R et al: Identified hidden genomic changes in mantle cell 

lymphoma using high-resolution single nucleotide polymorphism genomic array. 

Exp Hematol 2009, 37(8):937-946. 

126. Makishima H, Cazzolli H, Szpurka H, Dunbar A, Tiu R, Huh J, Muramatsu H, O'Keefe C, 

Hsi E, Paquette RL et al: Mutations of e3 ubiquitin ligase cbl family members 

constitute a novel common pathogenic lesion in myeloid malignancies. J Clin 

Oncol 2009, 27(36):6109-6116. 

127. Walter MJ, Payton JE, Ries RE, Shannon WD, Deshmukh H, Zhao Y, Baty J, Heath S, 

Westervelt P, Watson MA et al: Acquired copy number alterations in adult acute 

myeloid leukemia genomes. Proc Natl Acad Sci U S A 2009, 106(31):12950-12955. 

128. Yin D, Ogawa S, Kawamata N, Tunici P, Finocchiaro G, Eoli M, Ruckert C, Huynh T, Liu 

G, Kato M et al: High-resolution genomic copy number profiling of glioblastoma 

multiforme by single nucleotide polymorphism DNA microarray. Mol Cancer Res 

2009, 7(5):665-677. 

129. Purdie KJ, Lambert SR, Teh MT, Chaplin T, Molloy G, Raghavan M, Kelsell DP, Leigh 

IM, Harwood CA, Proby CM et al: Allelic imbalances and microdeletions affecting 

the PTPRD gene in cutaneous squamous cell carcinomas detected using single 

nucleotide polymorphism microarray analysis. Genes Chromosomes Cancer 2007, 

46(7):661-669. 

130. Akagi T, Ito T, Kato M, Jin Z, Cheng Y, Kan T, Yamamoto G, Olaru A, Kawamata N, 

Boult J et al: Chromosomal abnormalities and novel disease-related regions in 

progression from Barrett's esophagus to esophageal adenocarcinoma. Int J 

Cancer 2009, 125(10):2349-2359. 

131. Andersen CL, Wiuf C, Kruhoffer M, Korsgaard M, Laurberg S, Orntoft TF: Frequent 

occurrence of uniparental disomy in colorectal cancer. Carcinogenesis 2007, 

28(1):38-48. 

132. Forbes SA, Tang G, Bindal N, Bamford S, Dawson E, Cole C, Kok CY, Jia M, Ewing R, 

Menzies A et al: COSMIC (the Catalogue of Somatic Mutations in Cancer): a 

resource to investigate acquired mutations in human cancer. Nucleic Acids Res 

2010, 38(Database issue):D652-657. 

133. Kerkel K, Spadola A, Yuan E, Kosek J, Jiang L, Hod E, Li K, Murty VV, Schupf N, Vilain 

E et al: Genomic surveys by methylation-sensitive SNP analysis identify 

sequence-dependent allele-specific DNA methylation. Nat Genet 2008, 40(7):904- 

908. 

134. Jones PA, Baylin SB: The epigenomics of cancer. Cell 2007, 128(4):683-692. 

135. Esteller M: Epigenetics in cancer. N Engl J Med 2008, 358(11):1148-1159. 

136. Feinberg AP: Phenotypic plasticity and the epigenetics of human disease. Nature 

2007, 447(7143):433-440. 

137. Vucic EA, Brown CJ, Lam WL: Epigenetics of cancer progression. 

Pharmacogenomics 2008, 9(2):215-234. 

138. Feinberg AP, Gehrke CW, Kuo KC, Ehrlich M: Reduced genomic 5-methylcytosine 

content in human colonic neoplasia. Cancer Res 1988, 48(5):1159-1161. 

139. Feinberg AP, Tycko B: The history of cancer epigenetics. Nat Rev Cancer 2004, 

4(2):143-153. 

140. Lo PK, Sukumar S: Epigenomics and breast cancer. Pharmacogenomics 2008, 

9(12):1879-1902. 

141. Toyota M, Ahuja N, Ohe-Toyota M, Herman JG, Baylin SB, Issa JP: CpG island 

methylator phenotype in colorectal cancer. Proc Natl Acad Sci U S A 1999, 

96(15):8681-8686. 

142. Issa JP: CpG island methylator phenotype in cancer. Nat Rev Cancer 2004, 

4(12):988-993. 

143. Tanemura A, Terando AM, Sim MS, van Hoesel AQ, de Maat MF, Morton DL, Hoon DS: 

CpG island methylator phenotype predicts progression of malignant melanoma. 

Clin Cancer Res 2009, 15(5):1801-1807. 

144. Dai Z, Lakshmanan RR, Zhu WG, Smiraglia DJ, Rush LJ, Fruhwald MC, Brena RM, Li 

B, Wright FA, Ross P et al: Global methylation profiling of lung cancer identifies 

novel methylated genes. Neoplasia 2001, 3(4):314-323. 

145. Takai D, Yagi Y, Wakazono K, Ohishi N, Morita Y, Sugimura T, Ushijima T: Silencing of 

HTR1B and reduced expression of EDN1 in human lung cancers, revealed by 

methylation-sensitive representational difference analysis. Oncogene 2001, 

20(51):7505-7513. 

146. Hu M, Yao J, Cai L, Bachman KE, van den Brule F, Velculescu V, Polyak K: Distinct 

epigenetic changes in the stromal cells of breast cancers. Nat Genet 2005, 

37(8):899-905. 

147. Irizarry RA, Ladd-Acosta C, Carvalho B, Wu H, Brandenburg SA, Jeddeloh JA, Wen B, 

Feinberg AP: Comprehensive high-throughput arrays for relative methylation 

(CHARM). Genome Res 2008, 18(5):780-790. 

148. Yan PS, Chen CM, Shi H, Rahmatpanah F, Wei SH, Caldwell CW, Huang TH: 

Dissecting complex epigenetic alterations in breast cancer using CpG island 

microarrays. Cancer Res 2001, 61(23):8375-8380. 

149. Yamamoto F, Yamamoto M: A DNA microarray-based methylation-sensitive (MS)- 

AFLP hybridization method for genetic and epigenetic analyses. Mol Genet 

Genomics 2004, 271(6):678-686. 

150. Omura N, Li CP, Li A, Hong SM, Walter K, Jimeno A, Hidalgo M, Goggins M: Genome-wide 

profiling of methylated promoters in pancreatic adenocarcinoma. Cancer Biol 

Ther 2008, 7(7):1146-1156. 

151. Trinh BN, Long TI, Laird PW: DNA methylation analysis by MethyLight technology. 

Methods 2001, 25(4):456-462. 

152. Fan JB, Gunderson KL, Bibikova M, Yeakley JM, Chen J, Wickham Garcia E, Lebruska 

LL, Laurent M, Shen R, Barker D: Illumina universal bead arrays. Methods Enzymol 

2006, 410:57-73. 

153. Houshdaran S, Cortessis VK, Siegmund K, Yang A, Laird PW, Sokol RZ: Widespread 

epigenetic abnormalities suggest a broad DNA methylation erasure defect in 

abnormal human sperm. PLoS One 2007, 2(12):e1289. 

154. Houseman EA, Christensen BC, Karagas MR, Wrensch MR, Nelson HH, Wiemels JL, 

Zheng S, Wiencke JK, Kelsey KT, Marsit CJ: Copy number variation has little impact 

on bead-array-based measures of DNA methylation. Bioinformatics 2009, 

25(16):1999-2005. 

155. Breton CV, Byun HM, Wenten M, Pan F, Yang A, Gilliland FD: Prenatal tobacco 

smoke exposure affects global and gene-specific DNA methylation. Am J Respir 

Crit Care Med 2009, 180(5):462-467. 

156. Taylor KH, Pena-Hernandez KE, Davis JW, Arthur GL, Duff DJ, Shi H, Rahmatpanah 

FB, Sjahputera O, Caldwell CW: Large-scale CpG methylation analysis identifies 

novel candidate genes and reveals methylation hotspots in acute lymphoblastic 

leukemia. Cancer Res 2007, 67(6):2617-2625. 

157. Weber M, Hellmann I, Stadler MB, Ramos L, Paabo S, Rebhan M, Schubeler D: 

Distribution, silencing potential and evolutionary impact of promoter DNA 

methylation in the human genome. Nat Genet 2007, 39(4):457-466. 

158. Rauch T, Pfeifer GP: Methylated-CpG island recovery assay: a new technique for 

the rapid detection of methylated-CpG islands in cancer. Lab Invest 2005, 

85(9):1172-1180. 

159. Jacinto FV, Ballestar E, Ropero S, Esteller M: Discovery of epigenetically silenced 

genes by methylated DNA immunoprecipitation in colon cancer cells. Cancer Res 

2007, 67(24):11481-11486. 

160. Ballestar E, Paz MF, Valle L, Wei S, Fraga MF, Espada J, Cigudosa JC, Huang TH, 

Esteller M: Methyl-CpG binding proteins identify novel sites of epigenetic 

inactivation in human cancer. EMBO J 2003, 22(23):6335-6345. 

161. Serre D, Lee BH, Ting AH: MBD-isolated Genome Sequencing provides a high-throughput 

and comprehensive survey of DNA methylation in the human genome. 

Nucleic Acids Res 2009. 

162. Down TA, Rakyan VK, Turner DJ, Flicek P, Li H, Kulesha E, Graf S, Johnson N, Herrero 

J, Tomazou EM et al: A Bayesian deconvolution strategy for immunoprecipitation-based 

DNA methylome analysis. Nat Biotechnol 2008, 26(7):779-785. 

163. Thu KL, Pikor LA, Kennett JY, Alvarez CE, Lam WL: Methylation analysis by DNA 

immunoprecipitation. J Cell Physiol 2009, 222(3):522-531. 

164. Pelizzola M, Koga Y, Urban AE, Krauthammer M, Weissman S, Halaban R, Molinaro 

AM: MEDME: an experimental and analytical methodology for the estimation of 

DNA methylation levels based on microarray derived MeDIP-enrichment. Genome 

Res 2008, 18(10):1652-1659. 

165. Yamashita S, Hosoya K, Gyobu K, Takeshima H, Ushijima T: Development of a Novel 

Output Value for Quantitative Assessment in Methylated DNA 

Immunoprecipitation-CpG Island Microarray Analysis. DNA Res 2009. 

166. Irizarry RA, Ladd-Acosta C, Wen B, Wu Z, Montano C, Onyango P, Cui H, Gabo K, 

Rongione M, Webster M et al: The human colon cancer methylome shows similar 

hypo- and hypermethylation at conserved tissue-specific CpG island shores. Nat 

Genet 2009, 41(2):178-186. 

167. Lorincz MC, Dickerson DR, Schmitt M, Groudine M: Intragenic DNA methylation alters 

chromatin structure and elongation efficiency in mammalian cells. Nat Struct Mol 

Biol 2004, 11(11):1068-1075. 

168. Frigola J, Song J, Stirzaker C, Hinshelwood RA, Peinado MA, Clark SJ: Epigenetic 

remodeling in colorectal cancer results in coordinate gene suppression across an 

entire chromosome band. Nat Genet 2006, 38(5):540-549. 

169. Zhong S, Fields CR, Su N, Pan YX, Robertson KD: Pharmacologic inhibition of 

epigenetic modifications, coupled with gene expression profiling, reveals novel 

targets of aberrant DNA methylation and histone deacetylation in lung cancer. 

Oncogene 2007, 26(18):2621-2634. 

170. Lister R, Ecker JR: Finding the fifth base: genome-wide sequencing of cytosine 

methylation. Genome Res 2009, 19(6):959-966. 

171. Byun HM, Siegmund KD, Pan F, Weisenberger DJ, Kanel G, Laird PW, Yang AS: 

Epigenetic profiling of somatic tissues from human autopsy specimens identifies 

tissue- and individual-specific DNA methylation patterns. Hum Mol Genet 2009, 

18(24):4808-4817. 

172. Fraga MF, Ballestar E, Paz MF, Ropero S, Setien F, Ballestar ML, Heine-Suner D, 

Cigudosa JC, Urioste M, Benitez J et al: Epigenetic differences arise during the 

lifetime of monozygotic twins. Proc Natl Acad Sci U S A 2005, 102(30):10604-10609. 

173. Deng J, Shoemaker R, Xie B, Gore A, LeProust EM, Antosiewicz-Bourget J, Egli D, 

Maherali N, Park IH, Yu J et al: Targeted bisulfite sequencing reveals changes in 

DNA methylation associated with nuclear reprogramming. Nat Biotechnol 2009, 

27(4):353-360. 

174. Costello JF, Fruhwald MC, Smiraglia DJ, Rush LJ, Robertson GP, Gao X, Wright FA, 

Feramisco JD, Peltomaki P, Lang JC et al: Aberrant CpG-island methylation has non-random 

and tumour-type-specific patterns. Nat Genet 2000, 24(2):132-138. 

175. Gama-Sosa MA, Midgett RM, Slagel VA, Githens S, Kuo KC, Gehrke CW, Ehrlich M: 

Tissue-specific differences in DNA methylation in various mammals. Biochim 

Biophys Acta 1983, 740(2):212-219. 

176. Richardson B: Impact of aging on DNA methylation. Ageing Res Rev 2003, 2(3):245- 

261. 

177. Eckhardt F, Beck S, Gut IG, Berlin K: Future potential of the Human Epigenome 

Project. Expert Rev Mol Diagn 2004, 4(5):609-618. 

178. Kohda M, Hoshiya H, Katoh M, Tanaka I, Masuda R, Takemura T, Fujiwara M, 

Oshimura M: Frequent loss of imprinting of IGF2 and MEST in lung 

adenocarcinoma. Mol Carcinog 2001, 31(4):184-191. 

179. Kondo M, Suzuki H, Ueda R, Osada H, Takagi K, Takahashi T: Frequent loss of 

imprinting of the H19 gene is often associated with its overexpression in human 

lung cancers. Oncogene 1995, 10(6):1193-1198. 

180. Rainier S, Johnson LA, Dobry CJ, Ping AJ, Grundy PE, Feinberg AP: Relaxation of 

imprinted genes in human cancer. Nature 1993, 362(6422):747-749. 

181. Pal N, Wadey RB, Buckle B, Yeomans E, Pritchard J, Cowell JK: Preferential loss of 

maternal alleles in sporadic Wilms' tumour. Oncogene 1990, 5(11):1665-1668. 

182. Schroeder WT, Chao LY, Dao DD, Strong LC, Pathak S, Riccardi V, Lewis WH, 

Saunders GF: Nonrandom loss of maternal chromosome 11 alleles in Wilms 

tumors. Am J Hum Genet 1987, 40(5):413-420. 

183. Scrable H, Cavenee W, Ghavimi F, Lovell M, Morgan K, Sapienza C: A model for 

embryonal rhabdomyosarcoma tumorigenesis that involves genome imprinting. 

Proc Natl Acad Sci U S A 1989, 86(19):7480-7484. 

184. Gaudet F, Hodgson JG, Eden A, Jackson-Grusby L, Dausman J, Gray JW, Leonhardt H, 

Jaenisch R: Induction of tumors in mice by genomic hypomethylation. Science 

2003, 300(5618):489-492. 

185. Rizwana R, Hahn PJ: CpG methylation reduces genomic instability. J Cell Sci 1999, 

112(Pt 24):4513-4519. 

186. Daskalos A, Nikolaidis G, Xinarianos G, Savvari P, Cassidy A, Zakopoulou R, Kotsinas 

A, Gorgoulis V, Field JK, Liloglou T: Hypomethylation of retrotransposable elements 

correlates with genomic instability in non-small cell lung cancer. Int J Cancer 2009, 

124(1):81-87. 

187. Walsh CP, Chaillet JR, Bestor TH: Transcription of IAP endogenous retroviruses is 

constrained by cytosine methylation. Nat Genet 1998, 20(2):116-117. 

188. Chalitchagorn K, Shuangshoti S, Hourpai N, Kongruttanachok N, Tangkijvanich P, 

Thong-ngam D, Voravud N, Sriuranpong V, Mutirangura A: Distinctive pattern of LINE- 

1 methylation level in normal tissues and the association with carcinogenesis. 

Oncogene 2004, 23(54):8841-8846. 

189. Rauch TA, Zhong X, Wu X, Wang M, Kernstine KH, Wang Z, Riggs AD, Pfeifer GP: 

High-resolution mapping of DNA hypermethylation and hypomethylation in lung 

cancer. Proc Natl Acad Sci U S A 2008, 105(1):252-257. 

190. Groudine M, Eisenman R, Weintraub H: Chromatin structure of endogenous 

retroviral genes and activation by an inhibitor of DNA methylation. Nature 1981, 

292(5821):311-317. 

191. Wilson IM, Davies JJ, Weber M, Brown CJ, Alvarez CE, MacAulay C, Schubeler D, Lam 

WL: Epigenomics: mapping the methylome. Cell Cycle 2006, 5(2):155-158. 

192. Cadieux B, Ching TT, VandenBerg SR, Costello JF: Genome-wide hypomethylation in 

human glioblastomas associated with specific copy number alteration, 

methylenetetrahydrofolate reductase allele status, and increased proliferation. 

Cancer Res 2006, 66(17):8469-8476. 

193. Zabarovsky ER, Lerman MI, Minna JD: Tumor suppressor genes on chromosome 3p 

involved in the pathogenesis of lung and other cancers. Oncogene 2002, 

21(45):6915-6935. 

194. Belinsky SA, Palmisano WA, Gilliland FD, Crooks LA, Divine KK, Winters SA, Grimes 

MJ, Harms HJ, Tellez CS, Smith TM et al: Aberrant promoter methylation in 

bronchial epithelium and sputum from current and former smokers. Cancer Res 

2002, 62(8):2370-2377. 

195. Palmisano WA, Divine KK, Saccomanno G, Gilliland FD, Baylin SB, Herman JG, 

Belinsky SA: Predicting lung cancer by detecting aberrant promoter methylation in 

sputum. Cancer Res 2000, 60(21):5954-5958. 

196. Belinsky SA: Gene-promoter hypermethylation as a biomarker in lung cancer. Nat 

Rev Cancer 2004, 4(9):707-717. 

197. Tessema M, Willink R, Do K, Yu YY, Yu W, Machida EO, Brock M, Van Neste L, Stidley 

CA, Baylin SB et al: Promoter methylation of genes in and around the candidate 

lung cancer susceptibility locus 6q23-25. Cancer Res 2008, 68(6):1707-1714. 

198. Heintzman ND, Hon GC, Hawkins RD, Kheradpour P, Stark A, Harp LF, Ye Z, Lee LK, 

Stuart RK, Ching CW et al: Histone modifications at human enhancers reflect global 

cell-type-specific gene expression. Nature 2009, 459(7243):108-112. 

199. Komashko VM, Acevedo LG, Squazzo SL, Iyengar SS, Rabinovich A, O'Geen H, Green 

R, Farnham PJ: Using ChIP-chip technology to reveal common principles of 

transcriptional repression in normal and cancer cells. Genome Res 2008, 

18(4):521-532. 

200. Ke XS, Qu Y, Rostad K, Li WC, Lin B, Halvorsen OJ, Haukaas SA, Jonassen I, Petersen 

K, Goldfinger N et al: Genome-wide profiling of histone h3 lysine 4 and lysine 27 

trimethylation reveals an epigenetic signature in prostate carcinogenesis. PLoS 

One 2009, 4(3):e4687. 

201. Kondo Y, Shen L, Cheng AS, Ahmed S, Boumber Y, Charo C, Yamochi T, Urano T, 

Furukawa K, Kwabi-Addo B et al: Gene silencing in cancer by histone H3 lysine 27 

trimethylation independent of promoter DNA methylation. Nat Genet 2008, 

40(6):741-750. 

202. Yu J, Rhodes DR, Tomlins SA, Cao X, Chen G, Mehra R, Wang X, Ghosh D, Shah RB, 

Varambally S et al: A polycomb repression signature in metastatic prostate cancer 

predicts cancer outcome. Cancer Res 2007, 67(22):10657-10663. 

203. Wu J, Wang SH, Potter D, Liu JC, Smith LT, Wu YZ, Huang TH, Plass C: Diverse 

histone modifications on histone 3 lysine 9 and their relation to DNA methylation 

in specifying gene silencing. BMC Genomics 2007, 8:131. 

204. Krivtsov AV, Feng Z, Lemieux ME, Faber J, Vempati S, Sinha AU, Xia X, Jesneck J, 

Bracken AP, Silverman LB et al: H3K79 methylation profiles define murine and 

human MLL-AF4 leukemias. Cancer Cell 2008, 14(5):355-368. 

205. Lin B, Wang J, Hong X, Yan X, Hwang D, Cho JH, Yi D, Utleg AG, Fang X, Schones DE 

et al: Integrated expression profiling and ChIP-seq analyses of the growth 

inhibition response program of the androgen receptor. PLoS One 2009, 4(8):e6589. 

206. Fullwood MJ, Liu MH, Pan YF, Liu J, Xu H, Mohamed YB, Orlov YL, Velkov S, Ho A, Mei 

PH et al: An oestrogen-receptor-alpha-bound human chromatin interactome. 

Nature 2009, 462(7269):58-64. 


expression analysis of cancer. J Cell Physiol 2008, 217(3):590-597. 

208. Liang P, Pardee AB: Analysing differential gene expression in cancer. Nat Rev 

Cancer 2003, 3(11):869-876. 

209. Nevins JR, Potti A: Mining gene expression profiles: expression signatures as 

cancer phenotypes. Nat Rev Genet 2007, 8(8):601-609. 





211. Heidenblad M, Lindgren D, Veltman JA, Jonson T, Mahlamaki EH, Gorunova L, van 

Kessel AG, Schoenmakers EF, Hoglund M: Microarray analyses reveal strong 

influence of DNA copy number alterations on the transcriptional patterns in 

pancreatic cancer: implications for the interpretation of genomic amplifications. 

Oncogene 2005, 24(10):1794-1801. 




213. Wolf M, Mousses S, Hautaniemi S, Karhu R, Huusko P, Allinen M, Elkahloun A, Monni 

O, Chen Y, Kallioniemi A et al: High-resolution analysis of gene copy number 

alterations in human prostate cancer using CGH on cDNA microarrays: impact of 

copy number on gene expression. Neoplasia 2004, 6(3):240-247. 

214. Adelaide J, Finetti P, Bekhouche I, Repellini L, Geneix J, Sircoulomb F, Charafe-Jauffret 

E, Cervera N, Desplans J, Parzy D et al: Integrated profiling of basal and luminal 

breast cancers. Cancer Res 2007, 67(24):11565-11575. 

215. Broet P, Camilleri-Broet S, Zhang S, Alifano M, Bangarusamy D, Battistella M, Wu Y, 

Tuefferd M, Regnard JF, Lim E et al: Prediction of clinical outcome in multiple lung 

cancer cohorts by integrative genomics: implications for chemotherapy selection. 

Cancer Res 2009, 69(3):1055-1062. 




217. Natrajan R, Weigelt B, Mackay A, Geyer FC, Grigoriadis A, Tan DS, Jones C, Lord CJ, 

Vatcheva R, Rodriguez-Pinilla SM et al: An integrative genomic and transcriptomic 

analysis reveals molecular pathways and networks regulated by copy number 

aberrations in basal-like, HER2 and luminal cancers. Breast Cancer Res Treat 2009. 

218. Deng S, Calin GA, Croce CM, Coukos G, Zhang L: Mechanisms of microRNA 

deregulation in human cancer. Cell Cycle 2008, 7(17):2643-2646. 

219. Kuo KT, Guan B, Feng Y, Mao TL, Chen X, Jinawath N, Wang Y, Kurman RJ, Shih Ie M, 

Wang TL: Analysis of DNA copy number alterations in ovarian serous tumors 

identifies new molecular genetic changes in low-grade and high-grade 

carcinomas. Cancer Res 2009, 69(9):4036-4042. 

220. Lionetti M, Agnelli L, Mosca L, Fabris S, Andronache A, Todoerti K, Ronchetti D, 

Deliliers GL, Neri A: Integrative high-resolution microarray analysis of human 

myeloma cell lines reveals deregulated miRNA expression associated with allelic 

imbalances and gene expression profiles. Genes Chromosomes Cancer 2009, 

48(6):521-531. 

221. Starczynowski DT, Kuchenbauer F, Argiropoulos B, Sung S, Morin R, Muranyi A, Hirst 

M, Hogge D, Marra M, Wells RA et al: Identification of miR-145 and miR-146a as 

mediators of the 5q- syndrome phenotype. Nat Med 2009. 

222. Zhang L, Volinia S, Bonome T, Calin GA, Greshock J, Yang N, Liu CG, Giannakakis A, 

Alexiou P, Hasegawa K et al: Genomic and epigenetic alterations deregulate 

microRNA expression in human epithelial ovarian cancer. Proc Natl Acad Sci U S A 

2008, 105(19):7004-7009. 

223. Calin GA, Croce CM: MicroRNA signatures in human cancers. Nat Rev Cancer 2006, 

6(11):857-866. 

224. Nicoloso MS, Spizzo R, Shimizu M, Rossi S, Calin GA: MicroRNAs--the micro 

steering wheel of tumour metastases. Nat Rev Cancer 2009, 9(4):293-302. 

225. Wolf NG, Farver C, Abdul-Karim FW, Schwartz S: Analysis of microsatellite 

instability and X-inactivation in ovarian borderline tumors lacking numerical 

abnormalities by comparative genomic hybridization. Cancer Genet Cytogenet 

2003, 145(2):133-138. 

226. Olson P, Lu J, Zhang H, Shai A, Chun MG, Wang Y, Libutti SK, Nakakura EK, Golub 

TR, Hanahan D: MicroRNA dynamics in the stages of tumorigenesis correlate with 

hallmark capabilities of cancer. Genes Dev 2009, 23(18):2152-2165. 

227. Wang X: miRDB: a microRNA target prediction and functional annotation database 

with a wiki interface. RNA 2008, 14(6):1012-1017. 

228. Garzon R, Calin GA, Croce CM: MicroRNAs in Cancer. Annu Rev Med 2009, 60:167- 

179. 

229. Lujambio A, Esteller M: How epigenetics can explain human metastasis: a new role 

for microRNAs. Cell Cycle 2009, 8(3):377-382. 

230. Iorio MV, Visone R, Di Leva G, Donati V, Petrocca F, Casalini P, Taccioli C, Volinia S, 

Liu CG, Alder H et al: MicroRNA signatures in human ovarian cancer. Cancer Res 

2007, 67(18):8699-8707. 

231. Lujambio A, Esteller M: CpG island hypermethylation of tumor suppressor 

microRNAs in human cancer. Cell Cycle 2007, 6(12):1455-1459. 

232. Lujambio A, Ropero S, Ballestar E, Fraga MF, Cerrato C, Setien F, Casado S, Suarez- 

Gauthier A, Sanchez-Cespedes M, Git A et al: Genetic unmasking of an 

epigenetically silenced microRNA in human cancer cells. Cancer Res 2007, 

67(4):1424-1429. 

233. Guil S, Esteller M: DNA methylomes, histone codes and miRNAs: tying it all 

together. Int J Biochem Cell Biol 2009, 41(1):87-95. 

234. Sadikovic B, Yoshimoto M, Chilton-MacNeill S, Thorner P, Squire JA, Zielenska M: 

Identification of interactive networks of gene expression associated with 

osteosarcoma oncogenesis by integrated molecular profiling. Hum Mol Genet 

2009, 18(11):1962-1975. 

235. Joshi MD, Ahmad R, Yin L, Raina D, Rajabi H, Bubley G, Kharbanda S, Kufe D: MUC1 

oncoprotein is a druggable target in human prostate cancer cells. Mol Cancer Ther 

2009, 8(11):3056-3065. 

236. Khodarev NN, Pitroda SP, Beckett MA, MacDermed DM, Huang L, Kufe DW, 

Weichselbaum RR: MUC1-induced transcriptional programs associated with 

tumorigenesis predict outcome in breast and lung cancer. Cancer Res 2009, 

69(7):2833-2837. 

237. Senapati S, Das S, Batra SK: Mucin-interacting proteins: from function to 

therapeutics. Trends Biochem Sci 2009. 

238. Wu CJ, Chen Z, Ullrich A, Greene MI, O'Rourke DM: Inhibition of EGFR-mediated 

phosphoinositide-3-OH kinase (PI3-K) signaling and glioblastoma phenotype by 

signal-regulatory proteins (SIRPs). Oncogene 2000, 19(35):3999-4010. 

239. Kapoor GS, Kapitonov D, O'Rourke DM: Transcriptional regulation of signal 

regulatory protein alpha1 inhibitory receptors by epidermal growth factor receptor 

signaling. Cancer Res 2004, 64(18):6444-6452. 

240. Yamasaki Y, Ito S, Tsunoda N, Kokuryo T, Hara K, Senga T, Kannagi R, Yamamoto T, 

Oda K, Nagino M et al: SIRPalpha1 and SIRPalpha2: their role as tumor 

suppressors in breast carcinoma cells. Biochem Biophys Res Commun 2007, 

361(1):7-13. 

241. Qin JM, Wan XW, Zeng JZ, Wu MC: Effect of Sirpalpha1 on the expression of 

nuclear factor-kappa B in hepatocellular carcinoma. Hepatobiliary Pancreat Dis Int 

2007, 6(3):276-283. 

242. Gardai SJ, Xiao YQ, Dickinson M, Nick JA, Voelker DR, Greene KE, Henson PM: By 

binding SIRPalpha or calreticulin/CD91, lung collectins act as dual function 

surveillance molecules to suppress or enhance inflammation. Cell 2003, 115(1):13- 

23. 

243. Takada T, Matozaki T, Takeda H, Fukunaga K, Noguchi T, Fujioka Y, Okazaki I, Tsuda 

M, Yamao T, Ochi F et al: Roles of the complex formation of SHPS-1 with SHP-2 in 

insulin-stimulated mitogen-activated protein kinase activation. J Biol Chem 1998, 

273(15):9234-9242. 

244. Motegi S, Okazawa H, Ohnishi H, Sato R, Kaneko Y, Kobayashi H, Tomizawa K, Ito T, 

Honma N, Buhring HJ et al: Role of the CD47-SHPS-1 system in regulation of cell 

migration. EMBO J 2003, 22(11):2634-2644. 

245. Kharitonenkov A, Chen Z, Sures I, Wang H, Schilling J, Ullrich A: A family of proteins 

that inhibit signalling through tyrosine kinase receptors. Nature 1997, 

386(6621):181-186. 

246. Meyer PE, Kontos K, Lafitte F, Bontempi G: Information-theoretic inference of large 

transcriptional regulatory networks. EURASIP J Bioinform Syst Biol 2007:79879. 

247. Meyer PE, Lafitte F, Bontempi G: minet: A R/Bioconductor package for inferring 

large transcriptional networks using mutual information. BMC Bioinformatics 2008, 

9:461. 

248. Xi L, Feber A, Gupta V, Wu M, Bergemann AD, Landreneau RJ, Litle VR, Pennathur A, 

Luketich JD, Godfrey TE: Whole genome exon arrays identify differential 

expression of alternatively spliced, cancer-related genes in lung cancer. Nucleic 

Acids Res 2008, 36(20):6535-6547. 

249. Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, 

Ge Y, Gentry J et al: Bioconductor: open software development for computational 

biology and bioinformatics. Genome Biol 2004, 5(10):R80. 

250. Bild AH, Yao G, Chang JT, Wang Q, Potti A, Chasse D, Joshi MB, Harpole D, Lancaster 

JM, Berchuck A et al: Oncogenic pathway signatures in human cancers as a guide 

to targeted therapies. Nature 2006, 439(7074):353-357. 

251. Shedden K, Taylor JM, Enkemann SA, Tsao MS, Yeatman TJ, Gerald WL, Eschrich S, 

Jurisica I, Giordano TJ, Misek DE et al: Gene expression-based survival prediction in 

lung adenocarcinoma: a multi-site, blinded validation study. Nat Med 2008, 

14(8):822-827. 




253. Buys TP, Aviel-Ronen S, Waddell TK, Lam WL, Tsao MS: Defining genomic alteration 

boundaries for a combined small cell and non-small cell lung carcinoma. J Thorac 

Oncol 2009, 4(2):227-239. 

254. Brommesson S, Jonsson G, Strand C, Grabau D, Malmstrom P, Ringner M, Ferno M, 

Hedenfalk I: Tiling array-CGH for the assessment of genomic similarities among 

synchronous unilateral and bilateral invasive breast cancer tumor pairs. BMC Clin 

Pathol 2008, 8:6. 

255. Kawanishi H, Takahashi T, Ito M, Matsui Y, Watanabe J, Ito N, Kamoto T, Kadowaki T, 

Tsujimoto G, Imoto I et al: Genetic analysis of multifocal superficial urothelial 

cancers by array-based comparative genomic hybridisation. Br J Cancer 2007, 

97(2):260-266. 

256. Mhawech-Fauceglia P, Rai H, Nowak N, Cheney RT, Rodabaugh K, Lele S, Odunsi K: 

The use of array-based comparative genomic hybridization (a-CGH) to distinguish 

metastatic from primary synchronous carcinomas of the ovary and the uterus. 

Histopathology 2008, 53(4):490-495. 

257. Nakano H, Soda H, Nakamura Y, Uchida K, Takasu M, Nakatomi K, Izumikawa K, 

Hayashi T, Nagayasu T, Tsukamoto K et al: Different epidermal growth factor 

receptor gene mutations in a patient with 2 synchronous lung cancers. Clin Lung 

Cancer 2007, 8(9):562-564. 

258. Ryoo BY, Na II, Yang SH, Koh JS, Kim CH, Lee JC: Synchronous multiple primary 

lung cancers with different response to gefitinib. Lung Cancer 2006, 53(2):245-248. 

259. Speel EJ, van de Wouw AJ, Claessen SM, Haesevoets A, Hopman AH, van der Wurff 

AA, Osieka R, Buettner R, Hillen HF, Ramaekers FC: Molecular evidence for a clonal 

relationship between multiple lesions in patients with unknown primary 

adenocarcinoma. Int J Cancer 2008, 123(6):1292-1300. 

260. Wa CV, DeVries S, Chen YY, Waldman FM, Hwang ES: Clinical application of array-based 

comparative genomic hybridization to define the relationship between 

multiple synchronous tumors. Mod Pathol 2005, 18(4):591-597. 

261. Agelopoulos K, Tidow N, Korsching E, Voss R, Hinrichs B, Brandt B, Boecker W, 

Buerger H: Molecular cytogenetic investigations of synchronous bilateral breast 

cancer. J Clin Pathol 2003, 56(9):660-665. 

262. Whitehurst AW, Bodemann BO, Cardenas J, Ferguson D, Girard L, Peyton M, Minna 

JD, Michnoff C, Hao W, Roth MG et al: Synthetic lethal screen identification of 

chemosensitizer loci in cancer cells. Nature 2007, 446(7137):815-819. 

263. Barbie DA, Tamayo P, Boehm JS, Kim SY, Moody SE, Dunn IF, Schinzel AC, Sandy P, 

Meylan E, Scholl C et al: Systematic RNA interference reveals that oncogenic 

KRAS-driven cancers require TBK1. Nature 2009, 462(7269):108-112. 

264. Berns K, Hijmans EM, Mullenders J, Brummelkamp TR, Velds A, Heimerikx M, 

Kerkhoven RM, Madiredjo M, Nijkamp W, Weigelt B et al: A large-scale RNAi screen 

in human cells identifies new components of the p53 pathway. Nature 2004, 

428(6981):431-437. 

265. Gobeil S, Zhu X, Doillon CJ, Green MR: A genome-wide shRNA screen identifies 

GAS1 as a novel melanoma metastasis suppressor gene. Genes Dev 2008, 

22(21):2932-2940. 

266. Luo B, Cheung HW, Subramanian A, Sharifnia T, Okamoto M, Yang X, Hinkle G, Boehm 

JS, Beroukhim R, Weir BA et al: Highly parallel identification of essential genes in 

cancer cells. Proc Natl Acad Sci U S A 2008, 105(51):20380-20385. 

267. Luo J, Emanuele MJ, Li D, Creighton CJ, Schlabach MR, Westbrook TF, Wong KK, 

Elledge SJ: A genome-wide RNAi screen identifies multiple synthetic lethal 

interactions with the Ras oncogene. Cell 2009, 137(5):835-848. 

268. Moffat J, Grueneberg DA, Yang X, Kim SY, Kloepfer AM, Hinkle G, Piqani B, Eisenhaure 

TM, Luo B, Grenier JK et al: A lentiviral RNAi library for human and mouse genes 

applied to an arrayed viral high-content screen. Cell 2006, 124(6):1283-1298. 

269. Scholl C, Frohling S, Dunn IF, Schinzel AC, Barbie DA, Kim SY, Silver SJ, Tamayo P, 

Wadlow RC, Ramaswamy S et al: Synthetic lethal interaction between oncogenic 

KRAS dependency and STK33 suppression in human cancer cells. Cell 2009, 

137(5):821-834. 

270. Silva JM, Marran K, Parker JS, Silva J, Golding M, Schlabach MR, Elledge SJ, Hannon 

GJ, Chang K: Profiling essential genes in human mammary cells by multiplex RNAi 

screening. Science 2008, 319(5863):617-620. 

271. Apweiler R, Aslanidis C, Deufel T, Gerstner A, Hansen J, Hochstrasser D, Kellner R, 

Kubicek M, Lottspeich F, Maser E et al: Approaching clinical proteomics: current 

state and future fields of application in cellular proteomics. Cytometry A 2009, 

75(10):816-832. 

272. Apweiler R, Aslanidis C, Deufel T, Gerstner A, Hansen J, Hochstrasser D, Kellner R, 

Kubicek M, Lottspeich F, Maser E et al: Approaching clinical proteomics: current 

state and future fields of application in fluid proteomics. Clin Chem Lab Med 2009, 

47(6):724-744. 

273. Peng XQ, Wang F, Geng X, Zhang WM: Current advances in tumor proteomics and 

candidate biomarkers for hepatic cancer. Expert Rev Proteomics 2009, 6(5):551-561. 

274. Tainsky MA: Genomic and proteomic biomarkers for cancer: a multitude of 

opportunities. Biochim Biophys Acta 2009, 1796(2):176-193. 

275. Zamo A, Cecconi D: Proteomic analysis of lymphoid and haematopoietic 

neoplasms: There's more than biomarker discovery. J Proteomics 2009. 

276. Griffin JL, Kauppinen RA: A metabolomics perspective of human brain tumours. 

FEBS J 2007, 274(5):1132-1139. 

277. Spratlin JL, Serkova NJ, Eckhardt SG: Clinical applications of metabolomics in 

oncology: a review. Clin Cancer Res 2009, 15(2):431-440. 

278. Sreekumar A, Poisson LM, Rajendiran TM, Khan AP, Cao Q, Yu J, Laxman B, Mehra R, 

Lonigro RJ, Li Y et al: Metabolomic profiles delineate potential role for sarcosine in 

prostate cancer progression. Nature 2009, 457(7231):910-914. 

279. Adamovic T, Trosso F, Roshani L, Andersson L, Petersen G, Rajaei S, Helou K, Levan 

G: Oncogene amplification in the proximal part of chromosome 6 in rat 

endometrial adenocarcinoma as revealed by combined BAC/PAC FISH, 

chromosome painting, zoo-FISH, and allelotyping. Genes Chromosomes Cancer 

2005, 44(2):139-153. 

280. Ferrandina G, Mey V, Nannizzi S, Ricciardi S, Petrillo M, Ferlini C, Danesi R, Scambia 

G, Del Tacca M: Expression of nucleoside transporters, deoxycitidine kinase, 

ribonucleotide reductase regulatory subunits, and gemcitabine catabolic enzymes 

in primary ovarian cancer. Cancer Chemother Pharmacol 2009. 

281. Fernandez-Ranvier GG, Weng J, Yeh RF, Khanafshar E, Suh I, Barker C, Duh QY, 

Clark OH, Kebebew E: Identification of biomarkers of adrenocortical carcinoma 

using genomewide gene expression profiling. Arch Surg 2008, 143(9):841-846; 

discussion 846. 

282. Segditsas S, Sieber O, Deheragoda M, East P, Rowan A, Jeffery R, Nye E, Clark S, 

Spencer-Dene B, Stamp G et al: Putative direct and indirect Wnt targets identified 

through consistent gene expression changes in APC-mutant intestinal adenomas 

from humans and mice. Hum Mol Genet 2008, 17(24):3864-3875. 

283. Conde L, Montaner D, Burguet-Castell J, Tarraga J, Medina I, Al-Shahrour F, Dopazo J: 

ISACGH: a web-based environment for the analysis of Array CGH and gene 

expression which includes functional profiling. Nucleic Acids Res 2007, 35(Web 

Server issue):W81-85. 

284. La Rosa P, Viara E, Hupe P, Pierron G, Liva S, Neuvial P, Brito I, Lair S, Servant N, 

Robine N et al: VAMP: visualization and analysis of array-CGH, transcriptome and 

other molecular profiles. Bioinformatics 2006, 22(17):2066-2073. 

285. Kapushesky M, Emam I, Holloway E, Kurnosov P, Zorin A, Malone J, Rustici G, Williams 

E, Parkinson H, Brazma A: Gene expression atlas at the European bioinformatics 

institute. Nucleic Acids Res 2010, 38(Database issue):D690-698. 

286. Li L, Bum-Erdene K, Baenziger PH, Rosen JJ, Hemmert JR, Nellis JA, Pierce ME, 

Meroueh SO: BioDrugScreen: a computational drug design resource for ranking 

molecules docked to the human proteome. Nucleic Acids Res 2010, 38(Database 

issue):D765-773. 

287. Kato K, Yamashita R, Matoba R, Monden M, Noguchi S, Takagi T, Nakai K: Cancer 

gene expression database (CGED): a database for gene expression profiling with 

accompanying clinical information of human cancer tissues. Nucleic Acids Res 


288. Li H, He Y, Ding G, Wang C, Xie L, Li Y: dbDEPC: a database of Differentially 

Expressed Proteins in human Cancers. Nucleic Acids Res 2010, 38(Database 

issue):D658-664. 

289. Brooksbank C, Cameron G, Thornton J: The European Bioinformatics Institute's data 

resources. Nucleic Acids Res 2010, 38(Database issue):D17-25. 

290. Safran M, Chalifa-Caspi V, Shmueli O, Olender T, Lapidot M, Rosen N, Shmoish M, 

Peter Y, Glusman G, Feldmesser E et al: Human Gene-Centric Databases at the 

Weizmann Institute of Science: GeneCards, UDB, CroW 21 and HORDE. Nucleic 

Acids Res 2003, 31(1):142-146. 

291. Zhang Y, Lv J, Liu H, Zhu J, Su J, Wu Q, Qi Y, Wang F, Li X: HHMD: the human 

histone modification database. Nucleic Acids Res 2010, 38(Database issue):D149- 

154. 

292. Betel D, Wilson M, Gabow A, Marks DS, Sander C: The microRNA.org resource: 

targets and expression. Nucleic Acids Res 2008, 36(Database issue):D149-153. 

293. Jiang Q, Wang Y, Hao Y, Juan L, Teng M, Zhang X, Li M, Wang G, Liu Y: 

miR2Disease: a manually curated database for microRNA deregulation in human 

disease. Nucleic Acids Res 2009, 37(Database issue):D98-104. 

294. Alexiou P, Vergoulis T, Gleditzsch M, Prekas G, Dalamagas T, Megraw M, Grosse I, 

Sellis T, Hatzigeorgiou AG: miRGen 2.0: a database of microRNA genomic 

information and regulation. Nucleic Acids Res 2010, 38(Database issue). 

295. Sayers EW, Barrett T, Benson DA, Bolton E, Bryant SH, Canese K, Chetvernin V, 

Church DM, Dicuccio M, Federhen S et al: Database resources of the National Center 

for Biotechnology Information. Nucleic Acids Res 2010, 38(Database issue):D5-D16. 

296. Barrett T, Troup DB, Wilhite SE, Ledoux P, Rudnev D, Evangelista C, Kim IF, Soboleva 

A, Tomashevsky M, Marshall KA et al: NCBI GEO: archive for high-throughput 

functional genomic data. Nucleic Acids Res 2009, 37(Database issue):D885-890. 

297. Rhodes DR, Kalyana-Sundaram S, Mahavisno V, Varambally R, Yu J, Briggs BB, 

Barrette TR, Anstet MJ, Kincead-Beal C, Kulkarni P et al: Oncomine 3.0: genes, 

pathways, and networks in a collection of 18,000 cancer gene expression profiles. 

Neoplasia 2007, 9(2):166-180. 

298. Baudis M: Genomic imbalances in 5918 malignant epithelial tumors: an explorative 

meta-analysis of chromosomal CGH data. BMC Cancer 2007, 7:226. 

299. Vizcaino JA, Cote R, Reisinger F, Barsnes H, Foster JM, Rameseder J, Hermjakob H, 

Martens L: The Proteomics Identifications database: 2010 update. Nucleic Acids Res 


300. Ren Y, Gong W, Zhou H, Wang Y, Xiao F, Li T: siRecords: a database of mammalian 

RNAi experiments and efficacies. Nucleic Acids Res 2009, 37(Database issue):D146- 

149. 




302. Rhead B, Karolchik D, Kuhn RM, Hinrichs AS, Zweig AS, Fujita PA, Diekhans M, Smith 

KE, Rosenbloom KR, Raney BJ et al: The UCSC Genome Browser database: update 

2010. Nucleic Acids Res 2010, 38(Database issue):D613-619. 

Chapter 6: Conclusions 

6.1 Summary 

Lung adenocarcinoma is the most commonly diagnosed form of lung cancer today, with a large 

percentage of patients exhibiting poor overall survival and prognosis. Genomic analysis has 

provided much insight into this disease with the identification of specific differentially expressed 

genes, somatically mutated genes, hyper- and hypomethylated genes and genes which are 

amplified and deleted at the DNA copy number level. While tools and platforms for whole 

genome analysis of gene dosage and gene expression are widely accessible and have 

improved in resolution, only recently have technologies to assess DNA methylation in a 

high throughput manner been made available. Hence, the logical step with these vast amounts 

of data is to integrate the information from these different assays in a parallel manner to gain a 

better understanding of the biology of lung adenocarcinoma. 

6.1.1 Development of the integrative genetic and epigenetic approach 

In chapter 2, the development of the SIGMA2 software package is discussed [1]. At the time 

the package was developed, there were no analysis tools for integrative genetic and epigenetic 

analysis, let alone tools to integrate gene dosage and gene expression, which were two well 

established high throughput platforms. Moreover, for array CGH data alone, only a limited number 

of tools existed. Hence, as a precursor to SIGMA2, SIGMA [2] was developed and used as a 

framework for SIGMA2. 

In chapter 3, I demonstrated that when this integrative approach is applied to model systems, 

we learn much more using multiple dimensions than when looking at a single dimension 

alone. Specifically, we show that we can associate more of the dysregulated gene expression 

with alterations at the DNA level; in some cell lines, as much as 80% of the observed 

gene expression changes could be associated with DNA level 

changes. In addition, I also illustrated two key concepts: (i) the “Or” (multiple alternate hit 

mechanisms) concept where across a sample set and using a fixed frequency of disruption, we 

can identify nearly three times as many genes when we can account for multiple types of 

disruption as opposed to accounting for only a single type of alteration and (ii) the “And” 

(coupled multiple hits) concept where we identify genes which are targeted by multiple 

mechanisms in the sample and show that these genes can have significant biological and/or 

clinical relevance. 
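
The "Or" and "And" concepts described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SIGMA2 implementation: the data layout (sample, then dimension, then a set of altered genes), the function name `frequency`, the dimension labels, and the placeholder gene `GENE_A` are all hypothetical, and the toy calls are made up.

```python
from typing import Dict, Optional, Set

# Hypothetical layout: sample -> dimension -> set of genes called altered.
Cohort = Dict[str, Dict[str, Set[str]]]


def frequency(cohort: Cohort, gene: str, min_hits: int = 1,
              dimension: Optional[str] = None) -> float:
    """Fraction of samples in which `gene` counts as disrupted.

    dimension given -> only that single dimension is considered.
    min_hits=1      -> "Or": altered in at least one dimension.
    min_hits=2      -> "And": altered in two or more dimensions
                       within the same sample.
    """
    if not cohort:
        raise ValueError("cohort must contain at least one sample")
    disrupted = 0
    for calls in cohort.values():
        if dimension is not None:
            hit = gene in calls.get(dimension, set())
        else:
            n_dims = sum(gene in genes for genes in calls.values())
            hit = n_dims >= min_hits
        disrupted += hit
    return disrupted / len(cohort)


# Toy cohort with made-up calls for a placeholder gene.
toy: Cohort = {
    "S1": {"copy_number": {"GENE_A"}, "methylation": set()},
    "S2": {"copy_number": set(), "methylation": {"GENE_A"}},
    "S3": {"copy_number": {"GENE_A"}, "methylation": {"GENE_A"}},
    "S4": {"copy_number": set(), "methylation": set()},
}

single = frequency(toy, "GENE_A", dimension="copy_number")  # one dimension only
either = frequency(toy, "GENE_A", min_hits=1)                # "Or"
both = frequency(toy, "GENE_A", min_hits=2)                  # "And"
```

In this toy cohort the "Or" frequency is higher than either single-dimension frequency, while the "And" frequency isolates the samples carrying coupled hits.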

6.1.2 Identification of a prevalent genetic alteration in lung adenocarcinoma 

Genetic and epigenetic alterations have been shown to be prominent in lung adenocarcinoma. 

Within genetic alterations, the majority of documented alterations have involved alterations in 

gene dosage, somatic mutation, and loss of heterozygosity (LOH) or allelic imbalance (whereby 

an allele or a portion of the allele is lost or gained in the tumor). In terms of allelic imbalance, 

the majority of the time this event is captured as a decrease or increase in gene dosage. 

However, there are also cases where allelic imbalance exists but there is no net change in gene 

dosage, termed copy neutral LOH or somatic uniparental disomy (UPD). Chapter 4 discusses 

the unexpected prevalence of UPD in the lung adenocarcinoma genome. 
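
The distinction drawn above between allelic imbalance with and without a net change in gene dosage can be made concrete with a small sketch. It assumes integer allele-specific copy numbers for a segment, a diploid baseline, and a locus that is heterozygous in the matched normal; the function name `classify_segment` and the labels it returns are hypothetical and are not taken from the algorithms used in this thesis.

```python
def classify_segment(allele_a: int, allele_b: int) -> str:
    """Classify a tumor segment from allele-specific copy numbers.

    Assumes a diploid baseline (one copy of each allele in normal tissue).
    """
    if allele_a < 0 or allele_b < 0:
        raise ValueError("copy numbers must be non-negative")
    total = allele_a + allele_b
    if total == 2:
        # Two copies in total: either one of each allele (balanced), or
        # two copies of one allele and none of the other, i.e. allelic
        # imbalance with no net change in dosage (copy neutral LOH / UPD).
        return "copy_neutral_loh" if min(allele_a, allele_b) == 0 else "balanced"
    return "gain" if total > 2 else "loss"
```

A call-based approach that only looks at total dosage would report the `copy_neutral_loh` case as unchanged, which is one way such events can go undetected.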

Though previous studies were done using SNP arrays on lung adenocarcinoma tumors, the 

prevalence of UPD was likely underestimated for a number of reasons. These 

include the use of heterogeneous samples with high normal cell contamination due to 

lack of microdissection, the lower resolution of alterations identifiable by previous SNP arrays, the 

use of non-patient-matched controls as reference, and the reliance on call-based algorithms before 

the movement to algorithms which use allele-specific copy number [3]. 

In addition to the prevalence of UPD, the other key finding from this chapter is the presence of 

frequent UPD at both known and novel oncogenes. While UPD has previously been shown to 

affect tumor suppressor genes such as RB1 [4], the association of UPD at oncogene loci has 

not been reported as often in solid tumors. Moreover, in the previous studies in hematological 

malignancies such as leukemias, lymphomas, or myelodysplastic syndromes, the observed 

UPD at oncogenes was typically accompanied by an acquired homozygous mutation at the 

locus [3, 5-8]. From our data, though UPD was also observed at mutated KRAS, as shown 

previously [9], there was also frequent UPD in cases where KRAS was not mutated. This 

finding suggests that in the cases where UPD occurs without somatic mutation, the UPD 

event may in fact be used as a mechanism for preferential allele selection. Specifically, this could 

be a preference for the methylated allele for tumor suppressor genes or the unmethylated allele for 

oncogenes [10, 11]. Alternatively, the preference could be for a more transcriptionally active (or 

inactive) allele as it has been shown that for specific genes, the two alleles may differ in rates of 

transcription [12-16]. Thus, integration of genetic data with epigenetic and gene expression 

data would help decipher the target of these frequently observed UPD events. 

6.1.3 Application of the integrative approach to lung adenocarcinoma specimens 

Thus far, I have shown that the integrative genetic and epigenetic approach is beneficial in 

identifying important genes: both those which would have been missed if single assays alone were 

used and those which have concurrent alteration at multiple levels. Chapter 5 discusses how 

upon application of this approach to lung adenocarcinoma specimens, we see that this trend 

also holds true in clinical samples. Specifically, I show that novel canonical signaling pathways 

are significantly enriched when multiple DNA-based dimensions are used but are missed 

(not statistically significant) when a single DNA-based dimension is used. In addition, when we 

examined the well-documented EGFR signaling pathway, a pathway known to be involved in 

lung adenocarcinoma, the most frequently disrupted gene was signal-regulatory protein alpha 

(SIRPA). 

SIRPA has been shown to be a direct downstream component of EGFR, and has been shown 

to be suppressed in expression by EGFR activation [17, 18]. In the resting lung, SIRPA has 

been postulated to control the inflammatory response through SHP-1 and eventually, NFKB 

[19]. While there are likely multiple components between SIRPA and NFKB, we wanted to 

assess expression of components directly associated with SIRPA. In addition, since this gene 

was identified in a small set of samples, we wanted to see if this prevalence of disruption was 

maintained in an additional, larger set of tumors. Hence, we evaluated expression of SHP-1 

and SIRPA in a panel of approximately 60 lung adenocarcinoma tumors and found (i) a high 

prevalence of underexpression of SIRPA and (ii) a strong correlation between SIRPA and 

SHP-1 expression levels. It is interesting to observe this strong relationship between SIRPA and 

SHP-1 as most cancer studies have focused on SIRPA’s relationship with SHP-2 instead of 

SHP-1. 


I have demonstrated the power of an integrative genetic and epigenetic approach to decipher 

resultant gene expression changes in lung adenocarcinoma. The development of an 

application such as SIGMA2 was integral as it represented one of the first academic/research 

applications with the ability to integrate multiple dimensions of data. To date, there have been a 

few other applications that have been developed that can perform similar functionalities but 

most of these have been developed by commercial entities. Moreover, the software is not yet 

outdated and, given the way it was built, can be extended to handle newer high throughput 

platforms including sequence-based platforms. 

From both the demonstration dataset (Chapter 3) and the clinical tumor dataset 

(Chapter 5), we learn that by using an integrative, multi-dimensional approach, 

we detect genes being disrupted at a much higher frequency when multiple dimensions 

are examined than when single dimensions are examined alone. Moreover, a gene 

may be disrupted in any single dimension at a low frequency, but when multiple 

dimensions are accounted for, the frequency is in fact high. In Figure 5.5, I illustrate how well 

known lung cancer genes such as RRM2 are altered at both the genetic and epigenetic level 

and illustrate how more pathways are deemed significant when multiple dimensions are 

analyzed. The latter finding is likely a result of the fact that within a given pathway, not only can 

different genes be affected in different samples by one mechanism (e.g. DNA copy number 

amplification), but they also can be affected by different, but complementary, mechanisms (e.g. 

DNA methylation). These findings validate part A of the hypothesis. In addition, when aligning 

genetic and epigenetic profiles, I show that a high proportion of the observed differential 

expression can be attributed to genetic and epigenetic changes, validating part B of the 

hypothesis (genetic and epigenetic changes resulting in aberrant gene expression). Finally, 

when examining the EGFR signaling pathway, we observe that a number of key genes are 

frequently altered, while other genes are not altered as often. The most frequently affected 

gene, SIRPA, exhibited both genetic and epigenetic alteration and in some cases, this occurred 

concurrently within the same sample. Moreover, it was also found, in the analyses of both 

chapter 3 (breast cancer) and chapter 5 (lung cancer), that over three times as many genes are 

identified as frequently aberrant when using multiple dimensions as compared to any single 

dimension. These findings validate part C of the hypothesis, whereby a gene within a 

commonly deregulated lung cancer pathway was identified using the integrative genetic and 

epigenetic approach. 

This finding has potential implications on the sample sizes necessary to discover important 

alterations as one can look at a small number of samples in more detail rather than a large 

sample set using a single assay. Specifically, these would be the genes that are disrupted at a 

low frequency in one dimension but high frequency across multiple dimensions as large sample 

sets would be needed to be confident of the low single dimension alteration frequency. Such 

considerations exist in situations where large sample sets are not feasible due to rarity and 

preciousness of samples. 
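
The sample-size argument above can be illustrated with a simple binomial calculation. This is a rough sketch with hypothetical numbers rather than an analysis performed in this thesis: it assumes each sample is altered independently with the same probability, and the function name `prob_at_least`, the cohort size, the detection threshold, and the two frequencies are chosen purely for illustration.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p).

    Probability of seeing a gene altered in at least k of n samples when
    each sample is altered independently with probability p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n < 0 or k < 0:
        raise ValueError("n and k must be non-negative")
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical scenario: 20 samples, a gene is reported if altered in >= 4.
n_samples, threshold = 20, 4
single_dimension = prob_at_least(threshold, n_samples, 0.10)  # assumed 10% in one assay
multi_dimension = prob_at_least(threshold, n_samples, 0.40)   # assumed 40% across assays
```

Under these assumed frequencies, the same small cohort is far more likely to flag the gene when alterations are pooled across dimensions than when a single assay is used.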

In terms of the multi-dimensional perspective on lung adenocarcinoma, in addition to finding 

additional genes and pathways that are disrupted when we look at multiple dimensions, when 

examining known genes and pathways, we see a complex pattern of deregulation with some 

components altered more frequently than others. This added complexity highlights how tumors 

differ from one another and suggests that the rational approach to identifying therapeutic targets 

will be done at the pathway level as opposed to the gene level. It is clear that within these 

pathways, key nodes or “choke points” will likely serve as the best targets for therapeutic 

intervention. 

6.3 Future directions 

There are three key future directions which should be pursued at this point: (i) further evaluation 

of SIRPA as a novel tumor suppressor gene in lung adenocarcinoma, (ii) evaluation of the novel 

signaling pathways implicated through multi-dimensional analysis, and (iii) incorporation of data 

from other dimensions not evaluated in this thesis. 

In terms of SIRPA, the first experiments that need to be done are to assess whether the 

deregulated mRNA expression of SIRPA is also observed at the protein level. One such 

approach would be using immunohistochemistry on a tissue microarray comprised of hundreds 

of samples with well annotated clinical information. Subsequently, the frequency of 

underexpression, the correlation with overall patient survival, and the overexpression 

associated with a subset of tumors from patients with never smoking history could be validated. 

In addition, depending on what clinical information is available, one could correlate to other 

parameters that were not available to me from the public gene expression microarray datasets 

to uncover other interesting clinical associations. 

Secondly, with the amount of literature suggesting that SIRPA could be a tumor suppressor 

gene in adenocarcinoma, the next set of experiments would be designed to test the tumor 

suppressor role of SIRPA. This would require silencing the gene in normal cells, using 

RNAi-based techniques for example, and assessing tumorigenic phenotypes such as anchorage 

independent growth, reduction of apoptosis, and increase in proliferation. In addition, parallel 

gene introduction experiments would also need to be done in cancer cell lines which exhibit little 

or no expression of SIRPA and the level of suppression of the above listed tumor phenotypes 

would then be assessed. 

One of the canonical signaling pathways that was identified as the most statistically significant 

by Ingenuity Pathway Analysis is the Hepatic Fibrosis /Hepatic Stellate Cell Activation pathway. 

While the existence and role of stellate cells have been well documented in the liver and 

pancreas, there have been a limited number of reports of stellate cells in the lung [20]. From 

what is known in the liver and pancreas, stellate cells are involved in tissue fibrosis and 

inflammation in chronic diseases such as pancreatitis and hepatitis [21-25]. In pancreatic 

tumors, activated stellate cells promote an increase in connective tissue surrounding the tumors 

(termed the desmoplastic process) and have been shown to be proliferative in the presence of 

tumor secreted factors [25]. In addition, stellate cells also have implications in drug resistance 

[26]. In the lung, it is plausible to envision a role of stellate cells in diseases such as chronic 

obstructive pulmonary disease (COPD) where tissue fibrosis and inflammation are prominent 

[27]. One of the challenges to testing this function in vitro is that it would be important to 

recapitulate the tumor microenvironment. Hence, this function would have to be tested in vivo 

using inducible mouse models where expression of secreted factors associated with stellate cell 

activation, which were identified from our analysis, can be assessed. Phenotypes such as 

cellular proliferation, apoptosis, and drug resistance could then be assayed and compared 

before and after induction of these secreted factors. 

Finally, although multiple DNA dimensions were analyzed in this thesis, recent advances in 

technology have allowed for other dimensions that could be incorporated. For example, 

genome sequencing technologies allow for the detection of novel somatic mutations in a high 

throughput manner. While performing this at the whole genome level is financially and 

computationally challenging, this effort can be focused on examining the "exome" (DNA from 

gene coding exons only) using sequence capture based techniques [28, 29]. MicroRNAs have 

also been shown to be important in lung cancer, with specific microRNAs found to be differentially 

expressed [30-36]. MicroRNAs can affect downstream protein expression through a number of 

different mechanisms [37-40]. Integration of microRNA and sequence mutation data with the 


previously described genetic and epigenetic data would further increase our understanding of 

the biology of lung adenocarcinoma. 






9:422. 




3. Sanada M, Suzuki T, Shih LY, Otsu M, Kato M, Yamazaki S, Tamura A, Honda H, 

Sakata-Yanagimoto M, Kumano K et al: Gain-of-function of mutated C-CBL tumour 

suppressor in myeloid neoplasms. Nature 2009, 460(7257):904-908. 



1992, 59(4):248-252. 




6192. 

6. Kralovics R, Guan Y, Prchal JT: Acquired uniparental disomy of chromosome 9p is 

a frequent stem cell defect in polycythemia vera. Exp Hematol 2002, 30(3):229-236. 









Genet 2007, 81(1):114-126. 








86. 


15(3):120-128. 

12. Bjornsson HT, Albert TJ, Ladd-Acosta CM, Green RD, Rongione MA, Middle CM, 

Irizarry RA, Broman KW, Feinberg AP: SNP-specific array-based allele-specific 

expression analysis. Genome Res 2008, 18(5):771-779. 

13. Gimelbrant A, Hutchinson JN, Thompson BR, Chess A: Widespread monoallelic 

expression on human autosomes. Science 2007, 318(5853):1136-1140. 

14. Milani L, Lundmark A, Nordlund J, Kiialainen A, Flaegstad T, Jonmundsson G, Kanerva 

J, Schmiegelow K, Gunderson KL, Lonnerholm G et al: Allele-specific gene 

expression patterns in primary leukemic cells reveal regulation of gene 

expression by CpG site methylation. Genome Res 2009, 19(1):1-11. 

15. Palacios R, Gazave E, Goni J, Piedrafita G, Fernando O, Navarro A, Villoslada P: 

Allele-specific gene expression is widespread across the genome and biological 

processes. PLoS One 2009, 4(1):e4150. 


16. Zhang K, Li JB, Gao Y, Egli D, Xie B, Deng J, Li Z, Lee JH, Aach J, Leproust EM et al: 

Digital RNA allelotyping reveals tissue-specific and allele-specific gene 

expression in human. Nat Methods 2009, 6(8):613-618. 

17. Kapoor GS, Kapitonov D, O'Rourke DM: Transcriptional regulation of signal 

regulatory protein alpha1 inhibitory receptors by epidermal growth factor receptor 

signaling. Cancer Res 2004, 64(18):6444-6452. 

18. Wu CJ, Chen Z, Ullrich A, Greene MI, O'Rourke DM: Inhibition of EGFR-mediated 

phosphoinositide-3-OH kinase (PI3-K) signaling and glioblastoma phenotype by 

signal-regulatory proteins (SIRPs). Oncogene 2000, 19(35):3999-4010. 

19. Gardai SJ, Xiao YQ, Dickinson M, Nick JA, Voelker DR, Greene KE, Henson PM: By 

binding SIRPalpha or calreticulin/CD91, lung collectins act as dual function 

surveillance molecules to suppress or enhance inflammation. Cell 2003, 115(1):13- 

23. 

20. Keane MP, Strieter RM, Belperio JA: Mechanisms and mediators of pulmonary 

fibrosis. Crit Rev Immunol 2005, 25(6):429-463. 

21. Geerts A: History, heterogeneity, developmental biology, and functions of 

quiescent hepatic stellate cells. Semin Liver Dis 2001, 21(3):311-335. 

22. Hautekeete ML, Geerts A: The hepatic stellate (Ito) cell: its role in human liver 

disease. Virchows Arch 1997, 430(3):195-207. 

23. Masamune A, Shimosegawa T: Signal transduction in pancreatic stellate cells. J 

Gastroenterol 2009, 44(4):249-260. 

24. Masamune A, Watanabe T, Kikuta K, Shimosegawa T: Roles of pancreatic stellate 

cells in pancreatic inflammation and fibrosis. Clin Gastroenterol Hepatol 2009, 7(11 

Suppl):S48-54. 

25. Omary MB, Lugea A, Lowe AW, Pandol SJ: The pancreatic stellate cell: a star on the 

rise in pancreatic diseases. J Clin Invest 2007, 117(1):50-59. 

26. Mahadevan D, Von Hoff DD: Tumor-stroma interactions in pancreatic ductal 

adenocarcinoma. Mol Cancer Ther 2007, 6(4):1186-1197. 

27. Chung KF, Adcock IM: Multifaceted mechanisms in COPD: inflammation, immunity, 

and tissue repair and destruction. Eur Respir J 2008, 31(6):1334-1356. 

28. Ng SB, Buckingham KJ, Lee C, Bigham AW, Tabor HK, Dent KM, Huff CD, Shannon 

PT, Jabs EW, Nickerson DA et al: Exome sequencing identifies the cause of a 

mendelian disorder. Nat Genet 2010, 42(1):30-35. 

29. Ng SB, Turner EH, Robertson PD, Flygare SD, Bigham AW, Lee C, Shaffer T, Wong M, 

Bhattacharjee A, Eichler EE et al: Targeted capture and massively parallel 

sequencing of 12 human exomes. Nature 2009, 461(7261):272-276. 

30. Du L, Pertsemlidis A: microRNAs and lung cancer: tumors and 22-mers. Cancer 

Metastasis Rev 2010. 

31. Garofalo M, Di Leva G, Romano G, Nuovo G, Suh SS, Ngankeu A, Taccioli C, Pichiorri 

F, Alder H, Secchiero P et al: miR-221&222 regulate TRAIL resistance and enhance 

tumorigenicity through PTEN and TIMP3 downregulation. Cancer Cell 2009, 

16(6):498-509. 

32. Johnson SM, Grosshans H, Shingara J, Byrom M, Jarvis R, Cheng A, Labourier E, 

Reinert KL, Brown D, Slack FJ: RAS is regulated by the let-7 microRNA family. Cell 

2005, 120(5):635-647. 

33. Kumar MS, Erkeland SJ, Pester RE, Chen CY, Ebert MS, Sharp PA, Jacks T: 

Suppression of non-small cell lung tumor development by the let-7 microRNA 

family. Proc Natl Acad Sci U S A 2008, 105(10):3903-3908. 

34. Talotta F, Cimmino A, Matarazzo MR, Casalino L, De Vita G, D'Esposito M, Di Lauro R, 

Verde P: An autoregulatory loop mediated by miR-21 and PDCD4 controls the AP-1 

activity in RAS transformation. Oncogene 2009, 28(1):73-84. 

35. Weiss GJ, Bemis LT, Nakajima E, Sugita M, Birks DK, Robinson WA, Varella-Garcia M, 

Bunn PA, Jr., Haney J, Helfrich BA et al: EGFR regulation by microRNA in lung 


cancer: correlation with clinical response and survival to gefitinib and EGFR 

expression in cell lines. Ann Oncol 2008, 19(6):1053-1059. 

36. Xiao C, Srinivasan L, Calado DP, Patterson HC, Zhang B, Wang J, Henderson JM, 

Kutok JL, Rajewsky K: Lymphoproliferative disease and autoimmunity in mice with 

increased miR-17-92 expression in lymphocytes. Nat Immunol 2008, 9(4):405-414. 

37. Lee RC, Ambros V: An extensive class of small RNAs in Caenorhabditis elegans. 

Science 2001, 294(5543):862-864. 

38. Mattick JS, Makunin IV: Small regulatory RNAs in mammals. Hum Mol Genet 2005, 

14 Spec No 1:R121-132. 

39. McManus MT: MicroRNAs and cancer. Semin Cancer Biol 2003, 13(4):253-258. 

40. Vasudevan S, Tong Y, Steitz JA: Switching from repression to activation: 

microRNAs can up-regulate translation. Science 2007, 318(5858):1931-1934. 


APPENDIX I: List of publications 

This appendix lists all of the publications to which I contributed that were published, 

accepted, in submission, or prepared for submission. In total, 29 publications are 

listed below: four are represented as thesis chapters and nine are discussed in 

section 1.11. The remaining 16 publications are each accompanied by a brief 

description. 

Publications included as thesis chapters 

1. Chari R, Coe BP, Wedseltoft C, Benetti M, Wilson IM, Vucic E, MacAulay C, Ng RT, Lam 

WL. (2008) SIGMA2: A system for the integrative genomic multi-dimensional analysis of cancer 

genomes, epigenomes, and transcriptomes. BMC Bioinformatics, 9(1):422, 1-12. 

This publication is included in the thesis as chapter 2. 

2. Chari R, Coe BP, Vucic EA, Lockwood WW, Lam WL. (2010) An integrative multi- 

dimensional genetic and epigenetic strategy to identify aberrant genes and pathways in cancer. 

BMC Systems Biology. Submitted. 

This manuscript submitted for publication is included in the thesis as chapter 3. 

3. Chari R, Lockwood WW, Coe BP, Soh J, MacAulay C, Lam S, Gazdar AF, Lam WL. (2010) 

UPD is a frequent mechanism of gene disruption in lung adenocarcinoma. 

This manuscript in preparation is included in the thesis as chapter 4. 

4. Chari R, Thu KL, Wilson IM, Lockwood WW, Lonergan KM, Coe BP, Malloff CA, Gazdar AF, 

Lam S, Garnis C, MacAulay CE, Alvarez CE, Lam WL. (2010) Integrating the multiple 

dimensions of genomic and epigenomic landscapes of cancer. Cancer and Metastasis 

Reviews, 29(1):73-93. 


This publication is included in the thesis as chapter 5. 

Publications discussed in section 1.11 (9 listed) 

5. Chari R, Lockwood WW, Coe BP, Chu A, Macey D, Thomson A, Davies JJ, MacAulay C, 

Lam WL. (2006) SIGMA: A system for the integrative genomic microarray analysis of cancer 

genomes. BMC Genomics, 7(1):324, 1-11. 

This publication is described in section 1.11.1. 

6. Coe BP, Chari R, MacAulay C, Lam WL. (2010) FACADE: A fast and sensitive algorithm for 

the segmentation and calling of high resolution array CGH data. Nucleic Acids Research. 

Submitted. 


7. Lonergan KM, Chari R, deLeeuw RJ, Shadeo A, Chi B, Tsao M, Jones S, Marra M, Ng R, 

MacAulay C, Lam S, Lam WL. (2006) Identification of novel lung genes in bronchial epithelium 

by serial analysis of gene expression. American Journal of Respiratory Cell and Molecular 

Biology, 35(6):651-61. 


8. Chari R, Lonergan KM, Ng RT, MacAulay C, Lam S, Lam WL. (2007) Effect of active 

smoking on the bronchial epithelial transcriptome. BMC Genomics, 8(1):297, 1-13. 


9. Lee EHL*, Chari R*, Lam A, Ng RT, Yee J, English J, Evans KG, MacAulay C, Lam S, Lam 

WL. (2008) Disruption of the non-canonical Wnt pathway in lung squamous cell carcinoma. 

Clinical Medicine: Oncology, 2:169-179. *These authors contributed equally 




10. Lonergan KM, Chari R, Coe BP, Wilson IM, Tsao M-S, Ng RT, MacAulay C, Lam S, Lam 

WL. (2010) Transcriptome profiles of carcinoma-in-situ and invasive non-small cell lung cancer 

as revealed by SAGE. PLoS One, 5(2):e9162, 1-22. 


11. Chari R, Lonergan KM, Pikor LA, Coe BP, Zhu CQ, Chan THW, MacAulay C, Tsao M-S, 

Lam S, Ng RT, Lam WL. (2010) A sequence-based approach to identify reference genes for 

gene expression analysis. BMC Medical Genomics. Submitted. 


12. Lockwood WW, Chari R, Coe BP, Girard L, MacAulay C, Lam S, Gazdar AF, Minna JD, 

Lam WL. (2008) DNA amplification is a ubiquitous mechanism of oncogene activation in lung 

and other cancers. Oncogene, 27(33):4615-4624. 


13. Lockwood WW, Chari R, Coe BP, Thu KL, Garnis C, Campbell J, Williams AC, Hwang D, 

Zhu CQ, Yee J, English J, Tsao M-S, Gazdar AF, MacAulay C, Minna JD, Lam S, Lam WL. 

(2010) BRF2 is a lineage specific oncogene amplified early in squamous cell lung cancer 

development. PLoS Medicine. Submitted. 


Other publications not discussed in this thesis (16 listed) 

Array comparative genomic hybridization and its application to multiple cancer types 

14. Garnis C, Chari R, Buys TP, Zhang L, Ng RT, Rosin MP, Lam WL. (2009) Genomic 

imbalances in precancerous tissues signal oral cancer risk. Molecular Cancer, 8:50, 1-7. 


The development of oral cancer is thought to occur through a progression of histopathological 

stages, from different stages of dysplasia (mild, moderate, and severe) to carcinoma in situ to 

invasive disease. As in many cancer types, early detection of this disease is critical for 

good prognosis. It is therefore important to determine, at early stages of the disease such as 

mild dysplasia, which cases will and will not progress. This manuscript describes the use of 

array CGH as a tool to predict progression from the genomes of mild dysplasia patients, and 

shows that the level of genomic alteration had high concordance with disease progression. 

15. Coe BP, Lockwood WW, Chari R, Lam WL. (2009) Comparative genomic hybridization on 

BAC arrays. Methods in Molecular Biology, 556:7-19. 

This publication is a chapter in the Methods in Molecular Biology textbook and describes the 

process of developing, using and analyzing data from bacterial artificial chromosome CGH 

arrays. 

16. deLeeuw RJ, Zettl A, Klinker E, Haralambieva E, Trottier M, Chari R, Ge Y, Gascoyne RD, 

Chott A, Muller-Hermelink HK, Lam WL. (2007) Whole genome analysis and HLA haplotyping 

of enteropathy-type T-cell lymphoma reveals two distinct lymphoma subtypes. 

Gastroenterology, 132(5):1902-11. 

Enteropathy-type T-cell lymphoma (ETL) is an aggressive non-Hodgkin lymphoma, and the 

genetic alterations underlying this disease were not well understood. In this publication, array 

CGH was applied to samples from patients with ETL and, based on the genetic alterations and 

HLA genotyping, it was found that two distinct subtypes of this disease existed, which was 

contrary to the clinical classification used at the time. 

17. Buys TPH, Wilson IM, Coe BP, Lee EHL, Kennett JY, Lockwood WW, Tsui IFL, Shadeo A, 

Chari R, Garnis C, Lam WL. (2006) “Detailed Comparisons of Cancer Genomes” in 

Comparative Genomics: Fundamental and Applied Perspectives (Brown JR, ed.), CRC Press / 

Taylor & Francis, LLC, Boca Raton, FLA, pp. 245-259. 


This chapter details the technologies used for cancer genome comparisons as well as the 

different types of comparisons that are currently undertaken in research today such as the 

comparison of cancer subtypes, clonal versus multiple primary tumors, cancer susceptibility and 

drug sensitivity. 

18. Lockwood WW*, Chari R*, Chi B, Lam WL. (2006) Recent advances in array comparative 

genomic hybridization technologies and their applications in human genetics. European Journal 

of Human Genetics, 14(2):139-48. 

This publication is a review of literature describing the advances in array CGH technology and 

its application to many genetic diseases, including cancer. 

19. Buys TPH, Wilson IM, Coe BP, Lockwood WW, Davies JJ, Chari R, DeLeeuw RJ, Shadeo 

A, MacAulay C, Lam WL. (2005) “Key Features of BAC Array Production and Usage” in DNA 

Microarrays (Methods Express Series) (Schena M, ed), Scion Publishing, Ltd., Bloxham, pp. 

115-145. 

This chapter describes the production and use of bacterial artificial chromosome microarray- 

based CGH and the analysis of the data generated by this platform. In addition, protocols for 

array CGH experiments are also provided. 

Gene expression based studies 

20. Coe BP, Chari R, Lockwood WW, Lam WL. (2008) Evolving strategies for global gene 

expression analysis of cancer. Journal of Cellular Physiology, 217(3):590-597. 

This publication is a review of literature describing the advancement in technology to analyze 

gene expression in cancer and the movement of the field towards integrative genomics. 

21. Shadeo A, Chari R, Lonergan KM, Pusic A, Miller D, Ehlen T, Van Niekerk D, Matisic J, 

Richards-Kortum R, Follen M, Guillaud M, Lam WL, MacAulay C. (2008) Up regulation in gene 


expression of chromatin remodelling factors in cervical intraepithelial neoplasia. BMC 

Genomics, 9(1):64, 1-14. 

Cervical cancer is a major problem in developing countries. Similar to oral cancer, it is thought 

to go through a progression of histopathological stages, and thus identifying markers at stages 

of intervention is crucial to the prognoses of patients with this disease. In this publication, a 

comparison of normal cervical tissue with cervical intraepithelial neoplasia (CIN) was performed 

to identify genes upregulated in CIN. It was found that genes involved in chromatin remodelling 

were upregulated in CIN. 

22. Shadeo A, Chari R, Vatcher G, Campbell J, Lonergan KM, Matisic J, van NieKerk D, Ehlen 

T, Miller D, Follen M, Lam WL, MacAulay C. (2007) Comprehensive serial analysis of gene 

expression of the cervical transcriptome. BMC Genomics, 8(1):142, 1-11. 

This publication describes the transcriptome of normal cervix tissue using serial analysis of 

gene expression. 

Integrative analysis of multiple DNA and RNA dimensions 

23. Wilson IM, Vucic EA, Chari R, Zhang Y-A, Starczynowski DT, Lonergan KM, Enfield KSS, 

Buys TPH, Yee J, Laird-Offringa I, Karsan A, Liu P, You M, Anderson M, MacAulay C, Lam S, 

Gazdar AF, Lam WL. (2010) EYA4 is a non-small cell lung cancer tumor suppressor located in 

the susceptibility locus on chromosome 6q. 

Chromosome arm 6q has been shown to harbor a region associated with lung cancer 

susceptibility based on the analysis of familial lung cancer datasets. Moreover, this specific 

region is also frequently lost in sporadic, non-familial lung cancers. Hence, many 

studies have been undertaken to identify the gene(s) in this region which may be critical to lung 

tumorigenesis. In this manuscript, we detail the use of a genetic and epigenetic approach to 

identify key genes in this region which are frequently deregulated by concerted genetic and 


epigenetic alteration. This led to the identification of the gene EYA4, which we further 

demonstrate to have tumor suppressive activity. 


Shames D, Tang X, MacAulay C, Varella-Garcia M, Vooder T, Wistuba II, Lam S, Brekken R, 

Toyooka S, Minna JD, Lam WL, Gazdar AF. (2009) Oncogene mutations, copy number gains 

and mutant allele specific imbalance (MASI) frequently occur together in tumor cells. PLoS 

One, 4(10):e7464, 1-13. 

Somatic mutation of both oncogenes and tumor suppressor genes has been shown to be 

important in cancer. While tumor suppressor genes typically are recessive and require both 

alleles to harbor mutation, activating mutations of oncogenes generally only require one of the 

alleles to be mutated. However, it has been shown that for specific activating mutations, 

multiple mutated copies can exist. In this study, using some of the most commonly mutated 

genes in multiple cancer types, the prevalence of this phenomenon was assessed in a set of cell 

lines and tumors representing lung, pancreatic and colorectal cancers. It was found that for the 

EGFR locus, mutation of the gene is accompanied by copy number increase, with 

preferential gain of the mutated copy, whereas for KRAS, the event observed is acquired 

uniparental disomy, where the wild type copy is lost and the mutant copy is duplicated. 

25. Campbell JM, Lockwood WW, Buys TP, Chari R, Coe BP, Lam S, Lam WL. (2008) 

Integrative genomic and gene expression analysis of chromosome 7 identified novel oncogene 

loci in non-small cell lung cancer. Genome, 51(12): 1032–1039. 

Genomic alteration of chromosome 7 is a frequent event in non-small cell lung cancer. While 

the most commonly known oncogenes on this chromosome include EGFR, MET and BRAF, 

there are likely other candidate genes which may have a role in lung tumorigenesis. In this 

manuscript, utilizing an integrative genetic and gene expression approach, novel oncogene loci 

are identified. 


26. Buys TPH, Chari R, Lee E, Zhang M, MacAulay C, Lam S, Lam WL, Ling V. (2007) 

Genetic changes in the evolution of multidrug resistance for cultured human ovarian cancer 

cells. Genes, Chromosomes and Cancer, 46(12):1069-79. 

Drug resistance is a common problem for cancer patients treated by chemotherapeutics. One 

of the mechanisms of resistance is through the multi-drug resistance phenotype which is often 

associated with the activity of ATP-binding cassette (ABC) transporters. In this study, using an 

ovarian cancer cell line exposed to increasing concentrations of vincristine to derive drug 

resistant derivatives, the genetic and gene expression profiles were compared between these 

resistant derivatives and the original cancer cell line. It was found that while initial resistant 

derivatives (lines exposed to lower concentrations of drug) harbored copy number and gene 

expression increases of ABCC1 and ABCC6, later resistant derivatives (lines exposed to higher 

concentrations of drug) did not have the increase in ABCC1 and ABCC6, but had an increase of 

ABCB, suggesting the drug resistance phenotype may be a dynamic process. 

27. Coe BP, Lockwood WW, Girard L, Chari R, Minna JD, MacAulay C, Lam S, Gazdar AF, 

Lam WL. (2006) Differential regulation of cell cycle pathways in small cell and non-small cell 

lung cancer. British Journal of Cancer, (12):1927-35. 

Small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC) are the two major cell 

types of lung cancer. While pathologically they can be distinguished, the molecular basis of 

these two cancer types is not well understood. In this study, a whole genome integrative 

genetic and gene expression comparison of NSCLC and SCLC was performed and differential 

regulation of cell cycle pathways was identified. Specifically, NSCLC is primarily deregulated at 

the receptor level while SCLC is primarily deregulated at the nuclear transcription factor level. 

Software, analysis approaches, and databases 

28. Tsui IFL, Chari R, Buys TPH, Lam WL. (2007) Public databases and software for the 

pathway analysis of cancer genomes. Cancer Informatics, 3:389-407. 


This manuscript describes the currently available computational resources for the analysis of 

pathways in cancer. Specifically, it covers the use of these resources to analyze results from 

high-throughput studies examining genetic, epigenetic or gene expression alterations. 

29. Chari R*, Lockwood WW*, Lam WL. (2006) Computational methods for the analysis of 

array comparative genomic hybridization. Cancer Informatics, (2):48-58. 

This manuscript describes the most commonly used analysis strategies for array CGH data and 

compares and contrasts these approaches. In addition, the specific features of currently 

available software suites are also compared. 


APPENDIX II: Description of cell lines 

Sample ER Status PR Status HER2 Status TP53 Mutation Status** 

HCC38 - - + 

HCC1008 - - N/A 

HCC1143 - - + 

HCC1395 + - + 

HCC1599 - - + 

HCC1937 - - + (heterozygous mutation) 

HCC2218 - + + - 

BT474 + - + + 

MCF7 + + + 

MCF10A N/A N/A N/A N/A 

** mutation status obtained from the Sanger Cancer Cell Line Project 

(http://www.sanger.ac.uk/genetics/CGP/CellLines/) 


APPENDIX III: Sources of data 

Sample DNA Copy 

Number - 

Array CGH 


- Affymetrix 

SNP 500K 

HCC38 GSE21540 https://cabig.nci.nih.gov (GSE21347) 


ci.nih.gov 

(GSE21347) 


ci.nih.gov 

(GSE21347) 


ci.nih.gov 

(GSE21347) 


ci.nih.gov 

(GSE21347) 


ci.nih.gov 

(GSE21347) 


ci.nih.gov 

(GSE21347) 

BT474 GSE21540 https://cabig.nci.nih.gov (GSE21347) 

MCF7 GSE21540 https://cabig.nci.nih.gov (GSE21347) 


DNA Methylation - Illumina Infinium 

new data for this publication (GSE17769) 

new data for 


(GSE17769) 

new data for 


(GSE17769) 

new data for 


(GSE17769) 

new data for 


(GSE17769) 

new data for 


(GSE17769) 

new data for 


(GSE17769) 

new data for 


(GSE17769) 

new data for 


(GSE17769) 

MCF10A N/A N/A new data for 


(GSE17769) 

Gene expression - Affymetrix U133 Plus 2.0 (NCBI GEO Accession number provided) 

new data for this publication (GSE17768) 













https://cabig.nci.nih.gov 

(GSE17768) 

https://cabig.nci.nih.gov 

(GSE17768) 

GSM254525 

(GSE17768)

APPENDIX IV: MCD strategy and Kaplan-Meier analysis of 

TUSC3 


APPENDIX V: Kaplan-Meier and Oncomine expression 

analysis of frequent MCD genes 

Symbol (+) (-) Total Survival Associated* **Status in Tumors (p-value) 

SH3TC1 0 6 6 No - 

CCNA1 0 5 5 Yes - 

COL7A1 0 5 5 No - 

KCTD4 0 5 5 N/A Not tested 

LMCD1 5 0 5 Yes O3(6.3E-4), O5(3.1E-10) 

LYAR 0 5 5 Yes U5(3.8E-4), O3(2.6E-4) 

MTMR9 0 5 5 No U3(1.8E-7) 

SYT8 0 5 5 N/A Not tested 

TUSC3 0 5 5 Yes U5(7.4E-5) 

ASAM 0 4 4 N/A Not tested 

B3GALNT1 4 0 4 No O3(1.1E-7) 

COL17A1 0 4 4 No U1(6.8E-5), U3(1.4E-8) 

ELK3 0 4 4 Yes - 

FGFR1 0 4 4 Yes U1(2.6E-8),U3(4.2E-6),U4(1.3E-7) 

KRT17 0 4 4 No U1 (2.3E-11),U2 (1.1E-7), U3(3.9E-7) 

LCP1 0 4 4 Yes - 

OSBPL5 0 4 4 N/A Not tested 

PSD3 0 4 4 Yes - 

SFXN3 0 4 4 N/A Not tested 

SH3BGRL3 0 4 4 No O2(2.8E-4) 

SNRPN 0 4 4 No U3(5.4E-10), U5(2.7E-5) 

TNFRSF10D 0 4 4 No O5(5.2E-4), U3(9.7E-4) 

TNS4 0 4 4 N/A Not tested 

*Survival associated if gene expression was significantly associated with survival in at least one 

of the two datasets tested (based on p < 0.05 using the log rank test). 

**U=underexpressed between tumor and normal, O=overexpressed between tumor and normal 

in the particular dataset; the numbers 1-5 indicate the reports from which the data originated, 

1=[1], 2=[2], 3=[3], 4=[4], 5=[5], 6=[6]; “-” indicates the gene was either not represented or not 

statistically differentially expressed based on group-wise analysis. (+) represents two-fold 

overexpression, copy number gain, hypomethylation and allelic imbalance; (-) represents 

two-fold underexpression, copy number loss, hypermethylation, and LOH in the same sample. 

The (+) and (-) columns give the number of samples in our dataset which met these criteria.
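
The (+) and (-) definitions in the footnote above can be illustrated with a minimal Python sketch. It reads the footnote literally, requiring the expression change and all three DNA-level events in the same sample; the class name, field names and string encodings are hypothetical and are not taken from the analysis software used in this thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneCall:
    """One gene in one sample; field names and encodings are illustrative only."""
    expression_ratio: float  # tumor / reference expression ratio
    copy_number: str         # "gain", "loss" or "neutral"
    methylation: str         # "hypo", "hyper" or "unchanged"
    allelic_status: str      # "imbalance", "loh" or "retained"


def mcd_direction(call: GeneCall) -> str:
    """Return "+", "-" or "" for one sample, following the footnote literally."""
    if (call.expression_ratio >= 2.0 and call.copy_number == "gain"
            and call.methylation == "hypo"
            and call.allelic_status == "imbalance"):
        return "+"
    if (call.expression_ratio <= 0.5 and call.copy_number == "loss"
            and call.methylation == "hyper"
            and call.allelic_status == "loh"):
        return "-"
    return ""


def tally(calls):
    """Count (+) and (-) samples for one gene; returns (plus, minus, total)."""
    directions = [mcd_direction(c) for c in calls]
    plus = directions.count("+")
    minus = directions.count("-")
    return plus, minus, plus + minus
```

Under these assumptions, a gene such as TUSC3 in the table would correspond to five samples returning "-" and none returning "+".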

REFERENCES 




2001, 98(19):10869-10874. 



Nature 2000, 406(6797):747-752. 

3. Richardson AL, Wang ZC, De Nicolo A, Lu X, Brown M, Miron A, Liao X, Iglehart JD, 

Livingston DM, Ganesan S: X chromosomal abnormalities in basal-like human 

breast cancer. Cancer Cell 2006, 9(2):121-132. 

4. Radvanyi L, Singh-Sandhu D, Gallichan S, Lovitt C, Pedyczak A, Mallo G, Gish K, Kwok 

K, Hanna W, Zubovits J et al: The gene associated with trichorhinophalangeal 

syndrome in humans is overexpressed in breast cancer. Proc Natl Acad Sci U S A 

2005, 102(31):11005-11010. 

5. Finak G, Bertos N, Pepin F, Sadekova S, Souleimanova M, Zhao H, Chen H, Omeroglu 

G, Meterissian S, Omeroglu A et al: Stromal gene expression predicts clinical 

outcome in breast cancer. Nat Med 2008, 14(5):518-527. 

6. Karnoub AE, Dash AB, Vo AP, Sullivan A, Brooks MW, Bell GW, Richardson AL, Polyak 

K, Tubo R, Weinberg RA: Mesenchymal stem cells within tumour stroma promote 

breast cancer metastasis. Nature 2007, 449(7162):557-563. 


APPENDIX VI: Summary of Kaplan-Meier survival analysis 

GeneSymbol Alternative Names van de Vijver - Pvalue Sorlie - Pvalue 

SH3TC1 FLJ20356 Fail N/A 

CCNA1 0.01484628 N/A 

COL7A1 Fail N/A 

KCTD4 N/A N/A 

LMCD1 Fail 0.00261366 

LYAR FLJ20425 0.00551113 Fail 

MTMR9 DKFZP434K171 Fail N/A 

SYT8 DKFZp434K0322 N/A N/A 

TUSC3 N33 0.01696356 Fail 

ASAM N/A N/A 

B3GALNT1 B3GALT3 Fail N/A 

COL17A1 Fail Fail 

ELK3 0.04816902 N/A 

FGFR1 Fail 0.0147898 

KRT17 Fail Fail 

LCP1 0.01132949 0.04024164 

OSBPL5 N/A N/A 

PSD3 DKFZp761K1423 0.00205916 N/A 

SFXN3 N/A N/A 

SH3BGRL3 N/A Fail 

SNRPN Fail Fail 

TNFRSF10D Fail N/A 

TNS4 N/A N/A 

Fail = p-value > 0.05; N/A = not represented on array platform
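
The relationship between this table and the "Survival Associated" column of Appendix V can be sketched in Python as follows. This is only an illustration of the stated rule (significant in at least one of the two datasets at p < 0.05); the function name and the use of the strings "Fail" and "N/A" as inputs are assumptions made for the example.

```python
def survival_associated(p_values, alpha=0.05):
    """Summarize per-dataset log rank results for one gene.

    p_values holds one entry per dataset: a float p-value, the string
    "Fail" (p-value > 0.05, exact value not reported) or "N/A" (gene not
    represented on the array platform).
    """
    tested = [p for p in p_values if p != "N/A"]
    if not tested:
        return "N/A"  # gene could not be tested in either dataset
    significant = any(isinstance(p, float) and p < alpha for p in tested)
    return "Yes" if significant else "No"


# Rows taken from the table above (van de Vijver, Sorlie).
rows = {
    "CCNA1": (0.01484628, "N/A"),
    "LMCD1": ("Fail", 0.00261366),
    "COL17A1": ("Fail", "Fail"),
    "KCTD4": ("N/A", "N/A"),
}
summary = {gene: survival_associated(p) for gene, p in rows.items()}
```

For these four rows the sketch yields Yes, Yes, No and N/A respectively, matching the corresponding entries in Appendix V.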

APPENDIX VII: Copy of UBC Research Ethics Board 

certificate of approval 


University of British Columbia - British Columbia Cancer Agency 

Research Ethics Board (UBC BCCA REB) 

UBC BCCA Research Ethics Board 

Fairmont Medical Building (6th Floor) 

614 - 750 West Broadway 

Vancouver, BC V5Z 1H5 

Tel: (604) 877-6284 Fax: (604) 708-2132 

E-mail: reb@bccancer.bc.ca 

Website: http://www.bccancer.bc.ca > 

Research Ethics 

RISe: http://rise.ubc.ca 

Certificate of Expedited Approval: Annual 

Renewal 

PRINCIPAL INVESTIGATOR: INSTITUTION / DEPARTMENT: REB NUMBER: 

Wan Lam 

BCCA/BCCA/Cancer Genetics & 

Development (BCCA) 

H08-01392 

INSTITUTION(S) WHERE RESEARCH WILL BE CARRIED OUT: 

Institution Site 

BC Cancer Agency 

Other locations where the research will be conducted: 

N/A 

Vancouver BCCA 

PRINCIPAL INVESTIGATOR FOR EACH ADDITIONAL PARTICIPATING BCCA CENTRE: 

Vancouver: Wan Lam Vancouver Island: N/A 

Fraser Valley: N/A Southern Interior: N/A 

Abbotsford Centre: N/A 

SPONSORING AGENCIES AND COORDINATING GROUPS: 

Canadian Institutes of Health Research (CIHR) 

PROJECT TITLE: 

Development of a multi-spectral platform for integrated analysis of clinical and research samples. 

APPROVAL DATE: EXPIRY DATE OF THIS APPROVAL: PAA#: H08-01392-A003 

August 4, 2009 August 4, 2010 

CERTIFICATION: 

1. The membership of the UBC BCCA REB complies with the membership requirements for research ethics 

boards defined in Division 5 of the Food and Drug Regulations of Canada. 

2. The UBC BCCA REB carries out its functions in a manner fully consistent with Good Clinical Practices. 

3. The UBC BCCA REB has reviewed and approved the research project named on this Certificate of Approval 

including any associated consent form and taken the action noted above. This research project is to be 

conducted by the provincial investigator named above. This review and the associated minutes of the UBC 

BCCA REB have been documented electronically and in writing. 

The UBC BCCA Research Ethics Board has reviewed the documentation for the above named project. The research 

study as presented in documentation, was found to be acceptable on ethical grounds for research involving human 

subjects and was approved for renewal by the UBC BCCA REB. 

UBC BCCA Ethics Board Approval of the above has been verified by one of the following: 

Dr. George Browman, Chair 

Dr. Lynne Nakashima, Second Vice-Chair 

If you have any questions, please call: 

Bonnie Shields, Manager, BCCA Research Ethics Board: 604-877-6284 or e-mail: reb@bccancer.bc.ca 

Dr. George Browman, Chair: 604-877-6284 or e-mail: gbrowman@bccancer.bc.ca 

Dr. Lynne Nakashima, Second Vice-Chair: 604-707-5989 or e-mail: lnakas@bccancer.bc.ca 

https://rise.ubc.ca/rise/Doc/0/JKQ8088GG9RKN55VLAL9OHM869/fromString.html 

Page 1 of 1 

15/04/2010
