
© 2011 Nature America, Inc. All rights reserved.

Here we perform whole-exome sequencing of samples from 105 individuals with chronic lymphocytic leukemia (CLL) 1,2, the most frequent leukemia in adults in Western countries. We found 1,246 somatic mutations potentially affecting gene function and identified 78 genes with predicted functional alterations in more than one tumor sample. Among these genes, SF3B1, encoding a subunit of the spliceosomal U2 small nuclear ribonucleoprotein (snRNP), is somatically mutated in 9.7% of affected individuals. Further analysis in 279 individuals with CLL showed that SF3B1 mutations were associated with faster disease progression and poor overall survival. This work provides the first comprehensive catalog of somatic mutations in CLL with relevant clinical correlates and defines a large set of new genes that may drive the development of this common form of leukemia. The results reinforce the idea that targeting several well-known genetic pathways, including mRNA splicing, could be useful in the treatment of CLL and other malignancies.

Chronic lymphocytic leukemia is a common neoplasia of B lymphocytes in which these cells progressively accumulate in the bone marrow, blood and lymphoid tissues 1,2. The clinical evolution of the disease is heterogeneous, with the malignancy following an indolent course in some affected individuals and being characterized by aggressive disease and short survival times in others. These clinical differences have mainly been associated with two major molecular subtypes of the disease, which are defined by the extent to which somatic hypermutations (SHMs) occur in the variable regions of the immunoglobulin genes 2–4.

Nature Genetics ADVANCE ONLINE PUBLICATION

LETTERS

Exome sequencing identifies recurrent mutations of the splicing factor SF3B1 gene in chronic lymphocytic leukemia

Víctor Quesada 1, Laura Conde 2, Neus Villamor 2, Gonzalo R Ordóñez 1, Pedro Jares 2, Laia Bassaganyas 3, Andrew J Ramsay 1, Sílvia Beà 2, Magda Pinyol 4, Alejandra Martínez-Trillos 5, Mónica López-Guerra 2, Dolors Colomer 2, Alba Navarro 2, Tycho Baumann 5, Marta Aymerich 2, María Rozman 2, Julio Delgado 5, Eva Giné 5, Jesús M Hernández 6, Marcos González-Díaz 6, Diana A Puente 1, Gloria Velasco 1, José M P Freije 1, José M C Tubío 3, Romina Royo 7, Josep L Gelpí 7, Modesto Orozco 7, David G Pisano 8, Jorge Zamora 8, Miguel Vázquez 8, Alfonso Valencia 8, Heinz Himmelbauer 9, Mónica Bayés 10, Simon Heath 10, Marta Gut 10, Ivo Gut 10, Xavier Estivill 3, Armando López-Guillermo 5, Xose S Puente 1, Elías Campo 2,11 & Carlos López-Otín 1,11

We have previously used whole-genome sequencing to identify 46 somatic mutations potentially affecting gene function in four individuals with CLL 5. Subsequent analysis in additional subjects with CLL showed that four of the affected genes were recurrently mutated, thereby showing the usefulness of this approach for the identification of genes that may drive tumor progression.

To improve our understanding of the genes involved in the pathogenesis and clinical evolution of CLL, we have performed whole-exome sequencing of matched tumor and normal samples from 105 individuals with CLL (cases), comprising 60 subjects with mutated IGHV regions and 45 individuals in whom these regions are not mutated (Supplementary Tables 1–3 and Supplementary Note). We identified a median of 45 somatic mutations per case, which represents about 0.9 mutations per megabase of sequenced DNA, in agreement with previous reports 6–10 (Supplementary Table 4). To gain insight into the genes affected by the transformation process in CLL, we focused on the mutations predicted to result in protein-coding changes, identifying a total of 1,246 mutations after excluding those occurring in immunoglobulin loci (Fig. 1a and Supplementary Table 5). The number of protein-altering mutations per case was higher in IGHV-mutated CLL than in CLL without mutations in the IGHV region (12.8 ± 0.7 versus 10.6 ± 0.7; P = 0.038) (Fig. 1b). We also confirmed the previous observation that A>C/T>G transversions are more frequent than other transversion types in IGHV-mutated CLL 5 (Fig. 1c).
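As a quick arithmetic check on the figures above, a median of 45 mutations per case at about 0.9 mutations per megabase implies roughly 50 Mb of sequenced DNA per case. The sketch below makes that relationship explicit. The 50 Mb value is an assumption inferred from the two reported numbers, not a figure stated in the text, and the function name is illustrative.

```python
def mutations_per_megabase(n_mutations: int, sequenced_megabases: float) -> float:
    """Somatic mutation density: mutation count divided by megabases sequenced."""
    if sequenced_megabases <= 0:
        raise ValueError("sequenced_megabases must be positive")
    return n_mutations / sequenced_megabases


MEDIAN_MUTATIONS_PER_CASE = 45   # reported median per case
ASSUMED_SEQUENCED_MB = 50.0      # assumption: inferred as 45 / 0.9, not reported

rate = mutations_per_megabase(MEDIAN_MUTATIONS_PER_CASE, ASSUMED_SEQUENCED_MB)
print(f"{rate:.1f} mutations per Mb")
```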

1 Departamento de Bioquímica y Biología Molecular, Instituto Universitario de Oncología, Universidad de Oviedo, Oviedo, Spain. 2 Unidad de Hematopatología, Servicio de Anatomía Patológica, Hospital Clínic, Universitat de Barcelona, Institut d’Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain. 3 Genes and Disease Programme, Center for Genomic Regulation, Pompeu Fabra University (CRG-UPF), Barcelona, Spain. 4 Unidad de Genómica, IDIBAPS, Barcelona, Spain. 5 Servicio de Hematología, Hospital Clínic, Universidad de Barcelona, Barcelona, Spain. 6 Servicio de Hematología, Hospital Universitario, Centro de Investigación del Cáncer, Universidad de Salamanca, Salamanca, Spain. 7 Programa Conjunto de Biología Computacional, Barcelona Supercomputing Center (BSC), Institut de Reçerca Biomèdica (IRB), Spanish National Bioinformatics Institute, Universitat de Barcelona, Barcelona, Spain. 8 Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre (CNIO), Spanish National Bioinformatics Institute, Madrid, Spain. 9 Ultrasequencing Unit, CRG-UPF, Barcelona, Spain. 10 Centro Nacional de Análisis Genómico, Parc Científic de Barcelona, Barcelona, Spain. 11 These authors jointly directed this work. Correspondence should be addressed to C.L.-O. (clo@uniovi.es) or E.C. (ecampo@clinic.ub.es).

Received 12 July; accepted 10 November; published online 11 December 2011; doi:10.1038/ng.1032

The distribution of somatic mutations identified by whole-exome sequencing indicated that a total of 1,100 different genes were affected in the 105 cases (Supplementary Table 5). We found 60 genes with a somatic mutation rate higher than expected (P < 0.05), which we defined as class 1 recurrently mutated (RM)-CLL genes. Furthermore, we identified 18 class 2 RM-CLL genes, which we defined as those recurrently mutated and present in at least one case with no mutated class 1 RM-CLL genes, with TP53 mutation representing a special case (Supplementary Table 6). Of note, about 90% of the cases in the study had somatic mutations in at least one of the 78 RM-CLL genes, suggesting that we have identified many previously unrecognized candidate driver genes in CLL. Moreover, an analysis of the somatic structural variants in ten CLL cases showed that some of the RM-CLL genes may be affected by large genomic rearrangements (Supplementary Table 7).
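The class 2 definition above (recurrently mutated, and present in at least one case that carries no mutated class 1 gene) can be sketched as a small set computation. This is a minimal illustration under stated assumptions: class 1 membership is taken as given, since it depends on a statistical test not described here; "recurrently mutated" is read as mutated in at least two cases; the TP53 special case is not modeled; and the case and gene labels are hypothetical placeholders, not data from the study.

```python
from collections import defaultdict


def class2_genes(case_to_genes, class1):
    """Genes mutated in >= 2 cases, at least one of which has no class 1 gene mutated."""
    cases_by_gene = defaultdict(set)
    for case, genes in case_to_genes.items():
        for gene in genes:
            cases_by_gene[gene].add(case)

    result = set()
    for gene, cases in cases_by_gene.items():
        if gene in class1 or len(cases) < 2:
            continue  # already class 1, or not recurrent
        # Keep the gene if some carrier case has no class 1 gene mutated.
        if any(not (case_to_genes[case] & class1) for case in cases):
            result.add(gene)
    return result


def fraction_covered(case_to_genes, rm_genes):
    """Fraction of cases with a mutation in at least one gene of rm_genes."""
    covered = sum(1 for genes in case_to_genes.values() if genes & rm_genes)
    return covered / len(case_to_genes)


# Hypothetical toy input: five cases, four placeholder genes.
toy_cases = {
    "case1": {"G1", "G3"},
    "case2": {"G2"},
    "case3": {"G2", "G3"},
    "case4": {"G4"},
    "case5": set(),
}
toy_class1 = {"G1"}

toy_class2 = class2_genes(toy_cases, toy_class1)
print(sorted(toy_class2), fraction_covered(toy_cases, toy_class1 | toy_class2))
```

The same `fraction_covered` helper corresponds to the "about 90% of the cases" statement: it counts cases hit by at least one gene in the combined class 1 and class 2 set.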

A functional clustering analysis of the mutated genes showed a substantial enrichment of genes in pathways involved in mRNA splicing and transport, Toll-like receptor (TLR) signaling and apoptosis, among others (Supplementary Table 8). The functional and clinical impact of some of these pathways, including MYD88-associated TLR, NF-κB and JAK-STAT3 signaling, has already been demonstrated in our previous whole-genome sequencing study of CLL 5. The pathway analysis of the mutated genes also identified notable differences between the two major CLL subtypes. Mutations in genes of the TLR and pattern recognition pathways were enriched in IGHV-mutated CLL (Supplementary Fig. 1 and Supplementary Tables 9 and 10), supporting the idea that different molecular mechanisms might be implicated in the development of the two major subtypes of CLL.

Other genes recurrently mutated and distinct from those previously identified in our whole-genome sequencing–based study of individuals with CLL 5 were those encoding SF3B1, a subunit of the spliceosomal U2 snRNP 11; POT1, a nuclear protein involved in telomere maintenance 12; CHD2, which regulates gene expression by modification of chromatin structure 13; and LRP1B, which has recently been defined as a tumor suppressor in different malignancies 14 (Table 1). All POT1 somatic mutations appeared in tumors without IGHV region mutations, whereas CHD2 somatic mutations exclusively appeared in IGHV-mutated tumors. This finding is consistent with different mechanisms being involved in disease development in CLL cases with and without IGHV mutations. Because the high GC nucleotide content of NOTCH1 hampers sequencing using exome-capture approaches, we analyzed the coding exons of this gene by Sanger sequencing of DNA from 260 individuals with CLL 5. We found somatic mutations in 25 cases in this series (9.6%; Table 1).

Figure 1 Somatic mutation profiles of 105 CLL exomes. (a) Chromosomal distribution and location of protein-coding mutations (dots) and insertions and deletions (indels; Xs) identified by exome sequencing of 60 CLL samples with IGHV region mutations (blue) and 45 without IGHV region mutations (red). RM-CLL genes are highlighted with vertical bars and summarized for each individual with orange dots. (b) Box plot representation comparing the frequency of coding, nonsynonymous somatic mutations in CLL samples with and without IGHV region mutations. Error bars, range of the data set (*P = 0.038). (c) Frequency of substitutions in the 105 CLL samples for the six possible mutation classes (A>C/T>G, A>G/T>C, A>T/T>A, C>A/G>T, C>G/G>C and C>T/G>A). Error bars, s.d. (***P = 0.0017).
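The six mutation classes in panel c pair each substitution with its reverse-complement equivalent, because a change read on one DNA strand is the same event as the complementary change on the other. A minimal sketch of that grouping follows. The function name and the choice of A or C as the canonical reference base are illustrative assumptions, not details taken from the paper.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
# Purine<->purine and pyrimidine<->pyrimidine changes, in canonical form.
TRANSITIONS = {"A>G", "C>T"}


def substitution_class(ref: str, alt: str):
    """Map a single-base substitution to one of six strand-symmetric classes."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a valid substitution: {ref}>{alt}")
    # Use the strand on which the reference base is A or C.
    if ref in "GT":
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    canonical = f"{ref}>{alt}"
    label = f"{canonical}/{COMPLEMENT[ref]}>{COMPLEMENT[alt]}"
    kind = "transition" if canonical in TRANSITIONS else "transversion"
    return label, kind


all_classes = {
    substitution_class(r, a)[0] for r in "ACGT" for a in "ACGT" if r != a
}
print(sorted(all_classes))
```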


Table 1 Most common recurrently mutated genes in CLL

Gene | Case ID | Exon | Encoded alteration | Total frequency | Freq. in IGHV-mutated cases | Freq. in IGHV-unmutated cases
NOTCH1 | 415 | 34 | p.Phe2482Phefs*2 | 12.1% (31/255) | 2.8% (2/72) | 10.1% (7/69)
 | 457 | 34 | p.Gln2503* | | |
 | 12, 15, 19, 24, 48, 69, 85, 89, 92, 96, 102, 112, 138, 163, 184, 209, 410, 431, 436, 447, 448, 467, 507, 511, 520, 527, 531, 537, 722 | 34 | p.Pro2515Argfs*4 | | |
SF3B1 | 247 | 14 | p.Tyr623Cys | 9.7% (27/279) | 7.9% (6/76) | 20.5% (15/73)
 | 119 | 14 | p.Arg625His | | |
 | 6, 182 | 14 | p.Asn626Tyr | | |
 | 154 | 14 | p.His662Asp | | |
 | 53, 651 | 14 | p.Thr663Ile | | |
 | 102, 116 | 14 | p.Lys666Glu | | |
 | 9, 29, 156, 209, 365, 632, 734, 755, 758 | 15 | p.Lys700Glu | | |
 | 19 | 15 | p.Val701Phe | | |
 | 85 | 16 | p.Lys741Asn | | |
 | 68, 79, 99, 197, 604, 742 | 16 | p.Gly742Asp | | |
 | 83 | 18 | p.Asp894Gly | | |
POT1 | 44 | 4_5 | Splice site | 4.8% (5/105) | 0% (0/60) | 11.1% (5/45)
 | 184 | 5 | p.Met1Leu | | |
 | 13 | 6 | p.Tyr36Asn | | |
 | 157 | 7 | p.Tyr66* | | |
 | 6 | 9 | p.Tyr223Cys | | |
CHD2 | 185 | 16 | p.His620Leu | 4.8% (5/105) | 8.3% (5/60) | 0% (0/45)
 | 175 | 17 | p.Leu698Leufs*6 | | |
 | 181 | 27 | p.Phe1146Leu | | |
 | 43 | 30 | p.Leu1270Phe | | |
 | 7 | 35_36 | Splice site | | |
LRP1B | 5 | 41 | p.Cys2205Phe | 4.8% (5/105) | 5.0% (3/60) | 4.4% (2/45)
 | 322 | 46 | p.Cys2567Ser | | |
 | 326 | 59 | p.Val3150Ile | | |
 | 274 | 62 | p.Ser3352Pro | | |
 | 276 | 86 | p.Tyr4436Phe | | |

Notably, among the class 1 RM-CLL genes, we detected single mutations in BRAF (encoding p.Asp594Gly or p.Lys601Glu alterations) in two cases. These mutations differ from that described to encode the p.Val600Glu variant found in hairy-cell leukemia 15, but have been observed in diffuse large B-cell lymphoma (Catalogue of Somatic Mutations in Cancer (COSMIC) database; see URLs). A recent study of kinase genes in CLL also found BRAF mutations in 1.6% of cases, including two individuals with a p.Lys601Glu alteration 16. Notably, only one of our CLL cases carried a TP53 mutation, which was associated with a 17p deletion. The low frequency of mutations we observed in this tumor suppressor gene is probably due to the low-risk CLL characteristics of our subjects, as our cohort comprises a nonselected, population-based series (Supplementary Table 1), which carries a lower frequency of high-risk factors than cases in referral or clinical trial studies 17.

Figure 2 Structural impact of SF3B1 alterations. (a) Protein sequence alignments of the SF3B1 C-terminal domain around the altered residues (arrows) in evolutionarily diverse species. (b) Schematic representation of the human SF3B1 protein with the primary structural domains highlighted. The locations of the different somatic alterations determined to be encoded in CLL samples (top) and the frequencies of each alteration (bottom) are shown. (c) Molecular model of the C-terminal portion of the human SF3B1 protein and detailed view of the altered amino acids identified in CLL cases.

[Figure 2 panel labels. (a) Species aligned: human, fly, worm, Aspergillus, rice; altered positions 623, 625, 626, 662, 663, 666, 700, 701, 741, 742 and 894. (b) Regions: U2AF2 interaction, SF3B14 interaction, HEAT domain; alterations Y623C, R625H, N626Y, H662D, T663I, K666E, K700E, V701F, K741N, G742D and D894G.]

Among the genes newly found to be recurrently mutated in this study, we focused on SF3B1, which had somatic point mutations in ~10% of the CLL cases. Moreover, two somatic mutations were present in more than one case: a mutation encoding a p.Lys700Glu substitution in four cases and a mutation encoding a p.Asn626Tyr alteration in two other cases. We confirmed and extended these results by capillary sequencing of 279 samples from individuals with CLL. We found somatic mutations in 27 of these subjects (9.7%), including three additional recurrently mutated residues causing p.Thr663Ile, p.Lys666Glu and p.Gly742Asp alterations (Fig. 2 and Supplementary Table 11). These findings make SF3B1 one of the most frequently mutated genes reported to date in CLL, along with NOTCH1 (refs. 5,18).
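The frequencies quoted here and in Table 1 are simple proportions, and the difference between IGHV subgroups can be examined with a 2×2 test. The sketch below recomputes the SF3B1 percentages from the Table 1 counts and applies a two-sided Fisher's exact test to the subgroup counts (6/76 versus 15/73). The choice of Fisher's exact test is an assumption for illustration; this section does not state which test the authors used, so the resulting P value should not be read as the paper's.

```python
from math import comb


def percent(k: int, n: int) -> float:
    """Percentage k/n rounded to one decimal place."""
    return round(100 * k / n, 1)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)  # tolerance for floating-point ties
    )
    return min(1.0, p_value)


# Counts from Table 1 (SF3B1): mutated / total in each group.
overall = percent(27, 279)
ighv_mutated = percent(6, 76)
ighv_unmutated = percent(15, 73)

# Rows: IGHV-mutated, IGHV-unmutated; columns: SF3B1 mutated, SF3B1 wild type.
p = fisher_exact_two_sided(6, 76 - 6, 15, 73 - 15)
print(overall, ighv_mutated, ighv_unmutated, p)
```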

SF3B1 encodes a protein involved in the binding of the spliceosomal U2 snRNP to the branch point close to the 3′ splicing sites 19,20. The SF3B1 protein interacts with RNA sequences in the vicinity of the branch point, as well as with the early 3′-splice-site recognition factor U2AF65 and the branch point–binding protein SF3B14. Consistent with such essential roles in gene expression, the amino acid sequence of SF3B1 shows a high level of phylogenetic conservation, especially in the regions altered by the CLL somatic mutations identified in this study (Fig. 2a and Supplementary Fig. 2).

Structurally, the SF3B1 protein has two well-defined regions: the N-terminal hydrophilic region, containing several protein-binding motifs, and the C-terminal region, which consists of 22 nonidentical HEAT repeats and is where all somatic alterations identified in CLL cases are located (Fig. 2b and Supplementary Fig. 2). To investigate the impact of the encoded somatic alterations, we first constructed a model of the C-terminal domain of the SF3B1 protein. In this model, most of the alterations occur on the inner surface of the structure and might define a binding interface, consistent with the idea that the helical repeats form the outer shell of the SF3B complex 21 (Fig. 2c). The p.Tyr623Cys, p.Arg625His, p.Asn626Tyr, p.His662Asp, p.Thr663Ile, p.Lys666Glu, p.Lys700Glu and p.Val701Phe alterations affect residues that are predicted to be spatially close to one another and therefore might have a similar functional impact. In contrast, the p.Lys741Asn and p.Gly742Asp substitutions affect a predicted external loop, and p.Asp894Gly affects a different area of the domain, such that these alterations might not result in the same functional consequences.
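The alterations in the text use three-letter protein notation (for example p.Lys700Glu), whereas the Figure 2 labels use the one-letter form (K700E). A minimal sketch of the conversion for simple amino acid substitutions follows. It deliberately handles only that pattern: frameshifts, stop codons (such as p.Tyr66*) and splice-site changes are rejected. The function name is illustrative.

```python
import re

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# p.<Ref><position><Alt>, e.g. p.Lys700Glu
_SUBSTITUTION = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def to_one_letter(alteration: str) -> str:
    """Convert a simple substitution such as 'p.Lys700Glu' to 'K700E'."""
    match = _SUBSTITUTION.match(alteration)
    if match is None:
        raise ValueError(f"not a simple substitution: {alteration!r}")
    ref, position, alt = match.groups()
    try:
        return f"{THREE_TO_ONE[ref]}{position}{THREE_TO_ONE[alt]}"
    except KeyError as exc:
        raise ValueError(f"unknown amino acid code in {alteration!r}") from exc


for name in ("p.Tyr623Cys", "p.Lys700Glu", "p.Asp894Gly"):
    print(name, "->", to_one_letter(name))
```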

Although splicing is a pleiotropic mechanism necessary for cell function, specific alterations in the splicing of oncogenes and tumor suppressors have been related to cancer development 22. Additionally, splicing factors have been reported to have oncogenic roles in the malignant transformation of cells 23,24. Consistent with these observations, the COSMIC database shows SF3B1 mutations in several additional tumor types, including breast carcinomas (encoding p.Gln534Pro), pancreatic cancer (p.Arg568His, p.Gln699His and p.Lys700Glu) and melanoma (p.Pro718Leu). This pattern of mutations, together with the findings that some CLL-recurrent alterations, such as p.Lys700Glu, are also detected in solid tumors and that SF3B1 is the target of antitumor drugs 25, reinforces the importance of SF3B1 in cancer development. In this work, we have also identified somatic mutations in other genes involved in the splicing machinery, such as SFRS1 (encoding p.Tyr82* and p.Gly4Glyfs*2), SFRS7 (p.Leu18Gln) and U2AF2 (p.Gln143Leu and p.Gln190Leu), indicating that alterations in this post-transcriptional mechanism may be very relevant in CLL pathogenesis.

During the revision of this manuscript, three studies were published that described associations of somatic mutations affecting SF3B1 and other components of the splicing machinery with myelodysplastic syndromes 26–28, which reinforces the notion that aberrant splicing may underlie the development of certain neoplasias. According to these reports, the only known mutations expected to affect splicing machinery that are present in both myelodysplasia and CLL are located in SF3B1. Two other genes, U2AF2 and ZRSR2, are also mutated in both types of hematological neoplasias, albeit at different positions in each cancer type. To determine whether SF3B1 could also be mutated in other lymphoid neoplasias, we sequenced exons 14, 15, 16 and 18 in 156 non-Hodgkin’s lymphomas. No mutations were found in any of these tumors, suggesting that SF3B1 mutations in lymphoid neoplasias may be specific for CLL.

Figure 3 Novel alternative splicing of FOXP1 in CLL cases. (a) An expanded view of the protein interval subject to truncation as a result of alternative splicing is shown. The alternative splicing event that generates the novel transcript encoding this protein, FOXP1w, is shown as a red line, and the primers used for RT-PCR amplification of FOXP1w and full-length FOXP1 (control) as arrows. Q-rich, glutamine-rich region; C2H2-Zf, Cys2His2 zinc finger. (b) Quantitative RT-PCR analysis of truncated FOXP1w levels in CLL samples with and without SF3B1 somatic mutations. Error bars, s.d.

[Figure 3 panel labels. (a) Tracks: coverage, exons (18, 19, 19b, 20); domains: Q-rich, C2H2-Zf, Forkhead, PEST. (b) y axis: log10 relative quantity of FOXP1w and control; cases 29, 156 and 182 (mutated SF3B1) and 32 and 152 (wild-type SF3B1).]

Figure 4 Clinical analysis of SF3B1 in CLL. (a) Distribution of disease stage (Binet), IGHV region mutational status and ZAP-70 expression in individuals with (MUT) or without (WT) mutations in SF3B1 (***P = 0.004, *P = 0.03). (b) Actuarial probability of disease progression of CLL cases with mutated or wild-type SF3B1 (P = 0.0001). (c) Actuarial probability of overall survival of CLL cases with mutated or wild-type SF3B1 (P = 0.002).

[Figure 4 axes. (a) y axis: percentage of cases; groups: Binet A, B and C, unmutated IGHV, high ZAP-70. (b) y axis: probability of progression; x axis: years (0 to 25). (c) y axis: probability of survival; x axis: years (0 to 25).]

Comparative analysis of cases in which SF3B1 is mutated (N = 8) and not mutated (N = 12) using exon arrays uncovered a set of 184 genes with exons showing differential inclusion levels between these two subgroups (false discovery rate (FDR) < 0.005) (Supplementary Table 12). We analyzed the results of high-throughput sequencing of total RNA isolated from the tumor cells of 15 cases, including 4 with mutations in SF3B1. We searched for those splicing junctions that were differentially expressed between tumors with or without SF3B1 mutations, while maintaining similar overall gene expression levels (Supplementary Table 13). All the differential splicing junctions identified with this method contained a previously described 5′ donor site and a new, abnormal 3′ acceptor site. Because SF3B1 ensures the fidelity of the 3′ branch point 20, activation of cryptic 3′ splice sites is the expected effect of altering SF3B1 function. The low number of candidate splicing targets obtained with this method suggests that the SF3B1 somatic mutations found in CLL do not impair the general function of the protein but rather alter its function in some specific instances.
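The pattern described above, a previously described 5′ donor paired with a new 3′ acceptor, can be expressed as a lookup of each junction end against annotated splice sites. The sketch below is illustrative only: the `Junction` type, the annotation sets and the coordinates are hypothetical, and it does not reproduce the authors' RNA sequencing pipeline, which is not described in this section.

```python
from typing import NamedTuple, Set, Tuple

Site = Tuple[str, int]  # (chromosome, position)


class Junction(NamedTuple):
    chrom: str
    donor: int     # 5' splice site position
    acceptor: int  # 3' splice site position


def classify_junction(j: Junction, known_donors: Set[Site],
                      known_acceptors: Set[Site]) -> str:
    """Label a junction by whether each end matches an annotated splice site."""
    donor_known = (j.chrom, j.donor) in known_donors
    acceptor_known = (j.chrom, j.acceptor) in known_acceptors
    if donor_known and acceptor_known:
        return "both sites annotated"
    if donor_known:
        return "known donor, novel acceptor"  # candidate cryptic 3' site
    if acceptor_known:
        return "novel donor, known acceptor"
    return "both sites novel"


# Hypothetical annotation and junctions (made-up coordinates).
donors = {("chr3", 1000), ("chr3", 5000)}
acceptors = {("chr3", 2000), ("chr3", 6000)}

canonical = Junction("chr3", 1000, 2000)
cryptic = Junction("chr3", 1000, 1850)
print(classify_junction(canonical, donors, acceptors))
print(classify_junction(cryptic, donors, acceptors))
```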

We confirmed the enhanced expression of truncated mRNAs from candidate splicing target genes in SF3B1-mutated cases with quantitative PCR analysis (Fig. 3 and Supplementary Fig. 3). The new forms include truncated versions of SLC23A2, encoding a vitamin C transporter 29, and TCIRG1, one of whose gene products is a T-cell immune regulator 30. In addition, one of the novel splicing sites affects FOXP1, encoding a forkhead transcription factor whose altered expression has been linked to diffuse large B-cell lymphoma 31. The predicted product of the newly identified FOXP1 transcript, which we call FOXP1w, encodes a protein truncated at its C terminus that lacks two putative PEST sequences involved in protein degradation (Fig. 3a). A quantitative RT-PCR analysis showed that the expression of FOXP1w was three times higher in SF3B1-mutated CLL cases than in cases without mutation in SF3B1 (Fig. 3b).
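Figure 3b reports expression on a log10 relative-quantity scale, so a fold difference between groups corresponds to a difference of group means on that scale. The sketch below shows that conversion. It is an assumption about how such a ratio could be derived, not a description of the authors' calculation, and the input values are hypothetical placeholders rather than measurements from the study.

```python
from statistics import mean


def fold_change_from_log10(group_a, group_b) -> float:
    """Ratio of geometric means of two groups given log10 relative quantities."""
    if not group_a or not group_b:
        raise ValueError("both groups need at least one value")
    return 10 ** (mean(group_a) - mean(group_b))


# Hypothetical log10 relative quantities, for illustration only.
mutated_log10 = [2.5, 2.4, 2.6]
wildtype_log10 = [2.0, 2.05, 1.95]

fold = fold_change_from_log10(mutated_log10, wildtype_log10)
print(f"fold change: {fold:.2f}")
```

On this scale a threefold difference corresponds to a gap of log10(3), about 0.48, between the group means.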

Finally, a clinical analysis showed that individuals with SF3B1 somatic mutations more frequently present with advanced disease at diagnosis and with adverse biological features, such as elevated serum β2-microglobulin and IGHV loci without mutations, than individuals not carrying these mutations (Fig. 4 and Supplementary Table 14). Furthermore, individuals with SF3B1 somatic mutations had significantly shorter time to disease progression (P = 0.0001) and lower 10-year overall survival rates (P = 0.002). Cox analyses suggested that the mutational status of SF3B1 has a prognostic value independent of clinical stage or ZAP-70 or CD38 expression (Supplementary Table 15). However, when the mutational status of SF3B1 was compared with that of the IGHV region, only the latter retained prognostic significance. Collectively, these data suggest that SF3B1 mutations are associated with aggressive forms of CLL.

In summary, we have performed exome analysis of 105 CLL cases in the context of the International Cancer Genome Consortium initiative<sup>32</sup> and have found more than 70 new genes that are recurrently mutated in CLL, thus illustrating the heterogeneity of the disease and reinforcing the relevance of genome-wide studies to identify the mutated genes causally involved in each individual with cancer. The high frequency of mutations in some specific genes, such as SF3B1, sets the stage for further studies. In addition, the fact that a substantial fraction of the identified genes participate in well-known genetic pathways, including mRNA splicing, suggests that treatment strategies targeting common pathogenic mechanisms could be used for CLL and other malignancies. New knowledge provided by these large-scale efforts may eventually result in the identification of new therapies for this frequent type of human leukemia.

URLs. COSMIC database, http://www.sanger.ac.uk/genetics/CGP/cosmic/; European Genome-phenome Archive (EGA), http://www.ebi.ac.uk/ega/; Ensembl, http://asia.ensembl.org/index.html; SAMtools, http://samtools.sourceforge.net/SAM1.pdf; Picard, http://picard.sourceforge.net/index.shtml/; HHpred, http://toolkit.tuebingen.mpg.de/hhpred/; Jpred, http://www.compbio.dundee.ac.uk/www-jpred/.

METHODS

Methods and any associated references are available in the online version of the paper at http://www.nature.com/naturegenetics/.

Accession numbers. Sequencing, expression and genotyping array data have been deposited at the European Genome-Phenome Archive, which is hosted at the European Bioinformatics Institute (EGAS00000000092).

Note: Supplementary information is available on the Nature Genetics website.

ACKNOWLEDGMENTS

We are grateful to P. Klatt for continuous support; E. Montserrat, J. Valcárcel, P. Nicolás and C. Romeo-Casabona for helpful comments; S. Guijarro, S. Martín, C. Capdevila, M. Sánchez and L. Plá for excellent technical assistance; and N. Villahoz and C. Muro for excellent work in the coordination of the CLL Spanish Consortium. We are also very grateful to all the individuals with CLL who participated in this study. This work was funded by the Spanish Ministry of Science and Innovation (MICINN) through the Instituto de Salud Carlos III (ISCIII) and Red Temática de Investigación del Cáncer (RTICC) del ISCIII. C.L.-O. is an Investigator of the Botín Foundation.

AUTHOR CONTRIBUTIONS

V.Q., G.R.O., A.J.R., G.V., J.M.P.F. and X.S.P. developed the bioinformatic algorithms and performed the analysis of sequence data. L.C., P.J., M.P., M.L.-G., D.C. and A.N. were responsible for downstream validation analysis and functional studies. L.B., S.B. and J.M.C.T. studied structural variants. D.A.P., H.H., M.B., S.H. and M.G. were responsible for generating libraries, performing exome capture and running sequencers. M.A. prepared and supervised the bioethics requirements. N.V., A.M.-T., T.B., J.D., E.G., A.L.-G. and E.C. performed clinical and biological studies. M.R., M.G.-D., N.V. and J.M.H. reviewed the pathologic data and confirmed the diagnosis. R.R., J.L.G., M.O., D.G.P., J.Z., M.V. and A.V. were in charge of bioinformatics data management. I.G. coordinated the sequencing efforts and performed primary data analysis. V.Q., X.S.P., X.E., A.L.-G., E.C. and C.L.-O. directed the research and wrote the manuscript, which all authors have approved.

COMPETING FINANCIAL INTERESTS

The authors declare no competing financial interests.

Published online at http://www.nature.com/naturegenetics/.

Reprints and permissions information is available online at http://www.nature.com/reprints/index.html.

1. Rozman, C. & Montserrat, E. Chronic lymphocytic leukemia. N. Engl. J. Med. 333, 1052–1057 (1995).
2. Zenz, T., Mertens, D., Kuppers, R., Dohner, H. & Stilgenbauer, S. From pathogenesis to treatment of chronic lymphocytic leukaemia. Nat. Rev. Cancer 10, 37–50 (2010).
3. Damle, R.N. et al. Ig V gene mutation status and CD38 expression as novel prognostic indicators in chronic lymphocytic leukemia. Blood 94, 1840–1847 (1999).


4. Hamblin, T.J., Davis, Z., Gardiner, A., Oscier, D.G. & Stevenson, F.K. Unmutated Ig V<sub>H</sub> genes are associated with a more aggressive form of chronic lymphocytic leukemia. Blood 94, 1848–1854 (1999).
5. Puente, X.S. et al. Whole-genome sequencing identifies recurrent mutations in chronic lymphocytic leukaemia. Nature 475, 101–105 (2011).
6. Greenman, C. et al. Patterns of somatic mutation in human cancer genomes. Nature 446, 153–158 (2007).
7. Pleasance, E.D. et al. A comprehensive catalogue of somatic mutations from a human cancer genome. Nature 463, 191–196 (2010).
8. Pleasance, E.D. et al. A small-cell lung cancer genome with complex signatures of tobacco exposure. Nature 463, 184–190 (2010).
9. Greif, P.A. et al. Identification of recurring tumor-specific somatic mutations in acute myeloid leukemia by transcriptome sequencing. Leukemia 25, 821–827 (2011).
10. Chapman, M.A. et al. Initial genome sequencing and analysis of multiple myeloma. Nature 471, 467–472 (2011).
11. Wahl, M.C., Will, C.L. & Luhrmann, R. The spliceosome: design principles of a dynamic RNP machine. Cell 136, 701–718 (2009).
12. Wang, H., Nora, G.J., Ghodke, H. & Opresko, P.L. Single molecule studies of physiologically relevant telomeric tails reveal POT1 mechanism for promoting G-quadruplex unfolding. J. Biol. Chem. 286, 7479–7489 (2011).
13. Marfella, C.G. & Imbalzano, A.N. The Chd family of chromatin remodelers. Mutat. Res. 618, 30–40 (2007).
14. Prazeres, H. et al. Chromosomal, epigenetic and microRNA-mediated inactivation of LRP1B, a modulator of the extracellular environment of thyroid cancer cells. Oncogene 30, 1302–1317 (2011).
15. Tiacci, E. et al. BRAF mutations in hairy-cell leukemia. N. Engl. J. Med. 364, 2305–2315 (2011).
16. Zhang, X. et al. Sequence analysis of 515 kinase genes in chronic lymphocytic leukemia. Leukemia published online (24 June 2011), doi:10.1038/leu.2011.163.
17. Zainuddin, N. et al. TP53 mutations are infrequent in newly diagnosed chronic lymphocytic leukemia. Leuk. Res. 35, 272–274 (2011).
18. Fabbri, G. et al. Analysis of the chronic lymphocytic leukemia coding genome: role of NOTCH1 mutational activation. J. Exp. Med. 208, 1389–1401 (2011).
19. Folco, E.G., Coil, K.E. & Reed, R. The anti-tumor drug E7107 reveals an essential role for SF3b in remodeling U2 snRNP to expose the branch point-binding region. Genes Dev. 25, 440–444 (2011).
20. Corrionero, A., Minana, B. & Valcarcel, J. Reduced fidelity of branch point recognition and alternative splicing induced by the anti-tumor drug spliceostatin A. Genes Dev. 25, 445–459 (2011).
21. Golas, M.M., Sander, B., Will, C.L., Luhrmann, R. & Stark, H. Molecular architecture of the multiprotein splicing factor SF3b. Science 300, 980–984 (2003).
22. David, C.J. & Manley, J.L. Alternative pre-mRNA splicing regulation in cancer: pathways and programs unhinged. Genes Dev. 24, 2343–2364 (2010).
23. Karni, R. et al. The gene encoding the splicing factor SF2/ASF is a proto-oncogene. Nat. Struct. Mol. Biol. 14, 185–193 (2007).
24. Golan-Gerstl, R. et al. Splicing factor hnRNP A2/B1 regulates tumor suppressor gene splicing and is an oncogenic driver in glioblastoma. Cancer Res. 71, 4464–4472 (2011).
25. Kaida, D. et al. Spliceostatin A targets SF3b and inhibits both splicing and nuclear retention of pre-mRNA. Nat. Chem. Biol. 3, 576–583 (2007).
26. Yoshida, K. et al. Frequent pathway mutations of splicing machinery in myelodysplasia. Nature 478, 64–69 (2011).
27. Papaemmanuil, E. et al. Somatic SF3B1 mutation in myelodysplasia with ring sideroblasts. N. Engl. J. Med. 365, 1384–1395 (2011).
28. Visconte, V. et al. SF3B1, a splicing factor is frequently mutated in refractory anemia with ring sideroblasts. Leukemia published online (2 September 2011), doi:10.1038/leu.2011.232.
29. Chen, A.A. et al. Genetic variation in the vitamin C transporter, SLC23A2, modifies the risk of HPV16-associated head and neck cancer. Carcinogenesis 30, 977–981 (2009).
30. Bulwin, G.C. et al. TIRC7 inhibits T cell proliferation by modulation of CTLA-4 expression. J. Immunol. 177, 6833–6841 (2006).
31. Brown, P.J. et al. Potentially oncogenic B-cell activation–induced smaller isoforms of FOXP1 are highly expressed in the activated B cell–like subtype of DLBCL. Blood 111, 2816–2824 (2008).
32. Hudson, T.J. et al. International network of cancer genome projects. Nature 464, 993–998 (2010).


ONLINE METHODS

Samples. The current studies were approved by the institutional review board (IRB) of Hospital Clinic (Barcelona, Spain). All subjects in the initial screening gave informed consent for their participation according to the International Cancer Genome Consortium (ICGC) guidelines, and the subjects in the mutational screening and clinical validation analysis agreed to IRB-approved informed consent for genetic studies. Detailed information about the collection and processing of samples is provided in the Supplementary Note.

Exome enrichment. Genomic DNA (3 µg) from each sample was sheared and used for the construction of a paired-end sequencing library as described in the protocol provided by Illumina<sup>33</sup>. Enrichment of exonic sequences was then performed for each library using the SureSelect Human All Exon 50Mb kit (Agilent) following the manufacturer's instructions. Exon-enriched DNA was precipitated with magnetic beads coated with streptavidin (Invitrogen) and was washed and eluted. An additional 18 cycles of amplification were then performed on the captured library. Exon enrichment was validated by real-time PCR in a 7300 Real-Time PCR System (Applied Biosystems) using a set of two pairs of primers to amplify exons and one pair to amplify an intron. Enriched libraries were sequenced in one lane of an Illumina Genome Analyzer IIx sequencer, using the standard protocol.

RNA sequencing. RNA was assessed for quality and was quantified using an RNA 6000 Nano LabChip kit on a 2100 Bioanalyzer (both from Agilent). The RNA-Seq libraries were prepared according to the standard Illumina protocol and the mRNA-Seq Sample Prep and Paired-End Sample Prep kits. cDNA libraries were checked for quality and quantified using the DNA-1000 kit (Agilent) on a 2100 Bioanalyzer. Each library was sequenced with the Illumina Sequencing Kit v4 on one lane of a Genome Analyzer IIx sequencer to obtain 76-bp paired-end reads.

Read mapping and processing. For RNA-Seq, reads were aligned to the human reference genome (GRCh37) with TopHat<sup>34</sup>. For each gene in the Human Ensembl genes v60, splicing junctions were determined using the CIGAR string of the mapped reads (SAMtools, see URLs). For SF3B1-mutated samples, a visual inspection with tview from SAMtools confirmed that the mutant alleles were expressed at the same levels as the wild-type alleles. For exome sequencing, reads from each library were mapped to the human reference genome (GRCh37) using the Burrows-Wheeler Aligner (BWA)<sup>35</sup> with the sampe option, and a BAM file was generated using SAMtools<sup>36</sup>. Reads from the same paired-end libraries were merged, and optical or PCR duplicates were removed using Picard. Statistics for the number of mapped reads and depth of coverage for each sample are shown in Supplementary Table 3. For the identification of somatic substitutions, we used the Sidrón algorithm, which has previously been described<sup>5</sup>. Because of the robust performance of this algorithm, a single probability table was used for all the experiments, and we added new cut-off values for the S parameter (30 for coverage higher than 50 and 50 for coverage higher than 100). The validation rate of the somatic mutations detected by Sidrón was higher than 90% as assessed by Sanger sequencing.
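As an illustration of how splicing junctions can be read from CIGAR strings, the following is a minimal Python sketch and not the pipeline used in this study. It relies only on the SAM convention that an `N` operation marks a region skipped on the reference (an intron in RNA-Seq alignments) and that the operations `M`, `D`, `N`, `=` and `X` advance the reference coordinate. The function names `junctions_from_cigar` and `count_junctions` and the example alignments are hypothetical.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation code (SAM format).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that advance the position on the reference sequence.
CONSUMES_REFERENCE = {"M", "D", "N", "=", "X"}


def junctions_from_cigar(pos, cigar):
    """Return the introns implied by one alignment.

    pos   -- 1-based leftmost reference coordinate (SAM POS field)
    cigar -- CIGAR string, e.g. "50M200N26M"

    Each 'N' operation yields one (first_intron_base, last_intron_base)
    tuple in 1-based inclusive reference coordinates.
    """
    junctions = []
    ref = pos
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            junctions.append((ref, ref + length - 1))
        if op in CONSUMES_REFERENCE:
            ref += length
    return junctions


def count_junctions(alignments):
    """Count the reads supporting each junction.

    alignments -- iterable of (chromosome, pos, cigar) tuples
    """
    counts = Counter()
    for chrom, pos, cigar in alignments:
        for start, end in junctions_from_cigar(pos, cigar):
            counts[(chrom, start, end)] += 1
    return counts


# Made-up alignments for illustration only (not data from this study).
example = [
    ("chr3", 100, "50M200N26M"),
    ("chr3", 120, "30M200N46M"),
    ("chr3", 100, "76M"),
]
support = count_junctions(example)
```

Comparing such per-junction read counts between groups of samples, and checking each junction against annotated exon boundaries, is one way to flag candidate new acceptor sites. Any thresholds for doing so would be additional assumptions not stated in the text.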

Gene and exon expression analysis. Total RNA was extracted with TRIzol reagent (Invitrogen) following the recommendations of the manufacturer. RNA integrity was examined with the Agilent 2100 Bioanalyzer, and high-quality RNA samples were hybridized to Affymetrix GeneChip Human Genome U133 plus 2.0 arrays and Human Exon 1.0 ST arrays according to Affymetrix standard protocols. Scanned images for each probe set of the array were analyzed with GeneChip Operating Software (GCOS, Affymetrix). Expression Console software (Affymetrix) was used to generate summarized expression values and the detection call for the HU133 Plus 2.0 arrays using the MAS5 algorithm and the robust multichip average (RMA). Similarly, we used RMA with the Affymetrix core set to generate summarized signal values and the detection above background (DABG) score for the exon arrays. In order to retrieve differentially expressed genes, a supervised analysis with the HU133 Plus 2.0 arrays was performed with BRB-Array Tools, v.3.6.0 software. Genes were considered to be expressed in CLL when at least 10% of CLL samples showed a present detection call. The exon arrays were analyzed with the Partek Genomics Suite 6.5 application to identify transcripts alternatively spliced in cases with SF3B1 mutation and control cases. The Affymetrix core probe set was used for splicing analysis using the ANOVA strategy implemented in the Partek Genomics Suite. First, we removed the probe sets that showed a DABG score (P > 0.05) in more than 50% of the samples of at least one group. The probe sets with a low signal (signal 2 or



SF3B1 mutations in this case series was assessed with multivariate analyses including this variable and the main clinical and biological variables that had prognostic significance in univariate studies (Supplementary Table 15). The multivariate analysis for survival was performed with the stepwise proportional hazards model (Cox model).

To test whether a gene was mutated more frequently than expected by chance, we calculated the basal probability that each gene would acquire a nonsynonymous mutation ($P_{NS}$) as

\[ P_{NS} = \frac{N_{NS} \times l}{(N_{NS} + N_{S})\,E} \]

In this equation, $N_{NS}$ is the total number of possible nonsynonymous mutations and $N_{S}$ is the total number of possible synonymous mutations, with both of these computed for each gene; $l$ is the length of the open-reading frame (ORF) of the gene; and $E$ is the total length of the sequenced exome. Thus, the probability $P$ of finding $M$ or more nonsynonymous mutations in a given gene from a set of $N$ somatic mutations in all cases is

\[ P = 1 - \sum_{j=0}^{M-1} \binom{N}{j} P_{NS}^{\,j} \left(1 - P_{NS}\right)^{N-j} \]
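The recurrence test described above is a binomial upper-tail probability, and the following minimal Python sketch shows one way to evaluate it. It is an illustration under stated assumptions, not the authors' code. The function names `basal_probability` and `recurrence_p_value` are hypothetical, and the numbers in the usage lines are invented placeholders, not values from this study.

```python
from math import comb


def basal_probability(n_ns, n_s, orf_length, exome_length):
    """Basal probability that a gene acquires a nonsynonymous mutation.

    n_ns         -- number of possible nonsynonymous mutations in the gene
    n_s          -- number of possible synonymous mutations in the gene
    orf_length   -- length l of the gene's open-reading frame
    exome_length -- total length E of the sequenced exome
    """
    return (n_ns * orf_length) / ((n_ns + n_s) * exome_length)


def recurrence_p_value(m, n, p_ns):
    """Probability of m or more nonsynonymous mutations in one gene.

    m    -- observed number of nonsynonymous mutations in the gene (M)
    n    -- total number of somatic mutations in all cases (N)
    p_ns -- basal probability for the gene

    Computed as one minus the binomial probability of fewer than m hits.
    """
    below = sum(
        comb(n, j) * p_ns ** j * (1.0 - p_ns) ** (n - j)
        for j in range(m)
    )
    return 1.0 - below


# Invented placeholder inputs for illustration only.
p_gene = basal_probability(n_ns=9000, n_s=3000, orf_length=4000,
                           exome_length=50_000_000)
p_value = recurrence_p_value(m=3, n=1000, p_ns=p_gene)
```

Subtracting the lower tail from one loses precision when the result is extremely small. A survival-function routine from a statistics library would be a reasonable substitute in that regime, although the text does not say how the computation was carried out.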

33. Bentley, D.R. et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456, 53–59 (2008).
34. Trapnell, C., Pachter, L. & Salzberg, S.L. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25, 1105–1111 (2009).
35. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
36. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
37. Baudot, A., de la Torre, V. & Valencia, A. Mutated genes, pathways and processes in tumours. EMBO Rep. 11, 805–810 (2010).
38. Kanehisa, M., Goto, S., Furumichi, M., Tanabe, M. & Hirakawa, M. KEGG for representation and analysis of molecular networks involving diseases and drugs. Nucleic Acids Res. 38, D355–D360 (2010).
39. Schaefer, C.F. et al. PID: the Pathway Interaction Database. Nucleic Acids Res. 37, D674–D679 (2009).
40. Ashburner, M. et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25, 25–29 (2000).
41. Reva, B., Antipin, Y. & Sander, C. Predicting the functional impact of protein mutations: application to cancer genomics. Nucleic Acids Res. 39, e118 (2011).
42. Wu, S. & Zhang, Y. MUSTER: improving protein sequence profile-profile alignments by using multiple sources of structure information. Proteins 72, 547–556 (2008).
43. Peto, R. & Pike, M.C. Conservatism of the approximation Σ(O−E)²/E in the logrank test for survival data or tumor incidence data. Biometrics 29, 579–584 (1973).

Nature Genetics doi:10.1038/ng.1032
