
volume 28 number 10 october 2010

Focus on: epigenetics

© 2010 Nature America, Inc. All rights reserved.

[Cover image: The painting Histone Subunit Exchange portrays the dynamic nature of nucleosome structure. This issue focuses on the role of epigenetics in health and disease and discusses the therapeutic prospects of targeting the epigenetic machinery. Credit: David Sweatt.]

editorial
1031 Making a mark

opinion and comment
COMMENTARY
1033 Linking cell signaling and the epigenetic machinery
     Helai P Mohammad & Stephen B Baylin
1039 Tackling the epigenome: challenges and opportunities for collaboration
     John S Satterlee, Dirk Schübeler & Huck-Hui Ng
1045 The NIH Roadmap Epigenomics Mapping Consortium
     Bradley E Bernstein, John A Stamatoyannopoulos, Joseph F Costello, Bing Ren, Aleksandar Milosavljevic, Alexander Meissner, Manolis Kellis, Marco A Marra, Arthur L Beaudet, Joseph R Ecker, Peggy J Farnham, Martin Hirst, Eric S Lander, Tarjei S Mikkelsen & James A Thomson
1049 Epigenomics reveals a functional genome anatomy and a new approach to common disease
     Andrew P Feinberg

computational biology
COMMENTARY
1053 Putting epigenome comparison into practice
     Aleksandar Milosavljevic

research
Reviews
1057 Epigenetic modifications and human disease
     Anna Portela & Manel Esteller
1069 Epigenetic modifications as therapeutic targets
     Theresa K Kelly, Daniel D De Carvalho & Peter A Jones
1079 Epigenetic modifications in pluripotent and differentiated cells
     Alexander Meissner
1089 Genomics tools for unraveling chromosome architecture
     Bas van Steensel & Job Dekker

Nature Biotechnology (ISSN 1087-0156) is published monthly by Nature Publishing Group, a trading name of Nature America Inc. located at 75 Varick Street, Fl 9, New York, NY 10013-1917. Periodicals postage paid at New York, NY and additional mailing post offices. Editorial Office: 75 Varick Street, Fl 9, New York, NY 10013-1917. Tel: (212) 726 9335, Fax: (212) 696 9753.

Annual subscription rates: USA/Canada: US$250 (personal), US$3,520 (institution), US$4,050 (corporate institution). Canada add 5% GST #104911595RT001; Euro-zone: €202 (personal), €2,795 (institution), €3,488 (corporate institution); Rest of world (excluding China, Japan, Korea): £130 (personal), £1,806 (institution), £2,250 (corporate institution); Japan: Contact NPG Nature Asia-Pacific, Chiyoda Building, 2-37 Ichigayatamachi, Shinjuku-ku, Tokyo 162-0843. Tel: 81 (03) 3267 8751, Fax: 81 (03) 3267 8746.

POSTMASTER: Send address changes to Nature Biotechnology, Subscriptions Department, 342 Broadway, PMB 301, New York, NY 10013-3910. Authorization to photocopy material for internal or personal use, or internal or personal use of specific clients, is granted by Nature Publishing Group to libraries and others registered with the Copyright Clearance Center (CCC) Transactional Reporting Service, provided the relevant copyright fee is paid direct to CCC, 222 Rosewood Drive, Danvers, MA 01923, USA. Identification code for Nature Biotechnology: 1087-0156/04. Back issues: US$45, Canada add 7% for GST. CPC PUB AGREEMENT #40032744. Printed by Publishers Press, Inc., Lebanon Junction, KY, USA. Copyright © 2010 Nature America, Inc. All rights reserved. Printed in USA.


editorial
987 Teetering on the brink

[Image: Uncertain future for ESC research, p 987 and p 991]
[Image: Screening for rare heart conditions, p 1003]

news
989 Geron trial resumes, but standards for stem cell trials remain elusive
990 China’s $2.4 billion splurge
991 US courts throw ES cell research into disarray
992 Drug user fees top $1 million
992 Sugar beets still in the game
992 Roche backs Aileron’s stapled peptides
994 Life swallows Ion Torrent
994 Anti-anemics price hike
994 Genzyme resumes shipping as Sanofi-aventis hovers
995 Cancer research fund launches biologics pilot plant
996 Wellcome partners with India
996 Hungary eyes biotech jobs
996 Monsanto relaxes restrictions on sharing seeds for research
997 Newsmaker: Constellation Pharmaceuticals
998 data page: Drug pipeline: Q310
999 news feature: Turning the tide in lung cancer
1003 news feature: At the heart of genetic testing

Bioentrepreneur
Building a business
1007 Why you need a lawyer
     Craig Shimasaki

opinion and comment
CORRESPONDENCE
1010 Safe and effective synthetic biology
1012 The regulatory bottleneck for biotech specialty crops
1015 ProHits: integrated software for mass spectrometry–based interaction proteomics
1017 More sizzle than fizzle

commentary
1018 Case study: The path less costly
     Brady Huggett

[Image: IP windfall for faculty Down Under, p 1019]

Feature
patents
1019 Faculty and employee ownership of inventions in Australia
     Amanda McBratney & Julie-Anne Tarr
1023 Recent patent applications in gene synthesis
1023 Selected patent expirations/extensions in the second half of 2010


NEWS AND VIEWS
1025 Timing is everything in the human embryo
     Ann A Kiessling
     see also p 1115
1026 Taking the measure of the methylome
     Stephan Beck
     see also p 1097 and p 1106
1028 Tracing cancer networks with phosphoproteomics
     David B Solit & Ingo K Mellinghoff
1030 Research highlights

[Image: Compound-directed biomarker discovery, p 1028]

research

[Image: Benchmarking DNA methylation analysis, p 1097 and p 1106]

analysis
1097 Comparison of sequencing-based methods to profile DNA methylation and identification of monoallelic epigenetic modifications
     R A Harris, T Wang, C Coarfa, R P Nagarajan, C Hong, S L Downey, B E Johnson, S D Fouse, A Delaney, Y Zhao, A Olshen, T Ballinger, X Zhou, K J Forsberg, J Gu, L Echipare, H O’Geen, R Lister, M Pelizzola, Y Xi, C B Epstein, B E Bernstein, R D Hawkins, B Ren, W-Y Chung, H Gu, C Bock, A Gnirke, M Q Zhang, D Haussler, J R Ecker, W Li, P J Farnham, R A Waterland, A Meissner, M A Marra, M Hirst, A Milosavljevic & J F Costello
     see also p 1026
1106 Quantitative comparison of genome-wide DNA methylation mapping technologies
     C Bock, E M Tomazou, A B Brinkman, F Müller, F Simmer, H Gu, N Jäger, A Gnirke, H G Stunnenberg & A Meissner
     see also p 1026

ARTICLE
1115 Non-invasive imaging of human embryos before embryonic genome activation predicts development to the blastocyst stage
     C C Wong, K E Loewke, N L Bossert, B Behr, C J De Jonge, T M Baer & R A Reijo Pera
     see also p 1025

letter
1123 Substrate elasticity provides mechanical signals for the expansion of hemopoietic stem and progenitor cells
     J Holst, S Watson, M S Lord, S S Eamegdool, D V Bax, L B Nivison-Smith, A Kondyurin, L Ma, A F Oberhauser, A S Weiss & J E J Rasko
1129 errata and corrigenda

careers and recruitment
1131 Portfolio managing for scientists
     David Sable
1132 people

[Image: Insights into early human development, p 1115]


in this issue

Benchmarking DNA methylation mapping

Over the next few years, the DNA methylation patterns of at least 1,000 cell types will be determined in an international effort to create high-quality reference methylomes. In addition, many researchers investigate methylation profiles in their own projects using a multitude of different methods. So far, it has remained unclear how these methods compare in terms of accuracy, cost and genome coverage, and how well the methylation maps derived from the different technologies correspond to each other. Bock et al. and Harris et al. present systematic comparisons of the most commonly used technologies. Harris et al. compare four techniques that use high-throughput sequencing as readout and detect methylated cytosines either by bisulfite conversion or by affinity enrichment of sequences with methylated cytosines. Bock et al. evaluate three of the sequencing-based methods and one methylation-sensitive array. Overall, both studies find an encouragingly high concordance between the methylation calls made by the different methods, although the methods differ significantly in genome coverage and cost per cytosine assayed. [Analysis, p. 1106, p. 1097; News and Views, p. 1026] ME

Patent roundup

A recent decision by the Australian High Court means that, unless faculty are bound by an assignment or intellectual property policy, they may own inventions resulting from their research. McBratney and Tarr discuss the case’s implications for inventors and the prospects of Bayh-Dole style legislation coming to fruition in Australia. [Patent Article, p. 1019] MF

Recent patent applications in gene synthesis. [New patents, p. 1023] MF

Stem cells and elasticity

Biomechanical forces such as shear stress and elasticity are known to influence the behavior of certain types of stem cell. Rasko and colleagues have now investigated the effects of elasticity on hematopoietic stem and progenitor cells. Mouse bone marrow cells or human cord blood cells are cultured on dishes coated with tropoelastin, the precursor of elastin, which confers elasticity to the skin and other tissues. Culture on tropoelastin leads to a several-fold expansion of primitive hematopoietic cell populations. The increase in cell numbers is similar to that achieved by a cytokine cocktail, and the two effects are additive. These findings suggest that manipulation of substrate elasticity may be a valuable complement to other strategies for in vitro expansion of hematopoietic stem cells. [Letters, p. 1123] KA

Next month in Nature Biotechnology

• Differentiation of hES cells towards chondrocytes
• Antibody discovery using small libraries
• pH-dependent binding prolongs antibody longevity
• Multicolor in situ hybridization in whole embryos
• Vascular stem cells cultured for natural products




news feature

At the heart of genetic testing

Genetic testing for rare heart conditions might someday expand to more common cardiac ailments. Already there are signs that testing is dramatically changing how some conditions are treated and doctors’ definition of who a patient is. Stephen Strauss reports.

It has not been a very happy year for those hoping that genetic testing was going to revolutionize our ability to predict who was and who wasn’t going to come down with major heart diseases, not to mention using that knowledge to do something about the conditions. In February, an article in the Journal of the American Medical Association found that when 19,000 American women were followed for an average of 12 years, an analysis of their genetic differences “did not improve cardiovascular risk prediction”¹. The catchline of an article in Science magazine in June declared “So far, genome-wide association studies have not found common genes with a big impact on heart health”². And The New York Times, also in June, declared “after 10 years of effort, geneticists are almost back to square one in knowing where to look for the roots of common disease”³.

[Image: Cameroon soccer star Marc-Vivien Foe collapsed and died of hypertrophic cardiomyopathy at the age of 28. Credit: Corbis]

Buried beneath the gloom is what might be termed a good news asterisk. It reads: none of the above is true if we shift our gaze from common heart conditions to a wide range of less common, but genetically linked, cardiac diseases. Over the past five years or so, testing for gene mutations connected to them has been transforming how doctors diagnose illnesses, treat patients and expand that treatment to include family members.

It has also given birth to a commercial gene testing industry that believes it is perched on the brink of a major leap forward.

Preventing early death

The difference between what is happening in the two domains is so stark that David Margulies, cofounder and CEO of Correlagen, a Waltham, Massachusetts–based genetic diagnostics company (now a LabCorp subsidiary), is absolutely tart in his criticism of any linkage between what are called genome-wide association studies in heart disease and monogenic sequencing tests for gene-specific heart conditions. “It’s like comparing apples to zebras,” he says.

A classic example of a testing apple can be seen in the use that the Canadian province of Newfoundland and Labrador has been making of genetic screening for the heart disorder known as arrhythmogenic right ventricular cardiomyopathy (ARVC). ARVC causes a fatty buildup in the heart, which often, without warning, generates a highly irregular heartbeat and then no heartbeat at all. ARVC has become infamous in the world of sport as one explanation for why ostensibly healthy athletes suddenly collapse after a competition.

“ARVC often goes undetected until a person drops dead,” says Kathy Hodgkinson, a geneticist and clinical epidemiologist at Memorial University in St. John’s, Newfoundland, who wrote her PhD thesis on ARVC genetics in the province. “And it appears in Newfoundland families more often than it does elsewhere.”

Although it is estimated that ARVC afflicts roughly 1 in 5,000 people worldwide, that number may be as high as 1 in 1,000 people in Newfoundland. The high incidence is the fruit of a highly penetrant mutation, which a study of old records and family bibles suggests first appeared in the late 1700s in descendants of a British immigrant.

After ARVC was first clinically described in the 1980s, researchers at Memorial University began to study the genetics of the early and unexpected heart attacks occurring in the province. In 1997, the group initiated a formal search for the gene that gives rise to the condition. They first localized it to chromosome 3 and then in 2007 uncovered the exact gene where the Newfoundland-rooted mutation occurs. This research has, for a start, allowed scientists to measure more precisely the deadly demographics of ARVC in Newfoundland.

When 18 extended families carrying the mutation were studied (the largest one comprising 1,200 people with records of heart deaths extending over ten generations), it turned out that the median age at death for men is 41. Women, probably because of the mitigating effect of estrogen, on average die at 71.

But equally important, when you know who carries the gene defect, there is something you can do about it. Newfoundland doctors are now counseling family members with the mutation to have a cardiac defibrillator implanted. The recommendation is being made to young men in their late teens and women in their late 20s, even if there is no overt sign of any heart disease.

With their families’ history of early deaths on their minds, Newfoundlanders are seizing on the option of having an implantable cardioverter-defibrillator (ICD) fitted. By 2009, 104 adults who carry the mutation had been offered an ICD, and only nine refused. And the intervention is working. Last year the Memorial researchers reported that the five-year mortality rate in men who had an ICD implanted was zero. This compares with a death rate of 28% for men who didn’t have the implantation.

“We have been able to take a heart attack, which in the past was seen as an act of God, explain it as an act of genetics, and then do something to keep their genetics from prematurely killing people,” says Terry-Lynn Young, a professor of molecular genetics at Memorial University, who has been spearheading the study of the mutation in the province.

Spawning diagnostics

The gene became part of a generalized ARVC screening test that Newton, Massachusetts–based PGxHealth offers for five genes associated with variants of the condition. But more significantly, it has now become part of PGxHealth’s suite of heart disease gene screening. Having begun in 2004 with a test for long QT syndrome (LQTS), which is also a sudden and unexpected heart killer, PGxHealth now tests for six separate heart conditions. In total, upwards of 100 genes associated with genetically linked heart conditions are being screened for by various companies (Table 1).

The tests have become increasingly sophisticated and can now quantify the percentage of cases that can be linked to each individual gene mutation. The differences are rather striking.


NEWS featureTable 1 Genetics of rare heart conditionsDiseaseNumber ofgenesApproximatefrequencyTreatmentsHypertrophic cardiomyopathy 17 1 in 500 Beta blockers, implantable cardioverter-defibrillator, lifestyle changesDilated cardiomyopathy 23 1 in 2,500 Avoidance of alcohol, lowered salt intake, various heart failure drugs,Implantable cardioverter-defibrillator, heart transplantsLong QT syndrome 12 1 in 5,000–7,000 Beta blockers, implantable cardioverter-defibrillator, avoidance of strenuousactivitiesBrugada syndrome 6 1 in 2,000–10,000 Implantable cardioverter-defibrillatorArrhythmogenic right-ventricularcardiomyopathyCatecholaminergic polymorphicventricular tachycardia7 1 in 1,000–-10,000 Beta blockers, implantable cardioverter-defibrillator, avoidance of strenuousactivities2 1 in 10,000 Beta blockers© 2010 Nature America, Inc. All rights reserved.Thus, whereas in dilated cardiomyopathy(one of a group of diseases in which the heartmuscle wastes away) 12 genes associated withthe condition account for no more than 6% ofthe cases, in hypertrophic cardiomyopathy (athickening of the heart, particularly of the leftventricle) two of nine associated genes used byseveral companies in gene testing account for asmuch as 40–60% of the cases.The growing number of genes associatedwith these disorders is important, not simplybecause it leads to a deeper understanding ofthe biological pathways involved, but becausefor certain genes, the specific mutation a personcarries may have profound clinical significance.Effectively, conditions that before genetics testingwere seen as singular illnesses have in thepast few years been grouped into closely relatedconditions, each of which may manifest itself,and be treated, differently.For example, the genetic tests for LQTS differentiateseveral varieties of the condition associatedwith different genes. Type 1 LQTS accountsfor 35% of the cases, type 2 for 30% and type 3for 10%. 
The other ten genes currently associatedwith the condition collectively account foronly about 2% of the cases.The triggers for the variants can be quite different.Strenuous exercise, particularly swimming,has been associated with attacks anddeaths in type 1 LQTS. However, Peter Schwartz,a cardiologist at the University of Padua in Italy,who has been studying the condition sincethe early 1970s, says “we found, and that wasa surprise, that [those with] type 2 and 3 are atvery low risk during exercise, as it is not a triggerfor them.” What triggers type 2 LQTS areloud noises, think a telephone suddenly ringingor an alarm clock bell. Conversely, in type 3LQTS, the most important trigger is depressionand sleeping.What has also followed from the splittingof the condition into three genetically differentiateddisorders is a partial realization of thedream of personalized medicine. Doctors nowrecommend that people with type 1 LQTS limitstrenuous activities but those with type 2 or type3 need not.What’s more, there are implications for drugprescription. Schwartz has shown that betablockers, which typically were given to everyonediagnosed with LQTS, are significantly moreprotective for those with type 1 LQTS than forthose with type 2, and perhaps not at all effectivefor type 3.“Screening is, in variance what with a lot ofpeople think, not just a research tool; it is a clinicaltool. There is no doubt that cardiac geneticsis allowing us to modify disease management,”remarks Schwartz.Screening exercisesAnother part of screening’s clinical significanceis that it has added a significant new tool tocardiologists’ diagnostic armatorium. 
Many ofthe classic diagnostic technologies that indicateheart disease fail when it comes to conditions inwhich the heart suddenly stops beating becauseof a genetic abnormality.“Many times people with these conditionscan have a normal EKG [electrocardiogram],because your EKG is just a spot look,” saysSherri Bale, co-president and clinical director ofGeneDx, a gene screening diagnostics companyin Gaithersburg, Maryland. “It is a minute-anda-half,or three minutes, or whatever, snapshotof your heart. If an arrhythmia doesn’t occurduring that time, you don’t see anything.”One marker of the significance of genescreening for diagnosis is that professionalorganizations are beginning to recommendthat screening for disease-causing gene mutationsbecome a normal part of the diagnosisprocess. For example, the European TaskForce on Diagnosing ARVC recently recommendedthat the diagnostic criteria be revisedto include “identification of a pathogenicmutation categorized as associated or probablyassociated with ARVC/D in the patientunder evaluation” 4 .Who’s your patient?The diagnostic reach of cardiac disease testing isdoing more than improving diagnoses, it is nowforcing physicians to reconfigure their view asto whom their patients are. “Traditionallycardiologists are good at seeing the disease infront of them and then nailing it, attacking it,treating it. They are hardwired to treat an individualpatient well,” says Michael Ackerman, apediatric cardiologist, who is director of MayoClinic’s Long QT Syndrome Clinic in Rochester,Minnesota.“What we are not good at in cardiology historicallyis thinking of these as genetic diseasesand reflecting ‘I now have to think like a familymedicine doctor. I now have to take care of allthe family’,” he says.Some of the changes require an organizationalreconfiguration. 
Ackerman points to theLQTS clinic he set up at Mayo in 2000 that isgeared to evaluate, counsel and treat all affectedfamily members, regardless of age rather thanhaving the children seen in one medical facilityand the adults seen in another across the city.This is important because potentially quite a lotof family members might come in to be treated,particularly if the gene is dominant and thereforecould have been passed on to half of closeblood relatives.Heidi Rehm, a geneticist at Harvard MedicalSchool and director of the Laboratory forMolecular Medicine at Partners HealthCareCenter for Personalized Genetic Medicine inCambridge, Massachusetts, is preparing a paperon the genetic testing of over 2,000 people withhypertrophic cardiomyopathy at her facilityfrom 2004 to 2010. Of the first 533 individualswho tested positive for the mutation, 255 subsequentlybrought in at least one family memberto be tested. All told, an average of 3.4 peopleper family were tested with the range being asingle family member to 33 members of onehuge extended family.Even so, expanding these practices to includegene-carrying family members has proven1004 volume 28 number 10 OCToBER 2010 nature biotechnology


news feature© 2010 Nature America, Inc. All rights reserved.daunting to doctors, in part because, as Schwartzremarks, “a large majority of physicians grew upnot knowing a thing about genetics.” As a consequence,gene diagnosis companies are tryingto bridge the information gap by having geneticcounselors on staff whose specific job it is tocounsel not patients but the doctors who musttreat them.The issues are both complex and varied. Forexample, family testing means cardiologistsmust now confront a new emotional elementin their practices. “There is a lot of sudden anxietywhen people have to deal not only with thedeath of a family member but with somethingthat now can affect the rest of the family. Thereis an emotional overload for a lot of peoplecoming to get this type of testing,” says AmyDaly, a genetic counselor with GeneDx.At the same time, cardiologists must dealwith family members’ refusing genetic testingfor themselves—and the dire consequences ofthat ignorance. The Newfoundland group wrotein a recent paper of a 31-year-old man whodeclined to be tested, even though ARVC hadbeen detected in his family. He subsequentlydied while golfing from what turned out to beARVC 5 .“There sometimes is total denial, peoplejust saying ‘this isn’t going to happen to me’,”Memorial University’s Young explains.A different, and happier result, wasreached when it was discovered that a youngNewfoundland man training to become acommercial pilot carried the ARVC mutationwith its risk of sudden death. “We just talkedthings through,” says geneticist Hodgkinson,“and he decided to change careers.” An interestingconundrum for physicians is what to doif subjects at high risk of sudden death chooseto ignore the information and continue in aprofession where their condition might putother lives in danger. 
Parents must also decidewhether or not to test their potentially at-riskchildren for the mutations.The difficulties that these and other geneticscreening and diagnosis issues introduce intoa medical practice have fed into what is seenas a general reluctance by many cardiologiststo expand their treatment to include genetesting and gene counseling for family members.Some feel only a legal impetus is going tochange this.In a recent editorial in the Journal of theAmerican College of Cardiology, Schwartzhas argued that only the threat of malpracticewill produce a general acceptance of what isbeing termed ‘cascade screening’ 6 . “I am afraidthe turning point will be when someone willbe convicted in court for not having recommendedgenetic screening and someone died,”he says.The mutational conundrumWhereas the rapid expansion and almost immediateapplications of genetic screening for lesscommon heart conditions clearly has been beneficial,it has brought with it several unresolvedissues. One is the meaning and the multiplicityof mutations. Rehm points to data she analyzedseveral years ago, where she found that out ofmore than 1,000 mutations in her database “850of them were pathogenic, 150 were not.”What is unclear in the extreme is how to differentiatethe dangerous from the benign whenit comes to mutations. “When you scan a largegroup of healthy volunteers … rare variants popup in them, pop up right next door to aminoacids in which there is no doubt about diseasemutations,” says Ackerman.Not to mention the effect of multiple mutations.About 7% of the people in Rehm’s studyhave at least one additional mutation. “The significanceof the second mutation isn’t alwaysclear,” says Rehm. This is confusing to doctors.“Physicians have a lot of questions about whatwe call ‘variants of unknown significance’,” saysGeneDx’s Daly. But it may be even more confusingto patients and family members who haveto decide if they are going to initiate treatmentsor actions to reduce their risks. 
In a soon-tobe-publishedpaper, Rehm and her associatesdescribe how, when a positive mutation resultcame in, one mother decided to severely reducethe activity of one of her children, only to beinformed a year later that the laboratory thatscreened for the disease had decided the mutationwas benign.The money gameAnd then there is the question of who paysand how much they pay for the testing. Indeed,people point out that the differences betweencountries when it comes to paying for genetesting is almost a litmus test for that country’smedical system. Ackerman says that when thetests for LQTS genes first became commerciallyavailable in 2004, there was a great deal of excitementbecause it was felt that the tests were finallygoing to be of clinical significance.This was in part driven by the fast turnaroundtime of the commercial tests—6 to 8 weeks asopposed to months or even years when universitylaboratories alone oversaw testing. “Butguess what? What we learned —our patients’insurance was not paying for it,” Ackermannotes.It has only been in the past couple of yearsthat many US payers have been picking up most,generally about 75%, of the price of the testing.Part of what has convinced them has been theeconomics of a negative screening. In place ofconducting yearly magnetic resonance imagingor EKGs on patients whose susceptibilityto the disease is unknown, noncarriers can beexcluded from the testing lists.Others point out that governments in placeslike New Zealand and Canada are more willingto pay for the screenings because it is in theirlong-term economic interest. 
“The payer who is paying for the test is [the] same [one] who pays for the treatment of heart disease two decades later,” says Correlagen CEO Margulies.

Because this is not the case in the US, gene test prices are not so much what actually gets paid as the opening level at which negotiations between payers and gene screening companies begin. “We are paid very different amounts by different payers on different days,” says Margulies.

Moving the technology forward

Ask people involved what the future holds for heart gene testing and the first words that come out are “more, better, cheaper.”

Using what are called next-generation or third-generation sequencing platforms, companies are racing to increase the number of genes being tested and decrease the costs of the tests. Ackerman foresees the day in five or ten years when everyone gets a test for their gene variations for less than $1,000.

That might mean that today’s specific tests for specific heart genes may be folded into a generalized gene screening. “I believe we are in a ten-year window for disease-specific genetic testing,” Ackerman says. GeneDx’s Bale, on the other hand, doesn’t believe gene-specific diagnostic tests for inherited heart failure are going to cease to be conducted. With more genes will come more complexity and “unfortunately we will identify tons of stuff we don’t know how to interpret,” she says.

Nonetheless, change is happening now. Rehm says Partners HealthCare is working on a heart screening test for 65–70% of the most frequent mutations associated with rare heart conditions. “I think we will catch half of all positives with this screening test. You never will get an inclusive result, because we will only test for variants we know the significance of.”

That change won’t be five or ten years away, nor will it cost $1,000. “The goal is to do that testing for under $500. We hope to have such a test available by the end of the year,” Rehm says.

Stephen Strauss, Toronto

1. Paynter, N.P. et al. J. Am. Med. Assoc. 303, 631–637 (2010).
2. Couzin-Frankel, J. Science 328, 1220–1221 (2010).
3. Wade, N. The New York Times, 12 June 2010.
4. Marcus, F.I. et al. Circulation 121, 1533–1541 (2010).
5. Hodgkinson, K. et al. Genet. Med. 11, 859–865 (2009).
6. Schwartz, P.J. J. Am. Coll. Cardiol. 55, 2577–2579 (2010).

nature biotechnology volume 28 number 10 OCTOBER 2010 1005


Building a business

Why you need a lawyer

Craig Shimasaki

What’s involved in formally starting a biotech company?

Craig Shimasaki is CEO of BioSource Consulting, Oklahoma City, USA. e-mail: cs@biosourceconsulting.com

Creating a sustainable biotech company is analogous to driving from New York City to Los Angeles. There are myriad routes to get there, but if you start out heading north, you will never arrive. More to the point, if you’re headed north and not legally licensed to drive, not only will you fail to reach your destination but you may also experience disastrous consequences.

For would-be entrepreneurs, establishing a venture as a legal entity is the key first step in making the business a reality and moving it forward. This article summarizes the key tasks in legally founding your company and outlines the different types of legal expertise you will need to recruit. Doing this correctly at the beginning will pay dividends in terms of your ability to attract capital, align business and scientific goals, and set your company on the path to success.

The legal team

So you have a concept for your new venture. Your first step in making it a reality is to find a great attorney. You might ask, “Why do I need an attorney? Aren’t there legal forms available online that can save me a lot of money?” Yes, there are, and in fact most attorneys use their own boilerplate documents. But when you hire an attorney, you are paying for experienced legal advice and business guidance, not for someone who fills out forms.

You should consider your attorney the most critical employee for your budding organization because his or her counsel and advice will directly impact the direction you take in corporate and financing matters. For instance, your attorney will advise you on the impact of terms for founders’ agreements, your strategy for issuing stock options, the implications of tax law, and securities and financing issues. Your attorney will also give you advice on the best practices in intellectual property (IP) protection, how to interpret employment law matters and how best to structure various contracts and agreements.

Box 1 Count the costs

Legal expenses are typically greater than you might anticipate, but getting your business established correctly will save you major headaches later. Depending on their experience and locale, corporate attorney rates for biotech startup expertise can range from $200 to more than $750 per hour. All attorneys should offer a complimentary initial visit to discuss your situation. If they insist on charging you for an initial consultation, find another attorney.

Getting your company established and drawing up founder and employee documents and a license agreement can cost $5,000–$25,000 or more. Cost depends on the complexity of your business, the number of founders and the issues related to a technology license.

Legal assistance for closing a round of capital can be $10,000–$50,000 or more depending on the size of the round, the number of investors and other terms related to funding. Your attorney should provide you with a good estimate before beginning any transaction, and some may even give you a flat rate if the work is clearly defined. For larger deals, such as closing on a venture capital round of financing, you may be able to get a commitment for a maximum limit on legal fees. Some attorneys who specialize in startup organizations may even accept deferred compensation but may charge a higher fee and take a small equity position.

The truth is, the biotech entrepreneur will need help from three types of attorneys: corporate, patent and securities.
When establishing a company, you should first retain a corporate attorney.

A corporate attorney specializes in corporate and business matters for biotech startups and practices business law. He or she should be experienced in startup issues, such as organizational structure, employment agreements, stock options and financing structures, particularly venture capital deals.

You will also need a patent attorney who specializes in patent law and biotech patent prosecution, litigation in particular. Make sure this person understands your technology area. Look for a patent attorney with a combined background or dual degree in the area of your technology, such as someone with a JD and a PhD or ChemE. These individuals provide added value because they understand the science and can add to the patent in ways that only an experienced scientist can.

During the early stages of your organization one of the most valuable assets you have is your IP, so be sure that it is managed well. If you are the inventor, you already have a working relationship with a patent attorney. If you licensed IP from an institution, your patent portfolio is already being managed by a patent attorney. However, be sure you are confident in the capabilities of this person, or find another.

The final type of legal expertise you’ll require is a securities attorney. This person specializes in the legal aspects of acquiring funding, handling private placements and dealing with securities laws. He or she will provide guidance on many issues related to raising capital and will make sure that you are complying with securities laws and protecting the company’s interests as you raise money. Occasionally, you may be able to locate a good corporate attorney who is also experienced in securities.

Finding the right attorney is probably easier said than done, as it’s unlikely you’ll know experienced biotech attorneys when first starting your firm. One of the best ways to find one is through networking: start by asking other biotech entrepreneurs whom they would recommend. Search for reputable law firms specializing in startup biotechs in your area. You should try to find an attorney with offices in your city because you don’t want to be boarding a plane just to have a face-to-face meeting. But if you don’t live in a biotech hub, you may have no other option than to hire an attorney who does. Long-distance travel isn’t optimal; however, a lawyer living in a biotech hub can provide advantages: these experienced lawyers usually have venture capital contacts and access to seasoned biotech executives, which can help with financing and recruiting.

Ideally, you will want to work with an attorney who is a partner or senior member in a small- to medium-sized law firm; this is preferable to working with less-experienced junior staff at a mega law firm. Of course, your fees will be higher working with a senior partner, but you get what you pay for (Box 1).

Box 2 Changing names

There are certain situations in which you might want to consider changing an established company name. Here are some examples:
• If the company has a troubled past that haunts the new management as it tries to raise money, or if you are reorganizing the company or doing a restart.
• If the name is a source of confusion because it was strongly associated with a former focus and the company has a new focus.
• If the previous management had a notorious reputation and a clear separation is needed.
• If the current name is problematic for business because it ties the company to an unrelated field.

Establishing your company

Before you incorporate your company you need a name that brands the company and its future. Barring anything unforeseen (and usually bad), you’ll keep that name for the life of the company (Box 2).
There are at least four aspects to consider when choosing a company name: does it represent the current and future focus of the organization; is it relatively easy to pronounce and recognize; is it unique enough that it will not be confused with the names of other organizations; and will it work well with envisioned products? There are, of course, other issues to think about, too (Nat. Biotechnol. 28, 16–19, 2010).

After selecting a company name, the next step is to formally incorporate and set up a legal structure. This allows for the issuance of stock to potential investors, founders or future employees, and it reduces your exposure to liabilities and protects personal assets. It also lets the business take full advantage of tax laws, including carry-forward losses.

Another important decision is the choice of corporate structure, which should be discussed with your attorney and will be based upon your current plans and future direction. There are five corporate structure options in the US: sole proprietorship, partnership, limited liability company (LLC), S corporation (S-corp) and C corporation (C-corp). In the UK, there are also limited (Ltd.), public limited (PLC) and unlimited corporations.

The selection of your legal structure affects how the business is taxed and determines the liabilities borne by the owners and fiduciary agents of the company. Some startups may begin as an LLC until they get significant investments. However, because we are talking about a biotech company, ultimately any enterprise in the US will need to be a C-corp, which is this industry’s standard business entity because of laws pertaining to ownership, structuring flexibility, finances and taxation.

When incorporating a business, your attorney files the company’s articles of incorporation and bylaws. This filing designates the number of authorized company shares, the number of board members and other related matters.
Your state of incorporation can be where you are actually located, but before you secure venture or institutional capital, you’ll likely need to be incorporated in Delaware, where corporate laws and tax laws are more favorable. Your attorney can handle this.

Issuing stock

Next, your corporate counsel will assist with issuing stock or stock options to the founders, inventors, IP holders and key staff. You should issue stock soon after the organization is established rather than waiting until after capital is raised. When shares are issued upon company formation, they can be granted to the founders at minimum value. If stock is issued after raising a significant amount of capital, there is a specific value imputed to the enterprise. If shares are issued at a discount to that value, the shareholder could have large tax consequences.

For instance, upon securing investor financing there is a ‘fair market’ value imputed to company shares based on the amount that investor paid. If shares are simultaneously discounted to founders or key employees, there could be a tax liability based on the difference between the fair market value and the amount of money these founders paid for their stock. There is no reason for founders or key employees to be paying taxes on shares at this stage of the company. Your attorney will guide you through any tax consequences of issuing stock or obtain the help of tax counsel.

Your corporate attorney should also give advice on what types of stock to issue, choosing from founders’ stock, restricted stock, preferred shares, common shares, voting and nonvoting shares, and two kinds of stock options: incentive stock options (ISOs) and nonqualified options (NQOs).
These all have different privileges, rights and restrictions. Vesting schedules are usually given with stock options (NQOs and ISOs) and restricted stock. If this is all sounding foreign to you, then you’re beginning to see why hiring an attorney is one of the first things you should do.

Many biotech companies are formed by more than one founder, and they all usually receive founders’ shares. It’s tempting to divide allotted shares equally among the founders, but you should first consider what each individual has contributed to establishing the company and what their roles will be going forward. Will they all be working full time? And are they all committed to sticking around to see it through to success?

The answers to these questions will help determine the split of founders’ shares. You’ll also need a founders’ agreement that outlines the provisions and considerations given in exchange for work, contribution and IP rights. This document should include a provision that the company can buy back a certain amount of its shares should one of the founders later leave the organization. This prevents a founder who leaves from watching his or her shares rise in value on the labor and sweat of others.

Beyond that, there are several other agreements needed for founders and employees alike (Box 3).

The board

Your articles of incorporation will stipulate that you set up a board of directors. This group has a legal obligation to the company in that they possess a fiduciary (trustee) responsibility to look after the best interests of the overall organization. You and your shareholders elect the board (even if, at startup, the shareholders are just you and a few angel investors).

Carefully select board members based on expertise and ability. Do not include friends and family unless they are actually qualified, and even then be aware of the pitfalls. Remember that difficult issues are decided by the board and you do not want personal relationships influencing decisions.

The board has two main duties. The first is called ‘duty of care’, meaning it has an obligation to make decisions in a reasonable, careful and prudent manner. All decisions have risk, and any decision can be second-guessed, but if the board made a rational decision that’s considered judicious at the time, it has operated under the duty of care.

The second is the ‘duty of loyalty’, meaning all decisions or transactions with and for the company must not be motivated by self-dealing or any conflict of interest. If a conflict arises, that board member should disclose it and abstain from voting on that particular issue.

A board needs a chairman, and if the CEO is not the chairman, it’s usually a board member appointed by the major shareholders (investors or otherwise). If you are fortunate enough to have good venture capitalists with depth of experience in your field, they will guide and strengthen the remaining board member selection.

Odd numbers of board members are chosen to avoid voting logjams, and your board should grow in size as the company grows. In the beginning, the board may consist of only three members. Later, it may grow to five or even seven. A publicly traded biotech company may have nine to eleven members, but it is always advantageous to have fewer instead of more.

Board members who are investors or executives of the company are not usually compensated for their participation as they are simply managing their investment. As the company grows and independent board members are added, board compensation is usually a mix of cash, such as an annual retainer, and some form of equity compensation.

Depending on the stage of the company, the compensation may simply be reimbursement for out-of-pocket expenses or may be up to several thousand dollars annually.
Generally, equity compensation for directors is given as stock options, though it can also be in other forms of stock, as discussed previously. The amount of stock may be between 0.25% and 2% of outstanding shares, or more, depending on the value of these members to the organization.

Box 3 The dotted line for all

These are some typical agreements that cover founders and employees, and they protect intellectual property (IP) assets and provide the assurances that are expected by any new investor in the company.

Confidential Disclosure Agreement or Nondisclosure Agreement. This protects the company by requiring that each employee appropriately handle confidential information. By doing this, the company protects its know-how and IP from competitors.

Invention Assignment Agreement. This transfers assignment of any and all new inventions conceived by the employee to the company. This ensures that the organization owns the IP required to develop and market its products. There are allowances given for inventions before hire.

Non-compete Agreement. This prevents an employee from quitting and starting an identical business in the same field using the same technology. It protects the company from disgruntled founders or key employees going out and starting a competitive business with the information they have been using in your company.

Employment Agreement. This contains any other provisions that constitute employment, especially for those who may be considered key employees; these provisions may be combined with the other agreements.

The SAB

The scientific advisory board (SAB) is called upon for advice and assistance in matters pertaining to the science. An SAB should be formed early and should be selected based on expertise and knowledge in the technology or science of the company; these individuals should be considered experts by their peers.

An SAB is not a legally constituted board, and its members do not have fiduciary responsibilities. For that matter, this group could be called a scientific advisory committee if preferred. The number of SAB members will vary, though three to seven is usually sufficient. Have your corporate attorney provide a thorough SAB agreement, which contains member duties, type of compensation, a confidential disclosure or nondisclosure agreement, and specifications about publications and inventions.

A secondary purpose of the SAB is to bolster credibility for your company’s science. Individuals considered experts in your field indirectly give credibility to the business venture and are reassuring to potential investors.

The SAB members should be willing to present reports on the scientific progress at conferences. Using SABs in this manner can also accelerate acceptance of the company’s work in the eyes of future investors. Having SAB members co-author peer-reviewed publications shows their involvement in and contribution to developing the science.

Like the board of directors, the SAB is typically compensated with either stock options or restricted stock. The amount of stock options granted varies depending on the company and the critical need of each individual. Ranges for stock options can include 0.1%–2% of outstanding shares. Ranges for restricted stock can be 0.1%–0.5% of outstanding shares. If your members are highly sought after, sometimes you may need to pay a per-meeting fee or nominal annual retainer to the SAB at early stages. However, it is not unusual to just provide equity and cover out-of-pocket expenses that members incur to attend SAB meetings.
After later-stage funding, you may add an annual retainer or a per-meeting fee when the finances of the company can support this.

Conclusions

The importance of a good attorney cannot be overstated. I have observed potential investors walk away from investing in an organization because of sloppy corporate structure, missing employment and IP agreements, or convoluted and overly complicated licensing agreements. Investors need to have confidence in the management’s ability to run an organization before they will invest.

You don’t want to learn later that the optimal route was not taken for your company’s development or that critical agreements were not drafted appropriately. Setting a solid legal framework with appropriate and detailed contracts, licenses and agreements gives new investors confidence and is a key first step to setting the foundation for your business’s future success.

To discuss the contents of this article, join the Bioentrepreneur forum on Nature Network: http://network.nature.com/groups/bioentrepreneur/forum/topics


Correspondence

Safe and effective synthetic biology

To the Editor:

A letter in your January issue highlights the need for harmonizing biosecurity oversight for gene synthesis (ref. 1). The US government is currently preparing to publish its final, formal ‘guidelines’ on the procedures at DNA synthesis companies for screening incoming orders for sequences of potential dual-use concern. As the research community continues to debate the promise and risks of synthetic biology, we report here discussions at two major synthetic biology conferences with important implications for safe and effective progress within the field.

The 2009 National Academies Keck Futures Initiative on Synthetic Biology (NAKFI-SB) took place in Irvine, California, on November 19–22 and convened more than 160 experts to explore the engineering, scientific and social impact of synthetic biology. Participants were asked to consider such basic questions as what tools and technologies are required to advance the field, why man-made biologic systems are more fragile than natural ones and how to create and improve intercellular communication. Discussions also covered risk assessments, the religious and ethical implications of synthetic biology and how best to leverage the technologies to explore other biological systems.

Although the primary focus of NAKFI-SB was to discuss future research and promote interdisciplinary cooperation, the significant inherent risks and potential bioethical implications of synthetic biology were recognized by attendees.
In terms ofrisk assessment, the NAKFI-SB discussionsfocused on the value of revisiting the selfexaminationand self-regulation imposedon early adopters of recombinant DNAtechnology at the Asilomar meeting 2 inlight of the increased complexity andambitious goals for synthetic biology.Attendees also recognized the need for a‘safety switch’ to disable undesirable ‘neoorganisms’(Table 1).A second meeting, convened by theAmerican Association for the Advancementof Science (AAAS) Center for Science,Technology and Security Policy on January 11in Washington, DC, at the request of the USDepartment of Health and Human Services(DHHS) and the US Department of State,focused on the government’s perspectiveon minimizing the risk of synthetic biologyand critiqued the recent DHHS draft set ofvoluntary guidelines entitled “ScreeningTable 1 Summary of deliberations at NAKFI-SB meetingQuestionWhat is needed tofacilitate syntheticbiology?What are the bioethicalconsiderations?Is synthetic biologyuseful as aninvestigativemodality?Is synthetic biologyuseful for multicellularsystems?How do we makesynthetic systems asstable as natural ones?Is synthetic biologyuseful for multiorganismsystems?Are there alternativesto using genes withinsynthetic biology?Is it important thatsynthetic biologicsystems ‘evolve’?What is required tofulfill the potential ofsynthetic biology?Framework Guidance for Synthetic Double-Stranded DNA Providers” released inNovember 2009 (ref. 
3).Comments were solicited fromrepresentatives of the US governmentagencies, gene-synthesis providerorganizations and the biotech andResponse• Integration of biological vocabulary within computer programming.• Improved analytical and design modeling.• Novel cellular monitoring techniques.• Improved screening technologies.• Enhanced cell lines to improve productivity.• Cheaper technology.• Techniques to create complex entities.• ‘Fail-safe’ systems.• A ‘kill switch’ for neo-organisms.• Synthetic biology is similar, but not identical, to other genetic engineeringtechniques.• Implications require regulatory oversight.• Novel ethical issues necessitate specific risk-benefit evaluation.• Ongoing public communication and input is vital.• Can be used to evaluate intracellular systems.• Would require advances in current technology, but that is expected.• A sharable library of results is essential but that requires standardization of acontext-sensitive archiving format.• Can be used to evaluate extra-cellular communication and integration.• Could create novel tissues, organs and complete organisms.• Integrate redundancy.• Increase adaptability.• Improve evaluation techniques.• Can evaluate inter-organism interaction.• Can search for unique genetic material.• Requires improved database administration.• Chemical and physical interactions can be used to modify biological reactions.• Unique nongenetic compounds can be developed to influence outcomes.• Alternative engineering techniques (e.g., application of computer design tools)will likely improve results.• Create novel methods for system interfaces and interactions (e.g., optical inputsand outputs).• Isolate created functions from natural processes (e.g., create syntheticorganelles or ‘subroutines’).• Provides adaptability.• Improved modeling would be valuable.• Need techniques to speed up process to be useful.• Enhanced education opportunities at all levels.• Improved and consistent public education and communication.1010 
volume 28 number 10 OCTOBER 2010 nature biotechnology


correspondencepharmaceutical industries as well asbiosecurity experts and academics industryplayers and other concerned parties (asummary of the meeting’s main themes canbe found elsewhere 4 and is summarized inTable 2). As expected from the diverse natureof the participants, some of the concernsraised were contradictory, but the conferencedeliberations were constructive in providingthe perspective of the major companiesinvolved in commercial gene synthesis andhighlighting perceived weaknesses within thecurrent strategy for verifying sequences ofpotential concern.The two conferences provided twocontrasting perspectives on the field.NAKFI-SB was a broad evaluation of thecurrent status of synthetic biology andthe final recommendations focused onmethods to advance the field. Besidesoutlining some technical improvementscurrently needed to improve productivity,the participants recognized the paramountimportance of public communication and oflay participation in regulation and oversightto address potential bioethical issues. Theyalso advocated specific technological steps toimprove the stability of engineered biologicalsystems, including enhanced redundancy andadaptability as characterized by a capacityto evolve to improve efficacy. In terms ofapplications, participants suggested thatsynthetic biology is likely to be employed inthe evaluation and synthesis of more complexbiological systems in the coming years and toprogress beyond using single genes to createmore complex gene circuits with mechanismsthat regulate these novel systems.As the AAAS meeting was convened tocomment on proposed US governmentalsafety regulations, the recommendations wereunderstandably narrower. The importance© 2010 Nature America, Inc. 
All rights reserved.

Table 2 Summary of deliberations at AAAS meeting 4

Theme
• DHHS guidance
• Customer screening
• Sequence search methodology
• Implementation and evaluation
• International engagement

Comments
• May inhibit competition and innovation.
• How will proprietary information be protected?
• No mechanism for ‘garage biology’ oversight.
• No mechanism for DNA providers to share customer information.
• No ongoing, updated database of entities prohibited from obtaining synthetic biology technology.
• DNA providers may refuse to fill orders for sequences that require additional expenses to participate in oversight programs.
• No oversight of synthesis providers to assure security and safety.
• Although the purchase of synthesis technology is a private transaction, there is a lack of an established appeal process for refused orders.
• No mechanism to determine who is the end user of technology.
• Costs associated with compliance may be prohibitory.
• Automated reviews of DNA sequences are inadequate.
• Screening against a list does not consider the possible context of use since ‘sequence does not necessarily predict function’.
• Innovation and discovery would be inhibited if orders are limited to previously described sequences.
• Mandatory reporting of DNA sequence orders may compromise proprietary information.
• Cannot identify sequences changed by end users.
• ‘Best match’ determinations that search for sequences that are more similar to harmful than nonharmful patterns are better than ‘thresholds’ but may be below current industry standards.
• Labeling a sequence as potentially ‘of concern’ does not determine actual harmful nature.
• Proprietary screening software is inadequate.
• 200 bp minimum size for sequence screening is inadequate.
• Success is determined by degree of implementation.
• The costs of implementation are minimal when compared with other costs of doing business.
• Regulatory compliance is difficult to determine.
• Voluntary compliance and cooperation is crucial to assure safety and security.

Recommendation
• Coordinate customer and sequence screening to assure safety and security across all DNA providers.
• Provide a mechanism to assure safety and security of synthetic biology technology providers.
• Enhance accountability of all aspects of synthetic biology including reporting and appeal mechanisms.
• Supply precise customer screening modalities and criteria to assure safety and security.
• Shift some compliance requirements from providers to customer institutions, including ‘Biosafety Committee–like’ review boards.
• Compile, review and update a database of approved customers and consider a licensing requirement to allow purchase of synthetic biology technology.
• Human review of all sequence orders.
• Compile, review and update a database of harmful sequences.
• Promote research to determine the fundamentals of harmful sequences and use this information for screening.
• Create and promote protocols for sequence screening ‘best practices’.
• Establish list of subject matter experts for each potentially harmful select agent.
• Screen each order against any potentially harmful sequence not just those on select agent and commercial control lists.
• Mandate the use of open-source screening software that is continuously updated.
• Screen all orders irrespective of sequence length.
• Ongoing, regular governmental communication and interaction with industry and research institutions is critical.
• Models of illegal and noncompliance methods should be used to evaluate screening modalities.
• Screening methods require continuous governmental and industry evaluations of effectiveness.
• Screening methods require ongoing evaluation of financial impact on industry.
• Effectiveness can be determined in part by the number of providers that claim compliance with regulations and by the number that perform follow-up screening.
• DNA providers should be certified.
• Coordinate and streamline international screening of sequences, customers and industry providers.
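The contrast drawn in Table 2 between ‘threshold’ and ‘best match’ screening can be illustrated with a toy sketch. It assumes a simple k-mer Jaccard similarity and invented reference lists; the function names, the cutoff value and the sequences are hypothetical and are not taken from the DHHS guidance, the AAAS report or any real screening software.

```python
def kmers(seq, k=8):
    """Return the set of overlapping k-mers in a DNA string."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=8):
    """Jaccard similarity of two sequences' k-mer sets (0.0 to 1.0)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def threshold_flag(order, harmful_refs, cutoff=0.8, k=8):
    """'Threshold' rule: flag only if some harmful reference is similar enough."""
    return any(similarity(order, ref, k) >= cutoff for ref in harmful_refs)


def best_match_flag(order, harmful_refs, benign_refs, k=8):
    """'Best match' rule: flag if the closest harmful reference beats the closest benign one."""
    best_harmful = max((similarity(order, r, k) for r in harmful_refs), default=0.0)
    best_benign = max((similarity(order, r, k) for r in benign_refs), default=0.0)
    return best_harmful > best_benign


# Invented toy data: one "harmful" and one "benign" reference, and an order
# that shares only its first half with the harmful reference.
harmful = ["ACGTACGTACGTACGT"]
benign = ["GGGGCCCCGGGGCCCC"]
order = "ACGTACGTTTTTTTTT"

print(threshold_flag(order, harmful, cutoff=0.8, k=4))   # False
print(best_match_flag(order, harmful, benign, k=4))      # True
```

With these toy inputs, the partially matching order falls under the strict cutoff but is still flagged by the best-match rule, because it resembles the harmful reference more than any benign one.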


of improved oversight along the entire chain of production within synthetic biology was emphasized. Increased oversight included improvements in customer and end-product screening modalities and greater cooperation between governments, industry and academics both within the US and elsewhere. Some of the AAAS participants noted that the increased financial burden required to comply with these regulations may impede private industry’s investment in the technology.

Discussions at both conferences recognized that the promise of synthetic biology is associated with the potential for significant harm. There is a need to prepare for malicious acts using purely synthetic or hybrid synthetic and/or natural neo-organisms. Additionally, strategies should be in place to predict and prevent such events and to trace the source of such materials should they surface. Current prevention efforts rely on voluntary participation in a software-based matching system that checks orders against select agent sequences to head off the commercial synthesis of select agent genes, but, as the AAAS report details 4 , that system could be improved.

In addition, it is imperative to identify a strong method to label synthetic genes so they can readily be identified as such. Unencrypted watermarks have already been reported in published sequences of synthetic genes (http://www.wired.com/wiredscience/2008/01/venter-institut/). Although such watermarks are feasible, currently there is a lack of regulatory controls against surreptitious insertions of sequence; synthetic genes can be tagged with DNA encoding natural amino acids, but the ability to remove, modify or even counterfeit such sequences using conventional molecular biology tools suggests that more robust strategies will be needed. One potential solution would be to create a ‘serial number’ that could be traced back to individual synthesis laboratories or even individual synthesis machines, and encoded into the synthetic gene using an appropriate combination of public-key and private-key hash algorithms.

Going forward, public-private cooperation will be vital for safe and effective progress within synthetic biology and to ensure that the field is not restrained by public fears. There must be a concerted effort to minimize the expense associated with regulatory compliance; however, the inherent risks of synthetic biology mandate rigorous oversight especially because the burdens of a major ‘accident’ will be borne by the public. The financial expenditures that companies synthesizing genes will have to bear to proactively reduce the risk of potential misuse of the technology are substantially less than the estimated costs to respond to a biological disaster. Safety must be designed into the system and not become a secondary concern.
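The ‘serial number’ idea can be made concrete with a minimal sketch. It rests on several assumptions: HMAC-SHA256 from the Python standard library stands in for the keyed-hash step (a true public-key signature would need a third-party library), bytes are mapped to bases at two bits per base, and the serial format, key and function names are all hypothetical. Biological constraints such as placement in the gene, mutation and sequence composition are ignored.

```python
import hashlib
import hmac

BASES = "ACGT"  # assumed mapping: 00 -> A, 01 -> C, 10 -> G, 11 -> T


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    return "".join(
        BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0)
    )


def dna_to_bytes(dna: str) -> bytes:
    """Inverse of bytes_to_dna; the length must be a multiple of four."""
    if len(dna) % 4:
        raise ValueError("DNA tag length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(dna), 4):
        byte = 0
        for base in dna[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def make_tag(serial: bytes, key: bytes, mac_len: int = 8) -> str:
    """Return serial plus truncated keyed hash, encoded as DNA."""
    mac = hmac.new(key, serial, hashlib.sha256).digest()[:mac_len]
    return bytes_to_dna(serial + mac)


def verify_tag(dna_tag: str, key: bytes, mac_len: int = 8) -> bool:
    """Check that the keyed hash in the tag matches its serial number."""
    raw = dna_to_bytes(dna_tag)
    serial, mac = raw[:-mac_len], raw[-mac_len:]
    expected = hmac.new(key, serial, hashlib.sha256).digest()[:mac_len]
    return hmac.compare_digest(mac, expected)


# Hypothetical serial for "laboratory 7, machine 03" and a demo key.
key = b"demo-key-held-by-the-provider"
tag = make_tag(b"LAB7-M03", key)
print(len(tag))              # 64 bases: (8 serial bytes + 8 MAC bytes) * 4
print(verify_tag(tag, key))  # True
```

Because the check value depends on a secret key, a tag that has been edited or forged without the key fails verification, which is one small way of building traceability into the system rather than adding it afterwards.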
In this respect, the attempt to shift the oversight burden from the gene manufacturers to their customers through the creation of institutional ‘biosafety review boards’ modeled after institutional animal care and use committees is likely to be problematic as it would further decentralize the review process and rely on committee structures that were not designed to preemptively detect hazardous modalities.

The AAAS 4 and NAKFI-SB 5 meetings were an excellent starting point for debate and we strongly recommend that the discussions be expanded and that the subsequent safety recommendations be implemented expeditiously.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

David A LaVan 1 & Louis M Marmon 2

1 Materials Science and Engineering Laboratory, National Institute of Standards and Technology, Gaithersburg, Maryland, USA. 2 Department of Surgery, Division of Thoracic and General Pediatric Surgery, Sheikh Zayed Institute for Pediatric Surgical Innovation, Children’s National Medical Center, George Washington University School of Medicine, Washington, DC, USA.
e-mail: david.lavan@nist.gov

1. Fischer, M. & Maurer, S.M. Nat. Biotechnol. 28, 20–22 (2010).
2. Berg, P. et al. Proc. Natl. Acad. Sci. USA 72, 1981–1984 (1975).
3. Department of Health and Human Services. Fed. Reg. 74, 62319–62327 (November 27, 2009).
4. Marfatia Berger, K., Pinard, W., Coat, G. & Epstein, G.L. Scientists’ Views on the U.S. Government’s Guidance on Synthetic Genomics (AAAS, Washington, DC, 2010). http://cstsp.aaas.org/files/syn%20bio%20summary%20012110.pdf
5. Synthetic Biology: Building on Nature’s Inspiration (The National Academies Press, Washington, DC, 2010).

The regulatory bottleneck for biotech specialty crops

To the Editor:
Specialty crops, which include fruits, vegetables, nuts, turf and ornamental crops, are important components of human diets and provide environmental amenities 1 . In 2007, such crops represented ~40% of the $140 billion in total agricultural receipts, despite being cultivated on just 4% of the total cropped area 2 . Although tomato was the first genetically modified (GM) food crop to be commercialized, in 1994, the only GM specialty crop traits currently marketed are virus-resistant papaya and squash, insect-resistant sweet corn and violet carnations. All of these received initial regulatory approval over 10 years ago. As a group, GM specialty crops have garnered limited market share (the exception is GM papaya resistant to papaya ringspot virus 1 , which now accounts for 90% of Hawaii’s crop). In contrast, GM field crops, such as soybean, maize, cotton and canola, have come to dominate the markets in countries where they have been released 3 . What is responsible for this disparity in the commercialization of GM field crops versus specialty crops?

One possibility is that the dearth of GM specialty crops indicates a lack of current research or of beneficial traits for crop improvement through genetic engineering. Alternatively, research may have continued but progression through the regulatory process to the marketplace may have failed. Anticipated lack of market acceptance could have stopped either research or regulatory submissions.
To find out why specialty crops with GM traits have fared so poorly, we have analyzed the research, regulatory and market pipeline to determine which steps in the process may be responsible for the limited range of commercially available products.

To assess the recent research and development pipeline for GM specialty crops, an extensive search was conducted on a global scale for scientific journal articles, describing work in specialty crops using recombinant DNA (transgenic) methods, published between January 2003 and October 2008 (Supplementary Table 1). In most cases, these reports demonstrate proof of concept of the effectiveness of the transgene in producing the phenotypic trait in the species studied. Among 313 published articles on specialty crops, 46 species were represented, of which tobacco, potato and tomato accounted for 59% of the total reports, in part due to their use as easily transformed model plants in research laboratories (Fig. 1a). Although the United States is the leader in the number of articles published, many reports originate from the European Union (EU; Brussels), India, Japan and China (Fig. 1b). Other plant biotech surveys also indicate that a number of GM specialty crops are being developed in China 4,5 .

[Fig. 1a. Y axis: number of journal articles. Crops shown: tobacco, potato, tomato, Indian mustard, papaya, cassava, apple, lettuce, peanut, pear, flax, eggplant, carrot, petunia, ryegrass, cabbage, bean, pineapple, field mustard, banana. Inset: tobacco 24%, potato 23%, tomato 20%.]

Following laboratory studies and proof of concept, development of GM crops generally proceeds to field trials. Since countries began establishing independent regulatory processes specifically for GM organisms in the early 1990s, thousands of field trial permits have been granted worldwide. The Organization for Economic Co-operation and Development (OECD; Paris) developed the UNU-MERIT field trial database, which collates GM trials that are ongoing in 24 developed countries, although data for China and India are not included (A. Arundel, OECD, personal communication). During the six-year period 2003–2008, the United States accounted for ~70% of all field trials, with 15% of the total field trials being conducted on specialty crops (Fig. 2a). The United States and Canada were responsible for 88% of the 1,231 permitted field trials on specialty crops, with the majority of the Canadian trials focused on mustard crops. The Information Systems for Biotechnology database (http://gophisb.biochem.vt.edu) was also queried to identify all approved field test permit applications in the United States between 1992 and October 2008. Field trials of specialty crops averaged 39% of the number in commodity crops from 1992 to 2002, but only 18% since 2003 (Fig. 2b).
Qualitative data on GM crops under development internationally confirm that although laboratory and field trials have been conducted on GM specialty crops in many countries, none has progressed to commercial production outside the United States, except perhaps virus-resistant tomato and pepper in China, the commercial status of which is currently uncertain 6,7 .

[Fig. 1b. Y axis: number of journal articles. Countries shown: United States, India, China, Japan, Italy, Germany, South Korea, Taiwan, Canada, England, France, Poland, New Zealand, Australia, Brazil, The Netherlands, Sweden, Spain, Argentina, Israel. Inset: United States 27%, India 10%, China 10%, Japan 8%.]

Figure 1 International scientific journal publications on transgenic crops. (a) Number of published articles describing research on the top 20 GM specialty crops (of 46 total species). The percentage of reports on each crop is also shown (inset). (b) Number of published articles according to country of origin. The percentage of total articles by country is also shown (inset). A complete list of all publications is in Supplementary Table 1.

To further evaluate the scope of research that has been conducted on GM specialty crops, we categorized the traits from scientific reports and field trials into two categories: output traits, which would directly benefit consumers; and input traits, which primarily benefit producers and only indirectly benefit consumers through reduced agricultural inputs, higher productivity, lower cost or reduced environmental impacts. This compilation identified 77 specialty crops (listed in Supplementary Table 2) and 260 unique traits (Supplementary Data and Supplementary Table 1). The output traits included modifications in oil, sugar and starch content, protein quality and amino acid composition, vitamin content and nutritional quality, flavor and postharvest quality as well as reduced allergenicity. Input traits included tolerance to abiotic and biotic stresses, insect and nematode resistance, herbicide tolerance, nitrogen acquisition and yield.
These data demonstrate that there is a broad global research pipeline for GM specialty crops using traits that would be beneficial to both producers and consumers.

Governmental approval is required before GM crops can be marketed. Since 1992, 24 governmental bodies have approved or deregulated a total of 84 unique plant and trait combinations (http://www.cera.gmc.org/). Regulatory approvals of GM specialty crops averaged 48% of the number in commodity crops from 1992 to 2002, but only 5% since 2003 (Fig. 2c). Although 21 approvals have been granted by all governmental bodies for nine specialty crops, only two have occurred since 2000. These two transgenic events are reduced nicotine content in tobacco and virus resistance in plum. The tobacco product was marketed briefly in the United States as an aid to smoking cessation, and the GM plum variety still awaits final approval from the US Environmental Protection Agency before it can be grown commercially.

The distribution of all regulatory approvals exhibits two distinct phases (Fig. 2c). Approvals initially peaked in 1995, followed by a decline to only one approval each in 2000 and 2001. The number of approvals then increased, albeit slowly, but only for commodity crops. A recent analysis shows that innovations in agbiotech were on an exponentially increasing trend during the 1990s, which then abruptly leveled off around 1998, with a decline in subsequent years 8 . Furthermore, new innovations entering the pipeline after 1998 were less likely to move toward commercialization. These patterns were attributed to a global change in regulatory and market policies toward GM crops, notably the moratorium on new approvals and therefore marketing in the EU beginning in 1998.
Our results indicate that in contrast to the pre-1998 era, only commodity crop developers were able to participate successfully in this new regulatory and market environment.

There are a number of possible reasons why GM specialty crops are not progressing past the research phase, and exploring these deserves further research. Previous analyses have documented that the $1–15 million in additional costs per insertion event associated with receiving regulatory approval 9,10 (which is not required for varieties developed using other breeding methods) are out of proportion to the potential additional market value that can be recovered on the limited areas devoted to these crops 11 . Similarly, a review on ornamental specialty crops concluded that although there is considerable technology available and valuable traits to be exploited, GM varieties are still unattractive from an economic perspective, primarily due to regulatory costs 9 .

Lack of demand or market rejection of GM specialty crops could also be the reason for their absence. This is undoubtedly the case in some countries and markets that unconditionally ban GM products, but the hypothesis is difficult to test, as until they receive regulatory approval, GM products are not available for consumers to accept or reject. For example, although Indian Minister for the Environment Jairam Ramesh cited a lack of public confidence when he recently blocked regulatory approval of insect-resistant GM brinjal (eggplant) 12 , his action precluded consumers from having the opportunity to demonstrate their preferences in the marketplace. Given the limited number of GM specialty crops that have received regulatory approval, consumer acceptance remains largely untested in the market. Our interviews with specialty crop seed companies and nurseries provide extensive anecdotal evidence that many potentially marketable GM products have been created and tested in the private sector, but the cost and uncertainty of the regulatory process have made further development uneconomical and prevented them from testing actual market acceptance.

The justification for requiring costly regulatory testing of GM plants is to ensure that potential risks are fully assessed before commercial release. Thus, it can be argued, if specialty crops cannot meet this standard economically, that is the price to be paid to eliminate risk. However, even virtually identical traits do not require such approval if developed using non-GM methods and no actual risks unique to the recombinant DNA process per se have been experienced with the GM crops currently marketed. On the other hand, the constriction in commercialization of GM traits has resulted in lost societal benefits due to foregone innovations that are estimated to be in the billions of dollars 10,13 . When GM crops could reduce environmental impacts or improve health and nutrition relative to current varieties (Supplementary Data), failure to use them also constitutes risks that generally are not considered in regulatory evaluations 14 .
Although research on GM specialty crops continues to explore a wide range of input and output applications, their commercialization may depend upon a reexamination of the balance between potential risks and foregone societal benefits and consequent adjustments in regulatory requirements.

Note: Supplementary information is available on the Nature Biotechnology website.

Acknowledgements
J.K.M. is partially funded through the UC Discovery Fellows program (http://ucdiscovery.org/). This study also received support from Specialty Crop Regulatory Assistance (http://www.specialtycropassistance.org/).

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

Jamie K Miller & Kent J Bradford
Seed Biotechnology Center, University of California, Davis, California, USA.
e-mail: kjbradford@ucdavis.edu

[Fig. 2a. Field trials by crop type: commodity crops 80%, specialty crops 15%, forest tree crops 5%. Specialty crop field trials by country: United States 69%, Canada 19%, Germany 3%, Spain 1%, Sweden 1%, The Netherlands 1%, Australia 1%, other 5%. Fig. 2b. Y axis: number of field trials. Fig. 2c. Y axis: number of new regulatory approvals. X axis (b, c): year, 1992–2008. Series: specialty crop, commodity crop.]

Figure 2 Field trials and regulatory approvals. (a) Using the UNU-MERIT database, field trials conducted in 24 developed countries between 2003 and 2008 were separated on the basis of commodity, forest tree or specialty crop. From this, the specialty crops were further subdivided based on the country in which the field trial was conducted. (b) The numbers of field trial permits acknowledged or issued in the United States are plotted by year for commodity crops and specialty crops. (c) The 84 unique transgenic events that have been granted regulatory approval by one or more countries are plotted by year of approval. If the year of approval varied among countries, the first year of regulatory approval granted by any agency for a given event was used.

1. Alston, J.M. & Pardey, P.G. Hortscience 43, 1461–1470 (2008).
2. USDA-NASS. Summary and State Data Vol. 1 (USDA-NASS, 2009).
3. Brookes, G. & Barfoot, P. AgBioForum 12, 184–208 (2009).
4. Huang, J.K. et al. Science 295, 674–677 (2002).
5. Wang, D.P. J. Integrative Plant Biol. 49, 1281–1283 (2007).
6. Evenson, R.E. in Regulating Agricultural Biotechnology: Economics and Policy (eds. Just, R.E., Alston, J.M. & Zilberman, D.) 103–123 (Springer Publishers, New York, 2006).
7. Stein, A.J. & Rodriguez-Cerezo, E. The Global Pipeline of New GM crops: Implications of Asynchronous approval for International Trade. (Joint Research Center Scientific and Technical Reports, Brussels, 2009).
8. Graff, G.D., Zilberman, D. & Bennett, A.B. Nat. Biotechnol. 27, 702–704 (2009).
9. Dobres, M.S. in Floriculture, Ornamental and Plant Biotechnology, Vol. V (ed. Teixeira da Silva, J.A.) 1–14 (Global Science Books, 2008).
10. Kalaitzandonakes, N., Alston, J.M. & Bradford, K.J. Nat. Biotechnol. 25, 509–511 (2007).
11. Bradford, K.J., Alston, J.M. & Kalaitzandonakes, N. in Regulating Agricultural Biotechnology: Economics and Policy (eds. Just, R.E., Alston, J.M. & Zilberman, D.) 683–697 (Springer Publishers, New York, 2006).
12. Bagla, P. Science 327, 767–767 (2010).
13. Graff, G.D., Hochman, G. & Zilberman, D. AgBioForum 12, 34–36 (2009).
14. Potrykus, I. Nature 466, 561–561 (2010).


ProHits: integrated software for mass spectrometry–based interaction proteomics

To the Editor:
Affinity purification coupled with mass spectrometric identification (AP-MS) is now a method of choice for charting novel protein-protein interactions and has been applied to a large number of both small-scale and high-throughput studies 1 . However, general and intuitive computational tools for sample tracking, AP-MS data analysis and annotation have not kept pace with rapid methodological and instrument improvements. To address this need, we have developed the ProHits laboratory information management system platform.

ProHits is a complete open source software solution for MS-based interaction proteomics that manages the entire pipeline from raw MS data files to fully annotated protein-protein interaction data sets. It was designed to provide an intuitive user interface from the biologist’s perspective and can accommodate multiple instruments within a facility, multiple user groups, multiple laboratory locations and any number of parallel projects. ProHits can manage all project scales and supports common experimental pipelines, including those using gel-based separation, gel-free analysis and multidimensional protein or peptide separation.

This software platform is a client-based HTML program written in PHP (PHP: Hypertext Preprocessor) that runs a MySQL database on a dedicated server. The complete ProHits software solution consists of two main components: a ‘Data Management’ module, and an ‘Analyst’ module (Fig. 1a; see Supplementary Fig. 1 for data structure tables). These modules are supported by an ‘Admin Office’ module, in which projects, instruments, user permissions and protein databases are managed (Supplementary Fig. 2).
A simplified version of the software suite (‘ProHits Lite’), consisting only of the Analyst module and Admin Office, is also available for users with preexisting data management solutions or who receive precomputed search results from analyses performed in a core MS facility (Supplementary Fig. 3). A step-by-step installation package, installation guide and user manual (Supplementary Data) are available on the ProHits website (http://www.prohitsMS.com/).

In the Data Management module, raw data from all mass spectrometers in a facility or user group are copied to a single secure storage location in a scheduled manner. Data are organized in an instrument-specific manner, with folder and file organization mirroring the organization on the acquisition computer. ProHits also assigns unique identifiers to each folder and file. Log files and visual indicators of current connection status assist in monitoring the entire system. The Data Management module monitors the use of each instrument for reporting purposes (Supplementary Figs. 4 and 5). Raw MS files can be automatically converted to appropriate file formats using the open source ProteoWizard converters (http://proteowizard.sourceforge.net/). Converted files may be subjected to manual or automated database searches, followed by statistical analysis of the search results, according to any user-defined schedule; search engine parameters are also recorded to facilitate reporting and compliance with MIAPE (Minimum Information about a Proteomics Experiment) guidelines 2 . Mascot 3 , X!Tandem 4 and the Trans-Proteomic Pipeline (TPP 5 ) are fully integrated with ProHits via linked search engine servers (Supplementary Figs. 6 and 7).

The Analyst module organizes data by project, bait, experiment and/or sample, for gel-based or gel-free approaches (Fig. 1a; for description of a gel-based project, see Supplementary Fig. 8). To create and analyze a gel-free affinity purification sample, the user specifies the bait gene name and species.
ProHits automatically retrieves the amino acid sequence and other annotation from its associated database. Bait annotation may then be modified as necessary, for example, to specify the presence of an epitope tag or mutation (Supplementary Fig. 9). A comprehensive annotation page tracks experimental details (Supplementary Fig. 10), including descriptions of the Sample, Affinity Purification protocol, Peptide Preparation methodology and liquid chromatography-tandem MS (LC-MS/MS) procedures. Controlled vocabulary lists for experimental descriptions can be added by drop-down menus to facilitate compliance with annotation guidelines, such as MIAPE 6 and MIMIx (Minimum Information about a Molecular Interaction Experiment) 7 , and to facilitate the organization and retrieval of data files. Free text notes for cross-referencing laboratory notebook pages, adding experimental details not captured in other sections, describing deviations from reference protocols and links to gel images or other file types may be added in the ‘Experimental Detail’ page. Once an experiment is created, multiple samples may be linked to it (e.g., technical replicates of the same sample or chromatographic fractions derived from the same preparation). All baits, experiments, samples and protocols are assigned unique identifiers.

Once a sample is created, it is linked to both the relevant raw files and database search results. For multiple samples in high-throughput projects, automatic sample annotation may be established by using a standardized file-naming system (Supplementary Fig. 11) or files may be manually linked. Alternatively, search results obtained outside of ProHits (with the X!Tandem or Mascot search engines) can be manually imported into the Analyst module (Supplementary Fig. 12).
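How a standardized file name might drive automatic sample annotation can be sketched as follows. The naming pattern used here (project, bait, experiment and sample separated by underscores) is purely hypothetical; the actual ProHits convention is defined in Supplementary Fig. 11 and is not reproduced in this text. The sketch is in Python for illustration only, whereas ProHits itself is written in PHP.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical convention: <project>_<bait>_<experiment>_<sample>.raw
FILENAME_PATTERN = re.compile(
    r"^(?P<project>[A-Za-z0-9]+)_(?P<bait>[A-Za-z0-9]+)_"
    r"(?P<experiment>[A-Za-z0-9]+)_(?P<sample>[A-Za-z0-9]+)\.raw$",
    re.IGNORECASE,
)


@dataclass(frozen=True)
class SampleLink:
    """Where a raw file belongs in the project/bait/experiment/sample hierarchy."""
    project: str
    bait: str
    experiment: str
    sample: str
    filename: str


def link_file(filename: str) -> Optional[SampleLink]:
    """Parse a raw-file name; return None if it needs manual linking."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        return None
    return SampleLink(
        project=match["project"],
        bait=match["bait"],
        experiment=match["experiment"],
        sample=match["sample"],
        filename=filename,
    )


print(link_file("P12_WASL_E3_S1.raw"))
print(link_file("run_2010-10-01.raw"))  # None: does not follow the convention
```

Files that do not follow the convention return `None`, mirroring the fallback to manual linking described in the text.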
The ProHits Lite version enables uploading of external search results for users with an established MS data management system.

In the Analyst module, MS data can be explored in an intuitive manner, and results from individual samples, experiments or baits can be viewed and filtered (Supplementary Figs. 13 and 14). A user interface enables alignment of data from multiple baits or MS analyses using the ‘Comparison’ viewing tool. Data from individual MS runs, or derived from any user-defined sample group, are selected for visualization in a tabular format, for side-by-side comparisons (Fig. 1b and Supplementary Figs. 15–17). In the Comparison view, control groups and individual baits, experiments or samples are displayed by column. Proteins identified in each MS run or group of runs are displayed by row, and each cell corresponds to a putative protein hit, according to user-specified database search score cutoff. Cells display spectral count number, unique peptides, scores from search engines and/or protein coverage information; a mouse-over function reveals all associated data for each cell in the table. For each protein displayed in the Comparison view, an associated ‘Peptide’ link (Fig. 1b) may also be selected to reveal information such as sequence, location, spectral counts and score, for each associated peptide. Importantly, all search results can be filtered. For example, ProHits allows the removal of nonspecific background proteins from the hit list, as defined by negative controls, search engine score thresholds or contaminant lists. Links to the external US National Center for Biotechnology Information (NCBI) and the Biological General Repository for Interaction Datasets (BioGRID) 8 databases are provided for each hit to facilitate data interpretation. Overlap with published interaction data housed in the BioGRID database 8 can be displayed to allow immediate identification of new interaction partners. A flexible export function enables visualization in a graphical format with Cytoscape 9 , in which spectral counts, unique peptides and search engine scores can be visualized as interaction edge attributes. The Analyst module also includes advanced search functions, bulk export functions for filtered or unfiltered data, and management of experimental protocols and background lists (Supplementary Figs. 18–20).

Deposition of all MS-associated data in public repositories is likely to become mandatory for publication of proteomics experiments 2,7,10 . Open access to raw files is essential for data reanalysis and cross-platform comparison; however, data submission to public repositories can be laborious due to strict formatting requirements.
ProHits facilitates extraction of the necessary details in compliance with current standards and generates Proteomics Standards Initiative (PSI) v2.5 compliant reports 11 , either in the MITAB (PSI-MI tab-delimited) format for BioGRID 8 or in XML format for submission to International Molecular Exchange (IMEx) consortium databases 12 , including IntAct 13 (Supplementary Fig. 21). MS raw files associated with a given project can also be easily retrieved and grouped for submission to data repositories, such as Tranche 14 .

ProHits was developed to manage many large-scale in-house projects, including a systematic analysis of kinase and phosphatase interactions in yeast, consisting of 986 affinity purifications 15 . Smaller-scale projects from individual laboratories are readily handled in a similar manner. Examples of AP-MS data from both yeast and mammalian projects are provided in a demonstration version of ProHits (http://www.prohitsMS.com/) and in Supplementary Data.

The modular architecture of ProHits will accommodate additional new features, as dictated by future experimental and analytical needs. Although ProHits has been designed to handle protein interaction data, simple modifications of the open source code will enable straightforward adaptation to other proteomics workflows.

Figure 1 Overview of ProHits. (a) Modular organization of ProHits. The Data Management module backs up all raw MS data from acquisition computers and handles data conversion and database searches. The Analyst module organizes data by project, bait, experiment and sample (gel-free project shown; see Supplementary Fig. 8 for gel-based organization). Search results from the Data Management module are parsed to individual samples defined within the Analyst module. ProHits can handle large collaborative projects and offers several security layers. In the Analyst module, several view, filter and export functions enable data analysis. Functions provided by external software are listed on the right. (b) ProHits Comparison page. On the left are shown filtered Comparison results for four human baits and one negative control (see Supplementary Fig. 17 for unfiltered data). Display, sort, filter and literature overlap options are listed on the top; selected options in this example are shown in red. Filtered results are displayed at the bottom of the page. Columns represent individual baits. Comparison at the Experiment or Sample levels is also possible. Rows list the hits that pass selected filters. Color coding and intensity in each cell is based on the property selected for visualization, shown for this example as total peptide numbers; mouseovers of each cell will list all properties. A star or triangle inside the cell indicates an interaction identified in previous high-throughput (star) or low-throughput (triangle) studies in BioGRID. Each term in the hits column is hyperlinked to external databases (EntrezGene, BioGRID or NCBI Protein) or to the list of identified peptides. The top right shows the visualization of data in Cytoscape with MS information encoded as an edge attribute. Interactions detected for the example bait protein WASL that are not reported in BioGRID are shown as blue edges with color intensity mapped to spectral counts and thickness mapped to number of unique peptides; overlap interactions detected in both the experiment and in BioGRID are shown in green; interactions detected only in BioGRID are shown in gray. At the bottom right is an example of the Peptide view for the protein WIPF3 in the WASL AP-MS experiment.

Note: Supplementary information is available on the Nature Biotechnology website.

Author contributions
G.L. and J.Z. devised and coded all aspects of the platform; C.S. and B.-J.B. implemented protein annotation and provided advice on database architecture; Y.D. wrote the Mascot parser; B.L., A.B., Z.-Y.L., K.C., A.P., A.I.N., T.P., J.L.W. and B.R. provided suggestions on software features; M.T. conceived and guided the project; A.-C.G., B.R. and G.L. wrote the instruction manuals; M.T. and A.-C.G. co-directed project development; A.-C.G. wrote the manuscript with input from B.R. and M.T.

Acknowledgments
We thank G. Bader, H. Hermjakob, S. Orchard, J.A. Vizcaíno, C. Le Roy, R. Beavis and members of the Tyers and Gingras laboratories for helpful discussions. We are grateful to D. Figeys, S. Angers, D. Fermin, T. LeBihan, F. Ellisma, C. Poitras and B. Coulombe for testing beta versions of ProHits. We thank W. Dunham, E. Deutsch, D. Fermin, T. Glatter, M. Goudreault, L. D’Ambrosio and R. Ewing for critical reading of the manuscript and instruction manual and L. Ng, J. Wei and N. Mohammad for IT support. Supported by grants from the CIHR (MOP-84314 to A.-C.G., MOP-12246 to M.T., MOP-81268 to B.R., GSP-36651 to T.P., J.L.W. and M.T., FRN 82940 to M.T. and a resource grant to T.P., A.-C.G., J.L.W. and M.T.), the NIH (5R01RR024031 to M.T., 1R01GM094231-01 to A.I.N. and A.-C.G., and CA-126239 to A.I.N.), MRI-ORF (T.P., J.L.W.
and A.-C.G.), the Canada Foundation for Innovation (T.P., J.L.W., A.-C.G. and M.T.), and Genome Canada through Ontario Genomics Institute (T.P. and J.L.W.). We wish to acknowledge support from the Mount Sinai Hospital Foundation; Canada Research Chairs in Functional Genomics and Bioinformatics to M.T., in Proteomics and Molecular Medicine to B.R., and in Functional Proteomics to A.-C.G.; the Lea Reichmann Chair in Cancer Proteomics to A.-C.G. and a Scottish Universities Life Sciences Alliance Research Professorship and a Royal Society Wolfson Research Merit Award to M.T.
COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.
Guomin Liu¹, Jianping Zhang¹, Brett Larsen¹, Chris Stark¹, Ashton Breitkreutz¹, Zhen-Yuan Lin¹, Bobby-Joe Breitkreutz¹, Yongmei Ding¹, Karen Colwill¹, Adrian Pasculescu¹, Tony Pawson¹,², Jeffrey L Wrana¹,², Alexey I Nesvizhskii³, Brian Raught⁴, Mike Tyers¹,²,⁵ & Anne-Claude Gingras¹,²
¹Centre for Systems Biology, Samuel Lunenfeld Research Institute, Toronto, Ontario, Canada. ²Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada. ³Departments of Pathology and Center for Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA. ⁴Ontario Cancer Institute and McLaughlin Centre for Molecular Medicine, Toronto, Ontario, Canada. ⁵Wellcome Trust Centre for Cell Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, UK.
e-mail: m.tyers@ed.ac.uk or gingras@lunenfeld.ca
1. Gingras, A.C., Gstaiger, M., Raught, B. & Aebersold, R. Nat. Rev. Mol. Cell Biol. 8, 645–654 (2007).
2. Taylor, C.F. et al. (MIAPE). Nat. Biotechnol. 25, 887–893 (2007).
3. Perkins, D.N., Pappin, D.J., Creasy, D.M. & Cottrell, J.S. Electrophoresis 20, 3551–3567 (1999).
4. Craig, R. & Beavis, R.C. Bioinformatics 20, 1466–1467 (2004).
5. Keller, A., Eng, J., Zhang, N., Li, X.J. & Aebersold, R. Mol. Syst. Biol. 1, 2005.0017 (2005).
6. Ewing, R.M. et al. Mol. Syst. Biol. 3, 89 (2007).
7. Orchard, S. et al. Nat. Biotechnol. 25, 894–898 (2007).
8. Breitkreutz, B.J. et al. Nucleic Acids Res. 36, D637–640 (2008).
9. Shannon, P. et al. Genome Res. 13, 2498–2504 (2003).
10. Cottingham, K. J. Proteome Res. 8, 4887–4888 (2009).
11. Hermjakob, H. et al. Nat. Biotechnol. 22, 177–183 (2004).
12. Orchard, S. et al. Proteomics 7 Suppl 1, 28–34 (2007).
13. Kerrien, S. et al. Nucleic Acids Res. 35, D561–565 (2007).
14. Falkner, J.A., Hill, J.A. & Andrews, P.C. Proteomics 8, 1756–1757 (2008).
15. Breitkreutz, A. et al. Science 328, 1043–1046 (2010).

More sizzle than fizzle
To the Editor:
In an echo of Mark Twain’s response when reading his own published obituary that “The report of my death has been exaggerated,” I should like to correct an inaccuracy about GlaxoSmithKline’s (Brentford, UK) EpiNova Discovery Performance Unit (DPU), which was mentioned in Catherine Shaffer’s news article entitled “Pfizer explores rare disease path” from the September issue.
The article suggested that EpiNova had “fizzled out.” As vice president and head of the EpiNova DPU, I can confirm that, on the contrary, this early drug discovery unit continues to research and build alliances in our search to apply the knowledge of epigenetics to the quest for new medicines for patients. In fact, with our own first-class science, innovation and entrepreneurial spirit, and our external alliances with leading epigenetics research units, including the biotech company Cellzome (Heidelberg, Germany), and Cambridge (UK), Harvard (Cambridge, MA), Oxford (UK) and Rockefeller (New York) universities, we are in glowing health.
Many of your readers who have heard our presentations at the American Chemical Society meeting this August in Boston, as well as Miptec 2010 and the Society for Medicines Research Epigenetics Meeting held last month in Basel and London, respectively, will already know this. We are also sponsoring the “Epigenetics of Chromatin Modifications in Inflammation” meeting, taking place in Oxford this December.
We look forward to building on the excellent collaborations that are already in place and advancing our work in the area of immune-inflammation long into the future, with the aim of bringing out new innovative medicines for immuno-inflammatory diseases.
COMPETING FINANCIAL INTERESTS
The author declares competing financial interests: details accompany the full-text HTML version of the paper at http://www.nature.com/naturebiotechnology/
Kevin Lee
EpiNova DPU, iiCEDD, GlaxoSmithKline, Stevenage, UK.
e-mail: Kevin.2.lee@gsk.com


commentary
case study
The path less costly
Brady Huggett
When faced with a competitive threat, two companies took diametrically opposite approaches. Both were ultimately successful, but Genzyme’s decision proved to be the cleaner and cheaper option.
As the world’s leader in developing enzyme-replacement drugs, Genzyme has always understood the importance of first to market. In 1991, the company obtained approval for Ceredase (alglucerase injection), an enzyme replacement therapy for lysosomal storage disease (LSD) type 1 Gaucher. Three years later, its second-generation product, Cerezyme (imiglucerase for injection), was also cleared for commercialization. The lack of treatments for lysosomal storage diseases and effective patient outreach and marketing meant that Genzyme could command soaring prices for its orphan treatments. In 2000, the two drugs alone provided 66% of Genzyme’s entire product revenue.
At this time, Novazyme Pharmaceuticals was a young company developing a preclinical product for another lysosomal storage disease called Pompe. Its patented phosphotransferase technology was designed to add mannose-6 phosphate and N-acetylglucosamine sugars to recombinant β-glucocerebrosidase produced in Chinese hamster ovary (CHO) cells, thus increasing uptake of the enzyme in the patient. The idea was that less drug would be needed per patient, and any eventual product would be priced lower than competitors, although approval lay some years away.
By June 2001, however, the company was in financial trouble: it had about $5.5 million in cash and equivalents, with no incoming revenue and a six-month burn of $11.9 million. The company needed help if it was to handle the expense of clinical trials.
There was interest from Genentech, which offered a collaboration, including $30 million upfront, milestone payments and 50% of any sales. But Genzyme, which had its own product in development for Pompe, wanted more than a partnership.
It offered to buy the company outright for $137.5 million, pegging another $87.5 million on milestones surrounding approval of one or more drugs that incorporated Novazyme’s platform technology.
Genzyme was already conducting an extension of a phase 2 trial with transgenically derived human alpha-glucosidase for Pompe disease, and also a phase 2 trial of a CHO cell–derived alpha-glucosidase product in-licensed from Synpac, of Research Triangle Park, North Carolina, for the same indication. It also had its own, internally developed compound for Pompe. By taking Novazyme’s drug in house, Genzyme could line it up against its three candidates in a massive preclinical trial and move the best one forward based on results.
Ultimately, the best replacement protein turned out not to be Novazyme’s drug; worse still, the data surrounding the technology platform proved to be nonreproducible. It seemed the buyout was a bust all-round.
But consider another company’s dance with a potential competitor: Genentech of S. San Francisco, Calif. and its twisting history with Tanox and Novartis of Basel. Houston-based Tanox was founded in 1986 to focus on anti-IgE antibodies and by 1989 was looking for a clinical development partner; it sent samples of its candidate to both Genentech and Ciba Geigy (the company that would later become Novartis). Genentech passed. Ciba Geigy, however, began working with Tanox on anti-IgE antibodies for allergic diseases.
Brady Huggett is Business Editor at Nature Biotechnology.
Yet Genentech clearly had interest in the area, because it began its own anti-IgE program a few years later, a move that prompted a misappropriation suit from Tanox. The companies fought in court for three years before Genentech, Tanox and Novartis reached a settlement and entered a cross-licensing agreement for anti-IgE antibodies.
That was hardly the end for Genentech.
After the three companies surveyed their collective R&D efforts and pushed Genentech’s anti-IgE product (eventually called Xolair; omalizumab) to the front, Tanox took it upon itself to develop its discarded anti-IgE antibody (TNX-901) for peanut allergy, thinking that well within the rights of the original three-way contract. Tanox’s partners did not see it the same, and the trio went back to court to settle matters, with Tanox eventually receiving $6.6 million in 2004 with the stipulation it let TNX-901 die.
Nor was this the end. Xolair was approved in 2003 for asthma, and has gone on to sell well: it approached blockbuster status in 2009 and through last year has sold about $3.3 billion worldwide. Tanox had a slice of sales through the cross-licensing agreement, which became the impetus for Genentech to approach Tanox in December 2005 with a merger offer. The deal was announced in November 2006 at $919 million, a 46% premium to Tanox’s stock price pre-announcement. One can only wonder what the merger would have cost Genentech 16 years before, when Tanox first came knocking. Instead, Genentech rebuffed that initial proposition and went to court with Tanox twice before finally swallowing it in a move that had more to do with accounting than science or patients.
In either of these mergers, the gem compound eventually died, hardly a surprise given the rates of attrition in drug pipelines. At the same time, both of these buyouts could technically be called a success.
Bringing aboard Novazyme and its product temporarily muddied the waters for Genzyme, but it knew that whatever eventually floated to the top, it would own. The acquisition crisply and cleanly removed a noisy competitor, leaving Genzyme to focus on approval: Myozyme was cleared in 2006 and has since surpassed $1 billion in worldwide sales. The numbers worked out fine for Genentech, too (the Xolair revenue slice pays for the price of Tanox), but the company wasted resources and money in court and ended up paying more than it might have.
It also received negative press for its court fight over Tanox’s attempts to develop a peanut allergy drug. Crisp and clean this was not.


patents
Faculty and employee ownership of inventions in Australia
Amanda McBratney & Julie-Anne Tarr
A recent Australian legal decision means that, unless faculty members are bound by an assignment or intellectual property policy, they may own inventions resulting from their research.
Thirty years after its introduction, the US Bayh-Dole Act, which vests ownership of employee inventions in the employer university or research organization, has become a model for commercialization around the world. In Australia, despite recommendations that a Bayh-Dole–style regime be adopted, the recent decision in University of Western Australia (UWA) v. Gray¹ has moved the default legal position in a diametrically opposite direction. A key focus of the debate was whether faculty’s duty to carry out research also encompasses a duty to invent. Late last year, the Full Federal Court confirmed a lower court ruling that it does not, and this year the High Court refused leave to appeal (denied certiorari). Thus, Gray stands as Australia’s most faculty-friendly authority to date.
The US common-law position
Absent an express written agreement assigning the rights in faculty inventions to a university, or a legally binding faculty handbook or university intellectual property (IP) policy, the common-law position on ownership of faculty and other employees’ inventions remains unclear. The usual starting point is the US Supreme Court’s decision in Standard Parts Co. v. Peck², which states that an employer owns employees’ inventions if the employee was ‘hired to invent’. Under the later United States v.
Dubilier Condenser case³, if the employment was merely ‘general’, then even if the invention is in the employee’s field, and relevant to the employer’s business, and even if it was developed on the employer’s time and/or with the employer’s resources, the employee owns it. The employer gets a shop right as compensation: an irrevocable, royalty-free, nonexclusive, largely nontransferable, implied license to use the invention. If the employment was general and no employer time or resources were used, the employee owns all rights unencumbered.
Amanda McBratney and Julie-Anne Tarr are in the Faculty of Business, Queensland University of Technology, Brisbane, Queensland, Australia. Amanda McBratney is also a consultant with McCullough Robertson Lawyers, Brisbane, Queensland, Australia.
e-mail: amanda.mcbratney@qut.edu.au
[Image: The Australian Federal Court building in Melbourne. Source: Creative Commons/Adz]
Whether an employee was hired to invent often depends on the specificity of the task delegated by the employer. Generally, being engaged to do research or improve products is insufficient: if courts were too ready to vest ownership in the employer, inventive creativity might be discouraged⁴. Thus, the employee must be hired to invent the invention at issue; if hired to invent A, they will not lose the right to invention B.
In the university context, despite increasing commercialization, most researchers are still not explicitly hired to invent for the university’s commercial gain. The core of ‘public good’ still lingers, and the pursuit of knowledge for its own sake has yet to disappear. The difficulty lies in deciding whether a faculty member’s duty to research encompasses a duty to invent. Those arguing in the positive invariably cite Speck v.
North Carolina Dairy Foundation⁵. Speck, a professor and researcher at North Carolina State University, developed a process for producing a sweet-tasting acidophilus milk. He successfully drove efforts to have the milk mass produced, but the university refused to pay him anything out of its licensing royalties. Speck sued, and the case turned on whether he could show he had a property interest in the process. The North Carolina Supreme Court found that Speck was hired to invent, so he had no rights in the process. Proponents of Speck usually argue that the Houghton v. United States case lends further substance to the proposition that researchers are hired to invent:
Let a case be supposed of a charitable foundation, which employs chemists and physicians to study diseases, with a view of discovering a cure for them, one of whose employees, in the course of experiments conducted for it, discovers a remedy which it is seeking, and for the discovery of which the experiments are conducted, and procures a patent on it. Should such employee be allowed to withhold the patent from the foundation for his own profit, merely because the foundation does not desire to monopolize the remedy but to give the benefit of the discovery to mankind?…To ask such a question is to answer it…⁶.
However, the Houghton case involved a chemist expressly directed to develop a particular fumigant, and the court was merely dispelling the argument that the hired-to-invent doctrine would not apply if the employer were uninterested in patenting. In addition, many commentators have criticized Speck, particularly because the court considered Speck’s use


of the university’s time and resources as a factor indicating he was hired to invent, rather than, as settled case law indicates, a factor indicating a shop right should be awarded. So, although the university ‘permitted and encouraged’ the research, there was no evidence Speck’s research agenda was controlled by his department or the university; it was ‘motivated simply by his scientific curiosity’⁷. Some have lamented the policy signals Speck sends: if faculty are hired to invent, how does this sit with the usual notions of academic freedom⁸? This is the precise question the Australian courts grappled with, as discussed below.
The position under Bayh-Dole
Given the uncertainty in US common law, the passage of the Bayh-Dole Act in 1980 was a welcome (for some) introduction of an overriding standard for determining ownership of inventions created with the use of federal research funds. The Act aims to promote the commercialization of inventions by allowing universities and other recipients of federal research funds to elect to take title to subject inventions if they agree to file a timely patent application. The university must then retain title to the inventions and share licensing proceeds with the employee inventors, and the balance of licensing income must be used to support scientific research or education.
Against the backdrop of Bayh-Dole, most universities have in recent times adopted a dual approach, by issuing IP policies that vest ownership of faculty inventions in the university, and by requiring that faculty sign invention assignment agreements (either as a condition of employment or as a belated attempt to stitch up the ownership question). A fundamental tenet of contract law is that parties are free to enter bargains as they see fit. Consequently, where faculty assign inventions, it is assumed that this has been compensated by the payment of wages. So courts have upheld assignments over objections they are unconscionable, coerced under duress, where the university has paid as little as $1, or where continued employment is the only consideration⁹.
However, although the arrangement under Bayh-Dole has recently garnered support¹⁰, there is a growing body of agitators for faculty ownership. In an early article, Chew¹¹ argues that, in fact, university ownership is not legally required by Bayh-Dole. Faculty ownership would allow universities to maintain their academic mission and primacy of basic research. It would also enhance faculty creativity. Universities, Chew argues, would not lose significant revenue because royalty income represents a relatively minor part of most university funding. In any event, she suggests that university technology transfer offices (TTOs) could negotiate a percentage of royalty income in return for marketing faculty inventions, and that this could offset losses incurred. Similar arrangements are proposed by Kulkarni¹² and Smith¹³.
More recently, Clements uses law and economics to argue that Bayh-Dole’s impact has been marginalized by the practicalities of implementation: refusals to disclose inventions, difficulties of technology transfer due to information asymmetries and deadweight losses caused by exclusive licenses. He argues that faculty ownership could be achieved by amendments to the Act and, possibly, by courts’ refusals to uphold assignment agreements. Like Chew, Clements argues that universities would not forego significant revenue because universities’ income from licensing is proportionately small compared to other revenue sources. It would remove at least part of the perceived issue with publication delay and secrecy because many ‘archetypal academic scientists’ who ‘hate the Act’ would choose to publish rather than patent. It would also reverse the effect of the Act ‘steer[ing] research down less interesting avenues’ because faculty members who chose to publish rather than patent would channel their efforts toward more basic research¹⁴.
Kenney and Patton similarly argue that university ownership is not optimal in terms of either maximizing economic efficiency or advancing the social interest of rapidly commercializing technology and encouraging entrepreneurship. Under faculty ownership, the inventors would be the principals and could choose their agent TTO; TTOs would thus be forced to become more competitive. Faculty ownership would also shrink the gray market in faculty inventions: in one study, over 20% of professors had founded firms without university licenses; in another, 42% of professors who patented did so without informing their TTOs¹⁵.
Nevertheless, Kenney and Patton candidly acknowledge that faculty ownership brings its own problems, including the obvious rejoinder to Clements that, in fact, problems with secrecy and nondisclosure may be exacerbated by the faculty’s increased stake in the rewards of patenting. Further, there is a risk that some inventions would not be commercialized, and that some inventors may be incompetent at commercialization, but they maintain that this decentralized ineptitude is better than the present centralized ineptitude of many TTOs that affects all university inventors.
However, Kenney and Patton also point to a significant detriment of faculty ownership for which they offer no remedy: it could discourage collaborative research, particularly large-scale multi-institutional collaborative research, because the large number of co-owners makes logistics extremely difficult.
As Bruun explains, the fragmented ownership problem faced by large-scale collaborative projects was one of the main reasons that Finland followed other European countries and abolished the ‘teacher exception’ (faculty ownership) in 2007 and adopted Bayh-Dole–like reforms¹⁶.
For every commentator applauding Bayh-Dole, there are others now querying its efficacy. It does not look like the debate will be settled any time soon. In the meantime, proselytizing aside, for most US academics it’s business as usual: contractual arrangements are likely to be binding and will decide the ownership issue in the university’s favor.
The UWA v. Gray case
In Australia, employers derive ownership rights under §15(1)(b) of the Patents Act. This states that employers “would, on the grant of a patent…be entitled to have the patent assigned” to them. Entitlement to assignment is determined by a common-law principle established by the English case of Sterling Engineering v. Patchett¹⁷, which (like the US Supreme Court’s decisions in Standard Parts and Dubilier) dictates that the employer will own when an employee, in the course of employment, makes an invention that it was his or her duty to make.
The question of whether Australian faculty were hired to invent was, until UWA v. Gray, left unclear. Some thought the question depended on the discipline and on whether the research might yield an invention¹⁸.
For example, if research into diabetes could result in an invention, it was thought that the duty to research might encompass a duty to invent.
The protagonist in the case, Bruce Gray, was first employed at Melbourne University in the early 1980s, where he initiated research into the treatment of liver cancer by selectively delivering anticancer therapies to tumor sites. The University of Western Australia subsequently employed Gray from 1985 to teach and “undertake research and to organize and generally stimulate research among the staff and students.” Gray continued in the same line of research, and after patenting various inventions, he assigned his IP to his commercialization company Sirtex Medical (Lane Cove, New South Wales, Australia). Sirtex went on to float on the Australian Stock Exchange, and its current market capitalization is more than AUS$275 million; it is now one of Australia’s largest biotech companies.
Gray’s employment contract referred to the UWA Statutes and Regulations, which included Patents Regulations and, later, IP Regulations.


However, the earlier Regulations did not vest IP rights in faculty inventions in UWA; they merely assumed that the university had such rights. The later Regulations did purport to vest ownership in UWA, but the trial judge and Court of Appeal held these Regulations invalid: the Act establishing the University did not allow the Senate to make regulations divesting its faculty’s IP rights. So the invalid Regulations could not be incorporated into Gray’s contractual terms. An additional problem was that the Regulations had not been properly promulgated and so were not effective until after Gray’s employment at UWA had fundamentally changed.
So UWA was thrown back on its common-law position. It argued that Gray’s contract was subject to a term, implied by law, that he must assign ownership of any inventions to the University. Assignment was said to be required wherever an employee is engaged, instructed or authorized to solve technical problems, improve the employer’s technology or undertake research from which such invention may arise.
The Australian ruling
The Appeals court, like the trial judge, found that the university/faculty relationship raised such distinctive considerations that it was inappropriate to accept that there is a general presumption that a university will own inventions developed in the course of its faculty’s research. Gray’s circumstances of employment also weighed against implication of the term. In particular:
(i) Gray was not under any (express) duty to invent anything. Consistently with traditional notions of academic freedom, Gray was free to choose his line of research and the manner of its pursuit.
(ii) Gray was free to publish his research results and any inventions developed, notwithstanding that this might destroy the patentability of any inventions.
The fact that UWA did not impose any obligations of secrecy was also consistent with traditional notions of academic freedom, and inconsistent with an intention by UWA to own and commercialize faculty inventions. The Court of Appeal considered that the importance of the freedom to publish was ‘self-evident’. Implication of a term as posited by UWA would have a ‘significant collateral impact’ on academics if it were underpinned by a corresponding duty not to disclose.
(iii) Gray expended much time and effort applying for external funds. As with many cash-strapped universities, UWA wanted to foster, but could not fund, Gray’s research. Unlike the general employer-employee scenario in which the employer has funded the employee’s work, if UWA’s term were implied, it would ‘allow UWA to reap where various entities had sown’.
(iv) Gray engaged in significant collaborative work with external organizations. The need for inter-institutional cooperation weighed against exclusive appropriation of the end product by one institution via an implied term. In addition, the evidence on information exchanges in Gray’s field of research demonstrated that sharing of research results, and know-how, was both necessary and accepted, which further argued against implication of the term.
Ultimately, UWA was unsuccessful on all counts. In what might seem the final legal snub, Australian law has yet to recognize the existence of a shop right, so UWA was left without any rights whatsoever to Gray’s inventions. As the High Court drily remarked in refusing leave to appeal, the case emphasizes the need for express contractual arrangements on ownership.
However, it didn’t all go well for Gray.
Sirtex successfully cross-claimed against him for breach of his directors’ duties and for misleading and deceptive conduct (for failing to inform the company about the potential ownership problems), resulting in an order for Gray to pay it almost AUS$2 million.
The aftermath
So the common-law position on faculty ownership in Australia seems to have been resolved, for the moment, in favor of faculty. Not surprisingly, Gray ignited a rash of university soul-searching, in which the implications of faculty ownership and how best to secure university ownership were pondered. Yet the Gray case is in many respects an unsatisfactory precedent, and there is certainly an array of very particular facts and circumstances that will allow later courts to sidestep it and discount it as an all-embracing authority.
Nevertheless, Gray’s emergence on the legal landscape will likely lead to renewed calls for tailored statutory intervention. Christie et al.¹⁹ raise the usual arguments:
The default position should not vest ownership of patents in employee inventors nor funding agencies. This is because employees may not recognize the commercial value of their inventions and because of the potential problems with fragmentation of ownership. Indeed, experience in Canada has shown that academic staff members in universities often lack the time and expertise required for commercialisation. Funding agencies are also not well placed to assume ownership rights, as they are one step removed from the inventive process…
There is little doubt that Bayh-Dole has gone a considerable distance towards clarifying title for research with federal funding links. It means that, like the hamster bite that kills, the uncertainty of the common law is rarely a problem (but when it is, it can still be ugly).
However, it must be acknowledged that a spectrum of issues (from capturing ownership outside the Act, through to how it might best be overhauled to mitigate a host of anticompetitive byproducts) continue to raise concerns.
So although arguments can be assembled in favor of the adoption of a Bayh-Dole–style approach in Australia, as always a careful balancing process should be followed. On the one hand, a new statute will inevitably limit future flexibility and interfere with private contracting rights. On the other, given the vexed issue of ownership and the complications of Gray, the answer is probably that some form of statutory clarification is justified. Whether legislation like Bayh-Dole would have a good fit with the Australian research landscape is another matter.
The Australian Productivity Commission (Canberra, Australian Capital Territory, Australia) recently urged a “cautious” approach to adopting an Australian Bayh-Dole Act. Although Bayh-Dole was introduced in response to concerns that many university inventions were not being commercialized, there is little evidence of a similar phenomenon in Australia. The Commission found that there were already financial incentives for Australian universities to commercialize, such as the one-third royalty-sharing arrangement most commonly divided among the university, the inventor and the inventor’s academic department or faculty. The Commission also pointed out the potential for new Bayh-Dole–style legislation to adversely affect the incentives operating within universities, suggesting the delicate balance between commercialization activities and the ‘academic traditions of openness and curiosity driven research’ might be disrupted²⁰.
For now, Australian inventors and researchers are left with Gray, and the prospects of Bayh-Dole reforms appear to be slim to none in the immediately foreseeable political future.


For those few faculty members or researchers at other institutions who, through good luck or good management, are not bound to contractual assignments, the decision is a windfall. For most Australian faculty, as in the United States, contract will usually reign supreme, and universities will usually win the ownership tug-of-war. Nevertheless, the Gray decision is a salient reminder that the issue of ownership can never be underestimated in regard to its ability to provoke a good fight.
COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.
1. University of Western Australia (UWA) v. Gray [2008] FCA 498 (first instance); [2009] FCAFC 116 (Full Federal Court); [2010] HCATrans 11 (High Court of Australia refusing special leave to appeal (i.e., certiorari denied)).
2. Standard Parts Co. v. Peck, 264 U.S. 52 (1924).
3. U.S. v. Dubilier Condenser Corp., 289 U.S. 706 (1933).
4. Chisum, D.S. Chisum on Patents, 8–22 (LexisNexis, Matthew Bender, 1978).
5. Speck v. North Carolina Dairy Foundation, 307 S.E.2d 785 (N.C. 1983) rev’d 319 S.E.2d 139 (1984).
6. Houghton v. United States, 23 F.2d 386, 390 (1928, CCA, 4th cir).
7. Chew, P.K. Wis. L. Rev. 259–314 (1992).
8. Browning, C.G. Jr. N.C.L. Rev. 63, 1248–1259 (1985).
9. Merges, R. Harv. J. Law & Tec. 13, 1–54 (1999).
10. Luppino, A.J. UMKC Law Review 78, 367–427 (2009).
11. Chew, P.K. Wis. L. Rev. 259–314 (1992).
12. Kulkarni, S.R. Hastings L.J. 47, 221–256 (1995).
13. Smith, K.G. 1 V.A J.L. & Tech. 1(4), 15 (1997).
14. Clements, J.D. IDEA 49, 469–516 (2009).
15. Kenney, M. & Patton, D. Research Policy 39, 1407–1422 (2009).
16. Bruun, N. Ga. St. U. L. Rev. 23, 913–935, 921 (2007).
17. Sterling Engineering Co. Ltd. v. Patchett [1955] AC.
18. Monotti, A. & Ricketson, S. Universities and Intellectual Property (Oxford University Press, 2003).
19. Christie, A. et al. Analysis of the legal framework for patent ownership in publicly funded research institutions (Intellectual Property Research Institute of Australia, 2003).
20. Australian Productivity Commission. Public support for science and innovation (APC Final Report) (2007).


patents

Recent patent applications in gene synthesis

| Patent number | Description | Assignee | Inventor | Priority application date | Publication date |
| --- | --- | --- | --- | --- | --- |
| WO 2010079039 | A method for modulating macronutrients, comprising producing a synthetic gene coding for at least one enzyme, expressing and optionally activating the enzyme and contacting it with macronutrients; useful in, e.g., pharmaceutical composition. | Nestec (Vevey, Switzerland) | Arigoni F, Bureau-Franz I, Maynard F, Pridmore R | 1/6/2009 | 7/15/2010 |
| WO 2010071602 | A method of synthesizing a nucleic acid molecule involving assembling overlapping oligonucleotides and amplification with PCR, using a single PCR, with distinct oligonucleotides and annealing temperatures. | Agency for Science, Technology & Research (Singapore) | Li M, Ye H, Ying JY | 12/19/2008 | 6/24/2010 |
| KR 2010040634 | A method of manufacturing a deletion cassette, comprising artificial sequence linkers, by performing PCR amplification using a primer pair existing in the artificial sequence linker and removing the artificial sequence linker. | Korea Research Institute of Bioscience & Biotechnology (Daejeon, S. Korea) | Han S, Heo G, Kim D, Lee A, Lee H, Lee M, Nam M, Noh E, Park H | 10/10/2008 | 4/20/2010 |
| WO 2010025395, US 20100055085 | A new isolated, synthetic or recombinant nucleic acid comprising, e.g., a nucleic acid (polynucleotide) encoding at least one polypeptide, useful, e.g., for encoding immobilized polypeptide used as food, and in a cosmetic or a cream. | Verenium (Cambridge, MA, USA) | Barton NR, Bueno A, Cuenca J, Dayton CLG, Hitchman T, Kline KA, Lyon J, Miller ML, Wall MA | 8/29/2008 | 3/4/2010 |
| WO 2009138954 | A method of synthesizing a polynucleotide on a solid support comprising partitioning a polynucleotide into an ordered set of palindromeless subunits and ligating the oligonucleotide precursors of each subset. | British Columbia Cancer Agency (Vancouver, BC, Canada) | Coope R, Holt RA, Horspool D | 5/14/2008 | 11/19/2009 |

Source: Thomson Scientific Search Service. The status of each application is slightly different from country to country. For further details, contact Thomson Scientific, 1800 Diagonal Road, Suite 250, Alexandria, Virginia 22314, USA. Tel: 1 (800) 337-9368 (http://www.thomson.com/scientific).

Selected patent expirations/extensions in the second half of 2010

| Generic drug (brand) name | Company | Indication | Patent information | Drug information |
| --- | --- | --- | --- | --- |
| Cidofovir (Vistide) | Gilead; Pfizer | Cytomegalovirus retinitis (in AIDS patients) | The last patent covering the selective nucleotide inhibitor for viral DNA polymerase, which was approved by FDA in 6/96, expired on 6/26/10. | S-N-(3-hydroxy-2-phosphonylmethoxy)propylcytosine derived from the purine analog, S-9-(3-hydroxy-2-phosphonomethoxy)propyl-adenine |
| Bivalirudin (Angiomax) | Biogen Idec; The Medicines Company | Percutaneous coronary interventions (PCIs) | On 8/6/10, USPTO granted a one-year interim extension of US patent no. 5,196,404, which covers the bivalent peptide thrombin inhibitor used in PCIs such as bypass surgery, until 8/13/11. | Bivalirudin (D-FPRPGGGGDGDFEEIPEEYL) is a bivalent thrombin inhibitor comprising a moiety (D-FPRP) that binds thrombin’s active-site cleft and a hirudin-like C-terminal region (DGDFEEIPEEYL) that binds to the thrombin anion-binding exosite |
| rHuPH20 (Hylenex) | Halozyme; Baxter | Adjuvant agent, for increasing drug absorption or dispersion | On 8/12/10, US patent no. 7,767,429 was issued, claiming the proprietary platform for the rHuPH20 PEGylated glycoprotein, which was approved by the FDA in December 2005. The claim, which extends until 9/23/27, also covers formulation of rHuPH20 with other pharmaceutical agents. The European counterpart, EP1603541, provides protection until 3/5/24. | Recombinant human hyaluronidase |
| Docetaxel (Taxotere) | Sanofi-aventis | Various cancers | On 11/14/10, the patents for Taxotere will expire. The patent covering the drug substance expired on 5/14/10. | Taxane, the substance from which Taxotere is derived, is found in the needles of the European yew tree |
| Gemcitabine (Gemzar) | Eli Lilly | Various cancers | On 11/15/10, the compound patent for Gemzar will expire. | Deoxycytidine analog that inhibits DNA synthesis |
| Topotecan (Hycamtin) | GlaxoSmithKline | Various cancers | On 11/28/10, the compound and drug use claim patents for Hycamtin will expire. | A semisynthetic derivative of camptothecin and an anti-tumor drug with topoisomerase I-inhibitory activity |
| Donepezil (Aricept) | Eisai | Alzheimer’s disease | On 11/25/10, the substance patent for Aricept will expire. FDA approved the first generic version of donepezil on 12/11/09. | Reversible acetylcholinesterase inhibitor |

FDA, US Food and Drug Administration. NA, not available. Source: http://biomedtracker.com/


news and views

Timing is everything in the human embryo

Ann A Kiessling

A noninvasive imaging method for predicting how human embryos will develop may improve the success and safety of in vitro fertilization.

Ann A. Kiessling is in the Department of Surgery, Harvard Medical School, Boston, Massachusetts, USA. e-mail: akiessli@bidmc.harvard.edu

Humans may be the least fertile mammals on earth (refs. 1,2). Although in vitro fertilization (IVF) technology has helped many couples (an estimated 3 million IVF babies had been born worldwide by 2002), its success rates are low (~25% per procedure), and it has led to an epidemic of high-risk multiple births (refs. 3,4). In this issue, Wong et al. (ref. 5) report a noninvasive strategy for evaluating human embryos that could both improve IVF success rates and decrease the likelihood of multiple gestations. Using time-lapse videography, they discovered characteristics of initial cleavages that predict successful development to the blastocyst stage with >93% accuracy.

A new human life depends upon robust, steadily increasing signals from the fertilized egg to the mother. Failure to elaborate sufficient signal (e.g., the pregnancy hormone, human chorionic gonadotropin) results in a menstrual cycle and expulsion of the fertilized egg, thus freeing maternal resources for another attempt with a new egg. How the early human embryo sends the robust, steadily increasing signals is poorly understood, but one mechanism could be rapid duplication of embryonic DNA with a concomitant increase in signaling output to the mother. This suggests a direct correlation between the speed of chromosome duplication, cell division and successful pregnancy, an observation often reported by programs of assisted reproduction (ref. 4).

IVF involves hormone stimulation of a woman’s ovaries in order to mature multiple eggs, which are removed, fertilized in the laboratory, cultured for 2 to 6 days, and transferred back to her uterus for gestation. Fertilized on day 1, an egg that has duplicated its chromosomes twice and reached the 4-cell stage by early day 2, and reached the 8-cell stage by early day 3, has a higher likelihood of giving rise to an offspring than an egg that duplicated its chromosomes only once and reached the 2-cell stage on day 2 and the 4-cell stage on day 3, but the correlation is imperfect. Faster-cleaving embryos also have a higher likelihood of developing to the blastocyst stage, which marks the first cell-commitment event in early development, but that correlation, too, is imperfect.

[Figure 1 image. Panel labels: Day 1, Day 1 or 2, Day 2 or 3, Day 4 or 5, Day 5 or 6; Zygote, 2-Cell, 4-Cell, Morula, Blastocyst; 14 ± 6 minutes, 11 ± 2 hours, within 1 ± 1.6 hours.]

Figure 1 Early human development. (a) The zygote possesses one pronucleus containing egg chromosomes and another pronucleus containing sperm chromosomes. Both sets of chromosomes are duplicated before the first cleavage to 2 cells. Wong et al. (ref. 5) discovered that, for successful development to the blastocyst stage, the first cleavage furrow should be only 14 ± 6 min from the beginning to the appearance of 2 cells; the 2-cell stage should last only 11 ± 2 h; and the cleavage of each of the 2 cells to its daughter cells in the 4-cell stage should occur within 1 ± 1.6 h of each other. The morula forms at the 8- to 16-cell stage, trapping 1 or 2 cells inside that undergo commitment to become the inner cell mass (ICM) within the blastocyst. The ICM gives rise to the fetus. The outer cells of the blastocyst become committed to trophoblast, precursor to the placenta. (b) Theoretical aneuploidy in early development. The schematic depicts the highest rate of aneuploidy (purple cells) that could form the ICM from a euploid cell (green) and produce a normal fetus. This theory is supported by several lines of evidence, including chromosomal analyses that reveal both aneuploid and euploid cells in human blastocysts (ref. 8).

Many IVF programs extend embryo culture to day 5 or 6 to transfer a single blastocyst. This practice successfully decreases the risk of multiple gestations while yielding a higher pregnancy rate for women under the age of 36 (ref. 6). But fertilized eggs from many patients do not form blastocysts in culture. Moreover, the well-studied mouse embryo model has taught us that the rapid cleavage rate that occurs in vivo between the 4-cell and 16-cell stage is not reproduced in vitro under existing culture conditions. Because blastocyst formation begins at a defined interval after fertilization, independent of the number of cell divisions, mouse embryos developed in vivo have more than twice as many cells at


the blastocyst stage than embryos developed in culture (ref. 7). Should the situation be the same for human embryos, extended culture would lead to blastocysts with fewer cells available to form the fetus, a possible explanation for the low birth weight reported for some IVF babies (refs. 3,4).

The obvious way to improve IVF rates of pregnancy and of singleton births is to choose one healthy embryo, as soon after fertilization as possible, for transfer at the ideal time into the uterus of the prospective mother. Herein lies the value of the new work by Wong et al. (ref. 5). By stunning time-lapse photography of 100 fertilized human eggs cultured to day 5 or 6, they discovered three characteristics that could predict progression to the blastocyst stage with a sensitivity and a specificity of 93% and 94%, respectively: (i) duration of the first cleavage furrow leading to 2 cells of 14 ± 6 min, (ii) a 2-cell stage lasting only 11 ± 2 h and (iii) the 2-cell blastomeres cleaving to 4 cells within 1.0 ± 1.6 h of each other (Fig. 1a). These noninvasive, early-cleavage parameters should be easily adaptable by IVF programs to help select the best embryo for transfer.

In addition to establishing cell-division guidelines that predict blastocyst development, Wong et al. (ref. 5) correlated patterns of cleavage with gene expression in an additional 142 embryos. Errors in early cleavages that give rise to chromosomal aneuploidy have been cited as leading to embryonic failure (refs. 4,8). But genome-wide analyses of gene expression in normal-appearing, 8-cell human embryos (ref. 9) suggested that aneuploidy may be common in the early cleavage stages of apparently normal embryos. These studies revealed a lack of cell cycle checkpoints, such as Rb and Wee1, and overexpression of cell cycle drivers, such as Cyclins A, B and E, and Myc, which allows for the rapid rates of gene amplification needed for maternal signaling.

In fact, aneuploidy in early-cleaving embryos may not be lethal to fetal development because most of the early cells will form trophoblast (Fig. 1b). An important feature of early mammalian development is the enormous size of the egg; thus, DNA duplication and cell division to approximately the 64-cell stage occur without the need for cell growth or, perhaps, for growth factor stimulation. Geometrically, at the 16-cell stage, 1 or 2 cells trapped in the middle of the cell mass initiate commitment to inner cell mass (ICM) cells (precursor fetal cells), while the outer cells form the trophoblast lineage (precursor placental cells). Cultured ICM cells can also give rise to embryonic stem cells, which do express Rb and Wee1 (ref. 9), suggesting that the ICM has active cell cycle checkpoints to help maintain the chromosome integrity of the developing embryo.

Wong et al. (ref. 5) describe several aberrant embryo phenotypes, as well as gene expression analyses of individual blastomeres, that reveal asynchrony in cell division, in gene expression and in the degradation of maternal messages. These results suggest that the current view of the early-cleaving embryo (that all cells are equivalent and chromosomally balanced) should be revised to recognize that aneuploidy after the 2-cell stage may be common (ref. 8), that aneuploid trophoblast cells may be able to give rise to a fully functioning placenta and that only ICM cells (as few as 12% at the 16-cell stage) must be chromosomally balanced to give rise to a normal fetus.

Balancing rapid cell divisions with accurate chromosome allocation is the yin and yang of early human development. A new human life arises from fewer than 30% of fertilized human eggs (refs. 1–4). By careful comparison of cell division characteristics with the emerging lists of cell cycle gene elements (ref. 10), we may be able not only to predict which embryos have the greatest potential for life, but also to develop therapies for early embryos that will balance their yin and yang in favor of development to offspring.

COMPETING FINANCIAL INTERESTS
The author declares no competing financial interests.

1. Evers, J.L.H. Lancet 360, 151–159 (2002).
2. Macklon, N.S. et al. Hum. Reprod. Update 8, 333–343 (2002).
3. Adashi, E.Y. et al. Reprod. Biomed. Online 7, 515–542 (2003).
4. de Mouzon, J. et al. Hum. Reprod. 24, 2310–2320 (2009).
5. Wong, C.C. et al. Nat. Biotechnol. 28, 1115–1121 (2010).
6. Stillman, R.J. et al. Fertil. Steril. 92, 1895–1906 (2009).
7. Kiessling, A.A. et al. J. Exp. Zool. 258, 34–47 (1991).
8. Northrop, L.E. et al. Mol. Hum. Reprod. 16, 590–600 (2010).
9. Kiessling, A.A. et al. J. Assist. Reprod. Genet. 27, 265–276 (2010).
10. Neumann, B. et al. Nature 464, 721–727 (2010).

Taking the measure of the methylome

Stephan Beck

Two comparative studies from the International Human Epigenome Project find high concordance between different methods for measuring genomic methylation.

Stephan Beck is at the UCL Cancer Institute, University College London, London, UK. e-mail: s.beck@ucl.ac.uk

With the rapid development of new methods for epigenomic analysis, the need for a systematic assessment of available technologies has become acute. In this issue, Harris et al. (ref. 1) and Bock et al. (ref. 2) compare the performance of commonly used techniques for DNA methylation analysis in terms of cost, resolution, genome coverage and accuracy. The findings provide a first benchmark of which method works best for which part of the methylome.

In humans, DNA methylation occurs predominantly at cytosine bases in the form of methyl cytosines (mCs), methyl cytosine guanine dinucleotides (mCGs), hydroxymethyl cytosines (hmCs) and, possibly, in other, yet unknown forms. Collectively, these modifications define the DNA methylome of a cell. Together with the study of other epigenetic marks, methylome analysis forms an integral part of ongoing efforts to elucidate the epigenomes of healthy and diseased cell types. Such methylome maps will allow the identification of genomic regions involved in cell differentiation and disease.

Following years of planning by the Epigenome Taskforce (ref. 3) and other initiatives, the studies of Harris et al. (ref. 1) and Bock et al. (ref. 2) mark another milestone for the International Human Epigenome Project (ref. 4). Together with recent papers by Li et al. (ref. 5) and Robinson et al. (ref. 6) (not discussed here), they compare the performance of the main technologies for mCG methylome analysis. Until now, we did not know how well these methods work, what their particular strengths and weaknesses are, or the extent to which the resulting methylation maps overlap. Understanding these issues is especially important when choosing a method for generating so-called reference methylomes, which will be used as definitive resources in future research and must therefore be as accurate and comprehensive as possible.

In all, Harris et al. (ref. 1) and Bock et al. (ref. 2) tested six methods, of which five are sequencing-based and one is array-based. Three of the methods, MethylC-seq (data from ref. 7),




reduced representation bisulfite sequencing (RRBS) and the Infinium-27K bead-array, use sodium bisulfite treatment of DNA, which converts unmethylated but not methylated cytosine to uracil. The other three, methylated DNA immunoprecipitation sequencing (MeDIP-seq), methylated DNA capture by affinity purification (MethylCap-seq) and methylated DNA binding domain sequencing (MBD-seq), rely on capture of methylated DNA by a monoclonal antibody or by the recombinant methyl-binding domains of MECP2 or MBD2, respectively.

Each method was subjected to rigorous quality control, and all results were supported by comprehensive statistical analysis of at least two replicate samples. Table 1 summarizes some of the metrics examined. In addition to cost, the other important parameters when choosing a method for a particular methylome analysis are resolution, coverage and accuracy. With respect to resolution, the choice is straightforward between the high resolution (1 bp) achieved with the bisulfite-based methods and the low resolution (≥100 bp) of capture-based methods. Although the highest possible resolution is usually desirable, single-base-pair resolution is not always required because the methylation status of adjacent CpG sites is highly correlated for up to 1,000 bp.

Table 1 Key metrics of the technology comparison

| | MethylC-seq | MeDIP-seq | MethylCap-seq | MBD-seq | RRBS | Infinium-27K (a) |
| --- | --- | --- | --- | --- | --- | --- |
| Genomic DNA | 5 μg | 0.3–5 μg | 1 μg | 3 μg | 0.03–0.05 μg | 0.5–1 μg |
| Readout | Sequence | Sequence | Sequence | Sequence | Sequence | Array |
| Assay | Bisulfite conversion | Capture with monoclonal antibody | Capture with MBD of MeCP2 | Capture with MBD of MBD2 | Bisulfite conversion | Bisulfite conversion |
| Resolution | 1 bp | 100–1,000 bp | 100–1,000 bp | 100–1,000 bp | 1 bp | 1 bp |
| Theoretical coverage | Whole-genome (~100%) | Whole-genome (~100%) | Whole-genome (~100%) | Whole-genome (~100%) | Genome-wide (~10%) | Genome-wide (~0.1%) |
| Actual coverage (1 read threshold) | ~95% (ref. 1) | ~67% (ref. 1) | ~67% (ref. 2) | ~61% (ref. 1) | ~12% (ref. 1) | (~0.1%) |
| Actual coverage (5 reads threshold) | ~87% (ref. 1) | ~23% (ref. 1) | ~28% (ref. 2) | ~28% (ref. 1) | ~10% (ref. 1) | (~0.1%) |
| Actual coverage (10 reads threshold) | ~76% (ref. 1) | ~9% (ref. 1) | ~14% (ref. 2) | ~20% (ref. 1) | ~9% (ref. 1) | (~0.1%) |
| Cost | ~$100 K (b) | ~$2 K | ~$3 K | ~$2 K | ~$2 K | ~$0.2 K |
| Concordance (6-, 5-way) | NA | NA | NA | NA | NA | NA |
| Conclusion | Gold standard but issue with hmC | Good all-rounder | Good all-rounder | Good all-rounder | Good for CpG islands | Good for promoters |

Further concordance rows (the method columns for these values are not recoverable from the flattened layout):
Concordance (4-way) (ref. 1): ~99%, ~99%, ~99%, ~99%
Concordance (3-way) (ref. 1): ~100%, ~100%, ~100%
Concordance (2-way) (ref. 1): ~96%, ~96%
Concordance (2-way) (ref. 1): ~96%, ~96%
Concordance (2-way) (ref. 2): ~84%, ~84%
Concordance (2-way) (ref. 2): ~88%, ~88%
Concordance (2-way) (ref. 1): ~91%, ~91%
Concordance (2-way) (ref. 1): ~97%, ~97%
Concordance (2-way) (ref. 2): ~92%, ~92%

Where appropriate, numbers are rounded or shown as a range. As sequencing costs are falling rapidly, the estimates shown are approximate and based on the assumption of ~$1K per lane on an Illumina Genome Analyser (see refs. 1 and 2 for details on the models used). To determine the maximum achievable genome coverage, MethylC and RRBS were subjected to saturation sequencing, entailing 2 lanes of sequencing for RRBS. MeDIP and MBD were subjected to 2 lanes and MethylCap to 3 lanes of sequencing.
(a) A 450K upgrade of the current 27K Infinium array has been announced for later this year.
(b) For the MethylC data reported by Lister et al. (ref. 7) in 2009, the exact number of lanes could not be determined and the estimated 100 lanes (resulting in estimated costs of $100K) are likely to represent the upper limit. The current cost for a MethylC methylome is closer to $20K.

Coverage and accuracy are much more difficult to assess as the different methods have different dependencies (including CpG density, fragment length, capture affinity, read length, read depth and, for capture methods, absence of reads in unmethylated regions), making a direct comparison challenging. Based on the fraction of the genome that can potentially be analyzed by each method, the theoretical mCG coverage is ~100% for MethylC-seq, MeDIP-seq, MethylCap-seq and MBD-seq, ~10% for RRBS and ~0.1% for Infinium-27K. Determining whether this coverage is actually achievable in practice requires the generation and analysis of saturation data for each method. This was done only for MethylC-seq and RRBS and is not applicable to Infinium-27K.

Applying thresholds of 1 to 10 reads per mCG, the determined actual coverage ranges from 96–76% for MethylC-seq and 12–9% for RRBS, which is close to the theoretical limits of coverage, particularly for RRBS. Because saturation sequencing was not carried out for MeDIP-seq, MethylCap-seq and MBD-seq, the actual coverage data presented for these methods are less representative, as evident from the large variation (67–9%) in coverage when applying the same thresholds of 1 to 10 reads per mCG.

Both studies (refs. 1,2) assessed accuracy by comparing overlapping data sets between methods. For most comparisons, the Infinium rather than the more comprehensive MethylC data were used as a common standard. This is somewhat unfortunate as the resulting comparisons are therefore between sequencing- and array-derived data and limited to the small set of highly selected CpG sites on the Infinium-27K array.

Nevertheless, the overall concordance is encouragingly high (84–100%), depending on the comparison (2- to 4-way comparisons were conducted) and the parameters. This is good news and lends confidence to the many existing data sets already generated by any of the methods. Both studies (refs. 1,2) conclude that all of the evaluated methods are capable of producing accurate data, and neither recommends


a particular method for the generation of reference methylomes, although Harris et al. (ref. 1) suggest the possibility of hybrid methods and show improved results for MeDIP-seq integrated with MRE-seq (based on methylation-sensitive restriction).

Although the two studies (refs. 1,2) have successfully resolved many long-standing questions in the epigenomics community, several challenges remain. The most pressing concern is that a full methylome analysis should include mC and hmC in addition to mCG, although the biological functions of these modifications have yet to be determined. Another challenge is that bisulfite-based methods (the current gold standard of methylation analysis) cannot distinguish between methylation and hydroxymethylation (ref. 8), which has implications for all bisulfite-based data already deposited in public databases.

As the International Human Epigenome Consortium gears up to generate 1,000 reference epigenomes, the participating laboratories will undoubtedly use different methylome analysis methods. It will therefore be important to develop a procedure for assigning quality values to the methylation status of each cytosine. A similar metric proved to be very helpful in the assembly and use of the draft sequence of the human genome. For the future, there are great expectations that one day we will be able to read the different forms of DNA methylation directly using methods such as nanopore (ref. 9) and single-molecule, real-time (ref. 10) sequencing. For now, however, with careful management, our current technology is adequate to move ‘AHEAD’.

COMPETING FINANCIAL INTERESTS
The author declares no competing financial interests.

1. Harris, R.A. et al. Nat. Biotechnol. 28, 1097–1105 (2010).
2. Bock, C. et al. Nat. Biotechnol. 28, 1106–1114 (2010).
3. Jones, P.A. & Martienssen, R. Cancer Res. 65, 11241–11246 (2005).
4. Satterlee, J. Nat. Biotechnol. 28, 1039–1044 (2010).
5. Li, N. et al. Methods published online, doi:10.1016/j.ymeth.2010.04.009, 27 April 2010.
6. Robinson, M. et al. Epigenomics 2, 587–598 (2010).
7. Lister, R. et al. Nature 462, 315–322 (2009).
8. Huang, Y. et al. PLoS ONE 5, e8888 (2010).
9. Clarke, J. et al. Nat. Nanotechnol. 4, 265–270 (2009).
10. Flusberg, B.A. et al. Nat. Methods 7, 461–465 (2010).

Tracing cancer networks with phosphoproteomics

David B Solit & Ingo K Mellinghoff

A mass-spectrometry approach for identifying downstream events in cancer signaling pathways may help to tailor therapies to individual patients.

David B. Solit and Ingo K. Mellinghoff are in the Human Oncology and Pathogenesis Program, Memorial Sloan-Kettering Cancer Center, New York, New York, USA. e-mail: solitd@mskcc.org or mellingi@mskcc.org

The clinical success of kinase inhibitors such as Gleevec (imatinib) has provided a glimpse of what can be achieved by targeting the signaling pathways involved in the growth of cancer cells (ref. 1). But these signal-transduction networks are still poorly understood, hampering efforts to apply this paradigm more broadly to patients with advanced cancer. Two recent studies, by Moritz et al. (ref. 2) and Andersen et al. (ref. 3), show how this challenge might be addressed with ‘compound-centric’ phosphoproteomics. The findings, reported in Science Signaling (ref. 2) and Science Translational Medicine (ref. 3), not only provide new insights into the signaling circuitry responsible for cell proliferation but may also be of value in identifying the greatest vulnerabilities of particular tumor cells and, therefore, in optimizing therapies for individual patients.

Many human cancers have alterations in the phosphatidylinositol-3-OH kinase (PI(3)K) pathway, which has become an area of particular interest in drug development (ref. 4). Rapamycin (Rapamune, sirolimus), an immunosuppressive agent that inhibits the kinase mTOR (‘mammalian target of rapamycin’) and is used clinically in organ transplantation, was the first inhibitor of a PI(3)K signaling intermediate to enter broad clinical testing for cancer (ref. 5). But despite compelling preclinical results (particularly in models with aberrant PI(3)K pathway activation) and modest efficacy in patients with renal cell carcinoma, the overall clinical success of rapamycin in oncology has been disappointing. These failures may be due in part to activation of AKT and MAPK by de-inhibition of negative feedback loops (refs. 6,7) and to redundant regulation of key downstream effectors of transformation by parallel signaling pathways (ref. 8). In many respects, this experience exemplifies the challenges of targeting a signaling network that is insufficiently understood.

The two new studies (refs. 2,3) used global mass spectrometry (MS)-based approaches to identify substrates of serine (Ser)/threonine (Thr) kinases downstream of receptor tyrosine kinases (RTKs), RAS, PI(3)K and mTOR. Selected members of the signaling network comprising RAS, PI(3)K and the mTOR-containing complexes TORC1 and TORC2 (ref. 9) are shown in Figure 1a. Each of these core signaling pathways activates kinases that phosphorylate their substrates in a context-specific manner, depending on the amino acids flanking the phosphorylation site. Both studies (refs. 2,3) used phosphomotif-specific antibodies for immunoaffinity purification before MS analysis and quantified the effects of various pathway inhibitors on the newly identified Ser/Thr-phosphorylation sites using an approach based on stable isotope labeling with amino acids in cell culture (SILAC) (ref. 10) (Fig. 1b).

Moritz et al. (ref. 2) identified >300 substrates in three human cancer cell lines with mutations in either epidermal growth factor receptor (EGFR), hepatocyte growth factor receptor (MET) or platelet-derived growth factor receptor α (PDGFRA); almost half of these substrates were identified for the first time. Phosphorylation of 21 proteins decreased significantly in all three cell lines after inhibition of the oncogenic RTK. The targets include the previously reported Akt-RSK-S6 kinase substrates glycogen synthase kinase 3A and B, ribosomal protein S6 (RPS6) and the proline-rich Akt1 substrate (PRAS40).

The study by Andersen et al. (ref. 3) focused on the PI(3)K branch of the network and used a PTEN-deficient human prostate cancer cell line, a broader immunoaffinity purification scheme (enriching for AKT substrates, MAPK substrates and PDK1-docking motifs) and a different set of pathway inhibitors (targeting PDK1, AKT and both PI3K and mTOR). The authors identified 375 nonredundant phosphopeptides, of which about a quarter showed a substantial change in phosphorylation in response to pathway perturbation. Some proteins (e.g., RPS6 and PRAS40) showed decreased phosphorylation in response to all three pathway inhibitors, whereas others showed more selective responses to particular inhibitors (e.g., RPS6KA6 for the PDK1 inhibitor). The authors then focused on PRAS40 and showed that its phosphorylation at Thr246 positively correlates with phosphorylation of


AKT at Ser473 and with the sensitivity of cancer cell lines to an allosteric AKT inhibitor.

Until now, the technical challenges of MS-based detection of phosphopeptide substrates have limited our ability to detect Ser/Thr phosphorylation events in cancer. Beyond providing new information about the signaling circuitry downstream of Akt, MAPK, RSK and S6K, these pioneering studies 2,3 open the entire space of Ser/Thr protein phosphorylation for further study. Nonetheless, it remains possible that some of the observed drug-induced phosphorylation changes represent ‘off-target’ effects. Additional, confirmatory experiments involving genetic approaches and more specific compounds will be needed before we can revise our picture of the RTK/RAS/PI3K/mTOR network.

The studies 2,3 are equally promising from the perspective of clinical drug development. First, they document the effects of compounds on a large number of phosphorylation events, which can be quantified in clinical tumor samples using various antibody-based proteomic assays. Such information can guide dosing decisions with molecularly targeted therapies during early clinical drug evaluations and help to prevent drug development resources from being wasted on compounds that do not achieve sufficient target inhibition in tumor tissue 11. Second, especially if linked to the detection of phosphotyrosine protein modifications in the same sample 3, compound-centric phosphoproteomics may uncover unexpected effects of the drug on upstream or parallel signaling networks that mediate drug resistance, identify mechanisms of ‘off-target’ drug toxicity or suggest new opportunities for combination therapies 12.

Thus far, kinase inhibitor therapy has been most successful for cancers with an activating mutation that can be readily identified in routinely collected clinical samples using genomic assays. Examples include non-small-cell lung cancers with mutations in the EGFR kinase domain or melanomas with BRAF mutations. It remains unclear which mutations predict responsiveness to PI(3)K/mTOR pathway inhibitors. Moreover, signaling through this pathway can be deregulated by many molecular alterations. This genomic complexity represents a rate-limiting step in the further development of PI(3)K pathway inhibitors and has spurred interest in transcriptional 13 or proteomic markers of aberrant pathway activation.

It remains to be seen whether at least a subset of cancers display ‘pathway addiction’ as opposed to ‘oncogene addiction’ to particular components within the PI(3)K pathway. Perhaps the combination of a robust pathway-activation marker (e.g., phospho-PRAS40 Thr246) with focused mutational analysis (e.g., mutational profiling of PIK3CA) will offer a reasonable compromise for patient stratification into PI(3)K/AKT inhibitor trials, as suggested by Andersen et al. 3. Clearly, much work remains to be done to realize the clinical potential of genomics and proteomics. Nonetheless, these two studies 2,3 represent outstanding examples of hypothesis-driven biomarker discovery, which, once validated in a broader genetic context, are likely to produce new pharmacological opportunities for disrupting cancer-associated signaling networks.

[Figure 1 panel labels. (a) RTK, RTKi, RAS, MEK, U0126, ERK, RSK, PI(3)K, wortmannin, PI-103, PDK1, PDK1i, AKT, AKTi, TORC1, TORC2, rapamycin, S6K; protein substrates with Ser/Thr phosphorylation sites (GSK3, PRAS40, RPS6, many others). (b) Vehicle: 12C-Arg, 12C-Lys. Inhibitor: 13C-Arg, 13C-Lys. SILAC labeling of human cancer cell lines and short-term treatment with inhibitors of RTKs, PI(3)K, AKT, MEK, mTOR; mix cell lysates 1:1; immunoaffinity purification with antibodies specific to AGC kinase and MAPK kinase phosphomotifs; mass spectrometry.]

Figure 1 Identification of Ser/Thr phosphorylation substrates in core cancer signaling pathways. (a) Probing the RAS, PI(3)K and mTOR signaling pathways with inhibitors. Members of the RAS-PI(3)K-mTOR signaling network 9 are shown in black and inhibitors used by Moritz et al. 2 and Andersen et al. 3 are shown in red. ERK, extracellular signal-regulated kinase; GSK3, glycogen synthase kinase 3; MEK, mitogen-activated protein/extracellular signal-regulated kinase kinase; PDK1, 3-phosphoinositide-dependent protein kinase 1; PI(3)K, phosphatidylinositol-3-OH-kinase; PRAS40, proline-rich AKT1 substrate 1; RPS6, ribosomal protein S6; RSK, ribosomal S6 kinase; RTK, receptor tyrosine kinase; S6K, p70 ribosomal protein S6 kinase; TORC1/2, mammalian target of rapamycin complex 1/2. (b) SILAC-based mass spectrometry 10 to quantify inhibitor-induced changes in Ser/Thr phosphorylation. Cancer cell lines are grown either in ‘light’ medium containing the normal forms of the amino acids lysine (¹²C₆-Lys) and arginine (¹²C₆-Arg) or in ‘heavy’ medium containing ¹³C₆-Lys and ¹³C₆-Arg. After short-term treatment with inhibitor, cells are lysed and lysates pooled before immunoaffinity purification with antibodies specific to phosphomotifs of interest. Inhibitor-induced changes in phosphorylation patterns are quantified by comparing protein abundance using the light and heavy peaks in the mass spectra. AGC, cAMP (adenosine 3′,5′-monophosphate)-dependent, cGMP (guanosine 3′,5′-monophosphate)-dependent, and protein kinase C; MAPK, mitogen-activated protein kinase.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

1. Varmus, H. Science 312, 1162–1165 (2006).
2. Moritz, A. et al. Sci. Signal. 3, ra64 (2010).
3. Andersen, J.N. et al. Sci. Transl. Med. 2, 43ra55 (2010).
4. Courtney, K.D., Corcoran, R.B. & Engelman, J.A. J. Clin. Oncol. 28, 1075–1083 (2010).
5. Sabatini, D.M. Nat. Rev. Cancer 6, 729–734 (2006).
6. Cloughesy, T.F. et al. PLoS Med. 5, e8 (2008).
7. Carracedo, A. et al. J. Clin. Invest. 118, 3065–3074 (2008).
8. She, Q.B. et al. Cancer Cell 18, 39–51 (2010).
9. Shaw, R.J. & Cantley, L.C. Nature 441, 424–430 (2006).
10. Mann, M. Nat. Rev. Mol. Cell Biol. 7, 952–958 (2006).
11. Sawyers, C.L. Nature 452, 548–552 (2008).
12. Wallin, J.J. et al. Sci. Transl. Med. 2, 48ra66 (2010).
13. Saal, L.H. et al. Proc. Natl. Acad. Sci. USA 104, 7564–7569 (2007).
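
The light/heavy comparison described in the Figure 1b caption can be illustrated with a small calculation. The sketch below is not taken from either study: it assumes that the ‘light’ channel is the vehicle-treated sample and the ‘heavy’ channel the inhibitor-treated sample (as the panel labels suggest), and the function names, the two-fold cutoff and the peak intensities are all hypothetical.

```python
import math


def silac_log2_ratio(light, heavy):
    """Return log2(heavy / light) for one phosphopeptide.

    Assumption: 'light' is the vehicle channel and 'heavy' the inhibitor
    channel, so a negative value means phosphorylation fell on treatment.
    """
    if light <= 0 or heavy <= 0:
        raise ValueError("peak intensities must be positive")
    return math.log2(heavy / light)


def inhibitor_sensitive(peaks, min_fold_decrease=2.0):
    """List sites whose heavy peak is at least `min_fold_decrease` times lower.

    `peaks` maps a site name to a (light, heavy) intensity pair.
    The two-fold default is an illustrative cutoff, not one from the papers.
    """
    cutoff = -math.log2(min_fold_decrease)
    return sorted(
        name
        for name, (light, heavy) in peaks.items()
        if silac_log2_ratio(light, heavy) <= cutoff
    )


# Hypothetical intensities for three phosphorylation sites.
example_peaks = {
    "site_A": (1000.0, 250.0),  # four-fold decrease
    "site_B": (800.0, 760.0),   # essentially unchanged
    "site_C": (500.0, 250.0),   # two-fold decrease
}

if __name__ == "__main__":
    print(inhibitor_sensitive(example_peaks))
```

Working on a log2 scale keeps increases and decreases symmetric around zero, which is why the cutoff is expressed as a negative log2 fold change.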


research highlights

Written by Kathy Aschheim, Laura DeFrancesco, Markus Elsner, Peter Hare & Craig Mak

Chimeric mouse with a rat pancreas
The successful generation of human induced pluripotent stem cells (iPSCs) has made the production of patient-specific organs for transplantation conceivable, although the immense technical difficulties of creating such organs in vitro make it unlikely that this aim will be achieved in the near future. Kobayashi et al. now suggest a potential alternative route. Working with rats and mice, they show that it is possible to gestate rat-mouse chimeras to term, a feat achieved previously only once for animals belonging to different genera (the geep, a chimera between goats and sheep). The authors inject either rat or mouse iPSCs into a blastocyst of the respective other species and transplant the blastocysts into pseudopregnant mothers of the species of the donor blastocyst. The resulting animals show a strong contribution by the chimeric cells to all organs tested. To test if it is possible to derive an organ entirely from iPSCs, Kobayashi et al. use a mouse line with a deletion of the transcription factor Pdx1, which is essential for the development of the pancreas. When rat iPSCs are injected into mouse Pdx1–/– blastocysts, the chimeric animal grows a pancreas derived entirely from rat cells that can functionally substitute for the missing mouse organ. Although considerable conceptual, technical, ethical and legal challenges remain in translating this approach to human organs, this paper may open up a new route to patient-specific organ production. (Cell 142, 787–799, 2010) ME

A methylome map of lineage commitment
How do epigenetic marks control the present and future identity of a cell? Several studies have compared the DNA methylome of pluripotent and differentiated cells, but a new paper by Ji et al. is the first to map the methylome of a differentiation hierarchy that encompasses several stages of progressive cell-fate restriction. The authors measure global CpG methylation in mouse hematopoietic multipotent progenitor cells and in their two progeny, common lymphoid progenitors and common myeloid progenitors. They also analyze cell populations further down these pathways: thymocyte progenitors and granulocyte/macrophage progenitors. The data reveal considerable epigenetic plasticity over the course of differentiation and show that lymphopoiesis involves far more DNA methylation than myelopoiesis. The study also identifies many new genes associated with the choice between lymphoid and myeloid fate. (Nature 467, 338–342, 2010) KA

Plug-and-play sequence analysis
Genome sequencers are being increasingly used in the research laboratory and in the clinic, resulting in an acute need for robust software for analyzing the data. McKenna et al. describe a software tool kit with code for performing important low-level data-management tasks, which provide the foundation for higher-level analyses of personal genomes, cancer genomes and exomes, for example. The software, called the Genome Analysis Toolkit, describes much-needed conceptual patterns (‘abstractions’ in the parlance of computer scientists) for simplifying how programmers structure their code. These abstractions, and previous efforts such as SAMtools, Galaxy and the ShortRead package, are essential as they reduce software errors, promote sharing of knowledge and should speed the development of analytical pipelines. Given that not all of the potential consumers of high-throughput sequence data will have the bioinformatics resources to engineer customized pipelines from scratch, an open source code base of low-level software libraries should be valuable, much as libraries for internet communication protocols aided computer networking. (Genome Res. 20, 1297–1303, 2010) CM

Antimalarial drug candidate
Evidence of the emerging resistance to artemisinin derivatives has increased the urgency of identifying new classes of compounds to treat malaria. Despite substantial advances in our understanding of the biology of Plasmodium in recent years, the success rates of efforts to rationally design drugs that target molecular vulnerabilities in the malarial parasite have been disappointing. Rottmann et al. report that a more traditional approach, using cellular proliferation assays to screen chemical libraries, has identified spirotetrahydro-β-carbolines, or spiroindolones, as promising antimalarials. Cell-based assays and rodent data suggest that their most promising compound, NITD609, has all of the attributes needed for an antimalarial treatment. These include efficacy at concentrations that are potentially compatible with a once-daily oral dosing regimen. Unlike most commonly used antimalarial drugs, NITD609 rapidly stops protein synthesis, possibly by disrupting ion homeostasis maintained by a Plasmodium falciparum P-type ATPase. This protein seems a likely site for the emergence of resistance should the promising preclinical data for the drug translate to its broad clinical use. (Science 329, 1175–1180, 2010) PH

Obesity leaves its methylation marks
Associating genetic variants with complex phenotypes like obesity or heart disease has proven elusive, but Feinberg et al. have done just that with a set of epigenetic markers. Using their comprehensive, high-throughput, array-based, relative methylation technology, they analyze samples from a group of Icelandic individuals who participated in a more than decade-long study called Age, Gene/Environment Susceptibility. By analyzing 4.5 million CpG sites from 74 people whose lymphocytes provided ample DNA on two occasions, the researchers identify 227 regions that show interindividual variations or variably methylated regions (VMRs), which encompass genes important in development and morphogenesis. The authors also assess differences within individuals, identifying two distinct classes of genes, 41 of which vary and 199 of which remain stable. Using cross-sectional linear regression analysis for each VMR in relation to body mass index (BMI), they go on to determine the relationship of the VMRs to obesity. From this analysis, 13 VMRs are found to co-vary with BMI, four of which did so stably over the course of the 11 years of study. Although the sample size is small, this study suggests the utility of VMR association in defining the epigenetic basis of a disease. In addition, it is the first comprehensive study demonstrating stable methylation marks that uniquely identify individuals. (Sci. Transl. Med. 2, 49ra67, 2010) LD
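
The cross-sectional regression of VMR methylation on BMI mentioned in the obesity highlight can be pictured with an ordinary least-squares fit for a single region. This is only a sketch under stated assumptions: the function name and the BMI and methylation values are hypothetical, and the published analysis may have used a different model or additional covariates.

```python
def ols_fit(x, y):
    """Fit y = slope * x + intercept by ordinary least squares.

    Here x would hold BMI values and y the methylation level of one VMR,
    one pair per individual (all values below are invented).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data for one VMR across four individuals.
bmi = [22.0, 25.0, 28.0, 31.0]
methylation = [0.30, 0.36, 0.42, 0.48]

if __name__ == "__main__":
    slope, intercept = ols_fit(bmi, methylation)
    print(f"slope={slope:.4f} intercept={intercept:.4f}")
```

Repeating such a fit for every VMR, and then asking which slopes differ from zero, is one way to read the phrase "for each VMR in relation to BMI".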


Editorial

Making a mark

High-throughput technologies are enabling epigenetic modifications to be mapped on a genome-wide scale, but whether such knowledge can be rapidly translated into biomedical applications remains unclear.

Despite its inception over 60 years ago, epigenetics is very much in its formative stages. Even the term ‘epigenetics’ means different things to different people. The best working definition for the field is that it is the study of traits heritable through meiosis or mitosis that are not dependent on the primary DNA sequence. Even so, British geneticist Adrian Bird has commented, “Epigenetics is a useful word if you don’t know what’s going on; if you do, you use something else.”

In the past year, ‘what’s going on’ has become a good deal clearer. The first DNA methylomes for different human cell types have now been worked out; the long-sought mammalian DNA demethylase has been identified; heritable epigenetic marks have been demonstrated to not only depend on genetic variation, but also vary in ways associated with disease predisposition; an expanding group of noncoding RNAs has been shown to interact with the epigenetic machinery; the role of methylation in regulating alternative splicing has been established; and additional evidence has accrued that chromatin modifications are important for neuronal plasticity and protracted changes in brain function. At the same time, efforts to create genome-wide catalogs of covalent modifications of DNA and histones have been spurred by next-generation sequencing and array technologies that offer greater throughput and sensitivity.

The molecular actors participating in what’s going on have also become clearer: covalent modifications both to DNA (e.g., methylcytosine and hydroxymethylcytosine nucleotides) and to histones/histone variants (acetylation, methylation, phosphorylation and so on) as well as noncoding RNA molecules (microRNAs, small nucleolar RNAs and large intergenic noncoding RNAs (lincRNAs)), transcription factors, DNA-binding proteins and even cytoplasmic signaling factors.

It is now evident that, unlike changes to DNA sequence, most chromatin states are remarkably reversible and transient. Even DNA methylation, long considered a permanent, gene-silencing epigenetic mark, can be removed in certain instances. These chromatin signatures change during aging and are influenced by environmental factors, such as maternal behavior, physical exercise and diet. And dysregulation of epigenetic silencing is associated with several diseases, including imprinting disorders, Rett syndrome, facioscapulohumeral muscular dystrophy and even autism. But it is in the realm of cancer, particularly leukemias, where epigenetic research has yielded insights into abnormalities in histone marks on promoters, aberrant DNA methylation at CpG islands and microRNAs. For solid tumors, malignancies have been associated with spontaneous defects in tumor suppressor gene silencing and breast cancer invasiveness/metastasis recently has been linked to lincRNA-mediated retargeting of a histone methylase.

Aberrant chromatin remodeling has also been implicated in the process of somatic cell nuclear transfer (SCNT) used to clone animals. Few cloned embryos survive to term and many of the offspring die postnatally or are abnormal. That inappropriate epigenetic signatures are responsible for these defects is evident from the fact that the offspring of these cloned animals (the second generation) are phenotypically normal. In this context, it is sobering that work published in Nature (467, 280–281, 2010) last month suggests that induced pluripotent stem (iPS) cells show less complete epigenetic reprogramming than embryonic stem cells produced via SCNT.

Of course, if high-throughput technologies can help determine the appropriate signature of epigenetic marks, it may be possible to screen for more fully reprogrammed iPS cells. But given the plasticity of many histone modifications, it remains uncertain whether epigenetic signatures alone will have sufficient predictive or diagnostic value. In most cases, we have no way to tell whether a particular epigenetic signature is a cause of disease or merely a consequence of the pathological state.

From a therapeutic standpoint, there is particular reason for optimism, as four drugs acting on DNA methyltransferase and histone deacetylase enzymes have already been approved. At least in blood cancers, this provides validation that pharmacological alteration of chromatin modifications has tangible clinical benefit, and these successes are spurring industry interest in the development of inhibitors of other epigenetic targets, such as histone methyltransferases.

Currently, however, all epigenetic drugs act in a nonspecific, pan-genomic manner and, consequently, are associated with significant dose-limiting toxicities. This is perhaps unsurprising as chromatin-modifying enzymes have no inherent specificity for a particular nucleosome (or its associated gene). Rather, they are recruited by DNA-binding proteins or co-factors or RNAs that localize the complex to a specific stretch of sequence. This issue goes beyond simple, drug-related, off-target effects: agents that modify the chromatin state across the genome may also awaken undesirable elements, such as endogenous retroviruses. Thus, if epigenetic therapies are to succeed outside cancer (in neurological indications, for example), their activity needs to be more directed.

We are currently witnessing a renaissance in epigenetics research. Much of the recent growth in the field can be attributed to the technology-enabled ability to survey epigenetic modifications on a genome-wide scale. The success of epigenetic therapy in hematological malignancies has also engendered confidence in the translational potential of the field. But greater emphasis now needs to be placed on elucidating not only the molecular mechanisms by which an expressed or silent state is transmitted through cell division but also the interplay between DNA and/or chromatin modifications and RNAs, transcription factors, nuclear organizing factors and signal transduction pathways in different cell types, at different ages and under different developmental and disease states. With this knowledge in hand, epigenetics has the potential to make an even greater mark on the practice of medicine.

Nature Biotechnology is grateful to sponsors GlaxoSmithKline’s EpiNova DPU, Cellzome and Active Motif, whose support enables this focus to be freely available to readers online.


commentaryLinking cell signaling and the epigeneticmachineryHelai P Mohammad & Stephen B BaylinOne of the biggest gaps in our knowledge about epigenomes is how their interplay with cellular signaling influencesdevelopment, adult cellular differentiation and disease.© 2010 Nature America, Inc. All rights reserved.An array of high-throughput technologies isproviding us with ever more detailed mapsof the positions of epigenetic marks in thegenomes of various cell types under assortedconditions. The striking differences observedin these experiments have increased our appreciationof the importance and functional consequencesof chromatin remodeling. However,how the machinery that governs the variousepigenetic processes ties into the larger cellularcontext remains largely unknown. Specifically,we need to understand what signals a cell mustreceive and send to appropriately orchestratethe epigenome for its role in developmentalbiology, cell differentiation and renewal ofadult stem cells, and in how aberrant signalingto the epigenetic machinery contributes todisease development.The basic premise of this commentary is thatwe must add three-dimensional information tolinear epigenome mapping by elucidating theenvironmental cues and signal transductioncascades that result in alterations in chromatinstructure and ultimately gene expression.The epigenome and the cellularenvironmentThe genomic distributions of the three mainmodulators of the epigenome (reviewed in thisissue 1 )—DNA methylation, histone modificationsand nucleosome positioning—are rapidlybeing elucidated across the genomes ofmultiple cell types using a growing series ofsequencing- and microarray-based technologies2,3 . We will undoubtedly establish over theHelai P. Mohammad and Stephen B. 
Baylinare at the Sidney Kimmel ComprehensiveCancer Center, The Johns Hopkins MedicalInstitutions, Baltimore, Maryland, USA.e-mail: sbaylin@jhmi.edunext several years the patterns of all epigeneticmarks that accompany heritable transcriptionalstates in all cell types.What we currently know much less about,and must come to understand, are the cues inboth normal and abnormal cellular environmentsthat signal cells to alter their epigenomes.The complex interaction between these cuesand changes in the epigenome must be theresult of a highly orchestrated set of eventsthat involves communication of each cell withits environment. The initiation of proper celldifferentiation and tissue organization mustinvolve signal transduction cascades that createepigenome alterations only when new setsof gene expression patterns become heritable.Epigenetic processes will thereby regulate thebalance between stem, progenitor and maturecells in adult and developing tissues as well asfostering abnormal cell population states, suchas in cancer.It is increasingly apparent that genes that areimportant in development overlap considerablywith those that govern adult renewal systemsand whose expression patterns are altered indiseases, such as cancer. Most of these genesare subject to a stringent epigenetic control oftheir transcription. Our goal for this article is toconsider, using selected examples, what we mayneed to explore to understand the bi-directionalrelationship between the epigenome and thecellular environment.Highly regulated epigenome remodelingin the embryoA premier setting for investigating the signalsthat control switching of cell states by changingepigenetic states is, of course, embryonicdevelopment. 
Here, the relatively open cellularchromatin status of the zygote and of embryonicstem cells (ESCs), which is characterizedby comparatively low levels of DNA methylationand high levels of histone acetylation, mustbe progressively converted to ever-changingvariations of more-closed, deactylated andmethylated chromatin states that facilitate celllineage commitment and the formation of specializedtissues 4 . Genome-wide studies of thelocalization of histone modifications, histonemodifying enzymes and DNA methylation areproviding us with valuable insights into the linearpositioning of the epigenetic determinantsof ESCs and clues to changes occurring withlineage commitment (reviewed in this issue 5 ).The ability to reprogram mature cells to anembryonic-like state by nuclear transfer or byinducing the expression of key transcriptionfactors has provided us with critical opportunitiesto linearly map the epigenetic parametersthat are essential for attaining pluripotency.As we obtain knowledge of the abovegenomic patterns, we are challenged to understandwhat cellular signaling processes dictatetheir development and how they contribute tocellular differentiation. One way to envisiondissecting this is to consider the Waddingtonlandscape model, as recently interpreted byseveral authors 6–10 . Waddington envisioneddevelopment starting with a marble at the topof a hill and initially having the totipotent stateof a zygote cell (Fig. 1). As the totipotent zygoticcell rolls down the hill, it—with some degreeof stochasticity—enters a series of furrows thatinduce increasingly restrictive, more committedcell fates as the totipotent cell changes to amultipotent adult stem cell and then to differentiatedcells of adult tissues. This trip throughthe valleys is further associated with the evolvingpatterns of epigenetic states that maintainthe cell fate changes key to each developmentaland differentiation stage 7 . 
During cellular reprogramming to an induced pluripotent state (induced pluripotent stem cells (iPSCs)), the marble is pushed progressively back ‘uphill’ in a process that must reverse many of the mature epigenomic modifications. Dissecting this process through epigenomic analyses is providing an unparalleled opportunity to understand the linear patterns of epigenomic features that characterize different stages in development. We now suggest that understanding the integration between the environment of Waddington’s furrows with the epigenome alterations experienced by Waddington’s marble provides a model for understanding what orchestrates epigenomic changes (Fig. 1).

[Figure 1 diagram: Waddington landscape relating developmental potential to epigenetic status, with environmental stimulus and signal transduction as inputs. Levels shown: totipotent (zygote; global DNA demethylation); pluripotent (ICM/ES cells, EG cells, EC cells, mGS cells, iPS cells; only active X chromosomes, global repression of differentiation genes by polycomb proteins, promoter hypomethylation); multipotent (adult stem cells, partially reprogrammed cells?; X inactivation, repression of lineage-specific genes by polycomb proteins, promoter hypermethylation); unipotent (differentiated cell types, including B cell, macrophage, fibroblast and muscle; X inactivation, derepression of polycomb-silenced lineage genes, promoter hypermethylation). Factors and pathways labeled: Oct4, Sox2, Nanog, Klf4, c-Myc, Pax5, Pdx1, Ngn3, Mafa, MyoD, PcG, miRNA, lincRNA, WNT, Wnt3a, BMP, Shh, Notch, RA, TGFs/growth factors and LIF/cytokines.]

Figure 1 Depiction of potential cell signaling in Waddington’s model of epigenetic determination of development, as interpreted by Hochedlinger and Plath⁷. Colored marbles correspond to differentiation states. Arrows represent the directionality of factor influence for development with ‘+’ indicating addition and ‘–’ indicating removal of a given factor or signal. The downward blue arrow at the top left of the ‘hill’ reflects direction of normal development, whereas the upward blue arrow at the bottom right of the hill depicts the direction of cellular reprogramming during generation of iPSCs. Coloring of text for names of factors and signaling pathways corresponds to their function within the given developmental stage.

Pluripotency and intra-nuclear regulation. Studies of the mechanisms of iPS cell generation have further clarified the need for orchestration of the epigenetic landscape in early development. The critical step during dedifferentiation to iPSCs is the activation of transcription factors that maintain the embryonic state and the downregulation of factors promoting cell differentiation, whereas epigenetic silencing of pluripotency genes, such as OCT4 and NANOG, is associated with the onset of lineage commitment⁹. The silencing of these genes appears to be a molecular progression. The presence of repressive histone modifications emerges first to dictate transcriptional silencing. This is then locked in by the imposition of DNA methylation¹⁰⁻¹³, which further prevents reprogramming to the undifferentiated state¹⁰⁻¹² (Fig. 2). Studies of iPSC generation have identified potential roles for the enzyme activation-induced cytidine deaminase (AID) and the Tet family of proteins as DNA demethylases for the pluripotency genes in the final steps of converting committed cells back to ESC-like cells¹⁴,¹⁵. The specific signals that regulate the levels and targeting of the machinery that demethylates and methylates DNA at the pluripotency genes are largely unknown, but important possibilities are discussed below.

As genes in ESCs undergo changes in their epigenetic status, transcription factors regulate downstream target genes that are critical to the epigenetic landscapes that evolve during development. OCT4, SOX2 and NANOG are part of a regulatory network that includes their own promoters and promoters of genes that are targets of the polycomb-group (PcG) protein complexes that mediate long-term gene silencing¹⁶,¹⁷. A hallmark of ESCs is that some gene promoters are simultaneously occupied by the polycomb-associated histone repression mark, H3K27me3, as well as the activation marks H3K4me2 and H3K4me3 placed by the trithorax (Trx) complex, a state termed bivalent chromatin¹⁸,¹⁹. During conversion from the pluripotent state of ESCs to the multipotent state of more adult tissue stem/progenitor cells, this bivalent state is remodeled, and either the active or the repressive mark is enhanced, depending on the transcription state required for lineage commitment¹⁸. Although these chromatin states are being defined in mapping experiments, the questions, from a signaling standpoint, are what triggers this shift in balance to the Trx- and PcG-mediated histone modifications and what is the mechanism by which such a shift may occur (Fig. 2)?

Extrinsic signaling and cellular niche/environment. From studies of human and mouse ESCs and the reprogramming of cells to iPSCs, several pathways have emerged as major candidate regulators of epigenetic remodeling (Fig. 1). In essence, these are the signal transduction systems that induce and maintain the stemness of ESCs as well as those that convert ESCs to and maintain them as more committed progenitors.

The first example is the Wnt pathway, which is involved at all of these stages. In terms of supporting pluripotency, the ligand Wnt3a can replace overexpression of the nuclear oncogene product, c-Myc, a key downstream target of Wnt pathway signaling, in potentiating the generation of iPSCs (Fig. 1)²⁰.


Growth factors are a second major group of signaling molecules to be considered. This class of morphogens, critical for appropriate development, includes bone morphogenetic proteins, transforming growth factors (TGFs) and fibroblast growth factors (FGFs). Specifically, FGFs and TGFs have been shown to sustain expression, by means of downstream signaling to Smad proteins, of the pluripotency factors Oct4, Sox2 and Nanog that promote the undifferentiated potential of human ESCs²¹. Growth factor withdrawal has been exploited to push ESCs toward differentiation, thus allowing inferences with respect to the influence of growth factor signaling on epigenetic alterations during differentiation²²,²³. Genes marked by bivalent chromatin patterns undergo remodeling upon growth factor withdrawal such that either more active or repressive chromatin states emerge, depending upon the specific lineage commitment induced²⁰,²⁴.

As a third example, the cytokine leukemia inhibitory factor (LIF) has been associated with supporting the undifferentiated state of mouse ESCs. LIF confers its signal to a cascade that results in activation of STAT3, expression of which has been shown to be important in retention of the pluripotent state of ESCs²⁵,²⁶.

Whereas the above examples are pathways that maintain stemness of ESCs, signaling cascades are also critical to the commitment of these cells during development and maintenance of the progenitors for various cell lineages that emerge in development. In some instances, the same pathway can function both in ESC maintenance and differentiation.

For example, activation of the Wnt pathway is important for the differentiation of neuronal precursors and the specification of the forebrain in vivo²⁷,²⁸.
Disruption in the downstream mediator of Wnt, β-catenin, results in decreased proliferation of neural progenitors and defects in neuronal migration²⁹,³⁰. Retinoic acid signaling is also essential for neural development and involved in specification, differentiation and outgrowth of axons (reviewed in ref. 31). Additionally, retinoic acid can induce differentiation of human and mouse ESCs, and embryonic carcinoma (EC) cells³². Similarly, Notch signaling appears critical for key stages of development. It is important for cell-type specification and differentiation but is also involved in the proliferation and self-renewal of stem cells (reviewed in refs. 33,34). Finally, Sonic Hedgehog (Shh) signaling is critical for the development of many organs including the brain, the lung and endocrine system. It promotes the appropriate differentiation of ESCs (reviewed in ref. 35), but has also been implicated in self-renewal of normal stem cells.

The key question is how we tie the signaling from the above pathways to specific interactions that control the epigenetic landscapes during critical stages of development. Although our understanding in this arena is in the early stages, we can consider the critical steps that must be subject to control by the signaling pathway and discuss the possible mechanisms that are beginning to emerge. In doing so, it is first imperative to separate classic signal transduction from true epigenetic regulation. The former involves events that directly modulate gene expression by altering the levels, post-translational modification or positioning of transcription factors and their co-factors in response to the extracellular environment or state of the cell³⁶,³⁷. These events may then trigger changes in the transcription of a given gene or a program of responsive genes. Only then might a new epigenetic state that stabilizes these signal transduction-induced changes come into play, which renders the new expression states heritable, even in the absence of the initial signals³⁷.

[Figure 2 diagram. Labels: Wnt, TGFβ, Notch, Shh; stemness; AID? TET?; retention of unmethylated state; Oct4, Sox2, Nanog; Trx; PcG; bivalent and/or PcG genes; commitment; kinase activity?; altered expression of machinery?; miRNAs?; lincRNAs?; altered positioning of machinery?; histone demethylases.]

Figure 2 Potential mechanisms by which the chromatin of key developmental genes may be regulated by cellular signaling. The left panel represents the ESC state whereby extrinsic signaling may impinge upon regulation of the DNA methylation status of pluripotency genes. The transcription start sites (arrows) of the three genes are depicted at the bottom left with circles representing CpG sites as DNA unmethylated (white) or methylated (black). The DNA methylated gene is transcriptionally repressed (red circle over transcription start) and nucleosomes (blue) are in a more compact structure in contrast to the more open structure of the expressed genes on the bottom left. Right panel represents the committed progenitor state that ensues when the pluripotency factors are silenced in ESCs. Subsequent resolution of bivalency to active or inactive target gene transcriptional states is depicted as discussed in the text.

The imposition of epigenetic states establishes heritable activation and repression of gene transcription through a host of activating or repressing histone modifications, in some cases coupled to DNA methylation and changes in nucleosome positioning³⁸. All of these steps involve a battery of enzymes and protein complexes, including histone methyltransferases (HMTs), histone acetyltransferases (HATs), histone deacetylases (HDACs), histone demethylases (KDMs), DNA methyltransferases (DNMTs), DNA demethylases and nucleosome remodeling complexes (reviewed in this issue¹ and in ref. 39). The activity of all of these proteins could be regulated by signaling molecules.
Once a gene transcription state is altered by the enzymatic chromatin remodelers, it becomes locked in a heritable pattern and the maintenance of the state may not require continued activity of the factors that originally created it.

Obviously, the interplay between classic signal transduction and epigenetic control is likely to be complex and may often be intertwined. Thus, regulation of DNA-binding transcriptional regulatory machinery may interact with chromatin-modifying enzymes to lock in epigenetic states.


Hormone receptors, such as those that bind retinoic acid, are one example of DNA-binding transcription factors that also interact with chromatin-modifying enzymes (reviewed in ref. 40). These receptors are typically localized to the cytoplasm and translocate to the nucleus upon steroid hormone binding, where they act as DNA-binding factors mediating gene expression changes. At least in the case of the mouse mammary tumor virus (MMTV) promoter (reviewed in refs. 40,41), hormone activation can occur through activation of two such receptors, glucocorticoid and progesterone, as well as through progestin-activated ERK signaling⁴¹. Activation of MMTV and other progesterone receptor target genes is further coupled to activity of HATs, ultimately resulting in repositioning of histones providing an accessible landscape for receptor binding at hormone response elements⁴¹.

Another example of transcription factor–mediated alterations of the epigenome involves STATs, the downstream effectors of cytokine signaling, important for maintenance of the pluripotent state, as described above. STAT4 can promote an active chromatin environment, whereas STAT6 is associated with a transcriptionally inactive epigenetic state⁴². Earlier studies indicated that transcription factors, such as STATs, do have differential requirements for co-activators, likely helping to confer additional specificity to their ability to modulate the local chromatin⁴³. Specifically, for maintenance of ESC stemness, LIF activates STAT3, which, in turn, activates Klf4, which activates the pluripotency factor Sox2²⁶.

A third, and recent, example involves the interaction of the Notch effector RBP-J with the lysine demethylase KDM5A. Methylation of histone H3 Lys 4 is dynamically altered at RBP-J sites upon inhibition or reactivation of Notch signaling⁴⁴.

Finally, a fourth example is the transcriptional mediators of TGF-beta signaling, the SMAD proteins. SMADs have been associated with transcriptional activation or repression in a variety of contexts⁴⁵. SMADs recruit either co-repressors or co-activators, generally associated with transcriptional machinery such as HDAC or p300, that ultimately alter acetylation states of histones within TGF-beta responsive gene promoters⁴⁵.

[Figure 3 diagram. Labels: Ras, c-Myc, Shh, Notch, Wnt, growth factors, RBP-J, PcG, HMT, KDM, HDAC, DNMT.]

Figure 3 Modeling signaling that may promote cancer-specific DNA hypermethylation imposed on a normally non-DNA methylated, PcG-marked gene. Black solid arrows, direct regulation; black dashed line arrows, potential regulation and intersection of signal transduction with chromatin regulating machinery; red arrows, feedback in that the signaling pathway itself becomes activated in association with genes that are abnormally DNA methylated and silenced; yellow star, active 2/3meH3K4; red star, the PcG-associated 3meH3K27; green polygon, AcH3K9; black circles, methylated CpG sites. HMT, histone methyltransferase; KDM, lysine demethylase; HDAC, histone deacetylase; DNMT, DNA methyltransferase.

The aforementioned scenarios provide just a few examples by which signal transduction cascade effectors can modulate chromatin through interactions with the histone-modifying machinery. There are many additional examples and probably numerous others that have yet to be found.
The theme, however, is clear in that extrinsic signaling may lead to a cytoplasmic cascade of events that are ultimately translated to the nucleus by transcription factors. The transcription factors may recruit complexes that contain histone modifiers to specific gene promoters, thereby modulating the local chromatin to promote or inhibit transcription. Ultimately, as in the previously described example of the pluripotency factors, these transcriptional states may be locked in by DNA methylation (refs. 10–13 as above).

From the standpoint of Waddington’s model in Figure 1, we might surmise that any environmental signaling for control of ESCs might influence the balance between stemness retention and commitment. Central to this would be the regulation of embryonic transcription factors, such as OCT4, SOX2 and NANOG (Fig. 2). For maintenance of stemness, one might imagine signaling pathway convergence upon steps that initially allow transcription of these genes with factors creating a favorable open chromatin structure, which includes a promoter free of DNA methylation and histones with activating post-translational modifications, such as lysine acetylation. For the onset of loss of pluripotency and for subsequent lineage commitment, it appears that signal transduction must first initiate silencing of the pluripotency genes through the appearance of repressive histone modifications. Subsequently, promoter DNA methylation ensures that lineage commitment is maintained through the many rounds of mitosis that follow in the lifetime of the organism¹⁰.

Might then the epigenetic control of embryonic transcription factors be influenced by pathways such as Wnt, bone morphogenetic proteins, TGFs, FGFs or cytokines, such as LIF? Does it involve regulation of the DNA demethylase steps recently proposed to be essential for the induction or retention of embryonic factor expression¹⁴,¹⁵?
Downstream from these events, how does the cell signal transduction machinery control the balance between Trx and PcG proteins in establishing the bivalent chromatin balance discussed earlier?

One pathway linked to such control is Shh, which increases progenitor cell number in a manner dependent upon the PcG factor, Bmi1. Bmi1 is a component of the PcG complex that recognizes the repressive histone mark H3K27me3. In cell culture models, addition of Shh ligands or overexpression of Gli, a downstream effector of Shh signaling, increases Bmi1 expression⁴⁶. Mice engineered to be null for Bmi1 expression in cerebellar granule precursor cells are defective in cerebellar development as well as proliferation of neural progenitor cells in response to Shh⁴⁷,⁴⁸. The convergence between Bmi1, a PcG constituent, and Shh signaling highlights one mechanism by which signal transduction may alter the balance between Trx and PcG through modulation of a key component of the PcG machinery. Additional study is required to elucidate how this may specifically alter the bivalent chromatin balance for Bmi1 target genes.

Notch is another pathway that is increasingly implicated in the control of the epigenetic landscape. As mentioned earlier, a recent study describes a new role for the histone demethylase KDM5A as playing an integral function in a Notch repressor complex. This PcG-interacting histone demethylase⁴⁹,⁵⁰, which would decrease the key transcriptional activating mark, H3K4me3, associates with the Notch nuclear effector protein, RBP-J, in a manner essential for Notch/RBP-J target gene silencing⁵¹. Furthermore, the KDM5A and RBP-J interaction is critical in Notch-mediated patterning, growth and tumorigenesis in vivo, as shown in Drosophila⁴⁴.

The control of the balance between ESC stemness and commitment also appears to involve microRNAs (miRNAs) and other noncoding RNAs. Specifically, a family of c-Myc-regulated miRNAs regulates the expression of pluripotency and differentiation factors⁵²,⁵³. Within this group, a family of miRNAs, mir-200, has recently been shown to regulate a key PcG component, SUZ12 (ref. 54). Long intergenic noncoding RNAs (lincRNAs) have also been shown to have a role in pluripotency and are themselves transcriptionally regulated by key transcription factors such as Oct4 and Nanog⁵⁵.
Such RNAs may also be critical for PcG occupancy and the correct targeting of histone modifiers such as LSD1 (refs. 56,57). Signaling may thus involve regulation of the expression status of noncanonical, noncoding RNAs, adding an additional layer of complexity by which signaling can alter cell states to ultimately promote heritable alterations to the epigenome (Figs. 1 and 2).

We can consider maintenance of ESC stemness and subsequent differentiation during development as a model for signaling-mediated alterations in the epigenome. To this end, we have tried to emphasize the need to step up research efforts to elucidate the interactions between cell signaling pathways and the molecular machinery regulating switches in the epigenetic landscape that drive key stages in embryogenesis. We now have the opportunity to map such interactions across the genome, as analytic tools and the understanding of the molecular epigenetic machinery are available and/or rapidly developing.

Adult cell renewal and epigenomes
Understanding the mechanisms of adult cell renewal will also require the exploration of the impact of cellular signaling on the epigenetic landscapes of these cells (Fig. 1). Many of the same concepts that govern embryonic development, such as regulation by bivalent chromatin marks, can also be found in adult stem cells. In addition, the same nuclear transcription factors and signaling pathways discussed previously for embryonic development are major players in adult tissue homeostasis.

The environment for this cell renewal, the ‘stem cell niche’, might be imagined as the furrows (those at the very bottom of the model) for interaction with stem and progenitor cells in the Waddington model⁵⁸.
Stem cell niches provide special microenvironments for adult stem cells. The signaling factors in the niche are thought to regulate the epigenetic mechanisms responsible for maintaining the balance between adult stem cell self-renewal and lineage commitment. Indeed, genome-wide chromatin tiling studies have defined the epigenetic changes that accompany different stages of differentiation in adult tissues⁵⁹,⁶⁰.

One great challenge in studying these processes is obtaining high-quality samples from key isolated cell types in sufficient purity and quantity for today’s genome-wide technologies for mapping epigenetic modifications. Clearly, improved methods for the isolation of such populations and adapting assay platforms to very small cell numbers must be developed. Use of mouse and other model organisms may be instrumental for many such studies.

Disease as a scenario for understanding epigenome regulation
It is increasingly being recognized that epigenetic abnormalities are critical to disease pathogenesis (see ref. 1, this issue). Studies of epigenetic changes associated with different conditions can not only improve our understanding of the biology of the diseases and hold great promise for improving their management (for a review, see ref. 61, this issue), but also be invaluable for providing insights into basic aspects of epigenetic regulation.

At present, cancer is by far the most studied disorder with respect to epigenetic abnormalities. Silencing of tumor suppressor genes by aberrant DNA hypermethylation of normally unmethylated promoter region CpG islands is the best understood of these molecular changes. Dissecting the molecular origins of these silencing events, which might involve hundreds of genes depending on the tumor of the specific patient (reviewed in refs. 62,63), can teach us much about the cell signaling events that govern epigenetics (Fig. 3).
Perhaps the most instructive aspects link epigenetic abnormalities in cancer to the developmental events in cell signaling that have been a major focus of this article so far.

Many studies have stressed that cancer is a disease driven by cells arrested in embryonic-like states and/or reprogrammed to adopt such states (reviewed in refs. 64,65). Recent gene expression array results have found an ‘embryonic stem cell–like’ signature in many cancers, particularly in the most aggressive tumors⁶⁶.

How can we tie the similarities between tumor cells and ESCs to the signaling events that may guide epigenomes in both? Some clues have emerged in recent years. First, most of the factors generally used for the generation of iPSCs have defined roles as oncogenes⁶⁷. Second, deletion of tumor suppressor genes, including perhaps the most common abnormally epigenetically silenced gene in cancer, Ink4a, can facilitate iPSC formation⁶⁸, and can reprogram older cells to behave as younger ones. Inactivation of another epigenetically silenced gene in cancer, Arf, an alternative reading frame product of the Cdkn2a locus, along with inactivation of the tumor suppressor Rb, can convert post-mitotic myocytes to myoblast colonies that retain the ability to differentiate⁶⁹. Third, exciting recent work has linked maintenance of cancer stem-like cells, including their role in therapy resistance, to high expression of Jarid1a and Jarid1b, two proteins that erase the transcriptional activating mark, H3K4me3 (refs. 70,71). Fourth, as human cell systems age, their stem cell numbers actually increase⁷²,⁷³ and DNA hypermethylation of many of the same genes modified in cancer increases with age⁷⁴.
Finally, a sizeable fraction of the overexpressed genes in the ESC-like signature for tumors⁶¹ encode members of the PcG complexes that are vital for establishing developmental epigenome patterns¹⁷,⁷⁵⁻⁷⁷. Also, the overexpression of these PcG genes has been experimentally linked to cellular transformation⁷⁸⁻⁸¹ and to induction of cancer-related abnormal gene silencing and progression of abnormal DNA methylation⁸²,⁸³.

Another scenario illustrates how the links between embryogenesis and cancer epigenomes provide clues on how cell signaling is involved (Fig. 3). Multiple laboratories have shown that virtually half the genes with aberrant promoter hypermethylation in cancer carry PcG and/or bivalent chromatin marks in ESCs and/or embryonic progenitor cells⁸²,⁸⁴,⁸⁵. These genes do not have DNA methylation in their promoter CpG islands in embryonic cells (Fig. 2)⁸²,⁸⁴,⁸⁵. It is hypothesized that DNA methylation arises to replace or augment PcG occupancy for a more stable silencing of the involved genes (reviewed in refs. 65,86,87).

Although somewhat preliminary, multiple studies have shown how signaling events are involved in abnormal DNA methylation in cancer (Fig. 3). Many signal transduction pathways that drive cell transformation and tumor progression lead to the upregulation of PcG and/or components of the DNA methylation machinery. Experimental overactivation of the Ras pathway demonstrates a requirement for such epigenetic machinery proteins and can induce abnormal gene promoter DNA methylation⁸⁸. Additionally, through largely unknown mechanisms, overexpression of c-Myc can give rise to a specific signature of CpG island hypermethylation in culture and in vivo models of T-cell lymphoma (ref. 89). Overactivation of the Shh pathway with possible downstream involvement of the PcG protein Bmi1 is linked to maintenance of cancer stem-like cells⁴⁶. Similar functions have been reported for Wnt and Notch pathways (discussed above)⁹⁰,⁹¹. Furthermore, there is mounting evidence for involvement of epigenetic abnormalities in facilitating the tumorigenic activities of all of these pathways. For example, epigenetic gene silencing may be obligatory for Shh-driven formation of cerebellar tumors⁴⁷ and the oncogenic activity of Notch requires the interaction with an H3K4me3 demethylase⁴⁴. Multiple genes encoding proteins that antagonize Wnt pathway activity become DNA methylated and silenced, thus facilitating hyperactivity of the pathway in cancer (reviewed in ref. 92).

Much remains to be understood about the precise molecular events that seem to link cell signaling and the cancer epigenetic abnormalities discussed above.
Doing so, however, should provide invaluable insights toward understanding how epigenomic states can be regulated and the knowledge will clearly be invaluable for improving the treatment of cancer.

Conclusions
We have tried to illustrate, using selected examples, how vital it will be to understand what may now be one of the biggest gaps in our knowledge about the nature of epigenomes: how their different states are orchestrated by cell signaling at key stages of development, in adult cellular differentiation and in important disease states. We have pointed to accruing experimental evidence of the signaling pathways that control the epigenetic machinery and to the molecular mechanisms involved. We have hypothesized how we may build on these clues to study the events in far greater detail. New technologies and our rapidly accelerating knowledge of the machinery that maintains epigenetic states will facilitate our dissection of the regulatory mechanisms that govern the epigenome and will help in the development of translational applications.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

1. Portela, A. & Esteller, M. Nat. Biotechnol. 28, 1057–1068 (2010).
2. Satterlee, J. Nat. Biotechnol. 28, 1039–1044 (2010).
3. Bernstein, B. et al. Nat. Biotechnol. 28, 1045–1048 (2010).
4. Surani, M.A., Hayashi, K. & Hajkova, P. Cell 128, 747–762 (2007).
5. Meissner, A. et al. Nat. Biotechnol. 28, 1079–1088 (2010).
6. Hemberger, M., Dean, W. & Reik, W. Nat. Rev. Mol. Cell Biol. 10, 526–537 (2009).
7. Hochedlinger, K. & Plath, K. Development 136, 509–523 (2009).
8. Zhou, Q. & Melton, D.A. Cell Stem Cell 3, 382–388 (2008).
9. Yamanaka, S. & Blau, H.M. Nature 465, 704–712 (2010).
10. Athanasiadou, R. et al. PLoS ONE 5, e9937 (2010).
11. Feldman, N. et al. Nat. Cell Biol. 8, 188–194 (2006).
12. Epsztejn-Litman, S. et al. Nat. Struct. Mol. Biol. 15, 1176–1183 (2008).
13. Li, J.Y. et al. Mol. Cell. Biol. 27, 8748–8759 (2007).
14. Bhutani, N. et al. Nature 463, 1042–1047 (2010).
15. Ito, S. et al. Nature 466, 1129–1133 (2010).
16. Sridharan, R. et al. Cell 136, 364–377 (2009).
17. Lee, T.I. et al. Cell 125, 301–313 (2006).
18. Bernstein, B.E. et al. Cell 125, 315–326 (2006).
19. Guenther, M.G. & Young, R.A. Science 329, 150–151 (2010).
20. Marson, A. et al. Cell Stem Cell 3, 132–135 (2008).
21. Xu, R.H. et al. Cell Stem Cell 3, 196–206 (2008).
22. Conti, L. et al. PLoS Biol. 3, e283 (2005).
23. Brustle, O. et al. Science 285, 754–756 (1999).
24. Chi, A.S. & Bernstein, B.E. Science 323, 220–221 (2009).
25. Niwa, H., Burdon, T., Chambers, I. & Smith, A. Genes Dev. 12, 2048–2060 (1998).
26. Niwa, H., Ogawa, K., Shimosato, D. & Adachi, K. Nature 460, 118–122 (2009).
27. Hirabayashi, Y. et al. Development 131, 2791–2801 (2004).
28. Gunhaga, L. et al. Nat. Neurosci. 6, 701–707 (2003).
29. Machon, O., van den Bout, C.J., Backman, M., Kemler, R. & Krauss, S. Neuroscience 122, 129–143 (2003).
30. Backman, M. et al. Dev. Biol. 279, 155–168 (2005).
31. Maden, M. Nat. Rev. Neurosci. 8, 755–765 (2007).
32. Andrews, P.W. Dev. Biol. 103, 285–293 (1984).
33. Bray, S.J. Nat. Rev. Mol. Cell Biol. 7, 678–689 (2006).
34. Kopan, R. & Ilagan, M.X. Cell 137, 216–233 (2009).
35. Bertrand, N. & Dahmane, N. Trends Cell Biol. 16, 597–605 (2006).
36. Bryant, G.O. et al. PLoS Biol. 6, 2928–2939 (2008).
37. Ptashne, M. Curr. Biol. 19, R234–R241 (2009).
38. Jenuwein, T. & Allis, C.D. Science 293, 1074–1080 (2001).
39. Richly, H., Lange, M., Simboeck, E. & Di Croce, L. Bioessays 32, 669–679 (2010).
40. Biddie, S.C., John, S. & Hager, G.L. Trends Endocrinol. Metab. 21, 3–9 (2010).
41. Vicent, G.P. et al. Mol. Endocrinol. 3, 1–2 (2010).
42. Wei, L. et al. Immunity 32, 840–851 (2010).
43. Korzus, E. et al. Science 279, 703–707 (1998).
44. Liefke, R. et al. Genes Dev. 24, 590–601 (2010).
45. Massague, J., Seoane, J. & Wotton, D. Genes Dev. 19, 2783–2810 (2005).
46. Liu, S. et al. Cancer Res. 66, 6063–6071 (2006).
47. Leung, C. et al. Nature 428, 337–341 (2004).
48. Zencak, D. et al. J. Neurosci. 25, 5774–5783 (2005).
49. Klose, R.J. et al. Cell 128, 889–900 (2007).
50. Pasini, D. et al. Genes Dev. 22, 1345–1355 (2008).
51. Borggrefe, T. & Oswald, F. Cell. Mol. Life Sci. 66, 1631–1646 (2009).
52. Lin, C.H., Jackson, A.L., Guo, J., Linsley, P.S. & Eisenman, R.N. EMBO J. 28, 3157–3170 (2009).
53. Dang, C.V. EMBO J. 28, 3065–3066 (2009).
54. Iliopoulos, D. et al. Mol. Cell 39, 761–772 (2010).
55. Guttman, M. et al. Nature 458, 223–227 (2009).
56. Rinn, J.L. et al. Cell 129, 1311–1323 (2007).
57. Tsai, M.C. et al. Science 329, 689–693 (2010).
58. Voog, J. & Jones, D.L. Cell Stem Cell 6, 103–115 (2010).
59. Barski, A. et al. Cell 129, 823–837 (2007).
60. Barski, A. & Zhao, K. J. Cell. Biochem. 107, 11–18 (2009).
61. Kelly, T.K., De Carvalho, D.D. & Jones, P.A. Nat. Biotechnol. 28, 1069–1078 (2010).
62. Herman, J.G. & Baylin, S.B. N. Engl. J. Med. 349, 2042–2054 (2003).
63. Jones, P.A. & Baylin, S.B. Cell 128, 683–692 (2007).
64. Feinberg, A.P., Ohlsson, R. & Henikoff, S. Nat. Rev. Genet. 7, 21–33 (2006).
65. Ohm, J.E. & Baylin, S.B. Cell Cycle 6, 1040–1043 (2007).
66. Ben-Porath, I. et al. Nat. Genet. 40, 499–507 (2008).
67. Takahashi, K. & Yamanaka, S. Cell 126, 663–676 (2006).
68. Li, H. et al. Nature 460, 1136–1139 (2009).
69. Pajcini, K.V., Corbel, S.Y., Sage, J., Pomerantz, J.H. & Blau, H.M. Cell Stem Cell 7, 198–213 (2010).
70. Sharma, S.V. et al. Cell 141, 69–80 (2010).
71. Roesch, A. et al. Cell 141, 583–594 (2010).
72. Rossi, D.J., Jamieson, C.H. & Weissman, I.L. Cell 132, 681–696 (2008).
73. Chambers, S.M. et al. PLoS Biol. 5, e201 (2007).
74. Toyota, M. & Issa, J.P. Semin. Oncol. 32, 521–530 (2005).
75. Valk-Lingbeek, M.E., Bruggeman, S.W. & van Lohuizen, M. Cell 118, 409–418 (2004).
76. Gil, J., Bernard, D. & Peters, G. DNA Cell Biol. 24, 117–125 (2005).
77. Mikkelsen, T.S. et al. Nature 448, 553–560 (2007).
78. Varambally, S. et al. Nature 419, 624–629 (2002).
79. Bracken, A.P. et al. EMBO J. 22, 5323–5335 (2003).
80. Kirmizis, A., Bartley, S.M. & Farnham, P.J. Mol. Cancer Ther. 2, 113–121 (2003).
81. Kleer, C.G. et al. Proc. Natl. Acad. Sci. USA 100, 11606–11611 (2003).
82. Ohm, J.E. et al. Nat. Genet. 39, 237–242 (2007).
83. Mohammad, H.P. et al. Cancer Res. 69, 6322–6330 (2009).
84. Schlesinger, Y. et al. Nat. Genet. 39, 232–236 (2007).
85. Widschwendter, M. et al. Nat. Genet. 39, 157–158 (2007).
86. Cedar, H. & Bergman, Y. Nat. Rev. Genet. 10, 295–304 (2009).
87. Baylin, S.B. in StemBook (ed. L. Gerard) (Harvard Stem Cell Institute, 2009).
88. Gazin, C., Wajapeyee, N., Gobeil, S., Virbasius, C.M. & Green, M.R. Nature 449, 1073–1077 (2007).
89. Opavsky, R. et al. PLoS Genet. 3, 1757–1769 (2007).
90. Reya, T. & Clevers, H. Nature 434, 843–850 (2005).
91. Fre, S. et al. Nature 435, 964–968 (2005).
92. Ying, Y. & Tao, Q. Epigenetics 4, 307–312 (2009).


COMMENTARY

Tackling the epigenome: challenges and opportunities for collaboration

John S Satterlee, Dirk Schübeler & Huck-Hui Ng

John S. Satterlee is at the US National Institute on Drug Abuse, Bethesda, Maryland, USA; Dirk Schübeler is at the Friedrich Miescher Institute, Basel, Switzerland; and Huck-Hui Ng is at the Genome Institute of Singapore, Singapore. e-mail: satterleej@nida.nih.gov

What are the key considerations to take into account when large-scale epigenomics projects are being implemented?

Epigenetic changes have been correlated with important biological processes and disease states; however, the global epigenetic landscape of most cell types has not been comprehensively investigated. Recent advances in genomics technology, in particular high-throughput sequencing, have enabled genome-wide analysis of histone modifications and analysis of DNA methylation at nucleotide resolution. Here we discuss the scientific opportunities and current challenges in this area of research, including strategies for balancing the breadth and depth of genome-wide projects; the selection of cell types and assays; data visualization and analysis; and approaches for exploiting the resulting data to learn more about human health and disease. We also provide a brief overview of current large-scale projects (Box 1).

What is epigenomics?
Human genomes consist of the DNA encoding our genetic information, whereas epigenomes include DNA modifications and histone modifications layered on top of the genome. These marks form part of the instructions directing the genome to express genes at particular places and times [1,2]. Scientific understanding of the DNA ‘hardware’ of the human genome is well established, but the epigenomic ‘software’ has not yet been systematically investigated at a genome-wide level. A chief hurdle for such an endeavor is the large number of epigenomes even within an individual. Each of us has essentially one genome; however, each cell type in each individual is believed to have a distinct epigenome that reflects its developmental state [3]. Thus, there are likely to be at least as many human epigenomes as there are distinct cell types in the human body.

The epigenetic state of a cell is affected by developmental as well as environmental influences, and both of these inputs may leave epigenetic traces that the cell ‘remembers’ (referred to as cellular memory) [4]. Furthermore, the history of transcription and environmental influences, such as nutrition, toxins, drugs of abuse, infection and disease state, can also affect DNA and histone modifications [5]. Thus, the epigenome may provide a crucial interface between the environment and the genome. The stability of chromatin changes can vary: some may be transient, whereas others are longer lasting. Some chromatin changes are mitotically heritable and can affect somatic tissues, whereas others may even be inherited through meiosis and affect the next generation [6]. Epigenetic states are also likely to be influenced by an individual’s specific constellation of genetic variation; however, the extent to which this is the case is unknown. Thus, there is potentially an extremely large number of possible epigenomes that could be mapped.

The field of epigenomics, the study of epigenetic changes at the level of the genome, has changed rapidly, largely owing to advances in DNA sequencing technology [7].
Until recently, researchers were using microarray approaches and performing relatively small-scale studies. However, with the widespread adoption of next-generation sequencing, genome-wide epigenomic mapping experiments can now be performed at unprecedented resolution [8,9].

Potential scientific benefits of large-scale epigenomics
Large-scale epigenomic mapping studies have the potential to enhance three major areas of science: basic gene regulatory processes, cellular differentiation and reprogramming, and the role of epigenetic regulation in disease.

Although chromatin modifications are becoming better characterized at the genome-wide level, further work is necessary to understand their role in nuclear processes, such as gene regulation. Epigenome-wide maps will provide a comprehensive list of chromatin features that can serve as a launch point for ‘upstream’ investigations to identify the transcription factors, regulatory molecules and pathways that initiate, modulate or maintain epigenomic features [10]. These maps may also allow pursuit of ‘downstream’ investigations to identify genes with similar suites of epigenetic features that suggest coordinated regulation of gene expression in particular cell types [11]. Mapping of DNA methylation, histone modifications and noncoding RNAs simultaneously in the same cell types will allow scientists to begin to better understand the cross-talk that occurs among these epigenetic regulatory mechanisms. Epigenomic data sets also have remarkable power to identify functional chromosomal regions. For example, epigenomic information in concert with other data has been used to predict cis-regulatory sequences such as enhancers, microRNA genes, imprinted loci, and loci poised for activation [12–18].
Given the complexity involved in cellular regulation, epigenome maps will undoubtedly reveal new principles in the regulation of genome structure and function.

Compared with differentiated cells, the epigenomes of human embryonic stem cells (hESCs) are unusual, especially with respect to DNA methylation [19,20]. Understanding how the epigenomic state of hESCs changes during the differentiation process is crucial for understanding both normal development and disruptions in development that might lead to adverse birth outcomes or disease conditions in the child or adult. Similarly, understanding the extent to which the epigenome can be reprogrammed, as in the case of induced pluripotent stem cells, may be essential to enable regenerative medicine to reach its full potential for treating diseases in which cells have been permanently lost or impaired [21–23].

With respect to health and disease, epigenetic regulation has been implicated in certain types of cancers, and the most detailed disease epigenetics investigations have taken place in the area of cancer biology [24]. Even so, a growing number of other diseases seem to occur at least in part as the result of epigenetic dysregulation. For example, it is hypothesized that certain neuropsychiatric and neurodevelopmental disorders have a significant epigenetic component [25]. Because the epigenomic status of cells may be altered by exogenous influences, it is likely that epigenetic regulation is important in the development, severity and course of other common diseases. However, the extent to which epigenetic dysregulation might be a consequence of, or itself lead to, other common disease states is poorly understood. Whether or not these aberrant epigenetic states can affect subsequent generations is even less clear but is an important area for investigation.

Epigenomic maps of cell types and tissues important in specific diseases may provide a unique resource allowing researchers to identify upstream factors and pathways that might contribute to the disease state as well as downstream genes affected by the disease state.
In addition, epigenetic states, regardless of whether they are causal for a given disease, have great promise as potential biomarkers for disease states or environmental stressors (e.g., exposure to toxins, infections, drugs of abuse or psychosocial stress) and thus may be useful for diagnosis of disease or disease progression [26].

Both genes and environment are important in the development of human diseases [27]. Genome-wide association studies have been successful in identifying genetic variants associated with many different diseases [28]. In the case of diseases that have a strong environmental component, epigenome-wide association studies that statistically correlate epigenetic variation with disease states or phenotypes could be of great value. Epigenomic maps of specific cell types or tissues could serve as a foundation for the design of future epigenome-wide association studies to systematically investigate individual epigenetic variation and its potential role in human diseases. Taking this one step further, studies investigating both genetic and epigenetic variation in disease, such as a recent study looking at the role of maternal or paternal contribution of gene variants to type 2 diabetes and other common diseases, illustrate the potential value of such approaches to investigating human diseases [29]. A deeper understanding of the influence of individual epigenetic and genetic variation in disease susceptibility could enable personalized prevention measures in the future.

Epigenetic changes are inherently more plastic and dynamic than genetic changes and thus may be particularly useful targets for therapeutic intervention [30].
Indeed, there are at least three ‘epigenetic therapeutics’ approved by the US Food and Drug Administration for treating specific cancers and seizure disorders, and other compounds are in clinical trials [31,32]. A histone deacetylase inhibitor was even used to treat a patient with a genetic mutation that was believed to be the cause of a seizure disorder, suggesting that in certain cases, epigenetic alterations might be able to override genetic disease [33].

Potential practical benefits of large-scale epigenomics
In addition to the potential scientific and medical benefits described above, several practical benefits could arise from the coordination of large-scale epigenomics projects. These include improved comparability between data sets, avoiding duplication of effort, exploiting economies of scale and developing ‘best practices’ for epigenomic studies.

Given the possible combinations of cell types, environmental exposures, disease states and individual genomic variation, the sheer number of possible distinct epigenomes that could be analyzed seems astronomical. The scientific community is likely to benefit most from an organized and systematic mapping effort in which similar epigenetic features are mapped in a defined set of cell types with a standardized set of protocols and quality controls, creating high-quality data sets that are comparable with one another. A standardized approach would more easily enable the identification of epigenomic features correlated with particular cell states.

Many scientists are becoming interested in investigating the interplay between epigenetics and disease, as indicated by the large increase in the number of US National Institutes of Health (NIH) grants investigating epigenetic processes (Fig. 1).
Generation of epigenomic maps for cell types and tissues relevant to important biological processes and diseases would allow scientists to use the available public data to look at epigenomic regulation of their gene or process of interest rather than duplicating efforts by generating their own epigenomic maps. Of course, community epigenomics projects will merely serve as a foundation to be exploited and built upon by researchers as they pursue their unique scientific interests.

Because large-scale mapping groups are already geared toward data production, they typically can generate epigenomic data more inexpensively, rapidly and reproducibly than researchers pursuing smaller-scale projects within their own laboratories. Such groups will also be able to troubleshoot problems that arise in methods development, reagent validation, data standardization and data quality measurement, and begin to converge upon the best scientific practices in this area. These groups will also be well positioned to investigate the sources of experimental variation that arise and determine what steps need to be taken to minimize this variation. Solutions to technical problems in epigenomics will enable individual investigators to more readily apply epigenomic techniques to their biological problem of interest. As high-throughput sequencers become more widely available, many scientists will begin to consider performing epigenomic analyses within their own laboratories. Although there will inevitably be healthy competition between research groups exploring small-scale epigenomics questions, it will be important to encourage coordination between larger-scale community projects to minimize duplication and maximize the exploration of epigenomic features in a wide array of cell types and tissues.

Planning large-scale epigenomics projects
Given the broad scientific interest in this area, it is conceivable that large-scale epigenomics projects could be initiated in several fields, depending upon what data a community desires the most.
One could readily imagine individual projects focusing on stem cells, cancer, neuroepigenomics, genetic-epigenetic interactions, developmental biology or individual epigenomic variation, because each of these fields has a distinct set of important biological questions and challenges that need to be addressed. Compared with the Human Genome Project, large-scale epigenomics projects have the potential to be quite open-ended, and thus a key consideration is to limit the scope of such projects so that they achieve defined and compelling scientific outcomes within the fiscal and time constraints of each project. In particular, the depth and breadth of a project must be clearly defined through the selection of tissue or cell types to be analyzed, epigenomic features to be assayed and functional correlates to be tested. Project planners also need to consider the best way to manage, analyze and visualize the large amounts of data that will be generated. These key considerations are addressed in more detail below.
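As a rough illustration of the breadth-versus-depth trade-off, the sketch below counts the assays implied by two hypothetical project designs. The figures of 100 cell types with 6 histone marks, and of 30 marks for an in-depth set, echo the Roadmap numbers quoted in Box 1; the choice of 5 in-depth cell types, the replicate parameter and the function name `count_assays` are assumptions made purely for illustration and do not describe any project's actual plan.

```python
def count_assays(n_cell_types: int, n_marks: int, n_replicates: int = 1) -> int:
    """Number of ChIP-style experiments for a cell-type x mark design.

    Every cell type is assayed for every mark, n_replicates times.
    """
    if min(n_cell_types, n_marks, n_replicates) < 0:
        raise ValueError("counts must be non-negative")
    return n_cell_types * n_marks * n_replicates


# Hypothetical "broad" design: many cell types, a small panel of marks.
broad = count_assays(n_cell_types=100, n_marks=6)

# Hypothetical "deep" design: few cell types (assumed 5), a large panel of marks.
deep = count_assays(n_cell_types=5, n_marks=30)

print(f"broad design: {broad} assays")  # broad design: 600 assays
print(f"deep design:  {deep} assays")   # deep design:  150 assays
```

Because the count is a simple product, adding a mark to a broad design costs one extra assay per cell type, which is one reason a small, informative panel of marks is attractive when many cell types are surveyed.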


Box 1 Overview of current large-scale epigenomics projects

Several medium- and large-scale epigenomics efforts have already been initiated. We briefly describe selected examples below.

Asian projects. Institutions in several countries have already developed technological platforms to enable the generation of large-scale sequencing data. These centers are in a good position to launch epigenomics projects. Scientists from Yonsei University (Seoul), the Japanese National Cancer Center, the Shanghai Cancer Institute and the Genome Institute of Singapore are already organizing annual conferences to promote interactions and collaborations. As epigenomics research begins to move to center stage, Asian scientists are likely to make increasing contributions.

Canadian and Australian projects. The Canadian Institutes of Health Research (Ottawa) is leading an effort to develop a potential broad-ranging initiative on ‘Epigenetics, Environment and Health’. Australia has been the site of several workshops and meetings devoted to epigenetics, and in 2008 researchers there formed the Australian Alliance for Epigenetics (http://www.epialliance.org.au/).

European epigenomics projects. Although research in Europe is funded at the national level as well as Europe-wide through the research program of the European Union, we exclusively focus on the latter here. As yet, the EU has no program specifically dedicated to epigenomics, but several research groups working in the area have been funded within more broadly defined programs. Among these are the HEROIC program (http://www.heroic-ip.eu/) on epigenomics in mouse stem cells and differentiated cells and the EPITRON initiative (http://www.epitron.eu/) on cancer epigenetics, which were both early adopters of next-generation sequencing. Furthermore, the SMARTER initiative (http://www.smarter-chromatin.eu/) aims to develop small inhibitors of chromatin-modifying enzymes.
Particularly internationally visible is the Epigenome Network of Excellence, which fosters the epigenetics research community in Europe, in part through support of junior scientists, organization of focal meetings (e.g., on technical aspects of epigenomics), development of tools (e.g., a popular protocol database; http://www.epigenome-noe.net/WWW/researchtools/protocols.php) and an informative website aimed at the layman (http://www.epigenome-noe.net/WWW/index.php). The Epigenome Network of Excellence exemplifies the importance of networking, and the experience it has gained will be useful for the coordination and implementation of a more international effort in epigenomics.

The ENCODE and ICGC projects. The Encyclopedia of DNA Elements (ENCODE) project launched by the US National Human Genome Research Institute aims to identify all functional elements in the human genome sequence, whereas the modENCODE project has the same goal with respect to model organisms [59]. These projects use a wide array of different assays to identify functional elements, and epigenomic profiling is thus an important component of the programs but not their major thrust (http://www.genome.gov/10005107). Another project, the International Cancer Genome Consortium (ICGC), is investigating genomic changes that occur in various types of cancer (http://www.icgc.org/), with the goal of obtaining a comprehensive description of genomic, transcriptomic and epigenomic changes in 50 different tumor types and/or subtypes [60]. As many samples from one tumor type or subtype will be analyzed in great detail, the project will provide crucial insights into the links and interplay between genetics and epigenetics.

NIH Roadmap Epigenomics Program.
Initiated in 2008, the NIH’s Roadmap Epigenomics Program (http://www.roadmapepigenomics.org) has an epigenomic mapping component described in detail in an accompanying article [61]. In brief, the Roadmap Epigenomics Mapping Consortium is conducting in-depth epigenomic mapping of several high-priority human cell types. In this subset of tissues, genome-wide histone modification analysis will be performed for >30 histone modifications using ChIP-seq, and DNA methylation analysis will be performed genome-wide at single-base resolution using the MethylC-seq technique. The first two DNA methylomes for human cell types (the H1 hESC line and the IMR90 fibroblast cell line) have recently been published [19].

To capture the breadth of epigenomic differences among cell types, the consortium also intends to map >100 human cell types and tissues in a more focused way. Six informative histone modifications have been selected for ChIP-seq analysis in these cell types. In addition, DNA methylation will be mapped at single-base resolution, but only on a subset of the genome, using reduced-representation bisulfite sequencing or similar methods. DNase I–hypersensitive sites are expected to be analyzed for tissues from which sufficient material is available. Gene expression data will be collected using microarray or RNA-sequencing strategies.

The consortium is developing standards and best practices for individual and integrative analyses of the different data types to provide a reference for the larger epigenomic community as it builds upon these data.
Data are being released immediately (with a 9-month embargo for publication of genome-wide analyses) and will be permanently archived in the GEO database (http://www.ncbi.nlm.nih.gov/epigenomics) at the US National Center for Biotechnology Information (NCBI).

Other aspects of the Roadmap Epigenomics Program include development of new technologies for epigenomic analysis and imaging, identification of new epigenetic modifications and investigations into the role of the epigenome in a wide array of human diseases and environmental effects (http://nihroadmap.nih.gov/epigenomics/fundedresearch.asp).

The International Human Epigenome Consortium. Over the years, a grassroots group of scientists has championed a sustained international epigenomics effort [62,63]. This is now taking definitive shape in the form of an initiative to develop an International Human Epigenome Consortium (IHEC), which builds on the NIH effort in epigenomics and will attempt to create a truly global epigenomics project (http://ihec-epigenomes.org). Even the Roadmap Epigenomics Program, the largest epigenomics effort to date, will be able to map only a small number of the many epigenomes of interest. Although plans for the consortium are not finalized, it may expand the number of human cell types and tissues being mapped, and its goals could also include epigenomic analysis of nonhuman cells and tissues, which are not being characterized in the NIH Roadmap Program. The IHEC could also help to develop best practices and standards for epigenomic data generation and analysis, so investigators can perform successful epigenomic analyses more quickly and avoid problems already encountered and solved by other researchers.


Tissue and cell type selection. One of the primary issues for large-scale epigenomic research projects is the selection and prioritization of cell types and tissues, which can be affected by both technical factors and community demand, and will vary depending upon the precise nature of the planned project. Given finite resources, scientific prioritization of cell types and tissues for analysis is crucial. Which cells or tissues are most likely to provide valuable insights into important biological processes, diseases or environmental exposures? For community resource projects, what epigenomic maps would be of the greatest value to researchers?

Another question is what cell types or tissues are available in the necessary quantities for the proposed assays. This is especially important given that current technologies for epigenomic analysis are generally not amenable to the use of very small numbers of cells, although this could change rapidly [34,35]. Similarly, most tissues are composed of several specialized cell types (e.g., different types of neuronal and glial cells in the brain), which could make interpretation of the resulting epigenomic data difficult and perhaps require approaches to isolate single cells out of tissues. Cell lines are more homogeneous than tissues, but in vitro growth conditions have the potential to substantially affect the epigenomic state of the cells so that they may not accurately reflect the in vivo state [36].

In the case of human tissues, individuals have different genomes and environmental exposure histories [37]. Thus, there might be an advantage to performing epigenomic analyses on different tissues from a single individual, in whom these variables are held constant. Investigations into individual epigenomic variation and the interplay between the genome and the epigenome are important areas for future research, so it would be ideal to use cells and tissues of known DNA sequence.
Alternatively, tissue samples should be stored for future analysis as sequencing costs continue to drop.

Another important consideration is the organism to be investigated. Epigenomic maps of human tissues will no doubt be important for understanding human disease. Even so, mapping of tissues and cell types from model organisms (e.g., mouse, zebrafish, fly, worm, sea slug, plant and yeast) may be very useful because the genotype and environmental exposure can be controlled and these systems are more amenable to functional testing or manipulation of epigenomic states. Epigenomic studies in established animal models of environmental exposure or disease may also be valuable, particularly for transgenerational investigations or for studies of cell types that are difficult to obtain from human sources. Comparative epigenomic analyses of specific cell types from different species may also be valuable for identifying epigenomic regulatory features.

Figure 1 The growth of grants related to epigenetics at the NIH. The NIH RePORTER grants database (http://projectreporter.nih.gov/reporter.cfm) was searched using the keywords epigenetic or epigenomic for each of the years indicated. Graph shows number of grants that contain a title, abstract or specific aims with one or both of these keywords. (Plot not reproduced: x-axis, years 2005 to 2009; y-axis, grants with keyword epigenetic/epigenomic, scale 0 to 2,000.)

Epigenomic assays. Which marks or features should be mapped? Again, technical and fiscal constraints affect assay selection and prioritization. Assays that measure histone modifications, DNA modifications, and small and long noncoding RNAs are essential, but information about other chromatin features (transcription factor binding sites, chromatin-interacting proteins, histone variants and nucleosome position) may add substantial value to the epigenomic data captured (see below). Of course, reagents or high-throughput assays may not be available for all features one might wish to examine.
For the purposes of this article, we limit our discussion to analysis of DNA modifications and histone modifications.

Several assays are available for large-scale DNA methylation analysis, and these differ in their resolution and comprehensiveness. For example, the genome-wide antibody-based methylated DNA immunoprecipitation assay has a resolution on the order of hundreds of nucleotides, whereas assays such as reduced-representation bisulfite sequencing and targeted bisulfite sequencing achieve single-base resolution but interrogate only a subset of the genome and may therefore miss information concerning important regions of the epigenome [22,38–43]. MethylC-seq, the most comprehensive DNA methylation technique currently available, provides genome-wide single-nucleotide-resolution data and recently has been used to create a high-density DNA methylation map for two human cell types [19]. Unfortunately, genome-wide MethylC-seq is expensive, making comprehensive DNA methylation analysis of large numbers of cell types and tissues less feasible for the time being. Even so, truly comprehensive epigenomic maps should include whole-methylome data whenever possible.

The recent identification of the covalent DNA modification hydroxymethylcytosine (hmC) [44,45] also poses challenges. Antibodies for hmC should permit specific hydroxymethyl-DNA immunoprecipitation assays. However, because methylcytosine (mC) and hmC are both resistant to bisulfite conversion [46], bisulfite-based assays cannot tell them apart, and it will be a priority to develop technology that can distinguish between the two modifications at a single-base level.

In terms of histones, >100 distinct post-translational modifications have been identified thus far, and more seem to be discovered every month. The functions of most of these modifications are largely unknown, although some modifications are associated with active chromatin, whereas others are linked to silenced chromatin [47].
The two major assays for profiling histone modifications both rely on chromatin immunoprecipitation (ChIP), in which an antibody against a histone modification is used to immunoprecipitate cross-linked chromatin [48]. The DNA regions associated with the histone modification can then be analyzed using either microarray analysis (ChIP-chip) or high-throughput sequence analysis (ChIP-seq). ChIP-seq has the advantage of not being limited to the sequences present on the microarray, and the output of ChIP-seq is more quantitative than that of ChIP-chip. ChIP-seq is also more cost-efficient than genome-wide microarrays, and the cost of DNA sequencing is expected to drop even further [48]. However, ChIP-seq creates large amounts of data that multiply with the number of histone modifications profiled, creating new computational challenges with regard to storage and analysis [49,50].

The lack of antibodies with the specificity and efficiency of enrichment needed for a ChIP experiment limits the study of many post-translational histone modifications. For comprehensive epigenomic maps, one might want to assay every modification for which there is a useful reagent (currently around 30); however, this is an expensive proposition if one wishes to look at a large number of tissues or cell types. An alternative strategy to reduce cost is to identify a set of key histone modifications for which high-quality reagents are available and that are highly informative of cellular state. For example, the NIH Roadmap Epigenomics Mapping Consortium identified a subset of six histone modifications that was felt to be maximally informative at this point in time (H3K4me1, H3K4me3, H3K9ac, H3K9me3, H3K27me3, H3K36me3).

To be comparable, ChIP-based assays rely on validated antibodies of consistent quality. Commercially available antibodies, however, are frequently polyclonal and thus are only available in finite amounts with batch-to-batch variations, leading to antibody-dependent differences in the output data. Monoclonal antibodies offer a renewable standardized source of antibodies; however, because they bind to a single epitope, they do not always work well in ChIP assays. Clearly, the development of renewable standardized affinity reagents for ChIP studies would be of great benefit to the scientific community.

For all its power, the ChIP assay has certain limitations. Some modifications may be masked by other proteins, rendering them inaccessible to the antibody and not readily detected. Moreover, large numbers of cells are currently required. Although ChIP assays will continue to be a widely used technique for epigenomics studies, the development of alternative approaches that complement ChIP would be of great value. For example, the routine ability to isolate chromatin from a particular genetic locus and analyze it via mass spectrometry to identify all the proteins and post-translational modifications present at the locus would greatly enhance researchers’ ability to investigate regulation of gene expression [51].

Value-added information. In addition to the technical aspects of generating epigenomic data sets described above, additional experimental information regarding the biological sample should be included whenever possible. For example, as much phenotypic data as possible should be collected for each tissue assayed so that epigenomic state can be correlated with environmental exposures, disease state, age, gender and other measures. As mentioned above, genetic variation may affect epigenomic states, so capturing the DNA sequence of the cell type or tissue assayed would allow these correlations to be made.

Measurement of gene expression levels is crucial for correlating epigenomic state with transcription.
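One simple way to correlate an epigenomic signal with transcription is a rank correlation across genes. The sketch below is a minimal illustration only: it computes Spearman's rank correlation in plain Python, and the per-gene H3K4me3 promoter signal and expression values are invented numbers, not data from any study. The function names `average_ranks` and `spearman` are hypothetical helpers introduced here.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Invented example values for five genes (same gene order in both lists).
h3k4me3_signal = [0.2, 1.5, 3.1, 4.0, 0.9]    # hypothetical promoter ChIP-seq signal
expression = [1.0, 8.0, 40.0, 55.0, 3.0]      # hypothetical expression level

rho = spearman(h3k4me3_signal, expression)
print(f"Spearman rho = {rho:.2f}")  # Spearman rho = 1.00
```

A rank-based measure is used here because ChIP-seq signal and expression values are on different, often skewed, scales; as noted in the text, any such relationship is correlative and says nothing about mechanism.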
Gene expression can be measured through microarray analysis or next-generation sequencing and can focus on coding or noncoding RNA, depending on the experimental design. The role of noncoding RNAs in epigenomic processes remains somewhat unclear; however, in some species there is convincing evidence that certain noncoding RNAs are associated with specific histone modifications and with DNA [52,53]. Thus, strategies for measuring gene expression that include quantification of noncoding RNA levels would help scientists understand how these molecules correlate with other epigenomic features.

For some projects, it will be worthwhile to measure chromatin features such as binding of transcription factors or chromatin-interacting proteins, histone variants and positioning of nucleosomes [48]. Characterization of chromatin structure using DNase I hypersensitivity–based assays may allow correlation of epigenetic states with chromatin accessibility [54]. Epigenomic features are likely to affect higher-order chromatin structure, and the introduction of new methods for analyzing higher-order chromatin (Hi-C, or chromatin interaction analysis using paired-end tag sequencing) may provide additional chromatin structure information that can be correlated with epigenomic features [55,56].

Analysis of epigenomic maps is, by nature, correlative, and thus understanding of function and mechanism will require examination of the relationships between the observed epigenomic features and proposed biological processes. Manipulation of the epigenome can be achieved globally using pharmacological modulators of epigenetic-modifying enzymes or effector molecules. Similarly, genetic deletion or knockdown, as well as overexpression approaches, can be used to manipulate protein levels. Currently, locus-specific epigenetic manipulation involves expression of fusion proteins, and variations of this approach will probably produce valuable tools for examining the functions of epigenomic features [57].

Epigenomic data considerations.
The volume of data generated by a large-scale epigenomics project is great and thus creates a need for efficient data storage and processing. Furthermore, pipelines must be in place for handling the different data types (e.g., ChIP-seq, MethylC-seq, gene expression, DNase I hypersensitivity), performing quality control, comparing replicates and providing statistically correct normalization measures. Further processing includes genome alignments and deposition into databases. Controlled-access databases, such as the Database of Genotypes and Phenotypes (http://www.ncbi.nlm.nih.gov/sites/entrez?db=gap) or the European Genome-phenome Archive (http://www.ebi.ac.uk/ega/page.php), are used to store DNA sequence information from consenting participants to allow their genomic data to be accessible. Such aligned and quality-controlled data can then be analyzed in a variety of ways [58]. Coincidence of epigenetic marks at a particular gene locus within a cell or tissue type can be examined to group genes on the basis of their complement of epigenetic features. This type of analysis aims to identify co-occurrence of epigenetic features and thus potential cross-talk between epigenomic regulatory processes. Epigenomic maps from different cell types can be compared to identify features at particular gene loci that are indicative of a particular cell type or tissue. Data from normal and diseased tissue can also be compared to identify epigenetic features at gene loci that might be associated with a particular disease state or environmental exposure. Epigenomic data can also be correlated with phenotype, genetic variation (single-nucleotide polymorphisms or copy-number variants), gene expression levels, chromatin accessibility or other data types.
Recent integrative analysis has begun to reveal the predictive value of epigenetic marks, as described above. Computational approaches for some of these epigenomic analyses are being developed [8,11]. In addition to sophisticated computational analysis, visualization tools are also needed for intuitive data display that allows the nonexpert to check the epigenetic state of their gene of interest.

Conclusions
Large-scale epigenomic mapping projects have the potential to provide global, integrated views of different cellular states. This information will almost certainly provide new biological insights for different fields of science, particularly in the areas of basic gene regulatory processes, cellular differentiation and reprogramming, and the role of epigenetic regulation in disease processes. It is hoped that a deeper understanding of the influence of epigenetic processes will lead to better knowledge of disease mechanisms, improve disease diagnosis, enable prevention and potentially allow the development of new therapeutic agents.

ACKNOWLEDGMENTS
We thank members of the Roadmap Epigenomics Consortium and Workgroup as well as the Interim Steering Committee of the International Human Epigenome Consortium for their input.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

1. Bird, A. Nature 447, 396–398 (2007).
2. Suzuki, M.M. & Bird, A. Nat. Rev. Genet. 9, 465–476 (2008).
3. Murrell, A., Rakyan, V.K. & Beck, S. Hum. Mol. Genet. 14 Spec. No. 1, R3–R10 (2005).
4. Ng, R.K. & Gurdon, J.B. Cell Cycle 7, 1173–1177 (2008).
5. Zhang, T.Y. & Meaney, M.J. Annu. Rev. Psychol. 61, 439–466 (2010).
6. Youngson, N.A. & Whitelaw, E. Annu. Rev. Genomics Hum. Genet. 9, 233–257 (2008).
7. Bernstein, B.E., Meissner, A. & Lander, E.S. Cell 128, 669–681 (2007).
8. Hawkins, R.D., Hon, G.C. & Ren, B. Nat. Rev. Genet. 11, 476–486 (2010).
9. Bentley, D.R. et al. Nature 456, 53–59 (2008).
10. Berger, S.L., Kouzarides, T., Shiekhattar, R. & Shilatifard, A. Genes Dev. 23, 781–783 (2009).
11. Ernst, J. & Kellis, M. Nat. Biotechnol. 28, 817–825 (2010).
12. Ozsolak, F. et al. Genes Dev. 22, 3172–3183 (2008).
13. Heintzman, N.D. et al. Nature 459, 108–112 (2009).
14. Bernstein, B.E. et al. Cell 125, 315–326 (2006).


15. Dindot, S.V., Person, R., Strivens, M., Garcia, R. & Beaudet, A.L. Genome Res. 19, 1374–1383 (2009).
16. Barski, A. et al. Genome Res. 19, 1742–1751 (2009).
17. Guttman, M. et al. Nature 458, 223–227 (2009).
18. Hon, G.C., Hawkins, R.D. & Ren, B. Hum. Mol. Genet. 18, R195–R201 (2009).
19. Lister, R. et al. Nature 462, 315–322 (2009).
20. Meissner, A. Nat. Biotechnol. 28, 1079–1088 (2010).
21. Ball, M.P. et al. Nat. Biotechnol. 27, 361–368 (2009).
22. Deng, J. et al. Nat. Biotechnol. 27, 353–360 (2009).
23. Doi, A. et al. Nat. Genet. 41, 1350–1353 (2009).
24. Gronbaek, K., Hother, C. & Jones, P.A. APMIS 115, 1039–1059 (2007).
25. Tsankova, N., Renthal, W., Kumar, A. & Nestler, E.J. Nat. Rev. Neurosci. 8, 355–367 (2007).
26. Mulero-Navarro, S. & Esteller, M. Crit. Rev. Oncol. Hematol. 68, 1–11 (2008).
27. Hunter, D.J. Nat. Rev. Genet. 6, 287–298 (2005).
28. Altshuler, D., Daly, M.J. & Lander, E.S. Science 322, 881–888 (2008).
29. Kong, A. et al. Nature 462, 868–874 (2009).
30. Haberland, M., Montgomery, R.L. & Olson, E.N. Nat. Rev. Genet. 10, 32–42 (2009).
31. Sharma, S., Kelly, T.K. & Jones, P.A. Carcinogenesis 31, 27–36 (2010).
32. Mack, G.S. J. Natl. Cancer Inst. 98, 1443–1444 (2006).
33. Almeida, A.M. et al. N. Engl. J. Med. 356, 1641–1647 (2007).
34. Goren, A. et al. Nat. Methods 7, 47–49 (2010).
35. Gu, H. et al. Nat. Methods 7, 133–136 (2010).
36. O’Neill, L.P., VerMilyea, M.D. & Turner, B.M. Nat. Genet. 38, 835–841 (2006).
37. Bjornsson, H.T. et al. J. Am. Med. Assoc. 299, 2877–2883 (2008).
38. Weber, M. et al. Nat. Genet. 37, 853–862 (2005).
39. Ammerpohl, O., Martin-Subero, J.I., Richter, J., Vater, I. & Siebert, R. Biochim. Biophys. Acta 1790, 847–862 (2009).
40. Meissner, A. et al. Nature 454, 766–770 (2008).
41. Eckhardt, F. et al. Nat. Genet. 38, 1378–1385 (2006).
42. Beck, S. & Rakyan, V.K. Trends Genet. 24, 231–237 (2008).
43. Li, J.B. et al. Genome Res. 19, 1606–1615 (2009).
44. Kriaucionis, S. & Heintz, N. Science 324, 929–930 (2009).
45. Tahiliani, M. et al. Science 324, 930–935 (2009).
46. Hayatsu, H. & Shiragami, M. Biochemistry 18, 632–637 (1979).
47. Campos, E.I. & Reinberg, D. Annu. Rev. Genet. 43, 559–599 (2009).
48. Park, P.J. Nat. Rev. Genet. 10, 669–680 (2009).
49. Mikkelsen, T.S. et al. Nature 448, 553–560 (2007).
50. Barski, A. et al. Cell 129, 823–837 (2007).
51. Dejardin, J. & Kingston, R.E. Cell 136, 175–186 (2009).
52. Kloc, A., Zaratiegui, M., Nora, E. & Martienssen, R. Curr. Biol. 18, 490–495 (2008).
53. Nagano, T. & Fraser, P. Mamm. Genome 20, 557–562 (2009).
54. Hesselberth, J.R. et al. Nat. Methods 6, 283–289 (2009).
55. Fullwood, M.J. et al. Nature 462, 58–64 (2009).
56. Lieberman-Aiden, E. et al. Science 326, 289–293 (2009).
57. Hansen, K.H. et al. Nat. Cell Biol. 10, 1291–1300 (2008).
58. Bock, C. & Lengauer, T. Bioinformatics 24, 1–10 (2008).
59. Birney, E. et al. Nature 447, 799–816 (2007).
60. The International Cancer Genome Consortium. Nature 464, 993–998 (2010).
61. Bernstein, B. Nat. Biotechnol. 28, 1045–1048 (2010).
62. Jones, P.A. & Martienssen, R. Cancer Res. 65, 11241–11246 (2005).
63. The American Association for Cancer Research Human Epigenome Task Force, and the European Union, Network of Excellence, Scientific Advisory Board. Nature 454, 711–715 (2008).


commentary

The NIH Roadmap Epigenomics Mapping Consortium

Bradley E Bernstein, John A Stamatoyannopoulos, Joseph F Costello, Bing Ren, Aleksandar Milosavljevic, Alexander Meissner, Manolis Kellis, Marco A Marra, Arthur L Beaudet, Joseph R Ecker, Peggy J Farnham, Martin Hirst, Eric S Lander, Tarjei S Mikkelsen & James A Thomson

© 2010 Nature America, Inc. All rights reserved.

The NIH Roadmap Epigenomics Mapping Consortium aims to produce a public resource of epigenomic maps for stem cells and primary ex vivo tissues selected to represent the normal counterparts of tissues and organ systems frequently involved in human disease.

Recent years have seen remarkable progress in understanding of human genetics, enabled by the availability of the human genome sequence and increasingly high-throughput technologies for DNA analysis [1]. Yet despite their breadth and comprehensiveness, purely DNA sequence–level investigations do not shed light on a crucial component of human biology: how the same genome sequence can give rise to over 200 different cell types through remarkably consistent differentiation programs. This process of developmental specification, classically termed ‘epigenesis’, is now known to involve differential regulation of genes and their products [2]. Aberrant regulation of such phenomena has been extensively linked to human diseases and, additionally, can be influenced by environmental inputs [3–5].

Gene regulation and genome function are intimately related to the physical organization of genomic DNA and in particular to the way it is packaged into chromatin, a complex nucleoprotein structure comprising histones, DNA binding factors, accessory protein complexes and noncoding RNAs [6–9] (Fig. 1). Chromatin is a dynamic entity that is subject to modification of both its DNA and protein components, with direct structural and functional consequences.
The term ‘epigenome’ is used to describe the way in which these modifications and structural features are distributed across the genome in a given cell population. The epigenomic landscapes and the associated gene expression programs are maintained within a given cell lineage through complex processes that involve transcription factors, chromatin regulators, histone modifications and variants, and RNAs [10–12], but that remain poorly understood in mammals.

Although the mechanisms remain obscure, a now overwhelming body of evidence supports central roles for epigenomic changes in disease susceptibility and pathogenesis. Multiple disease processes, including cancer, are now well known to be associated with characteristic alterations in the patterns of chromatin, DNA methylation and gene expression [3,5]. In addition, epidemiological studies have linked early environmental exposures, such as in utero starvation, to long-term health consequences ranging from metabolic disorders to psychiatric diseases [13]. A causal role for epigenomic aberrations is supported by several lines of evidence, including mutations of genes encoding chromatin regulators in developmental disorders and cancer [4,14–16], and by the therapeutic efficacy of small-molecule inhibitors of DNA methyltransferases and histone-modifying enzymes [17].

Major epigenomic features can now be interrogated comprehensively by combining cellular, biochemical and molecular techniques with high-throughput sequencing. Production of genome-wide maps of cytosine methylation, histone modifications, chromatin accessibility and RNA transcripts represents a powerful and general approach for surveying the regulatory state of the genome in a cell type of interest. The resulting data define the locations and activation states of diverse functional elements, including genes and their transcriptional control elements (e.g., promoters, enhancers and insulators), noncoding transcripts and epigenetic effectors, such as imprinting control regions [18–25]. More globally, such maps can provide insight into developmental state and potential, for example of a stem cell population, and shed light on aberrant regulatory programs in diseased tissues.

Here we describe the aims and scope of the US National Institutes of Health (NIH) Roadmap Epigenomics Mapping Consortium, which has set out to provide a publicly accessible resource of epigenomic maps in stem cells and primary ex vivo tissues. These maps will detail the genome-wide landscapes of DNA methylation, histone modifications and related chromatin features, and are intended to provide a reference for studies of the genetic and epigenetic events that underlie human development, diversity and disease. Below, we describe the organizational structure, goals and anticipated deliverables of the consortium.

Bradley E. Bernstein, Alexander Meissner, Manolis Kellis, Eric S. Lander and Tarjei S. Mikkelsen are at the Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA; Bradley E. Bernstein is also at the Howard Hughes Medical Institute, Department of Pathology, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, USA; and Alexander Meissner is in the Department of Stem Cell and Regenerative Biology at Harvard University, Cambridge, Massachusetts, USA. John A. Stamatoyannopoulos is in the Departments of Genome Sciences and Medicine, University of Washington School of Medicine, Seattle, Washington, USA. Joseph F. Costello is in the Department of Neurosurgery, University of California at San Francisco, San Francisco, California, USA. Bing Ren is at the Ludwig Institute for Cancer Research, University of California San Diego School of Medicine, La Jolla, California, USA. Aleksandar Milosavljevic and Arthur L. Beaudet are in the Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA. Marco A. Marra and Martin Hirst are at the Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia, Canada. Joseph R. Ecker is in the Genomic Analysis Laboratory, Salk Institute for Biological Studies, La Jolla, California, USA. Peggy J. Farnham is at the Genome Center, University of California at Davis, Davis, California, USA. James A. Thomson is at the University of Wisconsin School of Medicine and Public Health, Madison, Wisconsin, USA.
e-mail: Bernstein.Bradley@mgh.harvard.edu

A coordinated study of human epigenomes
In 2008, the NIH Roadmap Epigenomics Mapping Consortium (http://www.roadmapepigenomics.org/) was launched with the goal of producing a public resource of human epigenomic data to catalyze basic biology and disease-oriented research. The consortium leverages experimental pipelines built around next-generation sequencing technologies to map DNA methylation, histone modifications, chromatin accessibility and RNA transcripts in stem cells and primary ex vivo tissues selected to represent the normal counterparts of tissues and organ systems frequently involved in human disease. The mapping of such normal epigenomes is being undertaken by four Epigenomics Mapping Centers and supported by a Data Analysis and Coordination Center, which collectively coordinate experimental and analytical efforts to maximize consistency, data quality and overall coverage of the epigenomic landscape.

Figure 1 Layers of genome organization. Genome function and cellular phenotypes are influenced by DNA methylation and the protein-DNA complex known as chromatin. In mammals, DNA methylation occurs on cytosine bases, primarily in the context of CpG dinucleotides. Accessible chromatin that is hypersensitive to DNase I digestion marks promoters and functional elements bound by transcription factors or other regulatory proteins. Histone modifications, associated proteins such as Polycomb repressors and noncoding RNAs constitute an additional layer of chromatin structure that affects genome function in a context-dependent manner. Figure labels: the epigenome; DNA methylation; DNA; DNA accessibility; DNA binding proteins; histone; histone modifications; RNA; Polycomb complex. Credit: B. Wong.

Because the epigenomic landscape varies markedly across tissue types (and between individuals), there is no single ‘reference’ epigenome. Rather, the consortium expects to deliver a collection of normal epigenomes for different tissues and individuals, intended to provide a framework or reference for comparison and integration within a broad array of future studies. A core goal of the consortium is to close the gap between data generation and its public dissemination by rapid release of raw sequence data, profiles of epigenomic features and higher-level integrated maps, in coordination with the US National Center for Biotechnology Information (NCBI). The consortium is also committed to the development, standardization and dissemination of protocols, reagents and analytical tools to enable the research community to utilize, integrate and expand upon this body of data (Fig. 2).

Reference maps for major epigenomic features
The Epigenomics Mapping Centers have collaboratively established data collection pipelines to produce high-quality, comprehensive epigenomic maps. Specific data types have been prioritized that offer broad insight into genome regulation, are generally applicable to diverse cell populations and can be evaluated comprehensively and accurately by high-throughput sequencing. These include genomic maps for DNA methylation, histone modifications, chromatin accessibility and RNA expression. The Mapping Centers work with the Data Analysis and Coordination Center to evaluate, compare and integrate the different data types and formats to ensure data quality and standards that enable the larger community to build upon these data.

The first of these data types, DNA methylation, is assayed by sequencing DNA that has been treated with sodium bisulfite (BS-seq), or enriched by methylcytosine pulldown (methylated DNA immunoprecipitation (MeDIP)-seq) or methylation-sensitive restriction enzymes (MRE-seq). BS-seq, applied either to whole genomes or to reduced-representation samples, has been designated as a primary assay because it provides accurate and consistent nucleotide-resolution data. The consortium is implementing MeDIP-seq and MRE-seq on a more limited basis to benchmark and compare these widely applied approaches.

A second type of data, histone modifications, is assayed by sequencing DNA enriched by chromatin immunoprecipitation with modification-specific histone antibodies (ChIP-seq). The consortium has implemented rigorous specificity tests that use arrays of differentially modified histone tail peptides to ensure antibody specificity. In addition, common cell sources are collectively profiled and compared, ensuring consistency between the different data-collection centers.

Chromatin accessibility is assayed by sequencing DNase I cleavage sites in nuclear chromatin. These assays are performed at high sequencing depth to provide a global survey of accessible regions as well as high-resolution information regarding the protein occupancy of specific sequences [24].

Finally, RNA expression is assayed by sequencing mRNAs or size-selected small RNA fractions to high depths.
These expression data are intended to augment and illuminate the functional output of the epigenomic profiles.

Given its mandate to deliver epigenomic maps for hundreds of different cell populations, the consortium must balance breadth of cell coverage with the depth to which different epigenomic features are investigated. High-value cell types, such as human embryonic stem cells (hESCs), will be subjected to deep exploration of a very broad range of histone modifications and comprehensive, single nucleotide–resolution analysis of DNA methylation. Although it is not yet possible to specify a definitive set of features that represent a minimal epigenome, the consortium has initially identified DNA methylation, six major histone modifications (H3K4me1, H3K4me3, H3K9me3, H3K9ac, H3K27me3 and H3K36me3), chromatin accessibility and RNA as essential features that will be assayed in most or all designated cell populations. This combination of deep and broad analysis is expected to maximize coverage of cellular diversity and disease-relevant human tissues, while ensuring that a broad range of epigenomic features is explored.

Prioritized cells and tissues
The consortium will investigate a diverse collection of cell and tissue models, including hESCs and adult stem cells and their differentiated progeny; induced pluripotent stem cells; and primary ex vivo human fetal and adult tissues. These cells and tissues were prioritized on the basis of broad scientific and biomedical interest, tractability, phenotypic diversity and under-representation in other collaborative projects.

Because of their biomedical importance, hESCs and major lineage derivatives have been selected for intensive investigation. The resulting data will offer insight into the distributions, dynamics and inter-relationships among epigenomic features, and catalyze study of their functions in development, epigenetic control and genome regulation. The consortium will also target additional stem cell models, including mesenchymal and neural stem cells, and reprogrammed cells, as in vitro models of development with particular relevance to regenerative medicine.

Broader coverage of human cellular diversity will be achieved through study of primary cells and tissues relevant to metabolic and cardiovascular disease, cancer, neuropsychiatric disease, aging and other leading health issues. These will be acquired from primary sources and sorted or otherwise manipulated to obtain suitably homogeneous cell populations that will be
directly channeled to data-collection pipelines. Prioritized cell types include sorted hematopoietic lineages, liver, muscle and adipose, as well as selected cell types from breast and neural tissues. In addition, fetal tissues will be analyzed for insight into epigenomic landscapes of early development. Maps for such primary ex vivo tissues are urgently needed because most of our current knowledge has come from either transformed cell lines or cultured cells, both of which experience marked nonphysiologic changes to their chromatin environment, including aberrant DNA hypermethylation and loss of heterochromatin integrity. Collectively, profiles for these diverse cell models should offer unprecedented insight into the breadth and dynamics of human epigenomes and provide a durable framework for future explorations of epigenomic changes associated with human disease.

Figure 2 Portal for the NIH Roadmap Epigenomics Mapping Consortium. A public portal (http://www.roadmapepigenomics.org/) provides general information about the consortium and its participants, along with links to experimental protocols, consortium data and interfaces for visualizing epigenomic maps.

Integration and dissemination of human epigenomes
The consortium aims to provide the scientific community ready access to a critical mass of high-quality epigenomic data for cells and tissues representative of normal human biology. These data will comprise multiple levels of information, from raw sequencing data and epigenomic profiles for an individual epigenomic feature in a single cell or tissue type, to integrated epigenomic maps that represent a composite of multiple epigenomic profiles for an individual cell type or, alternatively, that capture biological variation of such features across different cell types.
The consortium will also develop and disseminate software tools and algorithms to facilitate use of these results by the community, for example through the ability to search for epigenomic signatures common across genes or loci, or to identify distinguishing features of cell lineages, developmental stages, cellular environments or derivation history. The latter may also be used to classify disease states or to identify aberrant epigenomic features or regulatory programs that underlie human pathology.

The primary web portal for the consortium (http://www.roadmapepigenomics.org/) offers detailed descriptions of the overall project, target cell and tissue types, and epigenomic assays used by the consortium. The portal links to companion sites managed by NCBI (http://www.ncbi.nlm.nih.gov/geo/roadmap/epigenomics/) and the Data Analysis and Coordination Center (http://www.epigenomeatlas.org/) that provide access to raw and processed consortium data along with tools for visualization, analysis and integration of epigenomic data.

Progress and challenges
The use of established technologies and approaches has enabled the consortium to rapidly initiate data production. Notable progress during the initial phase included production of comprehensive DNA methylomes for an hESC line (H1) and primary fibroblasts [26], and generation of hundreds of data sets (for major histone modifications, targeted DNA methylation analysis, RNA expression and chromatin accessibility) representing dozens of cell types, including multiple stem cell lines and ex vivo adult and developing tissues. These data sets are now available for download and viewing at the web portals referenced above. Guided by other NIH genomics projects [27], the consortium has adopted a data release policy under which users will have immediate access to the data but are expected to abide by a moratorium on submission or presentation of works that incorporate these data for the 9 months following their release.

Any effort of this scope inevitably faces challenges and obstacles. The chief issues have revolved around cell-type selection and acquisition, assay standardization and developing the infrastructures for integration and dissemination of epigenome-scale data sets.

Cell type selection and acquisition. A key ongoing challenge relates to the identification and prioritization of cells and tissues by the consortium. Ideally, models are selected on the basis of pervasive biological and medical importance. However, the decisions are confounded by issues of tractability. Many high-value primary tissues are available in limited quantities that push the detection boundaries of current technologies. In addition, isolating relatively homogeneous populations from certain complex tissues can involve extensive preparative steps that may themselves effect changes to the epigenome. Finally, our relatively crude understanding of inter-individual epigenomic variation leaves open the question of how many samples of a given tissue type must be analyzed to yield a representative map. These challenges highlight the importance of technology development, including effective procedures for isolating homogeneous cell populations, interrogating small samples and increasing the throughput of the assays.

Standardization of assays. The consortium is implementing the latest epigenomic technologies based on next-generation sequencing technology. Because these technologies continue to evolve and are inherently dependent on preparative steps, there is an ongoing need to benchmark and validate assays. In the case of histone modification assays, substantial resources must be committed to procurement and validation of high-quality antibody reagents, including confirmation of biochemical specificity and ChIP-seq efficacy.
In the case of DNA methylation, there is a need to benchmark and standardize different assay types, including BS-seq applied either to reduced representations of the genome or to the whole genome, as well as various enrichment methods in widespread use by the scientific community [28].

Data integration and dissemination. Several challenges have emerged at the level of data handling and analysis. First, a clearer understanding of the underlying data sets in terms of sensitivity, specificity and precision is needed and is being pursued as a joint effort among the centers. Second, the sheer volume and complexity of consortium-generated data has pushed the limits of existing analytical and visualization tools. Thus, the development of a new generation of tools for integration, dissemination and interpretation of epigenomic data is vital to the overall success of the program.

Future and context
The long-term goal of epigenomics research is a fuller understanding of how global changes in diverse functional features superimposed on the human genome sequence contribute to cellular phenotypes in health and disease. This is a complex and ambitious undertaking, the realization of which will ultimately require systematic dissection and analysis of tissues, characterization of disease models and detailed exposition of regulatory mechanisms through model-organism studies. The efforts of the Roadmap Epigenomics Mapping Consortium to establish an expansive resource of epigenomic maps of normal cell and tissue phenotypes represent an important step in this direction. By catalyzing subsequent mechanistic studies of chromatin, DNA methylation and transcription, these efforts should provide a springboard for disease-focused studies, such as those currently being pursued under the parallel Roadmap program Epigenomics of Human Health and Disease.
These Roadmap efforts will also be complemented by other major initiatives, such as the International Human Epigenome Consortium, which was established to accelerate and coordinate epigenomics research worldwide (see accompanying paper [29]).

More broadly, the consortium aims to foster synergistic interactions with related collaborative projects, including the Encyclopedia of DNA Elements (ENCODE) Consortium [18], the International HapMap Project and the 1000 Genomes Project [30]. The Epigenomics Mapping Consortium is distinguished from these efforts by the broad set of normal primary tissues and stem cell–derived developmental models that it will survey. As such, it will provide a highly complementary resource through which the in vivo state and behavior of DNA elements catalogued under ENCODE or implicated in studies of genome variation may be understood. Such information will be essential for appreciating the relevance of detected genomic elements and variants to normal development and human disease.

In the coming years, the Roadmap Epigenomics Program and other complementary efforts should vastly improve understanding of the organization of the human epigenome and how it varies across tissues, individuals and disease states, information that may translate directly into the identification of aberrant epigenetic events that underlie susceptibility to specific diseases and environmental exposures.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

ACKNOWLEDGMENTS
We thank R. Waterland, C. Epstein, N. Shoresh and all consortium members, as well as the NIH Epigenomics Implementation Group, for discussions and feedback in the drafting of this document.

1. Altshuler, D., Daly, M.J. & Lander, E.S. Science 322, 881–888 (2008).
2. Bird, A. Nature 447, 396–398 (2007).
3. Feinberg, A.P. Nature 447, 433–440 (2007).
4. Jaenisch, R. & Bird, A. Nat. Genet. 33 Suppl, 245–254 (2003).
5. Jones, P.A. & Baylin, S.B. Cell 128, 683–692 (2007).
6. Kouzarides, T. Cell 128, 693–705 (2007).
7. Bernstein, B.E., Meissner, A. & Lander, E.S. Cell 128, 669–681 (2007).
8. Fraser, P. & Bickmore, W. Nature 447, 413–417 (2007).
9. Zaratiegui, M., Irvine, D.V. & Martienssen, R.A. Cell 128, 763–776 (2007).
10. Schwartz, Y.B. & Pirrotta, V. Nat. Rev. Genet. 8, 9–22 (2007).
11. Grewal, S.I. & Moazed, D. Science 301, 798–802 (2003).
12. Henikoff, S. Nat. Rev. Genet. 9, 15–26 (2008).
13. Jirtle, R.L. & Skinner, M.K. Nat. Rev. Genet. 8, 253–262 (2007).
14. Hess, J.L. Crit. Rev. Eukaryot. Gene Expr. 14, 235–254 (2004).
15. Hansen, R.S. et al. Proc. Natl. Acad. Sci. USA 96, 14412–14417 (1999).
16. Dalgliesh, G.L. et al. Nature 463, 360–363 (2010).
17. Batty, N., Malouf, G.G. & Issa, J.P. Cancer Lett. 280, 192–200 (2009).
18. Birney, E. et al. Nature 447, 799–816 (2007).
19. Heintzman, N.D. et al. Nature 459, 108–112 (2009).
20. Eckhardt, F. et al. Nat. Genet. 38, 1378–1385 (2006).
21. Meissner, A. et al. Nature 454, 766–770 (2008).
22. Cokus, S.J. et al. Nature 452, 215–219 (2008).
23. Barski, A. et al. Cell 129, 823–837 (2007).
24. Hesselberth, J.R. et al. Nat. Methods 6, 283–289 (2009).
25. Mikkelsen, T.S. et al. Nature 448, 553–560 (2007).
26. Lister, R. et al. Nature 462, 315–322 (2009).
27. Toronto International Data Release Workshop Authors. Nature 461, 168–170 (2009).
28. Suzuki, M.M. & Bird, A. Nat. Rev. Genet. 9, 465–476 (2008).
29. Satterlee, J. Nat. Biotechnol. 28, 1039–1044 (2010).
30. Frazer, K.A. et al. Nature 449, 851–861 (2007).


commentary

Epigenomics reveals a functional genome anatomy and a new approach to common disease

Andrew P Feinberg

© 2010 Nature America, Inc. All rights reserved.

Epigenomics provides the context for understanding the function of genome sequence, analogous to the functional anatomy of the human body provided by Vesalius a half-millennium ago. Much of the seemingly inconclusive genetic data related to common diseases could therefore become meaningful in an epigenomic context.

New Year’s Eve in 2014 will mark the five-hundredth anniversary of the birth of Andreas van Wesel, commonly known as Vesalius, author of De humani corporis fabrica [1], a treatise almost as influential in its time as was On the Origin of Species over three centuries later. Vesalius pioneered the rigorous study of human anatomy and introduced experimental observation into medical education as a substitute for hearsay. The late Victor McKusick, who helped to create the Human Genome Project and mapped the first human autosomal gene, called gene mapping “neo-Vesalian” [2], as it represented an anatomy of the genome, similar to Vesalius’ anatomy of the body, for finding genes. Vesalius was more than a mapper, though: he challenged the dogma of both Galen and Aristotle on the anatomy of blood circulation by using the arrangement of structures in the body to correctly deduce their functions. Similarly, the particular order of genes on chromosomes and the arrangement of the chromosomes themselves have only recently been found to be meaningful biologically, not just as a map.

I suggest here that epigenomics, that is, the genome-scale study of epigenetics, has transformed genome science by showing that the organization of the genome is important for gene function, just as Vesalius showed that the organization of anatomic structures allowed the function of organs. Moreover, the combination of new epigenomic tools with conventional genetics, and a new mathematical language for their interface, may have as much impact on understanding of human disease as did Vesalius’ anatomy a half-millennium ago.

Andrew P. Feinberg is at the Center for Epigenetics and Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA.
e-mail: afeinberg@jhu.edu

Epigenomics provides a functional anatomy of the genome
Epigenomics has helped to reveal several surprising large-scale functional relationships among genes themselves and the surrounding nongenic DNA, previously hinted at by the β-globin cluster. One is the generality of large (tens to thousands of kilobases) genomic regions regulating gene expression. Although the β-globin gene cluster had been studied for decades [3] and progressive chromatin changes had been linked to globin gene switching during development [4], the generality and size of multigene chromatin domains emerged only with large-scale epigenomic mapping. As increasing numbers of imprinted genes were found, it was discovered that they were organized in gene clusters, often with common regulatory elements, such as CCCTC binding factor (CTCF) binding sites [5]. With the advent of genome-scale mapping of histone modifications, many large regions of heterochromatin modifications have been found, such as specific modifications associated with the inactive X chromosome [6]. Moreover, large autosomal regions of heterochromatin modification across Hox gene clusters have been determined to be more highly conserved across species than the underlying DNA sequence and do not simply correspond to exonic boundaries [7].
Thus, epigenomic studies have revealedthat the functional genome is at least an orderof magnitude greater in scope than what wassuspected on the basis of the sequence alone.Epigenomics has provided the genome withthe kind of functional anatomy that Vesaliusgave gross anatomy five centuries ago.Another unexpected large-scale genomic relationshipis frequent intra- and interchromosomalinteractions mediated by chromatin proteins.These were discovered through chromatincapturemethods, described in detail elsewherein this issue 8 , designed to preserve chromatinmediatedinteractions over long distances. DNAloop structures, mediated by chromatin, highlydynamic and surprisingly common, are associatedwith function. For example, several interleukingenes in the 200-kilobase (kb) mouseTH2 cytokine locus, when transcriptionallyactive, are folded into numerous loops anchoredby special AT-rich sequence-binding protein(SATB) at their bases 9 . Remarkably, trans interactionsbetween chromosomes involve some ofthe same sequences that epigenetically regulateimprinted gene domains, such as the H19 differentiallymethylated region, and may act throughtransvection to regulate genes in trans 10 .A recent example of large-scale genomicorganization mediated by chromatin is thelink between long RNAs, heterochromatinmodification and gene activity. At the‘Biology of Genomes’ meeting held at ColdSpring Harbor, New York, USA on 11–15 May2005, Tom Gingeras of Cold Spring HarborLaboratory asked for a wager on the numberof genes that will ultimately be agreed upon,arguing that the nearly 50% of the genomethat may be untranslated RNA will be provednature biotechnology volume 28 number 10 october 2010 1049


functional [11]. Growing evidence indicates that much of this RNA mediates chromatin structure. For example, antisense RNAs appear to establish heterochromatin in mammalian genes, independently of Dicer and the post-translational microRNA machinery [12]. These regions may span >100 kb [12], affect multiple genes and involve Argonaute-family proteins [13]. An exciting recent discovery is the role of long intergenic noncoding RNAs (lincRNAs) in establishing heterochromatin. For example, HOTAIR is a lincRNA that retargets PRC2 over HOX domains, leading to marked changes in gene expression relevant to cancer progression [14].

Finally, large organized chromatin lysine (K) modifications (or LOCKs) have been shown to organize the genome into very large blocks (hundreds to thousands of kilobases), some of which are differentiation-specific in their location and extent and correspond to lamin-associated domains (LADs) [15–17]. These very large regions may provide a dynamic mechanism for functional organization of the genome and are altered in cancer [15].

Large-scale mapping studies offer additional clues that many such large-scale epigenetic networks profoundly influence cellular development and genome function. For example, CTCF, which mediates H19 imprinting, seems to play a general role in defining the boundaries of functional gene regions [18]. Likewise, target genes of Polycomb, a protein thought to be involved in stable gene silencing, may alternate between functionally active and silent states over large gene regions [19]. That such networks have a general role in organizing the genome functionally is suggested by the identification of chromosome territories and the spatial proximity of gene-rich chromosomes [20].

[Figure 1 plot: approximate number of publications (two vertical scales) by year, 1985–2009, for two series, selected genes and genome scale.]

Figure 1 The rate of increase of genome-scale publications addressing cancer genetics has become greater than that of publications in the same area focused on selected genes. Whereas published genome-scale studies represent only about 2% of cancer epigenetics, the rate of increase over the past 5 years of cancer epigenomic studies is double that of conventional analyses based on selected genes. Numbers are approximate, from PubMed citation analysis; scales are different for gene-based and genome-based plots; 2010 data are extrapolated.

Epigenomics may supersede single-gene epigenetic disease research

Just as epigenomics provides a functional anatomy of the normal genome, genome-scale studies of epigenetic disease are helping us understand epigenetic pathology. And just as cancer was the vanguard for gene-specific disease epigenetics [21], genome-scale epigenetic studies of disease have also focused first on cancer, revealing much more genetic pathology than was suggested by candidate-gene approaches. For example, methylation changes can affect large genomic regions in colorectal cancer [22], and widespread methylation changes are even more striking outside of the usually examined CpG islands (i.e., in shores and gene bodies) [23]. Similarly, it came as a surprise to most when widespread alterations in histone acetylation and methylation were found to be ubiquitous in cancer [24]. Stem cells, the focus for a wide range of both basic and applied research on disease, have shown promiscuous methylation differences from somatic cells on a genome-wide scale, notably including differences at non-CpG sites [25]. Remarkably, the sites of differential methylation largely overlap, with strong statistical significance, across physiological states: the same sites appear, for example, in normal cells compared with cancer cells, in stem cells compared with differentiated cells and in comparisons of tissues derived from different germ layers [26]. Thus, the language of epigenomic organization seems to be common for normal development and for disease, just as the language of anatomy is common for normal and abnormal physiology.

Increasing appreciation of the importance of large-scale epigenetic control in regulating gene function has influenced how disease-based genomic studies are being organized. Although published genome-scale studies represent only about 2% of cancer epigenetics, the rate of increase over the past five years of cancer epigenomic studies is more than double that of conventional gene-based analyses of cancer (Fig. 1). The same relative increase in genome-scale studies also seems to apply in the nascent field of noncancer human disease epigenetics, such as epigenetics of cardiovascular, immunological and neuropsychiatric disease [27,28]. These differences are driven in part by the availability of new technology, of course, but also by the growing realization that variation in both DNA methylation and chromatin is widespread across the genome and may be organized into large genomic domains.

Another important factor driving such ‘disease epigenomics’ is the relatively limited yield to date of conventional single-nucleotide polymorphism (SNP)–based genetic analysis in explaining most common human diseases. As has been widely described in both scientific [29,30] and lay publications [31], it was anticipated a decade ago that genetic analysis would be much more successful at attributing risk of disease to specific genetic markers.

How is epigenomics transforming the search for genetic causes of common human diseases? Many have suggested that environmentally driven epigenetic variation may be an important contributing factor in disease risk, particularly as a surrogate for mutational change [32–34] (Table 1).

But researchers should also consider another dimension to this epigenetic argument for common disease, an aspect that has received comparatively less attention. Because the actual ‘genome anatomy’ target for disease is probably much larger than scientists previously realized (perhaps involving more than half of the genome) and because understanding of the normal function of this genome anatomy requires epigenomics, it is possible that much of what appears to be negative genetic-association data could become meaningful in an epigenomic context (Table 1). For example, most genome-wide association studies (GWAS) identify not genes, but nearby regions or intergenic deserts. Yet these same regions frequently harbor differentially methylated regions that discriminate tissue types or distinguish cancer
from normal cells. They are also the canonical regions for lincRNAs that help establish chromatin structure and normal gene function. Furthermore, gene deserts may promote trans associations of chromosomes in epigenetic regulation [35]. Another way in which disease-associated DNA sequence variants might affect disease risk is through their linkage to DNA sequences that regulate DNA methylation, chromatin modification or binding factors. Substantial association of SNPs with DNA methylation has already been found [36,37].

Table 1 How epigenomics is transforming the search for genetic causes of common human disease

Epigenome anatomy | Possible disease link | New approach to common disease search
Environmentally driven epigenetic variation | Epigenome changes in absence of sequence variant | Methylome arrays, capture bisulfite sequencing, chromatin immunoprecipitation with sequencing
Regulatory site or expression | Noncoding RNAs | RNA sequencing and methods above
Key disease sequences unlinked to target genes | Intra- and interchromosomal interactions | Chromatin network mapping
Regulatory sequence distant from gene | Coregulated gene clusters | Genome-scale methylation, chromatin mapping
Sequence-defined methylation | Sequence variants controlling epigenome | Linked GWAS and epigenome studies
New class of VMRs | Sequence variants controlling epigenomic variance | New statistics for reexamining and integrating GWAS
Domain disruption, anchoring proteins | LOCKs and LADs | Native chromatin whole-genome analysis

An additional possibility my group has proposed is that DNA sequence variants themselves might affect the stochastic or environmentally influenced variance in the epigenome. According to this model, individuals in a complex species would gain an evolutionary advantage by including alleles for increased epigenetic variation per se (i.e., genetic alleles that increase epigenetic variance without affecting the mean) [38]. This would be like an evolutionary ‘hedging one’s bet’ and would confer an advantage for genes in pathways whose environment changes epochally (e.g., in response to the abundance of food and water). Examining inbred mice from the same litter and living in the same cage, we identified hundreds of variably methylated regions (VMRs) that are highly enriched by functional annotation for key genes in development and embryonic pattern formation [38]. Thus, development itself, which is regulated by epigenetics, probably includes a great deal of stochasticity at the epigenetic level. Genetic variants that increase this developmental plasticity at specific targets may confer an evolutionary advantage but might be deleterious to some individuals after a recent epochal change in the environment, such as the recent Western diet [38]. Intriguingly, several VMRs have recently been linked to body mass index [39].

Finally, researchers are only beginning to understand the role of LOCKs and LADs in functional genome organization. Their assessment in disease will require robust genome-scale approaches to native chromatin measurement and availability of clinical specimens permitting such analyses (Table 1).

Future technology development

What potential areas for future technology development will fuel growth in this area? Of course, as in non-epigenetic genome science, all roads lead to sequencing, including bisulfite genome-scale sequencing for DNA methylation. The rollout of inexpensive, comprehensive and high-throughput single-molecule sequencing has been slower than promised, and second-generation sequencing is still impractical for large-scale epidemiological studies involving thousands of patients, except for capture-based methods, such as padlock probes [40]. The dilemma in capture-based studies is that although they offer enormous advantages in throughput, single-base resolution and allele-specific data, they will not reveal regions of differential methylation where we do not already know to look, a problem that may be vast as epigenomics is applied to an ever increasing number of diseases. At the same time, high-throughput sequencing is relatively cheap now for examining chromatin modifications, but that is true only for studies working, for example, with modifications on a fairly small fraction of the genome purified by chromatin immunoprecipitation. For large regional changes, such as LOCKs, there are cost limitations similar to those for whole-genome bisulfite sequencing.

An important advance will come from reagents, such as the arrays from Illumina (San Diego) and others, that are cheap and amenable to processing by typical university core laboratories. For example, a soon-to-be-released methylation chip from Illumina will provide ~450,000 targets, including all CpG islands and shores, as well as DNase-hypersensitive sites and other regions identified and curated for this purpose by a consortium of laboratories organized by Tom Hudson of McGill University in Montreal. Although this reagent may not be next year’s or even this year’s most comprehensive tool, 450,000 targets isn’t bad, and such cooperative approaches open epigenomic research to any general laboratory, a very exciting development. Other exciting technological initiatives include epigenomic analysis of microdissected samples or even single cells, and enrichment of small chromosomal fragments for biochemical analysis of chromatin [41].

A new epigenetic epidemiology will need to be crafted. Research can no longer consider genetic variation in isolation when looking for disease relationships. Samples in ongoing and future large-scale cohorts must be preserved to allow analysis of DNA methylation and chromatin. But retrospectively, a great deal can be added to existing cohort studies, as DNA methylation is stable over decades. Much of the existing genetic data might be made clearer by supplementing those studies with epigenomic analysis. New cohort sampling should include standard sources, such as lymphocytes, but also, as much as possible, target tissues affected by the disease.

Additionally, we need to develop new statistical and epidemiological tools for disease epigenomics and for its synthesis with conventional genetic analysis. For example, unlike SNPs, epigenetic variation is inherently quantitative and thus does not lend itself to simple allele designation (for example, quantitative levels of DNA methylation or Polycomb complex members). The quantitative nature of epigenome variation can help explain complex traits with a smaller number of contributing loci, as they do not necessarily require as many of the additive signals originally proposed by R.A. Fisher [42]. Such an approach is being applied, for example, to the analysis of quantitative traits associated with VMRs [39].

The apparent additional complexity that epigenomics brings to genetics may seem daunting. But I don’t think Vesalius would have been intimidated, and I know Victor would have been delighted.

COMPETING FINANCIAL INTERESTS
The author declares no competing financial interests.

ACKNOWLEDGMENT
I thank E. Pujadas, K. Reddy and R. Ohlsson for comments on the manuscript. This work was supported by US National Institutes of Health grant 5R37CA054358.

1. Vesalius, A. De humani corporis fabrica libri septem (J. Oporini, Basel, Switzerland, 1543).
2. McKusick, V.A. J. Am. Med. Assoc. 286, 2289–2295 (2001).
3. Proudfoot, N.J., Shander, M.H., Manley, J.L., Gefter, M.L. & Maniatis, T. Science 209, 1329–1336 (1980).
4. Crossley, M. & Orkin, S.H. Curr. Opin. Genet. Dev. 3, 232–237 (1993).
5. Viville, S. & Surani, M.A. Bioessays 17, 835–838 (1995).
6. Boggs, B.A. et al. Nat. Genet. 30, 73–76 (2002).
7. Bernstein, B.E. et al. Cell 120, 169–181 (2005).
8. van Steensel, B. & Dekker, J. Nat. Biotechnol. 28, 1089–1095 (2010).
9. Cai, S., Lee, C.C. & Kohwi-Shigematsu, T. Nat. Genet. 38, 1278–1288 (2006).
10. Sandhu, K.S. et al. Genes Dev. 23, 2598–2603 (2009).
11. Kapranov, P., Willingham, A.T. & Gingeras, T.R. Nat. Rev. Genet. 8, 413–423 (2007).
12. Yu, W. et al. Nature 451, 202–206 (2008).
13. MacFarlane, L.A., Gu, Y., Casson, A.G. & Murphy, P.R. Mol. Endocrinol. 24, 800–812 (2010).
14. Gupta, R.A. et al. Nature 464, 1071–1076 (2010).
15. Wen, B., Wu, H., Shinkai, Y., Irizarry, R.A. & Feinberg, A.P. Nat. Genet. 41, 246–250 (2009).
16. Hawkins, R.D. et al. Cell Stem Cell 6, 479–491 (2010).
17. Peric-Hupkes, D. et al. Mol. Cell 38, 603–613 (2010).
18. Smith, S.T. et al. Dev. Biol. 328, 518–528 (2009).
19. Schwartz, Y.B. et al. PLoS Genet. 6, e1000805 (2010).
20. Lieberman-Aiden, E. et al. Science 326, 289–293 (2009).
21. Feinberg, A.P. & Vogelstein, B. Nature 301, 89–92 (1983).
22. Frigola, J. et al. Nat. Genet. 38, 540–549 (2006).
23. Irizarry, R.A. et al. Nat. Genet. 41, 178–186 (2009).
24. Fraga, M.F. et al. Nat. Genet. 37, 391–400 (2005).
25. Lister, R. et al. Nature 462, 315–322 (2009).
26. Doi, A. et al. Nat. Genet. 41, 1350–1353 (2009).
27. Satterlee, J., Schübeler, D. & Ng, H. Nat. Biotechnol. 28, 1039–1044 (2010).
28. Portela, A. & Esteller, M. Nat. Biotechnol. 28, 1057–1068 (2010).
29. Manolio, T.A. et al. Nature 461, 747–753 (2009).
30. Goldstein, D.B. N. Engl. J. Med. 360, 1696–1698 (2009).
31. Wade, N. A decade later, genetic map yields few new cures. New York Times (12 June 2010).
32. Bjornsson, H.T., Fallin, M.D. & Feinberg, A.P. Trends Genet. 20, 350–358 (2004).
33. Petronis, A., Paterson, A.D. & Kennedy, J.L. Schizophr. Bull. 25, 639–655 (1999).
34. Jiang, Y.H., Bressler, J. & Beaudet, A.L. Annu. Rev. Genomics Hum. Genet. 5, 479–510 (2004).
35. Gondor, A. & Ohlsson, R. Nature 461, 212–217 (2009).
36. Kerkel, K. et al. Nat. Genet. 40, 904–908 (2008).
37. Gibbs, J.R. et al. PLoS Genet. 6, e1000952 (2010).
38. Feinberg, A.P. & Irizarry, R.A. Proc. Natl. Acad. Sci. USA 107 Suppl 1, 1757–1764 (2010).
39. Feinberg, A.P. et al. Sci. Transl. Med. 2, 49ra67 (2010).
40. Deng, J. et al. Nat. Biotechnol. 27, 353–360 (2009).
41. Bernstein, B.E. et al. Nat. Biotechnol. 28, 1045–1048 (2010).
42. Barton, N.H., Briggs, D.E.G., Eisen, J.A., Goldstein, D.B. & Patel, N.H. Evolution (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, USA, 2007).


Commentary

Putting epigenome comparison into practice

Aleksandar Milosavljevic

Aleksandar Milosavljevic is at The NIH Epigenomics Roadmap Data Analysis and Coordination Center, Molecular and Human Genetics Department, Baylor College of Medicine, Houston, Texas, USA. e-mail: amilosav@bcm.edu

Comparative analysis of epigenomes offers new opportunities to understand cellular differentiation, mutation effects and disease processes. But the scale and heterogeneity of epigenetic data present numerous computational challenges.

Many layers of epigenomic information are being mapped using methods based on high-throughput sequencing and microarrays, but thus far, integrative analysis of epigenomic data has been limited by the relatively few types of cells that have been assayed [1–4]. The most recent achievement in this area is computational inference of chromatin states [5] defined by combinations of histone marks. New initiatives [6], enabled by high-throughput sequencing–based assays, aim to systematically sample many diverse cell types. In addition, falling costs for DNA sequencing are making it feasible to conduct smaller-scale projects focused on specific diseases. This denser sampling of the space of epigenomic variation by large and small projects alike should provide unprecedented opportunities for discovery by comparative analysis of epigenomes.

Unlike DNA sequence, however, epigenomic data are not digital. Furthermore, epigenomes may be measured at several levels of resolution, from the 1-base-pair (bp) resolution of DNA methylation detected by whole-genome bisulfite sequencing to >100-bp-resolution maps of histone marks or of methylation measured via methylated DNA immunoprecipitation and high-throughput sequencing (MeDIP-seq) [7]. In addition, epigenomic signals may be spread throughout the genome and may not necessarily be associated with any specific genomic element. Epigenomic information may vary between cell types, between individuals and even between cells of the same type in a population. It may also be influenced by many molecular processes, including transcriptional regulation, splicing, and DNA recombination, replication and repair [8]. Epigenomic diversity spans several timescales, ranging from short-term physiological processes, such as memory formation [9] and cell differentiation [10], to long-term processes, such as aging [11] and evolutionary variation [12]. Epigenomic variation is also influenced by genetic, environmental, disease-associated and experimental perturbations.

The wide spectrum of biological processes involving epigenomic variation points to an opportunity for discovery by comparative epigenome analysis. Comparative analysis has been successfully applied to genomic DNA sequences and to perturbations of gene expression patterns [13]. As the sampling of epigenomic diversity improves, comparative analyses of epigenomes will provide increasing opportunities for discovery by identifying, at ever finer levels of detail, epigenomic changes that correlate with each other and with biologically significant variables. Here I describe two applications of comparative analysis of epigenomes and then consider the relevant computational and cyberinfrastructure challenges.

Comparing epigenomes to map cellular differentiation

Waddington’s epigenetic landscape concept [14,15] suggests a bifurcating branching pattern of cellular differentiation. The now iconic picture of the landscape is a visual representation of cellular differentiation along specific trajectories in the abstract multi-dimensional space of molecular states within a cell. This totality of molecular states includes what we now refer to as the epigenome. Epigenomes from several related cell types might provide sufficient information to infer the bifurcating branching patterns of the epigenetic landscape.

Studies of differentiation mediated by the Polycomb-Trithorax system suggest that this will be possible. In embryonic stem cells, Polycomb-Trithorax regulates genes containing CpG islands in their promoters. Such genes reside in a ‘bivalent’ or ‘poised’ state, defined by the presence of both trimethylated lysine 4 on histone H3, an epigenetic mark associated with active genes, and trimethylated lysine 27 on histone H3 (H3K27me3), a mark associated with inactive genes [4,16]. Genes marked with this chromatin state may be activated or inactivated upon differentiation. A recent study [17] has identified extensive patterns of H3K27me3 shared by two pancreatic cell types, beta cells and acinar cells, which is consistent with their common developmental history. Specifically, this study found that the epigenomes of beta cells contain H3K27me3 marks characteristic of the endodermal lineage of the pancreatic cells, whereas the gene expression signature of beta cells largely resembles those of ectoderm-derived neural tissues. Additional results suggest that the neural expression program of beta cells is activated during late pancreatic cell differentiation by a small number of transcriptional regulators. This case shows that epigenomes provide information about cell lineages that may not be available at the level of gene expression.

One method to reconstruct the presumably bifurcating patterns of differentiation is the cladistic method [18], which has been used to recover evolutionary branching patterns of speciation. Unlike purely numerical methods that use the totality of measurements of a single type, the cladistic method focuses on select evidence (‘characters’) from a diversity of sources relevant for the reconstruction of a tree pattern [19].
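The character-based reasoning described above can be illustrated with a small Python sketch. It is not the procedure used in any of the cited studies, and a real cladistic analysis would search over whole trees (for example by parsimony). The sketch only shows the core idea of grouping cell types by shared derived character states relative to an assumed ancestral state. The function names, the cell-type labels and the 0/1 character matrix are all hypothetical.

```python
from itertools import combinations


def shared_derived_counts(characters, outgroup):
    """Count shared derived character states for every pair of cell types.

    characters: dict mapping a cell-type name to a list of 0/1 character
                states (hypothetical: 1 = mark present at a chosen locus).
    outgroup:   name of the cell type whose states are treated as ancestral.
    """
    ancestral = characters[outgroup]
    ingroup = [name for name in characters if name != outgroup]
    counts = {}
    for a, b in combinations(ingroup, 2):
        # A character supports grouping a with b only if both share a state
        # that differs from the ancestral one.
        counts[(a, b)] = sum(
            1
            for state_a, state_b, state_anc in zip(
                characters[a], characters[b], ancestral
            )
            if state_a == state_b and state_a != state_anc
        )
    return counts


def closest_pair(characters, outgroup):
    """Return the pair supported by the most shared derived states."""
    counts = shared_derived_counts(characters, outgroup)
    return max(counts, key=counts.get)


# Toy, made-up character matrix (not data from any study).
toy = {
    "progenitor": [0, 0, 0, 0, 0],  # stands in for the ancestral state
    "cell_A": [1, 1, 1, 0, 0],
    "cell_B": [1, 1, 0, 1, 0],
    "cell_C": [0, 0, 1, 0, 1],
}

print(shared_derived_counts(toy, "progenitor"))
print(closest_pair(toy, "progenitor"))
```

In this toy matrix, cell_A and cell_B share two derived states and are therefore grouped first, even though cell_A also shares one derived state with cell_C.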


By focusing on the bifurcating tree as the underlying structure, the cladistic method succeeded in integrating evidence from paleontological and molecular data [18]. By analogy, in the case of cell differentiation, the method holds promise for integrating data obtained by direct measurements on partially differentiated cell types and from reconstructions based on fully differentiated ones.

[Figure 1 diagram labels: Project 1, Project 2, Project 3; Data level 0, 1, 2, 3; Data processing cluster; Comparison; Comparison cluster; Database cluster; Human epigenome atlas. Credit: Katie Vicari.]

Figure 1 A proposed cyberinfrastructure for epigenome analysis and comparison. The cyberinfrastructure would connect users and resources that are geographically distributed over the network. A clinical researcher conducting a study of disease-related epigenomic perturbations would rely almost completely on remote resources distributed over the web for primary processing of the data (data levels 0–3) and comparative analysis using a human epigenome atlas.

Comparing epigenomes to understand genetic variation

A comparison of two epigenomes may reveal differences that are due to the variation in the underlying genomic sequence. This may be accomplished by identifying differences between the epigenomes that coincide with changes in genomic sequences in the same locus.

The effects of genetic variation on the epigenome are just beginning to be comprehended [20], with the exception of a few relatively well-understood genomic loci where variants cause human diseases. In some cases, such as in Rett syndrome, where the methyl-CpG–binding protein MeCP2 is mutated, genetic mutation acts in trans, in that mutation at a single locus alters genome-wide patterns of epigenome maintenance. Alternatively, genetic variants may act in cis to alter local patterns of epigenomic marks, as shown, for example, by a recent high-resolution genome-wide comparison of DNA methylation and single-nucleotide polymorphisms (SNPs) in humans [21].
This study found that allele-specific skewing of methylation levels occurs at >35,000 sites across the genome, suggesting that sequence variants have pervasive effects on the epigenome. Moreover, genetic mutations are known to affect local epigenetic marks in diseases such as fragile X and facioscapulohumeral muscular dystrophy. The frequency with which sequence variants cause phenotypically significant changes in the epigenome is an open question. A plan has been proposed [22] to use patterns of allele-specific epigenomic marks to identify SNPs of functional significance within critical regions detected by genome-wide association studies.

Epigenome comparisons may also help identify functional consequences of structural variants. Cahan et al. [23] recently provided indirect evidence that the effects of copy-number variants on the epigenome may be widespread. The study reports that the effects of copy-number variation on gene expression are not limited to the genes within copy-altered loci, as had commonly been assumed. In fact, most of the affected genes reside far from the structural change, leading the authors to hypothesize that the effects of structural variants may be mediated by local changes in chromatin structure. Epigenome comparisons are likely to be useful in testing this hypothesis.

Computational and engineering challenges ahead

Comparing epigenomes to each other and to other types of data is challenging because the resolution of epigenomic signals is assay dependent and may not match the resolution of the other data sets. For example, assays of DNA methylation based on bisulfite sequencing yield data at nucleotide resolution, whereas MeDIP assays offer hundred-base-pair resolution [7].

There are a number of different solutions to this problem. One is to average signals over fixed-size ‘windows’ across the genome or over features such as exons, introns or enhancer elements. An alternative is to parse epigenomic signals into discrete peaks.
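The window-averaging option described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a method taken from the article: the function name `window_average`, its parameters and the toy coordinates and values are hypothetical. The idea is that a nucleotide-resolution signal (for example, per-site methylation fractions) is reduced to one value per fixed-size window, so that it sits on the same grid as a lower-resolution data set.

```python
from statistics import mean


def window_average(positions, values, window_size, region_length):
    """Average a per-position signal over fixed-size, non-overlapping windows.

    positions:     0-based coordinates that carry a measurement
    values:        signal measured at each position
    window_size:   window width in base pairs
    region_length: length of the region (for example, a chromosome)

    Returns one entry per window; windows without any measurement are None.
    """
    if len(positions) != len(values):
        raise ValueError("positions and values must have the same length")
    n_windows = (region_length + window_size - 1) // window_size
    buckets = [[] for _ in range(n_windows)]
    for pos, val in zip(positions, values):
        if not 0 <= pos < region_length:
            raise ValueError(f"position {pos} lies outside the region")
        buckets[pos // window_size].append(val)
    return [mean(bucket) if bucket else None for bucket in buckets]


# Toy per-site signal (made-up numbers) reduced to 100-bp windows.
sites = [10, 40, 120, 130, 310]
signal = [1.0, 0.5, 0.0, 0.2, 0.8]
print(window_average(sites, signal, window_size=100, region_length=400))
```

Returning None for empty windows, rather than zero, keeps "no data" distinct from "no signal", which matters when two assays with different coverage are compared window by window.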
Parsing into peaks is suitable for punctate peaks, such as trimethylation of lysine 4 on histone H3, but not for the broad peaks associated with many other signals, such as trimethylation of lysine 36 on histone H3. There will probably be numerous ways in which the genome-wide signals are transformed into numerical data for epigenome comparison, with each transformation being appropriate for specific purposes.

Epigenomes may be compared by searching for similarity or by detecting differences. Searches for similarity among epigenomes may borrow from methods developed for whole-genome comparison. In particular, comparing epigenomes may require a combination of global and local ‘alignment’ methods. Unlike genomic sequence, however, which provides a convenient concept of ‘locality’ in the one-dimensional base-pair coordinate system, comparing epigenomes may require sets of noncontiguous loci to be analyzed together to accommodate our knowledge of the three-dimensional organization of chromosomes in the nucleus or our knowledge of thousands of loci spread throughout the genome that are co-regulated by master regulators of development. Such sets may be created by grouping genomic regions containing binding sites of specific master regulators, genes related to a particular differentiation pathway or gene elements such as promoters.

Interpreting specific differences between two epigenomes will depend on our understanding
of the background variation in the signal. In analogy to DNA sequence comparisons, we need to understand which epigenomic marks are conserved at a specific locus and which are under looser constraint in the same locus. Of course, the immediate problem is that we currently do not have much knowledge about the conservation of epigenomic marks across genomic loci. Gradual accumulation of data will solve this problem but probably not in a definitive way, because variation is not only locus-dependent but may be also highly context dependent. For example, variation during development in one cell lineage may have different meaning than variation in a different lineage or variation due to aging. Consequently, observed epigenomic differences will be open to context-dependent reinterpretation as more data accumulate.

The comparative interpretation of epigenomic signals will also pose several technical and engineering challenges that are often grouped under the term ‘cyberinfrastructure’. These challenges include the standards, resources and tools for computer-aided discovery, data sharing and

Table 1 Key concepts for epigenomics research cyberinfrastructure

Requirement | Concept | Description and examples of relevance for epigenomics
Data reuse and integration | Data level (a) | Data level 0 refers to DNA sequence reads, typically in short read format (SRF) or fastq format. Data level 1 refers to reads mapped to a reference assembly, typically in sequence alignment/map (SAM), binary equivalent of SAM (BAM) or browser-extensible data (BED) formats. Level 1 data can be used to identify both genomic and epigenomic variation. These data also include the unmapped (repetitive) fraction of reads. Data level 2 refers to ‘raw epigenomic signal’ such as read density plots, CpG methylation counts [28] or other statistics, frequently in the bigWig UCSC Genome Browser format [29]. Data level 3 refers to typically discrete data such as chromatin immunoprecipitation with sequencing (ChIP-seq) peak calls or hidden Markov model segmentations of the genome into chromatin states. These data are obtained by analyzing individual or multiple marks from a single sample. Depending on data volume, they are stored either in high-density or in simple tab-delimited (GFF, LFF) formats. Data level 4 refers to results of epigenome comparisons. Syntax and semantics for this data level are still under development.
Data reuse and integration | Syntax | Data formats to meet the often conflicting requirements of storage efficiency for high-volume data (bigWig), simplicity (tab-delimited) and machine readability (JavaScript Object Notation, or JSON; Extensible Markup Language, or XML).
Data reuse and integration | Semantics | Theory of meaning. This term is commonly used in connection with controlled vocabularies and ontologies, such as the widely used Gene Ontologies and other ontologies produced by the Open Biomedical Ontologies Foundry and other projects.
Data reuse and integration | Semantic Web (Web 3.0) | Set of technologies developed by the World Wide Web Consortium, including Resource Description Framework for knowledge representation, that allows programmatic communication and automated reasoning about information shared across the web.
Data reuse and integration | Metadata | Data about data, a key requirement for data reuse. Various minimal standards have been recommended by groups such as the Minimum Information for Biological and Biomedical Investigations project. In coordination with the European Bioinformatics Institute and the DNA Database of Japan, and guided by feedback from the NIH Epigenomics Roadmap initiative and other users, NCBI has now developed version 1.2 of a Sequence Read Archive (SRA)-XML metadata format for assays with sequencing readouts. Shared metadata formats will be essential for successful coordination of international epigenome projects.
Tool integration | Pipeline | A set of analysis tools that are invoked sequentially to perform a data analysis task. Galaxy [30] is a software suite with an interactive interface and an online service for pipeline design. One example is integration of the EpiGRAPH software for epigenome analysis using Galaxy [31] to identify epigenomic modifications that characterize highly polymorphic (SNP-rich) promoters.
Tool integration | Workflow | A formal, portable, programmatically executable description of a data analysis process. May be used as metadata to document and ensure reproducibility of data analysis. Projects developing workflow systems include Galaxy, GenePattern and Taverna.
Tool integration | Workbench | An environment for integration of data analysis and visualization tools and data sets (for example, CLC Genomics Workbench and Genboree Workbench).
Web services and programmatic interoperability | URI and URL | The address system of the Web, used to uniquely identify objects, such as web pages and epigenome maps, for access by web browsers and other computer programs via Hypertext Transfer Protocol (HTTP) and other protocols.
Web services and programmatic interoperability | | Representational State Transfer Application Programming Interface. A programming interface, typically implemented using HTTP, that is developed using a set of design principles to ensure efficient communication of computer programs over the web.
Provides access to data and computing resources over the webREST APIusing scripts written in a programming language such as Pearl, Python, Ruby or JavaScript.Cloud computingAccess to scalable, on-demand computing and storage services over the web.Access to computingresources and servicesAccess to software applications over the web, such as those for epigenomic data processing and comparison(Fig. 1). This is a key aspect of Web 2.0 (see below).Software as a serviceProtocol (for example, OpenID) allowing users or computer programs acting as their agents to be recognizedby multiple web servers.Authentication protocolCollaboration andpublicationWeb hosting of collaborative processes such as grant review at the NIH or epigenomic data processing andWeb 2.0comparison (Fig. 1).Examples include NCBI Gene Expression Omnibus and SRA archives, Ensembl, UCSC Genome BrowserDatabases, knowledge bases and archival repositoriesand more specialized resources such as the human epigenome atlas (Fig. 1).a This abstraction captures commonalities and facilitates development of data formats and tools for a diversity of genomic and epigenomic assays. Examples in the table focus on assays withsequencing readouts.nature biotechnology volume 28 number 10 october 2010 1055


COMMENTARY© 2010 Nature America, Inc. All rights reserved.collaboration over the web. The problems ofhigh-volume data capture, visualization, interpretationand reuse are currently recognizedas key limiting factors across scientific disciplines24 . Table 1 lists infrastructure requirementsand concepts relevant (but not necessarilyspecific) to epigenome research. A few of theseare described in detail below.Data reuse. One practical cyberinfrastructurechallenge for epigenomics research isto enable effective data exchange and reuse.The first step in this direction is to developa unifying framework for the multiple layersof heterogeneous information generatedby sequencing- and array-based assays. Datastandards are emerging from the coordinationbetween the Cancer Genome Atlas, the1000 Genomes Project, the Encyclopedia ofDNA Elements and the US National Institutesof Health (NIH) Epigenomics Roadmap(see ‘data levels’ in Table 1). The abstractdata levels codify commonalities across thediversity of assays and technologies used toobtain data. As the diversity of derived dataand knowledge increases, advanced methodsfor knowledge representation and exchange,such as the Resource Description Frameworkderived in the context of the Semantic Web,will need to be applied 25 .Metadata standards. Metadata is a key requirementfor reuse of epigenomic data in the publicdomain for comparative analyses becauseit provides the biological and experimentalcontext in which the data were generated. Oneexample is the Sequence Read Archive XMLschema developed by the US National Centerfor Biotechnology Information (NCBI) andadapted by the NIH Epigenomics Roadmapinitiative for epigenomic data.Reproducibility. Another practical challengeis to ensure reproducibility of reported analysisresults 26 . 
This problem may be tackled by encapsulating all aspects of computational analyses in the form of workflow descriptions and distributing them as metadata with analysis results.

Data storage and computing power. Epigenome comparisons and higher-level interpretations will require substantial computational resources. The use of multiple data and computing resources that are geographically distributed over the web, and of ‘cloud computing’ (using shared remote computer hardware) and programming frameworks such as the Genome Analysis Toolkit 27, may be helpful.

A human epigenome atlas. How will cyberinfrastructure be used to facilitate epigenome comparison? Figure 1 illustrates a hypothetical scheme that includes several projects and could involve clinical researchers using web-based services to process epigenomic data and perform comparative analyses. This model, known as ‘software as a service’, is appealing because fewer local resources would be required. Such an arrangement would be particularly important for adoption of epigenomics in the context of translational and disease-focused studies, where local bioinformatics resources and expertise may be limited. Many projects could use cloud computing and well-tested pipelines with built-in quality-characterization steps that take in sequencing data (data level 0) as it is delivered from sequencers and generate epigenomic signals at the level of individual samples (data levels 1–3). These signals would be compared against a human epigenome atlas, which would serve as a reference data set much like the reference human genome. Other types of visualization and analysis are possible. Upon publication, raw data and the results of analyses would be archived and incorporated into the human epigenome atlas and other specialized repositories.

One open issue is how best to involve the research community in the continued development and maintenance of repositories such as a human epigenome atlas.
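As a rough illustration of the idea of distributing workflow descriptions as metadata, the sketch below records a pipeline that moves data from level 0 to level 3 and serializes it as JSON. It is only a sketch under stated assumptions: the class names, the tool names (`aligner`, `signal_builder`, `peak_caller`), their versions and their parameters are all hypothetical, and the record layout is not the SRA-XML schema or the format of any existing workflow system.

```python
import json
from dataclasses import dataclass, field
from enum import IntEnum


class DataLevel(IntEnum):
    """Data levels as summarized in Table 1 (member names are illustrative)."""
    READS = 0           # sequence reads, e.g. SRF or fastq
    MAPPED_READS = 1    # reads mapped to a reference, e.g. SAM/BAM/BED
    RAW_SIGNAL = 2      # read density or methylation counts, e.g. bigWig
    DISCRETE_CALLS = 3  # peak calls or chromatin-state segmentations
    COMPARISON = 4      # results of epigenome comparisons


@dataclass
class Step:
    tool: str                  # hypothetical tool name
    version: str               # hypothetical version string
    input_level: DataLevel
    output_level: DataLevel
    parameters: dict = field(default_factory=dict)


@dataclass
class Workflow:
    name: str
    steps: list

    def validate(self) -> None:
        """Raise ValueError unless each step consumes what the previous one produced."""
        for previous, current in zip(self.steps, self.steps[1:]):
            if current.input_level != previous.output_level:
                raise ValueError(
                    f"{current.tool} expects level {int(current.input_level)} "
                    f"but {previous.tool} produces level {int(previous.output_level)}"
                )

    def to_metadata(self) -> str:
        """Serialize the workflow as JSON so it can travel with the results."""
        record = {
            "name": self.name,
            "steps": [
                {
                    "tool": step.tool,
                    "version": step.version,
                    "input_level": int(step.input_level),
                    "output_level": int(step.output_level),
                    "parameters": step.parameters,
                }
                for step in self.steps
            ],
        }
        return json.dumps(record, indent=2, sort_keys=True)


# Hypothetical three-step pipeline: reads -> mapped reads -> signal -> peak calls.
workflow = Workflow(
    name="chip_seq_example",
    steps=[
        Step("aligner", "0.1", DataLevel.READS, DataLevel.MAPPED_READS),
        Step("signal_builder", "0.1", DataLevel.MAPPED_READS, DataLevel.RAW_SIGNAL,
             {"bin_size": 25}),
        Step("peak_caller", "0.1", DataLevel.RAW_SIGNAL, DataLevel.DISCRETE_CALLS),
    ],
)
workflow.validate()
metadata = workflow.to_metadata()
```

Storing the resulting `metadata` string alongside the level 3 output would let another group see which tools, versions and parameters produced a result, which is the reproducibility point made above.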
To stimulate the contribution of smaller projects to these data and knowledge commons, the NIH Epigenomics Roadmap Consortium is collaborating with the NCBI to develop standards for epigenomic metadata and define reference pipelines for uniform processing and characterization of the quality of a variety of epigenomic assays.

Conclusions
In summary, comparative analysis of epigenomes is likely to provide many novel insights. Mapping the bifurcating tree of cellular differentiation should be useful for understanding development. Precise and comprehensive mapping of epigenomic perturbations should reveal consequences of genomic mutations and environmental influences on human development and disease. To achieve these goals, we must develop conceptual and computational approaches that address the heterogeneity and context dependence of epigenetic data. In addition, discovery would be aided by the building of a cyberinfrastructure that includes shared repositories and knowledge bases able to accommodate the unprecedented volume of data and diversity of applications.

COMPETING FINANCIAL INTERESTS
The author declares competing financial interests: details accompany the full-text HTML version of the paper at http://www.nature.com/naturebiotechnology/

1. Birney, E. et al. Nature 447, 799–816 (2007).
2. Barski, A. et al. Cell 129, 823–837 (2007).
3. Heintzman, N.D. et al. Nat. Genet. 39, 311–318 (2007).
4. Mikkelsen, T.S. et al. Nature 448, 553–560 (2007).
5. Ernst, J. & Kellis, M. Nat. Biotechnol. 28, 817–825 (2010).
6. Bernstein, B.E. et al. Nat. Biotechnol. 28, 1045–1048 (2010).
7. Harris, R.A. et al. Nat. Biotechnol. 28, 1097–1105 (2010).
8. Kouzarides, T. Cell 128, 693–705 (2007).
9. Levenson, J.M. et al. J. Biol. Chem. 281, 15763–15773 (2006).
10. Reik, W. Nature 447, 425–432 (2007).
11. Rakyan, V.K. et al. Genome Res. 20, 434–439 (2010).
12. Bernstein, B.E. et al. Cell 120, 169–181 (2005).
13. Lamb, J. et al. Science 313, 1929–1935 (2006).
14. Slack, J.M. Nat. Rev. Genet. 3, 889–895 (2002).
15. Waddington, C.H. The Strategy of the Genes: A Discussion of Some Aspects of Theoretical Biology (Allen & Unwin, London, 1957).
16. Bernstein, B.E. et al. Cell 128, 669–681 (2007).
17. van Arensbergen, J. et al. Genome Res. 20, 722–732 (2010).
18. Ridley, M. Evolution and Classification: the Reformation of Cladism (Longman, London UK, 1989).
19. Hennig, W. Phylogenetic Systematics (University of Illinois Press, Urbana, Illinois, 1966).
20. Meaburn, E.L. et al. Epigenetics 5, 578–582 (2010).
21. Schalkwyk, L.C. et al. Am. J. Hum. Genet. 86, 196–212 (2010).
22. Tycko, B. Am. J. Hum. Genet. 86, 109–112 (2010).
23. Cahan, P. et al. Nat. Genet. 41, 430–437 (2009).
24. Hey, T., Tansley, S. & Tolle, K. (eds). The Fourth Paradigm: Data-Intensive Scientific Discovery (Microsoft Research, Seattle, 2009).
25. Wang, X. et al. Nat. Biotechnol. 23, 1099–1103 (2005).
26. Mesirov, J.P. Science 327, 415–416 (2010).
27. McKenna, A. et al. Genome Res. 20, 1297–1303 (2010).
28. Xi, Y. & Li, W. BMC Bioinformatics 10, 232 (2009).
29. Rosenbloom, K.R. et al. Nucleic Acids Res. 38, D620–D625 (2010).
30. Goecks, J. et al. Genome Biol. 11, R86 (2010).
31. Bock, C. et al. Methods Mol. Biol. 628, 275–296 (2010).


Review

Epigenetic modifications and human disease

Anna Portela 1 & Manel Esteller 1,2

Epigenetics is one of the most rapidly expanding fields in biology. The recent characterization of a human DNA methylome at single nucleotide resolution, the discovery of the CpG island shores, the finding of new histone variants and modifications, and the unveiling of genome-wide nucleosome positioning maps highlight the accelerating speed of discovery over the past two years. Increasing interest in epigenetics has been accompanied by technological breakthroughs that now make it possible to undertake large-scale epigenomic studies. These allow the mapping of epigenetic marks, such as DNA methylation, histone modifications and nucleosome positioning, which are critical for regulating gene and noncoding RNA expression. In turn, we are learning how aberrant placement of these epigenetic marks and mutations in the epigenetic machinery is involved in disease. Thus, a comprehensive understanding of epigenetic mechanisms, their interactions and alterations in health and disease, has become a priority in biomedical research.

Even before DNA was identified as the molecule of inheritance, scientists knew that not every gene in an organism can be active in each cell at all times. Even so, all cells in an organism share the same genetic information. Conrad Waddington coined the term ‘epigenetic landscape’ 1,2 for the molecular mechanisms that convert this genetic information into observable traits or phenotypes. In many instances, epigenetic gene expression patterns and associated phenotypes persist through mitosis or even meiosis, although no change in the primary DNA sequence has occurred. Consequently, epigenetics is generally understood to be the study of mechanisms that control gene expression in a potentially heritable way.

Recent breakthroughs in the understanding of the mechanisms underlying epigenetic phenomena and their prevalence as contributors to the development of human disease have led to a greatly enhanced interest in epigenetic research.

On a molecular level, covalent modifications of cytosine bases and histones, and changes in the positioning of nucleosomes are commonly regarded as the driving epigenetic mechanisms. They are fundamental to the regulation of many cellular processes, including gene and microRNA expression, DNA-protein interactions, suppression of transposable element mobility, cellular differentiation, embryogenesis, X-chromosome inactivation and genomic imprinting.

In multicellular organisms, the ability of epigenetic marks to persist during development and potentially be transmitted to offspring may be necessary for generating the large range of different phenotypes that arise from the same genotype 1,3–5. For instance, cloned animals generated from the same donor DNA are not identical to, and develop diseases with different penetrance from, their donor 1,3. Human clones that arise spontaneously (monozygotic twins) are identical at the DNA sequence level, but have different DNA methylation 4,5 and histone modification profiles 4 that might affect the penetrance of several diseases, such as cancer 4 or autoimmune disorders 6. But this phenomenon is also observed at a single cell level: how can stem cells develop into any type of cell and how does a liver cell always give rise to two new liver cells after cell division?
Again, epigenetics seems to be part of the answer as it has been described as one of the key factors in cellular differentiation 7,8 (see the review by Meissner 9 in this issue).

The importance of epigenetics in maintaining normal development and biology is reflected by the observation that many diseases develop when the wrong type of epigenetic marks are introduced or are added at the wrong time or at the wrong place 10. For instance, a clear causal role for DNA methylation in cancer is suggested by hypermethylation of some genes (e.g., p16INK4a, p14ARF and MGMT) as an early event in tumorigenesis, as well as by tumor type-specific methylation landscapes 11. Here we summarize recent progress in the field of epigenetic research and its role in disease, preparing ourselves for the surprises that epigenetics might hold in the future.

Epigenetic modifications and their machineries
For didactic purposes, epigenetic modifications can be grouped into three main categories: DNA methylation, histone modifications and nucleosome positioning. It is important to keep in mind the interplay between epigenetic factors (the observed outcome is always the sum of their interactions) and the many positive and negative feedback mechanisms.

1 Cancer Epigenetics and Biology Program (PEBC), Bellvitge Biomedical Research Institute (IDIBELL), Barcelona, Catalonia, Spain. 2 Institucio Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Catalonia, Spain. Correspondence should be addressed to M.E. (mesteller@iconcologia.net).
Published online 13 October 2010; doi:10.1038/nbt.1685

DNA methylation. The most widely studied epigenetic modification in humans is cytosine methylation. DNA methylation occurs almost exclusively in the context of CpG dinucleotides. The CpG dinucleotides tend to cluster in regions called CpG islands 1, defined as regions of more than 200 bases with a G+C content of at least 50% and a ratio of observed to statistically expected CpG frequencies of at least 0.6. CpG dinucleotides are usually quite rare in mammalian genomes (~1%). About 60% of human gene promoters are associated with CpG islands and are usually unmethylated in normal cells, although some of them (~6%) become methylated in a tissue-specific manner during early development or in differentiated tissues 12 (Fig. 1a).

Figure 1 DNA methylation patterns. DNA methylation can occur in different regions of the genome. The alteration of these patterns leads to disease in the cells. The normal scenario is depicted in the left column and alterations of this pattern are shown on the right. (a) CpG islands at promoters of genes are normally unmethylated, allowing transcription. Aberrant hypermethylation leads to transcriptional inactivation. (b) The same pattern is observed when studying island shores, which are located up to 2 kb upstream of the CpG island. (c) However, when methylation occurs at the gene body, it facilitates transcription, preventing spurious transcription initiations. In disease, the gene body tends to demethylate, allowing transcription to be initiated at several incorrect sites. (d) Finally, repetitive sequences appear to be hypermethylated, preventing chromosomal instability, translocations and gene disruption through the reactivation of endoparasitic sequences. This pattern is also altered in disease.

In general, CpG-island methylation is associated with gene silencing. DNA methylation plays a key role in genomic imprinting, where hypermethylation at one of the two parental alleles leads to monoallelic expression 13. A similar gene-dosage reduction is observed in X-chromosome inactivation in females 14.

DNA methylation can inhibit gene expression by various mechanisms. Methylated DNA can promote the recruitment of methyl-CpG-binding domain (MBD) proteins. MBD family members in turn recruit histone-modifying and chromatin-remodeling complexes to methylated sites 15,16. DNA methylation can also directly inhibit transcription by precluding the recruitment of DNA-binding proteins to their target sites 17. In contrast, unmethylated CpG islands generate a chromatin structure favorable for gene expression by recruiting Cfp1, which associates with histone methyltransferase Setd1, creating domains rich in the histone methylation mark H3K4 trimethylation (H3K4me3; see below) 18.

DNA methylation does not occur exclusively at CpG islands. The term CpG island shores, referring to regions of lower CpG density that lie in close proximity (~2 kb) of CpG islands, has recently been coined. The methylation of these CpG island shores is closely associated with transcriptional inactivation (Fig. 1b). Most of the tissue-specific DNA methylation seems to occur not at CpG islands but at CpG island shores 19,20. Differentially methylated CpG island shores are sufficient to distinguish between specific tissues and are conserved between human and mouse. Moreover, 70% of the differentially methylated regions in reprogramming are associated with CpG island shores 20,21.

DNA methylation is less frequently coupled with transcriptional activation, as when, for instance, it occurs at gene bodies (Fig. 1c). Gene body methylation is common in ubiquitously expressed genes and is positively correlated with gene expression 22.
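The CpG island criteria quoted earlier in this section (more than 200 bases, G+C content of at least 50%, observed-to-expected CpG ratio of at least 0.6) can be checked for a single candidate region with a few lines of Python. This is a minimal sketch, not a genome-wide island caller. The text does not spell out how the expected CpG frequency is computed; the code assumes the commonly used convention of (number of C × number of G) / sequence length, and the function names are illustrative.

```python
def cpg_stats(seq: str) -> tuple:
    """Return (G+C fraction, observed/expected CpG ratio) for a DNA sequence.

    Assumption: expected CpG count = (count of C * count of G) / length.
    """
    seq = seq.upper()
    length = len(seq)
    if length == 0:
        return 0.0, 0.0
    c_count = seq.count("C")
    g_count = seq.count("G")
    observed_cpg = seq.count("CG")           # CpG dinucleotides on this strand
    expected_cpg = c_count * g_count / length
    gc_fraction = (c_count + g_count) / length
    obs_exp = observed_cpg / expected_cpg if expected_cpg > 0 else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq: str,
                          min_length: int = 200,
                          min_gc: float = 0.5,
                          min_obs_exp: float = 0.6) -> bool:
    """Apply the three thresholds from the text to one candidate region."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) > min_length and gc_fraction >= min_gc and obs_exp >= min_obs_exp


# Toy sequences, chosen only to exercise each criterion.
cpg_rich = "CG" * 150               # long, GC-rich, many CpGs
gc_rich_no_cpg = "C" * 150 + "G" * 150  # GC-rich but only one CpG
at_rich = "AT" * 150                # no G or C at all
too_short = "CG" * 50               # passes composition but not length
```

In practice a region would be scanned with a sliding window and overlapping hits merged; the thresholds are exposed as parameters because published island definitions differ in how strict they are.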
It has been proposed that gene body methylation might be related to elongation efficiency and prevention of spurious initiations of transcription 23.

DNA methylation and DNA methylation–associated proteins not only participate in gene transcription regulation in cis, but also act in trans, being involved in nuclear organization and in the establishment of specific chromosomal territories. An imprinted region can physically interact with sequences distant in the primary sequence or on different chromosomes. These physical interactions in trans can regulate transcription, as shown for the H19 imprinting control region and the Osbpl1a/Impact loci 24. Other examples of epigenetic players that cause three-dimensional (3D) rearrangements of the genome to regulate gene expression are the DNA methylation enzyme DNA methyltransferase 1 (DNMT1), which participates in the maintenance of the nucleolar compartment architecture 25, and the methyl-CpG-binding domain (MBD) protein MeCP2, which is required for the formation of a silent chromatin loop at the Dlx5-Dlx6 locus 26 (the 3D organization of the genome is discussed in more detail in the review by van Steensel and Dekker 27 in this issue).

DNA methylation is not only linked to gene transcription regulation. A significant fraction of deeply methylated CpGs is found in repetitive elements (Fig. 1d). This DNA methylation is needed to protect chromosomal integrity, which is achieved by preventing reactivation of endoparasitic sequences that cause chromosomal instability, translocations and gene disruption 11.

Although DNA methylation mainly occurs in the CpG dinucleotide context in mammals, non-CG methylation has recently been described in humans at CHG and CHH sites (where H is A, C or T). CHG and CHH methylation has been found in stem cells and seems to be enriched in gene bodies directly correlated with gene expression and to be depleted in protein binding sites and enhancers 28. The levels of non-CpG methylation decrease during differentiation and are restored in induced pluripotent stem cells, suggesting a key role in origin and maintenance of pluripotent state 28,29. Mechanisms of non-CpG methylation remain unclear 29.

Figure 2 Epigenetic machinery and interplay among epigenetic factors. Epigenetic marks are catalyzed by different epigenetic complexes, whose principal families are illustrated here. (a–c) Epigenetic regulation depends on the interplay among the different players: DNA methylation (a), histone marks (b) and nucleosome positioning (c). The interaction among the different factors brings about the final outcome. This figure illustrates selected examples of the possible interrelations among the various epigenetic players.

In addition to 5-methylcytosines, 5-hydroxymethyl-2′-deoxycytidine has also been observed.
So far, 5-hydroxymethyl-2′-deoxycytidine has been reported in Purkinje cells (constituting 0.6% of total nucleotides) and in granule cells (constituting 0.2% of total nucleotides), but it seems not to be present in cancer cell lines 30. These new DNA modifications need to be further studied to determine their implications for normal and diseased epigenetic regulation.

More work is also required in the development of new technological approaches 31,32 and powerful analytical tools 33, which have proven to be crucial for the progress of the field 34. Massively parallel sequencing is providing large amounts of data, but its accurate analysis and interpretation, and its price, remain the last obstacles to working with DNA methylomes at base resolution. Beyond sequencing-based technologies, the recently released, refined methylation arrays are worth considering for certain genomic questions.

DNA methylation is mediated by the DNMT family of enzymes that catalyze the transfer of a methyl group from S-adenosyl methionine to DNA. In mammals, five members of the DNMT family have been reported: DNMT1, DNMT2, DNMT3a, DNMT3b and DNMT3L, but only DNMT1, DNMT3a and DNMT3b possess methyltransferase activity.


The catalytic members of the DNMT family are customarily classified into de novo DNMTs (DNMT3A and DNMT3B) and maintenance DNMTs (DNMT1). DNMT3A and DNMT3B are thought to be responsible for establishing the pattern of methylation during embryonic development. The de novo DNMTs are highly expressed in embryonic stem (ES) cells and downregulated in differentiated cells 15. The DNMT3 family contains a third member, DNMT3L, which is required for establishing maternal genomic imprinting, despite being catalytically inactive 35. DNMT3L is expressed during gametogenesis when genomic imprinting takes place. It acts as a general stimulatory factor for DNMT3a and DNMT3b and interacts and co-localizes with them in the nucleus 36,37.

The maintenance DNMT, DNMT1, has a 30- to 40-fold preference for hemimethylated DNA, and also has de novo DNMT activity. DNMT1 is the most abundant DNMT in the cell and is transcribed mostly during the S phase of the cell cycle. It is most often needed to methylate hemimethylated sites that are generated during semi-conservative DNA replication (Fig. 2). In a cellular context the affinity of DNMT1 for newly synthesized DNA is increased by its interaction with the DNA polymerase processing factor proliferating cell nuclear antigen (PCNA), ensuring localization to the replication fork 38. The ubiquitin-like plant homeodomain and RING finger domain-containing protein 1 (UHRF1) could perform a similar function, tethering DNMT1 to hemimethylated DNA, thanks to its SET and RING-associated domain, which shows strong preferential binding to hemimethylated CpGs 39 (Fig. 2a).

However, the division of labor between de novo and maintenance methylation is not always so clear, and a revised model has recently been proposed by Jones and Liang 40. The updated model still supports the idea that the bulk of DNA methylation in dividing cells would be maintained by DNMT1 in conjunction with UHRF1 and PCNA.
But it also proposes that DNMT3A and DNMT3B, which have been shown to anchor strongly to nucleosomes containing methylated DNA 41 (Fig. 2a), are compartmentalized in methylated regions, methylating the sites missed by DNMT1 at the replication fork. Finally, DNMT2, despite containing all the catalytic signature motifs of conventional DNMTs, has almost no DNMT activity. However, it has been reported that DNMT2 methylates tRNA(Asp) (ref. 42).

One of the most intriguing questions in the DNA methylation field is how the DNA methylation machinery is directed to specific sequences in the genome. Several mechanisms have been proposed, mainly suggesting interaction of DNMTs with other epigenetic factors 41,43–47 (Fig. 2). More recently, small inhibitory (si)RNA-mediated, RNA-directed DNA methylation has also been described. In plants, RNA-directed DNA methylation is a stepwise process initiated by double-stranded RNAs that recruit DNMTs to catalyze de novo DNA methylation of specific regions including not only gene promoters but also repetitive sequences 48–51. Although the process is well studied in plants and some of the RNA-directed DNA methylation components are conserved in mammals, it is still unclear if similar processes are involved in regulating DNA methylation in animals. There are no reports suggesting the involvement of long intergenic ncRNAs (lincRNAs) in DNA methylation.

Histone modifications. Histones are key players in epigenetics. The core histones H2A, H2B, H3 and H4 group into two H2A-H2B dimers and one H3-H4 tetramer to form the nucleosome. A 147-bp segment of DNA is wrapped in 1.65 turns around the histone octamer, and neighboring nucleosomes are separated by, on average, ~50 bp of free DNA. The core histones are predominantly globular except for their N-terminal tails, which are unstructured 52. Histone H1 is called the linker histone.
It does not form part of the nucleosome but binds to the linker DNA (that is, the DNA separating two histone complexes), sealing off the nucleosome at the location where DNA enters and leaves 53.

All histones are subject to post-translational modification. Several post-translational modifications occur in histone tails: acetylation, methylation, phosphorylation, ubiquitination, SUMOylation and ADP ribosylation 52,54, among others (Fig. 3). Histone modifications have important roles in transcriptional regulation, DNA repair 55, DNA replication, alternative splicing 56 and chromosome condensation 52.

In relation to its transcriptional state, the human genome can be roughly divided into actively transcribed euchromatin and transcriptionally inactive heterochromatin. Euchromatin is characterized by high levels of acetylation and trimethylated H3K4, H3K36 and H3K79. On the other hand, heterochromatin is characterized by low levels of acetylation and high levels of H3K9, H3K27 and H4K20 methylation 57. Recent studies have demonstrated that histone modification levels are predictive of gene expression. Actively transcribed genes are characterized by high levels of H3K4me3, H3K27ac, H2BK5ac and H4K20me1 in the promoter and H3K79me1 and H4K20me1 along the gene body 58 (Fig. 4).

However, the notion of heterochromatin as a transcriptionally inactive region has been challenged by the discovery of numerous noncoding RNAs (ncRNAs) derived from heterochromatic loci 51. For instance, Schizosaccharomyces pombe centromeric regions express siRNAs that bind to the RNA-induced transcriptional silencing complex and provide sequence specificity to the complex. The RNA-induced transcriptional silencing complex is required for H3K9 methylation at centromeric repeats and for the recruitment of the histone methylation enzyme Clr4, which is essential for the spreading of heterochromatic domains 51,59,60. But centromeric siRNAs are not the only ncRNAs that are capable of directing histone modifications 61.
Well-known examples of this phenomenon in humans are the ncRNAs XIST and HOTAIR. XIST is involved in the silencing of the inactive X chromosome in females, through the recruitment of Polycomb-repressing complexes (PRC) with methyltransferase and histone ubiquitinase activity 62,63. HOTAIR is a lincRNA transcribed from the HOXC cluster that represses genes in the HOXD cluster by recruiting the histone methyltransferase PRC2 (ref. 64).

All the modifications described so far are covalent post-translational modifications. However, a new type of modification has recently been described. The histone H3 tail is clipped after the Ala21 residue, cutting off the N-terminal 21 residues and associated post-translational modifications. This modification represents the first massive clearing of histone marks to be reported. Histone H3 clipping seems to be inhibited by H3K4me 65.

Histones can be modified at different sites simultaneously. The core histones forming the nucleosome can each have several modifications, giving rise to cross-talk among the different marks. Communication among histone modifications can occur within the same site 66, in the same histone tail 67 and among different histone tails 68 (Fig. 2b). Thus, a single histone mark does not determine outcome alone; instead, it is the combination of all marks in a nucleosome or region that specifies outcome. A recent paper has described the existence of up to 51 distinct 'chromatin states' based on the enrichment of specific combinations of histone modifications. Distinct biological roles are suggested for the different chromatin states 69. An interesting case of co-existing histone modifications is found in ES cells within the 'bivalent domains', where the H3K4me3 active mark is found together with the H3K27me3 repressive mark at promoters of developmentally important genes. 
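The idea that a combination of marks, rather than any single mark, specifies the outcome can be illustrated with a minimal Python sketch. It labels a promoter from the set of marks detected on it, using only the two marks named in the text for bivalent domains. The function name, the labels other than 'bivalent' and the reduction to two marks are assumptions made for illustration; actual chromatin-state calls are derived from genome-wide enrichment data and are far richer than this.

```python
from typing import Iterable

# Marks named in the text for bivalent domains.
ACTIVE_MARK = "H3K4me3"       # active mark
REPRESSIVE_MARK = "H3K27me3"  # repressive mark


def promoter_state(marks: Iterable[str]) -> str:
    """Label a promoter from the combination of marks found on it.

    Hypothetical helper: the labels 'active', 'repressed' and 'unmarked'
    are illustrative and do not come from the source text.
    """
    present = set(marks)
    has_active = ACTIVE_MARK in present
    has_repressive = REPRESSIVE_MARK in present
    if has_active and has_repressive:
        return "bivalent"   # both marks co-exist at the same promoter
    if has_active:
        return "active"
    if has_repressive:
        return "repressed"
    return "unmarked"


if __name__ == "__main__":
    for marks in (["H3K4me3", "H3K27me3"], ["H3K4me3"], ["H3K27me3"], []):
        print(marks, "->", promoter_state(marks))
```

The point of the sketch is only that the same mark (here H3K4me3) leads to a different label depending on what else is present.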
Bivalent domains enable ES cells to tightly regulate and rapidly activate gene expression during different developmental processes, but are lost with cell commitment 70,71.

As mentioned before, all the epigenetic players interact with each other. An interesting example of the interplay between histone modifications and DNA methylation is the relationship between DNMT3L
and H3K4. DNMT3L specifically interacts with histone H3 tails, inducing de novo DNA methylation by recruitment of DNMT3A; however, this interaction is strongly inhibited by H3K4me 43. Furthermore, several histone methyltransferases have also been reported to direct DNA methylation to specific genomic targets by recruiting DNMTs 44,45, helping in this way to set the silenced state established by the repressive histone marks. Moreover, histone methyltransferases and demethylases can also modulate the stability of DNMT proteins, thereby regulating DNA methylation levels 46,47 (Fig. 2b). On the other hand, DNA methylation can also direct histone modifications. For instance, methylated DNA mediates H3K9me through MeCP2 recruitment 72.

Many enzymes that catalyze covalent post-translational modifications have been described 52,73. Because the modifications are dynamic, enzymes to remove these post-translational modifications have also been reported 52,73,74. However, the list of histone modifications, and of their writers and erasers, might not yet be complete. Of the enzymes that modify histones, methyltransferases, histone demethylases and kinases are the most specific to individual histone subunits and residues 52,75. 
Conversely, most of the histone acetyltransferases (HATs) and histone deacetylases (HDACs) are not highly specific and modify more than one residue.

Many transcriptional co-activators (e.g., GCN5, PCAF, CBP, p300, Tip60 and MOF) have been reported to possess intrinsic HAT activity, whereas many transcriptional co-repressor complexes (e.g., mSin3a, NCoR/SMRT and Mi-2/NuRD) contain subunits with HDAC activity 66. Surprisingly, it has recently been reported that HDACs and HATs are both targeted to transcribed regions of active genes by phosphorylated RNA polymerase II. Thus, most HDACs in the human genome function to reset chromatin by removing acetylation at active genes, whereas HATs, by contrast, are mainly linked to transcriptional activation 76.

Figure 3 Histone modifications. All histones are subject to post-translational modifications, which mainly occur in histone tails. The main post-translational modifications are depicted in this figure: acetylation (blue), methylation (red), phosphorylation (yellow) and ubiquitination (green). The number in gray under each amino acid represents its position in the sequence.

Nucleosome positioning. Nucleosomes are a barrier to transcription: they block access of activators and transcription factors to their sites on DNA and, at the same time, inhibit the elongation of transcripts by engaged polymerases. The packaging of DNA into nucleosomes appears to affect all stages of transcription, thereby regulating gene expression.

In particular, the precise position of nucleosomes around the transcription start sites (TSSs) has an important influence on the initiation of transcription. A preferential positioning of nucleosomes can be described at any given genomic locus. Nucleosome displacements of as few as 30 bp at the TSS have been implicated in changes in the activity of RNA polymerase II. 
Moreover, the 5′ and 3′ ends of genes possess nucleosome-free regions needed to provide space for the assembly and disassembly of the transcription machinery. The loss of a nucleosome directly upstream of the TSS is tightly correlated with gene activation, whereas the occlusion of the TSS by a nucleosome is associated with gene repression 77,78 (Fig. 4).

Nucleosome positioning not only determines accessibility of the transcription factors to their target DNA sequence but has also been reported to play an important role in shaping the methylation landscape 79 (Fig. 4). Besides transcription regulation, nucleosome occupancy also participates in directing meiotic recombination events 80.

The precise function of nucleosomes is influenced by the incorporation of different histone variants. Histone variants are distinguished from core histones by the fact that they are expressed outside of S phase and are incorporated into chromatin independently from DNA replication. They differ from core histones in their tails, in their domain structure and in a few key amino acids 57. Histone variants regulate nucleosome positioning and gene expression 23. For example, the incorporation of the histone variant H2A.Z protects genes against DNA methylation 81. Thus, the interplay among different epigenetic partners becomes evident once
more. The nucleosome remodeling machinery is influenced by DNA methylation 82 and has been linked with specific histone modifications 83 (Fig. 2c). MicroRNAs (miRNAs) can also regulate histone variant replacement 84 or interact with chromatin remodeling complexes mediating the exchange of specific subunits 85.

Several groups of large macromolecular complexes are known to move, destabilize, eject or restructure nucleosomes in an ATP hydrolysis–dependent manner. These complexes, known as chromatin remodeling complexes, can be classified into four families (SWI/SNF, ISWI, CHD and INO80) that share similar ATPase domains but differ in the composition of their unique subunits 86.

In the first of these families, the SWI/SNF family, members have as a catalytic unit either Brahma (BRM) or BRG1, which share ~75% identity but differ in their first 60 amino acids. SWI/SNF family complexes are master regulators of gene expression, regulating expression of FOS, CSF-1, CRYAB, MIM-1, p21 (also known as CDKN1A), HSP70, VIM and CCNA2, among others. Moreover, SWI/SNF has also been reported to modulate alternative splicing 87.

Many members of the second class, the ISWI family, such as ACF and CHRAC, have been reported to promote chromatin assembly and to repress transcription. However, NURF, another complex of this family, is capable of activating RNA polymerase II, thus participating in transcriptional activation 88.

In the CHD family, some members participate in the sliding and ejection of nucleosomes, promoting transcription; however, others, such as the Mi-2/NuRD complex, have repressive roles and contain HDAC activity and MBD proteins 88 (Fig. 2c).

Members of the last group, the INO80 family, have been reported to participate in multiple cellular processes: transcriptional activation, DNA repair, telomere regulation, chromosome segregation and DNA replication, among others 86. 
However, the SWR1 member has the unique ability to restructure the nucleosome, removing the H2A-H2B dimers and replacing them with H2A.Z-H2B dimers 88 (Fig. 2c).

Epigenetic modifications in cancer

In addition to featuring classic genetic mutations, cancer cells present a profoundly distorted epigenetic landscape (Table 1). The cancer epigenome is characterized by global changes in DNA methylation, histone modification patterns and chromatin-modifying enzyme expression profiles 11,89, which play important roles in cancer initiation and progression.

DNA methylation. Cancer cells are characterized by a massive global loss of DNA methylation 90 (20–60% less overall 5-methyl-cytosine). At the same time, the acquisition of specific patterns of hypermethylation at the CpG islands of certain promoters is frequently observed (Fig. 1a).

Global hypomethylation occurs mainly at repetitive sequences, promoting chromosomal instability, translocations, gene disruption and reactivation of endoparasitic sequences 90,91 (Fig. 1d). A clear case is the LINE family member L1, which has been shown to be hypomethylated in a wide range of cancers, including breast, lung, bladder and liver tumors 92.

Hypomethylation at specific promoters can activate the aberrant expression of oncogenes and induce loss of imprinting (LOI) in some loci. For instance, MASPIN (also known as SERPINB5), a tumor suppressor gene that becomes hypermethylated in breast and prostate epithelial cells 93, appears to be hypomethylated in other tumor types. MASPIN hypomethylation, and therefore its expression, increases with the degree of dedifferentiation of some types of cancer cells 94,95. S100P in pancreatic cancer, SNCG in breast and ovarian cancers and melanoma-associated gene (MAGE) and dipeptidyl peptidase 6 (DPP6) in melanomas are other well-studied examples of hypomethylated genes in cancer 19,92. 
The most common LOI event due to hypomethylation occurs at insulin-like growth factor 2 (IGF2) and has been reported in a wide range of tumor types, including breast, liver, lung and colon cancer 96.

In contrast to global DNA hypomethylation, hypermethylation is observed at specific CpG islands (Fig. 1a). The transcriptional inactivation caused by promoter hypermethylation affects genes involved in the main cellular pathways: DNA repair (hMLH1, MGMT, WRN, BRCA1), vitamin response (RARB2, CRBP1), Ras signaling (RASSF1A, NORE1A), cell cycle control (p16INK4a, p15INK4b, RB), p53 network (p14ARF, p73 (also known as TP73), HIC-1) and apoptosis (TMS1, DAPK1, WIF-1, SFRP1), among others 15. Hypermethylated promoters have been proposed as a new generation of biomarkers and hold great diagnostic and prognostic promise for clinicians 97 (reviewed in more detail by Jones and colleagues 98 in this issue).

However, even though the focus of most studies is on CpG islands located in promoters, recent findings suggest that most of the aberrant DNA methylation in cancer occurs in CpG island shores (e.g., in HOXA2 and GATA2) (Fig. 1b). Notably, most changes in CpG island shores (45–65%) seem to be associated with regions that become hypermethylated during normal tissue differentiation (e.g., in TGFB1 and PAX5) 19,20. Differential DNA methylation seems to correlate with gene expression at CpG island shores just as it does with CpG islands 21.

Human tumors are also characterized by an overall miRNA downregulation 99, often caused by hypermethylation at the miRNA promoters 100. For example, miR-124a is repressed by hypermethylation, mediating CDK6 activation and Rb phosphorylation 101. Interestingly, inactivation of miRNA expression by hypermethylation is linked not only to cancer but also to metastasis development. 
Silencing of miR-148, miR-34b/c and miR-9 by promoter hypermethylation favors tumor dissemination from the original location 102.

Hypermethylation patterns are tumor-type specific and it is still unclear why certain regions become hypermethylated, whereas others remain unmethylated. One possibility is that inactivation of particular genes confers a growth advantage, resulting in clonal selection 15. In some cases, it has also been proposed that aberrant CpG-island methylation could be due to the recruitment of DNMTs and HDACs to specific target genes mediated by fusion proteins, such as the promyelocytic leukemia–retinoic acid receptor-α (PML–RARA) fusion protein, expressed in some leukemias 103. Another possibility is that the spreading of methylation from highly methylated sequences to their surroundings is more pronounced in cancer. It has been reported that epigenetic silencing by DNA methylation can span 1-Mb-long regions of a chromosome 104, resembling the loss of heterozygosity often observed in human tumors. This global distortion of the DNA methylation pattern could also be mediated by dysregulation of DNMT expression. DNMT1 and DNMT3B are overexpressed in many tumor types 105. Moreover, DNMT expression can also be regulated by miRNAs. The miR-29 family is known to directly target and downregulate DNMT3A and DNMT3B, and indirectly target DNMT1 (ref. 106) (Fig. 2a).

Histone modifications. The most prominent alteration in histone modification in cancer cells is a global reduction of monoacetylated H4K16 (ref. 107). Loss of acetylation is mediated by HDACs, which have been found to be overexpressed 108 or mutated 109 in different tumor types. The main class of HDACs implicated in this process is the Sirtuin family of proteins 110. Gene expression and deacetylase activity of SirT1 are upregulated in several cancer types. Moreover, SirT1 interacts with DNMT1, thus affecting DNA methylation patterns 25. 
HDAC expression can be regulated by miRNAs, such as miR-449a, which, by repressing the expression of HDAC-1 in
prostate cancer cells, regulates cell growth and viability 111 (Fig. 2b). In addition to alteration in HDAC expression, several cancer types (e.g., colon, uterus, lung and leukemia) also bear translocations leading to the formation of aberrant fusion proteins, mutations or deletions in HATs and HAT-related genes 112,113, thus contributing to the global imbalance of histone acetylation.

Figure 4 Nucleosome positioning patterns. Nucleosome positioning plays an important role in transcriptional regulation. Transcriptionally active gene promoters possess a nucleosome-free region at the 5′ and 3′ untranslated region, providing space for the assembly and disassembly of the transcription machinery. The loss of a nucleosome directly upstream of the TSS is also necessary for gene activation, whereas the occlusion of this position leads to transcription repression. DNA methylation regulates transcription, and thus interferes with nucleosome positioning. Methylated DNA seems to be associated with 'closed' chromatin domains, where DNA is condensed into strictly positioned nucleosomes, thereby impeding transcription. Conversely, unmethylated DNA is associated with 'opened' chromatin domains, which allow transcription.

Besides the global loss of H4K16ac, cancer cells suffer a global loss of the active mark H3K4me3 (ref. 114) and the repressive mark H4K20me3 (ref. 107), and a gain in the repressive marks H3K9me (ref. 115) and H3K27me3 (ref. 116). Altered distribution of the histone methyl marks in cancer cells is mainly due to the aberrant expression of both histone methyltransferases and histone demethylases 75. A recent publication has described inactivating mutations in the histone methyltransferase SETD2 and in the histone demethylases UTX and JARID1C in renal carcinomas 117. 
Another example is the histone methyltransferase EZH2, a subunit of the PRC2 and PRC3 complexes, which enhances proliferation and neoplastic transformation and is overexpressed in several cancer types. Overexpression of the lincRNA HOTAIR in breast tumors and metastases retargets PRC2 and alters the H3K27me3 landscape 118. Moreover, EZH2 expression is upregulated in many tumors due to the genomic loss of miR-101 (ref. 119). In addition to its histone methyltransferase activity, EZH2 interacts with DNMTs, directly controlling DNA methylation 116. NSD1, another histone methyltransferase, has been reported to undergo promoter DNA methylation-dependent silencing in neuroblastomas 120. DOT1L, the major H3K79 histone methyltransferase, is essential for the establishment of a euchromatic state that allows the expression of tumor suppressor genes 121,122.

In leukemias, the presence of mixed lineage leukemia (MLL) fusion oncoproteins leads to aberrant patterns of H3K79 and H3K4 methylation, resulting in altered gene expression of MLL targets 123,124. Some histone demethylases (e.g., GASC1, LSD1, JmjC and UTRX) have also been shown to be upregulated or amplified in several cancers, including prostate cancer and squamous cell carcinomas 125.

Although further studies are needed, histone phosphorylation also seems to be relevant in cancer. Histone phosphorylation plays a role in DNA damage-repair response, chromosome stability and apoptosis. Recently JAK2, a nonreceptor tyrosine kinase that regulates several cellular processes by inducing cytoplasmic signaling cascades, has been reported also to be present in the nucleus, directly phosphorylating H3Y41 (Fig. 2b). Phosphorylated H3Y41 (H3Y41ph) levels are regulated by cytokine signaling. H3Y41ph prevents the binding of heterochromatin protein 1α (HP1α) to this region of H3, increasing the expression of the genes located there, as was reported for the lmo2 promoter. 
JAK2 is frequently activated by chromosomal translocations or point mutations in hematological malignancies 126.

Nucleosome positioning. All families of chromatin remodelers have been tied to cancer, although in most cases the molecular mechanisms underlying their function remain unclear. For instance, BRG1 and BRM, the ATPase subunits of SWI/SNF complexes, have been characterized as tumor suppressors and are silenced in about 15–20% of primary non-small-cell lung cancers 127. Surprisingly, an oncogenic role for BRG1 as a p53 destabilizer has also been proposed 128. Mutations in SNF5, a subunit of the SWI/SNF remodeling complex, have been observed in sporadic renal rhabdoid tumors and in choroid plexus carcinomas, medulloblastomas and central primitive neuroectodermal tumors 129.

Nucleosome remodeling is also involved in the transcriptional repression by promoter hypermethylation (Fig. 4). Promoter hypermethylation results in the occupation of the TSS by a nucleosome, as has been reported for MLH1 in colon cancer 130. The genes encoding subunits of the chromatin remodeling complexes (e.g., CHD5 (ref. 131)) themselves are also targets of CpG island hypermethylation in cancer, which downregulates their expression and impairs the normal chromatin remodeling processes (Fig. 2c).

In addition to nucleosome positioning, histone variants have also been related to cancer. For example, increased expression of MacroH2A is involved in senescence. Thus, lung tumors with highly expressed MacroH2A have a better prognosis, with lower proliferation rates and less frequent recurrence 132.

Epigenetic modifications in neurodevelopmental disorders

The central nervous system is one of the most complex systems in humans. Not only do the different regions of an organ present different expression patterns, but the same cell type has different transcriptional regulation depending on its localization in the organ 133. 
The mitotic exit, when neural cells lose their multipotency, is a key step in nervous system development 85,134, requiring a very precise tuning of the transcriptional program. Epigenetic factors are key players in this regulation. Genetic mutations in epigenetic genes cause dysfunctions that lead to certain neurodevelopmental disorders. Here, we classify them according to the epigenetic machinery that becomes mutated.
Table 1 Epigenetic modifications in human diseases

Aberrant epigenetic mark | Alteration | Consequences | Examples of genes affected and/or resulting disease

Cancer
DNA methylation | CpG island hypermethylation | Transcription repression | MLH1 (colon, endometrium, stomach 11), BRCA1 (breast, ovary 11), MGMT (several tumor types 11), p16INK4a (colon 11)
DNA methylation | CpG island hypomethylation | Transcription activation | MASPIN (pancreas 92), S100P (pancreas 92), SNCG (breast and ovary 92), MAGE (melanomas 92)
DNA methylation | CpG island shore hypermethylation | Transcription repression | HOXA2 (colon 20), GATA2 (colon 20)
DNA methylation | Repetitive sequences hypomethylation | Transposition, recombination, genomic instability | L1 (ref. 11), IAP 11, Sat2 (ref. 107)
Histone modification | Loss of H3 and H4 acetylation | Transcription repression | p21WAF1 (also known as CDKN1A) 11
Histone modification | Loss of H3K4me3 | Transcription repression | HOX genes
Histone modification | Loss of H4K20me3 | Loss of heterochromatic structure | Sat2, D4Z4 (ref. 107)
Histone modification | Gain of H3K9me and H3K27me3 | Transcription repression | CDKN2A, RASSF1 (refs. 115–116)
Nucleosome positioning | Silencing and/or mutation of remodeler subunits | Diverse, leading to oncogenic transformation | BRG1, CHD5 (refs. 127–131)
Nucleosome positioning | Aberrant recruitment of remodelers | Transcription repression | PML-RARα 103 recruits NuRD
Nucleosome positioning | Histone variants replacement | Diverse (promotion cell cycle/destabilization of chromosomal boundaries) | H2A.Z overexpression/loss

Neurological disorders
DNA methylation | CpG island hypermethylation | Transcription repression | Alzheimer's disease (NEP) 135
DNA methylation | CpG island hypomethylation | Transcription activation | Multiple sclerosis (PADI2) 135
DNA methylation | Repetitive sequences aberrant methylation | Transposition, recombination, genomic instability | ATRX syndrome (subtelomeric repeats) 135,143
Histone modification | Aberrant acetylation | Diverse | Parkinson's and Huntington's diseases 135
Histone modification | Aberrant methylation | Diverse | Huntington's disease and Friedreich's ataxia 135
Histone modification | Aberrant phosphorylation | Diverse | Alzheimer's disease 135
Nucleosome positioning | Misposition in trinucleotide repeats | Creation of a 'closed' chromatin domain | Congenital myotonic dystrophy 151

Autoimmune diseases
DNA methylation | CpG island hypermethylation | Transcription repression | Rheumatoid arthritis (DR3) 154,155
DNA methylation | CpG island hypomethylation | Transcription activation | SLE (PRF1, CD70, CD154, AIM2) 6
DNA methylation | Repetitive sequences aberrant methylation | Transposition, recombination, genomic instability | ICF (Sat2, Sat3), rheumatoid arthritis (L1) 152,155
Histone modification | Aberrant acetylation | Diverse | SLE (CD154, IL10, IFN-γ) 6
Histone modification | Aberrant methylation | Diverse | Diabetes type 1 (CTLA4, IL6) 159
Histone modification | Aberrant phosphorylation | Diverse | SLE (NF-κB targets)
Nucleosome positioning | SNPs in the 17q12-q21 region | Allele-specific differences in nucleosome distribution | Diabetes type 1 (CTLA4, IL6)
Nucleosome positioning | Histone variants replacement | Interferes with proper remodeling | Rheumatoid arthritis (histone variant macroH2A at NF-κB targets) 157

DNA methylation. Rett syndrome is an X-linked neurological disease caused by point mutations in the MBD protein MeCP2. Both upregulation and downregulation of MeCP2 in the brain are associated with neurodevelopmental defects. 
Customarily, MeCP2 has been considered to function as a gene silencer, mediating the recruitment of HDACs to methylated DNA (Fig. 2b). Recently, new data have highlighted important roles for MeCP2 in chromatin architecture, regulation of mRNA splicing 135,136 and active transcription of genes (e.g., Sst, Gprin1) 137. Although transcriptional alterations have been described in some genes (e.g., Fkbp5, Mobp, Ddc and S100a9) 138, imprinted regions (e.g., DLX5) and miRNAs (e.g., miR-184) 139,140, MeCP2 deficiency does not result in high levels of genome-wide transcriptional alteration. It still remains unknown whether or not the described alterations are causative.

Histone modifications. Rubinstein-Taybi syndrome is an autosomal dominant disorder associated with the dysfunction of a HAT. It is a genetically heterogeneous disease associated in ~55% of cases with mutations in the cAMP-response element binding protein (CBP), in another 3% of cases with mutations in EP300 and in ~42% of cases with an unidentified cause. CBP and EP300 function as transcriptional co-activators in addition to their HAT activity 135. In Cbp+/− mice H2B acetylation is reduced by more than 30%, suggesting that the failure in long-term memory formation could be explained by chromatin changes in one or several loci that control memory storage 141.

The neurodevelopmental disease Coffin-Lowry syndrome is a rare X-linked disorder caused by loss-of-function mutations in RSK2, a serine/threonine protein kinase. RSK2 participates in the MAP kinase pathway, inducing the transient transcription of a set of genes. RSK2 mediates H3S10ph directly, changing chromatin structure and facilitating the binding of CBP, which acetylates H3 residues. Thus, RSK2 promotes gene transcription through chromatin opening 142.

Nucleosome positioning. ATRX syndrome is an X-linked disorder caused by mutations in ATRX, a member of the Snf2 family of chromatin
remodelers. The ATRX protein interacts with the SET domain of the histone methyltransferase EZH2, the Daxx transcriptional cofactor, MeCP2 and the chromoshadow domain of HP1 proteins. It participates, among other cellular processes, in heterochromatin formation, chromosome alignment at the meiotic spindle, chromosome cohesion in somatic cells and maintenance of X-chromosome inactivation in women. Because no DNA repair defects or genomic instability occurs in ATRX patients, it has been suggested that ATRX may regulate the transcription of a specific set of target genes. Although global DNA methylation is unchanged in ATRX patients, aberrant DNA methylation in some repetitive sequences has been reported 135,143.

Epigenetic modifications in neurodegenerative and neurological diseases

Recent studies have also shed some light on the relationship between epigenetic alterations and neurodegenerative and/or neurological diseases. The majority of the evidence centers on DNA methylation and histone modification (Table 1).

DNA methylation. DNA methylation patterns appear to be distorted in many neurological diseases, giving rise to hyper- and hypomethylated sites. For instance, FMR1 promoter hypermethylation has been described in Fragile X syndrome patients. Fragile X syndrome is caused by a CGG trinucleotide repeat expansion in the 5′-untranslated region of FMR1. Expansion of the CGG trinucleotide repeats to >200 copies induces methylation of FMR1, leading to its transcriptional silencing 144. Other reported cases of hypermethylated promoters include neprilysin (NEP, also known as MME) in Alzheimer's disease, FXN in Friedreich's ataxia and SMN2 in spinal muscular atrophy 135.

Conversely, hypomethylated sites have also been reported. For example, the substantia nigra of Parkinson's patients overexpresses tumor necrosis factor alpha (TNFα) due to hypomethylation of its promoter, thereby inducing apoptosis of neuronal cells 145. 
Other cases of hypomethylation were reported in the promoter region of PADI2 for multiple sclerosis patients 135 and in the Avp enhancer for mice subjected to early-life stress 146. Alterations in DNA methylation patterns not only affect gene promoters but may also lead to LOI. Classic examples of LOI are the Prader-Willi and the Angelman syndromes. Both diseases involve aberrant DNA methylation in the imprinting controlled region at 15q11-q13. Prader-Willi syndrome arises from the loss of paternally expressed genes in this region, whereas Angelman syndrome arises from the loss of the maternally expressed UBE3A gene 147.

Histone modifications. The pattern of histone marks is also altered in neurological diseases, histone hypoacetylation being the most frequently observed change. A good example of histone hypoacetylation is amyotrophic lateral sclerosis (ALS). ALS patients have aggregates of the protein FUS in cytoplasmic deposits of misfolded proteins. FUS is able to bind CBP, strongly inhibiting its HAT activity, and to negatively regulate specific CREB target genes. Thus, overexpression of FUS induces histone hypoacetylation 135. Other cases of hypoacetylation in neurological diseases are found in Parkinson's and Huntington's disease 135 and Friedreich's ataxia 148. Besides histone hypoacetylation, other changes relating neurological diseases and histone marks have been reported. For example, histone acetylation and phosphorylation alterations are typical in Alzheimer's disease and epilepsy, H3K9 hypertrimethylation has been described in Huntington's disease 135 and Friedreich's ataxia 149 and the histone demethylase PHF8 has been implicated in X-linked mental retardation 150.

Nucleosome positioning. It has been suggested that the amplification of CTG repeats in congenital myotonic dystrophy is a very strong nucleosome positioning signal that mediates the creation of a closed chromatin domain 151. 
Apart from this observation, which needs further investigation, little is known about the possible implications of nucleosome positioning or histone variants in neuronal malignancies.

Epigenetic modifications in autoimmune diseases

Autoimmune diseases are characterized by the breakdown of immune tolerance to specific self-antigens. Different types of epigenetic alterations have been reported in this type of disorder (Table 1).

DNA methylation. Most of the research relating autoimmunity disorders and epigenetic changes has focused on DNA methylation alterations. In fact, one of the best known autoimmune diseases, the ICF (immunodeficiency, centromeric instability and facial anomalies) syndrome, is caused by heterozygous mutations in DNMT3B. ICF patients show marked DNA hypomethylation in the pericentromeric satellite 2 and 3 repeats, alpha satellite sequences, Alu sequences and the D4Z4 and NBL2 repeats. Conversely, ICF patients have almost unchanged global DNA methylation levels 143,147, although several genes regulating development, neurogenesis and immune function have aberrant expression 152.

Other autoimmune diseases, unrelated to mutations in the DNA methylation machinery, also present global hypomethylation, as is the case for systemic lupus erythematosus (SLE) and rheumatoid arthritis. The hypomethylated regions are not yet well defined, although some hypomethylated sites have been reported. SLE patients have DNA hypomethylation in PRF1, CD70, CD154, IFNGR2, MMP14, LCN2, CSF3R and AIM2 among other genes, and also in the ribosomal RNA gene promoter, 18S and 28S (ref. 6). The mechanisms responsible for this widespread hypomethylation are beginning to be revealed. It has been recently reported that hypomethylation in SLE is partially mediated by miR-21 and miR-148a, which directly and indirectly target DNMT1 (ref. 153). In rheumatoid arthritis, not only hypomethylated sites (e.g., in L1 and IL6) but also hypermethylated sites (e.g., in DR3) have been described 154,155.

Histone modifications. 
Little is known about the role of histone modifications in autoimmune diseases, although initial studies are beginning to shed some light on this area. In human SLE T cells, the HDAC inhibitor trichostatin A reverses the aberrant expression of CD154, IL10 and interferon (IFN)-γ products 156. A role for histone modifications in rheumatoid arthritis has also been described. Because the transcription factor NF-κB, a key regulator of inflammation, binds very poorly to nucleosomal DNA, histone modifications are needed to allow efficient NF-κB binding to its targets: histone H3K9 and S10 phosphoacetylation, reduction in H3K9me and increase in H3/H4 acetylation 157. Thus, in rheumatoid arthritis, the reduced activity of HDACs plays a key role in regulating NF-κB–mediated gene expression 158. Patients with type 1 diabetes also present a characteristic pattern of histone marks, with lymphocytes, but not monocytes, showing increased H3K9me2 in a subset of genes associated with autoimmune and inflammatory pathways (e.g., CTLA4, IL6) 159.

Histone modifications, however, have roles beyond transcriptional regulation. Nucleosomes are key autoantigens in SLE, being present in the circulation because of increased apoptosis and/or insufficient clearance. In apoptosis, histone modifications occur, such as H2BS14 phosphorylation 160, H3T45 phosphorylation 161, H3K4 trimethylation 162, H4 triacetylation at K8, K12 and K16 (ref. 163) as well as H2BK12 acetylation 164. It has been suggested that histone modifications arising during apoptosis make released apoptotic nucleosomes more immunogenic, leading to activation of antigen-presenting cells, which could result in autoantibody production 162.

Nucleosome positioning. No studies have yet made a connection between nucleosome positioning and autoimmune diseases. Notably, it has recently been reported that single-nucleotide polymorphisms in the 17q12-q21 region, which have been associated with a higher risk of asthma, type 1 diabetes, primary biliary cirrhosis and Crohn’s disease, lead to allele-specific differences in nucleosome distribution 165. Moreover, in rheumatoid arthritis, the incorporation of the histone variant macroH2A interferes with the binding of the transcription factor NF-κB and impedes SWI/SNF-dependent remodeling 157.

Conclusions and perspectives

In the past decade the fast-evolving field of epigenetics has taken center stage, as shown by the results of a simple PubMed search of the term ‘epigenetic’: there were around 200 papers published in 1999, but more than 2,500 in 2009. Such startling growth in the number of publications attests to the intense research activity being undertaken in the field.

Great progress has been made in the description of epigenetic modifications in normal and diseased tissues. Thus far, efforts in epigenetic research have mainly focused on cancer, but as the field has grown, it has provided new insights into other types of diseases, particularly neurological and autoimmune diseases.
Epigenetic alterations are likely to be found in other disorders; indeed, they have already been described in cardiovascular diseases 166–168, metabolic diseases 169, myopathies 170 and children born from assisted reproductive treatments 171.

In recent months, we have witnessed a flood of new discoveries: the description of comprehensive DNA methylomes of humans 22 and viruses 146, the putative identification of non-CpG methylation 28, the definition of CpG island shores 19, the involvement of aberrant DNA methylation in other diseases besides cancer 6,135, the description of new histone modifications and histone variants and their roles 45,126,161, the report of new epigenetic machinery such as the DNA demethylase Tet1 (refs. 172,173) and the histone kinase JAK2 (ref. 126), the description of new mutations in the epigenetic machinery 99 and the flurry of ncRNA studies that highlight the importance of RNA-mediated regulation in epigenetics 174,175.

Many key questions remain unanswered: what are the functions of non-CpG methylation and 5-hydroxymethylcytosine in human cells? Are there new DNA or histone modifications yet to be discovered? What are the rules of the so-called histone code? What are the roles and functions of ncRNAs and how many more ncRNAs are yet to be described? How is the placement of epigenetic marks and its specificity regulated? How are causative epigenetic changes going to be distinguished from mere bystander alterations? Is it always clear whether a specific epigenetic modification is a cause or a consequence of a certain process? One of the most intriguing questions is how the various epigenetic players interact and what mechanisms confer sequence specificity on the enzymes involved. Further research is needed, and efforts focused on such questions will be key in our progress toward a complete map of epigenetic regulation.

Advances in technological development are enabling epigenomic analysis on a large scale.
The first whole-genome, high-resolution maps for epigenetic modifications are appearing, but we should not stop here. Detailed human DNA methylomes, histone modification and nucleosome positioning maps in healthy and diseased tissues are needed. In this regard several international projects and initiatives have been established: the NIH Roadmap Epigenomics Program, the ENCODE Project, the AHEAD Project and the Epigenomics NCBI browser, among others (see the commentaries by Bernstein and colleagues 176 and Satterlee and colleagues 177 in this issue). The detailed study of the epigenetic maps would be of enormous use in basic and applied research and would be relevant for focusing pharmacological research on the most promising epigenetic targets. A key topic for future research is the implementation of mechanisms for the release of whole genome methylation and histone modification maps into public databases.

COMPETING FINANCIAL INTERESTS

The authors declare no competing financial interests.

Published online at http://www.nature.com/naturebiotechnology/.
Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/.

1. Esteller, M. Epigenetics in evolution and disease. Lancet 372, S90–S96 (2008).
2. Waddington, C.H. Introduction to Modern Genetics (Macmillan, 1939).
3. Rideout, W.M., III, Eggan, K. & Jaenisch, R. Nuclear cloning and epigenetic reprogramming of the genome. Science 293, 1093–1098 (2001).
4. Fraga, M.F. et al. Epigenetic differences arise during the lifetime of monozygotic twins. Proc. Natl. Acad. Sci. USA 102, 10604–10609 (2005).
5. Kaminsky, Z.A. et al. DNA methylation profiles in monozygotic and dizygotic twins. Nat. Genet. 41, 240–245 (2009).
6. Javierre, B.M. et al. Changes in the pattern of DNA methylation associate with twin discordance in systemic lupus erythematosus. Genome Res. 20, 170–179 (2010).
7. Chi, A.S. & Bernstein, B.E. Developmental biology. Pluripotent chromatin state. Science 323, 220–221 (2009).
8. Meissner, A. et al. Genome-scale DNA methylation maps of pluripotent and differentiated cells. Nature 454, 766–770 (2008).
9. Meissner, A. Epigenetic modifications in pluripotent and differentiated cells. Nat. Biotechnol. 28, 1079–1088 (2010).
10. Esteller, M. CpG island hypermethylation and tumor suppressor genes: a booming present, a brighter future. Oncogene 21, 5427–5440 (2002).
11. Esteller, M. Cancer epigenomics: DNA methylomes and histone-modification maps. Nat. Rev. Genet. 8, 286–298 (2007).
12. Straussman, R. et al. Developmental programming of CpG island methylation profiles in the human genome. Nat. Struct. Mol. Biol. 16, 564–571 (2009).
13. Kacem, S. & Feil, R. Chromatin mechanisms in genomic imprinting. Mamm. Genome 20, 544–556 (2009).
14. Reik, W. & Lewis, A. Co-evolution of X-chromosome inactivation and imprinting in mammals. Nat. Rev. Genet. 6, 403–410 (2005).
15. Esteller, M. Epigenetic gene silencing in cancer: the DNA hypermethylome. Hum. Mol. Genet. 16 Spec No 1, R50–R59 (2007).
16. Lopez-Serra, L. & Esteller, M. Proteins that bind methylated DNA and human cancer: reading the wrong words. Br. J. Cancer 98, 1881–1885 (2008).
17. Kuroda, A. et al. Insulin gene expression is regulated by DNA methylation. PLoS ONE 4, e6953 (2009).
18. Thomson, J.P. et al. CpG islands influence chromatin structure via the CpG-binding protein Cfp1. Nature 464, 1082–1086 (2010).
19. Irizarry, R.A. et al. The human colon cancer methylome shows similar hypo- and hypermethylation at conserved tissue-specific CpG island shores. Nat. Genet. 41, 178–186 (2009).
20. Doi, A. et al. Differential methylation of tissue- and cancer-specific CpG island shores distinguishes human induced pluripotent stem cells, embryonic stem cells and fibroblasts. Nat. Genet. 41, 1350–1353 (2009).
21. Ji, H. et al. Comprehensive methylome map of lineage commitment from haematopoietic progenitors. Nature 467, 338–342 (2010).
22. Hellman, A. & Chess, A. Gene body-specific methylation on the active X chromosome. Science 315, 1141–1143 (2007).
23. Zilberman, D., Gehring, M., Tran, R.K., Ballinger, T. & Henikoff, S. Genome-wide analysis of Arabidopsis thaliana DNA methylation uncovers an interdependence between methylation and transcription. Nat. Genet. 39, 61–69 (2007).
24. Zhao, Z. et al. Circular chromosome conformation capture (4C) uncovers extensive networks of epigenetically regulated intra- and interchromosomal interactions. Nat. Genet. 38, 1341–1347 (2006).
25. Espada, J. et al. Epigenetic disruption of ribosomal RNA genes and nucleolar architecture in DNA methyltransferase 1 (Dnmt1) deficient cells. Nucleic Acids Res. 35, 2191–2198 (2007).
26. Horike, S., Cai, S., Miyano, M., Cheng, J.F. & Kohwi-Shigematsu, T. Loss of silent-chromatin looping and impaired imprinting of DLX5 in Rett syndrome. Nat. Genet. 37, 31–40 (2005).
27. van Steensel, B. & Dekker, J. Genomics tools for unraveling chromosome architecture. Nat. Biotechnol. 28, 1089–1095 (2010).
28. Lister, R. et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature 462, 315–322 (2009).
29. Laurent, L. et al. Dynamic changes in the human methylome during differentiation. Genome Res. 20, 320–331 (2010).
30. Kriaucionis, S. & Heintz, N. The nuclear DNA base 5-hydroxymethylcytosine is present in Purkinje neurons and the brain. Science 324, 929–930 (2009).
31. Berman, B.P., Weisenberger, D.J. & Laird, P.W. Locking in on the human methylome. Nat. Biotechnol. 27, 341–342 (2009).
32. Weisenberger, D.J. et al. DNA methylation analysis by digital bisulfite genomic sequencing and digital MethyLight. Nucleic Acids Res. 36, 4689–4698 (2008).


33. Down, T.A. et al. A Bayesian deconvolution strategy for immunoprecipitation-based DNA methylome analysis. Nat. Biotechnol. 26, 779–785 (2008).
34. Laird, P.W. Principles and challenges of genome-wide DNA methylation analysis. Nat. Rev. Genet. 11, 191–203 (2010).
35. Bourc’his, D., Xu, G.L., Lin, C.S., Bollman, B. & Bestor, T.H. Dnmt3L and the establishment of maternal genomic imprints. Science 294, 2536–2539 (2001).
36. Chen, Z.X., Mann, J.R., Hsieh, C.L., Riggs, A.D. & Chedin, F. Physical and functional interactions between the human DNMT3L protein and members of the de novo methyltransferase family. J. Cell. Biochem. 95, 902–917 (2005).
37. Holz-Schietinger, C. & Reich, N.O. The inherent processivity of the human de novo DNA methyltransferase 3A (DNMT3A) is enhanced by DNMT3L. J. Biol. Chem. 285, 29091–29100 (2010).
38. Chuang, L.S. et al. Human DNA-(cytosine-5) methyltransferase-PCNA complex as a target for p21WAF1. Science 277, 1996–2000 (1997).
39. Bostick, M. et al. UHRF1 plays a role in maintaining DNA methylation in mammalian cells. Science 317, 1760–1764 (2007).
40. Jones, P.A. & Liang, G. Rethinking how DNA methylation patterns are maintained. Nat. Rev. Genet. 10, 805–811 (2009).
41. Jeong, S. et al. Selective anchoring of DNA methyltransferases 3A and 3B to nucleosomes containing methylated DNA. Mol. Cell. Biol. 29, 5366–5376 (2009).
42. Goll, M.G. et al. Methylation of tRNAAsp by the DNA methyltransferase homolog Dnmt2. Science 311, 395–398 (2006).
43. Ooi, S.K. et al. DNMT3L connects unmethylated lysine 4 of histone H3 to de novo methylation of DNA. Nature 448, 714–717 (2007).
44. Tachibana, M., Matsumura, Y., Fukuda, M., Kimura, H. & Shinkai, Y. G9a/GLP complexes independently mediate H3K9 and DNA methylation to silence transcription. EMBO J. 27, 2681–2690 (2008).
45. Zhao, Q. et al. PRMT5-mediated methylation of histone H4R3 recruits DNMT3A, coupling histone and DNA methylation in gene silencing. Nat. Struct. Mol. Biol. 16, 304–311 (2009).
46. Esteve, P.O. et al. Regulation of DNMT1 stability through SET7-mediated lysine methylation in mammalian cells. Proc. Natl. Acad. Sci. USA 106, 5076–5081 (2009).
47. Wang, J. et al. The lysine demethylase LSD1 (KDM1) is required for maintenance of global DNA methylation. Nat. Genet. 41, 125–129 (2009).
48. Mosher, R.A. & Melnyk, C.W. siRNAs and DNA methylation: seedy epigenetics. Trends Plant Sci. 15, 204–210 (2010).
49. Matzke, M.A. & Birchler, J.A. RNAi-mediated pathways in the nucleus. Nat. Rev. Genet. 6, 24–35 (2005).
50. Vrbsky, J. et al. siRNA-mediated methylation of Arabidopsis telomeres. PLoS Genet. 6, e1000986 (2010).
51. Zaratiegui, M., Irvine, D.V. & Martienssen, R.A. Noncoding RNAs and gene silencing. Cell 128, 763–776 (2007).
52. Kouzarides, T. Chromatin modifications and their function. Cell 128, 693–705 (2007).
53. Daujat, S., Zeissler, U., Waldmann, T., Happel, N. & Schneider, R. HP1 binds specifically to Lys26-methylated histone H1.4, whereas simultaneous Ser27 phosphorylation blocks HP1 binding. J. Biol. Chem. 280, 38090–38095 (2005).
54. Rando, O.J. & Chang, H.Y. Genome-wide views of chromatin structure. Annu. Rev. Biochem. 78, 245–271 (2009).
55. Huertas, D., Sendra, R. & Munoz, P. Chromatin dynamics coupled to DNA repair. Epigenetics 4, 31–42 (2009).
56. Luco, R.F. et al. Regulation of alternative splicing by histone modifications. Science 327, 996–1000 (2010).
57. Li, B., Carey, M. & Workman, J.L. The role of chromatin during transcription. Cell 128, 707–719 (2007).
58. Karlic, R., Chung, H.R., Lasserre, J., Vlahovicek, K. & Vingron, M. Histone modification levels are predictive for gene expression. Proc. Natl. Acad. Sci. USA 107, 2926–2931 (2010).
59. Moazed, D. Small RNAs in transcriptional gene silencing and genome defence. Nature 457, 413–420 (2009).
60. Grewal, S.I. & Jia, S. Heterochromatin revisited. Nat. Rev. Genet. 8, 35–46 (2007).
61. Khalil, A.M. et al. Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Proc. Natl. Acad. Sci. USA 106, 11667–11672 (2009).
62. Chow, J. & Heard, E. X inactivation and the complexities of silencing a sex chromosome. Curr. Opin. Cell Biol. 21, 359–366 (2009).
63. Agrelo, R. & Wutz, A. X inactivation and disease. Semin. Cell Dev. Biol. 21, 194–200 (2010).
64. Rinn, J.L. et al. Functional demarcation of active and silent chromatin domains in human HOX loci by noncoding RNAs. Cell 129, 1311–1323 (2007).
65. Santos-Rosa, H. et al. Histone H3 tail clipping regulates gene expression. Nat. Struct. Mol. Biol. 16, 17–22 (2009).
66. Wang, Z. et al. Combinatorial patterns of histone acetylations and methylations in the human genome. Nat. Genet. 40, 897–903 (2008).
67. Duan, Q., Chen, H., Costa, M. & Dai, W. Phosphorylation of H3S10 blocks the access of H3K9 by specific antibodies and histone methyltransferase. Implication in regulating chromatin dynamics and epigenetic inheritance during mitosis. J. Biol. Chem. 283, 33585–33590 (2008).
68. Nakanishi, S. et al. Histone H2BK123 monoubiquitination is the critical determinant for H3K4 and H3K79 trimethylation by COMPASS and Dot1. J. Cell Biol. 186, 371–377 (2009).
69. Ernst, J. & Kellis, M. Discovery and characterization of chromatin states for systematic annotation of the human genome. Nat. Biotechnol. 28, 817–825 (2010).
70. Mikkelsen, T.S. et al. Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature 448, 553–560 (2007).
71. Bernstein, B.E. et al. A bivalent chromatin structure marks key developmental genes in embryonic stem cells. Cell 125, 315–326 (2006).
72. Fuks, F. et al. The methyl-CpG-binding protein MeCP2 links DNA methylation to histone methylation. J. Biol. Chem. 278, 4035–4040 (2003).
73. Bhaumik, S.R., Smith, E. & Shilatifard, A. Covalent modifications of histones during development and disease pathogenesis. Nat. Struct. Mol. Biol. 14, 1008–1016 (2007).
74. Chang, B., Chen, Y., Zhao, Y. & Bruick, R.K. JMJD6 is a histone arginine demethylase. Science 318, 444–447 (2007).
75. Chi, P., Allis, C.D. & Wang, G.G. Covalent histone modifications–miswritten, misinterpreted and mis-erased in human cancers. Nat. Rev. Cancer 10, 457–469 (2010).
76. Wang, Z. et al. Genome-wide mapping of HATs and HDACs reveals distinct functions in active and inactive genes. Cell 138, 1019–1031 (2009).
77. Schones, D.E. et al. Dynamic regulation of nucleosome positioning in the human genome. Cell 132, 887–898 (2008).
78. Cairns, B.R. The logic of chromatin architecture and remodelling at promoters. Nature 461, 193–198 (2009).
79. Chodavarapu, R.K. et al. Relationship between nucleosome positioning and DNA methylation. Nature 466, 388–392 (2010).
80. Getun, I.V., Wu, Z.K., Khalil, A.M. & Bois, P.R. Nucleosome occupancy landscape and dynamics at mouse recombination hotspots. EMBO Rep. 11, 555–560 (2010).
81. Zilberman, D., Coleman-Derr, D., Ballinger, T. & Henikoff, S. Histone H2A.Z and DNA methylation are mutually antagonistic chromatin marks. Nature 456, 125–129 (2008).
82. Harikrishnan, K.N. et al. Brahma links the SWI/SNF chromatin-remodeling complex with MeCP2-dependent transcriptional silencing. Nat. Genet. 37, 254–264 (2005).
83. Wysocka, J. et al. A PHD finger of NURF couples histone H3 lysine 4 trimethylation with chromatin remodelling. Nature 442, 86–90 (2006).
84. Lal, A. et al. miR-24-mediated downregulation of H2AX suppresses DNA repair in terminally differentiated blood cells. Nat. Struct. Mol. Biol. 16, 492–498 (2009).
85. Yoo, A.S., Staahl, B.T., Chen, L. & Crabtree, G.R. MicroRNA-mediated switching of chromatin-remodelling complexes in neural development. Nature 460, 642–646 (2009).
86. Ho, L. & Crabtree, G.R. Chromatin remodelling during development. Nature 463, 474–484 (2010).
87. Reisman, D., Glaros, S. & Thompson, E.A. The SWI/SNF complex and cancer. Oncogene 28, 1653–1668 (2009).
88. Clapier, C.R. & Cairns, B.R. The biology of chromatin remodeling complexes. Annu. Rev. Biochem. 78, 273–304 (2009).
89. Sharma, S., Kelly, T.K. & Jones, P.A. Epigenetics in cancer. Carcinogenesis 31, 27–36 (2009).
90. Goelz, S.E., Vogelstein, B., Hamilton, S.R. & Feinberg, A.P. Hypomethylation of DNA from benign and malignant human colon neoplasms. Science 228, 187–190 (1985).
91. Gaudet, F. et al. Induction of tumors in mice by genomic hypomethylation. Science 300, 489–492 (2003).
92. Wilson, A.S., Power, B.E. & Molloy, P.L. DNA hypomethylation and human diseases. Biochim. Biophys. Acta 1775, 138–162 (2007).
93. Futscher, B.W. et al. Aberrant methylation of the maspin promoter is an early event in human breast cancer. Neoplasia 6, 380–389 (2004).
94. Futscher, B.W. et al. Role for DNA methylation in the control of cell type specific maspin expression. Nat. Genet. 31, 175–179 (2002).
95. Bettstetter, M. et al. Elevated nuclear maspin expression is associated with microsatellite instability and high tumour grade in colorectal cancer. J. Pathol. 205, 606–614 (2005).
96. Ito, Y. et al. Somatically acquired hypomethylation of IGF2 in breast and colorectal cancer. Hum. Mol. Genet. 17, 2633–2643 (2008).
97. Li, M. et al. Sensitive digital quantification of DNA methylation in clinical samples. Nat. Biotechnol. 27, 858–863 (2009).
98. Kelly, T.K., De Carvalho, D.D. & Jones, P.A. Epigenetic modifications as therapeutic targets. Nat. Biotechnol. 28, 1069–1078 (2010).
99. Melo, S.A. et al. A TARBP2 mutation in human cancer impairs microRNA processing and DICER1 function. Nat. Genet. 41, 365–370 (2009).
100. Saito, Y. et al. Specific activation of microRNA-127 with downregulation of the proto-oncogene BCL6 by chromatin-modifying drugs in human cancer cells. Cancer Cell 9, 435–443 (2006).
101. Lujambio, A. et al. Genetic unmasking of an epigenetically silenced microRNA in human cancer cells. Cancer Res. 67, 1424–1429 (2007).
102. Lujambio, A. et al. A microRNA DNA methylation signature for human cancer metastasis. Proc. Natl. Acad. Sci. USA 105, 13556–13561 (2008).
103. Di Croce, L. et al. Methyltransferase recruitment and DNA hypermethylation of target promoters by an oncogenic transcription factor. Science 295, 1079–1082 (2002).
104. Frigola, J. et al. Epigenetic remodeling in colorectal cancer results in coordinate gene suppression across an entire chromosome band. Nat. Genet. 38, 540–549 (2006).
105. Miremadi, A., Oestergaard, M.Z., Pharoah, P.D. & Caldas, C. Cancer genetics of epigenetic genes. Hum. Mol. Genet. 16 Spec No 1, R28–R49 (2007).
106. Garzon, R. et al. MicroRNA-29b induces global DNA hypomethylation and tumor suppressor gene reexpression in acute myeloid leukemia by targeting directly DNMT3A and 3B and indirectly DNMT1. Blood 113, 6411–6418 (2009).
107. Fraga, M.F. et al. Loss of acetylation at Lys16 and trimethylation at Lys20 of histone H4 is a common hallmark of human cancer. Nat. Genet. 37, 391–400 (2005).


108. Zhu, P. et al. Induction of HDAC2 expression upon loss of APC in colorectal tumorigenesis. Cancer Cell 5, 455–463 (2004).
109. Ropero, S. et al. A truncating mutation of HDAC2 in human cancers confers resistance to histone deacetylase inhibition. Nat. Genet. 38, 566–569 (2006).
110. Vaquero, A., Sternglanz, R. & Reinberg, D. NAD+-dependent deacetylation of H4 lysine 16 by class III HDACs. Oncogene 26, 5505–5520 (2007).
111. Noonan, E.J. et al. miR-449a targets HDAC-1 and induces growth arrest in prostate cancer. Oncogene 28, 1714–1724 (2009).
112. Moore, S.D. et al. Uterine leiomyomata with t(10;17) disrupt the histone acetyltransferase MORF. Cancer Res. 64, 5570–5577 (2004).
113. Bryan, E.J. et al. Mutation analysis of EP300 in colon, breast and ovarian carcinomas. Int. J. Cancer 102, 137–141 (2002).
114. Hamamoto, R. et al. SMYD3 encodes a histone methyltransferase involved in the proliferation of cancer cells. Nat. Cell Biol. 6, 731–740 (2004).
115. Kondo, Y. et al. Alterations of DNA methylation and histone modifications contribute to gene silencing in hepatocellular carcinomas. Hepatol. Res. 37, 974–983 (2007).
116. Vire, E. et al. The Polycomb group protein EZH2 directly controls DNA methylation. Nature 439, 871–874 (2006).
117. Dalgliesh, G.L. et al. Systematic sequencing of renal carcinoma reveals inactivation of histone modifying genes. Nature 463, 360–363 (2010).
118. Gupta, R.A. et al. Long non-coding RNA HOTAIR reprograms chromatin state to promote cancer metastasis. Nature 464, 1071–1076 (2010).
119. Varambally, S. et al. Genomic loss of microRNA-101 leads to overexpression of histone methyltransferase EZH2 in cancer. Science 322, 1695–1699 (2008).
120. Berdasco, M. et al. Epigenetic inactivation of the Sotos overgrowth syndrome gene histone methyltransferase NSD1 in human neuroblastoma and glioma. Proc. Natl. Acad. Sci. USA 106, 21830–21835 (2009).
121. Jones, B. et al. The histone H3K79 methyltransferase Dot1L is essential for mammalian development and heterochromatin structure. PLoS Genet. 4, e1000190 (2008).
122. Jacinto, F.V., Ballestar, E. & Esteller, M. Impaired recruitment of the histone methyltransferase DOT1L contributes to the incomplete reactivation of tumor suppressor genes upon DNA demethylation. Oncogene 28, 4212–4224 (2009).
123. Krivtsov, A.V. et al. H3K79 methylation profiles define murine and human MLL-AF4 leukemias. Cancer Cell 14, 355–368 (2008).
124. Wang, P. et al. Global analysis of H3K4 methylation defines MLL family member targets and points to a role for MLL1-mediated H3K4 methylation in the regulation of transcriptional initiation by RNA polymerase II. Mol. Cell. Biol. 29, 6074–6085 (2009).
125. Shi, Y. Histone lysine demethylases: emerging roles in development, physiology and disease. Nat. Rev. Genet. 8, 829–833 (2007).
126. Dawson, M.A. et al. JAK2 phosphorylates histone H3Y41 and excludes HP1alpha from chromatin. Nature 461, 819–822 (2009).
127. Medina, P.P. & Sanchez-Cespedes, M. Involvement of the chromatin-remodeling factor BRG1/SMARCA4 in human cancer. Epigenetics 3, 64–68 (2008).
128. Naidu, S.R., Love, I.M., Imbalzano, A.N., Grossman, S.R. & Androphy, E.J. The SWI/SNF chromatin remodeling subunit BRG1 is a critical regulator of p53 necessary for proliferation of malignant cells. Oncogene 28, 2492–2501 (2009).
129. Roberts, C.W. & Orkin, S.H. The SWI/SNF complex–chromatin and cancer. Nat. Rev. Cancer 4, 133–142 (2004).
130. Lin, J.C. et al. Role of nucleosomal occupancy in the epigenetic silencing of the MLH1 CpG island. Cancer Cell 12, 432–444 (2007).
131. Mulero-Navarro, S. & Esteller, M. Chromatin remodeling factor CHD5 is silenced by promoter CpG island hypermethylation in human cancer. Epigenetics 3, 210–215 (2008).
132. Sporn, J.C. et al. Histone macroH2A isoforms predict the risk of lung cancer recurrence. Oncogene 28, 3423–3428 (2009).
133. Gibbs, J.R. et al. Abundant quantitative trait loci exist for DNA methylation and gene expression in human brain. PLoS Genet. 6, e1000952 (2010).
134. Wynder, C., Hakimi, M.A., Epstein, J.A., Shilatifard, A. & Shiekhattar, R. Recruitment of MLL by HMG-domain protein iBRAF promotes neural differentiation. Nat. Cell Biol. 7, 1113–1117 (2005).
135. Urdinguio, R.G., Sanchez-Mut, J.V. & Esteller, M. Epigenetic mechanisms in neurological diseases: genes, syndromes, and therapies. Lancet Neurol. 8, 1056–1072 (2009).
136. Hite, K.C., Adams, V.H. & Hansen, J.C. Recent advances in MeCP2 structure and function. Biochem. Cell Biol. 87, 219–227 (2009).
137. Chahrour, M. et al. MeCP2, a key contributor to neurological disease, activates and represses transcription. Science 320, 1224–1229 (2008).
138. Urdinguio, R.G. et al. Mecp2-null mice provide new neuronal targets for Rett syndrome. PLoS ONE 3, e3669 (2008).
139. Nomura, T. et al. MeCP2-dependent repression of an imprinted miR-184 released by depolarization. Hum. Mol. Genet. 17, 1192–1199 (2008).
140. Urdinguio, R.G. et al. Disrupted microRNA expression caused by Mecp2 loss in a mouse model of Rett syndrome. Epigenetics 5, 656–663 (2010).
141. Alarcon, J.M. et al. Chromatin acetylation, memory, and LTP are impaired in CBP+/− mice: a model for the cognitive deficit in Rubinstein-Taybi syndrome and its amelioration. Neuron 42, 947–959 (2004).
142. Clayton, A.L., Rose, S., Barratt, M.J. & Mahadevan, L.C. Phosphoacetylation of histone H3 on c-fos- and c-jun-associated nucleosomes upon gene activation. EMBO J. 19, 3714–3726 (2000).
143. De Sario, A. Clinical and molecular overview of inherited disorders resulting from epigenomic dysregulation. Eur. J. Med. Genet. 52, 363–372 (2009).
144. Gheldof, N., Tabuchi, T.M. & Dekker, J. The active FMR1 promoter is associated with a large domain of altered chromatin conformation with embedded local histone modifications. Proc. Natl. Acad. Sci. USA 103, 12463–12468 (2006).
145. Pieper, H.C. et al. Different methylation of the TNF-alpha promoter in cortex and substantia nigra: Implications for selective neuronal vulnerability. Neurobiol. Dis. 32, 521–527 (2008).
146. Murgatroyd, C. et al. Dynamic DNA methylation programs persistent adverse effects of early-life stress. Nat. Neurosci. 12, 1559–1566 (2009).
147. Robertson, K.D. DNA methylation and human disease. Nat. Rev. Genet. 6, 597–610 (2005).
148. Herman, D. et al. Histone deacetylase inhibitors reverse gene silencing in Friedreich’s ataxia. Nat. Chem. Biol. 2, 551–558 (2006).
149. Al-Mahdawi, S. et al. The Friedreich ataxia GAA repeat expansion mutation induces comparable epigenetic changes in human and transgenic mouse brain and heart tissues. Hum. Mol. Genet. 17, 735–746 (2008).
150. Kleine-Kohlbrecher, D. et al. A functional link between the histone demethylase PHF8 and the transcription factor ZNF711 in X-linked mental retardation. Mol. Cell 38, 165–178 (2010).
151. Kumari, D. & Usdin, K. Chromatin remodeling in the noncoding repeat expansion diseases. J. Biol. Chem. 284, 7413–7417 (2009).
152. Jin, B. et al. DNA methyltransferase 3B (DNMT3B) mutations in ICF syndrome lead to altered epigenetic modifications and aberrant expression of genes regulating development, neurogenesis and immune function. Hum. Mol. Genet. 17, 690–709 (2008).
153. Pan, W. et al. MicroRNA-21 and microRNA-148a contribute to DNA hypomethylation in lupus CD4+ T cells by directly and indirectly targeting DNA methyltransferase 1. J. Immunol. 184, 6773–6781 (2010).
154. Javierre, B.M., Esteller, M. & Ballestar, E. Epigenetic connections between autoimmune disorders and haematological malignancies. Trends Immunol. 29, 616–623 (2008).
155. Karouzakis, E., Gay, R.E., Gay, S. & Neidhart, M. Epigenetic control in rheumatoid arthritis synovial fibroblasts. Nat. Rev. Rheumatol. 5, 266–272 (2009).
156. Mishra, N., Brown, D.R., Olorenshaw, I.M. & Kammer, G.M. Trichostatin A reverses skewed expression of CD154, interleukin-10, and interferon-gamma gene and protein expression in lupus T cells. Proc. Natl. Acad. Sci. USA 98, 2628–2633 (2001).
157. Vanden Berghe, W. et al. Keeping up NF-kappaB appearances: epigenetic control of immunity or inflammation-triggered epigenetics. Biochem. Pharmacol. 72, 1114–1131 (2006).
158. Huber, L.C., Stanczyk, J., Jungel, A. & Gay, S. Epigenetics in inflammatory rheumatic diseases. Arthritis Rheum. 56, 3523–3531 (2007).
159. Miao, F. et al. Lymphocytes from patients with type 1 diabetes display a distinct profile of chromatin histone H3 lysine 9 dimethylation: an epigenetic study in diabetes. Diabetes 57, 3189–3198 (2008).
160. Ajiro, K. Histone H2B phosphorylation in mammalian apoptotic cells. An association with DNA fragmentation. J. Biol. Chem. 275, 439–443 (2000).
161. Hurd, P.J. et al. Phosphorylation of histone H3 Thr-45 is linked to apoptosis. J. Biol. Chem. 284, 16575–16583 (2009).
162. van Bavel, C.C. et al. Apoptosis-induced histone H3 methylation is targeted by autoantibodies in systemic lupus erythematosus. Ann. Rheum. Dis. published online doi:10.1136/ard.2010.129320 (10 August 2010).
163. Dieker, J.W. et al. Apoptosis-induced acetylation of histones is pathogenic in systemic lupus erythematosus. Arthritis Rheum. 56, 1921–1933 (2007).
164. Van Bavel, J.J. & Cunningham, W.A. Self-categorization with a novel mixed-race group moderates automatic social and racial biases. Pers. Soc. Psychol. Bull. 35, 321–335 (2009).
165. Verlaan, D.J. et al. Allele-specific chromatin remodeling in the ZPBP2/GSDMB/ORMDL3 locus associated with the risk of asthma and autoimmune disease. Am. J. Hum. Genet. 85, 377–393 (2009).
166. Turunen, M.P., Aavik, E. & Yla-Herttuala, S. Epigenetics and atherosclerosis. Biochim. Biophys. Acta 1790, 886–891 (2009).
167. Movassagh, M. et al. Differential DNA methylation correlates with differential expression of angiogenic factors in human heart failure. PLoS ONE 5, e8564 (2010).
168. Hang, C.T. et al. Chromatin regulation by Brg1 underlies heart muscle development and disease. Nature 466, 62–67 (2010).
169. Symonds, M.E., Sebert, S.P., Hyatt, M.A. & Budge, H. Nutritional programming of the metabolic syndrome. Nat. Rev. Endocrinol. 5, 604–610 (2009).
170. Zeng, W. et al. Specific loss of histone H3 lysine 9 trimethylation and HP1gamma/cohesin binding at D4Z4 repeats is associated with facioscapulohumeral dystrophy (FSHD). PLoS Genet. 5, e1000559 (2009).
171. Wilkins-Haug, L. Epigenetics and assisted reproduction. Curr. Opin. Obstet. Gynecol. 21, 201–206 (2009).
172. Tahiliani, M. et al. Conversion of 5-methylcytosine to 5-hydroxymethylcytosine in mammalian DNA by MLL partner TET1. Science 324, 930–935 (2009).
173. Ito, S. et al. Role of Tet proteins in 5mC to 5hmC conversion, ES-cell self-renewal and inner cell mass specification. Nature 466, 1129–1133 (2010).
174. Ghildiyal, M. & Zamore, P.D. Small silencing RNAs: an expanding universe. Nat. Rev. Genet. 10, 94–108 (2009).
175. Mattick, J.S. The genetic signatures of noncoding RNAs. PLoS Genet. 5, e1000459 (2009).
176. Bernstein, B. The NIH Roadmap Epigenome Mapping Consortium. Nat. Biotechnol. 28, 1045–1048 (2010).
177. Satterlee, J. Tackling the epigenome: challenges and opportunities for collaborative efforts. Nat. Biotechnol. 28, 1039–1044 (2010).


Review

Epigenetic modifications as therapeutic targets

Theresa K Kelly 1,2, Daniel D De Carvalho 1,2 & Peter A Jones 1

Epigenetic modifications work in concert with genetic mechanisms to regulate transcriptional activity in normal tissues and are often dysregulated in disease. Although they are somatically heritable, modifications of DNA and histones are also reversible, making them good targets for therapeutic intervention. Epigenetic changes often precede disease pathology, making them valuable diagnostic indicators for disease risk or prognostic indicators for disease progression. Several inhibitors of histone deacetylation or DNA methylation are approved for hematological malignancies by the US Food and Drug Administration and have been in clinical use for several years. More recently, histone methylation and microRNA expression have gained attention as potential therapeutic targets. The presence of multiple epigenetic aberrations within malignant tissue and the abilities of cells to develop resistance suggest that epigenetic therapies are most beneficial when combined with other anticancer strategies, such as signal transduction inhibitors or cytotoxic treatments. A key challenge for future epigenetic therapies will be to develop inhibitors with specificity to particular regions of chromosomes, thereby potentially reducing side effects.

Epigenetics encompasses the wide range of heritable changes in gene expression that do not result from an alteration in the DNA sequence itself. DNA methylation, the reversible post-translational modification of the range of histone variants, and nucleosome positioning collectively define the epigenetic landscape of a cell 1,2. DNA methylation occurs when a methyl group is added to the 5′ position of the cytosine ring of CpG dinucleotides.
Recently, methylation in embryonic stem cells was also suggested to occur at sites other than CpG dinucleotides, mainly on the cytosine of CHH or CHG trinucleotides (where H = A, C or T) 3. In addition, it was recently shown that 5-methylcytosine can be converted into 5-hydroxymethylcytosine by members of the TET protein family 4, mainly in embryonic stem cells and Purkinje cells 5. The biological relevance of these recently described types of methylation is an area of active investigation. Histones can be covalently modified after translation by the addition of methyl, acetyl, phosphoryl, ubiquityl or sumoyl groups. Whether the modification facilitates or inhibits transcription depends on the histone residue modified and the type of modification. The localization of nucleosomes within genomic regulatory regions has an important role in creating environments that either permit or prevent transcription. Nucleosomes consist of DNA wrapped around a core of two copies of each of the H2A, H2B, H3 and H4 histone proteins, thus linking DNA methylation and histone modifications. The presence of particular variants of core histone proteins, such as H3.3 and H2A.Z, at specific genomic loci influences the stability of nucleosome occupancy. Thus, multiple levels of epigenetic control account for appropriate orchestration of gene expression in healthy cells and dysregulated gene expression in disease.

1 Departments of Urology and Biochemistry and Molecular Biology, Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California, Los Angeles, California, USA. 2 These authors contributed equally to this work. Correspondence should be addressed to P.A.J. (pjones@med.usc.edu). Published online 13 October 2010; doi:10.1038/nbt.1678

Here, we focus on recent examples in which epigenetic modifications have been used to evaluate disease risk, progression and clinical response.
We aim to provide a broad overview of the accomplishments, remaining challenges and unrealized potential of epigenetic therapies in a range of diseases, with a particular emphasis on cancer.

Epigenetic disease mechanisms and their clinical relevance

Epigenetic aberrations have been well established in cancer 6,7 and occur in several other diseases, including diabetes 8, lupus 9, asthma 10 and a variety of neurological disorders 7,11–13 (Table 1 and references within). In cancer cells, a global loss of DNA methylation (hypomethylation), particularly in gene bodies and intergenic regions (including repetitive elements), leads to genomic instability. This global hypomethylation is accompanied by increased de novo methylation (hypermethylation) of many promoters of tumor suppressors and other genes that are contained within CpG islands. This results in stable gene silencing (Fig. 1). In addition to changes in DNA methylation, cancer cells are characterized by a global loss of histone H4 Lys16 (H4K16) acetylation and H4K20 trimethylation. There is also increased expression of BMI1, a component of the polycomb repressive complex (PRC)-1, and EZH2, a histone-methylating component of PRC2, which both inhibit gene expression 6,14. Notably, recent evidence has shown that genes targeted by the PRC in embryonic stem cells are more likely than others to become methylated in cancer 15–17, suggesting that aberrant linkage between polycomb repression and the silencing of gene expression by DNA methylation may at least partly account for early changes seen during oncogenesis. Further understanding of the basis of this switch in epigenetic silencing mechanisms may provide new avenues to evaluate the tumorigenic potential of abnormal tissue.

Epigenetic modifications can be used to stratify disease subtypes, severity or treatment responsiveness 18 and to predict clinical outcomes 19,20.
H3 acetylation and H3K9 dimethylation can discriminate between cancerous and nonmalignant prostate tissue, and H3K4 trimethylation can predict the recurrence of prostate-specific antigen accumulation after prostatectomy 21.

Table 1 Selected examples of epigenetic alterations associated with disease

| Epigenetic aberration | Enzyme responsible | Disease | Epigenetic alteration | Comments | Reference |
|---|---|---|---|---|---|
| DNA methylation | DNMT1, DNMT3A, DNMT3B and DNMT3L | Rett syndrome | Inability to ‘read’ DNA methylation | MECP2 mutation | 11–13 |
| | | Diabetes | Hypermethylation of PPARGC1A promoter | | 8 |
| | | Cancer | Global hypomethylation, hypermethylation of some CpG island promoters, including CIMP | | 6,7,11 |
| | | Systemic lupus erythematosus | Hypomethylation of CpG islands at specific promoter regions | Decreased DNMT1 and DNMT3B expression | 9 |
| | | ICF syndrome | Hypomethylation at specific sites | DNMT3B mutation | 11–13 |
| | | ATR-X syndrome | Hypomethylation of specific repeat and satellite sequences | ATRX mutation | 11,12 |
| Histone acetylation | HATs and HDACs | Rubinstein-Taybi syndrome | Hypoacetylation | Mutation in gene encoding CBP, a known HAT | 11–13 |
| | | Diabetes | Hyperacetylation at promoters of inflammatory genes | | 8 |
| | | Asthma | Hyperacetylation | Increased HAT activity and decreased HDAC activity | 10 |
| | | Cancer | H4K16 acetylation loss | Hypomethylation of DNA repetitive sequences | 6 |
| Histone methylation | HMTs and HDMs | Cancer | H4K20me3 loss | Hypomethylation of DNA repetitive sequences | 6 |
| | | Sotos syndrome | Decreased H4K20me3 and H3K36me3 | Loss of function of NSD1, a HMT | 113 |
| | | Huntington’s disease | Increased H3K9me3 and possibly increased H3K27 trimethylation | Increased expression of the HMT ESET; enhanced PRC2 activity | 12 |
| miRNA expression | N/A | Cancer | Decreased miR-101 | Increased EZH2, H3K27 trimethylation | 74,87 |
| | | | Decreased miR-143 | Increased DNMT3A | 88 |
| | | | Decreased miR-29 | Increased DNMT3A and DNMT3B | 89 |
| | | | Increased miR-21 | Decreased PTEN | 96 |
| | | | Increased miR-155 | Lower survival rates | 95 |

ATR-X, alpha-thalassemia X-linked; CIMP, CpG island methylator phenotype; HAT, histone acetyltransferase; HDM, histone demethylase; HMT, histone methyltransferase; ICF, immunodeficiency, centromere instability and facial anomalies; me3, trimethylation.
EZH2 expression is an independent prognostic marker that is correlated with the aggressiveness of prostate, breast and endometrial cancers 22. Expression of the DNA repair gene O(6)-methylguanine-DNA methyltransferase (MGMT) antagonizes chemotherapy and radiation treatment 23. Accordingly, silencing of MGMT by endogenous hypermethylation is correlated with positive treatment response. Furthermore, epigenetic alterations can precede tumor formation and are thus potential diagnostic indicators of disease risk 24. For example, infection with Helicobacter pylori is associated with DNA hypermethylation of specific genes, which are often methylated in cancer 25. Thus, reversal of epigenetic alterations that occur as a result of an acute illness may prevent progression to a more chronic disease state.

The growing development of technologies to analyze the epigenome has led to the emergence of pharmacoepigenomics, the use of epigenetic profiles to identify molecular pathways most sensitive to cancer drugs 26 as a means of prioritizing therapeutic strategies. In non–small-cell lung cancer, an unmethylated IGFBP3 promoter indicates responsiveness to cisplatin-based chemotherapy 27. A polymorphism in the gene encoding the CYP2C19*2 variant of a cytochrome P450 protein necessitates the use of higher doses of valproic acid (VPA) to achieve target plasma concentrations 28. Furthermore, epigenetic changes can be monitored to measure treatment efficacy and disease progression. Methylation of PITX2 can be used to predict outcomes of individuals with early-stage breast cancer after adjuvant tamoxifen therapy 29. Patients with hypermethylation of the gene encoding p16 (CDKN2A) have lower recurrence rates of bladder cancer compared to patients with no hypermethylation after interleukin-2 treatment 30.
As epigenetic mechanisms determine which genes, and thus signaling pathways, can be activated, the presence of distinct modifications on specific genes and subsets of genes can aid at several steps in determining and monitoring optimal therapeutic approaches.

The reversibility of epigenetic modifications makes them more ‘druggable’ than attempts to target or correct defects in the gene sequence itself. Moreover, it is possible that cancer cells can become ‘addicted’ to the aberrant epigenetic landscape resulting from multiple epigenetic abnormalities 31, rendering them more sensitive than normal cells to epigenetic therapy through a mechanism similar to an inverted oncogene addiction. A classic example of oncogene addiction is mesenchymal-epithelial transition factor (MET), a tyrosine kinase that acts as a receptor for hepatocyte growth factor and controls tissue homeostasis in normal cells 32. MET can be aberrantly activated in cancer by ligand-dependent mechanisms or by overexpression 32. Although MET has roles in both normal and cancer cells, the latter are more sensitive to MET inhibition owing to their greater reliance on MET signaling 32. Thus, cancer cells become dependent on (and consequently addicted to) increased activity of a few highly important oncogenes. It is possible that cancer cells undergo a parallel process by which they become dependent on aberrant silencing or inactivation of a few crucial tumor suppressor genes. As it is well known that several tumor suppressor genes are silenced in cancer by epigenetic mechanisms 6, it is possible that cancer cells become addicted to their aberrant epigenetic landscape and consequently become more sensitive to epigenetic therapy than normal cells. There is some evidence that cancer cells are preferentially affected by epigenetic therapies 33.

We next consider progress and remaining challenges in manipulating DNA methylation and histone modifications for therapeutic purposes, including microRNAs (miRNAs), which can also affect gene expression without altering DNA sequence and regulate as well as be regulated by epigenetic mechanisms. What are the merits and limitations of therapeutic strategies that intervene at these distinct levels of regulation of the epigenetic landscape? Moreover, how might they be used together or in combination with nonepigenetic therapies to prevent disease and remission?

[Figure 1 image: promoter states in normal versus cancer cells for three processes (PRC reprogramming, 5mC reprogramming and epigenetic switching), plus cancer-testis antigens as immunotherapy targets. Legend: acetylation, trimethylation (H3K4, H3K27), methylated CpG, unmethylated CpG.]

Figure 1 Epigenetic aberrations of CpG island promoters in cancer cells and the epigenetic therapies that target them. Tumor suppressor genes (such as FBXO32, MLH1 and RUNX3) are expressed in normal cells and become silenced in cancer cells. This can occur either by PRC reprogramming (as for FBXO32), where the polycomb group protein EZH2 catalyzes the methylation of H3K27, or by 5-methylcytosine (5mC) reprogramming (as for MLH1 and RUNX3) owing to de novo DNA methylation by DNMT3A and DNMT3B. Polycomb-mediated repression can be targeted by inhibitors of PRC2, such as DZNep, and re-expression of these genes can be enhanced by HDAC and LSD1 inhibitors allowing acetylation of H3 and H4 and methylation of H3K4, respectively. Polycomb-mediated repression can also be reversed by inducing miR-101 expression, which inhibits the expression and function of EZH2. 5mC reprogramming can be reversed, mainly by DNMT inhibitors, but also by re-expression of miR-143 and miR-29, two miRNAs that target de novo DNMTs. LSD1 inhibitors may also reactivate tumor suppressor genes by inhibiting DNMT1 stabilization, leading to loss of DNA methylation maintenance. Genes that are polycomb-repressed in normal cells (such as PAX7) can undergo epigenetic switching by DNA methylation, thus losing their plasticity during transformation. It is not known whether treatment of cancer cells with DNMT inhibitors alone can reverse epigenetic switching to restore the polycomb-repressed state or whether it will reactivate this set of genes. Cancer-testis antigens (CTAs, such as NY-ESO-1) can become silenced by DNA methylation in cancer. Treatment with DNMT inhibitors can induce CTA expression, allowing the immune system to recognize and kill the cancer cells. Red arrows represent epigenetic alterations during transformation; green arrows represent reversion of these alterations by epigenetic therapy.

DNA methylation

Cancer is characterized by global hypomethylation, with hypermethylation of a subset of gene promoters contained within CpG islands leading to gene silencing (Fig. 1) 6. This hypermethylation has recently been described to extend past the boundaries of CpG islands into so-called DNA shores 34. DNA (cytosine-5)-methyltransferase (DNMT)-3A and DNMT3B are responsible for de novo DNA methylation patterns, which are then copied to daughter cells during S phase by DNMT1. DNA methylation inhibitors have been well characterized and tested in clinical trials 35.
5-Azacytidine (5-Aza-CR; Vidaza; azacitidine), a nucleoside analog that is incorporated into RNA and DNA, is approved to treat patients with high-risk myelodysplastic syndromes (MDS), and successful clinical results have recently been reported (Tables 2 and 3) 36. 5-Aza-2′-deoxycytidine (5-Aza-CdR; Dacogen; decitabine) is the deoxy derivative of 5-Aza-CR and is incorporated only into DNA. At low doses, both azanucleosides act by sequestering DNMT enzymes after incorporation into DNA, leading to global demethylation as cells divide. At higher doses, they induce cytotoxicity. Zebularine is a cytidine analog that acts similarly to 5-Aza-CR but has lower toxicity and greater stability and specificity 37. Another drug for which promising preclinical data are available is S110, a decitabine derivative with better stability and activity than 5-Aza-CdR (Fig. 2) 38. In addition to inhibiting DNMT activity, azanucleosides act through nonspecific mechanisms, which are likely to contribute to their clinical effectiveness.

Analysis of promoter DNA methylation can classify cancers 26,39,40, predict the progression of cancer 41,42 and direct therapy 43,44. For example, DNA methylation of specific promoters may identify a subset of colorectal cancers that are responsive to 5-fluorouracil 43. Furthermore, use of DNA methylation inhibitors to reverse the silencing of MLH1 restores sensitivity to cisplatin 45. This suggests that combining DNA methylation inhibitors with conventional chemotherapy drugs increases therapeutic efficacy. Successful conventional chemotherapy depends on activation of proapoptotic genes that respond to cytotoxic agents, leading to cell death. DNA methylation of these proapoptotic genes can prevent cell death, which in turn confers resistance to chemotherapy.
Thus, reactivation of epigenetically silenced apoptotic genes should increase the efficacy of chemotherapy. For example, APAF1 is silenced in metastatic melanoma cells, and treatment with 5-Aza-CdR restores expression and chemosensitivity 44. Conversely, methylation-induced silencing of DNA repair genes can be detrimental (by leading to microsatellite instability 46) or beneficial (by preventing the repair of genes targeted by chemotherapy, causing cells to undergo apoptosis rather than repair 47). Methylation-induced silencing of cancer-testis antigens, such as NY-ESO-1, can protect cancer cells from being recognized by T cells. Treating cancer cells with demethylating agents can induce the expression of these antigens, allowing recognition and killing by engineered cytotoxic T lymphocytes 48. This suggests the possibility of augmenting the efficacy of immunotherapy by combining it with drugs that modulate epigenetic regulation (Fig. 1).

Table 2 Selected clinical trials of epigenetic cancer therapies with published findings

| Epigenetic target | Agent | Phase of study | Disease | Findings | Number of subjects | Reference |
|---|---|---|---|---|---|---|
| DNMT inhibitor alone | | | | | | |
| DNMTs | 5-Aza-CR | 2/3 | MDS and AML | Complete remission in 10–17% and hematological improvement in 23–36% | 309 | 114 |
| | | 3 | MDS | Better overall survival than with conventional care (24.5 vs. 15 months) | 358 | 36 |
| | 5-Aza-CdR | 2 | MDS and CMML | Anti-MDS and anti-CMML activities with a safe toxicity profile; 34% of patients achieved complete response and 73% had objective response | 95 | 115 |
| HDAC inhibitor alone | | | | | | |
| HDAC | Phenylbutyrate | 1 | MDS and AML | Well tolerated; no patients achieved complete or partial remission, although four achieved hematological improvement | 27 | 116 |
| | Vorinostat (SAHA) | 1 | Relapsed or refractory AML, CLL, MDS, ALL and CML | Seven of 31 AML patients showed hematological improvement, including two complete responses and two complete responses with incomplete blood count recovery | 41 | 117 |
| | | 1 | Advanced solid and hematologic malignancies | One complete response (diffuse large B-cell lymphoma), three partial responses (cutaneous T-cell lymphoma) | 73 | 118 |
| Combination therapy | | | | | | |
| DNMTs and HDAC | 5-Aza-CR and VPA | 1 | Advanced solid cancers | Combination is safe; 25% of patients showed stable disease (median, 6 months) | 55 | 110 |
| | 5-Aza-CR and phenylbutyrate | 1 | Refractory solid tumors | Combination is safe; no clinical benefit | 27 | 112 |
| HDAC | Vorinostat and doxorubicin | 1 | Solid tumors | Two of 24 showed partial responses (breast and prostate cancer) and two stable disease for more than 8 months (melanoma) | 32 | 119 |
| | Vorinostat plus carboplatin and paclitaxel | 1 | Advanced non–small-cell lung cancer | Better response ratio (34% versus 12.5%), progression-free survival (6 versus 4.1 months) and overall survival (13 versus 9.7 months) than with placebo plus carboplatin and paclitaxel | 94 | 106 |

ALL, acute lymphoblastic leukemia; CLL, chronic lymphocytic leukemia; CML, chronic myelogenous leukemia; CMML, chronic myelomonocytic leukemia; SAHA, suberoylanilide hydroxamic acid.

Despite the clinical successes achieved with DNA methylation inhibitors, there is still considerable room for improvement. The available DNA methylation inhibitors block DNA methylation by trapping DNMT enzymes on DNA, preventing methylation at other genomic loci. Notwithstanding the therapeutic benefits of simultaneously counteracting the broad hypermethylation of tumor suppressor genes characteristic of most cancers, global hypomethylation may lead to activation of oncogenes and/or increased genomic instability. Moreover, DNA hypomethylation can activate promoters within repetitive elements. For example, hypomethylation of long interspersed nuclear element-1 can activate an alternative transcript of the MET oncogene in bladder cancer 49. Moreover, DNA methylation inhibitors have also been implicated in defects in memory-associated neural plasticity, suggesting a link between DNA methylation and neural plasticity associated with learning and memory 50.

Developing DNA methylation inhibitors that target specific genes or groups of genes would overcome these perceived risks of agents responsible for global DNA demethylation. Furthermore, because DNA methylation inhibitors act during the S phase of the cell cycle, they preferentially affect rapidly growing cells. This is advantageous when treating rapidly dividing cancer cells but may be less clinically useful in treating diseases that are not characterized by rapid cell cycling. Moreover, the observation that levels of DNA methylation return to pretreatment levels upon withdrawal of azanucleoside 11 suggests a continual need for DNMT inhibition.
Thus, despite the clinical success of DNA methylation inhibitors, their lack of specificity, cell cycle dependency and need for continuous administration leave room for the development of better therapies.

Histone modifications

Whereas DNA methylation is considered to be a very stable epigenetic modification, histone modifications are more labile. Levels of histone modifications are maintained by the balance between the activities of histone-modifying enzymes that add or remove specific modifications. As aberrant histone modification levels result from an imbalance in these modifying enzymes in diseased tissue, correcting the increased or decreased level of a particular enzyme should restore the natural equilibrium in the affected cells.

[Figure 2 image: chemical structures grouped by target (DNA methylation, histone acetylation, histone methylation) and by development stage (FDA approved, clinical trials, preclinical trials). Compounds shown: 5-azacytidine, 5-aza-2′-deoxycytidine, hydralazine, S110, vorinostat (SAHA), romidepsin, phenylbutyrate, entinostat (MS-275), PCI-34051, SL11144 and DZNep. For histone methylation, the FDA-approved and clinical-trial columns read N/A.]

Figure 2 Chemical structures of selected compounds that target epigenetic modifications. Several molecules that target epigenetic alterations in pathological states are currently at different stages of drug development. The nucleoside analogs 5-azacytidine and 5-aza-2′-deoxycytidine are approved by the US Food and Drug Administration (FDA) to treat high-risk MDS, and successful clinical results have been reported. The drug hydralazine is currently being investigated in clinical trials as a putative demethylating agent against solid tumors. S110, a dinucleotide containing 5-aza-CdR, has been shown in vitro to demethylate DNA and is more stable than 5-aza-CdR because it is less sensitive to deamination by cytidine deaminase. Targeting of histone acetylation has also been a successful example of epigenetic therapy. Several HDAC inhibitors are FDA approved, including the hydroxamic acid–based compound SAHA and the depsipeptide romidepsin, whereas others are currently in clinical trials for cancer (phenylbutyrate and entinostat) and neurologic diseases (entinostat). New molecules targeting specific HDACs are under preclinical investigation (such as PCI-34051, which targets HDAC8). More recently, significant effort is under way to find new molecules able to target histone methylation. To our knowledge, no drugs targeting histone methylation are FDA approved or in clinical trials. Even so, preclinical trials suggest antitumor activity of the oligoamine analog SL11144, which inhibits LSD1, and the S-adenosylhomocysteine hydrolase inhibitor DZNep, which depletes cellular levels of PRC2 components.

Cancer cells are characterized by dysregulation of histone methyltransferases and histone demethylases, overexpression of histone deacetylases (HDACs), and a global reduction in levels of histone acetylation 6,14,51–53. HDAC inhibitors have long been studied in the clinical setting as potential therapies (Fig. 2), and recent clinical trials of these agents have been extensively reviewed elsewhere (see also Tables 2 and 3) 54. HDAC inhibitors can also affect the acetylation of proteins other than histones, potentially leading to more global effects 54. Furthermore, because HDAC inhibitors only target ~10% of all acetylation sites 55, more work is necessary to understand the underlying basis for target specification of global and isoform-specific HDAC inhibitors. Substantial efforts are currently under way to find new molecules that can selectively inhibit specific HDACs 56,57 and thus avoid the side effects that occur with a global HDAC inhibitor, including cardiac toxicity 54 and deficits in hematopoiesis 58 and memory formation 59–61. To date, specific inhibitors of HDAC6 (class II) and HDAC8 (class I) have been developed 56,57. When combined with a better understanding of the pathophysiology of diseases associated with alterations in HDACs, the development of specific HDAC inhibitors will allow more rational therapy and potentially reduce side effects. For example, the HDAC inhibitor PCI-34051, which is derived from a low-molecular-weight hydroxamic acid scaffold, selectively inhibits HDAC8 and induces apoptosis in T-cell lymphomas but not other tumor or normal cells. This indicates that HDAC8 has an important role in the pathophysiology of this disease and suggests that therapy with an HDAC8-specific inhibitor(s) can reduce undesirable side effects 57. Other HDAC inhibitors are selective to a group of HDAC isoforms, rather than a specific isoform, allowing their use for a wider range of diseases while minimizing side effects. For example, MGCD0103 (mocetinostat), which inhibits HDAC isoforms 1, 2 and 3 (class I) and 11 (class IV), was shown in clinical trials to be tolerable and to inhibit histone deacetylation in patients with advanced solid tumors 62.
MGCD0103 was also shown to be safe and to have antileukemia effects 63. Although the identification of additional specific HDAC inhibitors will increase specificity and the possibility of personalized treatments, it may also limit the likelihood of their successful incorporation into combinatorial therapies.

Histone methyltransferase and demethylase enzymes are generally more specific than HDACs in that they target fewer residues 64. However, like HDACs, lysine and arginine methyltransferase enzymes also methylate proteins other than histones 65,66. A great deal of effort is under way to find drugs able to revert specific histone methylation marks or to selectively target histone methyltransferases or histone demethylases. In this regard, a new class of oligoamine analogs was recently found to act as potent inhibitors of lysine-specific demethylase-1 (LSD1; Fig. 2). LSD1 targets the activating H3K4 mono- and dimethylation mark but can also target the repressive H3K9 dimethylation (H3K9me2) mark when complexed with the androgen receptor 51,67. Treatment of colon cancer cells with LSD1 inhibitors (such as SL11144) increases H3K4 methylation, decreases H3K9me2, and restores expression of SFRP2 (ref. 68), indicating context specificity of LSD1 and its inhibitors. LSD1 inhibition in neuroblastoma results in decreased proliferation in vitro and reduced xenograft growth 69. Notably, LSD1 also demethylates DNMT1; inhibiting this activity destabilizes DNMT1 and leads to loss of global maintenance of DNA methylation 70. The ability of LSD1 to affect both histone and DNA methylation makes it a promising target for epigenetic therapy.

The repression mediated by the H3K27 trimethylation (H3K27me3) mark occurs through the actions of two multisubunit complexes, PRC1 and PRC2. The H3K27me3 mark deposited by EZH2 is recognized and bound by PRC1, which can further recruit additional proteins to establish a repressed chromatin configuration 6.
Gene promoters that are marked by PRC2 (that is, polycomb target genes) in embryonic stem cells have recently been shown to be far more likely than other genes to become methylated in cancer 15–17. Similarly, polycomb targets in normal prostate cells also become methylated in prostate cancer 71. Thus, alterations in chromatin structure do not always coincide with changes in gene expression associated with disease. Instead, DNA methylation replacement of polycomb repressive marks ‘locks in’ an inactive chromatin state through a process called epigenetic switching 71. Although the mechanism underlying the predisposition of polycomb targets for DNA methylation is not fully understood, some links have recently been uncovered. CBX7, a component of the PRC1 complex, can directly interact with DNMT1 and DNMT3B at polycomb target genes 72.

Although drugs that target histone methylases and demethylases have considerable potential, more work is necessary to determine their specificities and the stabilities of the changes they effect. There are currently no such drugs in clinical trials. Preclinical studies suggest that the S-adenosylhomocysteine hydrolase inhibitor 3-deazaneplanocin A (DZNep) shows the most promise (Fig. 2). DZNep depletes cellular levels of PRC2 components (EZH2, EED and SUZ12) and consequently reduces H3K27me3 levels and induces apoptosis in breast cancer, but not normal, cells 73. The effect of DZNep is similar to that observed when EZH2 is depleted by RNA interference, suggesting that this drug is more effective in cancers of the prostate and breast, which rely on abnormally high EZH2 expression levels 74. In contrast, a subsequent study showed that DZNep also decreases H4K20me3. This demonstration that DZNep lacks specificity and acts more as a global histone methylation inhibitor underscores the need for further development of histone methylation inhibitors 75.


Table 3 Epigenetic cancer therapies under commercial development (either in safety and efficacy trials or approved)

| Drug | Sponsor | Indication | Clinical status |
| --- | --- | --- | --- |
| DNMT inhibitors | | | |
| 5-Aza-CdR (Dacogen) | Eisai (Tokyo) | MDS | Approved May 2006 |
| | | AML | Phase 3 in 480 patients |
| | | 1st-line CML | Phase 2 in 19 patients |
| 5-Aza-CR (Vidaza) | Celgene (Summit, NJ, USA) | MDS | Approved May 2004 |
| | | AML | Phase 3 targeting 480 patients |
| | | Hematologic cancer | Phase 2 |
| S110 (dinucleotide prodrug of decitabine) | SuperGen (Dublin, CA, USA) | MDS and AML | New Drug Application |
| HDAC inhibitors | | | |
| Romidepsin (Istodax; a cyclic depsipeptide) | Celgene | CTCL | Approved November 2009 |
| | | NHL | Phase 2 |
| Vorinostat (Zolinza; suberoylanilide hydroxamic acid) | Merck (Whitehouse Station, NJ, USA) | CTCL | Approved October 2006 |
| | | Mesothelioma | Phase 3 targeting 660 patients |
| | | MDS, NHL, brain cancer and NSCLC | Phase 2 |
| Vorinostat + bortezomib (Velcade) | | Multiple myeloma | Phase 2 and 3 targeting 742 patients |
| Panobinostat (LBH589; hydroxamate analog) | Novartis (Basel) | Hodgkin’s lymphoma | Phase 3 in 367 patients |
| | | CML, AML and MDS | Phase 2/3 |
| Panobinostat + bortezomib + dexamethasone | | Multiple myeloma | Phase 3 targeting 676 patients |
| Belinostat (PXD10; hydroxamate analog) | Spectrum Pharmaceuticals (Irvine, CA, USA) | AML, CTCL, MDS, NHL and ovarian cancer | Phase 2 |
| Mocetinostat dihydrobromide (MGCD0103; aminopyrimidine analog) | MethylGene (Montreal, QC, Canada) | AML, CLL, Hodgkin’s lymphoma, NHL, pancreatic cancer and thymic carcinoma | Phase 2 |
| Entinostat (SNDX-275; synthetic benzamide derivative) | Syndax Pharmaceuticals (Waltham, MA, USA) | Breast cancer, Hodgkin’s lymphoma and NSCLC | Phase 2 |
| PCI-24781 (CRA-024781; hydroxamic acid derivative) | Pharmacyclics (Sunnyvale, CA, USA) | Hematologic cancer and sarcoma | Phase 1/2 |
| Other | | | |
| I-conjugated monoclonal antibody targeting DNA–histone H1 complexes (Cotara) | Peregrine Pharmaceuticals (Tustin, CA, USA) | Glioblastoma multiforme | Phase 2 in 40 patients |

CLL, chronic lymphocytic leukemia; CML, chronic myelogenous leukemia; CTCL, cutaneous T-cell lymphoma; NHL, non-Hodgkin’s lymphoma; NSCLC, non–small-cell lung cancer. Sources: BioMedTracker, Thomsen Pharma Partnering and PubMed.

EZH2 activity can also be regulated by signaling cascades. For example, AKT phosphorylates EZH2 at Ser21, suppressing its methyltransferase activity and thereby reducing levels of H3K27me3 (ref. 76). The frequency of H3K27 trimethylation can be restored using LY294002, an inhibitor of the phosphatidylinositol-3-kinase and AKT pathway, opening a new therapeutic opportunity to repair epigenetic alterations by targeting upstream signaling pathways. Furthermore, in prostate cancer, the oncogenic ETS transcription factor ERG can bind to the EZH2 promoter and induce overexpression. Thus, pharmacological disruption of ERG activity could reduce the EZH2 overexpression observed in cancer 77. EZH2 is a particularly important example because it is frequently overexpressed and aberrantly targeted to genes in cancer 71, a process termed PRC reprogramming (Fig. 1).

G9a and G9a-like protein (GLP) are histone methyltransferases that catalyze H3K9 dimethylation and are often overexpressed in tumors 78. Knockdown of G9a in prostate cancer cells indicates a crucial role for this protein in regulating centrosome duplication and chromatin structure. The likely importance of G9a in perpetuating the malignant phenotype and its promise as a target in cancer therapy 79 have generated substantial interest in developing G9a and GLP inhibitors. Thus far, the most efficient inhibitor is BIX-01294, a diazepine-quinazolineamine derivative that transiently reduces global H3K9me2 levels in several cell lines 80. BIX-01294 binds to the SET domain of GLP in the same groove at which the target lysine (H3K9) binds. This prevents the binding of the peptide substrate and, consequently, the deposition of methylation marks at H3K9 (ref. 81).

Several other histone methyltransferases and demethylases have also been associated with diseases, making them potential targets for epigenetic therapy. For instance, MMSET, a H4K20 methyltransferase, is overexpressed in myeloma cell lines and is required for cell viability 82. SMYD3, a H3K4 methyltransferase, is also highly expressed in cancer and seems to have a role in carcinogenesis as a coactivator of estrogen receptor-alpha 83. Expression of GASC1, an H3K9 and H3K36 demethylase, is often amplified in cancer, and its inhibition decreases rates of cell proliferation 84.

Although the challenges associated with targeting specific histone modifications have not prevented considerable clinical success with this group of targets, it seems likely that therapeutics capable of targeting specific histone-modifying enzymes could retain or increase therapeutic success rates while decreasing side effects resulting from the lack of specificity. In contrast, targeting individual histone-modifying enzymes may decrease clinical efficacy if histone-modifying enzymes not targeted by the drug in question compensate for any changes and thereby confer drug resistance. Designing personalized cocktails of inhibitors based on an individual’s need may help overcome the potential problems of compensation and resistance.

MicroRNAs
Small, noncoding miRNAs are able to induce heritable changes in gene expression without altering DNA sequence and thus contribute to the epigenetic landscape. In addition, miRNAs can both regulate and be regulated by other epigenetic mechanisms. Expression of miRNAs is dysregulated in several diseases, including cancer 85 and certain neurodegenerative disorders 86. For example, miR-101 targets EZH2 for degradation and is downregulated in several types of cancer, leading to increased EZH2 expression (and consequently higher H3K27me3 levels) and decreased expression of tumor suppressor genes 74,87. Restoring expression of miR-101 leads to reduced H3K27me3 and inhibits colony formation and cancer cell proliferation 74,87. Expression of miR-143 in colorectal cancer cells 88 and the miR-29 family in lung cancer cells 89 reduces DNMT3A and DNMT3B levels, respectively, and results in decreased cell growth and colony formation. Treatment of cells with 5-Aza-CdR and 4-phenylbutyric acid results in miR-127 activation, which in turn downregulates the BCL6 oncogene in bladder cancer cells 90. In fact, treatment with 5-Aza-CdR alone is sufficient to reactivate miR-148a, miR-34b/c and miR-9, a group of miRNAs capable of suppressing metastasis 91. In addition to inducing aberrantly repressed miRNAs using epigenetic drugs, replacement gene therapy may also be useful in reestablishing miRNA expression. Viral vectors generated by cloning individual or groups of human miRNAs have been successful in preclinical assays using a mouse model of hepatocellular carcinoma, in which miR-26a expression from an adeno-associated virus results in apoptosis and inhibition of cancer cell proliferation in the absence of toxicity 92. Gene therapy using miRNAs has an advantage over conventional RNA interference in that it is unlikely to generate a strong type I interferon response because double-stranded RNA is not introduced to the cell 93.

Abnormally high expression of miRNAs can be targeted using recently developed locked nucleic acid (LNA)–modified phosphorothioate oligonucleotide technology. LNA-modified oligonucleotides contain an extra bridge in their chemical composition, leading to enhanced stability compared to their unmodified counterparts. These LNA-modified phosphorothioate oligonucleotides can be designed to be complementary to miRNAs, creating LNA–anti-miRNAs that can be delivered systemically. In preclinical assays with primates, intravenous injections of LNA–anti-miRNA complementary to the 5′ end of miR-122 antagonized liver-specific expression of this miRNA without toxicity 94. Phase 1 trials based on these promising results are currently under way. LNA–anti-miRNAs may be used to target aberrantly expressed miRNAs in other diseases, such as cancer.
For example, miR-155 is upregulated in lung adenocarcinoma compared to noncancerous lung tissue, and patients with higher miR-155 expression have lower survival rates than do patients with lower miR-155 expression. This suggests that miR-155 is a promising target for LNA–anti-miRNA therapy (Table 1) 95. Several other miRNAs are upregulated in cancer and could theoretically be used as LNA–anti-miRNA targets. For example, miR-21 is upregulated in several types of cancer (lung, breast, colon, gastric and prostate carcinomas; endocrine pancreatic tumors; glioblastomas; and cholangiocarcinomas) and targets the tumor suppressor PTEN (Table 1) 96. Thus, miRNAs can both alter the epigenetic machinery and be regulated by epigenetic alterations. This creates a highly controlled feedback mechanism, making miRNAs suitable targets for epigenetic therapy and possibly epigenetic drugs themselves.

One unique advantage of targeting miRNAs is the ability of one miRNA to regulate several target genes and multiple cellular processes. In that way, if the level of one or a few miRNAs has changed in a pathological state, several different pathways could consequently be altered. Rather than trying to identify and directly target the proteins in multiple pathways, it would be more effective to restore the physiological level and functions of the dysregulated miRNA(s). This clinical potential highlights the importance of better understanding miRNA profiles in healthy and diseased tissues in order to develop better therapeutic strategies. Furthermore, multiple miRNAs that target different steps of an overactive pathway could be combined to increase efficacy and allow for customization of therapies to individual patients. Although the unique composition of miRNA-based therapy provides many benefits, additional research is necessary to determine the best method of delivery and increase miRNA stability to ensure efficacy.

Combined epigenetic therapies
The presence of multiple epigenetic aberrations in a single tissue, the ability of diseased cells to develop resistance, and the discovery that common sets of genes are regulated by distinct epigenetic mechanisms at different biological stages collectively point to the likely feasibility of combinatorial approaches to target epigenetic modulators. Efforts for more than 25 years to enhance therapeutic efficacy by combining epigenetic strategies 97,98 have revealed both additive and synergistic effects, depending on the targets 11,52. Extensive work on the clinical benefits of combined DNMT and HDAC inhibition has been comprehensively reviewed elsewhere 6,46,54,99,100. A recent phase 2 multicenter study examining the combination of 5-Aza-CR and the HDAC inhibitor VPA in patients with higher-risk MDS found that therapeutic levels of VPA may increase the efficacy of 5-Aza-CR 28. Sequential administration of DNMT and HDAC inhibitors resulted in clinical efficacy in patients with hematologic malignancies 54,99. However, other studies found no correlation between baseline methylation levels or methylation reversal and positive clinical outcome in patients with MDS or acute myeloid leukemia (AML) after combined treatment with 5-Aza-CR and entinostat 101. The mechanism behind the clinical efficacy of sequential DNMT and HDAC inhibition remains controversial, and additional studies investigating potential genetic or epigenetic determinants of responsiveness will be helpful. Besides inducing apoptosis in cancer cells, another therapeutic approach involves inducing differentiation of cancer cells. To this end, treatment with 5-Aza-CR and VPA followed by all-trans-retinoic acid resulted in global hypomethylation and histone acetylation and a clinical response in nearly half of treated patients with AML or high-risk MDS 102.

Although targeting of histone demethylases is still in its infancy, early preclinical studies show promise for using such drugs alone (as described above) or together with other epigenetic therapies. Restoration of the expression of SFRP2, a negative regulator of Wnt signaling, in a human colon cancer model after LSD1 and DNMT inhibition has been associated with significant growth inhibition of established tumors 68. Notably, in addition to demethylating histone residues, LSD1 can demethylate DNMT1. This provides the ability to target both histone methylation and DNA methylation using a single compound. Cotreatment with the HDAC inhibitor panobinostat further enhances DZNep-mediated reduction in EZH2 levels, leading to increased p16, p21, p27 and FBX032 expression and apoptosis in cultured AML cells and mouse models 103. These promising data suggest that an absence of clinical trials targeting histone methylation and demethylation enzymes should not diminish enthusiasm for their therapeutic potential.

Epigenetic and cytotoxic therapies
Conventional chemotherapy can rapidly induce cell death in cancer cells, although resistance to standard chemotherapy often arises through epigenetic and DNA repair mechanisms 27. As a result, epigenetic therapeutics can be combined with more conventional therapies to induce responsiveness or overcome resistance to cytotoxic treatments. Preconditioning with epigenetic drugs could reverse the epigenetic alteration(s) that confer resistance, restoring chemotherapeutic sensitivity. For example, 5-Aza-CR treatment can reverse DNA methylation, thereby overcoming the gene silencing that led to chemotherapeutic resistance 104. In contrast, methylation-induced silencing of DNA repair genes, such as MGMT, is correlated with a positive clinical response to chemotherapy. Thus, the potential for success of combinations of DNA methylation inhibitors and chemotherapy may depend on the epigenetic profile of an individual tumor. Responses of patients with previously untreated non–small-cell lung cancer to combinations of the HDAC inhibitor vorinostat with carboplatin and paclitaxel were sufficiently promising to warrant a phase 2 study, which also showed encouraging results (Table 2) 105,106.


Conclusions
Several molecular regulators of the cellular epigenetic landscape have been established as effective targets in successful therapies for a variety of malignancies. In particular, inhibitors of DNMTs or HDACs have been approved for cancer treatment. Although the mechanism(s) behind the therapeutic benefit of DNMT and HDAC inhibition are not fully understood, ongoing and future studies that combine genomic sequencing and expression data may provide the keys to understanding the mechanism(s) underlying responsiveness. Besides their methylation and acetylation, histones can be phosphorylated, ubiquitylated and sumoylated. These modifications, which have been less well studied in the context of disease, may expand current possibilities for therapeutic intervention.

Given the importance of epigenetic mechanisms in controlling development and normal cellular behavior, it seems that approaches capable of targeting specific epigenetic alterations, rather than affecting global modifications, would greatly enhance clinical efficiency while lowering toxicity and side effects. This is an important priority for the field. Another major challenge in advancing epigenetic therapy will be to discriminate between so-called driver genes (those that must be epigenetically silenced for disease to occur) and so-called passenger genes (those that are epigenetically silenced owing to aberrant activity of the epigenetic machinery, but are not necessary for disease to occur). Recent advances in high-throughput technologies such as genome-wide sequencing, combined with RNA profiling, chromatin immunoprecipitation or bisulfite conversion, have generated large amounts of data that can be integrated to form a comprehensive understanding of the epigenetic alterations that are common and specific to various disease states. Assimilating these large datasets is likely to assist in identifying epigenetic alterations that are causative and those that are merely correlative 107,108. Thus, it may eventually be possible for patients to be screened, using high-throughput technologies, and classified by epigenetic alterations of the driver genes responsible for their illness. Along with the development of targeted inhibitors of epigenetic modifications, this could open the way for the use of personalized targeted therapies.

Currently, despite the successful clinical use of epigenetic therapies to treat hematological malignancies, there has been little success in treating solid cancers (Tables 2 and 3) 109–112. Initial clinical trials, which used treatment regimens later found to be less than optimal, resulted in low rates of positive clinical response. Administering more recently developed dosing and treatment schedules, and classifying tumor subtypes based on molecular signatures, may increase the efficacy of epigenetic therapy for solid tumors. Solid tumors invariably comprise heterogeneous populations of cells, many at different stages of differentiation. Clinical success may therefore require more effective approaches to determine which of these cells harbor epigenetic alterations and new strategies to ensure that therapeutic agents maintain stability and are able to penetrate the cellular mass and reach affected cells.

The recognition of epigenetics as a significant contributor to normal development and disease has opened new avenues for drug discovery and therapeutics, with a range of prospects that continues to expand as our knowledge of epigenetic regulation advances. Epigenetic therapies could be combined with conventional therapies to develop personalized treatments, render unresponsive tumors susceptible to treatment and reduce dosing. These advances may limit the side effects of treatment, improving compliance with dosing regimens and overall quality of life.

Acknowledgments
Supported by R37CA082422 and R01CA083867 (P.A.J.). We thank members of the Jones laboratory for helpful discussions and careful reading of the manuscript, particularly H. Han for help in drawing chemical structures.

COMPETING FINANCIAL INTERESTS
The authors declare competing financial interests: details accompany the full-text HTML version of the paper at http://www.nature.com/naturebiotechnology/.

Published online at http://www.nature.com/naturebiotechnology/.
Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/.

1. Portela, A. & Esteller, M. Epigenetic modifications and human disease. Nat. Biotechnol. 28, 1057–1068 (2010).
2. Meissner, A. Epigenetic modifications and their role in pluripotency. Nat. Biotechnol. 28, 1079–1088 (2010).
3. Lister, R. et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature 462, 315–322 (2009).
4. Tahiliani, M. et al. Conversion of 5-methylcytosine to 5-hydroxymethylcytosine in mammalian DNA by MLL partner TET1. Science 324, 930–935 (2009).
5. Wu, S.C. & Zhang, Y. Active DNA demethylation: many roads lead to Rome. Nat. Rev. Mol. Cell Biol. 11, 607–620 (2010).
6. Jones, P.A. & Baylin, S.B. The epigenomics of cancer. Cell 128, 683–692 (2007).
7. Fouse, S.D. & Costello, J.F. Epigenetics of neurological cancers. Future Oncol. 5, 1615–1629 (2009).
8. Villeneuve, L.M. & Natarajan, R. The role of epigenetics in the pathology of diabetic complications. Am. J. Physiol. Renal Physiol. 299, F14–F25 (2010).
9. Javierre, B.M. et al. Changes in the pattern of DNA methylation associate with twin discordance in systemic lupus erythematosus. Genome Res. 20, 170–179 (2010).
10. Adcock, I.M., Ito, K. & Barnes, P.J. Histone deacetylation: an important mechanism in inflammatory lung diseases. COPD 2, 445–455 (2005).
11. Egger, G., Liang, G., Aparicio, A. & Jones, P.A. Epigenetics in human disease and prospects for epigenetic therapy. Nature 429, 457–463 (2004).
12. Urdinguio, R.G., Sanchez-Mut, J.V. & Esteller, M. Epigenetic mechanisms in neurological diseases: genes, syndromes, and therapies. Lancet Neurol. 8, 1056–1072 (2009).
13. Feng, J. & Fan, G. The role of DNA methylation in the central nervous system and neuropsychiatric disorders. Int. Rev. Neurobiol. 89, 67–84 (2009).
14. Sharma, S., Kelly, T.K. & Jones, P.A. Epigenetics in cancer. Carcinogenesis 31, 27–36 (2010).
15. Widschwendter, M. et al. Epigenetic stem cell signature in cancer. Nat. Genet. 39, 157–158 (2007).
16. Schlesinger, Y. et al. Polycomb-mediated methylation on Lys27 of histone H3 pre-marks genes for de novo methylation in cancer. Nat. Genet. 39, 232–236 (2007).
17. Ohm, J.E. et al. A stem cell-like chromatin pattern may predispose tumor suppressor genes to DNA hypermethylation and heritable silencing. Nat. Genet. 39, 237–242 (2007).
18. Shen, L. et al. Integrated genetic and epigenetic analysis identifies three different subclasses of colon cancer. Proc. Natl. Acad. Sci. USA 104, 18654–18659 (2007).
19. Seligson, D.B. et al. Global histone modification patterns predict risk of prostate cancer recurrence. Nature 435, 1262–1266 (2005).
20. Figueroa, M.E. et al. DNA methylation signatures identify biologically distinct subtypes in acute myeloid leukemia. Cancer Cell 17, 13–27 (2010).
21. Ellinger, J. et al. Global levels of histone modifications predict prostate cancer recurrence. Prostate 70, 61–69 (2010).
22. Bachmann, I.M. et al. EZH2 expression is associated with high proliferation rate and aggressive tumor subgroups in cutaneous melanoma and cancers of the endometrium, prostate, and breast. J. Clin. Oncol. 24, 268–273 (2006).
23. Weller, M. et al. MGMT promoter methylation in malignant gliomas: ready for personalized medicine? Nat. Rev. Neurol. 6, 39–51 (2010).
24. Kanai, Y. Genome-wide DNA methylation profiles in precancerous conditions and cancers. Cancer Sci. 101, 36–45 (2010).
25. Kondo, T. et al. Accumulation of aberrant CpG hypermethylation by Helicobacter pylori infection promotes development and progression of gastric MALT lymphoma. Int. J. Oncol. 35, 547–557 (2009).
26. Shen, L. et al. Drug sensitivity prediction by CpG island methylation profile in the NCI-60 cancer cell line panel. Cancer Res. 67, 11335–11343 (2007).
27. Ibanez de Caceres, I. et al. IGFBP-3 hypermethylation-derived deficiency mediates cisplatin resistance in non-small-cell lung cancer. Oncogene 29, 1681–1690 (2010).
28. Voso, M.T. et al. Valproic acid at therapeutic plasma levels may increase 5-azacytidine efficacy in higher risk myelodysplastic syndromes. Clin. Cancer Res. 15, 5002–5007 (2009).
29. Martens, J.W., Margossian, A.L., Schmitt, M., Foekens, J. & Harbeck, N. DNA methylation as a biomarker in breast cancer. Future Oncol. 5, 1245–1256 (2009).


30. Jarmalaite, S. et al. Promoter hypermethylation in tumour suppressor genes and response to interleukin-2 treatment in bladder cancer: a pilot study. J. Cancer Res. Clin. Oncol. 136, 847–854 (2010).
31. Baylin, S.B. & Ohm, J.E. Epigenetic gene silencing in cancer – a mechanism for early oncogenic pathway addiction? Nat. Rev. Cancer 6, 107–116 (2006).
32. Comoglio, P.M., Giordano, S. & Trusolino, L. Drug development of MET inhibitors: targeting oncogene addiction and expedience. Nat. Rev. Drug Discov. 7, 504–516 (2008).
33. Cheng, J.C. et al. Preferential response of cancer cells to zebularine. Cancer Cell 6, 151–158 (2004).
34. Irizarry, R.A. et al. The human colon cancer methylome shows similar hypo- and hypermethylation at conserved tissue-specific CpG island shores. Nat. Genet. 41, 178–186 (2009).
35. Issa, J.P. & Kantarjian, H.M. Targeting DNA methylation. Clin. Cancer Res. 15, 3938–3946 (2009).
36. Fenaux, P. et al. Efficacy of azacitidine compared with that of conventional care regimens in the treatment of higher-risk myelodysplastic syndromes: a randomised, open-label, phase III study. Lancet Oncol. 10, 223–232 (2009).
37. Yoo, C.B., Cheng, J.C. & Jones, P.A. Zebularine: a new drug for epigenetic therapy. Biochem. Soc. Trans. 32, 910–912 (2004).
38. Yoo, C.B. et al. Delivery of 5-aza-2′-deoxycytidine to cells using oligodeoxynucleotides. Cancer Res. 67, 6400–6408 (2007).
39. Toyota, M., Ohe-Toyota, M., Ahuja, N. & Issa, J.P. Distinct genetic profiles in colorectal tumors with or without the CpG island methylator phenotype. Proc. Natl. Acad. Sci. USA 97, 710–715 (2000).
40. Toyota, M. et al. CpG island methylator phenotype in colorectal cancer. Proc. Natl. Acad. Sci. USA 96, 8681–8686 (1999).
41. Tanemura, A. et al. CpG island methylator phenotype predicts progression of malignant melanoma. Clin. Cancer Res. 15, 1801–1807 (2009).
42. Noushmehr, H. et al. Identification of a CpG island methylator phenotype that defines a distinct subgroup of glioma. Cancer Cell 17, 510–522 (2010).
43. Van Rijnsoever, M., Elsaleh, H., Joseph, D., McCaul, K. & Iacopetta, B. CpG island methylator phenotype is an independent predictor of survival benefit from 5-fluorouracil in stage III colorectal cancer. Clin. Cancer Res. 9, 2898–2903 (2003).
44. Soengas, M.S. et al. Inactivation of the apoptosis effector Apaf-1 in malignant melanoma. Nature 409, 207–211 (2001).
45. Strathdee, G., MacKean, M.J., Illand, M. & Brown, R. A role for methylation of the hMLH1 promoter in loss of hMLH1 expression and drug resistance in ovarian cancer. Oncogene 18, 2335–2341 (1999).
46. Ma, X., Ezzeldin, H.H. & Diasio, R.B. Histone deacetylase inhibitors: current status and overview of recent clinical trials. Drugs 69, 1911–1934 (2009).
47. Fukushima, T., Takeshima, H. & Kataoka, H. Anti-glioma therapy with temozolomide and status of the DNA-repair gene MGMT. Anticancer Res. 29, 4845–4854 (2009).
48. Wargo, J.A. et al. Recognition of NY-ESO-1+ tumor cells by engineered lymphocytes is enhanced by improved vector design and epigenetic modulation of tumor antigen expression. Cancer Immunol. Immunother. 58, 383–394 (2009).
49. Wolff, E.M. et al. Hypomethylation of a LINE-1 promoter activates an alternate transcript of the MET oncogene in bladders with cancer. PLoS Genet. 6, e1000917 (2010).
50. Feng, J. & Fan, G. The role of DNA methylation in the central nervous system and neuropsychiatric disorders. Int. Rev. Neurobiol. 89, 67–84 (2009).
51. Shi, Y. Histone lysine demethylases: emerging roles in development, physiology and disease. Nat. Rev. Genet. 8, 829–833 (2007).
52. Yoo, C.B. & Jones, P.A. Epigenetic therapy of cancer: past, present and future. Nat. Rev. Drug Discov. 5, 37–50 (2006).
53. Nakagawa, M. et al. Expression profile of class I histone deacetylases in human cancer tissues. Oncol. Rep. 18, 769–774 (2007).
54. Lane, A.A. & Chabner, B.A. Histone deacetylase inhibitors in cancer therapy. J. Clin. Oncol. 27, 5459–5468 (2009).
55. Choudhary, C. et al. Lysine acetylation targets protein complexes and co-regulates major cellular functions. Science 325, 834–840 (2009).
56. Haggarty, S.J., Koeller, K.M., Wong, J.C., Grozinger, C.M. & Schreiber, S.L. Domain-selective small-molecule inhibitor of histone deacetylase 6 (HDAC6)-mediated tubulin deacetylation. Proc. Natl. Acad. Sci. USA 100, 4389–4394 (2003).
57. Balasubramanian, S. et al. A novel histone deacetylase 8 (HDAC8)-specific inhibitor PCI-34051 induces apoptosis in T-cell lymphomas. Leukemia 22, 1026–1034 (2008).
58. Bruserud, Ø., Stapnes, C., Ersvaer, E., Gjertsen, B.T. & Ryningen, A. Histone deacetylase inhibitors in cancer treatment: a review of the clinical toxicity and the modulation of gene expression in cancer cell. Curr. Pharm. Biotechnol. 8, 388–400 (2007).
59. Fischer, A. et al. Recovery of learning and memory is associated with chromatin remodeling. Nature 447, 178–182 (2007).
60. Guan, J.S. et al. HDAC2 negatively regulates memory formation and synaptic plasticity. Nature 459, 55–60 (2009).
61. Ptak, C. & Petronis, A. Epigenetics and complex disease: from etiology to new therapeutics. Annu. Rev. Pharmacol. Toxicol. 48, 257–276 (2008).
62. Siu, L.L. Phase I study of MGCD0103 given as a three-times-per-week oral dose in patients with advanced solid tumors. J. Clin. Oncol. 26, 1940–1947 (2008).
63. Garcia-Manero, G. et al. Phase 1 study of the oral isotype specific histone deacetylase inhibitor MGCD0103 in leukemia. Blood 112, 981–989 (2008).
64. Lall, S. Primers on chromatin. Nat. Struct. Mol. Biol. 14, 1110–1115 (2007).
65. Huang, J. & Berger, S.L. The emerging field of dynamic lysine methylation of non-histone proteins. Curr. Opin. Genet. Dev. 18, 152–158 (2008).
66. Lee, Y.H. & Stallcup, M.R. Minireview: protein arginine methylation of nonhistone proteins in transcriptional regulation. Mol. Endocrinol. 23, 425–433 (2009).
67. Metzger, E. et al. LSD1 demethylates repressive histone marks to promote androgen-receptor-dependent transcription. Nature 437, 436–439 (2005).
68. Huang, Y. et al. Novel oligoamine analogues inhibit lysine-specific demethylase 1 and induce reexpression of epigenetically silenced genes. Clin. Cancer Res. 15, 7217–7228 (2009).
69. Schulte, J.H. et al. Lysine-specific demethylase 1 is strongly expressed in poorly differentiated neuroblastoma: implications for therapy. Cancer Res. 69, 2065–2071 (2009).
70. Wang, J. et al. The lysine demethylase LSD1 (KDM1) is required for maintenance of global DNA methylation. Nat. Genet. 41, 125–129 (2009).
71. Gal-Yam, E.N. et al. Frequent switching of Polycomb repressive marks and DNA hypermethylation in the PC3 prostate cancer cell line. Proc. Natl. Acad. Sci. USA 105, 12979–12984 (2008).
72. Mohammad, H.P. et al. Polycomb CBX7 promotes initiation of heritable repression of genes frequently silenced with cancer-specific DNA hypermethylation. Cancer Res. 69, 6322–6330 (2009).
73. Tan, J. et al. Pharmacologic disruption of Polycomb-repressive complex 2-mediated gene repression selectively induces apoptosis in cancer cells. Genes Dev. 21, 1050–1063 (2007).
74. Varambally, S. et al. Genomic loss of microRNA-101 leads to overexpression of histone methyltransferase EZH2 in cancer. Science 322, 1695–1699 (2008).
75. Miranda, T.B. et al. DZNep is a global histone methylation inhibitor that reactivates developmental genes not silenced by DNA methylation. Mol. Cancer Ther. 8, 1579–1588 (2009).
76. Cha, T.L. et al. Akt-mediated phosphorylation of EZH2 suppresses methylation of lysine 27 in histone H3. Science 310, 306–310 (2005).
77. Yu, J. et al. An integrated network of androgen receptor, polycomb, and TMPRSS2-ERG gene fusions in prostate cancer progression. Cancer Cell 17, 443–454 (2010).
78. Huang, J. et al. G9a and Glp methylate lysine 373 in the tumor suppressor p53. J. Biol. Chem. 285, 9636–9641 (2010).
79. Kondo, Y. et al. Downregulation of histone H3 lysine 9 methyltransferase G9a induces centrosome disruption and chromosome instability in cancer cells. PLoS ONE 3, e2037 (2008).
80. Kubicek, S. et al. Reversal of H3K9me2 by a small-molecule inhibitor for the G9a histone methyltransferase. Mol. Cell 25, 473–481 (2007).
81. Chang, Y. et al. Structural basis for G9a-like protein lysine methyltransferase inhibition by BIX-01294. Nat. Struct. Mol. Biol. 16, 312–317 (2009).
82. Marango, J. et al. The MMSET protein is a histone methyltransferase with characteristics of a transcriptional corepressor. Blood 111, 3145–3154 (2008).
83. Kim, H. et al. Requirement of histone methyltransferase SMYD3 for estrogen receptor-mediated transcription. J. Biol. Chem. 284, 19867–19877 (2009).
84. Cloos, P.A. et al. The putative oncogene GASC1 demethylates tri- and dimethylated lysine 9 on histone H3. Nature 442, 307–311 (2006).
85. Croce, C.M. Causes and consequences of microRNA dysregulation in cancer. Nat. Rev. Genet. 10, 704–714 (2009).
86. Eacker, S.M., Dawson, T.M. & Dawson, V.L. Understanding microRNAs in neurodegeneration. Nat. Rev. Neurosci. 10, 837–841 (2009).
87. Friedman, J.M. et al. The putative tumor suppressor microRNA-101 modulates the cancer epigenome by repressing the polycomb group protein EZH2. Cancer Res. 69, 2623–2629 (2009).
88. Ng, E.K. et al. MicroRNA-143 targets DNA methyltransferases 3A in colorectal cancer. Br. J. Cancer 101, 699–706 (2009).
89. Fabbri, M. et al. MicroRNA-29 family reverts aberrant methylation in lung cancer by targeting DNA methyltransferases 3A and 3B. Proc. Natl. Acad. Sci. USA 104, 15805–15810 (2007).
90. Saito, Y. et al. Specific activation of microRNA-127 with downregulation of the proto-oncogene BCL6 by chromatin-modifying drugs in human cancer cells. Cancer Cell 9, 435–443 (2006).
91. Lujambio, A. et al. A microRNA DNA methylation signature for human cancer metastasis. Proc. Natl. Acad. Sci. USA 105, 13556–13561 (2008).
92. Kota, J. et al. Therapeutic microRNA delivery suppresses tumorigenesis in a murine liver cancer model. Cell 137, 1005–1017 (2009).
93. McCaffrey, A.P. et al. The host response to adenovirus, helper-dependent adenovirus, and adeno-associated virus in mouse liver. Mol. Ther. 16, 931–941 (2008).
94. Lanford, R.E. et al. Therapeutic silencing of microRNA-122 in primates with chronic hepatitis C virus infection. Science 327, 198–201 (2010).
95. Yanaihara, N. et al. Unique microRNA molecular profiles in lung cancer diagnosis and prognosis. Cancer Cell 9, 189–198 (2006).
96. Calin, G.A. & Croce, C.M. MicroRNA signatures in human cancers. Nat. Rev. Cancer 6, 857–866 (2006).
97. Jahangeer, S., Elliott, R.M. & Henneberry, R.C. beta-Adrenergic receptor induction in HeLa cells: synergistic effect of 5-azacytidine and butyrate. Biochem. Biophys. Res. Commun. 108, 1434–1440 (1982).


98. Cameron, E.E., Bachman, K.E., Myohanen, S., Herman, J.G. & Baylin, S.B. Synergy of demethylation and histone deacetylase inhibition in the re-expression of genes silenced in cancer. Nat. Genet. 21, 103–107 (1999).
99. Kuendgen, A. & Lubbert, M. Current status of epigenetic treatment in myelodysplastic syndromes. Ann. Hematol. 87, 601–611 (2008).
100. Chen, J., Odenike, O. & Rowley, J.D. Leukaemogenesis: more than mutant genes. Nat. Rev. Cancer 10, 23–36 (2010).
101. Fandy, T.E. et al. Early epigenetic changes and DNA damage do not predict clinical response in an overlapping schedule of 5-azacytidine and entinostat in patients with myeloid malignancies. Blood 114, 2764–2773 (2009).
102. Soriano, A.O. et al. Safety and clinical activity of the combination of 5-azacytidine, valproic acid, and all-trans retinoic acid in acute myeloid leukemia and myelodysplastic syndrome. Blood 110, 2302–2308 (2007).
103. Fiskus, W. et al. Combined epigenetic therapy with the histone methyltransferase EZH2 inhibitor 3-deazaneplanocin A and the histone deacetylase inhibitor panobinostat against human AML cells. Blood 114, 2733–2743 (2009).
104. Crea, F. et al. Epigenetic mechanisms of irinotecan sensitivity in colorectal cancer cell lines. Mol. Cancer Ther. 8, 1964–1973 (2009).
105. Ramalingam, S.S. et al. Phase I and pharmacokinetic study of vorinostat, a histone deacetylase inhibitor, in combination with carboplatin and paclitaxel for advanced solid malignancies. Clin. Cancer Res. 13, 3605–3610 (2007).
106. Ramalingam, S.S. et al. Carboplatin and Paclitaxel in combination with either vorinostat or placebo for first-line therapy of advanced non-small-cell lung cancer. J. Clin. Oncol. 28, 56–62 (2010).
107. Satterlee, J., Schübeler, D. & Ng, H. Tackling the epigenome: challenges and opportunities for collaborative efforts. Nat. Biotechnol. 28, 1039–1044 (2010).
108. Bernstein, B.E. et al. The NIH Roadmap epigenomics mapping consortium. Nat. Biotechnol. 28, 1045–1048 (2010).
109. Rasheed, W., Bishton, M., Johnstone, R.W. & Prince, H.M. Histone deacetylase inhibitors in lymphoma and solid malignancies. Expert Rev. Anticancer Ther. 8, 413–432 (2008).
110. Braiteh, F. et al. Phase I study of epigenetic modulation with 5-azacytidine and valproic acid in patients with advanced cancers. Clin. Cancer Res. 14, 6296–6301 (2008).
111. Balch, C., Fang, F., Matei, D.E., Huang, T.H. & Nephew, K.P. Minireview: epigenetic changes in ovarian cancer. Endocrinology 150, 4003–4011 (2009).
112. Lin, J. et al. A phase I dose-finding study of 5-azacytidine in combination with sodium phenylbutyrate in patients with refractory solid tumors. Clin. Cancer Res. 15, 6241–6249 (2009).
113. Berdasco, M. et al. Epigenetic inactivation of the Sotos overgrowth syndrome gene histone methyltransferase NSD1 in human neuroblastoma and glioma. Proc. Natl. Acad. Sci. USA 106, 21830–21835 (2009).
114. Silverman, L.R. et al. Further analysis of trials with azacitidine in patients with myelodysplastic syndrome: studies 8421, 8921, and 9221 by the Cancer and Leukemia Group B. J. Clin. Oncol. 24, 3895–3903 (2006).
115. Kantarjian, H.M. et al. Update of the decitabine experience in higher risk myelodysplastic syndrome and analysis of prognostic factors associated with outcome. Cancer 109, 265–273 (2007).
116. Gore, S.D. et al. Impact of the putative differentiating agent sodium phenylbutyrate on myelodysplastic syndromes and acute myeloid leukemia. Clin. Cancer Res. 7, 2330–2339 (2001).
117. Garcia-Manero, G. et al. Phase 1 study of the histone deacetylase inhibitor vorinostat (suberoylanilide hydroxamic acid [SAHA]) in patients with advanced leukemias and myelodysplastic syndromes. Blood 111, 1060–1066 (2008).
118. Kelly, W.K. et al. Phase I study of an oral histone deacetylase inhibitor, suberoylanilide hydroxamic acid, in patients with advanced cancer. J. Clin. Oncol. 23, 3923–3931 (2005).
119. Munster, P.N. et al. Phase I trial of vorinostat and doxorubicin in solid tumours: histone deacetylase 2 expression as a predictive marker. Br. J. Cancer 101, 1044–1050 (2009).


Epigenetic modifications in pluripotent and differentiated cells

Alexander Meissner 1–3

1 Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, Massachusetts, USA. 2 Harvard Stem Cell Institute, Cambridge, Massachusetts, USA. 3 Broad Institute, Cambridge, Massachusetts, USA. Correspondence should be addressed to A.M. (alexander_meissner@harvard.edu).

Published online 13 October 2010; doi:10.1038/nbt1684

Epigenetic modifications constitute a complex regulatory layer on top of the genome sequence. Pluripotent and differentiated cells provide a powerful system for investigating how the epigenetic code influences cellular fate. High-throughput sequencing of these cell types has yielded DNA methylation maps at single-nucleotide resolution and many genome-wide chromatin maps. In parallel to epigenome mapping efforts, remarkable progress has been made in our ability to manipulate cell states; ectopic expression of transcription factors has been shown to override developmentally established epigenetic marks and to enable routine generation of induced pluripotent stem (iPS) cells. Despite these advances, many fundamental questions remain. The roles of epigenetic marks and, in particular, of epigenetic modifiers in development and in disease states are not well understood. Although iPS cells appear molecularly and functionally similar to embryonic stem cells, more genome-wide studies are needed to define the extent and functions of epigenetic remodeling during reprogramming.

Epigenetics in its classic definition describes mitotically heritable modifications of DNA or chromatin that do not alter the primary nucleotide sequence 1,2. A wider definition that is still consistent with the literal meaning (‘epi’; Greek for ‘on top of’ or ‘in addition to’) would include stable yet reversible molecular mechanisms that lead to a given phenotype without a change in genotype. Epigenetic states can be mitotically inherited and thereby provide a mechanism for the long-term maintenance of cellular identity. Despite their stability, however, epigenetic marks can be readily reprogrammed experimentally using various strategies, including nuclear transfer, cell fusion and ectopic expression of transcription factors. In recent years, pluripotent stem cells, both embryonic stem (ES) cells and iPS cells, have become a well-studied tool for epigenetics research and a principal cell type in major projects, including the ENCODE Project and the US National Institutes of Health (NIH) Roadmap Epigenomics Program. Some of the interest can be explained by the opportunity to generate large numbers of customized iPS cells for regenerative medicine, disease modeling and other applications 3. From a basic research perspective, pluripotent stem cells provide a powerful model to study the interplay of epigenetic modifications and dynamics during cellular differentiation.

Pluripotent cells have a unique and characteristic epigenetic signature that reflects their broad developmental potential. More generally, the epigenetic landscape of any cell is likely to be a sensitive indicator of its past and current developmental state and may predict its future potential. This review focuses on DNA methylation and histone modifications, which have been studied more extensively than other mechanisms of epigenetic regulation, such as histone variants and non-coding RNAs. I begin by reviewing the role of these marks and modifiers in normal development. The focus will be on functional and genomic studies and the impact of this work on our understanding of epigenetic marks in pluripotent and differentiated cells.
I then summarize recent genome-scale studies and discuss what has been learned about the epigenetic reprogramming involved in induced pluripotency.

Epigenetic modifications in development

Mutations in chromatin-modifying enzymes have long been known to cause abnormal developmental phenotypes in model organisms. Several inherited human diseases, including Rett syndrome and immunodeficiency-centromeric instability-facial anomalies (ICF) syndrome, are associated with components of the epigenetic machinery, and over the past two decades aberrant DNA methylation has been clearly linked with cancer. Although we know that epigenetic marks are essential for development (Fig. 1), many important questions remain regarding the exact roles of individual modifications across various developmental stages and cell types. Recent work using conditional deletions in specific tissues has renewed interest in the regulatory role of DNA methylation.

At the earliest stages of development, the fertilized oocyte undergoes substantial epigenetic remodeling 4,5. In particular, the paternal pronucleus undergoes rapid changes. During spermatogenesis, small basic proteins called protamines replace the majority of nucleosomes, which are reintroduced rapidly after fertilization and before pronuclear fusion. Recent work on human sperm has shown that some nucleosomes are retained and that these show characteristic post-translational histone modifications 6. The few percent of residual, modified histones are enriched near developmental genes, suggesting that these histones support early development 6. Upon fertilization the paternal genome undergoes rapid protamine-to-histone exchange, which is followed by further epigenetic changes. Antibodies against 5-methyl cytosine have been used to show that the paternal genome is actively demethylated (between pronuclear stage PN0 and PN5) before pronuclear fusion 4,7,8. Additional evidence for this observation comes from several locus-specific DNA methylation assays 7,9,10. Subsequently, the maternal genome is demethylated, presumably through a passive mechanism during early cleavage divisions 8. Several exceptions to this simplified model exist, including most imprinted genes and some repetitive elements such as intracisternal A-particle (IAP) elements 4,11. It is currently unclear how and why these genomic elements evade epigenetic reprogramming in the early embryo.

[Figure 1 schematic. Developmental timeline (d.p.c.): E0.5, E3.5, E5.5, E10.5, E15.5, E19.5 (birth), adult. Stages shown: sperm, oocyte, polar body, PN, zygote, ICM, TE, embryo (EN, ME and EC), germ cells; explant ICM and ES cells with EN, ME, EC and TE derivatives. In vivo knockout phenotypes: Dnmt1 #, Dnmt3a/b #, Dnmt3b #, G9A #, Eset (SetDB1) #, Ring1B #, Ezh2 #, Suz12 #, Eed #, Dnmt3a # *, Dnmt2 **, Bmi1 ***, Dnmt3L ****. In vitro knockout phenotypes: Dnmt1 ##, Eed ##, G9A ##, Ezh2 ##; ES cells *****. Epigenetic marks shown: DNA methylation and H3K27me3.]

Figure 1 Epigenetic dynamics during in vitro and in vivo differentiation. Left (in vivo): Sperm and oocyte come together at fertilization to form the totipotent zygote. After extrusion of the second polar body the maternal and paternal pronuclei (PN) migrate and fuse after several hours. Both genomes, paternal and maternal, subsequently undergo substantial epigenetic changes although at different rates. These changes are indicated for two epigenetic marks as examples to the right. Many of the central enzyme genes have been knocked out and result in a lethal phenotype. The respective phenotypes and approximate time observed are shown in the middle. Far right (in vitro): ES cells are derived from the hypomethylated ICM and regain genome-wide DNA methylation and other epigenetic marks by the time ES cell lines are established. For most of the investigated cell types these marks appear not to change globally although locus specific changes are observed upon differentiation. As indicated by the simplified schematic of two epigenetic marks (DNA methylation and H3K27me3), many details about their presence during normal development are still lacking. The drawings are simplified and indicate global levels that remain stable. Both marks will differ between cell types in their distribution. #, lethal. ##, maintenance fine, but has differentiation defects. *Dnmt3a knockout mice die around 3 weeks postnatally and are smaller/runted. **No observed phenotype, no observed effect on DNA methylation, effect on RNA methylation not well studied but possible. ***Mice are viable, but have hematopoietic and neural abnormalities. ****Homozygous male mice are sterile, offspring of homozygous female mice and heterozygous crosses show imprinting defects and die. *****Wild-type ES cells cannot differentiate into trophectodermal cells. Loss of Dnmt1 and global loss of DNA methylation restores this developmental potential. d.p.c., days post coitum; E, embryonic; P, paternal; M, maternal; S, sperm; O, oocyte; PN, pronuclei; EN, endoderm; ME, mesoderm; EC, ectoderm; TE, trophectoderm.

The zygote and early blastomeres are totipotent, which means that they can differentiate into all embryonic and extraembryonic cell types 12. The first specification into the pluripotent inner cell mass (ICM) and more lineage-restricted trophectoderm involves several well-studied transcription factors 12,13. The trophectoderm will form only extraembryonic parts and can be used to derive multipotent trophoblast stem cell lines 12,14. The latter can differentiate in vitro into trophoblast subtypes, and, after transfer to blastocysts, can contribute to the trophoblast lineage, including extraembryonic ectoderm, ectoplacental cone and giant cells, but not the epiblast or ICM-derived extraembryonic tissues 14. By contrast, the pluripotent ICM will form the epiblast and generate the three embryonic germ layers (endoderm, ectoderm and mesoderm). When explanted under appropriate culture conditions, ICM cells give rise to pluripotent ES cells. Mouse blastocysts are typically explanted around 3.5 days after fertilization 15,16, whereas 5–9-day-old embryos have been used to derive human ES cells 17–19. ES cells are stably maintained in a self-renewing, pluripotent state by an autoregulatory network of transcription factors 20,21. Although the pluripotent state can be maintained for extended periods in culture, it exists only transiently in vivo and is lost upon implantation and specification of the epiblast 20.


DNA methylation. DNA methylation is essential for mammalian development and is required in most somatic cells 2,22,23. It is established and maintained by three catalytically active enzymes: DNA methyltransferase (Dnmt)1, Dnmt3a and Dnmt3b 1,24. Two additional, homologous enzymes, Dnmt2 and Dnmt3l, are expressed in several cell types, including ES cells 24. Deletion of Dnmt2 has no apparent phenotype in vitro or in vivo 24,25. The ability of Dnmt2 to methylate a specific cytosine in the anticodon loop structure of tRNAs suggests that it might not function as a DNA methyltransferase 26.

Loss of Dnmt1 results in embryonic lethality around embryonic day (E)8.5–9, and Dnmt1 mutant embryos retain only one-third of the normal amount of DNA methylation 27. Dnmt1-deficient embryos show rudiments of the major organs, but they are smaller than normal and appear to be developmentally delayed 27. Dnmt3b mutant embryos appear to develop normally before E9.5 but show multiple developmental defects later and do not develop to term 28. Conditional deletion of Dnmt3b in mouse embryonic fibroblasts (MEFs) results in partial loss of methylation, indicating the importance of this enzyme, together with Dnmt1, for maintaining epigenomic patterns in proliferating cells 29. Unlike mice lacking Dnmt1 or Dnmt3b, homozygous Dnmt3a knockout mice can develop to term but become runted and die ~1 month after birth 28. Conditional deletion of Dnmt3a results in imprinting defects in the germline 30. Homozygous Dnmt3l mutant mice are viable, but male mice are sterile and heterozygous offspring of homozygous females die owing to imprinting defects 31,32. This phenotype is similar to that of Dnmt3a-deficient mice and suggests that both proteins might be involved in establishing correct imprinting patterns. Dnmt3l is a close homolog of Dnmt3a and Dnmt3b that lacks the catalytic domain but is highly expressed in the early embryo, ES cells and germ cells 31.
It has been suggested to function as a co-regulator of both Dnmt3a and Dnmt3b and has recently been shown to interact with the N-terminal tail of histone H3 when it lacks methylation at lysine 4 (refs. 33,34).

Importantly, the genome is transiently hypomethylated during two phases of normal development without adverse effects 4. As described above, the first phase is preimplantation development. The totipotent zygote and blastomeres, the pluripotent blastomeres, the pluripotent ICM cells and trophectoderm cells do not require substantial DNA methylation. A second wave of demethylation commences after the specification of primordial germ cells (PGCs) around day E7.25 (refs. 4,35). Genome-wide bisulfite sequencing was used to show that E13.5 PGCs have only 5–20% of genomic DNA methylation left 36, confirming that PGCs show transient reduction of DNA methylation without adverse effects on viability.

Histone modifications. Histone modifications provide an additional and complex layer of the epigenetic code 37. Many of the enzymes that regulate these modifications have been studied extensively, including histone acetyltransferases, deacetylases, methyltransferases and histone demethylases 38. Among the best-characterized mediators are protein complexes of the polycomb (PcG) and trithorax (trxG) groups 39–41. PcG proteins catalyze two distinct histone modifications: tri-methylation of lysine 27 of histone H3 (H3K27me3) by polycomb repressive complex (PRC) 2 (ref. 42) and mono-ubiquitination of lysine 119 of histone H2A (H2AK119ub1) by PRC1 (ref. 40). H3K27 is tri-methylated by the enhancer of zeste (Ezh2 or KMT6), which contains a SET (su[var]3–9, enhancer of zeste, Trx) domain 40–43 and, together with Eed (embryonic ectoderm development) and Suz12 (suppressor of zeste 12), is a component of PRC2 (ref. 42).

Loss of any one of the PRC2 subunits results in severe gastrulation defects, highlighting its essential role in normal development 44–46. Ezh2 knockout embryos are underdeveloped and die around E8.5 (refs. 45,47). Ezh2 is upregulated upon fertilization and its expression remains high during pre-implantation development 45. Its close homolog, Ezh1, is expressed in the fertilized oocyte but is barely detectable at the blastocyst stage 45. However, it is expressed in ES cells and found later in the adult 47,48. Eed does not appear until day E5.5, suggesting that Ezh2 and maybe Ezh1 also have roles in preimplantation development that are independent of PRC2 and Eed 49. Eed-deficient embryos show gastrulation defects and do not maintain X inactivation in extraembryonic cells 49. Like mice deficient in the other PRC2 components, homozygous Suz12 mutant mice die during the early postimplantation stages (before day E10.5) 46. Similar to loss of PRC2, loss of PRC1 components, such as Ring1B, results in an early embryonic lethal phenotype 50. Bmi-1–null mice show several hematopoietic and neurological abnormalities 51, and loss of the H3K9 methyltransferases Eset (SetDB1) or G9a causes peri- and postimplantation lethality 52,53. Finally, although not discussed here, mutants for most of the chromatin remodeling factors and histone chaperones also show early embryonic lethality (for a summary, see refs. 54,55).

Together, the knockout studies have clearly established that DNA methylation and histone modification are essential for normal development. But many questions remain regarding the specific contributions of these epigenetic marks to the regulation of gene expression throughout development. The genomic distribution and global patterns of these marks have not been studied in detail.
Mice with mutations in most of these genes die early, probably owing to failure to establish early epigenetic patterns that are presumed to dictate later developmental decisions. It is less clear what the effect of such mutations would be after initial specification has taken place. Strategies for exploring these questions in future research include conditional deletions in mouse somatic lineages and cell types and genome-wide mapping of epigenetic modifications in early development.

Epigenetic modifications in ES cells

Both undifferentiated and differentiated ES cells are widely used to study epigenetic mechanisms: the former because they express many epigenetic modifiers and the latter because they serve as a model of dynamic chromatin remodeling. Functional studies have shown that most epigenetic marks, including DNA methylation, are not required for the survival of pluripotency marker–positive cells in culture. Although ES cells lacking epigenetic marks can be maintained, they cannot properly execute their developmental potential. A central question in the field is the extent to which epigenetic marks regulate, rather than simply reflect, the pluripotent state in vitro. In this section I describe loss-of-function and genome-scale studies of the epigenetic landscape. These studies have begun to shed light on this question and have provided new insights into the distinct and overlapping functions of DNA methylation and histone modifications in ES cells.

Loss-of-function studies in ES cells. The pluripotent state has the potential to generate every cell type, and this potential is reflected in its unique and characteristic epigenetic signature 41,56–64. All five DNA methyltransferases (Dnmt1, 2, 3a, 3b and 3l) and the core PRC1 and 2 subunits are highly expressed in undifferentiated mouse ES cells. Like somatic cells, ES cells show high global levels of DNA methylation, with ~60–80% of all CpG dinucleotides being methylated 65.
Although the global mCpG content is similar, the distribution of the mark is unlike that of any somatic cell type 63 and also very distinct from the hypomethylated ICM. Similarly, the distribution and enrichment of various histone modifications constitutes a unique signature of ES cells, as discussed below.

ES cells, like pluripotent cells in vivo, can tolerate global loss of epigenetic marks, including DNA methylation and H3K27 methylation, as shown using knockouts for the Dnmts or PRC2 components 27,28,65–68.


Although methylation-deficient ES cells cannot stably differentiate 22,66, they maintain the potential to regain pluripotency and to contribute to germline-competent chimeras upon restoration of Dnmts 66,69. According to genomic studies, most developmental transcription factors seem not to be regulated by DNA methylation, and pluripotency genes that are regulated in this way, such as Oct4 and Nanog, are unmethylated in the pluripotent state 59,63. However, DNA methylation does regulate some genes in pluripotent cells 70,71. Using mouse ES cells deficient in Dnmt1, 3a and 3b, one study identified a group of genes that is regulated specifically by DNA methylation and is distinct from the genes regulated by PRC1 and 2 and the core pluripotency factors Oct4, Sox2 and Nanog 70. For instance, the transcription factor gene Elf5, which is important in the regulation of trophectoderm development, is highly methylated and repressed in undifferentiated ES cells 70,71. Dnmt1-deficient ES cells show hypomethylation of the Elf5 promoter and gain the ability to differentiate into trophectoderm 71.

Loss of DNA methylation prevents ES cells from differentiating and often creates sharp colonies with less background differentiation than is normally observed. In contrast, cells deficient in PRC2 are more prone to differentiation 44–46,68,72. This can be explained in part by the distinct targets of DNA methylation and H3K27 methylation in ES cells. As discussed above, PcGs silence developmental transcription factors that are not regulated by DNA methylation 59,63. Loss of polycomb marks does not lead to full-blown expression of these transcription factors, but it increases the transcription of many PRC2 target genes above a basal level 72.
This seems to increase the susceptibility of ES cells to differentiation under suboptimal conditions 72 and has been used as an argument in favor of the idea that these marks serve to buffer commitment towards any given lineage 73. Loss of any of the subunits (Eed, Suz12 or Ezh2) results in global reduction or loss of H3K27me3 (refs. 46,47,68,72).

Although PRC2 suppresses differentiation, it is not required to maintain pluripotency 68. Low- and high-passage Eed−/− ES cells generated early embryonic chimeras and contributed to all germ layers. But no Eed−/− MEFs could be derived, and Eed−/− ES cells contributed only rarely to late-gestation embryos (beyond E12.5). As Eed is required for proper PRC2 assembly, this result suggests that PRC2 is essential for differentiation but not for molecular pluripotency. In another study 45, loss of Ezh2 led to impaired outgrowth when blastocysts were explanted and failure to derive ES cells. However, more recently it was shown that blastocysts from heterozygous mice can give rise to Ezh2-null ES cells at the expected frequency 47, which suggests that Ezh2 is in fact not required for the establishment of ES cells. The discrepancy might be explained by differences in the gene deletions or in the ES cell derivation protocols 47. Interestingly, H3K27me3 was still found at previously identified PRC2 target genes in the Ezh2-null ES cells 47,72,74, but was lost in the Eed and Suz12 knockout ES cells 46,68. Eed-deficient ES cells showed loss of H3K27me2 and H3K27me3 and significant loss of H3K27me1, whereas Ezh2 deletion affected H3K27me3 significantly, H3K27me2 slightly less and H3K27me1 apparently not at all 47. ES cells lacking Suz12 show normal H3K27me1, slightly reduced H3K27me2 and loss of H3K27me3 (ref. 46). In addition to the PcG complexes, other histone methyltransferases have also been studied in mouse ES cells.
Depletion of the H3K9 methyltransferase Suv39h (KMT1A/B) in mouse ES cells led to notable enrichment of transcripts that corresponded to all classes of repeats 75. A short hairpin RNA (shRNA) screen for chromatin regulators involved in pluripotency identified SetDB1 (KMT1E), another H3K9 methyltransferase, as a crucial component for stem cell maintenance 76. Notably, SetDB1-occupied genes were found to be a subset of the bivalent genes discussed below. One of the functions of SetDB1 in maintaining ES cells seems to be the repression of trophectoderm differentiation. Conditional depletion of SetDB1 results in decreased H3K9 methylation and upregulation of Cdx2 (ref. 77).

In summary, DNA methylation and histone modifications seem to have distinct targets and roles in undifferentiated ES cells. Because of their largely non-overlapping functions, it may be possible to delete any one modification without completely disrupting the undifferentiated state. It will be important to use double and higher-order knockouts to dissect such compensatory effects. More dynamic and genome-scale data on epigenetic changes during differentiation will certainly advance our understanding of the respective roles of these epigenetic modifications.

The epigenetic landscape in ES cells. Whereas genetic knockout and biochemical studies have shed light on the functions of particular DNA and histone modifications, genome-scale studies have provided a broader picture of the functional relevance and roles of epigenetic marks.

Recent technological advances have led to comprehensive maps of DNA methylation in mouse and human pluripotent cells 61–63. Genome-scale studies in the mouse confirmed the results of previous studies 78,79, which showed that the methylation levels of CpGs in both wild-type ES cells and somatic cells have a bimodal distribution, with most genomic regions being either ‘largely unmethylated’ or ‘largely methylated’ 63. The methylation status of CpGs is highly correlated with the local CpG density.
Nearly all high-CpG promoters in ES cells are enriched for H3K4me3 and are devoid of DNA methylation 59,63. This anti-correlation is observed for H3K4me1, H3K4me2 and H3K4me3 and may, at least in the germline, be linked to the ability of Dnmt3l to bind only unmodified H3K4 (see above) 34. In ES cells, CpGs in low-CpG promoters, which are generally associated with tissue-specific genes 79, are mostly methylated, with the exception of a small subset (


CHH methylation) at first appear much higher than those in mouse ES cells (~3% combined), they are in fact very similar, and the discrepancy can be explained by a difference in the calculation rather than a difference in biology.

Early studies in the mouse referred to non-CG methylation in general without distinguishing between CNG and CHH methylation 65,83,85. More recent work on human cells, in analogy to the plant literature, distinguishes between CNG and CHH methylation 61,62. In plants, CNG methylation depends on the plant-specific methyltransferase CMT3 (refs. 86,87). No homolog of CMT3 has been described in mammals; this is in contrast to the conserved de novo methyltransferases (Dnmt3a and b in mouse and human and DRM2 in plants) and maintenance methyltransferases (Dnmt1 in mouse and human and MET1 in plants) 24,87. In both mice and humans, non-CpG methylation is low in differentiated cells 62,83 and, as expected, reappears in iPS cells 62. In plants, the role of non-CpG methylation has been extensively studied 87. However, only one study has described the presence and possible functional role of non-CpG methylation in somatic cells of higher organisms 88. This study indicated that CpA methylation is involved in the regulation of enhancers that are required for olfactory receptor choice in the mouse brain 88. As all bisulfite sequencing–based approaches 61–63 readily detect non-CpG methylation, in the future it will be possible to determine the extent and functions, if any, of these modifications.

In addition to 5-methyl cytosine, other covalent modifications to DNA have been found in some mammalian cell types. The TET (ten-eleven translocation) family members catalyze the conversion of 5-methyl cytosine to 5-hydroxymethylcytosine (5hmC) 89. 5hmC is detectable in undifferentiated mouse ES cells as well as in Purkinje neurons and granule cells (0.6% and 0.2% of total nucleotides, respectively) 90.
During ES cell differentiation, the amount of TET1 and 5hmC decreases 89, and TET1 knockdown impairs the self-renewal and maintenance of ES cells 91. Some targets, including the Nanog promoter, have been suggested in connection with the observed phenotype 91, but more work is needed to fully understand the role of this modification in ES cell biology. Most current technologies for genome-scale DNA methylation profiling, including bisulfite sequencing, cannot discriminate between 5mC and 5hmC, but specific antibodies 91 have been developed and will probably be used, in a similar way to 5mC antibodies, for methylated DNA immunoprecipitation and high-throughput sequencing (meDIP-Seq) in the near future 92. A more detailed discussion of TET proteins and 5hmC can be found in a recent review 93. Finally, it should be noted that other modifications, such as N6-methyladenine, that are frequently found in bacteria 94 have not been studied in much detail in mammalian genomes.

Similar to the extensive catalog of DNA methylation maps, dozens of chromatin state maps from mouse and human pluripotent cells have been published 44,57–63. Globally, ES cells show an open chromatin structure, and active chromatin domains are widespread 56–59,95,96. Nearly 75% of promoters (active and inactive) are enriched for H3K4 methylation in human ES cells 95. Although most of these promoters experience transcriptional initiation, only a subset is enriched for the elongating form of Pol II and H3K36 methylation 95. H3K27 methylation is a key factor in balancing the self-renewal and differentiation of ES cells 56,72. In addition to genome-wide chromatin maps, the localization of many of the core subunits of PRC1 and PRC2 has been reported 56,72,74.
Ezh1 was shown to directly interact with the other PRC2 components, and co-localization of Ezh1 and H3K27me3 in both wild-type and Ezh2-null cells suggests that Ezh1 has a direct role in the establishment of this mark 47.

The combinatorial pattern of histone marks is complex, and many new marks and states (combinations of marks) are likely to be discovered in the coming years. A particularly well-studied combination is the co-occurrence of the repressive H3K27me3 with the active H3K4me3, termed a ‘bivalent domain’ 59,60. Interestingly, about half of the identified bivalent domains in ES cells have binding sites for at least one of the three pluripotency-associated transcription factors (Oct4, Nanog and Sox2) 72,97. Better and more binding data for the different transcription factors will probably enhance our ability to investigate the nuances of this co-occupation further. Bivalent domains generally show PcG occupancy, but can be subdivided into two groups on the basis of co-occupancy of both PRC1 and PRC2 or occupancy by PRC2 alone 56. Promoters that are ‘co-occupied’ by both complexes can retain PcG-mediated chromatin structure more efficiently upon differentiation 56. Finally, H3K4me3 and H3K56ac have been shown to share occupancy in human ES cells 98. Co-localization of NANOG, SOX2 and OCT4 is more often associated with H3K56ac than H3K4me3, providing an additional link with the core pluripotency network 98. Genomic regions that are associated with gene silencing, including transposons and repetitive elements, frequently possess the well-known heterochromatin marks H3K9me3 and H4K20me3 (refs. 40,59,75), but additional marks have been connected with these regions, including the globular H3K64me3 that is enriched at pericentric heterochromatin.
H3K64me3 shows enrichment in mouse ES cells compared with differentiated cells 99, consistent with the observation that epigenetic patterns at repeats change substantially during differentiation 75,99.

The open ES cell chromatin structure, which is enriched in noncompact euchromatin, allows easy access for transcription factors and the transcriptional machinery and may explain the observed global ‘hypertranscription’. By contrast, lineage commitment is accompanied by the accumulation of regions of highly condensed, transcriptionally inactive heterochromatin 100.

Overall, the genome-scale studies have provided detailed information on the distribution of various epigenetic marks. This has turned out to be a powerful source for understanding the role and relationship of individual marks and enabled more precise annotation of genomic features such as enhancers. Finding answers to many of the remaining questions regarding their regulatory roles will be aided by additional maps in gain- and loss-of-function studies as well as by studying the dynamic changes during cellular differentiation. In the following section I will summarize some of the existing results, but clearly more data are required.

Dynamic epigenetic changes during differentiation. Despite significant advances in mapping technologies, it is still difficult to investigate lineage specification and the associated global epigenetic remodeling for many cell types in vivo. But the number of cells that is required for epigenetic analysis continues to decrease, suggesting that these exciting studies will become possible in the near future 101,102. In the meantime, ES cells provide a powerful in vitro system to study the role and extent of epigenetic modification during lineage commitment. For humans, ES cells are the only available model in which to study many questions of lineage commitment and cell fate decisions.
Most of the published genome-scale DNA methylation and histone-modification studies have compared pluripotent cells with in vitro differentiated or donor-derived somatic cells.

DNA methylation patterns during the differentiation of pluripotent cells are quite dynamic. Using high-throughput bisulfite sequencing, we compared the DNA methylation patterns of one million CpGs in undifferentiated and differentiated mouse ES cells (ref. 63). Notably, most of the changes in DNA methylation that were associated with differentiation occurred at distal putative regulatory regions between 1 and 100 kb from known promoters (ref. 63). Many of these regions might contain functional enhancers. A recent study described an important connection between the epigenetic marking in ES cells and the transcriptional competence of tissue-specific enhancers (ref. 103). Controlled through the target-specific action of the transcription factor FoxD3, individual CpG sites remain unmethylated in ES cells and, upon differentiation, allow the recruitment of other transcription factors when directed towards endoderm (e.g., the Alb1 enhancer) or become methylated in mesoderm and ectoderm differentiation (ref. 103). Genome-wide maps at nucleotide resolution across many or most cell types will help to define the relevance of individual CpG sites.

Figure 2 Epigenetic reprogramming during iPS cell derivation. Shown are selected genes with their chromatin and expression state across distinct cell types, color-coded as shown on the bottom (data taken from ref. 81). Data are for uninduced mouse embryonic fibroblasts (MEFs), a hypothetical primary iPS cell colony (nascently reprogrammed) as well as an established iPS (MCV8.1) and ES (V6.5) cell line. Upon induction of Oct4, Sox2, Klf4 and c-Myc (OSKM), the MEF epigenome begins remodeling. The initial events and required factors have not been described yet. After ~10–14 days, iPS cell colonies appear that express markers such as Oct4-GFP. The expression of housekeeping genes is not affected, and they remain active throughout the reprogramming process (Gapdh and Dnmt1 are representative examples). With the exception of a few marker genes, the global extent of remodeling is unknown for this stage. Primary colonies are then picked and expanded as clonal iPS cell lines. Usually several more passages are needed before extensive marker stains can be performed. Typically, at least 5–8 passages are required to obtain sufficient material for genome-wide studies. The chromatin and expression states for the selected marker genes are identical in the iPS cell line MCV8.1 and in a wild-type ES cell line (V6.5) used to construct this schematic (ref. 81). Ink4a (Cdkn2a) remains bivalent and sensitive to rapid induction in normal (not transformed or immortalized) cells. Overexpression of OSKM or extended cell culture can induce expression of Ink4a, but ES and iPS cells show bivalent marks and lack of DNA methylation. Snai1 is an expressed somatic gene that becomes repressed and regains bivalency upon reprogramming. MyoD is silent in both MEF and iPS cells, but switches from H3K27me3 only to a bivalent state upon reprogramming. Lin28 and Fgf4 are repressed by H3K27 methylation, whereas Oct4 and Nanog are repressed by DNA methylation, and all become transcriptionally reactivated only upon reprogramming.
[Figure 2 panel labels: genes Gapdh, Dnmt1, Ink4a, Snai1, MyoD, Lin28, Fgf4, Oct4, Nanog; cell type MEFs; tracks K4me3, K27me3, DNAme, Expr; legend: enriched for K4me3, enriched for K27me3.]

In one of the recent human methylome reports, a pairwise comparison of undifferentiated human ES cells (line H1) and IMR90 fetal lung fibroblasts showed that the latter had lower levels of CpG methylation in a large proportion of the genome (ref. 62). Large partially methylated regions (


Bivalent domains maintain genes in a state that is repressed but poised for activation (ref. 60). Such domains initially seemed to be almost exclusive to pluripotent cells (ref. 60), but genome-wide studies in additional cell types identified the domains also in cells with more restricted developmental potential (ref. 59). It is likely that developmental loci stay bivalent (poised) as long as their expression may be needed and switch off completely once cells have reached a terminally differentiated state. Another group of bivalent loci (unchanged during differentiation) includes genes such as Ink4a (Cdkn2a), which is directly regulated by PcG proteins (ref. 113). Through its function in controlling the cell cycle, this locus fits well with the idea of a gene that is repressed but poised for rapid activation. Importantly, bivalency rather than stable DNA methylation provides the required flexibility for rapid induction.

A key question in chromatin research is how epigenetic modifiers, such as PcG proteins, are regulated and recruited to target loci. In one study, chromatin data in undifferentiated and differentiated human ES cells revealed a region between HOXD11 and HOXD12 that seems to function as a mammalian polycomb response element (ref. 114). However, additional analysis will be required as this region also harbors a CpG island (ref. 114), and these have been previously implicated in the recruitment of epigenetic modifiers (ref. 56). Many other proteins, including Jarid2 (refs. 115–118), and non-coding RNAs (refs. 119,120) have been implicated in the recruitment and regulation of PcG activity. Recruitment factors, including YY1 (the mammalian homolog of Pho), and their roles in mammalian cells are still under active investigation. The regulation of transcriptional repression has recently been discussed in more detail (ref. 121).

Epigenetic reprogramming and iPS cells

It is well established that single or multiple transcription factors can convert one cell type into another. The mechanism by which ectopic transcription factors override the existing epigenetic state and change it into a specific alternative state without passing through normal development or complete resetting of all marks is still largely unknown. The key factors involved in the remodeling have not been identified, and many questions regarding the dynamics of this process remain.

Several approaches have been used to return differentiated cells to the pluripotent state (reviewed in ref. 122). In the most promising one, overexpression of Oct4, Sox2, Klf4 and c-Myc was sufficient to reprogram somatic cells to an iPS cell state (ref. 123). Translation of the approach to human cells has allowed the generation of patient-specific stem cells and transformed the field of regenerative biology (refs. 124–126). Hundreds of studies have followed initial reports of the generation of germline-competent iPS cells (refs. 125,127–129). Advances in the use of small molecules have raised hopes that in the future it will be possible to reprogram any cell with a cocktail of small molecules (refs. 81,130,131). In the mouse system, several functional assays exist to determine the developmental potential or limitations of pluripotent cells (ref. 21). These assays cannot be carried out for human ES or iPS cells, making a comprehensive and careful characterization of the cellular or epigenetic state an essential substitute.

We do not know how much epigenetic remodeling really occurs during the initial reprogramming phase, until the time when the first pluripotency markers appear in individual colonies (Fig. 2). Extended passaging of human (ref. 132) and mouse (ref. 133) iPS cells improves the level of reprogramming. This might explain many of the conflicting reports on iPS cell variation, given that investigators rarely control for passage numbers and sometimes fail even to report them. Most of the experiments published to date should therefore be compared only with caution, both because derivation and culture conditions may have varied and because passage numbers are rarely reported and difficult to compare.

Interestingly, somatic cell nuclear transfer has often been described as more efficient than reprogramming with transcription factors. However, embryos derived from nuclear transfer show major epigenetic reprogramming deficiencies (ref. 5). For example, genes expressed in the donor cell nuclei can be expressed in an inappropriate lineage of the embryo (ref. 134). Among embryos derived from donor nuclei of neurectodermal origin, more than half overexpressed the neural marker Sox2 in endodermal cells (ref. 134). Even a second round of nuclear transfer could not establish a fully reprogrammed state, suggesting that epigenetic memory was persistent. This cellular memory was apparently the result of the euchromatic histone variant H3.3 (ref. 135), further highlighting the complexity of epigenetic remodeling during reprogramming. More evidence of incomplete or aberrant epigenetic reprogramming during nuclear transfer comes from the finding that cloned mice are heavier than normal mice and show several behavioral and metabolic alterations that are consistent with obesity (ref. 136). The obese phenotype was not transmitted through the germline, suggesting that it was probably caused by reversible epigenetic changes.

Although the oocyte is primed for extensive chromatin remodeling and reprogramming of the paternal and maternal genomes, it often fails to accomplish complete reprogramming of somatic donor cells (ref. 5). When the expression of Oct4 and ten Oct4-related genes was tested in blastocysts derived by somatic cell nuclear transfer using cumulus cells as donor cells, nearly 40% of the blastocysts failed to express at least one of the 11 genes, whereas all 15 fertilization-derived blastocysts expressed the complete set (ref. 137). In somatic cells, most of these genes are silenced by DNA methylation of their promoters (ref. 82). Notably, many of the genes that are frequently not reprogrammed by the oocyte are also inefficiently reprogrammed during the generation of iPS cells. Oct4, Dppa2, Dppa3, Dppa4 and Dppa5 remain inactive until late stages of reprogramming or are expressed only in established iPS cell lines (ref. 81). This group of genes is highly DNA methylated in MEFs, which might explain why it is difficult to reprogram them. Indeed, hypomethylation has been shown to facilitate reprogramming by both somatic cell nuclear transfer (ref. 138) and transcription factors (ref. 81).

By combining gene expression data and genome-wide epigenetic maps, our group found a strong correlation between the chromatin state in fibroblasts and the activation of genes that are expressed in pluripotent cells (ref. 81). Genes in an open chromatin state (marked by H3K4me3 or H3K4me3 and H3K27me3) are efficiently reactivated, whereas genes that are silenced by H3K27me3 alone remain mostly repressed. Both near promoters and in intergenic regions, most (97%) of the high CpG promoters that lack H3K4me3 enrichment in MEFs regain this mark in one of the iPS cell lines that we investigated (ref. 81). These genes fall into at least two groups. One group shows the reactivation and associated gain of H3K4me3 at key pluripotency factors. These can be further subdivided into two classes: the first includes genes such as Lin28, Sox2 and Fgf4, which are repressed by H3K27me3 and lack detectable H3K4me3; the second includes the genes Oct4, Nanog and Dppa, which are repressed by DNA methylation and also lack detectable H3K4me3. The other major group describes genes, including MyoD, that are repressed by H3K27me3 and highly enriched for developmental transcription factors. To create a truly pluripotent cell line, these loci must remain repressed but have to acquire H3K4me3 to reestablish their bivalency and thus their developmental competence for all germ layers and cell types. Failure to do so is not detectable in gene expression data and can only be observed at the epigenetic level (ref. 81). Overall, gene expression patterns would suggest a much less dynamic transition from the differentiated state to pluripotency (Fig. 2).


Conclusions

More than 60 years ago, the best-characterized epigenetic mark, 5-methyl cytosine, was reported to be a minor constituent of mammalian genomes (ref. 139). Now, 10 years after the first draft of the human genome sequence (ref. 140), we have several DNA methylation maps of the human genome at single-nucleotide resolution and dozens of genome-wide chromatin state maps (refs. 41,56–64). These studies have provided novel insights into transcriptional regulation and the role of epigenetic modifications across cell types. Knowledge of the genomic distribution of many histone modifications has helped us to understand the roles of these modifications and enabled more efficient annotation of the genome, including the identification of putative enhancers (ref. 141), miRNAs (ref. 142) and large intergenic non-coding (linc)RNAs (ref. 143). The expanding catalog of lincRNAs (>3,000) and their association with PRC2 point to a possible general mechanism whereby RNAs can guide chromatin-modifying complexes to their specific sites of action (ref. 144). More generally, the rapidly advancing field of noncoding RNAs underscores that we are still far from understanding how the epigenetic machinery operates in cells.

In the short term, investigation of the chromatin state may help to explain differences in the responses of cells to the ectopic expression of transcription factors and lead to more efficient methods of reprogramming. Furthermore, a large repertoire of in vivo epigenome maps will also facilitate studies to determine the quality and utility of cells generated in vitro by directed differentiation or reprogramming. The enormous increase in sequencing capabilities clearly indicates that this is only the beginning.
Coordinated national and international large-scaleefforts, including the NIH Roadmap Epigenomics Program and theInternational Human Epigenome Consortium (IHEC), are underwayto comprehensively map the entire human epigenome or at least captureas much of the epigenomic space (cell types × epigenetic marks × differentbackgrounds, such as age, genetics, diet) as possible. Epigenomereference maps and data from these projects will significantly increaseour understanding of normal biology, ES cells and epigenetic reprogrammingas well as undesired changes in disease states.ACKNOWLEDGMENTSI thank C. Bock, Z. Smith and B. Bernstein for comments on the manuscript. A.M.is supported by the Massachusetts Life Science Center, the Pew Charitable Trustsand the US National Institutes of Health Roadmap Initiative on Epigenomics(U01ES017155).COMPETING FINANCIAL INTERESTSThe author declares no competing financial interests.Published online at http://www.nature.com/naturebiotechnology/.Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/.1. Bird, A. DNA methylation patterns and epigenetic memory. Genes Dev. 16, 6–21(2002).2. Jaenisch, R. & Bird, A. Epigenetic regulation of gene expression: how the genomeintegrates intrinsic and environmental signals. Nat. Genet. 33 Suppl, 245–254(2003).3. Yamanaka, S. Strategies and new developments in the generation of patient-specificpluripotent stem cells. Cell Stem Cell 1, 39–49 (2007).4. Reik, W., Dean, W. & Walter, J. Epigenetic reprogramming in mammalian development.Science 293, 1089–1093 (2001).5. Rideout, W.M. III, Eggan, K. & Jaenisch, R. Nuclear cloning and epigenetic reprogrammingof the genome. Science 293, 1093–1098 (2001).6. Hammoud, S.S. et al. Distinctive chromatin in human sperm packages genes forembryo development. Nature 460, 473–478 (2009).7. Oswald, J. et al. Active demethylation of the paternal genome in the mouse zygote.Curr. Biol. 10, 475–478 (2000).8. 
Santos, F., Hendrich, B., Reik, W. & Dean, W. Dynamic reprogramming of DNA methylationin the early mouse embryo. Dev. Biol. 241, 172–182 (2002).9. Mayer, W., Niveleau, A., Walter, J., Fundele, R. & Haaf, T. Demethylation of the zygoticpaternal genome. Nature 403, 501–502 (2000).10. Kafri, T. et al. Developmental pattern of gene-specific DNA methylation in the mouseembryo and germ line. Genes Dev. 6, 705–714 (1992).11. Lane, N. et al. Resistance of IAPs to methylation reprogramming may provide amechanism for epigenetic inheritance in the mouse. Genesis 35, 88–93 (2003).12. Rossant, J. Stem cells and early lineage development. Cell 132, 527–531(2008).13. Niakan, K.K. et al. Sox17 promotes differentiation in mouse embryonic stem cellsby directly regulating extraembryonic gene expression and indirectly antagonizingself-renewal. Genes Dev. 24, 312–326 (2010).14. Tanaka, S., Kunath, T., Hadjantonakis, A.K., Nagy, A. & Rossant, J. Promotion oftrophoblast stem cell proliferation by FGF4. Science 282, 2072–2075 (1998).15. Evans, M.J. & Kaufman, M.H. Establishment in culture of pluripotential cells frommouse embryos. Nature 292, 154–156 (1981).16. Martin, G.R. Isolation of a pluripotent cell line from early mouse embryos culturedin medium conditioned by teratocarcinoma stem cells. Proc. Natl. Acad. Sci. USA78, 7634–7638 (1981).17. Chen, A.E. et al. Optimal timing of inner cell mass isolation increases the efficiencyof human embryonic stem cell derivation and allows generation of sibling cell lines.Cell Stem Cell 4, 103–106 (2009).18. Cowan, C.A. et al. Derivation of embryonic stem-cell lines from human blastocysts.N. Engl. J. Med. 350, 1353–1356 (2004).19. Thomson, J.A. et al. Embryonic stem cell lines derived from human blastocysts.Science 282, 1145–1147 (1998).20. Silva, J. & Smith, A. Capturing pluripotency. Cell 132, 532–536 (2008).21. Jaenisch, R. & Young, R. Stem cells, the molecular circuitry of pluripotency andnuclear reprogramming. Cell 132, 567–582 (2008).22. 
Jackson-Grusby, L. et al. Loss of genomic methylation causes p53-dependentapoptosis and epigenetic deregulation. Nat. Genet. 27, 31–39 (2001).23. Jones, P.A. & Baylin, S.B. The epigenomics of cancer. Cell 128, 683–692(2007).24. Goll, M.G. & Bestor, T.H. Eukaryotic cytosine methyltransferases. Annu. Rev.Biochem. 74, 481–514 (2005).25. Okano, M., Xie, S. & Li, E. Dnmt2 is not required for de novo and maintenancemethylation of viral DNA in embryonic stem cells. Nucleic Acids Res. 26, 2536–2540 (1998).26. Goll, M.G. et al. Methylation of tRNAAsp by the DNA methyltransferase homologDnmt2. Science 311, 395–398 (2006).27. Li, E., Bestor, T.H. & Jaenisch, R. Targeted mutation of the DNA methyltransferasegene results in embryonic lethality. Cell 69, 915–926 (1992).28. Okano, M., Bell, D.W., Haber, D.A. & Li, E. DNA methyltransferases Dnmt3a andDnmt3b are essential for de novo methylation and mammalian development. Cell99, 247–257 (1999).29. Dodge, J.E. et al. Inactivation of Dnmt3b in mouse embryonic fibroblasts results inDNA hypomethylation, chromosomal instability, and spontaneous immortalization.J. Biol. Chem. 280, 17986–17991 (2005).30. Kaneda, M. et al. Essential role for de novo DNA methyltransferase Dnmt3a inpaternal and maternal imprinting. Nature 429, 900–903 (2004).31. Bourc’his, D., Xu, G.L., Lin, C.S., Bollman, B. & Bestor, T.H. Dnmt3L and theestablishment of maternal genomic imprints. Science 294, 2536–2539 (2001).32. Bourc’his, D. & Bestor, T.H. Meiotic catastrophe and retrotransposon reactivationin male germ cells lacking Dnmt3L. Nature 431, 96–99 (2004).33. Jia, D., Jurkowska, R.Z., Zhang, X., Jeltsch, A. & Cheng, X. Structure of Dnmt3abound to Dnmt3L suggests a model for de novo DNA methylation. Nature 449,248–251 (2007).34. Ooi, S.K. et al. DNMT3L connects unmethylated lysine 4 of histone H3 to de novomethylation of DNA. Nature 448, 714–717 (2007).35. Hayashi, K. & Surani, M.A. Resetting the epigenome beyond pluripotency in thegermline. 
Cell Stem Cell 4, 493–498 (2009).36. Popp, C. et al. Genome-wide erasure of DNA methylation in mouse primordial germcells is affected by AID deficiency. Nature 463, 1101–1105 (2010).37. Strahl, B.D. & Allis, C.D. The language of covalent histone modifications. Nature403, 41–45 (2000).38. Shi, Y. Histone lysine demethylases: emerging roles in development, physiologyand disease. Nat. Rev. Genet. 8, 829–833 (2007).39. Francis, N.J. & Kingston, R.E. Mechanisms of transcriptional memory. Nat. Rev.Mol. Cell Biol. 2, 409–421 (2001).40. Campos, E.I. & Reinberg, D. Histones: annotating chromatin. Annu. Rev. Genet.43, 559–599 (2009).41. Bernstein, B.E., Meissner, A. & Lander, E.S. The mammalian epigenome. Cell 128,669–681 (2007).42. Cao, R. et al. Role of histone H3 lysine 27 methylation in Polycomb-group silencing.Science 298, 1039–1043 (2002).43. Zhang, Y., Cao, R., Wang, L. & Jones, R.S. Mechanism of Polycomb group genesilencing. Cold Spring Harb. Symp. Quant. Biol. 69, 309–317 (2004).44. Faust, C., Lawson, K.A., Schork, N.J., Thiel, B. & Magnuson, T. The Polycomb-groupgene eed is required for normal morphogenetic movements during gastrulation inthe mouse embryo. Development 125, 4495–4506 (1998).45. O’Carroll, D. et al. The polycomb-group gene Ezh2 is required for early mousedevelopment. Mol. Cell. Biol. 21, 4330–4336 (2001).46. Pasini, D., Bracken, A.P., Jensen, M.R., Lazzerini Denchi, E. & Helin, K. Suz12 isessential for mouse development and for EZH2 histone methyltransferase activity.EMBO J. 23, 4061–4071 (2004).47. Shen, X. et al. EZH1 mediates methylation on histone H3 lysine 27 and complementsEZH2 in maintaining stem cell identity and executing pluripotency. Mol.Cell 32,1086 volume 28 number 10 OCTOBER 2010 nature biotechnology


eview© 2010 Nature America, Inc. All rights reserved.48. Ezhkova, E. et al. Ezh2 orchestrates gene expression for the stepwise differentiationof tissue-specific stem cells. Cell 136, 1122–1135 (2009).49. Shumacher, A., Faust, C. & Magnuson, T. Positional cloning of a global regulator ofanterior-posterior patterning in mice. Nature 383, 250–253 (1996).50. Hanson, R.D. et al. Mammalian Trithorax and polycomb-group homologues areantagonistic regulators of homeotic development. Proc. Natl. Acad. Sci. USA 96,14372–14377 (1999).51. van der Lugt, N.M. et al. Posterior transformation, neurological abnormalities, andsevere hematopoietic defects in mice with a targeted deletion of the bmi-1 protooncogene.Genes Dev. 8, 757–769 (1994).52. Dodge, J.E., Kang, Y.K., Beppu, H., Lei, H. & Li, E. Histone H3–K9 methyltransferaseESET is essential for early development. Mol. Cell. Biol. 24, 2478–2486 (2004).53. Tachibana, M. et al. Histone methyltransferases G9a and GLP form heteromericcomplexes and are both crucial for methylation of euchromatin at H3–K9. GenesDev. 19, 815–826 (2005).54. Li, E. Chromatin modification and epigenetic reprogramming in mammalian development.Nat. Rev. Genet. 3, 662–673 (2002).55. Surani, M.A., Hayashi, K. & Hajkova, P. Genetic and epigenetic regulators of pluripotency.Cell 128, 747–762 (2007).56. Ku, M. et al. Genomewide analysis of PRC1 and PRC2 occupancy identifies twoclasses of bivalent domains. PLoS Genet. 4, e1000242 (2008).57. Zhao, X.D. et al. Whole-genome mapping of histone H3 Lys4 and 27 trimethylationsreveals distinct genomic compartments in human embryonic stem cells. Cell StemCell 1, 286–298 (2007).58. Pan, G. et al. Whole-genome analysis of histone H3 lysine 4 and lysine 27 methylationin human embryonic stem cells. Cell Stem Cell 1, 299–312 (2007).59. Mikkelsen, T.S. et al. Genome-wide maps of chromatin state in pluripotent andlineage-committed cells. Nature 448, 553–560 (2007).60. Bernstein, B.E. et al. 
A bivalent chromatin structure marks key developmental genesin embryonic stem cells. Cell 125, 315–326 (2006).61. Laurent, L. et al. Dynamic changes in the human methylome during differentiation.Genome Res. 20, 320–331 (2010).62. Lister, R. et al. Human DNA methylomes at base resolution show widespread epigenomicdifferences. Nature 462, 315–322 (2009).63. Meissner, A. et al. Genome-scale DNA methylation maps of pluripotent and differentiatedcells. Nature 454, 766–770 (2008).64. Mohn, F. et al. Lineage-specific polycomb targets and de novo DNA methylationdefine restriction and potential of neuronal progenitors. Mol. Cell 30, 755–766(2008).65. Meissner, A. et al. Reduced representation bisulfite sequencing for comparative highresolutionDNA methylation analysis. Nucleic Acids Res. 33, 5868–5877 (2005).66. Jackson, M. et al. Severe global DNA hypomethylation blocks differentiation andinduces histone hyperacetylation in embryonic stem cells. Mol. Cell. Biol. 24,8862–8871 (2004).67. Tsumura, A. et al. Maintenance of self-renewal ability of mouse embryonic stem cellsin the absence of DNA methyltransferases Dnmt1, Dnmt3a and Dnmt3b. Genes Cells11, 805–814 (2006).68. Chamberlain, S.J., Yee, D. & Magnuson, T. Polycomb repressive complex 2 is dispensablefor maintenance of embryonic stem cell pluripotency. Stem Cells 26,1496–1505 (2008).69. Holm, T.M. et al. Global loss of imprinting leads to widespread tumorigenesis in adultmice. Cancer Cell 8, 275–285 (2005).70. Fouse, S. et al. Promoter CpG methylation contributes to ES cell gene regulation inparallel with Oct4/Nanog, PcG complex, and histone H3 K4/K27 trimethylation. CellStem Cell 2, 160–169 (2008).71. Ng, R.K. et al. Epigenetic restriction of embryonic cell lineage fate by methylationof Elf5. Nat. Cell Biol. 10, 1280–1290 (2008).72. Boyer, L.A. et al. Polycomb complexes repress developmental regulators in murineembryonic stem cells. Nature 441, 349–353 (2006).73. Chi, A.S. & Bernstein, B.E. Developmental biology. 
Pluripotent chromatin state.Science 323, 220–221 (2009).74. Bracken, A.P., Dietrich, N., Pasini, D., Hansen, K.H. & Helin, K. Genome-wide mappingof Polycomb target genes unravels their roles in cell fate transitions. Genes Dev.20, 1123–1136 (2006).75. Martens, J.H. et al. The profile of repeat-associated histone lysine methylation statesin the mouse epigenome. EMBO J. 24, 800–812 (2005).76. Bilodeau, S., Kagey, M.H., Frampton, G.M., Rahl, P.B. & Young, R.A. SetDB1 contributesto repression of genes encoding developmental regulators and maintenanceof ES cell state. Genes Dev. 23, 2484–2489 (2009).77. Lohmann, F. et al. KMT1E mediated H3K9 methylation is required for the maintenanceof embryonic stem cells by repressing trophectoderm differentiation. StemCells 28, 201–212 (2010).78. Rollins, R.A. et al. Large-scale structure of genomic methylation patterns. GenomeRes. 16, 157–163 (2006).79. Weber, M. et al. Distribution, silencing potential and evolutionary impact of promoterDNA methylation in the human genome. Nat. Genet. 39, 457–466 (2007).80. Cedar, H. & Bergman, Y. Linking DNA methylation and histone modification: patternsand paradigms. Nat. Rev. Genet. 10, 295–304 (2009).81. Mikkelsen, T.S. et al. Dissecting direct reprogramming through integrative genomicanalysis. Nature 454, 49–55 (2008).82. Imamura, M. et al. Transcriptional repression and DNA hypermethylation of a smallset of ES cell marker genes in male germline stem cells. BMC Dev. Biol. 6, 34(2006).83. Ramsahoye, B.H. et al. Non-CpG methylation is prevalent in embryonic stem cellsand may be mediated by DNA methyltransferase 3a. Proc. Natl. Acad. Sci. USA 97,5237–5242 (2000).84. Haines, T.R., Rodenhiser, D.I. & Ainsworth, P.J. Allele-specific non-CpG methylationof the Nf1 gene during early mouse development. Dev. Biol. 240, 585–598(2001).85. Dodge, J.E., Ramsahoye, B.H., Wo, Z.G., Okano, M. & Li, E. De novo methylationof MMLV provirus in embryonic stem cells: CpG versus non-CpG methylation. 
Gene289, 41–48 (2002).86. Cokus, S.J. et al. Shotgun bisulphite sequencing of the Arabidopsis genome revealsDNA methylation patterning. Nature 452, 215–219 (2008).87. Chan, S.W., Henderson, I.R. & Jacobsen, S.E. Gardening the genome: DNA methylationin Arabidopsis thaliana. Nat. Rev. Genet. 6, 351–360 (2005).88. Lomvardas, S. et al. Interchromosomal interactions and olfactory receptor choice.Cell 126, 403–413 (2006).89. Tahiliani, M. et al. Conversion of 5-methylcytosine to 5-hydroxymethylcytosine inmammalian DNA by MLL partner TET1. Science 324, 930–935 (2009).90. Kriaucionis, S. & Heintz, N. The nuclear DNA base 5-hydroxymethylcytosine ispresent in Purkinje neurons and the brain. Science 324, 929–930 (2009).91. Ito, S. et al. Role of Tet proteins in 5mC to 5hmC conversion, ES-cell self-renewaland inner cell mass specification. Nature 466, 1129–1133 (2010).92. Down, T.A. et al. A Bayesian deconvolution strategy for immunoprecipitation-basedDNA methylome analysis. Nat. Biotechnol. 26, 779–785 (2008).93. Wu, S.C. & Zhang, Y. Active DNA demethylation: many roads lead to Rome. Nat.Rev. Mol. Cell Biol. 11, 607–620 (2010).94. Jeltsch, A. Beyond Watson and Crick: DNA methylation and molecular enzymologyof DNA methyltransferases. ChemBioChem 3, 274–293 (2002).95. Guenther, M.G., Levine, S.S., Boyer, L.A., Jaenisch, R. & Young, R.A. A chromatinlandmark and transcription initiation at most promoters in human cells. Cell 130,77–88 (2007).96. Heintzman, N.D. et al. Distinct and predictive chromatin signatures of transcriptionalpromoters and enhancers in the human genome. Nat. Genet. 39, 311–318(2007).97. Boiani, M. & Scholer, H.R. Regulatory networks in embryo-derived pluripotent stemcells. Nat. Rev. Mol. Cell Biol. 6, 872–884 (2005).98. Xie, W. et al. Histone h3 lysine 56 acetylation is linked to the core transcriptionalnetwork in human embryonic stem cells. Mol. Cell 33, 417–427 (2009).99. Daujat, S. et al. 
H3K64 trimethylation marks heterochromatin and is dynamicallyremodeled during developmental reprogramming. Nat. Struct. Mol. Biol. 16, 777–781 (2009).100. Efroni, S. et al. Global transcription in pluripotent embryonic stem cells. Cell StemCell 2, 437–447 (2008).101. Gu, H. et al. Genome-scale DNA methylation mapping of clinical samples at singlenucleotideresolution. Nat. Methods 7, 133–136 (2010).102. Goren, A. et al. Chromatin profiling by directly sequencing small quantities ofimmunoprecipitated DNA. Nat. Methods 7, 47–49 (2010).103. Xu, J. et al. Transcriptional competence and the active marking of tissue-specificenhancers by defined transcription factors in embryonic and induced pluripotentstem cells. Genes Dev. 23, 2824–2838 (2009).104. Epsztejn-Litman, S. et al. De novo DNA methylation promoted by G9a prevents reprogrammingof embryonically silenced genes. Nat. Struct. Mol. Biol. 15, 1176–1183(2008).105. Feldman, N. et al. G9a-mediated irreversible epigenetic inactivation of Oct-3/4during early embryogenesis. Nat. Cell Biol. 8, 188–194 (2006).106. Cherry, S.R., Biniszkiewicz, D., van Parijs, L., Baltimore, D. & Jaenisch, R. Retroviralexpression in embryonic stem cells and hematopoietic stem cells. Mol. Cell. Biol.20, 7419–7426 (2000).107. Matsui, T. et al. Proviral silencing in embryonic stem cells requires the histonemethyltransferase ESET. Nature 464, 927–931 (2010).108. Sen, G.L., Reuter, J.A., Webster, D.E., Zhu, L. & Khavari, P.A. DNMT1 maintains progenitorfunction in self-renewing somatic tissue. Nature 463, 563–567 (2010).109. Bröske, A.M. et al. DNA methylation protects hematopoietic stem cell multipotencyfrom myeloerythroid restriction. Nat. Genet. 41, 1207–1215 (2009).110. Trowbridge, J.J., Snow, J.W., Kim, J. & Orkin, S.H. DNA methyltransferase 1 isessential for and uniquely regulates hematopoietic stem and progenitor cells. CellStem Cell 5, 442–449 (2009).111. Ji, H. et al. 
Comprehensive methylome map of lineage commitment from haematopoieticprogenitors. Nature 467, 338–342 (2010).112. Tadokoro, Y., Ema, H., Okano, M., Li, E. & Nakauchi, H. De novo DNA methyltransferaseis essential for self-renewal, but not for differentiation, in hematopoietic stemcells. J. Exp. Med. 204, 715–722 (2007).113. Jacobs, J.J., Kieboom, K., Marino, S., DePinho, R.A. & van Lohuizen, M. The oncogeneand Polycomb-group gene bmi-1 regulates cell proliferation and senescencethrough the ink4a locus. Nature 397, 164–168 (1999).114. Woo, C.J., Kharchenko, P.V., Daheron, L., Park, P.J. & Kingston, R.E. A region ofthe human HOXD cluster that confers polycomb-group responsiveness. Cell 140,99–110 (2010).115. Li, G. et al. Jarid2 and PRC2, partners in regulating gene expression. Genes Dev.24, 368–380 (2010).116. Pasini, D. et al. JARID2 regulates binding of the Polycomb repressive complex 2 totarget genes in ES cells. Nature 464, 306–310 (2010).117. Peng, J.C. et al. Jarid2/Jumonji coordinates control of PRC2 enzymatic activity andtarget gene occupancy in pluripotent cells. Cell 139, 1290–1302 (2009).nature biotechnology volume 28 number 10 OCTOBER 2010 1087


118. Shen, X. et al. Jumonji modulates polycomb activity and self-renewal versus differentiation of stem cells. Cell 139, 1303–1314 (2009).
119. Khalil, A.M. et al. Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Proc. Natl. Acad. Sci. USA 106, 11667–11672 (2009).
120. Gupta, R.A. et al. Long non-coding RNA HOTAIR reprograms chromatin state to promote cancer metastasis. Nature 464, 1071–1076 (2010).
121. Guenther, M.G. & Young, R.A. Transcription. Repressive transcription. Science 329, 150–151 (2010).
122. Hochedlinger, K. & Jaenisch, R. Nuclear reprogramming and pluripotency. Nature 441, 1061–1067 (2006).
123. Takahashi, K. & Yamanaka, S. Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell 126, 663–676 (2006).
124. Park, I.H. et al. Reprogramming of human somatic cells to pluripotency with defined factors. Nature 451, 141–146 (2008).
125. Takahashi, K. et al. Induction of pluripotent stem cells from adult human fibroblasts by defined factors. Cell 131, 861–872 (2007).
126. Yu, J. et al. Induced pluripotent stem cell lines derived from human somatic cells. Science 318, 1917–1920 (2007).
127. Maherali, N. et al. Global epigenetic remodeling in directly reprogrammed fibroblasts. Cell Stem Cell 1, 55–70 (2007).
128. Meissner, A., Wernig, M. & Jaenisch, R. Direct reprogramming of genetically unmodified fibroblasts into pluripotent stem cells. Nat. Biotechnol. 25, 1177–1181 (2007).
129. Wernig, M. et al. In vitro reprogramming of fibroblasts into a pluripotent ES-cell-like state. Nature 448, 318–324 (2007).
130. Huangfu, D. et al. Induction of pluripotent stem cells by defined factors is greatly improved by small-molecule compounds. Nat. Biotechnol. 26, 795–797 (2008).
131. Ichida, J.K. et al. A small-molecule inhibitor of tgf-Beta signaling replaces sox2 in reprogramming by inducing nanog. Cell Stem Cell 5, 491–503 (2009).
132. Chin, M.H. et al. Induced pluripotent stem cells and embryonic stem cells are distinguished by gene expression signatures. Cell Stem Cell 5, 111–123 (2009).
133. Polo, J.M. et al. Cell type of origin influences the molecular and functional properties of mouse induced pluripotent stem cells. Nat. Biotechnol. 28, 848–855 (2010).
134. Ng, R.K. & Gurdon, J.B. Epigenetic memory of active gene transcription is inherited through somatic cell nuclear transfer. Proc. Natl. Acad. Sci. USA 102, 1957–1962 (2005).
135. Ng, R.K. & Gurdon, J.B. Epigenetic memory of an active gene state depends on histone H3.3 incorporation into chromatin in the absence of transcription. Nat. Cell Biol. 10, 102–109 (2008).
136. Tamashiro, K.L. et al. Cloned mice have an obese phenotype not transmitted to their offspring. Nat. Med. 8, 262–267 (2002).
137. Bortvin, A. et al. Incomplete reactivation of Oct4-related genes in mouse embryos cloned from somatic nuclei. Development 130, 1673–1680 (2003).
138. Blelloch, R. et al. Reprogramming efficiency following somatic cell nuclear transfer is influenced by the differentiation and methylation state of the donor nucleus. Stem Cells 24, 2007–2013 (2006).
139. Hotchkiss, R.D. The quantitative separation of purines, pyrimidines, and nucleosides by paper chromatography. J. Biol. Chem. 175, 315–332 (1948).
140. Lander, E.S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).
141. Heintzman, N.D. et al. Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome. Nat. Genet. 39, 311–318 (2007).
142. Marson, A. et al. Connecting microRNA genes to the core transcriptional regulatory circuitry of embryonic stem cells. Cell 134, 521–533 (2008).
143. Guttman, M. et al. Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature 458, 223–227 (2009).
144. Khalil, A.M. et al. Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Proc. Natl. Acad. Sci. USA 106, 11667–11672 (2009).


Genomics tools for unraveling chromosome architecture

Bas van Steensel 1 & Job Dekker 2

1 Division of Gene Regulation, Netherlands Cancer Institute, Amsterdam, The Netherlands. 2 Program in Gene Function and Expression, Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts, USA. Correspondence should be addressed to B.v.S. (b.v.steensel@nki.nl) or J.D. (job.dekker@umassmed.edu).

Published online 13 October 2010; doi:10.1038/nbt.1680

The spatial organization of chromosomes inside the cell nucleus is still poorly understood. This organization is guided by intra- and interchromosomal contacts and by interactions of specific chromosomal loci with relatively fixed nuclear ‘landmarks’ such as the nuclear envelope and the nucleolus. Researchers have begun to use new molecular genome-wide mapping techniques to uncover both types of molecular interactions, providing insights into the fundamental principles of interphase chromosome folding.

The three-dimensional (3D) architecture of interphase chromosomes is one of the most fascinating topological problems in biology. Decades of microscopy studies have revealed several important general principles that govern chromosome architecture [1–3]. First, interphase chromosomes each occupy their own territory in the nucleus, with only a limited degree of intermingling. Second, genomic loci tend to be nonrandomly positioned within the nuclear space and relative to each other, strongly suggesting that chromosomes adopt a configuration that is at least partially reproducible. Finally, the degree of compaction of the chromatin fiber varies locally, and is often, but not always, inversely linked to transcriptional activity and gene density.

These important insights have been mostly obtained by fluorescence in situ hybridization (FISH) and in vivo tagging of selected genomic loci [1–3]. The power of these methods lies in their ability to visualize individual loci inside single cell nuclei by light microscopy. However, the resolution limits of light microscopy and the practical restriction that only a few loci can be visualized simultaneously have hampered the construction of detailed models of chromosome architecture. Fortunately, over the past few years several new molecular techniques have been developed toward this goal. These techniques directly probe molecular interactions and thereby offer new views beyond the resolution limits of microscopy. Moreover, by taking advantage of genome-wide detection methods such as high-density microarrays and massively parallel sequencing, researchers can now make comprehensive measurements of structural parameters of chromatin for entire genomes in a single experiment.

In essence, the new techniques focus on detecting two distinct classes of molecular contacts involving the chromatin fiber (Fig. 1 and Table 1). One set of techniques identifies physical interactions of genomic loci with relatively fixed nuclear structures (landmarks) such as the nuclear envelope or the nucleolus. This can yield important information about the position of genomic loci in nuclear space. A second set of techniques monitors physical associations between linearly distant sequences that come together by folding or bending of the chromatin fiber. Such associations may also occur between loci on different chromosomes. Knowledge of intra- and interchromosomal contacts provides insight into the local or global folding of chromosomes, and into the positioning of chromosomes relative to one another. Various chromatin-landmark interactions and chromatin-chromatin contacts have now been mapped systematically.
Here, we highlight these new technological developments and the biological understanding they have yielded so far.

Molecular mapping of genome interactions with nuclear landmarks

The nuclear envelope is the main fixed structure of the nucleus, and has long been thought to provide anchoring sites for interphase chromosomes, and thus to help organize the genome inside the nucleus. The nuclear envelope consists of a double lipid membrane punctured by nuclear pore complexes (NPCs), which act as channels for nuclear import and export [4]. In most metazoan cells, the nucleoplasmic surface of the inner nuclear membrane is coated by a sheet-like protein structure called the nuclear lamina. Its major constituents are nuclear lamins, which form a dense network of polymer fibers [5–7]. Both the nuclear lamina and NPCs were proposed decades ago to provide anchoring sites for interphase chromosomes [8,9]. Indeed, many FISH microscopy studies have supported this model: some genomic loci are preferentially located close to the nuclear envelope, whereas other loci are typically found in the nuclear interior [3,10,11]. However, because of resolution limits it was generally impossible to tell whether these loci are in molecular contact with the nuclear lamina or the NPCs. Recent genome-wide mapping techniques have begun to provide more global insights into the molecular interactions of chromosomes with components of the nuclear envelope.

Interactions of the genome with the nuclear lamina have been mapped by means of DamID technology (Fig. 2). In this application, a protein of the nuclear lamina (typically a lamin) is fused to DNA adenine methyltransferase (Dam) from Escherichia coli. When it is expressed in cells, this chimeric protein is incorporated into the nuclear lamina. As a consequence, DNA that is in molecular contact with the nuclear lamina in vivo is methylated by the tethered Dam. The resulting tags, which are unique because DNA adenine methylation does not occur endogenously in most eukaryotes, can be mapped using a microarray-based readout [12,13]. Through this approach, nuclear lamina interactions have been mapped in detail in Drosophila melanogaster, mouse and human cells [14–16]. In all three species, interactions with the nuclear lamina involve very large genomic domains, rather than focal sites. Mouse and human genomes have >1,000 lamina-associated domains (LADs) with a median size of ~0.5 megabases (Mb). In human cells, several sequence elements demarcate the borders of many LADs, indicating that LAD organization is at least partially encoded in the genome sequence [15].

Although LADs have on average a relatively low gene density, when combined they nevertheless harbor thousands of genes. Notably, most of these genes are transcriptionally inactive [15,16]. This suggests that the nuclear lamina has a repressive role in gene regulation. Consistent with this, deletion of the major lamin in D. melanogaster causes upregulation of some genes associated with the nuclear lamina [17]. Moreover, artificial tethering to the nuclear lamina can cause the downregulation of reporter and some endogenous genes, although this may depend on the reporter or its genomic integration site [18–20]. Furthermore, during differentiation, hundreds of genes show altered interactions with the nuclear lamina. For many genes, detachment from the nuclear lamina occurs concomitant with transcriptional activation; other detached genes initially remain silent but are more prone to activation in a second differentiation step, suggesting that interaction with the nuclear lamina locks these genes in a stably repressed state [16].

Interactions of the genome with NPCs have been studied by both DamID and chromatin immunoprecipitation (ChIP). The latter technique uses cross-linking of protein-DNA interactions with formaldehyde (and sometimes other cross-linking chemicals), followed by mechanical fragmentation of the DNA and subsequent immunoprecipitation using antibodies, in this case antibodies for NPC proteins (Nups). Genome-wide tiling microarrays have been used to identify the immunoprecipitated DNA sequences. In yeast, D. melanogaster and human cells, hundreds of genes are associated with various Nups [21–25]. Notably, detailed analyses in D. melanogaster established that a substantial proportion of these binding events occur in the nuclear interior, involving freely diffusing Nups [23,24]. Although this sheds light on an NPC-independent regulatory role of certain Nups, it also implies that most genome-wide maps of Nup interactions cannot be easily interpreted in terms of spatial organization of the genome, unless one conducts ChIP or DamID experiments with Nups that are only present in the NPC and not in the nucleoplasm. Fornerod and colleagues compared DamID maps obtained with engineered Nups that are either exclusively NPC-associated or mostly nucleoplasmic [23]. True NPC-associated loci thus identified are rather short sequences of

Figure 1 Cartoon of nucleus depicting the spatial interactions that contribute to the overall architecture of interphase chromosomes. Labels A–D refer to corresponding entries in Table 1. (Other labels in the cartoon: NPC, nucleolus, lamina.)

Table 1 Genome contacts and mapping techniques
Genome contacts | Techniques
A. Nuclear lamina | DamID
B. Nuclear pores | ChIP, DamID
C. Nucleolus | Fractionation
D. Intra- and interchromosomal | 3C and derivatives
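
As a rough illustration of the DamID readout described above, the following Python sketch computes a per-probe log ratio of Dam-fusion signal over Dam-only signal (the quantity plotted in Fig. 2) and groups consecutive positive probes into candidate lamina-associated domains. The function names, the pseudocount, the zero threshold and the minimum run length are all assumptions made for illustration; this is not the domain-calling procedure used in the cited studies.

```python
import math

def log2_ratio(fusion, dam_only, pseudocount=1.0):
    """Per-probe log2(Dam-fusion / Dam-only); the pseudocount (an assumption) avoids log(0)."""
    return [math.log2((f + pseudocount) / (d + pseudocount))
            for f, d in zip(fusion, dam_only)]

def call_domains(positions, ratios, min_probes=3):
    """Return (start, end) for runs of at least min_probes consecutive positive probes."""
    domains, run = [], []
    for pos, ratio in zip(positions, ratios):
        if ratio > 0:
            run.append(pos)
        else:
            if len(run) >= min_probes:
                domains.append((run[0], run[-1]))
            run = []
    if len(run) >= min_probes:  # close a run that reaches the last probe
        domains.append((run[0], run[-1]))
    return domains

# Toy data: eight probes with made-up intensities.
positions = [0, 1, 2, 3, 4, 5, 6, 7]
fusion = [80, 90, 85, 10, 12, 70, 75, 9]
dam_only = [20, 20, 20, 20, 20, 20, 20, 20]

ratios = log2_ratio(fusion, dam_only)
domains = call_domains(positions, ratios, min_probes=3)
print(domains)
```

In this toy input only the first three probes form a run long enough to be reported; the two-probe run at positions 5 and 6 is discarded, which mimics the idea that lamina contacts involve large domains rather than focal sites.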


Figure 2 Mapping of interactions of the genome with nuclear landmarks, here shown for the nuclear lamina. See text for explanation. Adenine-methylated DNA is specifically amplified using a PCR-based protocol using restriction endonucleases that selectively digest DNA depending on the adenine-methylation state, as described elsewhere [12,13]. NL, nuclear lamina. (Workflow labels: DNA in contact with NL becomes adenine-methylated; express Dam-fusion protein in cells or tissue; isolate genomic DNA; selectively amplify adenine-methylated DNA fragments; label and hybridize to genomic tiling array. Plot axes: Dam-fusion/Dam log ratio versus position on chromosome 5 (Mb).)

The most widely used molecular method to probe the spatial folding of chromatin is chromosome conformation capture (3C) [30]. 3C determines the relative frequency with which pairs of genomic loci are in direct physical contact. Chromatin is cross-linked with formaldehyde, after which DNA is digested and then re-ligated under dilute conditions that favor intramolecular ligation of cross-linked fragments (Fig. 3 and Table 2). This creates a genome-wide library of 3C ligation products, each of which is composed of a pair of restriction fragments that were sufficiently close to become cross-linked. Interactions detected by 3C can be mediated by proteins that bridge the two loci, but can also reflect coassociation of loci with larger protein complexes, or perhaps even larger subnuclear structures such as nucleoli and transcription factories. Combined, the 3C library reflects the population-averaged folding of the entire genome, at a resolution of several kilobases.

In conventional 3C, the relative abundance of individual ligation products is determined using semiquantitative PCR. Initial 3C analyses in yeast revealed long-range interactions between telomeres, and between centromeres located on different chromosomes, consistent with earlier microscopic observations [30].
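
To make the notion of "relative frequency" of ligation products concrete, here is a minimal Python sketch that treats a 3C library as a list of fragment pairs and reports each pair's share of all ligation products. In a real 3C experiment abundance is estimated by PCR against appropriate controls rather than by direct counting, so the counting step, the function name and the fragment labels are illustrative assumptions only.

```python
from collections import Counter

def contact_frequencies(ligation_products):
    """Map each unordered fragment pair to its share of all ligation products."""
    # Sort each pair so that (a, b) and (b, a) count as the same contact.
    counts = Counter(tuple(sorted(pair)) for pair in ligation_products)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}

# Hypothetical library: each tuple is one ligation product joining two fragments.
library = [
    ("promoter", "enhancer"),
    ("enhancer", "promoter"),
    ("promoter", "control"),
    ("promoter", "enhancer"),
]

freqs = contact_frequencies(library)
print(freqs)
```

Because the library is pooled from many cells, these shares are population averages: a high value says the two fragments were cross-linked together in many cells, not that they touch in every cell.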
The first 3C studies that demonstrated long-range looping interactions between genes and their enhancers focused on the well-studied β-globin locus [31]. Long-range interactions have now been identified in many candidate loci, for example, the Igf2 locus [32], the TH2 cytokine locus [33] and the α-globin locus [34], and in a variety of species, establishing that looping between genes and regulatory elements is a common mechanism for gene regulation. In many cases, gene promoters interact with multiple elements, and these elements often also interact with each other, leading to the formation of complex looped structures, sometimes called chromatin hubs [31].

In an effort to map chromatin interactions at a genome-wide scale, researchers have developed several detection methods to more comprehensively interrogate 3C libraries. 4C and 5C methods detect targeted subsets of 3C ligation products (Table 2) [35–37]. In 4C, inverse PCR is used to amplify all fragments ligated to a single ‘anchor’ fragment to obtain a genome-wide interaction profile for the anchor locus. 5C uses multiplex ligation-mediated amplification to amplify millions of preselected 3C ligation junctions in parallel, for example, between a set of promoters and a set of enhancers. ChIP-loop (also called 6C) and chromatin interaction analysis using paired-end tag sequencing (ChIA-PET) methods include a ChIP step to selectively identify 3C ligation products that are bound by a protein of interest, for example, a transcription factor [38–40]. All these high-throughput methods use microarrays or high-throughput sequencing to analyze the amplified ligation junctions. Careful experimental design of 3C-based methods is crucial to avoid artifacts and misinterpretations, as has been discussed in detail elsewhere [41,42].

Results obtained with these methods confirm that long-range interactions are widespread, and have also been used to identify several new phenomena.
First, long-range interactions can occur over very large genomic distances, up to tens of megabases, suggesting that chromosomes are extensively folded back on themselves. Second, interactions not only occur between specific short functional elements, such as enhancers and promoters, but also occur over larger chromosomal domains. Some groups of genes have many interactions with each other all along their lengths, suggesting these genes are in close spatial proximity, perhaps owing to association with the same subnuclear structure such as the nuclear envelope, or with a transcription factory. Third, interactions occur not only along chromosomes, but also between them. For instance, the X chromosome–inactivation center (Xic) of one X chromosome transiently interacts with the Xic of the other X chromosome while X-chromosome inactivation is established [43–45]. Another example is the trans association of imprinted genes, which may contribute to their regulation [46].

Recently, it has become possible to determine chromatin interactions in a truly unbiased and genome-wide manner, that is, without the need to limit the analysis to one selected anchor or a group of them, or to sites bound by a specific protein [47–49]. The Hi-C technology is also based on 3C, but includes a step before ligation in which the staggered ends of the restriction fragments are filled in with biotinylated nucleotides [48]. As a result, ligation junctions are marked with biotin, allowing subsequent purification after DNA shearing using streptavidin-coated beads. Ligation junctions are then analyzed by paired-end high-throughput sequencing to identify the interacting loci. Hi-C data can be used to study the overall folding of genomes. Currently, for large genomes such as those of human and mouse, Hi-C analysis will produce an interaction map with a resolution of ~0.1 to 1 Mb.
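
The relationship between map resolution and read depth can be sketched with a toy binning routine in Python. It assigns each paired-end read (reduced here to two coordinates on a single chromosome) to a pair of fixed-size bins and counts contacts per bin pair. The single-chromosome simplification, the function name and the example coordinates are assumptions for illustration, not part of any published Hi-C pipeline.

```python
def bin_contacts(read_pairs, chrom_length, bin_size):
    """Count read pairs per pair of genomic bins (symmetric matrix, one chromosome)."""
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in read_pairs:
        i, j = pos_a // bin_size, pos_b // bin_size
        matrix[i][j] += 1
        if i != j:  # mirror off-diagonal contacts to keep the matrix symmetric
            matrix[j][i] += 1
    return matrix

# Three hypothetical read pairs on a 3-Mb chromosome.
pairs = [(100_000, 2_300_000), (150_000, 2_900_000), (1_200_000, 1_250_000)]

coarse = bin_contacts(pairs, chrom_length=3_000_000, bin_size=1_000_000)
fine = bin_contacts(pairs, chrom_length=3_000_000, bin_size=100_000)

print(coarse)
print(max(max(row) for row in fine))
```

At 1-Mb bins the two long-range reads reinforce each other in a single cell of the matrix, whereas at 100-kb bins the same reads fall into separate cells with one count each. Finer bins therefore need many more reads before any cell carries a reliable signal.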
The resolution of Hi-C maps is limited only by the number of sequence reads that current platforms can produce, and expected future increases in throughput and decreases in cost will allow the generation of interaction maps with substantially higher resolution.

The first Hi-C maps of the human genome confirm several features of nuclear organization that were also detected by microscopy, and these maps have already been used to uncover several new aspects of chromosome architecture and nuclear organization [48]. First, chromosomes extensively interact with each other, with some chromosome pairs showing preferred associations. Thus, chromosomes seem to occupy preferred locations with respect to each other. Second, chromosomes are spatially compartmentalized to form two types of nuclear neighborhoods, called A- and B-type compartments. A-type compartments contain active loci (as indicated by gene expression level and the presence of chromatin features associated with active chromatin such as sites that are hypersensitive to DNase I) whereas B-type compartments are composed of inactive chromatin. Spatial separation of active and inactive domains is consistent with earlier observations obtained for individual loci by microscopy [50] and by 4C [35]. Third, Hi-C data, like any 3C-based data, can be modeled using polymer models to uncover folding states of chromatin (for example, refs. 30,51). Computational modeling of Hi-C data revealed that at a length scale of up to several megabases, human chromatin may be folded in a polymer state called a fractal globule [48]. This densely packed state is characterized by the absence of knots and entanglements. This unique conformation allows easy folding and unfolding of sections of chromosomes, which may be relevant for activating and repressing genes.

A variant of Hi-C has also been described that marks ligation junctions with a biotinylated oligonucleotide to facilitate their purification [49]. This method was applied to analysis of the 3D structure of the yeast genome. The data confirmed all the known hallmarks of nuclear organization, including clustering of centromeres and telomeres [52]. Furthermore, interchromosomal interactions were found to occur between tRNA genes and between origins of replication that fire early in S phase.

Together, 3C-based studies suggest a bewildering complexity in long-range communication among a variety of genomic elements across chromosomes and the genome. There is still room for further technological improvements. For instance, there may be some local biases in the interaction maps caused by differences in cross-linkability between chromatin types, and differential access of sequences to the enzymes used in the protocol. Refining the technology may overcome some of these potential limitations. We are only starting to explore the spatial folding of chromosomes, and the new genome-wide 3C methods will probably provide a wealth of new insights.

Figure 3 Principles of the major 3C-based technologies. All protocols start with treatment of cells with formaldehyde (not shown), leading to cross-linking of DNA segments in close proximity to one another. After digestion with one or more restriction enzymes, linked restriction fragments are intramolecularly ligated. In the case of Hi-C, the ends of the restriction fragments are first filled in with biotinylated dNTPs before ligation to facilitate purification of ligation junctions using streptavidin-coated beads. Single or multiple ligation events are detected directly (using 3C, 4C, 5C and Hi-C), or immunoprecipitation is first used to enrich for DNA associated with a protein of interest (using ChIP-loop and ChIA-PET). See Table 2 for overview of different detection strategies and their scope. (Step labels: digestion; Hi-C: fill in with biotin-dCTP; ligation; DNA purification or immunoprecipitation followed by DNA purification; ligation product library; detection by 3C, 4C, 5C, Hi-C, ChIP-loop or ChIA-PET.)

Table 2 Scope and detection methods of 3C-based technologies
Method | Scope | Detection | Example reference
3C | Interaction between two selected loci | Quantitative PCR | 30
4C | Genome-wide interactions of one selected locus | Inverse PCR followed by detection with microarray or sequencing | 35
5C | All interactions among multiple selected loci | Multiplex LMA followed by detection with microarray or sequencing | 37
Hi-C | Unbiased genome-wide interaction map | Marking of junctions with biotin, shearing and ligation junction purification, followed by sequencing | 48
ChIP-loop | Interaction between two selected loci bound by a particular protein | Quantitative PCR | 38
ChIA-PET | Unbiased genome-wide interaction map of loci bound by a particular protein | Insertion of linker into junction, followed by sequencing | 40
See Figure 3 for protocols for these methods. LMA, ligation-mediated amplification.

Toward an integrated view of chromosome architecture

With several new genome-wide detection methods in place, an integrated picture of chromosome architecture seems within reach. Unfortunately, the maps produced so far are derived from diverse cell lines or from different species, so direct comparisons are not yet possible. Nevertheless, we can draw some conclusions and make reasonable speculations. At least in D. melanogaster, NPCs and the nuclear lamina clearly interact with different chromosomal regions, and thus provide two distinct sets of anchoring points. In human cells, LADs and NADs (nucleolus-associated domains) both tend to include centromeric regions [15,27], suggesting that centromeres in each nucleus are distributed between the nuclear lamina and nucleoli. LADs and B-type domains show some marked similarities (in size range and an overall lack of gene activity), suggesting that they must overlap at least in part. If this is true, it suggests that LADs may interact or intermingle with other LADs and form aggregates of compacted chromatin near the nuclear lamina (Fig. 4). This model would explain the substantial amounts of heterochromatin in close contact with the nuclear lamina that have been observed by microscopy.

Figure 4 Speculative cartoon model of chromatin organization. LADs may consist of relatively condensed chromatin (thick lines) and aggregate at the nuclear lamina. Other repressed regions may interact with each other in the nuclear interior, as do active regions. Complexes formed by components of the transcription machinery (transcription factories) and CTCF may tether active regions together. Parts of only two chromosomes are depicted, each in a different color for clarity. Most interactions occur within chromosomes, and relatively few occur between chromosomes. (Labels in the cartoon: transcription machinery, CTCF, lamina, NPC.)

Evidence is accumulating that some epigenetic marks are linked to nuclear organization. The timing of DNA replication along the genome shows a block-like structure of alternating large early- and late-replicating segments [53,54]. A genome-wide comparison indicates that late-replicating domains roughly correspond to LADs [16], consistent with the enrichment of late-replicating sequences at the nuclear periphery [53,55]. However, LADs and late-replicating domains do not overlap perfectly [16], indicating that they are related but not identical. Late-replicating domains also are markedly similar to the B-type domains as identified by Hi-C [56]. Furthermore, the histone modification H3K9me2 has a domain pattern similar to those of LADs [15,16,57] and of segments of late-replicating DNA [56,58]. Taken together, LADs, late-replicating DNA, H3K9me2 domains, and B-type domains all seem closely related, but more systematic comparisons are needed to understand their precise relationships.

The active compartments of the genome, for example, the A-domains identified by Hi-C, may also have cytological correlates. Expressed genes have been observed to cluster at subnuclear foci enriched in transcription machineries, which are sometimes called transcription factories (Fig. 4). In addition, these domains seem to correlate with open chromatin that is replicated early in S phase [56,59].

Another emerging theme is the critical role of the CTCF protein, a multifunctional DNA-binding protein [60]. Extensive 3C-based evidence indicates that CTCF can mediate long-range interactions, both in cis [32,60–62] and in trans [45] (Fig. 4). In addition, borders of human LADs are frequently demarcated by CTCF-binding sites [15], suggesting that CTCF helps control LAD organization. How these observations are linked remains to be elucidated, but CTCF is clearly an important factor in the regulation of chromosome topology.

Stochastic nature of interactions

So far, all genome-wide datasets that describe chromosome architecture are derived from large pools of cells. Yet microscopy studies have shown that the location of individual genomic loci is highly variable from cell to cell, even in clonal cell lines. This variability has two biological sources. First, within each nucleus, chromatin is mobile to a certain degree [63,64].
Second, in a newly formed nucleus after mitosis, the relative positioning of chromosomes may be substantially driven by stochastic processes [65].

It is difficult to calibrate the genome-wide interaction datasets in terms of absolute contact frequencies. Currently this can only be approximated by FISH, which is hampered by insufficient resolution and the possible disruption of chromosome folding by the harsh denaturation conditions the technique requires. Most long-range interactions between chromosomal loci, as detected by 3C-based methods, probably occur in fewer than 10–20% of cells at a given time point [35,66–68]. Contacts of individual LADs and NADs with their respective landmarks may occur in 10–50% of cells [14,27]. We emphasize that these are only rough estimates, subject to arbitrary definitions of contacts used in the respective studies.

The stochastic nature of chromosome architecture raises important questions related to gene regulation. For example, if LADs contact the nuclear lamina only transiently, or only in a subpopulation of cells, then how can such interactions contribute to robust gene repression? One possibility is that a transient contact with the nuclear lamina causes a long-lasting change in the chromatin, for example through a histone-modifying enzyme embedded in the nuclear lamina. Except for enhancer-promoter interactions, the functional relevance of stochastic, relatively low-frequency contacts between linearly distant genes (‘gene kissing’) is mostly unclear. In some cases these contacts correlate with gene expression [66], but to establish causal relationships researchers must experimentally modulate these contacts, for example, by specifically disrupting them and assessing the impact on gene expression and regulation.

Future outlook

A notable theme emerging from studies so far is that metazoan genomes are linearly segmented into large multigene domains, which have specific interactions with nuclear landmarks and each other. This raises the possibility that chromosomal aberrations such as translocations and inversions, which are found in a variety of human genetic disorders [69] and in many types of cancer [70], can disrupt the spatial organization of the affected chromosomes and perhaps thereby alter gene expression [71]. Notably, it was recently shown that this logic can also be turned around: 3C-derived techniques can identify chromosomal aberrations on the basis of altered spatial relationships between loci [72]. Conversely, the spatial organization of the genome may also affect the spectrum of translocations that can occur in a given cell. Loci that are spatially proximal may more frequently engage in translocation than more distant ones [73–75].

Another class of human disorders that may be of interest in the context of chromosome architecture is the so-called laminopathies. These disorders are caused by congenital defects in proteins of the nuclear lamina. For example, mutations in A-type lamins cause a markedly diverse spectrum of disorders including progeria, muscular dystrophy and cardiomyopathy [76]. Some of these disorders may involve changes in chromosome architecture due to altered interactions with the nuclear lamina. Indeed, in cells from patients suffering from Hutchinson-Gilford progeria syndrome (HGPS), which show abnormal accumulation of lamin A at the nuclear lamina, changes have been observed in the morphology and localization of heterochromatin [77,78], although this may be an indirect effect of misregulation of certain chromatin proteins [79]. Mapping of genome–nuclear lamina interactions and chromosome conformation in cells from laminopathy patients may provide important insights into the etiology of this class of disorders.

The initial results of various new genome-wide approaches have already uncovered some important principles of chromosome architecture. Higher-resolution views, particularly for Hi-C, will become available as sequencing throughput continues to ramp up. Yet the probabilistic and dynamic nature of chromatin organization poses practical and conceptual challenges. It would be extremely helpful if techniques for the molecular mapping of chromatin architecture could be scaled down to single cells, as this would directly capture cell-to-cell variation. Although this will be technically demanding, the rapid advances in high-throughput single-molecule DNA sequencing technologies, combined with further development of methods to detect interactions, may offer new opportunities toward reaching this goal.

Acknowledgments

We thank members of the van Steensel and Dekker labs and M. Walhout for suggestions. This work was supported by the Netherlands Genomics Initiative and a Netherlands Organization for Scientific Research–Earth and Life Sciences (NWO-ALW) VICI grant to B.v.S., a grant from the US National Institutes of Health (HG003143) and a W.M. Keck Foundation Distinguished Young Scholar Award to J.D.

COMPETING FINANCIAL INTERESTS

The authors declare no competing financial interests.

Published online at http://www.nature.com/naturebiotechnology/. Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/.


1. Pombo, A. & Branco, M.R. Functional organisation of the genome during interphase. Curr. Opin. Genet. Dev. 17, 451–455 (2007).
2. Misteli, T. Beyond the sequence: cellular organization of genome function. Cell 128, 787–800 (2007).
3. Zhao, R., Bodnar, M.S. & Spector, D.L. Nuclear neighborhoods and gene expression. Curr. Opin. Genet. Dev. 19, 172–179 (2009).
4. Hetzer, M.W. & Wente, S.R. Border control at the nucleus: biogenesis and organization of the nuclear membrane and pore complexes. Dev. Cell 17, 606–616 (2009).
5. Stuurman, N., Heins, S. & Aebi, U. Nuclear lamins: their structure, assembly, and interactions. J. Struct. Biol. 122, 42–66 (1998).
6. Herrmann, H. & Aebi, U. Intermediate filaments: molecular structure, assembly mechanism, and integration into functionally distinct intracellular scaffolds. Annu. Rev. Biochem. 73, 749–789 (2004).
7. Prokocimer, M. et al. Nuclear lamins: key regulators of nuclear structure and activities. J. Cell Mol. Med. 13, 1059–1085 (2009).
8. Franke, W.W. Structure, biochemistry, and functions of the nuclear envelope. Int. Rev. Cytol. 4 (suppl.), 71–236 (1974).
9. Blobel, G. Gene gating: a hypothesis. Proc. Natl. Acad. Sci. USA 82, 8527–8529 (1985).
10. Takizawa, T., Meaburn, K.J. & Misteli, T. The meaning of gene positioning. Cell 135, 9–13 (2008).
11. Fedorova, E. & Zink, D. Nuclear genome organization: common themes and individual patterns. Curr. Opin. Genet. Dev. 19, 166–171 (2009).
12. Greil, F., Moorman, C. & van Steensel, B. DamID: mapping of in vivo protein-genome interactions using tethered DNA adenine methyltransferase. Methods Enzymol. 410, 342–359 (2006).
13. Vogel, M.J., Peric-Hupkes, D. & van Steensel, B. Detection of in vivo protein-DNA interactions using DamID in mammalian cells. Nat. Protoc. 2, 1467–1478 (2007).
14. Pickersgill, H. et al. Characterization of the Drosophila melanogaster genome at the nuclear lamina. Nat. Genet. 38, 1005–1014 (2006).
15. Guelen, L. et al. Domain organization of human chromosomes revealed by mapping of nuclear lamina interactions. Nature 453, 948–951 (2008).
16. Peric-Hupkes, D. et al. Molecular maps of the reorganization of genome–nuclear lamina interactions during differentiation. Mol. Cell 38, 603–613 (2010).
17. Shevelyov, Y.Y. et al. The B-type lamin is required for somatic repression of testis-specific gene clusters. Proc. Natl. Acad. Sci. USA 106, 3282–3287 (2009).
18. Reddy, K.L., Zullo, J.M., Bertolino, E. & Singh, H. Transcriptional repression mediated by repositioning of genes to the nuclear lamina. Nature 452, 243–247 (2008).
19. Finlan, L.E. et al. Recruitment to the nuclear periphery can alter expression of genes in human cells. PLoS Genet. 4, e1000039 (2008).
20. Kumaran, R.I. & Spector, D.L. A genetic locus targeted to the nuclear periphery in living cells maintains its transcriptional competence. J. Cell Biol. 180, 51–65 (2008).
21. Casolari, J.M. et al. Genome-wide localization of the nuclear transport machinery couples transcriptional status and nuclear organization. Cell 117, 427–439 (2004).
22. Brown, C.R., Kennedy, C.J., Delmar, V.A., Forbes, D.J. & Silver, P.A. Global histone acetylation induces functional genomic reorganization at mammalian nuclear pore complexes. Genes Dev. 22, 627–639 (2008).
23. Kalverda, B., Pickersgill, H., Shloma, V.V. & Fornerod, M. Nucleoporins directly stimulate expression of developmental and cell-cycle genes inside the nucleoplasm. Cell 140, 360–371 (2010).
24. Capelson, M. et al. Chromatin-bound nuclear pore components regulate gene expression in higher eukaryotes. Cell 140, 372–383 (2010).
25. Vaquerizas, J.M. et al. Nuclear pore proteins nup153 and megator define transcriptionally active regions in the Drosophila genome. PLoS Genet. 6, e1000846 (2010).
26. Schermelleh, L. et al. Subdiffraction multicolor imaging of the nuclear periphery with 3D structured illumination microscopy. Science 320, 1332–1336 (2008).
27. Németh, A. et al. Initial genomics of the human nucleolus. PLoS Genet. 6, e1000889 (2010).
28. Stahl, A., Hartung, M., Vagner-Capodano, A.M. & Fouet, C. Chromosomal constitution of nucleolus-associated chromatin in man. Hum. Genet. 35, 27–34 (1976).
29. Thompson, M., Haeusler, R.A., Good, P.D. & Engelke, D.R. Nucleolar clustering of dispersed tRNA genes. Science 302, 1399–1401 (2003).
30. Dekker, J., Rippe, K., Dekker, M. & Kleckner, N. Capturing chromosome conformation. Science 295, 1306–1311 (2002).
31. Tolhuis, B., Palstra, R.J., Splinter, E., Grosveld, F. & de Laat, W. Looping and interaction between hypersensitive sites in the active β-globin locus. Mol. Cell 10, 1453–1465 (2002).
32. Murrell, A., Heeson, S. & Reik, W. Interaction between differentially methylated regions partitions the imprinted genes Igf2 and H19 into parent-specific chromatin loops. Nat. Genet. 36, 889–893 (2004).
33. Spilianakis, C.G. & Flavell, R.A. Long-range intrachromosomal interactions in the T helper type 2 cytokine locus. Nat. Immunol. 5, 1017–1027 (2004).
34. Vernimmen, D., De Gobbi, M., Sloane-Stanley, J.A., Wood, W.G. & Higgs, D.R. Long-range chromosomal interactions regulate the timing of the transition between poised and active gene expression. EMBO J. 26, 2041–2051 (2007).
35. Simonis, M. et al. Nuclear organization of active and inactive chromatin domains uncovered by chromosome conformation capture-on-chip (4C). Nat. Genet. 38, 1348–1354 (2006).
36. Zhao, Z. et al. Circular chromosome conformation capture (4C) uncovers extensive networks of epigenetically regulated intra- and interchromosomal interactions. Nat. Genet. 38, 1341–1347 (2006).
37. Dostie, J. et al. Chromosome Conformation Capture Carbon Copy (5C): a massively parallel solution for mapping interactions between genomic elements. Genome Res. 16, 1299–1309 (2006).
38. Horike, S., Cai, S., Miyano, M., Cheng, J.F. & Kohwi-Shigematsu, T. Loss of silent-chromatin looping and impaired imprinting of DLX5 in Rett syndrome. Nat. Genet. 37, 31–40 (2005).
39. Tiwari, V.K., Cope, L., McGarvey, K.M., Ohm, J.E. & Baylin, S.B. A novel 6C assay uncovers Polycomb-mediated higher order chromatin conformations. Genome Res. 18, 1171–1179 (2008).
40. Fullwood, M.J. et al. An oestrogen-receptor-α-bound human chromatin interactome. Nature 462, 58–64 (2009).
41. Simonis, M., Kooren, J. & de Laat, W. An evaluation of 3C-based methods to capture DNA interactions. Nat. Methods 4, 895–901 (2007).
42. Dekker, J. The three ‘C’s of chromosome conformation capture: controls, controls, controls. Nat. Methods 3, 17–21 (2006).
43. Xu, N., Tsai, C.L. & Lee, J.T. Transient homologous chromosome pairing marks the onset of X inactivation. Science 311, 1149–1152 (2006).
44. Bacher, C.P. et al. Transient colocalization of X-inactivation centres accompanies the initiation of X inactivation. Nat. Cell Biol. 8, 293–299 (2006).
45. Xu, N., Donohoe, M.E., Silva, S.S. & Lee, J.T. Evidence that homologous X-chromosome pairing requires transcription and Ctcf protein. Nat. Genet. 39, 1390–1396 (2007).
46. Sandhu, K.S. et al. Nonallelic transvection of multiple imprinted loci is organized by the H19 imprinting control region during germline development. Genes Dev. 23, 2598–2603 (2009).
47. Rodley, C.D., Bertels, F., Jones, B. & O’Sullivan, J.M. Global identification of yeast chromosome interactions using genome conformation capture. Fungal Genet. Biol. 46, 879–886 (2009).
48. Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293 (2009).
49. Duan, Z. et al. A three-dimensional model of the yeast genome. Nature 465, 363–367 (2010).
50. Shopland, L.S. et al. Folding and organization of a contiguous chromosome region according to the gene distribution pattern in primary genomic sequence. J. Cell Biol. 174, 27–38 (2006).
51. Dekker, J. Mapping in vivo chromatin interactions in yeast suggests an extended chromatin fiber with regional variation in compaction. J. Biol. Chem. 283, 34532–34540 (2008).
52. Taddei, A., Schober, H. & Gasser, S.M. The budding yeast nucleus. Cold Spring Harb. Perspect. Biol. 2, a000612 (2010).
53. Hiratani, I. et al. Global reorganization of replication domains during embryonic stem cell differentiation. PLoS Biol. 6, e245 (2008).
54. Schwaiger, M. et al. Chromatin state marks cell-type- and gender-specific replication of the Drosophila genome. Genes Dev. 23, 589–601 (2009).
55. O’Keefe, R.T., Henderson, S.C. & Spector, D.L. Dynamic organization of DNA replication in mammalian cell nuclei: spatially and temporally defined replication of chromosome-specific α-satellite DNA sequences. J. Cell Biol. 116, 1095–1110 (1992).
56. Ryba, T. et al. Evolutionarily conserved replication timing profiles predict long-range chromatin interactions and distinguish closely related cell types. Genome Res. 20, 761–770 (2010).
57. Wen, B., Wu, H., Shinkai, Y., Irizarry, R.A. & Feinberg, A.P. Large histone H3 lysine 9 dimethylated chromatin blocks distinguish differentiated from embryonic stem cells. Nat. Genet. 41, 246–250 (2009).
58. Yokochi, T. et al. G9a selectively represses a class of late-replicating genes at the nuclear periphery. Proc. Natl. Acad. Sci. USA 106, 19363–19368 (2009).
59. Gilbert, N. et al. Chromatin architecture of the human genome: gene-rich domains are enriched in open chromatin fibers. Cell 118, 555–566 (2004).
60. Phillips, J.E. & Corces, V.G. CTCF: master weaver of the genome. Cell 137, 1194–1211 (2009).
61. Splinter, E. et al. CTCF mediates long-range chromatin looping and local histone modification in the β-globin locus. Genes Dev. 20, 2349–2354 (2006).
62. Majumder, P., Gomez, J.A., Chadwick, B.P. & Boss, J.M. The insulator factor CTCF controls MHC class II gene expression and is required for the formation of long-distance chromatin interactions. J. Exp. Med. 205, 785–798 (2008).
63. Soutoglou, E. & Misteli, T. Mobility and immobility of chromatin in transcription and genome stability. Curr. Opin. Genet. Dev. 17, 435–442 (2007).
64. Chuang, C.H. & Belmont, A.S. Moving chromatin within the interphase nucleus-controlled transitions? Semin. Cell Dev. Biol. 18, 698–706 (2007).
65. Bolzer, A. et al. Three-dimensional maps of all chromosomes in human male fibroblast nuclei and prometaphase rosettes. PLoS Biol. 3, e157 (2005).
66. Osborne, C.S. et al. Active genes dynamically colocalize to shared sites of ongoing transcription. Nat. Genet. 36, 1065–1071 (2004).
67. Spilianakis, C.G., Lalioti, M.D., Town, T., Lee, G.R. & Flavell, R.A. Interchromosomal associations between alternatively expressed loci. Nature 435, 637–645 (2005).
68. Miele, A., Bystricky, K. & Dekker, J. Yeast silent mating type loci form heterochromatic clusters through silencer protein-dependent long-range interactions. PLoS Genet. 5, e1000478 (2009).
69. Shaw, C.J. & Lupski, J.R. Implications of human genome architecture for rearrangement-based disorders: the genomic basis of disease. Hum. Mol. Genet. 13 Spec No 1, R57–R64 (2004).
70. Mitelman, F., Johansson, B. & Mertens, F. The impact of translocations and gene fusions on cancer causation. Nat. Rev. Cancer 7, 233–245 (2007).
71. Harewood, L. et al. The effect of translocation-induced nuclear reorganization on gene expression. Genome Res. 20, 554–564 (2010).
72. Simonis, M. et al. High-resolution identification of balanced and complex chromosomal rearrangements by 4C technology. Nat. Methods 6, 837–842 (2009).
73. Roix, J.J., McQueen, P.G., Munson, P.J., Parada, L.A. & Misteli, T. Spatial proximity of translocation-prone gene loci in human lymphomas. Nat. Genet. 34, 287–291 (2003).
74. Lin, C. et al. Nuclear receptor-induced chromosomal proximity and DNA breaks underlie specific translocations in cancer. Cell 139, 1069–1083 (2009).
75. Mani, R.S. et al. Induced chromosomal proximity and gene fusions in prostate cancer. Science 326, 1230 (2009).
76. Worman, H.J., Fong, L.G., Muchir, A. & Young, S.G. Laminopathies and the long strange trip from basic cell biology to therapy. J. Clin. Invest. 119, 1825–1836 (2009).
77. Goldman, R.D. et al. Accumulation of mutant lamin A causes progressive changes in nuclear architecture in Hutchinson-Gilford progeria syndrome. Proc. Natl. Acad. Sci. USA 101, 8963–8968 (2004).
78. Taimen, P. et al. A progeria mutation reveals functions for lamin A in nuclear assembly, architecture, and chromosome organization. Proc. Natl. Acad. Sci. USA 106, 20788–20793 (2009).
79. Pegoraro, G. et al. Ageing-related chromatin defects through loss of the NURD complex. Nat. Cell Biol. 11, 1261–1267 (2009).


Analysis

Comparison of sequencing-based methods to profile DNA methylation and identification of monoallelic epigenetic modifications

R Alan Harris1,*, Ting Wang2, Cristian Coarfa1, Raman P Nagarajan3, Chibo Hong3, Sara L Downey3, Brett E Johnson3, Shaun D Fouse3, Allen Delaney4, Yongjun Zhao4, Adam Olshen3, Tracy Ballinger5, Xin Zhou2, Kevin J Forsberg2, Junchen Gu2, Lorigail Echipare6, Henriette O’Geen6, Ryan Lister7, Mattia Pelizzola7, Yuanxin Xi8, Charles B Epstein9, Bradley E Bernstein9–11, R David Hawkins12, Bing Ren12,13, Wen-Yu Chung14,15, Hongcang Gu9, Christoph Bock9,16–18, Andreas Gnirke9, Michael Q Zhang14,15, David Haussler5, Joseph R Ecker7, Wei Li8, Peggy J Farnham6, Robert A Waterland1,19, Alexander Meissner9,16,17, Marco A Marra4, Martin Hirst4, Aleksandar Milosavljevic1 & Joseph F Costello3

* A full list of author affiliations appears at the end of the paper.
Published online 19 September 2010; doi:10.1038/nbt.1682

Analysis of DNA methylation patterns relies increasingly on sequencing-based profiling methods. The four most frequently used sequencing-based technologies are the bisulfite-based methods MethylC-seq and reduced representation bisulfite sequencing (RRBS), and the enrichment-based techniques methylated DNA immunoprecipitation sequencing (MeDIP-seq) and methylated DNA binding domain sequencing (MBD-seq). We applied all four methods to biological replicates of human embryonic stem cells to assess their genome-wide CpG coverage, resolution, cost, concordance and the influence of CpG density and genomic context. The methylation levels assessed by the two bisulfite methods were concordant (their difference did not exceed a given threshold) for 82% of CpGs and 99% of the non-CpG cytosines. Using binary methylation calls, the two enrichment methods were 99% concordant and regions assessed by all four methods were 97% concordant. We combined MeDIP-seq with methylation-sensitive restriction enzyme sequencing (MRE-seq) for comprehensive methylome coverage at lower cost. This, along with RNA-seq and ChIP-seq of the ES cells, enabled us to detect regions with allele-specific epigenetic states, identifying most known imprinted regions and new loci with monoallelic epigenetic marks and monoallelic expression.

DNA methylation plays a vital role in embryonic development, maintenance of pluripotency, X-chromosome inactivation and genomic imprinting through regulation of transcription, chromatin structure and chromosome stability [1]. It occurs at the C5 position of cytosines within CpG dinucleotides [2–4] and at non-CpG cytosines in plants and embryonic stem cells (ESCs) in mammals. 5-Hydroxymethylation of cytosine also occurs in certain human and mouse cells [5,6] and is catalyzed by Tet proteins acting on methylated cytosine [7]. Several experimental methods detect methylation but not hydroxymethylation, whereas others detect both but cannot distinguish them.

Understanding the role of DNA methylation in development and disease requires knowledge of the distribution of these modifications in the genome. The availability of reference genome assemblies and massively parallel sequencing has led to methods that provide high-resolution, genome-wide profiles of 5-methylcytosine [8–16]. In contrast to arrays, sequencing-based methods can interrogate DNA methylation in repetitive sequences and more readily allow epigenetic states to be assigned to specific alleles. The unique characteristics of each method leave uncertainty about how to select the method best suited to answer particular biological questions. DNA methylation maps are being produced by many laboratories worldwide, and their integration forms a basis for emerging international epigenome projects [17]. Thus, it is critical to determine the precision of each method, and how reliably they can be compared.

Here, we provide a detailed and quantitative comparison of four sequencing-based methods for genome-wide DNA methylation profiling. We focused on two methods that use bisulfite conversion (MethylC-seq [8] and RRBS [9]), and two methods that use enrichment of methylated DNA (MeDIP-seq [10,11] and MBD-seq [12]). We also developed an integrative methodology combining MeDIP-seq to detect methylated CpGs with MRE-seq [13,14] to detect unmethylated CpGs. Unlike the enrichment methods alone, the integrative method can accurately identify regions of intermediate methylation, which, in conjunction with single nucleotide polymorphism (SNP) profiling from the sequencing data, permits genome-wide identification of allele-specific epigenetic states.

RESULTS

Generation of DNA methylation profiles from human ESCs

Four individual sequencing-based methods and one integrative method were used to generate and compare DNA methylation profiles of three biological replicates of H1 ESCs. MethylC-seq (data used here are from ref. 8) involves shotgun sequencing of DNA treated with bisulfite, a chemical that converts unmethylated cytosines but not methylated cytosines to uracil. The second bisulfite-based method, RRBS [9], reduces the portion of the genome analyzed through MspI digestion and fragment size selection. MeDIP-seq [10,11] and MBD-seq [12] involve enrichment of methylated regions followed by sequencing. In MeDIP-seq, an anti-methylcytosine antibody is used to immunoprecipitate methylated single-stranded DNA fragments. MBD-seq uses the MBD2 protein methyl-CpG binding domain to enrich for methylated double-stranded DNA fragments. As a complementary approach for use in conjunction with methylated fragment enrichment methods, unmethylated CpGs are identified by sequencing size-selected fragments from parallel DNA digestions with the methyl-sensitive restriction enzymes (MREs) HpaII (C^CGG), Hin6I (G^CGC) and AciI (C^CGC) (MRE-seq) [13].

Figure 1 CpG coverage by each method. (a,b) The percentage of CpGs covered genome-wide (a) or in CpG islands (b) is plotted as a function of read-coverage threshold. (c) The percentage of genome-wide CpGs (28,163,863) covered by multiple, single or no methods is shown.

[Panels a and b: line plots. x axis: read coverage threshold for CpGs (1–29). y axis: percent CpGs covered genome-wide (a) or in CpG islands (b), 0–100. Series: MethylC-seq no. 3, RRBS no. 3, MeDIP-seq no. 2, MBD-seq no. 2, MRE-seq no. 2.]

Panel c:
Method(s)                       Genome-wide CpGs covered by method(s)
Coverage by 4 methods
  MethylC, RRBS, MeDIP, MBD      6.32%
Coverage by 3 methods
  MethylC, RRBS, MeDIP           0.81%
  MethylC, RRBS, MBD             1.46%
  MethylC, MeDIP, MBD           39.09%
  RRBS, MeDIP, MBD               0.31%
Coverage by 2 methods
  MethylC, RRBS                  2.30%
  MethylC, MeDIP                19.95%
  MethylC, MBD                  10.27%
  RRBS, MeDIP                    0.03%
  RRBS, MBD                      0.68%
  MeDIP, MBD                     0.61%
Coverage by 1 method
  MethylC                       14.73%
  RRBS                           0.37%
  MeDIP                          0.09%
  MBD                            1.77%
No coverage
  None                           1.21%

To reliably identify biological variation in methylation among samples from different individuals or biological states, one must determine the variation attributable to biological and technical replication. As an initial assessment of DNA methylation concordance among three H1 ESC biological replicates, the methylation status of 27,578 CpGs was assayed on the widely used bisulfite-based Infinium bead-array. The Infinium method involves bisulfite conversion and hybridization, rather than sequencing. The beta values, roughly representing CpG methylation levels, in the replicates were compared by calculating concordance correlation coefficients. The coefficients were very high, ranging from 0.992 to 0.996 (Supplementary Fig. 1). Replicate no. 1 and replicate no. 2 were run a second time on the Infinium platform to assess technical variation (data not shown). Most (98.9%) of the total variation (technical and biological) was technical. Thus, platform comparisons using these replicates should be very informative.

As a second and more comprehensive analysis of variation in methylation calls, RRBS (covering ~1.6 million CpGs), MeDIP-seq and MRE-seq were performed on all three biological replicates. The correlation between the biological replicates was high for RRBS (Supplementary Fig. 2), as it was for MeDIP-seq and MRE-seq (Supplementary Fig. 3). These results show that cell passage–related ‘biological variation’ in methylation is present but minimal on the scale of the genome. The rare biological variation in methylation levels was confirmed by pyrosequencing of selected loci (Supplementary Fig. 4 and Supplementary Table 1).

Several algorithms are available for bisulfite-treated short-read mapping, differences in which might alter local read density in a map, and ultimately affect methylation calls. Our assessment of overall concordance between aligners, including Bowtie [18], BSMAP [19], Pash [20], RMAP [21] and ZOOM [22] applied to a subset of the MethylC-seq data [9], indicated that, despite differences in speed and accuracy, aligner choice was unlikely to have a significant impact on the platform comparisons (Supplementary Table 2).

There are several important parameters in choosing an appropriate method for particular experimental goals, including the total number and local context of CpGs interrogated and the amount of sequencing required. To determine the impact of sequencing depth on coverage, we plotted CpG coverage genome-wide (Fig. 1a) and in CpG islands (Fig. 1b) as a function of read coverage threshold for CpGs. For MeDIP-seq and MBD-seq, the CpG coverage does not include CpGs for which a lack of methylation could be inferred from lack of reads (Fig. 1a,b). Thus, because CpG islands are predominantly unmethylated, the CpG coverage in CpG islands is lower for the enrichment methods than for either RRBS or MethylC-seq. As an indicator of the cost efficiency for each method, we also plotted the CpG coverage normalized to a single gigabase pair (Gbp) of sequence depth in the methylome maps (Supplementary Fig. 5). Enrichment methods had the lowest cost per CpG covered genome-wide, whereas RRBS had the lowest cost per CpG covered in CpG islands.

For the enrichment methods we examined the potential effect of CpG density on read coverage. Most of the genome is methylated and CpG poor, but a small fraction is unmethylated and CpG rich (that is, CpG islands). Consistent with this, MeDIP-seq and MBD-seq enrich primarily for low CpG density regions, along with a small subset of methylated CpG islands. In contrast, MRE-seq interrogates higher CpG density regions because they have an abundance of unmethylated recognition sites for these enzymes. Therefore, the coverage of MRE-seq and enrichment methods is notably complementary (Supplementary Fig. 6).

A major advantage of the sequencing-based methods over microarrays is their ability to interrogate CpGs in repetitive elements. Approximately 45% of the human genome is derived from transposable elements, a major driving force in the evolution of mammalian gene regulation [23,24], with nearly half of all CpGs falling within these repetitive regions. The extent to which different sequencing-based methods interrogate repeats is therefore of considerable interest. In general, genome-wide CpG coverage (Fig. 1a) was proportional to CpG coverage in repeats (Table 1). The percentage of interrogated CpGs in repeats was similar across all four methods, with MBD-seq capturing the highest fraction of repeat sequences (59.1%). Each of these methods is therefore useful for investigating this important and largely unexplored area. MRE-seq, however, only minimally interrogates repeats, consistent with their dense methylation.
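The intersection percentages reported in Fig. 1c can be summed into per-method totals and pairwise overlaps. The following is a minimal Python sketch of that arithmetic, assuming the values are transcribed from Fig. 1c as listed here; the names FIG_1C, coverage_by_method and overlap are illustrative and do not come from the paper.

```python
# Percentage of genome-wide CpGs covered by exactly each combination of
# methods, transcribed from Fig. 1c (assumption: transcription is faithful).
FIG_1C = {
    ("MethylC", "RRBS", "MeDIP", "MBD"): 6.32,
    ("MethylC", "RRBS", "MeDIP"): 0.81,
    ("MethylC", "RRBS", "MBD"): 1.46,
    ("MethylC", "MeDIP", "MBD"): 39.09,
    ("RRBS", "MeDIP", "MBD"): 0.31,
    ("MethylC", "RRBS"): 2.30,
    ("MethylC", "MeDIP"): 19.95,
    ("MethylC", "MBD"): 10.27,
    ("RRBS", "MeDIP"): 0.03,
    ("RRBS", "MBD"): 0.68,
    ("MeDIP", "MBD"): 0.61,
    ("MethylC",): 14.73,
    ("RRBS",): 0.37,
    ("MeDIP",): 0.09,
    ("MBD",): 1.77,
    (): 1.21,  # CpGs covered by none of the four methods
}


def coverage_by_method(table):
    """Total percentage of CpGs covered by each method (hypothetical helper)."""
    totals = {}
    for methods, pct in table.items():
        for method in methods:
            totals[method] = totals.get(method, 0.0) + pct
    return {method: round(total, 2) for method, total in totals.items()}


def overlap(table, *required):
    """Percentage of CpGs covered by all of the `required` methods at once."""
    wanted = set(required)
    return round(sum(pct for methods, pct in table.items()
                     if wanted <= set(methods)), 2)


totals = coverage_by_method(FIG_1C)
print(totals)                            # MethylC 94.93, RRBS 12.28, MeDIP 67.21, MBD 60.51
print(overlap(FIG_1C, "MeDIP", "MBD"))   # 46.33
```

Under these assumptions the sums reproduce the rounded per-method coverage figures quoted in the text (95%, 67%, 61% and 12%) and the 46.33% overlap between MeDIP-seq and MBD-seq.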


Table 1 Critical parameters in sequencing-based DNA methylation profiling

Method       H1 DNA       Total bases  Total high-     Total bases  Maximum     1-read coverage of   Percentage of
             sample no.   generated    quality bases   in map       resolution  CpGs in repeats      assayed CpGs
                          (Gbp)        (Gbp)           (Gbp)        (bp)        (no., %)             in repeats (%)
MethylC-seq  no. 3        172.49       115             87.5         1           13,303,415 (91.8)    49.7
RRBS         no. 3        1.58         1.43            1.28         1           1,646,649 (11.4)     47.5
MeDIP-seq    no. 1        3.42         2.07            1.95         150         10,004,670 (68.3)    52.9
MeDIP-seq    no. 2        3.02         1.84            1.73         150         10,101,868 (68.9)    53.2
MeDIP-seq    nos. 1 + 2   6.44         3.91            3.68         150         11,693,059 (79.8)    53.5
MBD-seq      no. 2        5.67         3.71            2.21         150         10,080,007 (68.8)    59.1
MRE-seq      no. 1        3.61         1.31            0.96         1           306,635 (2.07)       21.7
MRE-seq      no. 2        4.03         1.69            1.3          1           232,885 (1.59)       18.6

Sequencing statistics and CpG coverage are shown for MethylC-seq (207 lanes; data analyzed here were from ref. 8), RRBS (2 lanes), MeDIP-seq (4 lanes each), MBD-seq (3 lanes) and MRE-seq (3 lanes each). As the amount of sequence produced per lane is increasing, we also provide “Gbp of sequence” as a measure of the relative cost of each method. The methods differ significantly in total bases generated by the Illumina sequencer, total high-quality bases passing Illumina chastity filtering and mapping uniquely, and total bases used for generating methylome maps (high-quality bases passing redundancy filters). The H1 replicates assayed and the Gbp of sequence at successive processing stages by each method are shown. The bisulfite-based methods and MRE-seq resolve the methylation status of individual cytosines, whereas the MeDIP-seq and MBD-seq read mappings are extended to 150 bp, resulting in a maximum resolution of 150 bp. This extension is applied to calculations of CpG coverage but is not applied to the Gbp of sequence at the processing stages. Coverage information is shown for repeats (primarily transposon sequences) genome-wide. Although the maximum resolution of each method is reported, resolution can be assessed at various levels. As the level of resolution decreases, for example as a consequence of averaging methylation scores over a larger window, imperfect coverage and limited accuracy become less limiting, provided that the average score is not affected by systematic biases in coverage and accuracy. Thus, methylome coverage and accuracy in methylation calls are a function of resolution.

Only CpGs interrogated in common can be compared directly. The intersections of CpGs covered by the four methods were therefore determined (Fig. 1c). Overall, at the sequencing depth investigated, MethylC-seq provided the highest CpG coverage at 95%, followed by MeDIP-seq at 67% and MBD-seq at 61%. RRBS covered the fewest CpGs genome-wide (12%), which drove the overlap of all methods to 6% of genome-wide CpGs.

For any given method, how deeply to sequence the library is an open question. As the sequencing depth increases, the number of unique reads covering a particular region approaches the total possible reads present in the library for each enriched region. Saturation occurs when further sequencing fails to discover additional regions above background. To understand the extent to which we sampled the regions represented in the RRBS, MeDIP-seq and MBD-seq libraries, saturation analysis was performed. RRBS approaches but does not reach saturation at the current sequencing depth (Supplementary Fig. 7a). For MeDIP-seq and MBD-seq, saturation was observed when false-discovery rate thresholds were applied, but not when unthresholded data were plotted (Supplementary Fig. 7b,c). Saturation was not observed for MRE-seq (Supplementary Fig. 7d,e), although the average restriction site was represented 13 times within each library, indicating that additional reads would mostly resample restriction sites already interrogated. Sequencing beyond saturation improves confidence in the observations and increases the CpG coverage, though at greater cost per CpG covered. Thus, sequencing below or up to saturation maximizes the number of samples that can be analyzed, whereas sequencing beyond saturation maximizes CpG coverage and improves confidence in methylation calls.

Comparison of bisulfite-based methods

Several observations from the CpG coverage analysis of MethylC-seq and RRBS are important to consider before assessing their concordance in methylation calls. First, RRBS provides substantial coverage of CpGs in CpG islands, but low CpG coverage genome-wide (Fig. 1a,b). In contrast, MethylC-seq offers greater CpG coverage genome-wide. When coverage is normalized to 1 Gbp of sequence in the methylome map, RRBS shows higher coverage of CpGs in CpG islands at all read depths (Supplementary Fig. 5). This difference points to RRBS as the method of choice if CpG islands are the main focus of a study. However, at lower read thresholds, MethylC-seq sampled far more CpGs in CpG islands than RRBS (Fig. 1b).

A major advantage of bisulfite-based methods is that they allow quantitative comparisons of methylation levels at single-base resolution. For MethylC-seq and RRBS, we calculated and compared the proportion of methylated reads at individual CpGs genome-wide. High concordance was observed using a simple method that makes methylation status calls at different minimum read depths and allows multiple methylation value cutoffs to be examined (Fig. 2a). The difference in methylation proportions between MethylC-seq and RRBS at a minimum read depth of 5 was calculated for individual CpGs, and concordance was declared if the difference did not exceed a given threshold (Fig. 2b). Of the CpGs compared between MethylC-seq and RRBS, just 12.75% displayed identical methylation levels (a difference threshold of zero). When the difference threshold was relaxed to 0.1 or 0.25, the concordance increased to 53.85% or 81.82%, respectively. This analysis was also performed at minimum read depths of 2 and 10 (Supplementary Fig. 8a,b), which, for the 0.25 threshold, showed concordance of 80.28% and 83.89%, respectively, demonstrating that read depth has only a modest effect on concordance. We also performed this analysis for MethylC-seq on replicate no. 3 compared to RRBS on replicate nos. 1 and 2, which showed a similar concordance (79.64% for nos. 3 and 1; 82.95% for nos. 3 and 2) (Supplementary Fig. 8c–f). The concordance between MethylC-seq and RRBS, both on replicate no. 3 (81.82%), falls between the concordances for different replicates. RRBS on replicate nos. 1 and 2 was also compared (Supplementary Fig. 8g,h) and showed a higher concordance (91.54%) than any of the comparisons between MethylC-seq and RRBS, consistent with their high correlation coefficient (Supplementary Fig. 2). The RRBS and MethylC-seq discordant calls were not attributable to the local CpG density or genomic context of the individual CpGs (Fig. 2c,d). Taken together, these analyses suggest that differences between replicates are attributable to technical or stochastic factors as well as modest biological variation.

Given the notable presence of non-CpG cytosine methylation in H1 ESCs [8], we also examined concordance between MethylC-seq and RRBS at CHH and CHG cytosines. Because CHH sites are asymmetric with respect to strand and 98% of CHG sites are hemi-methylated [8], reads mapping to each strand were considered separately. When non-CpG cytosines were considered, either with (Supplementary Fig. 9) or without (Supplementary Fig. 10) sites at which the methylation percentage was zero (lack of methylation), concordance was higher than concordance at CpGs. However, a lower degree of variation at non-CpG sites is expected because of the relatively narrow range of methylation levels for non-CpG sites.
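The difference-threshold concordance described above for MethylC-seq and RRBS can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the read counts are invented toy data, and the names methylated_proportion and percent_concordant are hypothetical. It follows the definitions in the text: the methylated proportion is methylated reads / (methylated reads + unmethylated reads), only CpGs meeting the minimum read depth in both data sets are compared, and a CpG is concordant if the absolute difference does not exceed the threshold.

```python
from typing import Dict, Tuple

Counts = Tuple[int, int]  # (methylated reads, unmethylated reads) at one CpG


def methylated_proportion(counts: Counts) -> float:
    """Methylated reads / (methylated reads + unmethylated reads)."""
    methylated, unmethylated = counts
    return methylated / (methylated + unmethylated)


def percent_concordant(a: Dict[str, Counts], b: Dict[str, Counts],
                       min_depth: int = 5, max_difference: float = 0.25) -> float:
    """Percentage of shared CpGs whose proportions differ by <= max_difference."""
    # Keep only CpGs covered by at least `min_depth` reads in both data sets.
    shared = [cpg for cpg in a
              if cpg in b and sum(a[cpg]) >= min_depth and sum(b[cpg]) >= min_depth]
    if not shared:
        raise ValueError("no CpGs pass the read-depth filter in both data sets")
    concordant = sum(
        1 for cpg in shared
        if abs(methylated_proportion(a[cpg]) - methylated_proportion(b[cpg]))
        <= max_difference
    )
    return 100.0 * concordant / len(shared)


# Toy read counts (invented for illustration only).
methylc = {"cpg1": (9, 1), "cpg2": (5, 5), "cpg3": (0, 10), "cpg4": (2, 1)}
rrbs = {"cpg1": (9, 1), "cpg2": (1, 9), "cpg3": (1, 9), "cpg4": (3, 3)}

print(round(percent_concordant(methylc, rrbs, min_depth=5, max_difference=0.25), 2))  # 66.67
print(round(percent_concordant(methylc, rrbs, min_depth=5, max_difference=0.0), 2))   # 33.33
print(round(percent_concordant(methylc, rrbs, min_depth=2, max_difference=0.25), 2))  # 75.0
```

In the toy data, cpg4 is excluded at a minimum depth of 5 because one data set covers it with only three reads, which mirrors how the number of comparable CpGs shrinks as the read-depth requirement rises.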


[Figure 2, panel a: table]

Minimum read depth   CpGs covered   % genome-wide CpGs   % concordant (0.80–0.20 cutoff)   % concordant (0.75–0.25 cutoff)   % concordant (0.20 cutoff)
2                    2,542,763      9.03                 68.35                             72.86                             94.14
5                    1,681,719      5.97                 67.40                             72.28                             96.15
10                   913,230        3.24                 67.79                             73.20                             97.13

[Figure 2, panel b: histogram of 1,681,719 CpGs (minimum five reads). x axis: difference between MethylC-seq and RRBS CpG methylation proportions (−1.00 to 1.00); y axis: count of CpGs (10^3) with difference. At ±0.25, 81.82% of CpGs are concordant (3.57% and 14.61% discordant in the two tails); at ±0.1, 53.85% are concordant (13.81% and 32.34% discordant). Panels c,d: CpG density (high, >7%; medium, 5–7%; low) and genomic context of concordant and discordant CpGs.]

Figure 2 Comparison of bisulfite-based methods. (a) Calls of highly/partially/weakly methylated (0.80–0.20 or 0.75–0.25 cutoff) or highly/weakly methylated (0.20 cutoff) were made for CpGs covered at several minimum read depths by MethylC-seq and by RRBS (both on replicate no. 3). The number and percent of genome-wide CpGs covered and the percent of concordant calls are shown for each minimum read depth and methylation call cutoff. (b) Differences (MethylC-seq − RRBS) in methylated proportions (methylated reads/(methylated reads + unmethylated reads)) for CpGs with a minimum coverage of five reads by both methods. Percentages of concordant and discordant methylation were determined at cutoffs of ±0.1 (green dashed lines) and ±0.25 (red dashed lines). (c,d) CpG density in a 400-bp window (c) and genomic context of concordant and discordant CpGs at the 0.25 cutoff (d).

[…] >90% at all read depths examined and improved with increasing minimum read depths (Fig. 3a). We confirmed the concordance between MeDIP-seq and MBD-seq at selected loci by bisulfite treatment of the DNA, PCR, cloning and sequencing (Supplementary Fig. 11 and Supplementary Table 3). The substantially higher concordance relative to the bisulfite-based methods is in part related to the inference common to both enrichment methods that neighboring CpGs within a given window have similar methylation levels and to the binary rather than quantitative methylation calls.

When applied in the context of the enrichment methods, the minimum read depths limit the analysis to regions with at least a minimal methylation level. At sufficiently high sequencing depth, however, greater confidence can be placed in the lack of methylation inferred from lack of reads. However, at lower sequencing depth, lack of methylation cannot be distinguished from lack of coverage due to the stochastic nature of read coverage. This is an important difference from the bisulfite-based methods, which can identify unmethylated regions at a sequencing depth well below saturation.

The 1,000-bp windows covered at a minimum read depth of 5, representing 99.8% concordance, were examined for potential biases related to CpG density (Fig. 3b) and genomic context (Fig. 3c) on concordance between MeDIP-seq and MBD-seq calls. Concordant and discordant calls were similar in their genomic context, but discordant calls were shifted toward regions of lower CpG density compared to concordant calls. Thus, although these two methods differ in the extent of CpG coverage and read depth at sites covered (Fig. 1a), in windows with even minimal coverage by both methods, the concordance is exceptionally high. To further examine the accuracy of the calls, we compared the methylation calls from MeDIP-seq to those from MethylC-seq. For regions with methylation detectable by MethylC-seq, MeDIP-seq and MBD-seq, calls of highly methylated were made in nearly every case (Fig. 3d).

To examine the reliability of an enrichment-based method specifically for inferring weakly methylated regions at different CpG densities, we compared MeDIP-seq to MethylC-seq (Supplementary Fig. 4).
These analyses and limited validation by pyrosequencing suggest that MeDIP-seq allows accurate inferences of lack of methylation and/or weak methylation in regions of high and medium CpG density, whereas accuracy is moderately reduced in regions of low CpG density. Thus, increasing the sequencing depth of MeDIP-seq or using a complementary methodology targeting unmethylated CpGs may be useful.

Although MeDIP-seq and MBD-seq methylation calls are highly concordant in sequences represented in both data sets, interesting differences exist between the regions each interrogates, and in the sensitivity of each method to detect non-CpG methylation. First, the rate of enrichment differs slightly with respect to local CpG density, with MeDIP-seq enriching more at regions with relatively low CpG density and MBD-seq enriching more at regions with slightly higher CpG density (Supplementary Fig. 6), which is also reflected in their moderate (46.33%) overlap in CpG coverage. This substantial amount of non-overlap suggests that methylated fragments with low CpG density may bind more efficiently to the 5-methylcytosine antibody, or alternatively, these fragments may be selectively eliminated during enrichment in MBD-seq, depending on the salt concentration used to elute the DNA.

Second, the ability of MeDIP-seq or MBD-seq to detect non-CpG methylation could be particularly important for evaluating the methylome of ESCs, which contains abundant non-CpG methylation (ref. 9). To address this, we examined read densities in gene bodies with similar CpG methylation levels but different CHG methylation


levels as measured by MethylC-seq. MeDIP-seq signal increased with increasing non-CpG cytosine methylation, whereas MBD-seq did not (Supplementary Fig. 12), suggesting a differential sensitivity in these two enrichment methods. However, the power to distinguish CpG methylation signal from CHG methylation signal is low, because non-CpG cytosine methylation is often embedded within regions with high CpG methylation. As a negative control, regions in the genome that contain no CpGs were examined. MeDIP-seq and MBD-seq had only background level reads, consistent with the non-CpG cytosines being unmethylated in these regions (Supplementary Fig. 13).

[Figure 3, panel a: tables]

1,000-bp windows
Minimum read depth   Number of windows   % genome-wide CpGs   Percent concordant
2                    1,189,545           61.82                98.80
5                    446,096             32.65                99.80
10                   162,661             15.07                100.00

200-bp windows
Minimum read depth   Number of windows   % genome-wide CpGs   Percent concordant
2                    2,136,710           37.96                92.41
5                    753,329             17.72                99.01
10                   273,767             7.74                 99.97

[Figure 3, panel b: number of concordant and discordant windows versus CpG density (%). Panel c: genomic context (promoter, coding exon, UTR, intron, intergenic) of concordant and discordant windows. Panel d: number of windows with highly methylated calls (left y axis) and percent of windows with highly methylated calls (right y axis) for MeDIP-seq and MBD-seq, binned by MethylC-seq methylation proportion sums.]

Figure 3 Comparison of methylated DNA enrichment methods. (a) Calls of highly/weakly methylated were made by averaging methylation scores for CpGs covered at varying minimum read depths by MeDIP-seq or MBD-seq in 1,000- and 200-bp windows. The number of windows, percent of genome-wide CpGs covered and the percent of concordant calls are shown for each minimum read depth and window size. (b,c) For the 1,000-bp windows with a minimum read depth of 5, the CpG density (b) and genomic context (c) of the concordant and discordant windows are shown. The inset in b shows a close-up of the concordance/discordance of CpG densities consistent with CpG islands. (d) For the 1,000-bp windows with a minimum read depth of 5, MethylC-seq methylation proportions for CpGs and non-CpG cytosines covered at a minimum read depth of 5, 444,590 windows, were summed and the windows were binned by the sum. For each of these bins, the number of windows called highly methylated by MeDIP-seq or MBD-seq is shown on the left y axis and the percent of total windows with calls of highly methylated is shown on the right y axis. Windows with a MethylC-seq methylation proportion sum >15, representing 83% of all windows, were called highly methylated by MeDIP-seq and MBD-seq in 99.9% of cases. The windows with a methylation proportion sum of 1–15, representing 17% of all windows, were called highly methylated by MeDIP-seq and MBD-seq in at least 99.1% of cases.

Comparison of all methods
To examine concordance of CpG methylation calls from the two bisulfite-based methods and the two methylation enrichment–based methods, a four-way comparison was performed. This can be viewed as combining the two previous pair-wise comparisons, but with three differences. First, to make the bisulfite-based methods comparable to the highly/weakly methylated categorization of MeDIP-seq/MBD-seq scores, a binary calling scheme was applied with highly methylated defined as >0.20 methylation and weakly methylated defined as ≤0.20 methylation. When this calling scheme for individual CpGs was applied to bisulfite-based data alone, the concordance between methods was 94.14% for two reads, 96.15% for five reads and 97.13% for ten reads. Second, to perform the comparison at the same level of resolution, the methylation proportions for individual CpGs in MethylC-seq and RRBS were averaged across windows. Third, to compare the bisulfite-based methods to the enrichment-based methods without inferring an unmethylated state from complete absence of reads in enrichment methods, the comparison excluded regions lacking reads.

Methylation calls were made for 1,000-bp windows where all of the methods had at least one CpG covered by a minimum of five or ten reads, allowing for comparison of 199,438 or 87,363 windows, respectively. Of all the windows covered by a minimum of five reads, 2.45% completely encompassed CpG islands and 5.5% overlapped with CpG islands. The four-way comparison revealed a high degree of concordance of methylation calls among all methods (Fig. 4a,b and Supplementary Table 4). To investigate the effect of applying different highly/weakly methylated cutoffs to MethylC-seq and RRBS, we performed the four-way comparison at several cutoffs (Supplementary Fig. 14). Concordance remained >90% up to a highly/weakly methylated cutoff of 0.55, suggesting the concordance results we report are applicable to a wide range of methylation call cutoffs. This result is congruent with the known partitioning of the genome into methylated and unmethylated zones.

As the limited coverage by RRBS constrained the number of windows that could be compared, a three-way comparison excluding RRBS was also performed. This allowed for the comparison of 444,494 1,000-bp windows or 32% of CpGs genome-wide compared to 18% in the four-way comparison, which showed a three-way concordance of 99.69%.
Using different minimum read depth and window sizes had little effect on concordance (Supplementary Table 5a,b).

To further evaluate the performance of the four methods, we compared them individually to the widely used Infinium bead-array. For the bisulfite-based methods, the differences in methylation for individual CpGs compared to beta values from the array assaying replicate no. 3 were calculated. At a difference threshold of 0.25, high concordance was observed between the array and MethylC-seq (96.41%; 20,885 CpGs) and between the array and RRBS (97.31%; 5,475 CpGs) (Supplementary Fig. 15). For the enrichment-based methods, the average methylation score was calculated for CpGs covered by a minimum of five reads in 200-bp windows centered on CpGs assayed by the array and used to make the binary methylation call. For the array assaying replicate no. 2, highly methylated was defined as >0.20 beta value and weakly methylated defined as ≤0.20 beta value. Both MeDIP-seq (96.19%; 4,960 windows) and MBD-seq (90.80%; 4,163 windows) calls showed high concordance with the array. This high degree of agreement between very different methods further supports the validity of comparing methylation profiles across platforms.
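The window-level binary calls and the grouping of methods by agreement used in the multi-method comparison can be sketched as follows. This is a minimal illustration under stated assumptions: the bisulfite call averages per-CpG methylation proportions within a window and applies the 0.20 cutoff described in the text, while the enrichment-method calls are simply passed in as labels, because their scoring algorithms are not described in this section. All function names and the toy data are hypothetical.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List, Sequence, Tuple

HIGH, WEAK = "highly methylated", "weakly methylated"

# One window: method name -> binary call (layout is an assumption).
WindowCalls = Dict[str, str]
Pattern = Tuple[Tuple[str, ...], ...]


def bisulfite_window_call(proportions: Sequence[float], cutoff: float = 0.20) -> str:
    """Average per-CpG methylation proportions in a window, then apply
    the binary cutoff: > cutoff is highly methylated, otherwise weakly."""
    return HIGH if mean(proportions) > cutoff else WEAK


def agreement_pattern(calls: WindowCalls) -> Pattern:
    """Group methods that made the same call, for example
    (("MBD", "MeDIP", "MethylC"), ("RRBS",)) when RRBS alone disagrees."""
    groups: Dict[str, List[str]] = {}
    for method, call in calls.items():
        groups.setdefault(call, []).append(method)
    return tuple(sorted(tuple(sorted(members)) for members in groups.values()))


def pattern_percentages(windows: List[WindowCalls]) -> Dict[Pattern, float]:
    """Percent of windows showing each agreement pattern."""
    counts = Counter(agreement_pattern(calls) for calls in windows)
    return {pattern: 100.0 * n / len(windows) for pattern, n in counts.items()}


# Hypothetical toy windows, not taken from the paper.
windows = [
    {"MethylC": HIGH, "RRBS": HIGH, "MeDIP": HIGH, "MBD": HIGH},
    {"MethylC": WEAK, "RRBS": WEAK, "MeDIP": WEAK, "MBD": WEAK},
    {"MethylC": HIGH, "RRBS": WEAK, "MeDIP": HIGH, "MBD": HIGH},
    {"MethylC": HIGH, "RRBS": HIGH, "MeDIP": HIGH, "MBD": HIGH},
]
for pattern, percent in pattern_percentages(windows).items():
    print(pattern, percent)
```

A pattern containing a single group corresponds to full concordance among the methods. Dropping one method from every window before calling `pattern_percentages` gives the analogous three-way comparison.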


[Figure 4, panel a: table]

Methods grouped by call            Percent windows,                  Percent windows,
                                   minimum read depth of 5           minimum read depth of 10
                                   (199,438 windows; 18.01% of       (87,363 windows; 9.39% of
                                   genome-wide CpGs)                 genome-wide CpGs)
(MethylC, RRBS, MeDIP, MBD)        97.64                             98.30
(MethylC, RRBS, MeDIP)(MBD)        0.07                              0
(MethylC, RRBS, MBD)(MeDIP)        0.07                              0
(MethylC, MeDIP, MBD)(RRBS)        1.98                              1.60
(RRBS, MeDIP, MBD)(MethylC)        0.03                              0.02
(MethylC, RRBS)(MeDIP, MBD)        0.20                              0.07
(MethylC, MeDIP)(RRBS, MBD)        0.01                              0
(MethylC, MBD)(RRBS, MeDIP)        0                                 0

[Figure 4, panel b: genome browser tracks for MethylC-seq, RRBS, MeDIP-seq 1, MeDIP-seq 2 and MBD-seq across PCDHA1–PCDHA13; scale bar, 50 kb.]

Figure 4 Comparison of all methods. (a) The table shows the percentage of 1,000-bp windows with concordant and discordant MethylC-seq (replicate no. 3), RRBS (replicate no. 3), MeDIP-seq (replicate no. 2) and MBD-seq (replicate no. 2) calls at minimum read depths of 5 and 10. Methods making the same call are grouped together in parentheses. Calls were made for MethylC-seq and RRBS by averaging the methylation proportion of CpGs within the window that were covered at the minimum read depth and applying a highly/weakly methylated cutoff of 0.2. Calls were made for MeDIP-seq and MBD-seq by averaging the methylation score of CpGs within the window that were covered at the minimum read depth. (b) Genome browser view of the 100-kb CpG rich Protocadherin alpha cluster (PCDHA), exemplifying the significant concordance in methylation status seen on a genome-wide level. For MethylC-seq and RRBS, the y axis displays methylation scores of individual CpGs. Scores range between −500 (unmethylated) and 500 (methylated) and the zero line is equivalent to 50% methylated. Negative scores are displayed as green bars and positive scores are displayed as orange bars. For MeDIP-seq (1), MeDIP-seq (2) and MBD-seq, the y axis indicates extended read density. Browsable genome-wide views of these data sets are available at http://www.genboree.org/ and http://genome.ucsc.edu/.

Integrative method
To increase DNA methylome coverage while maintaining modest sequencing requirements, MeDIP-seq was integrated with MRE-seq (ref. 13). The integration is advantageous because the two methods are largely non-overlapping in the regions they interrogate, and because it allows intermediate methylation states to be identified, which is less reliable using MeDIP-seq alone. The methylation scores from MRE-seq were inversely correlated with MeDIP-seq scores (Fig. 5a). The two methods combined assessed the DNA methylation status at 22 million CpGs, 78% of genome-wide CpGs (Fig. 5b). In regions where MRE-seq scores were high and MeDIP-seq scores were low, the MRE-seq reads corroborate the lack of methylation inferred from the absence of MeDIP-seq reads.

[Figure 5, panel a: scatter plot of H1 ES MRE CpG score (y axis) against H1 ES MeDIP CpG score (x axis). Panel b: CpG coverage: MeDIP-seq only (20.65M), MRE-seq only (0.71M), both (1.04M), none (5.6M). Panel c: browser view of ZNF331 on chr19 with CpG islands, H3K4me3, MRE-seq, MeDIP-seq and bisulfite region tracks; scale bar, 20 kb. Panel d: clonal bisulfite sequencing of bisulfite regions 1 and 2.]

Figure 5 Integrative method increases methylome coverage and enables identification of a DMR. (a) MRE-seq involves parallel digests with methylation-sensitive restriction enzymes (HpaII, AciI and Hin6I), selection of cut fragments of ~50–300 bp, pooling the digests, library construction and sequencing. For every 600-bp window along chromosome 21, MeDIP-seq scores were plotted against MRE-seq scores. The plot depicts the inverse relationship between MRE-seq and MeDIP-seq signals. (b) Coverage of CpGs in the human genome by MeDIP-seq alone (red), MRE-seq alone (green), both (yellow) or neither method (no fill). Sequence from replicate nos. 1 and 2 were used in these calculations. (c) UCSC Genome Browser view of ZNF331 in H1 ESC, showing overlap of MeDIP-seq, MRE-seq and H3K4me3 (from ChIP-seq) signals at bisulfite region 1 and only MeDIP-seq signal at bisulfite region 2. (d) Clonal bisulfite sequencing results for specified regions in ESC from replicate no. 1. A filled circle represents a methylated CpG and an open circle indicates an unmethylated CpG.

Interestingly, there are a small but significant number of CpG islands with overlapping MeDIP-seq and MRE-seq signals (Supplementary Table 6), indicating an intermediate methylation level. We tested two regions from one locus, ZNF331, by clonal bisulfite sequencing (Fig. 5c,d and Supplementary Table 7). Region 1 of ZNF331 showed overlap of signals from MeDIP-seq and MRE-seq, with bisulfite sequencing confirming intermediate and potentially monoallelic methylation. In contrast, region 2 exhibited MeDIP-seq signal only, and bisulfite sequencing confirmed nearly complete methylation. ZNF331 exhibits paternal monoallelic expression in multigenerational CEPH pedigrees consistent with imprinting (refs. 25,26). In addition, allelic skewing of DNA methylation at ZNF331 was reported using SNP arrays (ref. 27), further supporting a provisional status of ZNF331 as a novel imprinted gene. Histone H3 lysine 4 trimethylation (H3K4me3), a mark enriched at promoters, overlapped with region 1 but not region 2 (Fig. 5c). A third CpG island at the 5′ end of ZNF331 was fully unmethylated and had an even stronger H3K4me3 peak. Thus, our integrative approach identified a differentially methylated region (DMR) in ZNF331 that may be a DNA methylation–regulated promoter for one of the ZNF331 transcripts.

The analysis of ZNF331 suggested the possibility of using MeDIP-seq and MRE-seq to generate a list of candidate DMRs genome-wide (Supplementary Tables 6 and 7). Ultimately this could define all regions with an intermediate methylation level, encompassing DMRs of all imprinted genes in the genome, or the imprintome, and sites of non-imprinted monoallelic epigenetic regulation. Consistently, our candidate list includes 16 of 19 previously identified DMRs of


imprinted genes, including BLCAP, GRB10, H19, INPP5F, KCNQ1, MEST, SGCE, SNRPN, ZIM2, GNAS, GNASAS, DIRAS3, DLK1, NDN, PLAGL1 and TP73. Two of the known DMRs, in PEG3 and MEG3, appeared mostly methylated, potentially representing loss of imprint marks (ref. 28). One of the 19 known DMRs (for NAP1L5) is not within a CpG island but did in fact exhibit intermediate methylation (Supplementary Fig. 16). Thus, extension of this analysis to include CpG-rich regions that are not strictly CpG islands will be useful. The data indicate that intermediate DNA methylation states, which characterize DMRs within known imprinted regions and elsewhere, are readily identifiable using an integrative approach.

[Figure 6, panel a: Venn diagram of loci with monoallelic DNA methylation, histone methylation and expression. Panel b: browser view of GRB10 on chr7 with CpG islands, MRE-seq and MeDIP-seq tracks for H1 ES replicate nos. 1 and 2, and bisulfite regions 1 and 2; scale bar, 10 kb. Panel c: the same tracks for the region upstream of POTEB on chr15; scale bar, 10 kb.]

Figure 6 Allelic DNA methylation, histone methylation and gene expression in ESCs. (a) Venn diagram summarizing the number of loci exhibiting monoallelic DNA methylation, histone methylation or monoallelic expression and their overlap. The top 1,000 loci (average size of 2.9 kb and encompassing a CpG island) with potential allelic DNA methylation were further evaluated, using the following assays: MRE-seq and MeDIP-seq for allelic DNA methylation within the loci, MethylC-seq and expression data for monoallelic expression of genes associated (±50 kb) with the loci, MethylC-seq and histone modifications H3K4me3 and H3K9me3 for monoallelic histone methylation within 1 kb from the loci. (b,c) Validation of known and novel DMRs identified from MeDIP-seq and MRE-seq. DMRs are presented in a UCSC Genome Browser window with MeDIP-seq and MRE-seq signals in human H1 ESC, along with bisulfite sequencing results. The results from the biological replicates (nos. 1 and 2) were very similar. (b) Imprinted gene GRB10 including a known DMR (Bisulfite region 1) and an upstream unmethylated CpG island (Bisulfite region 2). (c) Novel DMR upstream of POTEB, which exhibits allele-specific DNA methylation. Open circle indicates an unmethylated CpG site. Filled circle represents a methylated CpG site. ‘x’ indicates absence of a CpG site due to a heterozygous SNP, which destroyed the 28th CpG. All clones without the CpG were unmethylated, whereas all the clones containing the CpG were methylated. Furthermore, the alleles could be distinguished in the sequence reads from MeDIP-seq (G allele, 9 of 9 reads) and MRE-seq (A allele, 30 of 30 reads).

Monoallelic methylation and gene expression
Sequencing-based methods present a unique opportunity to assign epigenetic marks and gene transcripts to specific alleles. We explored this possibility in the ESCs by identifying SNPs within sequence reads, focusing on the top 1,000 CpG island loci with extensive overlap between MRE-seq and MeDIP-seq signals (Fig. 6a and Supplementary Tables 8 and 9). Of the 1,000 loci examined, 203 contained an informative SNP and 63 of these exhibited monoallelic DNA methylation (Fig. 6a). The remaining 140 of the 203 loci with an informative SNP represent intermediate methylation states that may reflect heterogeneity in methylation across the cell population. In total, 119 of the 1,000 loci exhibited evidence of monoallelic epigenetic modification and/or expression. Four DMRs were identified that were monoallelic in DNA methylation and histone methylation and were associated with a gene exhibiting monoallelic expression (Supplementary Fig. 17). Strong corroborating evidence for monoallelic DNA methylation was obtained from similar analyses of the MethylC-seq data (Supplementary Fig. 18). These results demonstrate the excellent capabilities of sequencing-based epigenomic and transcriptome assays for identifying genes exhibiting monoallelic epigenetic marks and monoallelic expression.

To further assess the accuracy of methylation status predictions, eight regions (total of 17 nonoverlapping PCR products), which exhibited apparent monoallelic methylation from the MeDIP-seq and MRE-seq SNP analyses (Fig. 6a and Supplementary Table 8), were selected for clonal bisulfite sequencing. Adjacent CpG island loci containing only MRE-seq reads were confirmed to be largely unmethylated (Fig. 6b), whereas loci containing only MeDIP-seq reads were heavily methylated (Supplementary Table 7). Individual bisulfite clones from two known imprinted genes, INPP5F and GRB10, were either methylated or unmethylated at nearly all CpGs (Fig. 6b and Supplementary Table 7). GRB10 exhibited DNA methylation consistent with an isoform-specific imprint mark, as previously reported (ref. 29). Seven (BCL8, FRG1, ZNF331, IAH1, MEFV, POTEB, ZFP3) of the eight putative DMRs showed evidence of differential methylation (Fig. 6c and Supplementary Table 7). Bisulfite analysis of a DMR upstream of POTEB at 15q11.2 provided direct evidence for allele-specific methylation (Fig. 6c, lower panel). The H3K9me3 signal at this locus is also monoallelic, as two nucleotides identified as heterozygous from the MethylC-seq reads both showed only a single allele in the H3K9me3 sequence reads (chr15:19346665, T in 4 of 4 reads; and chr15:19348112, C in 13 of 14 reads). In the 150 kb proximal


to POTEB, three additional CpG islands exhibit intermediate methylation levels, including one near the noncoding RNA, CXADRP2 and one encompassing the 5′ end of BCL8. The allelic pattern of DNA methylation of BCL8 was confirmed by bisulfite sequencing (Supplementary Table 7).

DISCUSSION
Our quantitative comparison of four sequencing-based DNA methylation methods revealed that all four methods yield largely comparable methylation calls, but differ in CpG coverage, resolution, quantitative accuracy, efficiency and cost. The greater coverage provided by MethylC-seq comes at a >50-fold increase in cost compared to RRBS, MeDIP-seq and MBD-seq. These analyses should be widely useful in understanding the extent to which sequencing-based DNA methylation profiles generated by different methods and different laboratories can be compared to define true biological differences. Given the international investment in mapping human DNA methylomes, and other epigenomic marks, high concordance is essential.

Quantifying differences among the four methods highlighted their strengths and weaknesses. Strengths of bisulfite methods include single-base resolution and an ability to quantify methylation levels. The quantification is imperfect, however, with the methylation level of ~18% of CpGs varying by >25% between RRBS and MethylC-seq, and the methylation level of ~5–8% of CpGs varying by >25% in RRBS biological replicates. MethylC-seq is superior in genome-wide CpG coverage, whereas RRBS carries a significantly lower ratio of cost to CpGs covered, particularly at CpG islands. A strength of the enrichment methods is even lower cost per CpG covered genome-wide relative to the bisulfite methods, albeit at reduced resolution. A second potential strength is that in the enrichment methods all four nucleotides are retained, which modestly increases the rate of uniquely mappable sequence reads and permits a greater number of genotype-epigenotype correlations.
Enrichment methods do not allow precise quantification of methylation levels, and their methylation calls are therefore fit into two or three categories. Using binary methylation calls, the enrichment methods are remarkably reproducible and highly (99%) concordant, regardless of whether the window size is 200 bp or 1 kb. The inability of enrichment methods to quantify methylation was addressed by integrating MeDIP-seq to map methylated regions with MRE-seq to map unmethylated CpG sites. The integrative approach increases CpG coverage with only a modest increase in cost, and permits accurate identification of intermediate methylation states, such as the methylation states of imprinted genes or cell type–specific methylation within complex tissues. The methods also differ in their abilities to detect methylation at non-CpG cytosines and to discriminate between these residues and CpG methylation. However, the high degree of concordance, approaching 100% between MeDIP-seq and MBD-seq, suggests that this differential ability to detect non-CpG methylation does not have a significant impact on the relative methylation levels within 1,000-bp windows. This observation may be related to the low levels of methylation at non-CpG sites, and their presence in regions with high CpG methylation.

Our finding that MeDIP-seq enriches for regions with lower CpG density compared to MBD-seq is seemingly in contrast to a previous finding (ref. 30) that MeDIP-seq was more sensitive to regions of high CpG density than MBD-seq. However, it has also been shown (ref. 30) that increasing eluent salt concentrations in MBD-seq enriches for increasingly higher CpG densities. Our comparison between MeDIP-seq and MBD-seq used a salt concentration of 1 M compared to 700 mM (ref. 30), which could account for the differences.

Variation in DNA methylation is a topic of wide interest. Variation is observed between individuals, cell and tissue types or within one cell type over time.
Our biological replicates displayed variation that was similar in magnitude to variation from limited technical replicates, suggesting the concordance estimates may be marginally higher than what we report. Thus, to identify potentially rare variation in methylation between biological samples, the magnitude of technical variation should be considered.

There are numerous opportunities to increase methylome coverage. First, for RRBS or MRE-seq, for example, selecting additional enzymes, increasing the size range of selected fragments and increasing sequencing depth could dramatically increase CpG coverage. Second, increasing read length or using paired-end sequencing could also positively affect each method. Third, integrative approaches could include MeDIP-seq or MBD-seq coupled with MRE-seq or RRBS, particularly for direct rather than inferred calling of unmethylated CpGs within high CpG density regions. Versatile methods such as ‘bisulfite padlock probes’ allow more targeted profiling and could also complement the enrichment methods (refs. 14,31).

Sequencing-based methods are unique in that they allow assessment of the methylation status of repetitive elements, which encompass nearly half of all CpGs in the methylome. The epigenetic status of this entire genomic compartment has been inaccessible to microarrays, but is a critical component of epigenetic gene regulation, as many of the sequences have a regulatory function (refs. 23,32). Furthermore, the labile DNA methylation status of a particular transposon in the mouse agouti locus influences susceptibility to diabetes and cancer (refs. 33,34). These and other studies indicate that there is a great deal to be learned about the epigenetic regulation of these abundant but enigmatic elements.

Sequencing-based methylation analysis methods are also unique in that the sequence reads themselves can be used to construct a partial map of genetic variation, including common and rare variants.
The comprehensiveness of the genetic map is a function of read coverage and whether reads contain three nucleotides (bisulfite methods) or four nucleotides (enrichment methods). The sites of genetic variation enable local epigenetic states to be associated with specific alleles. SNP microarrays have been similarly deployed for allelic DNA methylation analysis, but the detection of variants is confined to those present on the microarray (ref. 35). Our combined epigenomic-genomic analyses identified all CpG islands with intermediate methylation states in H1 ESCs, many of which were confirmed as monoallelic DNA methylation, and in some cases, also monoallelic for histone methylation and gene expression. This represents an initial step toward characterizing the human imprintome and genome-wide monoallelic epigenetic states, a goal of basic biological and clinical importance in epigenomic research.

Methods
Methods and any associated references are available in the online version of the paper at http://www.nature.com/naturebiotechnology/.

Data accession. Additional data related to this paper are available at http://www.genboree.org/java-bin/project.jsp?projectName=Methylation%20Platform%20Comparison&isPublic=yes and hgwdev-remc.cse.ucsc.edu. Data used in this paper are available for download from the GEO NIH Roadmap Epigenomics Project Data Listings (http://www.ncbi.nlm.nih.gov/geo/roadmap/epigenomics/) and the Epigenomics Atlas (http://genboree.org/epigenomeatlas/edaccDataFreeze1.rhtml).

Note: Supplementary information is available on the Nature Biotechnology website.


a n a ly s i s© 2010 Nature America, Inc. All rights reserved.AcknowledgmentsWe would like to thank the US National Institutes of Health (NIH) RoadmapEpigenomics Program; sponsored by the National Institute on Drug Abuse(NIDA) and the National Institute of Environmental Health Sciences (NIEHS).J.F.C. and M.H. are supported by NIH grant 5U01ES017154-02. A. Milosavljevicis supported by NIH grant 5U01DA025956-02. A. Meissner and B.E.B. aresupported by NIH grant 6U01ES017155-02. J.R.E. and B.R. are supported byNIH grant 5U01ES017166-02. R.P.N. was supported by NIH T32 CA108462-04 and F32CA141799. S.L.D. was supported by CIRM TB1-01190. S.D.F.was supported by NIH T32 CA108462-06. B.E.J. was supported by NIH T32GM008568. M.A.M. is a Terry Fox Young Investigator and a Michael SmithSenior Research Scholar. We thank Z. Zhang and H. Li for modifying theZOOM algorithm for bisulfite alignments.AUTHOR CONTRIBUTIONSJ.F.C., R.A.H., T.W., M.H., M.A.M. and A. Milosavljevic conceived and designedthe experiments. R.P.N., C.H., S.L.D., B.E.J., S.D.F., Y.Z. and M.H. performed theMeDIP, MRE and bisulfite sequencing experiments. R.A.W. and X.Z. designed andperformed pyrosequencing and data analyses. H.G., C.B., A.G. and A. Meissner 9performed and analyzed RRBS. L.E., H.O., P.J.F., B.E.B., C.B.E., R.D.H. and B.R.performed and analyzed Chip-seq experiments. R.L., M.P. and J.R.E. analyzedMethylC-seq data and performed Bowtie aligner testing. R.A.H., T.W., K.J.F.,J.G., C.C., M.H., X.Z., A.D. and A.O. performed data analysis. T.W., T.B. and D.H.developed MeDIP and methyl-sensitive restriction enzyme scoring algorithms andperformed coverage analyses including repetitive sequence analyses. Y.X., W.-Y.C.,R.L., M.Q.Z. and W.L. compared bisulfite sequence aligners. J.F.C., R.A.H., M.H.,T.W., R.P.N. and R.A.W. 
wrote the manuscript.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

Published online at http://www.nature.com/naturebiotechnology/.
Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/.

1. Robertson, K.D. DNA methylation and human disease. Nat. Rev. Genet. 6, 597–610 (2005).
2. Bird, A. DNA methylation patterns and epigenetic memory. Genes Dev. 16, 6–21 (2002).
3. Feinberg, A.P. & Vogelstein, B. Hypomethylation distinguishes genes of some human cancers from their normal counterparts. Nature 301, 89–92 (1983).
4. Gama-Sosa, M.A. et al. Tissue-specific differences in DNA methylation in various mammals. Biochim. Biophys. Acta 740, 212–219 (1983).
5. Tahiliani, M. et al. Conversion of 5-methylcytosine to 5-hydroxymethylcytosine in mammalian DNA by MLL partner TET1. Science 324, 930–935 (2009).
6. Kriaucionis, S. & Heintz, N. The nuclear DNA base 5-hydroxymethylcytosine is present in Purkinje neurons and the brain. Science 324, 929–930 (2009).
7. Ito, S. et al. Role of Tet proteins in 5mC to 5hmC conversion, ES-cell self-renewal and inner cell mass specification. Nature 466, 1129–1133 (2010).
8. Lister, R. et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature 462, 315–322 (2009).
9. Meissner, A. et al. Genome-scale DNA methylation maps of pluripotent and differentiated cells. Nature 454, 766–770 (2008).
10. Jacinto, F.V., Ballestar, E. & Esteller, M. Methyl-DNA immunoprecipitation (MeDIP): hunting down the DNA methylome. Biotechniques 44, 35–43 (2008).
11. Down, T.A. et al. A Bayesian deconvolution strategy for immunoprecipitation-based DNA methylome analysis. Nat. Biotechnol. 26, 779–785 (2008).
12. Serre, D., Lee, B.H. & Ting, A.H. MBD-isolated Genome Sequencing provides a high-throughput and comprehensive survey of DNA methylation in the human genome. Nucleic Acids Res. 38, 391–399 (2010).
13. Maunakea, A.K. et al. Conserved role of intragenic DNA methylation in regulating alternative promoters. Nature 466, 253–257 (2010).
14. Ball, M.P. et al. Targeted and genome-scale strategies reveal gene-body methylation signatures in human cells. Nat. Biotechnol. 27, 361–368 (2009).
15. Cokus, S.J. et al. Shotgun bisulphite sequencing of the Arabidopsis genome reveals DNA methylation patterning. Nature 452, 215–219 (2008).
16. Lister, R. et al. Highly integrated single-base resolution maps of the epigenome in Arabidopsis. Cell 133, 523–536 (2008).
17. The American Association for Cancer Research Human Epigenome Task Force; European Union, Network of Excellence, Scientific Advisory Board. Moving AHEAD with an international human epigenome project. Nature 454, 711–715 (2008).
18. Langmead, B., Trapnell, C., Pop, M. & Salzberg, S.L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).
19. Xi, Y. & Li, W. BSMAP: whole genome bisulfite sequence MAPping program. BMC Bioinformatics 10, 232 (2009).
20. Coarfa, C. & Milosavljevic, A. Pash 2.0: scaleable sequence anchoring for next-generation sequencing technologies. Pac. Symp. Biocomput. 2008, 102–113 (2008).
21. Smith, A.D. et al. Updates to the RMAP short-read mapping software. Bioinformatics 25, 2841–2842 (2009).
22. Lin, H., Zhang, Z., Zhang, M.Q., Ma, B. & Li, M. ZOOM! Zillions of oligos mapped. Bioinformatics 24, 2431–2437 (2008).
23. Wang, T. et al. Species-specific endogenous retroviruses shape the transcriptional network of the human tumor suppressor protein p53. Proc. Natl. Acad. Sci. USA 104, 18613–18618 (2007).
24. Kunarso, G. et al. Transposable elements have rewired the core regulatory network of human embryonic stem cells. Nat. Genet. 42, 631–634 (2010).
25. Pant, P.V.K. et al. Analysis of allelic differential expression in human white blood cells. Genome Res. 16, 331–339 (2006).
26. Pollard, K.S. et al. A genome-wide approach to identifying novel-imprinted genes. Hum. Genet. 122, 625–634 (2008).
27. Schalkwyk, L.C. et al. Allelic skewing of DNA methylation is widespread across the genome. Am. J. Hum. Genet. 86, 196–212 (2010).
28. Pick, M. et al. Clone- and gene-specific aberrations of parental imprinting in human induced pluripotent stem cells. Stem Cells 27, 2686–2690 (2009).
29. Arnaud, P. et al. Conserved methylation imprints in the human and mouse GRB10 genes with divergent allelic expression suggests differential reading of the same mark. Hum. Mol. Genet. 12, 1005–1019 (2003).
30. Li, N. et al. Whole genome DNA methylation analysis based on high throughput sequencing technology. Methods published online, doi:10.1016/j.ymeth.2010.04.009 (27 April 2010).
31. Deng, J. et al. Targeted bisulfite sequencing reveals changes in DNA methylation associated with nuclear reprogramming. Nat. Biotechnol. 27, 353–360 (2009).
32. Bourque, G. Transposable elements in gene regulation and in the evolution of vertebrate genomes. Curr. Opin. Genet. Dev. 19, 607–612 (2009).
33. Duhl, D.M., Vrieling, H., Miller, K.A., Wolff, G.L. & Barsh, G.S. Neomorphic agouti mutations in obese yellow mice. Nat. Genet. 8, 59–65 (1994).
34. Waterland, R.A. & Jirtle, R.L. Transposable elements: targets for early nutritional effects on epigenetic gene regulation. Mol. Cell. Biol. 23, 5293–5300 (2003).
35. Hellman, A. & Chess, A. Gene body-specific methylation on the active X chromosome. Science 315, 1141–1143 (2007).

¹Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA. ²Center for Genome Sciences and Systems Biology, Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, USA. ³Brain Tumor Research Center, Department of Neurosurgery, Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, California, USA. ⁴Genome Sciences Centre, BC Cancer Agency, Vancouver, British Columbia, Canada. ⁵Center for Biomolecular Science and Engineering, University of California, Santa Cruz, California, USA. ⁶Department of Pharmacology and the Genome Center, University of California-Davis, Davis, California, USA. ⁷Genomic Analysis Laboratory, The Salk Institute for Biological Studies, La Jolla, California, USA. ⁸Division of Biostatistics, Dan L. Duncan Cancer Center, Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, Texas, USA. ⁹Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA. ¹⁰Department of Pathology, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, USA. ¹¹Center for Cancer Research, Massachusetts General Hospital, Boston, Massachusetts, USA. ¹²Ludwig Institute for Cancer Research. ¹³Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, California, USA. ¹⁴Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA. ¹⁵Department of Molecular and Cell Biology, Center for Systems Biology, University of Texas at Dallas, Dallas, Texas, USA. ¹⁶Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, Massachusetts, USA. ¹⁷Harvard Stem Cell Institute, Cambridge, Massachusetts, USA. ¹⁸Max Planck Institute for Informatics, Saarbrücken, Germany. ¹⁹USDA/ARS Children’s Nutrition Research Center, Department of Pediatrics, Baylor College of Medicine, Houston, Texas, USA. Correspondence should be addressed to J.F.C. (jcostello@cc.ucsf.edu).


ONLINE METHODS
ESCs. H1 cells were grown in mTeSR1 medium³⁶ on Matrigel (BD Biosciences) for 10 passages on 10 cm² plates and harvested at passage 27. Cells were harvested by scraping before snap freezing for DNA isolation. Cells were also harvested from passages 30 and 32 and divided for isolation of DNA, RNA and chromatin.

Illumina Infinium methylation assay. We used 500 ng genomic DNA per sample for the Infinium methylation assay (Illumina), which measures methylation at 27,578 CpGs, with ~2 probes per gene (14,475 RefSeq genes). Bisulfite conversion was performed with the EZ DNA methylation kit (Zymo Research) and each sample was eluted in 12 μl water. Amplification and hybridization to the Illumina HumanMethylation27 BeadChip were carried out according to the manufacturer’s instructions at the UCSF Genomics Core Facility. Beta values, representing quantitative measurements of DNA methylation at individual CpGs, were generated with Illumina GenomeStudio software. Beta values were normalized to background and filtered to remove those with low signal intensity. The filtered data were used for all subsequent analysis.

Shotgun bisulfite sequencing (MethylC-seq). As described⁸.

RRBS. RRBS analysis was performed as described previously³⁷,³⁸, using ~30 ng of H1-derived DNA as input. The steps of the experimental protocol were as follows. (i) DNA digestion using the MspI restriction enzyme, which cuts DNA at its recognition site (CCGG) independent of the CpG methylation status. (ii) End repair and ligation of adapters for Illumina sequencing. (iii) Gel-based selection of DNA fragment sizes ranging from 40 bp to 220 bp. (iv) Two successive rounds of bisulfite treatment, after which we observed 98.4% converted cytosines outside of CpGs.
Due to the presence of non-CpG methylation in ESCs, this value is an underestimate of the actual bisulfite conversion rate. (v) PCR amplification of the bisulfite-converted library and sequencing on the Illumina Genome Analyzer II according to the manufacturer’s protocol.

A total of two lanes were sequenced, and the data were processed using Illumina’s standard pipeline for image analysis and base calling. The alignment was performed using custom software developed at the Broad Institute⁹. The non-RepeatMasked reference sequence is generated by size-selecting from an in silico digest with the MspI restriction enzyme, and before the alignment all Cs in the reference sequence and in the aligned reads are converted into Ts. The alignment itself uses a straightforward seed-and-extension algorithm, identifying all perfect 12 bp alignments and extending without gaps from either end of the seed. The best alignment is kept only in cases where the second-best alignment has at least three more mismatches, whereas all reads that match multiple times are discarded. The DNA methylation level of a specific CpG is calculated as the number of C-to-C matches between the unconverted reference sequence and the aligned read sequence, divided by the sum of the numbers of C-to-C matches and C-to-T mismatches.

MBD-seq. Genomic DNA (3 μg), isolated as described above, was sheared to ~300 bp using the Covaris E210 sonicator (Covaris) and size separated by PAGE (8%). The 200- to 400-bp DNA fraction was excised, eluted overnight at 4 °C in 200 μl of elution buffer (5:1, LoTE buffer (3 mM Tris-HCl, pH 7.5, 0.2 mM EDTA)–7.5 M ammonium acetate) and purified using a QIAquick purification kit (Qiagen). The size-selected DNA was end-repaired, A-tailed and ligated to 2.5 mMol of ‘paired-end’ adapters (IDT) following the manufacturer’s recommended protocol (Illumina).
The resulting product was purified on a Qiaquick MinElute column (Qiagen) and assessed and quantified using an Agilent DNA 1000 series II assay and Qubit fluorometer (Invitrogen), respectively. 100 ng of pre-adapted, size-selected product was subjected to immunoprecipitation using the MethylMiner Methylated DNA Enrichment Kit (Invitrogen) following the manufacturer’s recommended protocol. The bound fraction was eluted at 600 mM, 1 M and 2 M NaCl and concentrated by the addition of 1 μl (20 μg/μl) mussel glycogen, 1/10th v/v 3 M sodium acetate (pH 5.2) and 2× v/v 100% ethanol. Samples were incubated at −80 °C for 2 h and subsequently centrifuged for 15 min at 16,000g at 4 °C. Pellets were washed twice with 500 μl cold 70% ethanol, with 5 min centrifugation at 16,000g at 4 °C between washes, and resuspended in 60 μl nuclease-free water. After purification, eluted products were subjected to PCR using Illumina paired-end adapters (Illumina) with 15 cycles of PCR amplification. PCR products were purified on Qiaquick MinElute columns (Qiagen), assessed and quantified using an Agilent DNA 1000 series II assay and size separated by PAGE (8%). The 320- to 520-bp DNA fraction was excised and purified as described above. The products were assessed and quantified using an Agilent DNA 1000 series II assay and Qubit fluorometer (Invitrogen), respectively. A 1 μl aliquot of each library was used as template in two independent PCR reactions to confirm enrichment for methylated (SNRPN promoter) and de-enrichment for unmethylated (CpG-less sequence on Chr15) sequences (see ref. 13 for primer sequences). Cycling was 95 °C for 30 s, 55 °C for 30 s and 72 °C for 30 s with 30 cycles. PCR products were visualized by 1.8% agarose gel electrophoresis. Each library was diluted to 8 nM for sequencing on an Illumina Genome Analyzer following the manufacturer’s recommended protocol.

MeDIP-seq. DNA (2–5 μg), isolated as described above, was sonicated to ~100–500 bp with a Bioruptor sonicator (Diagenode).
Sonicated DNA was end-repaired, A-tailed and ligated to adapters following the standard Illumina protocol. After agarose size-selection to remove unligated adapters, adaptor-ligated DNA was used for each immunoprecipitation using a mouse monoclonal anti-methylcytidine antibody (1 mg/ml, Eurogentec). DNA was heat denatured at 95 °C for 10 min, rapidly cooled on ice and immunoprecipitated with 1 μl primary antibody per microgram of DNA overnight at 4 °C with rocking agitation in 500 μl immunoprecipitation (IP) buffer (10 mM sodium phosphate buffer, pH 7.0, 140 mM NaCl, 0.05% Triton X-100). To recover the immunoabsorbed DNA fragments, 1 μl of rabbit anti-mouse IgG secondary antibody (2.5 mg/ml, Jackson ImmunoResearch) and 100 μl Protein A/G beads (Pierce Biotechnology) were added and incubated for an additional 2 h at 4 °C with agitation. After immunoprecipitation, a total of 6 IP washes were performed with ice-cold IP buffer. A nonspecific mouse IgG IP (Jackson ImmunoResearch) was performed in parallel to methyl DNA IP as a negative control. Washed beads were resuspended in TE with 0.25% SDS and 0.25 mg/ml proteinase K for 2 h at 55 °C and then allowed to cool to 25 °C. MeDIP and supernatant DNA were purified using Qiagen MinElute columns and eluted in 16 μl elution buffer (EB) (Qiagen). Fifteen cycles of PCR were performed on 5 μl of the immunoprecipitated DNA using the single-end Illumina PCR primers. The resulting reactions were purified over Qiagen MinElute columns, after which a final size selection (220–420 bp) was performed by electrophoresis in 2% agarose. Libraries were quality checked by spectrophotometry and Agilent DNA Bioanalyzer analysis, which indicated an average fragment size of 150 bp. An aliquot of each library was diluted in EB to 5 ng/μl and 1 μl used as template in four independent PCR reactions to confirm enrichment for methylated and de-enrichment for unmethylated sequences, compared to 5 ng of input (sonicated DNA).
Two positive controls (SNRPN and MAGEA1 promoters) and two negative controls (a CpG-less sequence on Chr15 and GAPDH promoter) were amplified (see ref. 13 for primer sequences). Cycling was 95 °C for 30 s, 58 °C for 30 s, 72 °C for 30 s with 30 cycles. PCR products were visualized by 1.8% agarose gel electrophoresis.

Calculating MeDIP-seq and MBD-seq scores for single CpGs. MeDIP-seq and MBD-seq reads were mapped to the non-RepeatMasked human genome assembly (hg18) with Mapping and Assembly with Quality (MAQ). An algorithm was developed to calculate methylation scores for individual CpGs based on MeDIP-seq or MBD-seq data. Each uniquely mapped, non-redundant sequence read was extended to 150 bp, representing an individual DNA fragment pulled down in the methylation enrichment experiment. The algorithm makes two assumptions. First, each fragment is assigned to a CpG site that it covers; when it covers more than one CpG, the probability of assigning it to a particular CpG is proportional to the level of methylation of that CpG, and these probabilities always sum to 1 over all CpGs covered by the fragment. Second, for a given CpG site, the number of fragments assigned to it is proportional to the level of methylation of this CpG site. The algorithm starts by assigning a score of 1 to all CpGs, and then it iterates through two steps. In the first step, fragments are assigned to CpGs based on their scores. In the first round, because all CpGs have the same score of 1, an equal fraction of a fragment is assigned to each CpG that the fragment covers, and this is done for all fragments. In the second step, all the fractions of reads each CpG received in step 1 are added up, and this weighted sum is used as a methylation score for this CpG site. Then, the first step is repeated; only now individual CpGs may have a different prior for assigning reads. A fraction of a fragment is now assigned to each CpG that the fragment covers based on the methylation scores of the CpGs, that is, the fraction assigned to each CpG is proportional to its methylation score. These updated fragment counts are summed again in step 2 and used as the methylation scores for individual CpGs. The algorithm iterates through these two steps until the methylation scores converge. In essence, these scores are read densities normalized by CpG density.

Methylation sensitive restriction enzyme sequencing (MRE-seq). Three parallel digests were performed (HpaII, AciI and Hin6I; Fermentas), each with 1 μg of DNA. Five units of enzyme per microgram DNA were added, and the digests were incubated at 37 °C in Fermentas “Tango” buffer for 3 h. A second dose of enzyme was added (5 units of enzyme per microgram DNA) and the DNA was incubated for an additional 3 h. Digested DNA was precipitated with sodium acetate and ethanol and 500 ng of each digest were combined into one tube. Combined DNA was size-selected by electrophoresis on a 1% agarose TBE gel. A 100–300 bp gel slice was excised using a sterile scalpel and gel-purified using Qiagen Qiaquick columns, eluting in 30 μl of Qiagen EB buffer. Library construction was performed using the Illumina Genomic DNA Sample Kit (Illumina) with single-end adapters, following the manufacturer’s instructions with the following changes. For the end-repair reaction, T4 DNA polymerase and T4 polynucleotide kinase were excluded and the Klenow DNA polymerase was diluted 1:5 in water and 1 μl used per reaction. For single-end oligo adaptor ligation, adapters were diluted 1:10 in water and 1 μl used per reaction.
After the second size selection, DNA was eluted in 36 μl EB buffer using Qiagen Qiaquick columns, and 13 μl used as template for PCR, using Illumina reagents and cycling conditions with 18 cycles. After cleanup with Qiagen MinElute columns, each library was examined by spectrophotometry (Nanodrop, Thermo Scientific) and Agilent DNA Bioanalyzer (Agilent).

Methyl-sensitive restriction enzyme scores. MRE-seq reads were mapped to the human genome assembly (hg18) with MAQ with an additional constraint that the 5′ end of a read must map to the CpG site within a methyl-sensitive restriction enzyme site. An MRE-score was defined for each CpG site as the number of MRE-reads that map to the site, regardless of the orientation, normalized by the number of million reads generated by the specific enzyme. An MRE-score for each genomic window (e.g., any given 600 bp window) was defined as the average MRE-score for all CpGs that have a score within the window.

RNA-seq. Polyadenylated RNA was purified from 20 μg of DNase I (Invitrogen)-treated total RNA using the MACS mRNA Isolation Kit (Miltenyi Biotec). Double-stranded cDNA was synthesized from the purified polyA⁺ RNA using the Superscript Double-Stranded cDNA Synthesis kit (Invitrogen) and random hexamer primers (Invitrogen) at a concentration of 5 μM. The resulting cDNA was sheared using a Sonic Dismembrator 550 (Fisher Scientific) and size separated by PAGE (8%). The 190–210 bp DNA fraction was excised, eluted overnight at 4 °C in 300 μl of elution buffer (5:1, LoTE buffer (3 mM Tris-HCl, pH 7.5, 0.2 mM EDTA)–7.5 M ammonium acetate) and purified using a QIAquick purification kit (Qiagen). The sequencing library was prepared following the Illumina Genome Analyzer paired-end library protocol (Illumina) with 10 cycles of PCR amplification.
PCR products were purified on Qiaquick MinElute columns (Qiagen) and assessed and quantified using an Agilent DNA 1000 series II assay and Qubit fluorometer (Invitrogen), respectively. The resulting libraries were sequenced on an Illumina Genome Analyzer IIx following the manufacturer’s instructions. Image analysis and base calling were performed by the GA pipeline v1.1 (Illumina) using phasing and matrix values calculated from a control phiX174 library run on each flowcell.

ChIP-seq. Protocols for the chromatin immunoprecipitation assay and Illumina library construction are described in detail elsewhere³⁹. Briefly, cross-linked hESCs were obtained from Cellular Dynamics, and chromatin was extracted and sonicated to an average size of 500 bp. Individual ChIP assays were performed using 50 μg chromatin (equivalent to 5 × 10⁶ cells) and 2 μg of antibody were added to each ChIP reaction. The histone antibodies used in this study include H3AcK9 (Millipore), H3me3K4 (CST), H3me3K27 (CST), H3me3K9 (Abcam), H3me3K36 (Abcam) and H3me1K4 (Abcam5). ChIP libraries were created⁴⁰ using the entire purified ChIP sample. All ChIP samples except H3me1K4 were amplified using paired-end Illumina primers for a total of 18 cycles. Libraries were then run on a 2% agarose gel, and the 150- to 500-bp fraction of the library was extracted and purified. The H3me1K4 library was constructed by performing size selection of the 200- to 400-bp library fragment before a 15-cycle amplification. The libraries were quantified using a BioAnalyzer and sequenced. ChIP-seq peaks were called using the Sole-Search software⁴¹.

Bisulfite pyrosequencing. Site-specific analysis of CpG methylation was performed by bisulfite pyrosequencing. Genomic DNA (1.0 μg) was bisulfite modified and pyrosequencing was performed as previously described⁴².
The quantitative performance of each pyrosequencing assay was verified by measuring methylation standards composed of known proportions of unmethylated (whole genome-amplified) and fully methylated (SssI-treated) genomic DNA⁴³. Comparison was performed on three combinations of DNA methylome platforms: MethylC-seq versus reduced representation bisulfite sequencing (RRBS) and MethylC-seq versus methylated DNA immunoprecipitation sequencing (MeDIP-seq). H1 cell lines of different passage number were used in these experiments (Batch 3 for MethylC-seq, Batch 1 for RRBS and MeDIP). CpGs showing >80% difference in methylation for the MethylC-seq versus RRBS comparison, or >80% difference between the methylated proportion and the methylation score for the MethylC-seq versus MeDIP comparison, were identified, and regions with clusters of these sites were selected for pyrosequencing. Based on the distribution of target CpGs, we looked for genomic regions with appropriate length (within the range 50 bp to 75 bp), few or no non-CG cytosines and two or more target CpGs. Pyrosequencing assays were designed and carried out in 16 regions selected for validation; 14 of these yielded reliable results. Genomic coordinates and primers used for pyrosequencing for the validated regions are listed in Supplementary Table 1.

Clonal bisulfite sequencing. Further validation of genome-wide data, particularly sites with apparent allelic DNA methylation, was performed by bisulfite sequencing. Total genomic DNA underwent bisulfite conversion following an established protocol⁴⁴ with modified conversion conditions: 95 °C for 1 min, 50 °C for 59 min for a total of 16 cycles. Bisulfite PCR primers (Supplementary Table 4) were used to amplify regions of interest, and the products were subsequently cloned using pCR2.1/TOPO (Invitrogen). Single colony PCR and sequencing (QuintaraBio) provided contigs that were aligned for analysis.

Data analyses. Comparison of CpG or non-CpG site methylation.
Repeat masking of the reference genome assembly was not used in any of these analyses. For bisulfite-based methods, reads that mapped to the positive and negative strand were combined for CpG methylation calculations, but not for CHG and CHH methylation calculations due to the strand asymmetry of non-CpG methylation⁹. The methylated proportion was calculated for each CpG or non-CpG as (methylated reads/(methylated reads + unmethylated reads)). Comparisons of methylation status calls were performed by imposing minimum requirements of 2, 5 or 10 reads covering a CpG or non-CpG site and applying varying methylated proportion cutoffs (0.80–0.20, 0.75–0.25 or 0.20) to make calls on the methylation status. Methylated proportion differences were calculated as (MethylC-seq proportion − RRBS proportion). Methylation proportion difference graphs were generated by counting the number of CpGs with a particular methylated proportion difference and plotting the count on the y-axis. Concordance was then calculated as the percent of CpGs with a methylation proportion difference less than 0.1 or 0.25.

For enrichment-based methods, methylation scores inferred for individual CpGs were averaged across CpGs covered by a varying minimum number of reads in 1,000- or 200-bp windows. Windows were called highly methylated (methylation score >8) or weakly methylated (methylation score ≤8) based on the average methylation score for each window where at least one CpG was covered by the minimum number of reads.

Genomic context of concordant and discordant CpGs. The overlap of concordant and discordant CpGs with annotated genes, as defined by the UCSC Genome Browser RefSeq Gene track (2010-01-24 version, http://genome.ucsc.edu/cgi-bin/hgTrackUi?g=refGene), was identified. To deal with overlapping genes and multiple isoforms of genes, CpGs were classified into gene components based on the following prioritization order: Promoter (within 8,000 bp upstream of a transcription start site), Coding Exon, UTR and Intron. CpGs that did not overlap with any of these gene components were identified as Intergenic.

36. Ludwig, T.E. et al. Feeder-independent culture of human embryonic stem cells. Nat. Methods 3, 637–646 (2006).
37. Gu, H. et al. Genome-scale DNA methylation mapping of clinical samples at single-nucleotide resolution. Nat. Methods 7, 133–136 (2010).
38. Smith, Z.D., Gu, H., Bock, C., Gnirke, A. & Meissner, A. High-throughput bisulfite sequencing in mammalian genomes. Methods 48, 226–232 (2009).
39. O’Geen, H., Frietze, S. & Farnham, P.J. Using ChIP-seq technology to identify targets of zinc finger transcription factors. Methods Mol. Biol. 649, 437–455 (2010).
40. Robertson, G. et al. Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nat. Methods 4, 651–657 (2007).
41. Blahnik, K.R. et al. Sole-Search: an integrated analysis program for peak detection and functional annotation using ChIP-seq data. Nucleic Acids Res. 38, e13 (2010).
42. Waterland, R.A., Lin, J., Smith, C.A. & Jirtle, R.L. Post-weaning diet affects genomic imprinting at the insulin-like growth factor 2 (Igf2) locus. Hum. Mol. Genet. 15, 705–716 (2006).
43. Shen, L., Guo, Y., Chen, X., Ahmed, S. & Issa, J.J. Optimizing annealing temperature overcomes bias in bisulfite PCR methylation analysis. Biotechniques 42, 48, 50, 52 passim (2007).
44. Grunau, C., Clark, S.J. & Rosenthal, A. Bisulfite genomic sequencing: systematic investigation of critical experimental parameters. Nucleic Acids Res. 29, E65 (2001).

Nature Biotechnology, doi:10.1038/nbt.1682
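The iterative scoring procedure described under "Calculating MeDIP-seq and MBD-seq scores for single CpGs" can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `covered_cpgs` and `cpg_scores`, the convergence tolerance and the iteration cap are hypothetical, read strand is ignored when a read is extended to 150 bp, and fragments that cover no CpG are skipped.

```python
from bisect import bisect_left


def covered_cpgs(cpg_positions, read_start, fragment_length=150):
    """Return the CpG positions inside [read_start, read_start + fragment_length).

    cpg_positions must be sorted; strand is ignored in this sketch (assumption).
    """
    lo = bisect_left(cpg_positions, read_start)
    hi = bisect_left(cpg_positions, read_start + fragment_length)
    return cpg_positions[lo:hi]


def cpg_scores(fragments, max_iter=1000, tol=1e-6):
    """Iteratively distribute each fragment over the CpGs it covers.

    fragments: one list of CpG identifiers per extended read.
    max_iter and tol are illustrative choices, not values from the paper.
    """
    cpgs = {c for frag in fragments for c in frag}
    if not cpgs:
        return {}
    scores = {c: 1.0 for c in cpgs}  # every CpG starts with a score of 1
    for _ in range(max_iter):
        updated = dict.fromkeys(cpgs, 0.0)
        for frag in fragments:
            total = sum(scores[c] for c in frag)
            if total == 0:  # fragment covers no CpG (or only zero-score CpGs)
                continue
            for c in frag:  # step 1: split the fragment in proportion to scores
                updated[c] += scores[c] / total
        # step 2: the summed fractions become the new methylation scores
        change = max(abs(updated[c] - scores[c]) for c in cpgs)
        scores = updated
        if change < tol:
            break
    return scores
```

In this toy form, a CpG that shares fragments with a better-supported neighbor loses weight at each round, which is the behavior the two assumptions in the text imply; each fragment always contributes a total of 1 across the CpGs it covers.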

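The MRE-score definition given under "Methyl-sensitive restriction enzyme scores" amounts to a reads-per-million count per CpG site followed by a window average. The sketch below assumes reads from a single enzyme have already been mapped and reduced to the CpG position at their 5′ end; the function names are hypothetical, and how scores from the three enzymes are combined is not specified in the text, so it is left out.

```python
from collections import Counter


def mre_site_scores(read_sites, total_reads):
    """Per-CpG MRE-score: reads mapping to the site per million reads.

    read_sites: CpG position hit by the 5' end of each read (either strand).
    total_reads: number of reads generated for this enzyme.
    """
    per_million = total_reads / 1e6
    counts = Counter(read_sites)
    return {pos: n / per_million for pos, n in counts.items()}


def mre_window_score(site_scores, start, end):
    """Average MRE-score over CpGs that have a score within [start, end)."""
    values = [s for pos, s in site_scores.items() if start <= pos < end]
    return sum(values) / len(values) if values else None
```

Only CpGs that received at least one read enter the window average, matching the wording "all CpGs that have a score within the window"; a window without any scored CpG returns `None` here rather than zero, which is a design choice of this sketch.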

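The methylated proportion and concordance calculations described under "Data analyses" can be illustrated as follows. This is a hedged sketch: the function names and the dictionary layout (site mapped to a pair of methylated and unmethylated read counts) are assumptions made for the example, and the strict "less than" comparison follows the wording of the text.

```python
def methylated_proportion(methylated, unmethylated):
    """methylated reads / (methylated reads + unmethylated reads)."""
    total = methylated + unmethylated
    return methylated / total if total else None


def concordance(platform_a, platform_b, min_reads=5, max_diff=0.25):
    """Percent of shared sites whose methylated proportions differ by < max_diff.

    platform_a, platform_b: dict mapping site -> (methylated, unmethylated) reads.
    Sites covered by fewer than min_reads reads on either platform are ignored.
    """
    threshold = max(min_reads, 1)  # avoid dividing by zero coverage
    compared = agreeing = 0
    for site in platform_a.keys() & platform_b.keys():
        meth_a, unmeth_a = platform_a[site]
        meth_b, unmeth_b = platform_b[site]
        if meth_a + unmeth_a < threshold or meth_b + unmeth_b < threshold:
            continue
        diff = (methylated_proportion(meth_a, unmeth_a)
                - methylated_proportion(meth_b, unmeth_b))
        compared += 1
        if abs(diff) < max_diff:
            agreeing += 1
    return 100.0 * agreeing / compared if compared else None
```

Calling `concordance` with `min_reads` set to 2, 5 or 10 and `max_diff` set to 0.1 or 0.25 would reproduce the grid of settings mentioned in the text, for example with MethylC-seq counts as `platform_a` and RRBS counts as `platform_b`.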
ANALYSIS

Quantitative comparison of genome-wide DNA methylation mapping technologies

Christoph Bock¹⁻⁴,⁶, Eleni M Tomazou¹⁻³,⁶, Arie B Brinkman⁵, Fabian Müller¹⁻⁴, Femke Simmer⁵, Hongcang Gu¹, Natalie Jäger¹⁻³, Andreas Gnirke¹, Hendrik G Stunnenberg⁵ & Alexander Meissner¹⁻³

¹Broad Institute, Cambridge, Massachusetts, USA. ²Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, Massachusetts, USA. ³Harvard Stem Cell Institute, Cambridge, Massachusetts, USA. ⁴Max Planck Institute for Informatics, Saarbrücken, Germany. ⁵Radboud University Department of Molecular Biology, Nijmegen Center for Molecular Life Sciences, Nijmegen, The Netherlands. ⁶These authors contributed equally to this work. Correspondence should be addressed to C.B. (cbock@broadinstitute.org) or A.M. (alexander_meissner@harvard.edu).

Published online 19 September 2010; doi:10.1038/nbt.1681

DNA methylation plays a key role in regulating eukaryotic gene expression. Although mitotically heritable and stable over time, patterns of DNA methylation frequently change in response to cell differentiation, disease and environmental influences. Several methods have been developed to map DNA methylation on a genomic scale. Here, we benchmark four of these approaches by analyzing two human embryonic stem cell lines derived from genetically unrelated embryos and a matched pair of colon tumor and adjacent normal colon tissue obtained from the same donor. Our analysis reveals that methylated DNA immunoprecipitation sequencing (MeDIP-seq), methylated DNA capture by affinity purification (MethylCap-seq), reduced representation bisulfite sequencing (RRBS) and the Infinium HumanMethylation27 assay all produce accurate DNA methylation data. However, these methods differ in their ability to detect differentially methylated regions between pairs of samples. We highlight strengths and weaknesses of the four methods and give practical recommendations for the design of epigenomic case-control studies.

DNA methylation is a common mechanism of epigenetic regulation in eukaryotes. It occurs most frequently at cytosines that are followed by guanines (CpG). High levels of DNA methylation in promoter regions are typically associated with robust gene silencing¹. Twenty-five years of research on cancer epigenetics have firmly established the prevalence of aberrant DNA methylation in cancer cells²⁻⁶. Moreover, recent studies have investigated the role of DNA methylation for neural and autoimmune diseases, its correlation with physiological conditions and its response to environmental influences⁷⁻⁹. Comprehensive mapping of DNA methylation in relevant clinical cohorts is likely to identify new disease genes and potential drug targets, help to establish the relevance of epigenetic alterations in disease and provide a rich source of potential biomarkers¹⁰. DNA methylation mapping could also facilitate quality control of cultured cells by exploiting the fact that cell states and differentiation potential of stem cells are reflected in their DNA methylation patterns¹¹.

Several methods have been developed to map DNA methylation on a genomic scale. Most of these methods combine DNA analysis by microarrays or high-throughput sequencing with one of four ways of translating DNA methylation patterns into DNA sequence information or library enrichment. (i) MeDIP-seq uses an antibody that is specific for 5-methylcytosine to retrieve methylated fragments from sonicated DNA¹²,¹³. (ii) MethylCap-seq employs a methyl-binding domain protein to obtain DNA fractions with similar methylation levels¹⁴⁻¹⁶.
(iii) Bisulfite-based methods use a chemical reaction that selectively converts unmethylated, but not methylated, cytosines into uracils, thus introducing methylation-specific, single nucleotide polymorphisms into the DNA sequence¹¹,¹⁷,¹⁸. (iv) Methylation sensitive digestion uses prokaryotic restriction enzymes to selectively fractionate only methylated or only unmethylated DNA¹⁹⁻²¹.

The diversity of methods to map DNA methylation and the absence of an uncontested commercial market leader raise questions about each method’s strengths and weaknesses, questions that researchers have to answer for themselves when selecting the most appropriate technology for any given project. The goal of this study was to comprehensively evaluate four popular methods (MeDIP-seq¹², MethylCap-seq¹⁴, RRBS²² and the Infinium HumanMethylation27 assay¹⁷), with a special emphasis on their practical utility for biomedical research and biomarker development. All four methods are relatively easy to set up because detailed protocols have been published and/or commercial kits are available. We chose RRBS because it targets bisulfite sequencing to a well-defined set of genomic regions with moderate to high CpG density²², which makes RRBS substantially more cost efficient than genome-wide bisulfite sequencing. The Infinium HumanMethylation27 assay, also a bisulfite-based method, was included because of its wide use and easy integration with existing genotyping pipelines; it is the only microarray-based method in our comparison. Methods that use tiling microarrays were excluded because they have been benchmarked previously²⁰ and because next-generation sequencing enables higher resolution and/or higher genomic coverage at competitive cost. Methylation-specific digestion was excluded because no algorithm exists that could accurately infer quantitative DNA methylation data from digested read frequencies¹⁹. An outline of the experimental and analytical procedure of this technology comparison is shown in Figure 1.


Figure 1 Outline of the DNA methylation technology comparison. Four methods for DNA methylation mapping were compared on two pairs of samples. The resulting 16 DNA methylation maps were bioinformatically analyzed and benchmarked against each other. In addition, clonal bisulfite sequencing was performed on selected genomic regions to validate DNA methylation differences that were detected exclusively by one method.

Workflow shown in Figure 1:
Samples: DNA for two pairs of samples (two human ES cell lines derived from unrelated embryos; a colon tumor and matched normal colon tissue from the same patient).
MeDIP-seq: 1. Sonication of DNA. 2. Library preparation. 3. Denaturation and enrichment with antibody for 5-methylcytosine. 4. Library amplification.
MethylCap: 1. Sonication of DNA. 2. Enrichment with methyl-binding domain protein. 3. Washing and elution. 4. Library preparation and amplification.
RRBS: 1. Digestion with MspI. 2. Library preparation. 3. Gel-based size selection. 4. Bisulfite treatment. 5. Library amplification.
High-throughput sequencing: 1. Sequencing on the Illumina Genome Analyzer II (30–40 million reads per sample). 2. Image processing, base calling and genome alignment.
Infinium: 1. DNA preparation. 2. Bisulfite conversion. 3. Hybridization onto Illumina bead arrays (Infinium HumanMethylation27). 4. Data normalization using the Illumina BeadStudio software.
Bioinformatic analysis: 1. Accuracy analysis and quantification of DNA methylation levels. 2. Assessment of genomic coverage and statistical power to detect DNA methylation differences. 3. Identification of differentially methylated regions (DMRs), cross-method comparison and validation. 4. Saturation analysis estimating the effect of sequencing depth. 5. DNA methylation analysis of repetitive DNA.
Validation: 1. Primer design. 2. Bisulfite conversion. 3. PCR amplification. 4. Amplicon cloning. 5. Sanger sequencing. 6. Data processing using the BiQ Analyzer software.

RESULTS

DNA methylation mapping by four methods

Genome-wide DNA methylation mapping is most commonly used as a discovery tool to identify differentially methylated regions (DMRs) as candidates for further research. Typical examples are cancer-specific DMRs, which are increasingly used as biomarkers for cancer diagnosis and therapy optimization 10. To emulate the case-control approach that is widely used for epigenetic biomarker development, we focused on sample pairs that we statistically compare with each other. Specifically, we selected two human embryonic stem (ES) cell lines that were derived from genetically unrelated embryos 23, and a matched pair of colon tumor and adjacent normal colon tissue obtained from the same donor. We applied each of the four methods (MeDIP-seq, MethylCap-seq, RRBS, Infinium) to all four samples (HUES6 ES cells, HUES8 ES cells, colon tumor and matched normal colon tissue), generating a total of 16 genome-scale DNA methylation maps. All data were processed with a standardized bioinformatic pipeline, and the technical data quality turned out to be similarly high across all samples and methods (Table 1).

When plotting the DNA methylation data as genome browser tracks, we found excellent visual agreement between all four methods (Fig. 2; tracks are available online for interactive browsing: http://meth-benchmark.computational-epigenetics.org/). MeDIP-seq and MethylCap-seq gave rise to peaks of methylated DNA that were similar in shape, size and location, indicating that MeDIP-seq's monoclonal antibody and MethylCap-seq's methyl-binding domain enrich for similar DNA fragments. However, MeDIP-seq exhibited higher baseline levels and lower peak heights than MethylCap-seq. This smaller dynamic range is already apparent from Figure 2 (note the different scale of the y axis) and becomes more obvious when plotting MeDIP and MethylCap-seq tracks along an entire chromosome (Supplementary Fig. 1). This observation was quantitatively confirmed by plotting the mean read frequency for enriched and depleted fractions of the genome (Supplementary Fig. 2). We also observed high visual agreement between RRBS and Infinium, with the limitation that Infinium covers two orders of magnitude fewer CpGs than RRBS (Table 1). Finally, the bisulfite-based methods (RRBS, Infinium) generally confirm the results of the enrichment-based methods (MeDIP, MethylCap-seq), although there are deviations in repeat-rich as well as in CpG-poor genomic regions (Supplementary Fig. 3).

Accuracy of DNA methylation mapping

For a more quantitative assessment of measurement accuracy, we compared the results of the three sequencing-based methods (MeDIP-seq, MethylCap-seq, RRBS) with the Infinium HumanMethylation27 assay as a common reference (Fig. 3). The Infinium assay was used as reference because its quantitative accuracy has been established in previous studies 17,24, which reported correlation coefficients around 0.9 relative to the GoldenGate and MethyLight assays. Note, however, that the probes of the Infinium assay cover only a small percentage of all CpGs in the genome and are preferentially located in unmethylated promoter regions. To compensate for this potential source of bias, we calculated two correlation coefficients, one across the entire spectrum of methylation levels and the other focusing only on those CpGs that exhibit at least 20% methylation according to the Infinium assay.

RRBS and Infinium data can be compared directly and without normalization, because both methods measure absolute DNA methylation levels. For a total of 5,088 single CpGs that were covered by both an Infinium probe and at least five RRBS reads, we observed a Pearson correlation of 0.92 across all DNA methylation levels and a Pearson correlation of 0.83 when we excluded unmethylated CpGs. Because neighboring CpGs tend to exhibit highly correlated DNA methylation levels 18,25, we also evaluated the correlation for RRBS measurement averages over a 200-base pair (bp) sequence window around each Infinium probe. Again, we observed excellent agreement between the two methods (Fig. 3c), with an overall Pearson correlation of 0.92 across all DNA methylation levels and a Pearson correlation of 0.84 when we excluded unmethylated CpGs. This second comparison supports the hypothesis that a single-CpG measurement can often act as an indicator of the DNA methylation levels at neighboring, unmeasured CpGs.

Comparison with MeDIP-seq and MethylCap-seq is less straightforward because both methods measure the relative enrichment of methylated DNA rather than absolute DNA methylation levels. When we correlated the number of sequencing reads per 1-kb region with the DNA methylation measurements of the Infinium assay, the Pearson correlation did not exceed 0.6 across all DNA methylation levels and 0.4 when we excluded unmethylated CpGs (Supplementary Fig. 3a,b). High density of repetitive DNA was identified as a major source of spurious read enrichment in regions with low absolute DNA methylation levels. In contrast, low CpG density gave rise to low read numbers in regions with high levels of DNA methylation (Supplementary Fig. 3c,d). The confounding effect of DNA sequence is also visible in Figure 2. Low read counts can indicate either the relative absence of CpGs (e.g., region 1 in Fig. 2) or the absence of DNA methylation in the presence of CpGs (Fig. 2, region 2); and strong peaks can occur in genomic regions that are incompletely methylated if the CpG density is sufficiently high to give rise to substantial read enrichment (Fig. 2, region 3).
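The two correlation coefficients used in this section (one across all methylation levels, one restricted to CpGs with at least 20% methylation according to the reference assay) can be illustrated with a minimal Python sketch. The function names and the toy methylation values are assumptions made for this illustration and are not taken from the study's pipeline.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def correlations_vs_reference(reference, measured, min_reference=0.2):
    """Return (r_all, r_methylated).

    r_all uses every CpG; r_methylated keeps only CpGs whose reference
    methylation level is at least `min_reference` (20% by default).
    """
    r_all = pearson(reference, measured)
    kept = [(r, m) for r, m in zip(reference, measured) if r >= min_reference]
    r_methylated = pearson([r for r, _ in kept], [m for _, m in kept])
    return r_all, r_methylated

# Toy methylation levels (fractions between 0 and 1), invented for illustration.
infinium = [0.02, 0.05, 0.03, 0.10, 0.35, 0.60, 0.80, 0.95]
rrbs = [0.00, 0.08, 0.02, 0.12, 0.50, 0.55, 0.70, 0.98]

r_all, r_methylated = correlations_vs_reference(infinium, rrbs)
print(f"all CpGs: r = {r_all:.2f}; CpGs with >= 20% methylation: r = {r_methylated:.2f}")
```

Restricting the comparison to methylated CpGs removes the large cluster of near-zero values that would otherwise dominate the coefficient, which is why the second number is the stricter test of quantitative accuracy.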


Table 1 Summary of DNA methylation mapping experiments

| Run no. | Method | Sample name | Number of lanes (a) | Number of reads (total) | Number of reads (aligned) | Alignment rate | Number of reads (unique) | Number of reads (duplicates) | Unique read rate (b) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | MeDIP-seq | HUES6 ES cell line | 2 | 37,086,239 | 22,798,831 | 61.5% | 12,849,623 | 9,949,208 | 56.4% |
| 2 | MeDIP-seq | HUES8 ES cell line | 2 | 36,078,308 | 24,266,670 | 67.3% | 12,287,174 | 11,979,496 | 50.6% |
| 3 | MeDIP-seq | Primary colon tumor | 2 | 33,453,797 | 18,582,183 | 55.5% | 7,006,484 | 11,575,699 | 37.7% |
| 4 | MeDIP-seq | Matched normal colon tissue | 2 | 37,789,936 | 21,793,567 | 57.7% | 10,360,103 | 11,433,464 | 47.5% |
| 5 | MethylCap-seq | HUES6 ES cell line | 3 | 38,436,495 | 23,401,511 | 60.9% | 21,712,433 | 1,689,078 | 92.8% |
| 6 | MethylCap-seq | HUES8 ES cell line | 3 | 38,735,596 | 21,670,301 | 55.9% | 19,585,988 | 2,084,313 | 90.4% |
| 7 | MethylCap-seq | Primary colon tumor | 3 | 37,718,830 | 23,206,054 | 61.5% | 21,600,129 | 1,605,925 | 93.1% |
| 8 | MethylCap-seq | Matched normal colon tissue | 3 | 38,330,519 | 22,724,002 | 59.3% | 21,290,282 | 1,433,720 | 93.7% |

| Run no. | Method | Sample name | Number of lanes (a) | Number of reads (total) | Number of reads (aligned) | Alignment rate | Number of CpGs (total) | Number of CpGs (unique) | Mean CpG coverage |
|---|---|---|---|---|---|---|---|---|---|
| 9 | RRBS | HUES6 ES cell line | 2 | 30,004,147 | 12,150,905 | 40.5% | 22,181,147 | 2,181,128 | 10.2x |
| 10 | RRBS | HUES8 ES cell line | 2 | 28,395,040 | 12,670,034 | 44.6% | 29,704,332 | 2,185,751 | 13.6x |
| 11 | RRBS | Primary colon tumor | 4 (c) | 40,015,958 | 9,545,423 | 23.9% | 16,891,325 | 1,297,296 | 13.0x |
| 12 | RRBS | Matched normal colon tissue | 4 (c) | 32,072,287 | 6,214,732 | 19.4% | 10,190,227 | 1,134,963 | 9.0x |

| Run no. | Method | Sample name | Number of arrays | Number of CpGs (total) | Number of CpGs (valid) | Number of CpGs (unique) | Valid probe rate |
|---|---|---|---|---|---|---|---|
| 13 | Infinium | HUES6 ES cell line | 1 | 27,578 | 27,192 | 27,192 | 98.6% |
| 14 | Infinium | HUES8 ES cell line | 1 | 27,578 | 27,090 | 27,090 | 98.2% |
| 15 | Infinium | Primary colon tumor | 1 | 27,578 | 27,561 | 27,561 | 99.9% |
| 16 | Infinium | Matched normal colon tissue | 1 | 27,578 | 27,478 | 27,478 | 99.6% |

(a) All sequencing was performed in 2009 using the Illumina Genome Analyzer II (36-bp, single-end reads). As of June 2010, we routinely observe total read numbers per lane averaging ~40 million for MeDIP-seq and MethylCap-seq and close to 30 million for RRBS. Current alignment rates range from 60% to 80% for all three methods. (b) The unique read rate was calculated by dividing the number of reads that map to a unique position in the genome (defined by chromosome, read start position and strand) by the total number of aligned reads. (c) Samples 11 and 12 were part of a sequencing-optimization run that resulted in lower sequencing yield and reduced alignment rates. Four lanes were sequenced to reach the target of 30–40 million reads per sample and method.

It has previously been reported that statistical correction for CpG density can improve the quantification of DNA methylation levels based on MeDIP-seq data 12,26. We therefore constructed a linear regression model that corrects for the confounding effect of DNA sequence, and we observe substantially improved results (Fig. 3a,b). Across all DNA methylation levels the correlation between the statistically corrected read counts and the DNA methylation measurements of the Infinium assay amounted to 0.84 for MeDIP-seq and to 0.88 for MethylCap-seq. However, the correlations dropped to 0.57 (MeDIP-seq) and 0.66 (MethylCap-seq) when we excluded unmethylated CpGs. These results indicate that MeDIP-seq and MethylCap-seq can distinguish between methylated and unmethylated regions almost as precisely as RRBS, but are less accurate for quantifying the DNA methylation levels in partially methylated genomic regions.

Genomic coverage of DNA methylation mapping

The single-bp resolution of the two bisulfite-based methods comes at the cost of reduced genomic coverage compared to the two enrichment-based methods. RRBS reads cover less than 10% of the 28 million CpGs in the human genome and Infinium is by design restricted to 27,578 promoter-associated CpGs (Table 1). In contrast, MeDIP-seq and MethylCap-seq are theoretically able to identify methylated genomic regions located anywhere in the genome, although they too are subject to intrinsic limitations 27. To assess the empirical genomic coverage of each method, we calculated the number of reads (MeDIP-seq, MethylCap-seq) or CpG methylation measurements (RRBS, Infinium) for each of the following genomic regions: (i) CpG islands, (ii) gene promoters and (iii) a 1-kb tiling of the genome. The results are shown in Figure 4, and coverage details for a total of 13 types of genomic regions are available online (http://meth-benchmark.computational-epigenetics.org/).

As expected, MeDIP-seq and MethylCap-seq provide broad coverage of the genome, whereas RRBS and Infinium are more restricted to CpG islands and promoter regions. However, the practically relevant differences in genomic coverage are lower than Figure 4 may suggest. This is because a minimum number of reads are required in at least one sample to reliably detect differential methylation among a given pair of samples. We illustrate this point by two statistical power calculations, which were performed with G*Power 3 (ref. 28). Assume that a genomic region is covered by five MeDIP-seq or MethylCap-seq reads in one sample. Then it has to contain at least 20 reads in the second sample to be detected as hypermethylated (assuming a statistical power of 80% and a P-value of 5% without multiple-testing correction). Similarly, RRBS would detect a DNA methylation increase from 30% to 70% only when at least 25 measurements are available in each sample (again assuming a statistical power of 80% and a P-value of 5% without multiple-testing correction).

Identification of differentially methylated regions

Genome-wide DNA methylation mapping is most commonly used for detecting DNA methylation differences, for example, between diseased and healthy tissue or between genetically modified and unmodified control cells.
To assess how well MeDIP-seq, MethylCap-seq and RRBS perform on this task, we developed a bioinformatic method that identifies statistically significant DMRs from multiple types of sequencing data (the Infinium assay requires a different approach and is discussed in a separate section below). For a predefined set of genomic regions we count the numbers of sequenced reads (for MeDIP-seq and MethylCap-seq) or, alternatively, the numbers of methylated versus unmethylated CpGs (for RRBS), and we test for statistically significant differences between two samples using Fisher's exact test. When applied to a complete tiling of the human genome, this method performs genome-wide DMR detection. Alternatively, it can be targeted to specific region types such as CpG islands, gene promoters or putative enhancers, which often leads to more sensitive detection of small differences because the multiple-testing burden is reduced compared to genome-wide DMR detection. We pursued both the unbiased and the annotation-guided approach in parallel, focusing our comparison on three types of genomic regions: (i) CpG islands, (ii) gene promoters and (iii) a 1-kb tiling of the genome (Fig. 5 and Supplementary Figs. 4–8).

Figure 2 Comparison of DNA methylation maps obtained with four different methods. The screenshot shows genome browser tracks for MeDIP-seq (first two tracks, in green), MethylCap-seq (three tracks in blue, gray and red), RRBS (stacked light blue tracks) and Infinium (single black track with percentage values) across the HOXA cluster in a human ES cell line (HUES6). Each track represents data from a single sequencing lane (MeDIP-seq, MethylCap-seq, RRBS) or microarray hybridization (Infinium). MeDIP-seq and MethylCap-seq data are visually similar to ChIP-seq data, with peaks in regions that show high density of the target molecule (5-methylcytosine) and troughs in regions with low density of methylated cytosines. The heights of the peaks represent the number of reads in each genomic interval, for each track normalized to the same genome-wide read count. RRBS gives rise to clusters of CpGs with absolute DNA methylation measurements, separated by regions that are not covered due to the reduced-representation property of the RRBS protocol. Each data point corresponds to the methylation level at a single CpG, and dark blue points indicate higher methylation levels than light blue points. Infinium data is represented in a similar way to the RRBS data, and the methylation levels at single CpGs are shown as percentage values. For reference, the CpG density is indicated by stacked points (black) at the bottom of the diagram, and CpG islands (red) as well as known genes (blue) are listed as described previously 55,56.

Overall, we observed high correlation for each of the two sample pairs, but also outliers suggesting the presence of DMRs. Based on the RRBS data, we obtained Pearson correlations around 0.9 for all three region types, both between the two ES cell lines (HUES6 and HUES8) and between the colon tumor and matched normal colon tissue. For MethylCap-seq and MeDIP-seq, the correlations were somewhat lower and ranged from 0.75 to 0.92 (Fig. 5 and Supplementary Figs. 4–8). Using the DMR detection algorithm (Online Methods), we identified several hundred to several thousand DMRs in both sample pairs. There was substantial, but by no means perfect, overlap between the DMRs identified by all three methods. For the two human ES cell lines, 277 out of 44,440 CpG islands were detected as differentially methylated by each of the three methods (Fig. 5d). Pairwise comparisons for each sample and region type (Supplementary Figs. 4–8) confirmed that the agreement between the three methods was statistically significant in all cases (P < 0.01, Fisher's exact test). In total, we observed that up to 1,000 CpG islands, 405 promoter regions or 1,924 of the 1-kilobase tiling regions (that is,


Figure 3 Quantification of DNA methylation with MeDIP-seq, MethylCap-seq and RRBS. (a–c) Absolute DNA methylation levels were calculated from the data obtained by MeDIP-seq (a), MethylCap-seq (b) and RRBS (c), respectively, and compared to DNA methylation levels determined by the Infinium assay. For MeDIP-seq and MethylCap-seq, sequencing reads were counted in 1-kb regions surrounding each CpG that is interrogated by the Infinium assay, and a regression model was used to infer absolute DNA methylation levels. Scatter plots and correlation coefficients were calculated on a test set that was not used for model fitting or feature selection. For RRBS, the DNA methylation level was determined as the percentage of methylated CpGs within 200 bp surrounding each CpG that is interrogated by the Infinium assay. Data shown are for the HUES6 human ES cell line, and regions that did not have sufficient sequencing coverage were excluded.
Figure 3 panels (x axis: DNA methylation level (Infinium), 0 to 1.0; y axis: DNA methylation level of the respective method, 0 to 1.0): (a) MeDIP-seq, Pearson's r = 0.84; (b) MethylCap, Pearson's r = 0.88; (c) RRBS, Pearson's r = 0.92.

As an additional validation, we selected eight method-specific DMRs based on the ES cell comparison, and we investigated DNA methylation patterns in the two ES cell lines by clonal bisulfite sequencing (Table 2). These genomic regions were handpicked such that one method clearly identified them as DMRs whereas the two other methods did not show a trend in either direction. Note that this preselection makes the validation substantially harder than confirming randomly selected DMRs, because the magnitude of the DNA methylation difference tends to be lower for method-specific DMRs than for DMRs that are detected by multiple methods.
As an additional complication, some of the selected DMRs are highly repetitive or overlap with known copy-number variations. Sequencing an average of 11 clones per sample and region, we were able to confirm three out of three MethylCap-seq–specific DMRs and two out of two RRBS-specific DMRs. In contrast, two MeDIP-seq–specific DMRs could not be confirmed, and for the third region the agreement was marginal (Table 2 and Supplementary Data 1).

To assess the practical relevance of the method-specific differences, we asked whether biologically interesting hits were missed by any of the three methods. For this analysis we focused on the colon samples because of the large number of genes with a known or suspected role in colon cancer. Our results show that several interesting DMRs are detected by all methods, including tumor-specific hypermethylation in the promoters of GATA2 (ref. 31) and GATA5 (ref. 32). However, a considerable number of interesting DMRs were missed by MeDIP-seq, whereas MethylCap-seq and RRBS both detected those regions; these include tumor-specific hypermethylation in the promoter regions of SOX17 (ref. 33), POU2AF1 (ref. 34) and SEPT9 (ref. 35). Somewhat more rarely, we also observed interesting DMRs being missed by MethylCap-seq or RRBS. For example, MethylCap-seq overlooked tumor-specific hypermethylation at the promoter of SFRP1 (ref. 36), and RRBS missed tumor-specific hypermethylation at the promoter of DKK2 (ref. 37).

The effect of sequencing depth on mapping performance

MeDIP-seq, MethylCap-seq and RRBS use DNA sequencing as a way of counting DNA fragments to determine the percentage of methylation-enriched reads that align to specific genomic regions (MeDIP-seq, MethylCap-seq) or to calculate the ratio of methylated and unmethylated cytosines at single CpGs (RRBS). Conceptually, sequencing can be thought of as random sampling from a large pool of DNA fragments.
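The random-sampling view of sequencing can be made concrete with a small simulation. The sketch below is an illustration only, not the saturation analysis performed in the study: it draws nested random subsets from a list of reads and counts how many regions reach a minimum read count at each depth. The region names, the read pool and the threshold of five reads are assumptions chosen for the example.

```python
import random
from collections import Counter

def saturation_curve(read_regions, fractions, min_reads=5, seed=0):
    """Count regions with at least `min_reads` reads at each sequencing depth.

    read_regions: one region identifier per aligned read.
    fractions: depths to evaluate, as fractions of all reads (0 < f <= 1).
    Subsets are nested (prefixes of one shuffled list), so the count can
    only stay equal or grow as the depth increases.
    """
    rng = random.Random(seed)
    shuffled = list(read_regions)
    rng.shuffle(shuffled)
    curve = []
    for fraction in sorted(fractions):
        k = int(round(fraction * len(shuffled)))
        counts = Counter(shuffled[:k])
        covered = sum(1 for c in counts.values() if c >= min_reads)
        curve.append((fraction, covered))
    return curve

# Hypothetical read pool: 50 CpG-rich regions with 40 reads each and
# 200 CpG-poor regions with 4 reads each.
reads = [f"rich_{i}" for i in range(50) for _ in range(40)]
reads += [f"poor_{i}" for i in range(200) for _ in range(4)]

curve = saturation_curve(reads, [0.1, 0.2, 0.5, 1.0])
for fraction, covered in curve:
    print(f"{fraction:.0%} of reads: {covered} regions with >= 5 reads")
```

In this toy pool the CpG-poor regions never reach the threshold, whatever the depth, which mirrors the point that saturation depends on where in the genome the reads fall.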
It is therefore expected that the performance of these methods increases when sequencing more DNA fragments, until it levels off as the sequencing depth approaches saturation. To quantify this effect, we repeated the accuracy analysis (Fig. 3) and the DMR detection (Fig. 5) on randomly sampled subsets of sequencing reads. First, we benchmarked each method against the Infinium data, assessing their ability to quantify DNA methylation levels based on reduced read numbers (Supplementary Fig. 10). The results show that all three methods give rise to accurate DNA methylation measurements based on as little as 20% of the total read coverage, and almost no improvement was observed between 50% and 100% sequencing depth. Although these data suggest that relatively low sequencing depths are often sufficient for obtaining accurate DNA methylation levels, this cannot be generalized to the entire genome. Infinium probes tend to be located in CpG-rich genomic regions, which are also preferentially covered by MeDIP-seq, MethylCap-seq and RRBS measurements (Fig. 4), such that saturation is reached earlier in the vicinity of Infinium probes than in CpG-poor genomic regions.

Second, we tested how many DMRs were still detected among the two sample pairs when the number of sequencing reads in each of the samples was reduced (Supplementary Fig. 11). For MeDIP-seq, the number of detected DMRs dropped to less than half when the sequencing depth was reduced to 50%, and there was little indication that the number of MeDIP-seq DMRs approaches saturation even at the highest sequencing depth. For MethylCap-seq the decrease in the number of detected DMRs is less dramatic and there is a trend toward saturation. RRBS quickly approaches saturation especially for the ES-cell comparison (Supplementary Fig. 11). Overall, the

Figure 4 Genomic coverage of MeDIP-seq, MethylCap-seq, RRBS and Infinium. Genomic coverage was quantified by the number of DNA methylation measurements that overlap with CpG islands (top row), gene promoters (center row) and a 1-kb tiling of the genome (bottom row). For MeDIP-seq and MethylCap-seq, the number of measurements is equal to the number of unique sequencing reads that fall inside each region. For RRBS, it refers to the number of valid DNA methylation measurements at CpGs within each region (one RRBS sequencing read typically yields one measurement, but can also give rise to more than one measurement if it contains several CpGs). For Infinium, the number of measurements is equal to the number of CpGs within each region that are present on the HumanMethylation27 microarray. CpG islands were calculated using CgiHunter (http://cgihunter.bioinf.mpi-inf.mpg.de/), requiring a minimum CpG observed versus expected ratio of 0.6, a minimum GC content of 0.5 and a minimum length of 700 bp 55. Promoter regions were calculated based on Ensembl gene annotations, such that the region starts 1 kb upstream of the annotated transcription start site (TSS) and extends to 1 kb downstream of the TSS. The genomic tiling was obtained by sliding a 1-kb window through the genome such that each tile starts at the position where the previous tile ends. No repeat-masking was performed for any of the three types of genomic regions. Data are shown for the HUES6 human ES cell line.
Figure 4 panels (pie charts of the number of measurements per region, from no coverage up to ≥50 or ≥100, for MeDIP-seq, MethylCap-seq, RRBS and Infinium): CpG islands (length ≥ 700 bp), 44,440 regions genome-wide; promoter regions (2 kb centered on TSS), 23,690 regions genome-wide; whole genome (1 kb sliding window), 2,858,143 regions genome-wide.

Figure 5 Detection of DMRs with MeDIP-seq, MethylCap-seq and RRBS. Average DNA methylation measurements were calculated for each CpG island and compared between two human ES cell lines (HUES6 and HUES8). (a–c) Total read frequencies are shown for MeDIP-seq (a) and MethylCap-seq (b), and mean DNA methylation levels are shown for RRBS (c). Regions with insufficient sequencing coverage were excluded. (d) The Venn diagram displays the total number and mutual overlap of differentially methylated CpG islands that could be identified by each method. CpG islands were classified as hypermethylated or hypomethylated (depending on the directionality of the difference) if the absolute DNA methylation difference exceeded 20 percentage points (for RRBS) or if there was at least a twofold difference in read number between the two samples (for MeDIP-seq and MethylCap-seq), but only if Fisher's exact test with multiple-testing correction gave rise to an estimated false-discovery rate of differential DNA methylation that was
Figure 5 panels: (a) MeDIP-seq read frequency for HUES6 versus HUES8, Pearson's r = 0.86; (b) MethylCap-seq read frequency for HUES6 versus HUES8, Pearson's r = 0.86; (c) RRBS measurement for HUES6 versus HUES8, Pearson's r = 0.95; (d) Venn diagram of CpG islands with higher or lower methylation in HUES6 for MeDIP-seq, MethylCap-seq and RRBS (number of CpG islands genome-wide: 44,440).


lines (Supplementary Data 2). These data suggest that young retrotransposons find ways to evade silencing by DNA methylation in pluripotent cells, which may contribute to their ability to maintain activity in spite of an elaborate epigenetic genome defense 40.

DMR discovery using the Infinium assay

Our study used the Infinium HumanMethylation27 assay as a common reference for evaluating the accuracy of the sequencing-based methods, which was justified by prior studies showing high quantitative accuracy of the Infinium assay 17,24. However, no prior study investigated the Infinium HumanMethylation27 assay's power to detect DMRs on a genome-wide scale, hence we could not use the Infinium assay as reference when evaluating DMR discovery by the sequencing-based methods. In fact, its low genomic coverage is expected to limit the utility of the Infinium assay for DMR discovery in spite of its well-established accuracy (Fig. 4). To empirically address this question, we initially performed statistical testing in much the same way as was done for Figure 5. However, most CpG islands were covered by only two Infinium probes, which resulted in low statistical power to detect significant differences. Specifically, paired-samples t-tests identified just three significant DMRs among the ES cell lines and two DMRs between the colon tumor and matched normal colon tissue (data not shown).

Thus, we reformulated our question and asked how many true DMRs exhibited suggestive (albeit insignificant) DNA methylation differences in the Infinium data. As an approximation of true DMRs, we focused on those CpG islands that were detected by at least two sequencing-based methods (which are unlikely to contain a high number of technical artifacts according to the comparative validations described above). Between the two ES cell lines a total of 1,000 consensus DMRs were identified (corresponding to the sum of all center fields in Fig. 5), of which 251 were covered by at least one Infinium probe. Similarly, we identified 463 consensus DMRs between the colon tumor and matched normal colon tissue, of which 177 were covered by at least one Infinium probe. In most cases, the directionality of the difference was consistent between the consensus DMRs and the Infinium data (Supplementary Fig. 12). But when we imposed a minimum threshold of 20 percentage points DNA methylation difference in the same way as for RRBS, the number of Infinium-detected DMRs dropped to 162 (ES-cell comparison) and 95 (colon cancer comparison). In other words, the Infinium assay detected approximately one-fifth of the consensus DMRs that we identified by the sequencing-based methods.

Table 2 Validation of method-specific DMRs for MeDIP-seq, MethylCap-seq and RRBS

| DMR location | Description | Experimental validation | MeDIP-seq | MethylCap-seq | RRBS |
|---|---|---|---|---|---|
| MeDIP-seq–specific DMR chr10:88,149,016-88,149,732 | Intergenic CpG island ~30 kb upstream of GRID1, partial overlap with degenerate L1 element | HUES6: 38/56 (68%) methylated CpGs; HUES8: 26/44 (59%) methylated CpGs → insignificant (P = 0.41) | Hypermethylated (Q = 1.1E-04) | Insignificant (Q = 0.59) | Insignificant (Q = 0.43) |
| MeDIP-seq–specific DMR chr16:31,142,904-31,143,799 | CpG island overlapping with the terminal exon of TRIM72 | HUES6: 342/362 (94%) methylated CpGs; HUES8: 466/523 (89%) methylated CpGs → marginally hypermeth. (P = 0.0051) | Hypermethylated (Q = 1.2E-05) | Insignificant (Q = 0.73) | Insufficient coverage |
| MeDIP-seq–specific DMR chr1:211,290,079-211,290,896 | CpG island overlapping with the putative promoter region of RPS6KC1 | HUES6: 53/60 (88%) methylated CpGs; HUES8: 45/50 (90%) methylated CpGs → insignificant (P = 1.0) | Hypermethylated (Q = 3.0E-06) | Insignificant (Q = 0.97) | Insignificant (Q = 0.29) |
| MethylCap-seq–specific DMR chr20:29,526,646-29,527,380 | CpG island overlapping with the putative promoter region of REM1 | HUES6: 5/72 (7%) methylated CpGs; HUES8: 78/84 (93%) methylated CpGs → hypomethylated (P = 1.4E-30) | Insufficient coverage | Hypomethylated (Q = 1.8E-09) | Insufficient coverage |
| MethylCap-seq–specific DMR chr2:151,825,938-151,826,902 | CpG island overlapping with the putative promoter region of RBM43 and a known copy-number variation | HUES6: 161/208 (77%) methylated CpGs; HUES8: 9/104 (9%) methylated CpGs → hypermethylated (P = 3.3E-33) | Insignificant (Q = 0.18) | Hypermethylated (Q = 7.3E-09) | Insufficient coverage |
| MethylCap-seq–specific DMR chr13:44,348,934-44,349,700 | Intergenic CpG island ~60 kb upstream of NUFIP1, partial overlap with degenerate Alu element | HUES6: 80/88 (91%) methylated CpGs; HUES8: 41/79 (52%) methylated CpGs → hypermethylated (P = 1.2E-08) | Insignificant (Q = 0.40) | Hypermethylated (Q = 8.3E-07) | Insufficient coverage |
| RRBS-specific DMR chr3:186,889,821-186,890,200 | CpG island overlapping with an internal exon and intron of IGF2BP2 | HUES6: 5/90 (6%) methylated CpGs; HUES8: 88/90 (98%) methylated CpGs → hypomethylated (P = 4.3E-42) | Insufficient coverage | Insignificant (Q = 0.18) | Hypomethylated (Q = 3.5E-40) |
| RRBS-specific DMR chr3:32,609,320-32,609,612 | Intergenic CpG island ~20 kb upstream of DYNC1LI1 | HUES6: 41/121 (34%) methylated CpGs; HUES8: 130/143 (91%) methylated CpGs → hypomethylated (P = 3.5E-23) | Insufficient coverage | Insignificant (Q = 0.52) | Hypomethylated (Q = 2.9E-26) |

Experimental validation of method-specific DMRs between two ES cell lines (HUES6 and HUES8). The table summarizes the results of clonal bisulfite sequencing for eight regions that showed clear-cut DNA methylation differences according to one method but not according to the other two. The P values in column 3 were calculated from the clonal bisulfite sequencing data using Fisher's exact test, based on the DNA methylation levels of individual CpGs. The Q values in columns 4–6 were derived from the DNA methylation maps as described in the Online Methods. One out of three MeDIP-seq–specific DMRs, three out of three MethylCap-seq–specific DMRs and two out of two RRBS-specific DMRs could be confirmed by clonal bisulfite sequencing data (bold print). All genomic coordinates are relative to the NCBI36 (hg18) genome assembly and refer to the amplicon on which the validation was performed. A detailed documentation of the validation experiments is available in Supplementary Data 1.

DISCUSSION

Over the last decade, DNA methylation mapping has played an important role in establishing the prevalence of altered DNA methylation in cancer cells 41,42. More recently, researchers have also started to systematically study the role of DNA methylation in a wide range of non-neoplastic diseases 43. This is indeed a good time to probe for epigenetic alterations that contribute to human diseases. Genome-wide association studies have been completed for all common diseases and point to a major role of nongenetic factors in the etiology of most diseases 44. Furthermore, it has been suggested that epigenetic events could provide a tractable link between the genome and the environment, with the epigenome emerging as a biochemical record of relevant life events 45,46. Systematic investigation of these topics requires powerful, accurate and cost-efficient methods for identifying DNA methylation differences across many samples.

The goal of this study was to evaluate current methods for global DNA methylation mapping and to compare their performance when applied under real-world conditions.
To mimic a typical disease-centered case-control study, we worked with primary patient material (colon samples) and used lower amounts of input DNA than in most previous studies (MeDIP-seq: 300 ng; MethylCap-seq: 1 μg; RRBS: 50 ng; Infinium: 1 μg). We focused on cell types that are known to exhibit relatively moderate DNA methylation differences 31,47, in contrast to the massive DNA methylation alterations that are frequently observed in cultured somatic cells 11 and cancer cell lines 48. Finally, because all four methods included in the current study are widely available and not excessively costly, there are few obstacles to using this technology comparison as a blueprint for individual laboratory efforts as well as large-scale epigenomic case-control studies investigating the epigenetics of human diseases.

Overall, our data confirmed that all four methods provide accurate DNA methylation measurements and can be used to detect DMRs in clinical samples. In terms of accuracy, the bisulfite-based methods (RRBS, Infinium) performed slightly better than the enrichment-based methods and did not require any statistical correction of CpG bias. Furthermore, the genomic coverage was moderately higher for MethylCap-seq than for MeDIP-seq, RRBS coverage was by design focused on CpG-rich regions and the Infinium assay covered a relatively small number of preselected genomic regions.

Despite the striking differences in genomic coverage, a substantial fraction of DMRs detected by MeDIP-seq or MethylCap-seq were also identified by RRBS, and vice versa. This somewhat counterintuitive observation can be explained by the role of region-specific read coverage for the ability to identify statistically significant DMRs. If a genomic region is CpG poor and thus rarely sequenced by MeDIP-seq or MethylCap-seq, both methods have low statistical power to detect differential DNA methylation.
In contrast, CpG-rich genomic regions tend to be more amenable to DMR detection by MeDIP-seq and MethylCap-seq and are also frequently covered by RRBS measurements. Finally, we observed that MethylCap-seq was able to detect roughly twice as many DMRs as MeDIP-seq at comparable sequencing depths, RRBS detected more DMRs than MeDIP-seq but fewer DMRs than MethylCap-seq, and the Infinium assay detected only 20% of the consensus DMRs identified by the sequencing-based methods. These differences could be reproduced in two independent pairwise comparisons, providing strong indication that they are robust across biological replicates and cannot be explained by random experimental variation. On the other hand, we used one specific protocol for each method, and it is quite possible that protocol variations (e.g., different antibody for MeDIP-seq, different elution procedure for MethylCap-seq or different size selection for RRBS) would produce different results.

Our study also reinforces the importance of sequencing depth as a key parameter determining the power to detect differential methylation with any of the sequencing-based methods. To allow for a fair and practically relevant comparison, we sequenced ~30–40 million reads for each sample and method. However, it became evident that deeper sequencing would identify further DMRs, especially for MeDIP-seq and MethylCap-seq (Supplementary Fig. 11). For disease-centered studies it is therefore necessary to make an informed decision about how to distribute the available resources between sequencing fewer samples more deeply and sequencing more samples less deeply. Such a decision can be guided by statistical power calculations when some prior knowledge exists about the characteristics of expected DMRs (e.g., magnitude of difference, location in CpG-rich versus CpG-poor genomic regions), or it can be dictated by practical considerations such as the number of available samples.
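To make the trade-off between depth and power concrete, the following is a minimal sketch of a power calculation for a single region. It is an illustration and not the procedure used in this study. It assumes Poisson-distributed read counts, equal library sizes in both samples, and a region that holds a tiny fraction of all reads, so that a comparison of read counts conditional on their sum reduces to an exact binomial test with p = 0.5. All function names and the example densities are hypothetical.

```python
import math


def critical_tail(n, alpha):
    """Largest k with 2 * P(X <= k) < alpha for X ~ Binomial(n, 0.5); -1 if none."""
    cumulative = 0
    k = -1
    for x in range(n // 2 + 1):
        cumulative += math.comb(n, x)
        if 2 * cumulative / 2 ** n < alpha:
            k = x
        else:
            break
    return k


def binomial_pmf(x, n, p):
    return math.comb(n, x) * p ** x * (1.0 - p) ** (n - x)


def poisson_pmf(k, lam):
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def expected_reads(reads_per_10m, depth_millions):
    """Expected reads in a region, assuming coverage scales linearly with depth."""
    return reads_per_10m * depth_millions / 10.0


def detection_power(reads_low, fold, alpha=0.05):
    """Approximate power to detect a `fold` difference in read counts.

    reads_low is the expected read count in the sample with lower coverage.
    The total count n is Poisson; given n, the count in the higher sample is
    Binomial(n, fold / (1 + fold)) and is tested against p = 0.5.
    """
    total = reads_low * (1.0 + fold)
    p_high = fold / (1.0 + fold)
    n_max = int(total + 10.0 * math.sqrt(total)) + 10  # truncate the Poisson sum
    power = 0.0
    for n in range(1, n_max + 1):
        k = critical_tail(n, alpha)
        if k < 0:
            continue  # no outcome can reach significance with so few reads
        reject = sum(
            binomial_pmf(x, n, p_high)
            for x in range(n + 1)
            if x <= k or x >= n - k
        )
        power += poisson_pmf(n, total) * reject
    return power


if __name__ == "__main__":
    for depth in (10, 20, 40, 60):
        lam = expected_reads(5, depth)  # hypothetical density: 5 reads per 10M
        print(depth, "million reads -> power", round(detection_power(lam, 2.0), 3))
```

Under these assumptions, power for a twofold difference rises with sequencing depth because the expected read count in the region grows linearly with the number of reads sequenced. This is the sense in which CpG-poor, rarely sequenced regions are underpowered at moderate depth.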
In our experience and at current sequencing costs, a range of ~30–60 million reads per sample for MeDIP-seq and MethylCap-seq, and a range of ~10–20 million reads per sample for RRBS constitute a viable compromise between breadth and depth of sequencing. In contrast, whole-genome bisulfite sequencing 49 provides comprehensive genomic coverage at the cost of having to sequence over a billion reads per sample. On the other end of the spectrum, low sequencing depths are often sufficient to detect strong differences such as global loss of DNA methylation but do not provide reliable locus-specific information 50.

Genome-wide studies tend to ignore repetitive regions due to technical difficulties, and the few studies that focused specifically on mapping DNA methylation in repetitive regions did so at relatively low coverage 51–53. The current data set was well-suited to analyze DNA methylation in repetitive regions because the joint results obtained by three different experimental methods helped us to control for technical artifacts that can burden the analysis of repetitive DNA. We observed that repeat sequences are most highly methylated when they are CpG rich and highly prevalent in the human genome (Supplementary Data 2). In contrast, the DNA methylation levels varied widely among repeat sequences that are either CpG poor or infrequent in the genome. These results lend support to the hypothesis that DNA methylation provides a mechanism for keeping active retrotransposons in check 54. They also argue for a highly specific mechanism of repeat repression, which targets DNA methylation mostly to those repeat sequences that threaten genome integrity, whereas many ‘benign’ repeat sequences may remain unmethylated.

In summary, we benchmarked four methods for genome-scale DNA methylation mapping in terms of their accuracy and power to detect DNA methylation differences.
These results will facilitate the selection of suitable methods for studying the role of DNA methylation in disease and development.

Methods
Methods and any associated references are available in the online version of the paper at http://www.nature.com/naturebiotechnology/.

Note: Supplementary information is available on the Nature Biotechnology website.

Acknowledgments
We thank A. Crenshaw and M. Parkin (Broad Institute) for assistance with the Infinium assay and K. Halachev (Max Planck Institute for Informatics) for the provision of genome annotation files. C.B. is supported by a Feodor Lynen Fellowship from the Alexander von Humboldt Foundation. A.B.B. is supported by the Dutch Cancer Foundation (KWF, grant KUN 2008-4130). A.M. is supported by the Massachusetts Life Science Center and the Pew Charitable Trusts. The described work was in part funded by the Pew Charitable Trusts, the US National Institutes of Health Roadmap Initiative on Epigenomics (U01ES017155) and the European Union’s CANCERDIP project (HEALTH-F2-2007-200620).

AUTHOR CONTRIBUTIONS
C.B., E.M.T. and A.M. conceived and designed the study; E.M.T., A.B.B., F.S. and H.G. performed the experiments; C.B., F.M. and N.J. analyzed the data; C.B., A.G., H.G.S. and A.M. interpreted the results; and C.B. and A.M. wrote the paper.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

Published online at http://www.nature.com/naturebiotechnology/.
Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/.

1. Bird, A. DNA methylation patterns and epigenetic memory. Genes Dev. 16, 6 (2002).
2. Baylin, S.B. & Ohm, J.E. Epigenetic gene silencing in cancer—a mechanism for early oncogenic pathway addiction? Nat. Rev. Cancer 6, 107–116 (2006).
3. Esteller, M. Epigenetics in cancer. N. Engl. J. Med. 358, 1148–1159 (2008).
4. Feinberg, A.P. & Tycko, B. The history of cancer epigenetics. Nat. Rev. Cancer 4, 143–153 (2004).
5. Issa, J.P. CpG island methylator phenotype in cancer. Nat. Rev. Cancer 4, 988–993 (2004).
6. Jones, P.A. & Laird, P.W. Cancer epigenetics comes of age. Nat. Genet. 21, 163–167 (1999).


7. Richardson, B. Primer: epigenetics of autoimmunity. Nat. Clin. Pract. Rheumatol. 3, 521–527 (2007).
8. Tobi, E.W. et al. DNA methylation differences after exposure to prenatal famine are common and timing- and sex-specific. Hum. Mol. Genet. 18, 4046–4053 (2009).
9. Urdinguio, R.G., Sanchez-Mut, J.V. & Esteller, M. Epigenetic mechanisms in neurological diseases: genes, syndromes, and therapies. Lancet Neurol. 8, 1056–1072 (2009).
10. Bock, C. Epigenetic biomarker development. Epigenomics 1, 99–110 (2009).
11. Meissner, A. et al. Genome-scale DNA methylation maps of pluripotent and differentiated cells. Nature 454, 766–770 (2008).
12. Down, T.A. et al. A Bayesian deconvolution strategy for immunoprecipitation-based DNA methylome analysis. Nat. Biotechnol. 26, 779–785 (2008).
13. Weber, M. et al. Chromosome-wide and promoter-specific analyses identify sites of differential DNA methylation in normal and transformed human cells. Nat. Genet. 37, 853–862 (2005).
14. Brinkman, A.B. et al. Whole-genome DNA methylation profiling using MethylCap-seq. Methods published online, doi:10.1016/j.ymeth.2010.06.012 (11 June 2010).
15. Rauch, T. & Pfeifer, G.P. Methylated-CpG island recovery assay: a new technique for the rapid detection of methylated-CpG islands in cancer. Lab. Invest. 85, 1172–1180 (2005).
16. Serre, D., Lee, B.H. & Ting, A.H. MBD-isolated Genome Sequencing provides a high-throughput and comprehensive survey of DNA methylation in the human genome. Nucleic Acids Res. 38, 391–399 (2010).
17. Bibikova, M. et al. Genome-wide DNA methylation profiling using Infinium assay. Epigenomics 1, 177–200 (2009).
18. Eckhardt, F. et al. DNA methylation profiling of human chromosomes 6, 20 and 22. Nat. Genet. 38, 1378–1385 (2006).
19. Brunner, A.L. et al. Distinct DNA methylation patterns characterize differentiated human embryonic stem cells and developing human fetal liver. Genome Res. 19, 1044–1056 (2009).
20. Irizarry, R.A. et al. Comprehensive high-throughput arrays for relative methylation (CHARM). Genome Res. 18, 780–790 (2008).
21. Oda, M. et al. High-resolution genome-wide cytosine methylation profiling with simultaneous copy number analysis and optimization for limited cell numbers. Nucleic Acids Res. 37, 3829–3839 (2009).
22. Gu, H. et al. Genome-scale DNA methylation mapping of clinical samples at single-nucleotide resolution. Nat. Methods 7, 133–136 (2010).
23. Cowan, C.A. et al. Derivation of embryonic stem-cell lines from human blastocysts. N. Engl. J. Med. 350, 1353–1356 (2004).
24. Weisenberger, D.J. et al. Comprehensive DNA methylation analysis on the Illumina Infinium assay platform (Illumina, San Diego, California, USA, 2008). 〈http://www.illumina.com/Documents/products/appnotes/appnote_infinium_methylation.pdf〉 (2008).
25. Bock, C. et al. Inter-individual variation of DNA methylation and its implications for large-scale epigenome mapping. Nucleic Acids Res. 36, e55 (2008).
26. Pelizzola, M. et al. MEDME: an experimental and analytical methodology for the estimation of DNA methylation levels based on microarray derived MeDIP-enrichment. Genome Res. 18, 1652–1659 (2008).
27. Robinson, M.D., Statham, A.L., Speed, T.P. & Clark, S.J. Protocol matters: which methylome are you actually studying? Epigenomics 2, 587 (2010).
28. Faul, F. et al. G*Power 3: a flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behav. Res. Methods 39, 175–191 (2007).
29. Beroukhim, R. et al. The landscape of somatic copy-number alteration across human cancers. Nature 463, 899–905 (2010).
30. Redon, R. et al. Global variation in copy number in the human genome. Nature 444, 444–454 (2006).
31. Irizarry, R.A. et al. The human colon cancer methylome shows similar hypo- and hypermethylation at conserved tissue-specific CpG island shores. Nat. Genet. 41, 178–186 (2009).
32. Hellebrekers, D.M. et al. GATA4 and GATA5 are potential tumor suppressors and biomarkers in colorectal cancer. Clin. Cancer Res. 15, 3990–3997 (2009).
33. Zhang, W. et al. Epigenetic inactivation of the canonical Wnt antagonist SRY-box containing gene 17 in colorectal cancer. Cancer Res. 68, 2764–2772 (2008).
34. Tenesa, A. et al. Genome-wide association scan identifies a colorectal cancer susceptibility locus on 11q23 and replicates risk loci at 8q24 and 18q21. Nat. Genet. 40, 631–637 (2008).
35. Lofton-Day, C. et al. DNA methylation biomarkers for blood-based colorectal cancer screening. Clin. Chem. 54, 414–423 (2008).
36. Caldwell, G.M. et al. The Wnt antagonist sFRP1 in colorectal tumorigenesis. Cancer Res. 64, 883–888 (2004).
37. Hirata, H. et al. Wnt antagonist gene DKK2 is epigenetically silenced and inhibits renal cancer progression through apoptotic and cell cycle pathways. Clin. Cancer Res. 15, 5678–5687 (2009).
38. Ehrlich, M. DNA hypomethylation in cancer cells. Epigenomics 1, 239–259 (2009).
39. Jurka, J. Repbase update: a database and an electronic journal of repetitive elements. Trends Genet. 16, 418–420 (2000).
40. Bestor, T.H. & Tycko, B. Creation of genomic methylation patterns. Nat. Genet. 12, 363–367 (1996).
41. Esteller, M. Cancer epigenomics: DNA methylomes and histone-modification maps. Nat. Rev. Genet. 8, 286–298 (2007).
42. Jones, P.A. & Baylin, S.B. The epigenomics of cancer. Cell 128, 683–692 (2007).
43. Feinberg, A.P. Phenotypic plasticity and the epigenetics of human disease. Nature 447, 433–440 (2007).
44. Manolio, T.A. et al. Finding the missing heritability of complex diseases. Nature 461, 747–753 (2009).
45. Foley, D.L. et al. Prospects for epigenetic epidemiology. Am. J. Epidemiol. 169, 389–400 (2009).
46. Heijmans, B.T. et al. The epigenome: archive of the prenatal environment. Epigenetics 4, 526–531 (2009).
47. Doi, A. et al. Differential methylation of tissue- and cancer-specific CpG island shores distinguishes human induced pluripotent stem cells, embryonic stem cells and fibroblasts. Nat. Genet. 41, 1350–1353 (2009).
48. Smiraglia, D.J. et al. Excessive CpG island hypermethylation in cancer cell lines versus primary human malignancies. Hum. Mol. Genet. 10, 1413–1419 (2001).
49. Lister, R. et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature 462, 315–322 (2009).
50. Popp, C. et al. Genome-wide erasure of DNA methylation in mouse primordial germ cells is affected by AID deficiency. Nature 463, 1101–1105 (2010).
51. Horard, B. et al. Global analysis of DNA methylation and transcription of human repetitive sequences. Epigenetics 4, 339–350 (2009).
52. Rodriguez, J. et al. Genome-wide tracking of unmethylated DNA Alu repeats in normal and cancer cells. Nucleic Acids Res. 36, 770–784 (2008).
53. Weisenberger, D.J. et al. Analysis of repetitive element DNA methylation by MethyLight. Nucleic Acids Res. 33, 6823–6836 (2005).
54. Yoder, J.A., Walsh, C.P. & Bestor, T.H. Cytosine methylation and the ecology of intragenomic parasites. Trends Genet. 13, 335–340 (1997).
55. Bock, C. et al. CpG island mapping by epigenome prediction. PLoS Comput. Biol. 3, e110 (2007).
56. Pruitt, K.D., Tatusova, T. & Maglott, D.R. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 35 (Database issue), D61–D65 (2007).


ONLINE METHODS
Sample origin and cell culture. Human ES cells were cultured in knockout serum replacement (KOSR) medium according to established protocols 23 and genomic DNA was extracted as described previously 57. DNA for the colon tumor and matched normal colon tissue was purchased from BioChain (lot number A704198). Both samples originate from the same donor, an 81-year-old male patient diagnosed with moderately differentiated adenocarcinoma.

Methylated DNA immunoprecipitation (MeDIP-seq). MeDIP-seq 12 was performed using the EZ DNA methylation kit (Zymo Research). A total of 300 ng DNA per sample was sonicated using Bioruptor (Diagenode) with 8 intervals of 10 min (30 s on, 30 s off). Sonicated DNA was end-repaired and ligated with sequencing adapters as described previously 12. After gel-based size selection, methylated DNA immunoprecipitation was performed according to the manufacturer’s protocol. A total of 1 μg of monoclonal antibody against 5-methylcytosine (included in the EZ DNA methylation kit) was used for immunoprecipitation. The immunoprecipitated DNA was PCR-amplified and the specificity of the enrichment was confirmed by qPCR for selected loci as described previously 58. Two lanes of 36-bp single-ended sequencing were performed on the Illumina Genome Analyzer II according to the manufacturer’s standard protocol. Maq with default parameters was used to align the sequencing reads to the NCBI36 (hg18) assembly of the human genome 59.

Methylated-DNA capture (MethylCap-seq). MethylCap-seq 14 was performed in a robotized procedure using a SX-8G / IP-Star (Diagenode). 2 μg of His6-GST-MBD (Diagenode) was combined with 1 μg of sonicated DNA in 200 μl of binding buffer (BB, 20 mM Tris-HCl pH 8.5, 0.1% Triton X-100) containing 200 mM NaCl. This solution was incubated at 4 °C for 2 h.
Magnetic GST-beads were prepared by washing 35 μl of a well-mixed MagneGST glutathione particle suspension (Promega) with 200 μl of binding buffer plus 200 mM NaCl at 4 °C. Washing was repeated once and the supernatant was removed. The GST-MBD-DNA solution was added to the washed and collected beads, and this suspension was rotated for another hour at 4 °C. After removal of the supernatant (this is the flow-through) the beads-GST-MBD-DNA complexes were eluted by washing. 200 μl of binding buffer with different concentrations of NaCl was added and the suspension was rotated for 10 min at 4 °C. Beads were captured using a magnet, and the supernatant was collected. The elution procedure consisted of 1 × 300 mM (wash), 2 × 400 mM (wash), 1 × 500 mM (“low” eluate), 1 × 600 mM (“medium” eluate) and 1 × 800 mM NaCl (“high” eluate). The collected eluates were purified using QIAquick PCR purification spin columns (Qiagen), eluted with 100 μl elution buffer and prepared for sequencing as described previously 14. A single lane of 36-bp single-ended sequencing on the Illumina Genome Analyzer II was performed for each of the low, medium and high eluates. The sequencing reads were aligned to the NCBI36 (hg18) assembly of the human genome using Illumina’s analysis pipeline (ELAND) with default parameters. The lanes for each of the three eluates are shown separately in Figure 2, and we tested whether the accuracy relative to the Infinium assay could be improved by taking this additional information into account. However, a linear model that was based on the separate read counts of the three lanes did not outperform a model that was based on the sum of the three lanes, which is why we combined the reads from all three libraries per sample for the analyses described in this paper.

Reduced representation bisulfite sequencing (RRBS). RRBS 22 was performed according to a previously published protocol 57 with some optimizations for clinical samples and low amounts of input DNA 22.
The main steps were: (i) A total of 50 ng (ES cells) or 1 μg (colon samples) genomic DNA was digested by 5 U to 20 U of MspI (New England Biolabs, NEB) for up to 16 h. (ii) End-repair and adenylation of digested DNA were performed in a 20 μl reaction consisting of 10 U of Klenow fragment (3′→5′ exo-, NEB) and 2 μl premixed nucleotide triphosphates (1 mM dGTP, 10 mM dATP, 1 mM 5′ methylated dCTP). The reaction was incubated at 30 °C for 30 min followed by 37 °C for an additional 30 min. (iii) Preannealed 5-methylcytosine-containing Illumina adapters were ligated with adenylated DNA fragments in a 20 μl reaction containing 1 μl concentrated T4 ligase (NEB) and 1–2 μl of 15 μM adapters at 16 °C for 16 to 20 h. (iv) Gel-based selection for fragments with insert sizes of 40 to 120 bp and 120 to 220 bp was performed as described previously 22. (v) Bisulfite treatment with the EpiTect Bisulfite Kit (Qiagen) was conducted following the protocol designated for DNA isolated from formalin-fixed and paraffin-embedded tissues. Two rounds of conversion were performed in order to maximize bisulfite conversion rates. The final bisulfite-converted DNA was eluted with 2 × 20 μl pre-heated (65 °C) EB buffer. (vi) To determine the minimum number of PCR cycles for final library enrichment, analytical (10 μl) PCR reactions containing 0.5 μl of bisulfite-treated DNA, 0.2 μM each of Illumina PCR primers LPX1.1 and 2.1 and 0.5 U PfuTurbo Cx Hotstart DNA polymerase (Stratagene) were set up. The thermocycler conditions were: 5 min at 95 °C, varied cycle numbers (10–20) of 20 s at 95 °C, 30 s at 65 °C, 30 s at 72 °C, followed by 7 min at 72 °C. PCR products were visualized by running on a 4–20% polyacrylamide Criterion TBE Gel (Bio-Rad) and stained by SYBR Green. The final libraries were generated from eight 25 μl PCR reactions, each containing 2–3 μl of bisulfite-converted template, 1.25 U PfuTurbo Cx Hotstart polymerase and 0.2 μM each of Illumina LPX1.1 and 2.1 PCR primers.
The libraries were PCR amplified and sequenced on the Illumina Genome Analyzer II as described previously 22. The sequencing reads were aligned to the NCBI36 (hg18) assembly of the human genome using custom alignment software that was developed for RRBS data 11.

Microarray-based epigenotyping (Infinium). Infinium 17 analysis was performed by the Genetic Analysis Platform at the Broad Institute. A total of 1 μg of genomic DNA per sample was bisulfite-treated according to the manufacturer’s protocol and hybridized onto Infinium HumanMethylation27 bead arrays (Illumina). We previously observed almost perfect agreement between technical replicates (Pearson’s r > 0.98), which is why only a single hybridization was performed for each sample.

Data preparation and quality control. For MeDIP-seq and MethylCap-seq, the aligned reads were extended to the mean fragment length obtained during sonication, and from each group of duplicate reads (that is, reads aligned to the exact same start position on the same chromosome) all but one read were discarded, in order to minimize the impact of PCR bias on downstream analysis. For RRBS, the aligned reads were compared to the reference genome, and the DNA methylation status was determined using custom software as described previously 22. Infinium HumanMethylation27 data were processed with Illumina’s BeadStudio 3.2 software, using the default background subtraction method for normalization. UCSC Genome Browser tracks were constructed by custom scripts implemented in the Python programming language (http://www.python.org/).

Quantification of absolute DNA methylation levels. We used linear regression models to estimate the absolute DNA methylation levels from the MeDIP-seq and MethylCap-seq read counts.
Based on a number of different feature selection experiments, we found that the following combination of variables was robustly predictive of DNA methylation levels: (i) the square root of the total number of MeDIP-seq or MethylCap-seq reads within the given region, (ii) the square root of the total number of whole-cell extract (WCE) reads within the region (based on a cross-tissue WCE track that we routinely use for ChIP-seq data normalization), (iii) the logit of the CpG frequency within the region, (iv) the relative GC content of the region, (v) the ratio of Cs relative to CpGs, and (vi) the relative repeat content of the region as determined by RepeatMasker (http://www.repeatmasker.org/). For both MeDIP-seq and MethylCap-seq, we observed that the read frequencies were strongly positively associated with the absolute methylation level obtained using the Infinium assay, whereas the repeat content was moderately positively associated. In contrast, the logit of the CpG frequency was highly negatively associated with DNA methylation, and all other variables as well as the model’s intercept exhibited a moderately negative association. For model fitting and performance evaluation, the current data set was split into equally sized training and test sets. All model fitting was performed using the R statistics package (http://www.r-project.org/).

Identification of differentially methylated regions. In our experience, classical peak detection 60,61 is not well-suited for DMR identification because of the high number of spurious hits encountered when borderline peaks are detected in one sample but not in the other (C.B., unpublished observation).
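The regression model for absolute DNA methylation levels described under "Quantification of absolute DNA methylation levels" can be illustrated with a minimal sketch. The original fitting was done in R; the Python below is not that code. It only shows how the six predictors (i)–(vi) plus an intercept could be assembled and fitted by ordinary least squares. The function names, the clipping constant in `logit` and the normal-equations solver are assumptions made for the illustration.

```python
import math


def logit(p, eps=1e-6):
    """Log-odds of a frequency, clipped away from 0 and 1 (eps is an assumption)."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def region_features(enriched_reads, wce_reads, cpg_freq,
                    gc_content, c_to_cpg_ratio, repeat_content):
    """Predictor vector for one region: intercept plus variables (i) to (vi)."""
    return [
        1.0,                        # intercept
        math.sqrt(enriched_reads),  # (i) MeDIP-seq or MethylCap-seq reads
        math.sqrt(wce_reads),       # (ii) whole-cell extract reads
        logit(cpg_freq),            # (iii) CpG frequency
        gc_content,                 # (iv) relative GC content
        c_to_cpg_ratio,             # (v) ratio of Cs relative to CpGs
        repeat_content,             # (vi) relative repeat content
    ]


def fit_least_squares(rows, targets):
    """Ordinary least squares via the normal equations and Gauss-Jordan elimination."""
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(rows, targets))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                for c in range(col, k + 1):
                    a[r][c] -= factor * a[col][c]
    return [a[i][k] / a[i][i] for i in range(k)]


def predict(beta, features):
    """Predicted absolute methylation level for one region."""
    return sum(b * x for b, x in zip(beta, features))
```

In use, `targets` would hold the Infinium methylation levels for the training half of the regions, and `predict` would be evaluated on the held-out half, mirroring the equal training and test split described in the text.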


Instead, we used a statistical test to compare two samples directly with each other. For a given region with RRBS data, we count the number of methylated vs. unmethylated CpGs in both samples and perform Fisher’s exact test to obtain a p-value that is indicative of the likelihood of the region being a DMR. Similarly, for MeDIP-seq and MethylCap-seq we count the numbers of reads that align inside the region for both samples and use Fisher’s exact test to contrast these values with the total numbers of reads that align elsewhere in the genome. And for the Infinium assay we use a paired-samples t-test to compare the two samples’ β-values of all Infinium probes inside the region. These tests are performed on a large number of genomic regions in parallel (e.g., on all CpG islands), and the p-values are corrected for multiple testing using the q-value method 62. Genomic regions with a q-value of less than 0.1 are flagged as hypermethylated or hypomethylated (depending on the directionality of the difference), but only if the absolute DNA methylation difference exceeds 20 percentage points (for RRBS and Infinium) or if there is at least a twofold difference in the read number (for MeDIP-seq and MethylCap-seq). These thresholds were chosen for their practical utility in a number of comparisons between different cell types and have no further justification. We also mark genomic regions with insufficient sequencing coverage, but do not exclude them from DMR analysis. For MeDIP-seq and MethylCap-seq we require at least ten reads per 10 million total reads for the sample with higher read coverage, and for RRBS we require a minimum of five CpGs with at least five reads each in both samples.

This statistical approach to DMR identification requires us to define sets of genomic regions on which the analysis is being performed.
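The per-region test described above can be sketched in a few lines. This is an illustration under stated assumptions, not the Python and R scripts used for the study. Fisher's exact test is implemented directly from the hypergeometric distribution. The Benjamini-Hochberg adjustment is swapped in as a simpler stand-in for the q-value method 62. All function names and return labels are hypothetical.

```python
import math


def _log_comb(n, k):
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    RRBS: a, b = methylated CpG counts in samples 1 and 2; c, d = unmethylated.
    Enrichment methods: a, b = reads inside the region; c, d = reads elsewhere.
    """
    row1, row2, col1 = a + b, c + d, a + c
    log_total = _log_comb(row1 + row2, col1)

    def log_prob(x):
        return _log_comb(row1, x) + _log_comb(row2, col1 - x) - log_total

    log_observed = log_prob(a)
    p_value = 0.0
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        lp = log_prob(x)
        if lp <= log_observed + 1e-7:  # tables at most as likely as the observed one
            p_value += math.exp(lp)
    return min(1.0, p_value)


def benjamini_hochberg(p_values):
    """FDR-adjusted p-values (a stand-in here for Storey-Tibshirani q-values)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def classify_rrbs_region(meth1, unmeth1, meth2, unmeth2, q_value,
                         max_q=0.1, min_difference=0.20):
    """Flag a region if q < 0.1 and the difference exceeds 20 percentage points."""
    difference = meth1 / (meth1 + unmeth1) - meth2 / (meth2 + unmeth2)
    if q_value >= max_q or abs(difference) <= min_difference:
        return "not called"
    return "hypermethylated" if difference > 0 else "hypomethylated"


def classify_enrichment_region(reads1, total1, reads2, total2, q_value,
                               max_q=0.1, min_fold=2.0):
    """Flag a region if q < 0.1 and normalized read counts differ at least twofold."""
    rate1, rate2 = reads1 / total1, reads2 / total2
    low, high = min(rate1, rate2), max(rate1, rate2)
    if q_value >= max_q or high == 0.0 or (low > 0.0 and high / low < min_fold):
        return "not called"
    return "hypermethylated" if rate1 > rate2 else "hypomethylated"
```

A typical run would compute one p-value per region (for example, per CpG island), adjust the whole list at once, and then classify each region with its adjusted value. Testing fewer candidate regions gives a milder adjustment than testing a genome-wide tiling.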
We pursued a two-way strategy to maximize the chances of finding interesting DMRs. On the one hand, we focused specifically on CpG islands and gene promoters, which are prime candidates for epigenetic regulation. This approach provides increased statistical power for regions with well-known functional roles because the relatively low number of CpG islands and gene promoters reduces the burden of multiple-testing correction compared to the genome-wide case. On the other hand, we used a 1-kilobase tiling of the genome to detect DMRs that are located outside of any candidate regions. And to cast an even wider net, we collected a comprehensive set of 13 types of genomic regions, which includes not only CpG islands and gene promoters, but also CpG island shores 31, enhancers 63, evolutionarily conserved regions and other types of genomic regions. DMR data for all of these region sets were calculated using a set of Python and R scripts and are available online (http://meth-benchmark.computational-epigenetics.org/).

Experimental validation. Based on the CpG islands that were detected as differentially methylated between the two ES cell lines (Fig. 5), we manually selected eight method-specific DMRs for experimental validation. To that end, those CpG islands that were identified as statistically significant DMRs by one method (but not by the other two methods) were visually inspected in the UCSC Genome Browser, and regions were selected for validation only if the data fully supported their classification as method-specific DMRs. In particular, regions were not selected if a second method already picked up a suggestive but insignificant trend in the same direction as the first method, or when the data of the first method already suggested that the DMR was a false-positive hit (e.g., because of contradictory trends in the vicinity of the DMR). Experimental validation was performed by clonal bisulfite sequencing following established protocols 64.
Primers were designed using MethPrimer 65 such that the amplicon overlapped with those CpGs that exhibited the highest levels of differential methylation according to our original data. To prepare for bisulfite sequencing, 1 μg of DNA was bisulfite-converted using the EpiTect kit (Qiagen); 50 ng of bisulfite-converted DNA was PCR-amplified (see Supplementary Data 1 for primer sequences); and purified amplicons were cloned using the TOPO TA cloning kit (Invitrogen). For each region an average of 11 clones were randomly chosen for sequencing. All sequencing data were processed using the BiQ Analyzer software 66, and the results are summarized in Supplementary Data 1.

Analysis of repetitive DNA. Repeat sequences were obtained from database version 14.07 of RepBase Update 39, which is publicly available online (http://www.girinst.org/server/RepBase/index.php). From a total of 11,670 prototypic repeat sequences we selected those 1,267 that were annotated either to human or to its ancestors in the taxonomic tree, and we combined these prototypic repeat sequences into a pseudo-genome file. Maq with default parameters was used to align MeDIP-seq, MethylCap-seq, RRBS, ChIP-seq (H3K4me3) and whole-cell extract (WCE) sequencing reads against this pseudo-genome 59. For RRBS, both the reads and the reference genome were bisulfite-converted in silico before the alignment. The epigenetic status of each prototypic repeat sequence was quantified as follows: (i) For MeDIP-seq, MethylCap-seq and ChIP-seq we calculated the odds ratios relative to the WCE data. (ii) For RRBS we computed the number of methylated CpGs, total number of CpG measurements and percentage of DNA methylation based on the comparison of the aligned reads with the prototypic repeat sequence.

We discarded rare repeats with WCE coverage below 100 aligned reads or RRBS coverage below 25 CpG measurements, resulting in 553 prototypic repeat sequences that were used for further analysis.
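The quantification and filtering of prototypic repeat sequences can be illustrated as follows. This is a minimal sketch, not the original scripts. The function names, the optional pseudocount and the choice to model in silico conversion as a full C-to-T replacement on the forward strand are assumptions. Only the two coverage cut-offs (100 WCE reads, 25 CpG measurements) are taken from the text.

```python
def odds_ratio(enriched_on_repeat, enriched_total, wce_on_repeat, wce_total,
               pseudocount=0.5):
    """Odds of a read aligning to one repeat in an enriched library vs. WCE.

    The pseudocount (an assumption) avoids division by zero for empty cells.
    """
    a = enriched_on_repeat + pseudocount
    b = enriched_total - enriched_on_repeat + pseudocount
    c = wce_on_repeat + pseudocount
    d = wce_total - wce_on_repeat + pseudocount
    return (a / b) / (c / d)


def rrbs_methylation_percent(methylated_cpgs, total_cpg_measurements):
    """Percentage of methylated CpG measurements on one prototypic repeat."""
    return 100.0 * methylated_cpgs / total_cpg_measurements


def keep_repeat(wce_reads, rrbs_cpg_measurements, min_wce=100, min_rrbs=25):
    """Discard rare repeats: below 100 WCE reads or below 25 CpG measurements."""
    return wce_reads >= min_wce and rrbs_cpg_measurements >= min_rrbs


def bisulfite_convert(sequence):
    """In silico conversion of a forward-strand sequence: every C becomes T."""
    return sequence.upper().replace("C", "T")


if __name__ == "__main__":
    # Hypothetical counts for one repeat family.
    print(odds_ratio(1200, 30_000_000, 400, 25_000_000))
    print(rrbs_methylation_percent(180, 200))
    print(keep_repeat(wce_reads=400, rrbs_cpg_measurements=200))
    print(bisulfite_convert("ACGTCCGA"))
```

An odds ratio above 1 indicates that the repeat is over-represented in the enriched library relative to the WCE background. For the enrichment methods this is read as higher DNA methylation.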
Among these were 97 LINE class sequences (92 of them from the L1 family), 51 SINEs (48 of them from the Alu family), 6 SVAs, 62 DNA repeats, 15 satellite repeats, 315 LTRs, 1 low-complexity repeat and 6 RNA repeats (Supplementary Data 2). To quantify differential methylation between a pair of MeDIP-seq or MethylCap-seq samples, we calculated the pairwise odds ratio of the read coverage for each prototypic repeat sequence. The absolute DNA methylation difference was used in the case of RRBS (Supplementary Data 3). The significance of the difference was assessed using Fisher’s exact test in the same way as for the nonrepetitive genome (described above).

57. Smith, Z.D. et al. High-throughput bisulfite sequencing in mammalian genomes. Methods 48, 226–232 (2009).
58. Rakyan, V.K. et al. An integrated resource for genome-wide identification and analysis of human tissue-specific differentially methylated regions (tDMRs). Genome Res. 18, 1518–1529 (2008).
59. Li, H., Ruan, J. & Durbin, R. Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res. 18, 1851–1858 (2008).
60. Bock, C. & Lengauer, T. Computational epigenetics. Bioinformatics 24, 1–10 (2008).
61. Park, P.J. ChIP-seq: advantages and challenges of a maturing technology. Nat. Rev. Genet. 10, 669–680 (2009).
62. Storey, J.D. & Tibshirani, R. Statistical significance for genomewide studies. Proc. Natl. Acad. Sci. USA 100, 9440–9445 (2003).
63. Heintzman, N.D. et al. Histone modifications at human enhancers reflect global cell-type-specific gene expression. Nature 459, 108–112 (2009).
64. Hajkova, P. et al. DNA-methylation analysis by the bisulfite-assisted genomic sequencing method. Methods Mol. Biol. 200, 143–154 (2002).
65. Li, L.C. & Dahiya, R. MethPrimer: designing primers for methylation PCRs. Bioinformatics 18, 1427–1431 (2002).
66. Bock, C. et al. BiQ Analyzer: visualization and quality control for DNA methylation data from bisulfite sequencing. Bioinformatics 21, 4067–4068 (2005).

doi:10.1038/nbt.1681


ARTICLES

Non-invasive imaging of human embryos before embryonic genome activation predicts development to the blastocyst stage

Connie C Wong 1,2,7, Kevin E Loewke 1–3,6,7, Nancy L Bossert 4, Barry Behr 2, Christopher J De Jonge 4, Thomas M Baer 5 & Renee A Reijo Pera 1,2

We report studies of preimplantation human embryo development that correlate time-lapse image analysis and gene expression profiling. By examining a large set of zygotes from in vitro fertilization (IVF), we find that success in progression to the blastocyst stage can be predicted with >93% sensitivity and specificity by measuring three dynamic, noninvasive imaging parameters by day 2 after fertilization, before embryonic genome activation (EGA). These parameters can be reliably monitored by automated image analysis, confirming that successful development follows a set of carefully orchestrated and predictable events. Moreover, we show that imaging phenotypes reflect molecular programs of the embryo and of individual blastomeres. Single-cell gene expression analysis reveals that blastomeres develop cell autonomously, with some cells advancing to EGA and others arresting. These studies indicate that success and failure in human embryo development is largely determined before EGA. Our methods and algorithms may provide an approach for early diagnosis of embryo potential in assisted reproduction.

Little is known about the basic pathways and events of early human embryo development, including factors that would aid in predicting success or failure to develop.
Consequently, to increase the chances of pregnancy through IVF, multiple embryos are often transferred to the uterus, despite the potential for well-documented adverse outcomes.

Development of the human embryo begins with the fusion of sperm and egg, the epigenetic reprogramming of the gametic pronuclei and a series of cleavage divisions that culminate with activation of the embryonic genome by day 3 of development (ref. 1). The embryo compacts to form a morula and subsequently a blastocyst, containing the outer trophectoderm and inner cell mass (ref. 1). Although development of the human embryo shares many features with other species, there are also some notable differences, including unique gene-expression and epigenetic patterns and a protracted period of transcriptional silence through the first 3 d after fertilization (refs. 1–9). In the mouse, by contrast, activation of the zygotic genome is initiated concurrent with the first cleavage division on day 1 (refs. 7,8). Human embryo development is also more fragile than that of many other species. Human fecundity rates are relatively low, largely due to pre- and post-implantation embryo loss (refs. 10,11). In vitro, 50–70% of IVF embryos fail to reach the blastocyst stage (refs. 12,13).

Most human embryo research has been based on a small number of samples generated under diverse experimental conditions (refs. 1,14–17). Studies that involve imaging have been limited to measurements of early development, such as pronuclear formation and fusion and time to first cleavage (refs. 18–21), and molecular profiling studies have generally required pooling of oocytes, embryos or blastomeres, which masks differences in gene expression between embryos or between single blastomeres within an embryo (refs. 15–17,22,23). Here we sought to overcome these limitations and to define critical pathways and events of human embryo development by correlating imaging profiles and molecular data throughout preimplantation development from the zygote to the blastocyst stage.
We studied a large set of supernumerary IVF embryos that had been cryopreserved at the zygote stage 12–18 h after fertilization (Fig. 1). The embryos appeared representative of the typical IVF population, as they were frozen at the two-pronucleate (2PN) stage and thus indiscriminately selected for cryopreservation relative to those selected for culture. This is in contrast to embryos cryopreserved at the 8-cell stage or later, which are not selected for transfer during fresh IVF cycles and may therefore be of lower quality. With this unique set of embryos, we carried out a large-scale study that correlated time-lapse image analysis and gene expression profiling to show that successful development to the blastocyst stage can be predicted by the 4-cell stage, before EGA.

1 Institute for Stem Cell Biology and Regenerative Medicine, School of Medicine, Stanford University, Stanford, California, USA.
2 Department of Obstetrics and Gynecology, School of Medicine, Stanford University, Stanford, California, USA.
3 Department of Mechanical Engineering, Stanford University, Stanford, California, USA.
4 Reproductive Medicine Center, University of Minnesota, Minneapolis, Minnesota, USA.
5 Stanford Photonics Research Center, Department of Applied Physics, Stanford University, Stanford, California, USA.
6 Present address: Auxogyn, Inc., Menlo Park, California, USA.
7 These authors contributed equally to this work.
Correspondence should be addressed to R.A.R.P. (reneer@stanford.edu).
Received 5 April; accepted 3 September; published online 3 October 2010; doi:10.1038/nbt.1686
nature biotechnology VOLUME 28 NUMBER 10 OCTOBER 2010 1115

RESULTS
Cytokinesis as an embryo quality marker
A normal human zygote undergoes the first cleavage division early on day 2, at ~24–27 h after fertilization (refs. 18–20,24) (Fig. 2a, embryo H in Supplementary Video 1). Subsequently, the embryo cleaves to a 4- and 8-cell embryo on days 2 and 3, respectively, before compacting


into a morula on day 4 and forming a blastocyst on days 5 to 6. For the purposes of this study, embryos that reached the blastocyst stage were considered developmentally competent and designated ‘normal’, whereas embryos that arrested at a stage before the blastocyst stage were considered developmentally incompetent and designated ‘abnormal’.

[Figure 1 schematic labels: Day 1, thaw 1-cell human embryos on multiple microscopes (Expt. 1: n = 61; Expt. 2: n = 80; Expt. 3: n = 64; Expt. 4: n = 37); time-lapse imaging; harvest a mixture of normal and arrested embryos on consecutive days as single embryos or single blastomeres; high-throughput, single-cell qPCR analysis.]

We tracked the development of 242 IVF embryos in four independent experimental sets using multiple time-lapse microscopes equipped with low-power, dark-field illumination. Of the 242 embryos, 100 were cultured to day 5 or 6, whereas the remaining 142 were removed at various stages for quantitative real-time (qRT) PCR gene expression analysis. Among the 100 embryos cultured to day 5 or 6, 33–53% formed blastocysts (Fig. 2b), and the remaining embryos arrested at different developmental stages, usually between the 2- and 8-cell stages. To identify quantitative imaging parameters that would predict success in development to the blastocyst stage, we extracted and analyzed several parameters from the time-lapse videos, including blastomere size, thickness of the zona pellucida, degree of fragmentation, length of the first cell cycles, time intervals between the first few mitoses and duration of the first cytokinesis.
As the embryos in this study were cryopreserved 12–18 h after fertilization, we did not measure parameters before the onset of the first cytokinesis, such as time to first cleavage or length of the first cell cycle, properties that have been evaluated previously (refs. 18–20).

Figure 1 Experimental plan. We tracked the development of 242 two-pronucleate-stage embryos in four experimental sets (containing 61, 80, 64 and 37 embryos, respectively). In each set of experiments, human zygotes were thawed on day 1 and cultured in small groups on multiple plates. Each plate was observed independently with time-lapse microscopy under dark-field illumination on separate imaging stations. At ~24 h intervals, one plate of embryos was removed from the imaging system and collected as either single embryos or single cells (blastomeres) for high-throughput qRT-PCR gene expression analysis. Each plate typically contained a mixture of embryos that reached the expected developmental stage at the time of harvest (termed ‘normal’) and those that were arrested or delayed at earlier development stages, or fragmented extensively (termed ‘abnormal’). Gene expression analysis was carried out on single intact embryos or on single blastomeres of dissociated embryos. One hundred of the 242 embryos were imaged until day 5 or 6 to monitor blastocyst formation.

Out of the set of parameters measured, three collectively predicted blastocyst formation: (i) duration of the first cytokinesis (the very brief last step in mitosis that physically separates the two daughter cells), (ii) time interval between the end of the first mitosis and the initiation of the second and (iii) the time interval between the second and third mitoses (the time between the appearance of the cleavage furrows of the second and third mitoses) (Fig. 2c). The third parameter represents the synchronicity in the formation of the two sets of granddaughter cells. The mean values and s.d.
for these three parameters for the embryos that developed to the blastocyst stage were (i) 14.3 ± 6.0 min, (ii) 11.1 ± 2.2 h and (iii) 1.0 ± 1.6 h, respectively. It is important to note that the first three mitotic events yield a 4-cell embryo from a 1-cell embryo, as opposed to the first three cleavage divisions, which yield an 8-cell embryo (Supplementary Fig. 1). Embryos that reached the blastocyst stage could be predicted, with a sensitivity and specificity of 94% and 93%, respectively, by having a first cytokinesis of 0–33 min, a time between first and second mitoses of 7.8–14.3 h and a time between second and third mitoses of 0–5.8 h (Fig. 2d, Supplementary Fig. 2 and Supplementary Data Set 1). Conversely, embryos that exhibited values outside of one or more of these windows were predicted to arrest.

We further examined the behavior of cytokinesis in both normal and abnormal embryos. Embryos that reached the blastocyst stage initiated and completed cytokinesis in a smooth, controlled manner over a narrow time window of 14.3 ± 6.0 min (n = 36), from appearance of the cleavage furrows to complete separation of the daughter cells (Fig. 2e, first panel, and Supplementary Video 2). In contrast, abnormal embryos showed a diverse range of behaviors that can be classified into three aberrant cytokinesis phenotypes (Supplementary Fig. 3). In the least frequent and mildest phenotype, the morphology and mechanism of cytokinesis appears normal, but the time required to complete the process is increased by a few minutes to an hour (Fig. 2e, second panel, Supplementary Video 3 and Supplementary Fig. 3, top panel). A small fraction of the embryos that underwent a slightly prolonged cytokinesis still developed into a blastocyst. In the second phenotype, embryos formed a unipolar cleavage furrow and displayed unusual morphological behavior for several hours before finally cleaving and fragmenting into smaller pieces (Fig. 2e, third panel, Supplementary Video 4 and Supplementary Fig. 3, middle panel).
In the third phenotype, embryos displayed membrane ruffling and/or multiple cleavage furrows before cleaving and fragmenting into smaller pieces (Supplementary Fig. 3, bottom panel). Together, the second and third abnormal cytokinesis phenotypes confirm that abnormal cytokinesis is one of the mechanisms for embryo fragmentation, a common observation in abnormal human embryo development. Moreover, we observed that fragmentation in abnormal embryos rarely reversed, whereas moderate fragmentation in normal embryos sometimes reversed at the 2-cell stage before the second mitosis (Supplementary Fig. 4).

To determine whether cryopreservation and thawing altered the kinetics of development, we also imaged a small set (n = 10) of embryos that had not been cryopreserved (Fig. 2e, fourth panel, Supplementary Video 5 and Online Methods). Analysis of our three dynamic imaging parameters suggested that cryopreserved embryos are not developmentally delayed by the cryopreservation process.

Validation of imaging parameters by automated analysis
Our time-lapse imaging data showed that human embryo development varies substantially between embryos within a cohort and that embryos exhibit a wide range of behaviors during cell division. However, characterization of developmental events, such as the duration of cytokinesis, by human observers may be distorted by subjective interpretation. To validate our method for predicting blastocyst formation, we developed an algorithm for automated tracking of cell divisions up to the 4-cell stage. Our tracking algorithm employs a probabilistic model estimation technique based on sequential Monte Carlo methods. This technique works by generating distributions of hypothesized embryo models, simulating images based on a simple


[Figure 2 panel labels: (a) Day 1, Day 2 a.m., Day 2 p.m., Day 3, Day 4, Day 5, Day 6. (b) Percentage of embryos (0–100%) at each stage (blastocyst, 8-cell, 4- to 7-cell, 2- to 3-cell, 1-cell) for Expt. 1, Expt. 2, Expt. 3 and Expt. 4 (n = 25, n = 39, n = 9, n = 27). (c) 1st mitosis, duration of 1st cytokinesis, time between 1st and 2nd mitoses, 2nd mitosis, synchronicity of 2nd and 3rd mitoses, 3rd mitosis. (d) Axes: 1st cytokinesis duration (h), time between 1st and 2nd mitoses (h), time between 2nd and 3rd mitoses (h); legend: Expt 1 blastocyst (n = 9), Expt 2 blastocyst (n = 17), Expt 3 blastocyst (n = 3), Expt 4 blastocyst (n = 7), Expt 1 arrested (n = 14), Expt 2 arrested (n = 18), Expt 3 arrested (n = 6), Expt 4 arrested (n = 20). (e) Image time stamps from –0:05 to 5:00. (f) Normal, abnormal.]

Figure 2 Abnormal embryos exhibit abnormal cytokinesis and mitosis timing during the first divisions. (a) The developmental time line of a healthy human preimplantation embryo. Scale bar, 50 μm. (b) The distribution of normal and arrested embryos among samples that were cultured to day 5 or 6. (c) Cytokinesis duration was measured from the appearance of a cleavage furrow to complete daughter-cell separation during the first division. Time between the first and second mitoses was measured from the completion of the first mitosis to the appearance of cleavage furrow of the second mitosis. Synchronicity of the second and third mitoses was defined as the time between the appearance of the cleavage furrows of the second and third mitoses. (d) Normal embryos followed strict timing in cytokinesis and mitosis during early divisions, before EGA begins. Out of the 100 embryos imaged to day 5 or 6, six were excluded from subsequent image analysis due to technical issues (e.g., inability to track identity after media change, or loss of image focus). Raw data for this plot are included as Supplementary Data Set 1, and additional views can be seen in Supplementary Figure 2.
(e) Normal cytokinesis (first row) was typically completed in 14.3 ± 6.0 min in a smooth, controlled manner. In the mild phenotype (second row), the cytokinesis mechanism appears normal although it is slightly prolonged. In the severe phenotype (third row), a one-sided cytokinesis furrow is formed, accompanied by unusual ruffling of cell membranes for a prolonged period of time. Cytokinesis was defined by the first appearance of the cytokinesis furrow (arrows) to the complete separation of daughter cells. Imaging was also performed on a subset of triploid embryos (fourth row), which exhibited a distinct phenotype of dividing into three cells in a single event. Scale bar, 50 μm. (f) Embryos that underwent abnormal development and behavior (right) would occasionally appear morphologically similar to normal embryos (left) at the time of sample collection. In this particular case, time-lapse video data showed that what appeared to be a six- to eight-cell embryo (right) was in fact the product of a highly aberrant cell division (Supplementary Video 10). Thus, the correlated imaging data served to ensure the accuracy of sample selection and identification for the gene expression analysis.

optical model and comparing these simulations to the observed image data (Fig. 3a and Supplementary Video 6).

Embryos were modeled as a collection of ellipses with position, orientation and overlap indices (to represent the relative heights of the cells). With these models, the duration of cytokinesis and time between mitoses can be extracted. Cytokinesis is typically defined by the first appearance of the cytokinesis furrow (where bipolar indentations form along the cleavage axis) to the complete separation of daughter cells. We simplified the problem by approximating cytokinesis as the duration of cell elongation before a 1-cell to 2-cell division. A cell is considered elongated if its major axis has increased by >15% (chosen empirically).
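The elongation rule above, combined with counting cells per frame and the timing windows reported in the Results (first cytokinesis of 0–33 min, 7.8–14.3 h between first and second mitoses, 0–5.8 h between second and third), can be sketched as follows. This is a minimal illustration and not the authors' tracking code: the `Frame` record, the function names, the choice of the first frame as the elongation baseline and the synthetic embryo are all assumptions. The 5-min frame interval is taken from the Figure 3 legend.

```python
from dataclasses import dataclass
from typing import List, Optional

FRAME_INTERVAL_MIN = 5.0   # images were captured every 5 min (Fig. 3 legend)
ELONGATION_FACTOR = 1.15   # "elongated" means major axis increased by >15%


@dataclass
class Frame:
    """Hypothetical per-frame tracker output: number of cells and the
    major-axis length (arbitrary units) of the cell about to divide."""
    n_cells: int
    major_axis: float


def first_frame_with(frames: List[Frame], n_cells: int) -> Optional[int]:
    """Index of the first frame that shows at least n_cells cells."""
    for i, frame in enumerate(frames):
        if frame.n_cells >= n_cells:
            return i
    return None


def cytokinesis_duration_min(frames: List[Frame]) -> Optional[float]:
    """Length of the elongated run just before the 1-cell to 2-cell division.
    Assumption: elongation is judged against the first frame's major axis."""
    split = first_frame_with(frames, 2)
    if not split:  # no division observed, or no 1-cell frames to measure
        return None
    baseline = frames[0].major_axis
    elongated = 0
    for frame in reversed(frames[:split]):
        if frame.major_axis > ELONGATION_FACTOR * baseline:
            elongated += 1
        else:
            break
    return elongated * FRAME_INTERVAL_MIN


def hours_between(frames: List[Frame], n_from: int, n_to: int) -> Optional[float]:
    """Hours between first reaching n_from cells and first reaching n_to cells."""
    start, end = first_frame_with(frames, n_from), first_frame_with(frames, n_to)
    if start is None or end is None:
        return None
    return (end - start) * FRAME_INTERVAL_MIN / 60.0


def predicts_blastocyst(cyto_min: float, gap_1_2_h: float, gap_2_3_h: float) -> bool:
    """True when all three parameters fall inside the reported windows."""
    return (0 <= cyto_min <= 33
            and 7.8 <= gap_1_2_h <= 14.3
            and 0 <= gap_2_3_h <= 5.8)


# Synthetic embryo (made-up values): resting, elongating, then 2, 3 and 4 cells.
frames = ([Frame(1, 100.0)] * 10 + [Frame(1, 120.0)] * 3
          + [Frame(2, 70.0)] * 132 + [Frame(3, 60.0)] * 12 + [Frame(4, 55.0)] * 5)
cyto = cytokinesis_duration_min(frames)
gap12 = hours_between(frames, 2, 3)
gap23 = hours_between(frames, 3, 4)
print(cyto, gap12, gap23, predicts_blastocyst(cyto, gap12, gap23))
```

Keeping feature extraction separate from the window check means the same thresholds can be applied to manual or automated measurements.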
The time between mitoses is straightforward to extract by counting the number of cells in each model.

We tested our algorithm on 14 human embryos from the set of 100 that were imaged up to the blastocyst stage (Fig. 3b and Supplementary Video 7) and compared the automated measurements to manual image analysis (Fig. 3c). In this data set, eight embryos reached the blastocyst stage with good morphology (Fig. 3d, top). The automated measurements were closely matched to the manual measurements, and all eight embryos were correctly predicted to reach the blastocyst stage by both methods. Two embryos reached the blastocyst stage with poor morphology (poor quality of inner cell mass; Fig. 3d, bottom). For these embryos, manual assessment indicated that one would reach the blastocyst stage and one would arrest, whereas the automated assessment predicted that both would arrest. Finally, four embryos arrested before the blastocyst stage; all four were correctly predicted to arrest by both methods. These results suggest that a systematic, automated prediction of blastocyst formation can be achieved as early as the 4-cell stage.

Gene expression and cytokinesis
To assess whether imaging parameters that predict success or failure of development are associated with transcriptional patterns, we


analyzed the expression of nine putative cytokinesis-related genes in both normal and arrested embryos (Supplementary Table 1 and Supplementary Data Set 2). Aberrant cytokinesis seen in the time-lapse image data correlated strongly with reduced expression of key cytokinesis genes.

[Figure 3 panel labels: (a) Frame 15, Frame 125, Frame 127, Frame 269, Frame 270, Frame 276, Frame 314. (c) Duration of first cytokinesis (min) and time between 1st and 2nd mitoses (h), manual versus automatic, with the window for blastocyst prediction; embryo groups: good-morphology blastocyst, poor-morphology blastocyst, arrested. (d) Good-morphology blastocyst (embryo no. 6); poor-morphology blastocyst (embryo no. 9).]

Figure 3 Automated image analysis confirms the utility of the imaging parameters to predict blastocyst formation. (a) Results of tracking algorithm for a single embryo. Images were captured every 5 min, and only a select group is displayed. The top row shows frames from the original time-lapse image sequence, and the bottom row shows the overlaid tracking results. (b) Set of 14 embryos that were analyzed (Supplementary Video 7). One embryo was excluded as it was floating and out of focus. (c) Comparison of image analysis by a human observer and automated analysis of the duration of cytokinesis (top) and of the time between first and second mitoses (bottom). There is excellent agreement between the two methods for embryos that reached the blastocyst stage with good morphology. The few cases of disagreement occurred mostly for abnormal embryos and were caused by unusual behavior that is difficult to characterize by either method. The gray shaded region shows the window for blastocyst prediction. The two methods agreed on blastocyst prediction except in the case of embryo 10, which was predicted as abnormal by the automated method and normal by the manual method. (d) Comparison of blastocysts with good (top) and poor (bottom) morphology.
Like their morphological phenotypes, the gene expression profiles of embryos that arrested were diverse and variable. For example, an arrested 2-cell embryo that displayed a slightly prolonged cytokinesis and an unusual plasma membrane ruffling (Supplementary Video 8) expressed all nine cytokinesis genes examined at significantly lower levels (P < 0.05) compared with developmentally normal embryos at the same stage (Fig. 4a). On the other hand, an arrested 4-cell embryo (Fig. 4b) that underwent prolonged, unipolar cytokinesis during its first division (Supplementary Video 9) showed significantly reduced expression (P < 0.05) of only two cytokinesis genes, ANLN and ECT2.

We also examined genes in categories other than cytokinesis in arrested and normal embryos at the 1- and 2-cell stage. For this purpose we calculated average expression levels for each of 52 genes that included, in addition to the cytokinesis genes, housekeeping genes, germ cell markers, maternal factors, EGA markers, trophoblast markers, inner cell mass markers, pluripotency markers, epigenetics regulators, transcription factors, hormone receptors and others, chosen primarily on the basis of published data in model organisms (refs. 1,7,8). Normal 1-cell embryos were identified as having undergone successful fusion of the two pronuclei (syngamy) on day 1 and displaying a round, firm appearance. Eighteen of the 52 genes showed statistically significant differences in expression between normal and arrested embryos (P < 0.05), with certain gene categories affected more severely than others (Fig. 4c). In abnormal embryos, expression of most of the housekeeping genes, hormone receptors and maternal factors was not appreciably altered, but many genes involved in cytokinesis and in microRNA (miRNA) biogenesis, such as DGCR8, DICER1 and TARBP2, were expressed at highly reduced levels.
Two of the most severely affected genes, CPEB1 and SYMPK, belong to the same molecular pathway, which regulates maternal mRNA storage and reactivation by modulating the length of poly(A) tails on oocyte/embryo transcripts (ref. 25).
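The per-gene comparison between normal and arrested embryos (P < 0.05; the Figure 4 legend names the Mann-Whitney test, with five normal and six arrested embryos) can be illustrated with an exact version of that test, which is feasible at these group sizes because every split of the pooled values can be enumerated. The sketch below is an assumption-laden illustration rather than the authors' analysis: the function names and the expression values are invented, and the two-sided P value is defined here as the share of splits whose rank sum deviates from its expectation at least as much as the observed one.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def midranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def mann_whitney_exact(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U statistic for x, exact two-sided P value) by enumerating
    every way of assigning the pooled ranks to a group of size len(x)."""
    ranks = midranks(list(x) + list(y))
    n1, n2 = len(x), len(y)
    expected = n1 * (n1 + n2 + 1) / 2
    rank_sum_x = sum(ranks[:n1])
    observed = abs(rank_sum_x - expected)
    hits = total = 0
    for idx in combinations(range(n1 + n2), n1):
        total += 1
        if abs(sum(ranks[i] for i in idx) - expected) >= observed - 1e-9:
            hits += 1
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    return u_x, hits / total


# Hypothetical relative expression of one gene (illustrative values only).
normal = [8.1, 9.4, 7.7, 10.2, 8.8]          # five normal 1- and 2-cell embryos
arrested = [2.3, 3.1, 1.9, 4.0, 2.7, 3.5]    # six arrested 1- and 2-cell embryos
u, p = mann_whitney_exact(normal, arrested)
print(u, p)
```

With 5 and 6 samples there are only 462 possible splits, so exhaustive enumeration is cheap; larger groups would call for a normal approximation or a statistics library.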


[Figure 4a,b panel labels: relative expression of ANLN, CFL1, DIAPH1, DIAPH2, DNM2, ECT2, MKLP2, MYLC2 and RHOA in (a) an arrested 2-cell embryo versus normal 2-cell embryos (n = 9), with image time stamps from –0:05 to 0:25, and (b) an arrested 4-cell embryo versus normal 4-cell embryos (n = 12), with image time stamps from 0:00 to 11:00.]

Embryonic stage–specific patterns
To further correlate our three imaging parameters with gene expression, we measured expression of two slightly different but overlapping sets of 96 genes (Supplementary Table 1) at multiple time points of embryo development. Time-lapse imaging was used to aid the identification and classification of normal and abnormal embryos because occasionally embryos that developed and behaved abnormally would appear morphologically normal at the time of sample collection (Fig. 2f and Supplementary Video 10). By analyzing the gene expression patterns of 141 of the 242 embryos that had apparently normal development as assessed by imaging (and without any prior assumptions), we derived four unique embryonic stage–specific patterns (ESSPs) of gene expression (Fig. 5a, Supplementary Fig. 5 and Supplementary Table 2). ESSP1 describes maternally inherited oocyte mRNAs destined for degradation. These transcripts were expressed at high levels at the zygote stage and declined during development to the blastocyst stage. Their half-life was just 21 h (Supplementary Fig. 6). ESSP2 includes embryonic-activated genes, first transcribed on day 3, at approximately the 8-cell stage. ESSP3 comprises genes not expressed until the blastocyst stage. Finally, ESSP4 includes persistent transcripts that maintained stable expression relative to the reference genes from the zygote to blastocyst stages. The half-life of ESSP4 genes was 193 h, more than nine times longer than that of ESSP1 genes (21 h) (Supplementary Fig.
6). Fourteen of the 96 genes analyzed did not fit into any of the four ESSP patterns and were labeled ‘undefined’ (Supplementary Table 2). We confirmed the four patterns of gene expression in two additional independent experimental sets using both single, intact normal embryos and isolated single blastomeres (Supplementary Fig. 7).

[Figure 4c radar plot labels: logarithmic axis (1, 10, 100). Gene categories: transcription factor, cytokinesis, receptor, miRNA biogenesis, pluripotency, RNA processing, housekeeping, maternal effect. Genes plotted (asterisk marks a significant difference): ANLN, TAF4, CFL1*, NELF*, DIAPH1*, GTF2A1, DIAPH2, GABPB2, DNM2, BTF3*, ECT2*, ATF7IP2, MKLP2, ATF4, MYLC2*, IGFR2*, RHOA, IGFR1, XPO5, FGFR2, RNASEN, FGFR1, DGCR8*, YY1*, DICER1*, TERT*, TARBP2*, POU5F1, CPEB1*, NANOG, SYMPK*, DNMT3B*, TACC3, TBP, AURKA, RPLP0, YBX2*, HPRT1, PARN, GAPDH, CCR4, CTNNB1*, DAZL, ACTB, VASA, ZP1, BNC2, ZAR1*, GDF9, PDCD5, HSF1, NLRP5. Series: normal 1- and 2-cell embryos (n = 5); arrested 1- and 2-cell embryos (n = 6).]

Figure 4 Distinct gene expression profiles of developmentally delayed or arrested embryos. (a) An arrested 2-cell embryo that showed abnormal membrane ruffling during the first cytokinesis had significantly (P < 0.05) reduced expression level of all cytokinesis genes tested. Scale bar, 50 μm. (b) An arrested 4-cell embryo that underwent aberrant cytokinesis with a one-sided cytokinesis furrow and extremely prolonged cytokinesis during the first division showed lower expression of ANLN and ECT2. Scale bar, 50 μm. (c) The average expression levels of 52 genes from six abnormal 1- to 2-cell embryos and five normal 1- to 2-cell embryos were plotted in a radar graph on a logarithmic scale. Arrested embryos in general expressed less mRNA than normal embryos, with genes related to cytokinesis, RNA processing and miRNA biogenesis most severely affected. Genes highlighted in orange with an asterisk indicate a statistically significant difference (P < 0.05) between normal and abnormal embryos as determined by the Mann-Whitney test.

We compared our qRT-PCR data in 1-cell and 2-cell embryos to published microarray data on human oocytes (ref. 23) (Supplementary Data Set 3). In ref.
23, the expression values for individual genes in the microarray data were normalized against the geometric mean of GAPDH and RPLP0, which were the same reference genes used in our studies. Among the 86 genes that we analyzed, almost every gene that was expressed in 1- and 2-cell embryos was also expressed or upregulated in oocytes, with the exception of TACC3 and H2AFZ. In addition, by dividing the genes into low-, medium- and high-expression genes, we observed good correlation between the two data sets among all gene sets, especially between the highly expressed genes. A comparison of our data to a study of cell cycle genes expressed in the 8-cell human embryo (ref. 26) showed agreement for the two genes that were assayed in both studies, AURKA and CCNA1.

Individual blastomeres show cell autonomy
Individual blastomeres in an intact early human embryo are usually assumed to be synchronized in constitution and developmental programming, and developmental success or failure is considered a property of the whole embryo. We measured expression of ten maternal transcript genes and ten embryonic genes in single blastomeres of 36 normal and abnormal embryos between the 2- and 10-cell stage. Notably, this experiment revealed a subset of normal embryos that contained blastomeres whose gene expression signatures corresponded to different developmental ages. Among 24 morphologically and developmentally normal embryos, 6 (25%) contained blastomeres of different transcriptional ages


(Fig. 5b). Among 12 abnormal embryos arrested between the 2- and 10-cell stage, this phenomenon was detected in 8 (67%). In some cases (e.g., Fig. 5b, right), the transcriptional profile of individual blastomeres varied to such an extent that some blastomeres in both normal and abnormal embryos may have been arrested for a considerable amount of time while the others progressed in development.

[Figure 5 panel labels: (a) relative expression (n = 141) of ESSP1 and ESSP3 at stages 1c, 2c, 3c, 4c, 6c, 8c, 9c, M and B. (b) Maternal/embryonic ratio; embryonic and maternal expression in blastomeres of 2-cell, 4-cell, 6-cell and 8-cell embryos.]

DISCUSSION
We have carried out a large-scale study of preimplantation human embryos that correlated time-lapse image analysis and gene expression profiling with development from the zygote to the blastocyst. Our results shed light on human embryo development and provide an approach for predicting which embryos will reach the blastocyst stage using three dynamic imaging parameters (Fig. 6). First, we showed that human embryos that develop to the blastocyst stage follow a strict and predictable developmental timeline that is correlated with predictable gene expression patterns. This timeline

Figure 6 Proposed model for human embryo development. Human embryos begin life with a set of oocyte RNAs inherited from the mother. After fertilization, a subset of maternal RNAs specific to the egg (ESSP1) must be degraded as the transition from oocyte to embryo begins. As development continues, other RNAs are partitioned equally to each blastomere (ESSP4). At EGA, ESSP2 genes are transcribed in a cell-autonomous manner.
During the cleavage divisions, embryonic blastomeres may arrest or progress independently. ‘Feature extraction’ indicates the three imaging parameters for predicting successful development to the blastocyst stage: cytokinesis, the time between 1st and 2nd mitoses, and the time between 2nd and 3rd mitoses.

[Figure 6 diagram labels: time line, stage, molecular, imaging, automated tracking, feature extraction, embryo transfer; oocyte provides mRNAs.]

[Figure 5 panel labels, continued: (a) ESSP2 and ESSP4 at stages 1c, 2c, 3c, 4c, 6c, 8c, 9c, M and B. (b) Embryonic and maternal expression in examples of abnormal embryos.]

Figure 5 Gene expression analysis of single human embryos and blastomeres. (a) Genes analyzed in human embryos are defined by four distinct ESSPs. Relative expression level of an ESSP was calculated by averaging the expression levels of genes with similar expression patterns. (b) The ratio of maternal to embryonic genes in embryos changes during preimplantation development (left). Some embryos contained blastomeres of different developmental ages (right). The expression levels of embryonic and maternal programs were calculated by averaging the relative expression of ten ESSP2 and ten ESSP1 markers, respectively.

enabled us to derive an algorithm to automatically measure our imaging parameters and to predict blastocyst formation systematically and reliably by the 4-cell stage, before EGA. The finding that embryo development can be predicted at this early stage suggests that success or failure is likely to be determined at least in part by inheritance of maternal transcripts, which we observed to be expressed at altered levels in abnormal embryos.
Other factors that may contribute to abnormal development before EGA include inherited genetic mutations, aneuploidy, environmental insult to germ cells, events during fertilization and sperm-related factors (refs. 27–29).

[Figure 6 diagram labels, continued: onset of degradation of ESSP1 mRNA; each blastomere inherits half of stable mRNA (ESSP4); duration of 1st cytokinesis (15 min); time between mitoses (11 h, 1 h); blastomeres are cell autonomous; transfer prior to EGA; embryonic gene activation (ESSP2); time-line intervals of 24 h, 18 h, 24 h and 24 h.]

Second, we found that gene expression in preimplantation human embryos is cell autonomous and follows four distinct patterns. Maternally inherited transcripts in ESSP1 have a half-life after fertilization of ~21 h; ESSP2 and ESSP3 are expressed at EGA and thereafter, respectively; and ESSP4 genes are stably expressed, with a half-life of ~193 h. Previous studies of gene expression in the human embryo from the oocyte to day 3 have not analyzed single blastomeres, primarily because of the technical difficulty of obtaining single-cell data (refs. 1,26). At the whole-embryo level, maintenance of maternal mRNA expression profiles and failure to progress to EGA has never been observed past the first cell division (ref. 1). Our single-cell gene expression analysis shows that individual blastomeres in an embryo can differ, with some maintaining maternal mRNAs whereas others progress to EGA. The frequency of this observation (in >25% of embryos) indicates that individual blastomeres in human embryos are cell autonomous. Moreover, the observation that maternal mRNAs can be maintained even after 3 days of development suggests that the degradation of the maternal programs is not simply a passive process.
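To give a feel for what the two reported half-lives imply, the sketch below computes the fraction of transcript remaining over the first three days. It assumes simple first-order (exponential) decay, which is an assumption made here for illustration; the text reports only the half-lives (~21 h for ESSP1, ~193 h for ESSP4), and the function and variable names are hypothetical.

```python
def fraction_remaining(hours: float, half_life_h: float) -> float:
    """Fraction of transcript left after `hours`, assuming first-order
    (exponential) decay with the given half-life."""
    return 0.5 ** (hours / half_life_h)


# Approximate half-lives reported in the text, in hours.
HALF_LIFE_H = {"ESSP1": 21.0, "ESSP4": 193.0}

for pattern, half_life in HALF_LIFE_H.items():
    for day in (1, 2, 3):
        share = fraction_remaining(24 * day, half_life)
        print(f"{pattern}, {24 * day} h: {share:.2f} of starting level")
```

Under this assumption, roughly 9% of an ESSP1 transcript would remain after 72 h, compared with roughly 77% of an ESSP4 transcript.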


Previous studies in the mouse have sought to understand how and when the first cell fate decisions in the embryo are established. It has been suggested that the first cleavage division itself determines the blastocyst axis and that subsequently all blastomeres are not equivalent (refs. 30,31). These studies may imply that cell lineage fate is determined in the first division (ref. 32); more likely is that the first cleavage division affects the probabilities of fates in subsequent divisions (ref. 33). Our results support the conclusion that some aspects of embryo fate, especially success or failure to reach the blastocyst stage, are determined very early in development and likely inherited from the oocyte, as described above. Moreover, they imply that each cell-autonomous blastomere is capable of contributing, or not, to subsequent lineages.

Third and finally, given that embryo developmental potential can be assessed with a combination of cytokinetic and mitotic parameters in the first two cleavage divisions, it may be feasible to translate these basic studies to clinical applications. Current morphological and growth criteria that are commonly used to assess embryo viability on day 3 in assisted reproduction clinics may both underestimate and overestimate embryo potential, with well-documented consequences, such as multiple births, the need for fetal reduction and miscarriage (ref. 34). Given the uncertainties associated with evaluation at day 3, some clinics have turned to longer culture to assess embryo potential, as embryos transferred at the blastocyst stage have a higher implantation rate compared with embryos transferred at day 3 (refs. 13,35–38). However, this practice involves prolonged in vitro culture and may increase the chance of altered gene expression and epigenetic inheritance (refs. 39–41).
Thus, a method to predict blastocyst formation at day 2 could improve IVF outcomes by increasing pregnancy rates while reducing the risk of multiple gestations. This question will be evaluated in future clinical studies.

Methods
Methods and any associated references are available in the online version of the paper at http://www.nature.com/naturebiotechnology/.

Note: Supplementary information is available on the Nature Biotechnology website.

Acknowledgments
We thank R. Raja for help with the microarray analysis and early imaging experiments, the members of the Reijo Pera laboratory for technical assistance and discussions, S. Walker for advice regarding the cell tracking algorithm and K. Salisbury for providing K.E.L. with hardware and software resources. We acknowledge funding contributions from the Stanford Institute for Stem Cell Biology and Regenerative Medicine, a generous, anonymous donor and the March of Dimes (6-FY06-326).

AUTHOR CONTRIBUTIONS
C.C.W. and K.E.L. performed and designed experiments, analyzed data and assisted in writing and editing of the manuscript. K.E.L. designed cell tracking algorithms. N.L.B. assisted in performing the experiments. B.B., N.L.B. and C.J.D.J. assisted in analyzing data and editing the manuscript. T.M.B. and K.E.L. designed and built the imaging instrumentation. T.M.B. and R.A.R.P. designed experiments, interpreted results and assisted in writing and editing the manuscript.

COMPETING FINANCIAL INTERESTS
The authors declare competing financial interests: details accompany the full-text HTML version of the paper at http://www.nature.com/naturebiotechnology/.

Published online at http://www.nature.com/naturebiotechnology/.
Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/.

1. Dobson, A.T. et al. The unique transcriptome through day 3 of human preimplantation development. Hum. Mol. Genet. 13, 1461–1470 (2004).
2. Braude, P., Bolton, V. & Moore, S. Human gene expression first occurs between the four- and eight-cell stages of preimplantation development. Nature 332, 459–461 (1988).
3. Memili, E. & First, N.L. Zygotic and embryonic expression in cow: a review of timing and mechanisms of early gene expression as compared with other species. Zygote 8, 87–96 (2000).
4. Beaujean, N. et al. Effect of limited DNA methylation reprogramming in the normal sheep embryo on somatic cell nuclear transfer. Biol. Reprod. 71, 185–193 (2004).
5. Fulka, H., Mrazek, M., Tepla, O. & Fulka, J. Jr. DNA methylation pattern in human zygotes and developing embryos. Reproduction 128, 703–708 (2004).
6. Duranthon, V., Watson, A.J. & Lonergan, P. Preimplantation embryo programming: transcription, epigenetics, and culture environment. Reproduction 135, 141–150 (2008).
7. Wang, Q.T. et al. A genome-wide study of gene activity reveals developmental signaling pathways in the preimplantation mouse embryo. Dev. Cell 6, 133–144 (2004).
8. Zeng, F. & Schultz, R. RNA transcript profiling during zygotic gene activation in the preimplantation mouse embryo. Dev. Biol. 283, 40–57 (2005).
9. Vanneste, E. et al. Chromosome instability is common in human cleavage-stage embryos. Nat. Med. 15, 577–583 (2009).
10. Macklon, N.S., Geraedts, J.P.M. & Fauser, B.C.J.M. Conception to ongoing pregnancy: the “black box” of early pregnancy loss. Hum. Reprod. Update 8, 333–343 (2002).
11. Evers, J.L. Female subfertility. Lancet 360, 151–159 (2002).
12. French, D.B., Sabanegh, E.S. Jr., Goldfarb, J. & Desai, N. Does severe teratozoospermia affect blastocyst formation, live birth rate, and other clinical outcome parameters in ICSI cycles? Fertil. Steril. 93, 1097–1103 (2010).
13. Gardner, D.K., Lane, M. & Schoolcraft, W. Culture and transfer of viable blastocysts: a feasible proposition for human IVF. Hum. Reprod. 15 (Suppl 6), 9–23 (2000).
14. Payne, D., Flaherty, S.P., Barry, M.F. & Matthews, C.D. Preliminary observations on polar body extrusion and pronuclear formation in human oocytes using time-lapse video cinematography. Hum. Reprod. 12, 532–541 (1997).
15. Adjaye, J., Bolton, V. & Monk, M. Developmental expression of specific genes detected in high-quality cDNA libraries from single human preimplantation embryos. Gene 237, 373–383 (1999).
16. Assou, S. et al. The human cumulus–oocyte complex gene-expression profile. Hum. Reprod. 21, 1705–1719 (2006).
17. Kimber, S.J. et al. Expression of genes involved in early cell fate decisions in human embryos and their regulation by growth factors. Reprod. 135, 635–647 (2008).
18. Nagy, Z.P., Liu, J., Joris, H., Devroey, P. & Steirteghem, A.V. Time-course of oocyte activation, pronucleus formation and cleavage in human oocytes fertilized by intracytoplasmic sperm injection. Hum. Reprod. 9, 1743–1748 (1994).
19. Fenwick, J., Platteau, P., Murdoch, A.P. & Herbert, M. Time from insemination to first cleavage predicts developmental competence of human preimplantation embryos in vitro. Hum. Reprod. 17, 407–412 (2002).
20. Lundin, K., Bergh, C. & Hardarson, T. Early embryo cleavage is a strong indicator of embryo quality in human IVF. Hum. Reprod. 16, 2652–2657 (2001).
21. Lemmen, J.G., Agerholm, I. & Ziebe, S. Kinetic markers of human embryo quality using time-lapse recordings of IVF/ICSI-fertilized oocytes. Reprod. Biomed. Online 17, 385–391 (2008).
22. Bermudez, M.G. et al. Expression profiles of individual human oocytes using microarray technology. Reprod. Biomed. Online 8, 325–337 (2004).
23. Kocabas, A.M. et al. The transcriptome of human oocytes. Proc. Natl. Acad. Sci. USA 103, 14027–14032 (2006).
24. Rienzi, L. et al. Significance of morphological attributes of the early embryo. Reprod. Biomed. Online 10, 669–681 (2005).
25. Bettegowda, A. & Smith, G.W. Mechanisms of maternal mRNA regulation: implications for mammalian early embryonic development. Front. Biosci. 12, 3713–3726 (2007).
26. Kiessling, A.A. et al. Evidence that human blastomere cleavage is under unique cell cycle control. J. Assist. Reprod. Genet. 26, 187–195 (2009).
27. Schatten, H. & Sun, Q. The role of centrosomes in fertilization, cell division and establishment of asymmetry during embryo development. Semin. Cell Dev. Biol. 21, 174–184 (2010).
28. Ostermeier, G.C., Miller, D., Huntriss, J.D., Diamond, M.P. & Krawetz, S.A. Reproductive biology: delivering spermatozoan RNA to the oocyte. Nature 429, 154 (2004).
29. Hammoud, S.S. et al. Distinctive chromatin in human sperm packages genes for embryo development. Nature 460, 473–478 (2009).
30. Zernicka-Goetz, M. Patterning of the embryo: the first spatial decisions in the life of a mouse. Development 129, 815–829 (2002).
31. Plusa, B. et al. The first cleavage of the mouse zygote predicts the blastocyst axis. Nature 434, 391–395 (2005).
32. Hiiragi, T., Louvet-Vallee, S., Solter, D. & Maro, B. Embryology: does prepatterning occur in the mouse egg? Nature 442, E3–4 (2006).
33. Zernicka-Goetz, M. The first cell-fate decisions in the mouse embryo: destiny is a matter of both chance and choice. Curr. Opin. Genet. Dev. 16, 406–412 (2006).
34. Racowsky, C. High rates of embryonic loss, yet high incidence of multiple births in human ART: Is this paradoxical? Theriogenology 57, 87–96 (2002).
35. Milki, A.A., Hinckley, M., Fisch, J., Dasig, D. & Behr, B. Comparison of blastocyst transfer with day 3 embryo transfer in similar patient populations. Fertil. Steril. 73, 126–129 (2000).
36. Gardner, D.K., Lane, M., Stevens, J., Schlenker, T. & Schoolcraft, W.B. Blastocyst score affects implantation and pregnancy outcome: towards a single blastocyst transfer. Fertil. Steril. 73, 1155–1158 (2000).
37. Gardner, D.K. & Lane, M. Towards a single embryo transfer. Reprod. Biomed. Online 6, 470–481 (2003).
38. Gardner, D.K. et al. Single blastocyst transfer: a prospective randomized trial. Fertil. Steril. 81, 551–555 (2004).
39. Manipalviratn, S., DeCherney, A. & Segars, J. Imprinting disorders and assisted reproductive technology. Fertil. Steril. 91, 305–315 (2009).
40. Niemitz, E.L. & Feinberg, A. Epigenetics and assisted reproductive technology: a call for investigation. Am. J. Hum. Genet. 74, 599–609 (2004).
41. Horsthemke, B. & Ludwig, M. Assisted reproduction: the epigenetic perspective. Hum. Reprod. Update 11, 473–482 (2005).


ONLINE METHODS

Sample source. All embryos used in this study were supernumerary embryos from the Lutheran General Hospital IVF Program that were donated to research by informed consent. Embryos were moved to the Reproductive Medicine Center at the University of Minnesota after the Lutheran General Hospital IVF Program closed in 2002. Before the program closed, the embryos were collected over several years. Oocytes were fertilized and cryopreserved by multiple embryologists. The average number of embryos per patient in our study was 3, and all age groups encountered in a routine IVF center were included. All embryos were generated by IVF, not intracytoplasmic sperm injection, so they were derived from sperm able to penetrate the cumulus, zona and oolemma and form pronuclei. Stimulation protocols were standard long Lupron protocols. The embryos were cryopreserved by placing them in freezing medium (1.5 M 1,2-propanediol + 0.2 M sucrose) for 25 min at 22 ± 2 °C and then freezing them using a slow-freeze protocol (−1 °C/min to −6.5 °C; hold for 5 min; seed; hold for 5 min; −0.5 °C/min to −80 °C; plunge in liquid nitrogen). The embryos were approved for research by the University of Minnesota Internal Review Board and the Stanford University Internal Review Board and Stem Cell Research Oversight Committee. No protected health information could be associated with the embryos.

We chose to use this embryo set after consideration of alternatives. Three sources of IVF embryos are theoretically available: (i) day 1 embryos obtained from a clinic for immediate analysis without cryopreservation, (ii) clinical embryos destined for transfer for reproductive purposes if our imaging system was set up in a clinic and (iii) cryopreserved embryos that are validated with regard to key developmental landmarks. Clinical practices and general guidelines pose considerable practical obstacles to alternatives (i) and (ii). 
Moreover, ‘fresh’ embryos donated for research are generally abnormal in development. We therefore chose to study a large set of cryopreserved zygotes available for research. The following considerations suggest that cryopreservation did not adversely affect our results. First, the timing of developmental landmarks was similar to that of normal embryos, including cleavage to 2 cells (occurred early on day 2), onset of RNA degradation (occurred on days 1 to 3), cleavage to 4 and 8 cells (occurred on late day 2 and day 3, respectively), EGA (on day 3 at the 8-cell stage) and formation of the morula and blastocyst (occurred on days 4 and 5, respectively) 1,2. Second, the fraction of embryos that reached the blastocyst stage is typical of IVF embryos in a clinical setting 12,13,42. This is most likely because the embryos were cryopreserved at the 2PN stage and represented the spectrum of embryos encountered in an IVF clinic. No triage was done before cryopreservation. Third, embryos frozen at the 2PN stage have been shown to possess similar potential for development, implantation, clinical pregnancy and delivery compared with fresh embryos 43–45. Other studies have also shown similar results for frozen oocytes 24,46. Fourth, we focused on parameters that were not dependent on time of fertilization or thaw time. As described in the manuscript, the first parameter, duration of the first cytokinesis, is short (~10–15 min) and is not dependent on the time of fertilization. The other parameters we measured are relative to this initial measurement point and compared between embryos that succeed in developing to blastocyst and those that do not.

Control for cryopreservation. In addition to the observations described above, which support the use of cryopreserved embryos, we also examined, as a control, a small set of embryos obtained from the Stanford IVF clinic that were not cryopreserved. These embryos were 3PN (triploid) starting at the single-cell stage. 
3PN embryos have been shown to follow the same time line of landmark events as normal fresh embryos through at least the first three cell cycles 47–49. These embryos were imaged before our main experiments to validate the imaging systems (but for technical reasons were not followed out to blastocyst). Out of this set of fresh embryos, three of the embryos followed a similar time line of events as our cryopreserved 2PN embryos, with duration of cytokinesis ranging from 15 to 30 min, time between first and second mitoses ranging from 9.6 to 13.8 h and time between second and third mitoses ranging from 0.3 to 1.0 h. However, in seven of the embryos we observed a unique cytokinesis phenotype that was characterized by the simultaneous appearance of three cleavage furrows, a slightly prolonged cytokinesis and ultimate separation into three daughter cells (Fig. 2e, fourth panel, and Supplementary Video 5). These embryos had a duration of cytokinesis ranging from 15 to 70 min (characterized as the time between the initiation of the cleavage furrows until complete separation into three daughter cells), time between first and second mitoses (3-cell to 4-cell) ranging from 8.7 to 12.7 h, and time between second and third mitoses (4-cell to 5-cell) ranging from 0.3 to 2.6 h. This observation, together with the diverse range of cytokinesis phenotypes displayed by abnormal embryos, suggests that our cryopreserved embryos are not developmentally delayed by the cryopreservation process and behave similarly to fresh zygotes that cleave to two blastomeres. The data also demonstrate that abnormal cytokinesis may be associated with underlying abnormalities in chromosomal composition (as demonstrated by zygotes that form three blastomeres initially). This hypothesis is consistent with previous observations correlating aneuploidy with morphology 50–52.

Human embryo culture and microscopy. Human embryos were thawed by removing the cryovials from the liquid nitrogen storage tank and thawing them at 22 ± 2 °C. 
Once a vial was thawed, it was opened and the embryos were visualized under a dissecting microscope. The contents of the vial were then poured into the bottom of a 3003 culture dish. The embryos were located in the drop and the survival of each embryo was assessed and recorded. At 22 ± 2 °C, the embryos were transferred to a 3037 culture dish containing 1.0 M 1,2-propanediol + 0.2 M sucrose for 5 min, then 0.5 M 1,2-propanediol + 0.2 M sucrose for 5 min and 0.0 M 1,2-propanediol + 0.2 M sucrose for 5 min. Subsequently, embryos were cultured in Quinn’s Advantage Cleavage Medium (Cooper Surgical) supplemented with 10% Quinn’s Advantage Serum Protein Substitute (SPS; Cooper Surgical) between days 1 and 3, and Quinn’s Advantage Blastocyst Medium (Cooper Surgical) with 10% SPS after day 3 using microdrops under oil. All of the experiments used the same type of cleavage-stage medium, except for two stations during the first experiment, which used a Global medium (LifeGlobal). In this small subset (12 embryos), the embryos exhibited a slightly lower blastocyst formation rate (3 out of 12, or 25%) but the sensitivity and specificity of our predictive parameters were both 100% for this group.

Time-lapse imaging was performed on multiple systems to accommodate concurrent analysis of multiple samples as well as to validate the consistency of the data across different platforms. 
The systems consisted of seven individual microscopes: (i) two modified Olympus IX-70/71 microscopes equipped with Tokai Hit heated stages, white-light Luxeon LEDs, and an aperture for dark-field illumination; (ii) two modified Olympus CKX-40/41 microscopes equipped with heated stages, white-light Luxeon LEDs, and Hoffman Modulation Contrast illumination (note: these systems were used only during the first of four experiments after it was decided that dark-field illumination was preferable for measuring the parameters); and (iii) a custom-built three-channel miniature microscope array that fits inside a standard incubator, equipped with white-light Luxeon LEDs and apertures for dark-field illumination. We observed no important difference in developmental behavior, blastocyst formation rate or gene expression profiles between embryos cultured on these different systems; indeed, our parameters for blastocyst prediction were consistent across multiple systems and experiments.

The light intensity for all systems was considerably lower than the light typically used on an assisted reproduction microscope owing to the low power of the LEDs (relative to a typical 100 W halogen bulb) and the high sensitivity of the camera sensors. Using an optical power meter, we determined that the power of a typical assisted-reproduction microscope (Olympus IX-71 Hoffman Modulation Contrast) at a wavelength of 473 nm ranges from roughly 7 to 10 mW depending on the magnification, whereas the power of our imaging systems was measured to be between 0.2 and 0.3 mW at the same wavelength. Images were captured at a 1 s exposure time every 5 min for up to 5 or 6 d, resulting in ~24 min of continuous light exposure. 
At a power of 0.3 mW, this is equivalent to roughly 1 min of exposure under a typical assisted-reproduction microscope.

To track the identity of each embryo during correlated imaging and gene expression experiments, we installed a video camera on the stereomicroscope and recorded the process of sample transfer during media change and sample collection. We performed control experiments with mouse preimplantation embryos (n = 56) and a small subset of human embryos (n = 22), and observed no significant difference (P = 0.96) in the blastocyst formation rate between imaged and control embryos.

nature biotechnology doi:10.1038/nbt.1686
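The light-exposure estimate in the microscopy section above can be checked with a few lines of arithmetic. The sketch below is a back-of-the-envelope reconstruction, not the authors' calculation: it assumes that the relevant dose is simply power multiplied by exposure time at the measured wavelength, and the function names are hypothetical.

```python
# Back-of-the-envelope check of the light-dose comparison in the text.
# Assumption (ours): dose = optical power x exposure time; differences in
# illuminated area or spectrum between microscopes are ignored.


def total_exposure_s(days, interval_min=5.0, exposure_s=1.0):
    """Cumulative illumination time for one image every `interval_min`."""
    frames = days * 24.0 * 60.0 / interval_min
    return frames * exposure_s


def equivalent_exposure_s(exposure_s, system_mw, reference_mw):
    """Time on the reference microscope that delivers the same energy."""
    return exposure_s * system_mw / reference_mw


if __name__ == "__main__":
    exposure = total_exposure_s(days=5)           # seconds of illumination
    print(f"cumulative exposure: {exposure / 60:.0f} min")
    for clinic_mw in (7.0, 10.0):                 # range quoted in the text
        eq = equivalent_exposure_s(exposure, 0.3, clinic_mw)
        print(f"equivalent at {clinic_mw:.0f} mW: {eq:.0f} s")
```

With one 1 s exposure every 5 min for 5 d this gives 24 min of illumination, and at 0.3 mW the same energy would be delivered in well under a minute to about a minute at 7 to 10 mW, in line with the figures quoted above.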


High-throughput qRT-PCR analysis. For single embryo or single blastomere qRT-PCR analysis, embryos were first treated with Acid Tyrode’s solution to remove the zona pellucida. To collect single blastomeres, the embryos were incubated in Quinn’s Advantage Ca2+ Mg2+-free medium with HEPES (Cooper Surgical) for 5–20 min at 37 °C with rigorous pipetting. Samples were collected directly into 10 μl of reaction buffer; subsequent one-step reverse transcription/pre-amplification reaction was performed as previously described 53. Pooled 20× ABI assay-on-demand qRT-PCR primer and probe mix (Applied Biosystems) were used as gene-specific primers during the reverse transcription and pre-amplification reactions. High-throughput qRT-PCR reactions were performed with Fluidigm Biomark 96.96 Dynamic Arrays as previously described 53 using the ABI assay-on-demand qRT-PCR probes listed in Supplementary Data Set 4. All samples were loaded in three or four technical replicates. qRT-PCR data analysis was performed with qBasePlus (Biogazelle), Microsoft Excel and custom-built software. Certain genes were omitted from data analysis owing to either poor data quality (e.g., poor PCR amplification curves) or consistent low to no expression in the embryos assessed. For the analysis of blastomere age, the maternal transcript panel used includes DAZL, GDF3, IFITM1, STELLAR, SYCP3, VASA, GDF9, PDCD5, ZAR1 and ZP1, whereas the embryonic gene panel includes ATF7IP, CCNA1, EIF1AX, EIF4A3, H2AFZ, HSPA1B, JARID1B, LSM3, PABPC1 and SERTAD1. The expression value of each gene relative to the reference genes GAPDH and RPLP0, as well as relative to the gene average, was calculated using the geNorm and ΔΔCt methods (Supplementary Data Sets 5–7) 54,55. GAPDH and RPLP0 were selected as the reference genes for this study empirically based on the gene stability value and coefficient of variation: 1.18 and 46% for GAPDH and 1.18 and 34% for RPLP0. 
These were the most stable among the ten housekeeping genes that we tested and well within range of a typical heterogeneous sample set 56. Second, we observed that in single blastomeres, as expected, the amount of RPLP0 and GAPDH transcripts decreased by ~1 Ct value per division between 1-cell and 8-cell stage, congruent with expectations that each cell inherits approximately one-half of the pool of mRNA with each cleavage division, in the absence of new transcripts before EGA during the first 3 d of human development (Supplementary Fig. 8). Third, we noted that the expression level of these reference genes in single blastomeres remained stable from the 8-cell to morula stage, after EGA began. At the intact embryo level, the Ct values of both RPLP0 and GAPDH remained largely constant throughout development until the morula stage with a slight increase following in the blastocyst stage perhaps due to increased transcript levels in the greater numbers of blastomeres present. Most of the gene expression analysis performed in this study focused on developmental stages before the morula stage, however, when the expression level of the reference genes was extremely stable.

Normal and arrested embryos. In the experiments described in this work, arrested embryos were defined as embryos that did not reach the expected developmental stage at the time of sample collection. For example, when a plate of embryos was removed from the imaging station on late day 2 for sample collection, any embryo that had reached 4-cell stage and beyond would be identified as normal, whereas those that failed to reach 4-cell stage would be labeled as arrested. These arrested embryos were categorized by the developmental stage at which they became arrested, such that an embryo with only 2 blastomeres on late day 2 would be analyzed as an arrested 2-cell embryo. Care was taken to exclude embryos that morphologically appeared to be dead and porous at the time of sample collection (e.g., degenerate blastomeres). 
Only embryos that appeared alive (for both normal and arrested) were used for gene expression analysis. However, it is possible that embryos that appeared normal during the time of collection might ultimately arrest if they were allowed to grow to a later stage.

Multiplex qRT-PCR reactions of up to 96 genes belonging to different categories were assayed per sample, including housekeeping genes, cytokinesis components, germ cell markers, maternal factors, embryonic genome activation (EGA) markers, trophoblast markers, inner cell mass markers, pluripotency markers, epigenetics regulators, transcription factors, hormone receptors and others. Two slightly different but overlapping sets of genes were assayed in three different experimental sets (Supplementary Table 1 and Supplementary Data Set 2).

Individual blastomeres and maternal and embryonic programs. Based on our data, we defined a maternal program as the average expression values of the ten markers most quantitatively representative of the ESSP1 genes (maternal transcripts), and the embryonic program by the average expression values of ten markers most representative of the ESSP2 genes (embryonic transcripts). Thus, the gene expression profile of a human embryo at any given time is the sum of maternal mRNA degradation and embryonic or EGA transcripts. Maternal transcripts typically comprise the bulk of transcripts in a young blastomere of early developmental age relative to EGA transcripts, and the opposite holds true for an older blastomere at a more advanced developmental age (Fig. 5b).

Automated cell tracking. Our cell tracking algorithm uses a probabilistic framework based on sequential Monte Carlo methods, which in the field of computer vision is often referred to as the particle filter. The particle filter tracks the propagation of three main variables over time: the state, the control and the measurement. The state variable is a model of an embryo and is represented as a collection of ellipses. 
The control variable is an input that transforms the state variable and consists of our cell propagation and division model. The measurement variable is an observation of the state and consists of our images acquired by the time-lapse microscope. Our estimate of the current state at each time step is represented with a posterior probability distribution, which is approximated by a set of weighted samples called particles. We use the terms particles and embryo models interchangeably, where a particle is one hypothesis of an embryo model at a given time. After initialization, the particle filter repeatedly applies three steps: prediction, measurement and update.

Prediction. Cells are represented as ellipses in two-dimensional space, and each cell has an orientation and overlap index. The overlap index specifies the relative height of the cells. In general, there are two types of behavior that we want to predict: cell motion and cell division. For cell motion, our control input takes a particle and randomly perturbs each parameter for each cell, including position, orientation, and length of major and minor axes. The perturbation is randomly sampled from a normal distribution with relatively small variance (5% of the initialized values). For cell division, we use the following approach. At a given point in time, for each particle, we assign a 50% probability that one of the cells will divide. This value was chosen empirically, and spans a wide range of possible cell divisions while maintaining good coverage of the current configuration. If a division is predicted, then the dividing cell is chosen randomly. When a cell is chosen to divide (Supplementary Fig. 9, left), we apply a symmetric division along the major axis of the ellipse, producing two daughter cells of equal size and shape (Supplementary Fig. 9, middle). We then randomly perturb each value for the daughter cells (Supplementary Fig. 9, right). Finally, we randomly select the overlap indices of the two daughter cells while maintaining their collective overlap relative to the rest of the cells.

After applying the control input, we convert each particle into a simulated image. This is achieved by projecting the elliptical shape of each cell onto the simulated image using the overlap index. The corresponding pixel values are set to a binary value of 1 and dilated to create a membrane thickness comparable to the observed image data. Because the embryos are partially transparent and out-of-focus light is collected, cell membranes at the bottom of the embryo are only visible sometimes. Accordingly, occluded cell membranes are added with 10% probability. In practice, we have found that these occluded membrane points are crucial for accurate shape modeling, but it is important to make them sparse enough so that they do not resemble a visible edge.

Measurement. Once we have generated a distribution of hypothesized models, the corresponding simulated images are compared to the actual microscope image. The microscope image (Supplementary Fig. 10, left) is pre-processed to create a binary image of cell membranes using a principal curvature-based method (Supplementary Fig. 10, middle) followed by thresholding (Supplementary Fig. 10, right). The accuracy of the comparison is evaluated using a symmetric truncated chamfer distance, which is then used to assign a weight, or likelihood, to each particle.

Update. After weights are assigned, particles are selected in proportion to these weights to create a new set of particles for the next iteration. This focuses the particle distribution in the region of highest probability. Particles with low probability are discarded, whereas particles with high probability are multiplied. Particle re-sampling is performed using the low variance method.
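The prediction, measurement and update loop described above can be sketched in a few dozen lines. The following is a deliberately reduced illustration and not the authors' software: it tracks a single ellipse (center x, center y, major axis, minor axis), omits the cell-division and overlap-index model, and replaces the image-based chamfer-distance score with a Gaussian likelihood on a directly observed parameter vector. All function names and the `scale` parameter are hypothetical.

```python
import math
import random


def predict(particle, sigmas, rng):
    """Motion model: jitter every ellipse parameter with Gaussian noise."""
    return [v + rng.gauss(0.0, s) for v, s in zip(particle, sigmas)]


def weight(particle, observation, scale):
    """Stand-in likelihood (assumption): Gaussian in parameter distance.
    The paper instead compares simulated and observed membrane images."""
    d2 = sum((p - o) ** 2 for p, o in zip(particle, observation))
    return math.exp(-d2 / (2.0 * scale ** 2))


def low_variance_resample(particles, weights, rng):
    """Low-variance (systematic) resampling with one random offset."""
    n = len(particles)
    step = sum(weights) / n
    u = rng.uniform(0.0, step)
    cumulative, i, chosen = weights[0], 0, []
    for _ in range(n):
        while u > cumulative and i < n - 1:
            i += 1
            cumulative += weights[i]
        chosen.append(list(particles[i]))
        u += step
    return chosen


def track(initial, observations, n_particles=500, scale=5.0, seed=0):
    """Return the weighted-mean state estimate for each observation."""
    rng = random.Random(seed)
    # Perturbation spread: 5% of the initialized values, as in the text.
    sigmas = [0.05 * abs(v) for v in initial]
    particles = [list(initial) for _ in range(n_particles)]
    estimates = []
    for obs in observations:
        particles = [predict(p, sigmas, rng) for p in particles]
        weights = [weight(p, obs, scale) for p in particles]
        total = sum(weights)
        if total == 0.0:  # all hypotheses implausible: fall back to uniform
            weights, total = [1.0] * n_particles, float(n_particles)
        estimates.append([
            sum(w * p[k] for w, p in zip(weights, particles)) / total
            for k in range(len(initial))
        ])
        particles = low_variance_resample(particles, weights, rng)
    return estimates


if __name__ == "__main__":
    start = [100.0, 100.0, 40.0, 30.0]            # cx, cy, major, minor
    frames = [[100.0 + t, 100.0, 40.0, 30.0] for t in range(1, 21)]
    final = track(start, frames)[-1]
    print("final estimate:", [round(v, 1) for v in final])
```

In a fuller implementation, `weight` would render each particle to a binary membrane image and score it against the pre-processed microscope frame, and `predict` would additionally split a randomly chosen ellipse with 50% probability to model division.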


Once the embryos have been modeled, we can extract the dynamic imaging parameters such as duration of cytokinesis and time between mitoses, as discussed in the main text. Our cell tracking software was previously implemented in Matlab, and computation times ranged from a couple of seconds to half a minute for each image depending on the number of particles. Our current version of the software is implemented in the programming languages C/C++, and computation times range from 1 to 5 s depending on the number of particles.

Discrepancies. In cases where there were discrepancies between manual and automated measurements, abnormal embryos were correctly assessed by the automated software. Moreover, our results indicated that the use of automated image analysis software may potentially provide improved ability to differentiate not only the success or failure to reach blastocyst but also blastocyst quality; however, further data and analysis are needed to support this observation.

Applications. Our imaging analysis was validated through the development of a novel cell tracking algorithm that can automatically extract the dynamic imaging parameters. We believe this to be the first documented report with human embryo development subjected to diagnosis by automated image analysis algorithms. Recently, cell tracking algorithms have generated much interest in areas such as predicting stem cell fate 57, characterizing subcellular and membrane dynamics 58,59, and lineage tracing during embryogenesis in Caenorhabditis elegans 60. We anticipate that our algorithms, and in particular the underlying probabilistic model estimation technique, will be useful in other applications of time-lapse image analysis that deal with arbitrary motion dynamics and measurement uncertainties.

42. Zhang, J.Q. et al. Reduction in exposure of human embryos outside the incubator enhances embryo quality and blastulation rate. Reprod. Biomed. Online 20, 510–515 (2010).
43. Veeck, L.L. et al. Significantly enhanced pregnancy rates per cycle through cryopreservation and thaw of pronuclear stage oocytes. Fertil. Steril. 59, 1202–1207 (1993).
44. Miller, K.F. & Goldberg, J.M. In vitro development and implantation rates of fresh and cryopreserved sibling zygotes. Obstet. Gynecol. 85, 999–1002 (1995).
45. Damario, M.A., Hammitt, D.G., Galanits, T.M., Session, D.R. & Dumesic, D.A. Pronuclear stage cryopreservation after intracytoplasmic sperm injection and conventional IVF: implications for timing of the freeze. Fertil. Steril. 72, 1049–1054 (1999).
46. Vajta, G., Nagy, Z., Cobo, A., Conceicao, J. & Yovich, J. Vitrification in assisted reproduction: myths, mistakes, disbeliefs and confusion. Reprod. Biomed. Online 19, 1–7 (2009).
47. Liebermann, J. et al. Blastocyst development after vitrification of multipronuclear zygotes using the Flexipet denuding pipette. Reprod. Biomed. Online 4, 146–150 (2002).
48. Tarin, J.J., Trounson, A. & Sathananthan, H. Origin and ploidy of multipronuclear zygotes. Reprod. Fertil. Dev. 11, 273–279 (1999).
49. Sathananthan, A.H. et al. Development of the human dispermic embryo (CD-ROM). Hum. Reprod. Update 5, 553–560 (1999).
50. Baltaci, V. et al. Relationship between embryo quality and aneuploidies. Reprod. Biomed. Online 12, 77–82 (2006).
51. Fino, E. et al. How good is embryo morphology at predicting chromosomal integrity? When is aneuploidy PGD useful? Fertil. Steril. 84, S98–S99 (2005).
52. Kearns, W. et al. Aneuploidy rates of human preimplantation embryos in relation to morphology and development. Fertil. Steril. 86, S474 (2006).
53. Foygel, K. et al. A novel and critical role for Oct4 as a regulator of the maternal-embryonic transition. PLoS ONE 3, e4109 (2008).
54. Livak, K.J. & Schmittgen, T.D. Analysis of relative gene expression data using real-time quantitative PCR and the 2^(−ΔΔCT) method. Methods 25, 402–408 (2001).
55. Vandesompele, J. et al. Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes. Genome Biol. 3, 0034.1–0034.12 (2002).
56. Hellemans, J., Mortier, G., De Paepe, A., Speleman, F. & Vandesompele, J. qBase relative quantification framework and software for management and automated analysis of real-time quantitative PCR data. Genome Biol. 8, R19.1–R19.14 (2007).
57. Cohen, A.R., Gomes, F.L., Roysam, B. & Cayouette, M. Computational prediction of neural progenitor cell fates. Nat. Methods 7, 213–218 (2010).
58. Jaqaman, K. et al. Robust single-particle tracking in live-cell time-lapse sequences. Nat. Methods 5, 695–702 (2008).
59. Sergé, A., Bertaux, N., Rigneault, H. & Marguet, D. Dynamic multiple-target tracing to probe spatiotemporal cartography of cell membranes. Nat. Methods 5, 687–694 (2008).
60. Bao, Z. et al. Automated cell lineage tracing in Caenorhabditis elegans. Proc. Natl. Acad. Sci. USA 103, 2707–2712 (2006).


letters

Substrate elasticity provides mechanical signals for the expansion of hemopoietic stem and progenitor cells

Jeff Holst¹,², Sarah Watson¹, Megan S Lord³, Steven S Eamegdool⁴, Daniel V Bax⁴, Lisa B Nivison-Smith⁴, Alexey Kondyurin⁵, Liang Ma⁶, Andres F Oberhauser⁶, Anthony S Weiss⁴ & John E J Rasko¹,²,⁷

© 2010 Nature America, Inc. All rights reserved.

Surprisingly little is known about the effects of the physical microenvironment on hemopoietic stem and progenitor cells. To explore the physical effects of matrix elasticity on well-characterized primitive hemopoietic cells, we made use of a uniquely elastic biomaterial, tropoelastin. Culturing mouse or human hemopoietic cells on a tropoelastin substrate led to a two- to threefold expansion of undifferentiated cells, including progenitors and mouse stem cells. Treatment with cytokines in the presence of tropoelastin had an additive effect on this expansion. These biological effects required substrate elasticity, as neither truncated nor cross-linked tropoelastin reproduced the phenomenon, and inhibition of mechanotransduction abrogated the effects. Our data suggest that substrate elasticity and tensegrity are important mechanisms influencing hemopoietic stem and progenitor cell subsets and could be exploited to facilitate cell culture.

Stem cells require signals from their environment to retain their phenotype. These signals arise from soluble factors, including growth factors, cytokines and chemokines¹, from cell-cell contact and from extracellular matrix proteins²⁻⁴. Such signals are detected by receptors present on the surface of stem cells⁵. In addition to these well-characterized signaling pathways, cell structure and function may be determined by the mechanical forces of tensegrity (tensional integrity)⁶,⁷.
Indeed, shear stress has recently been shown to promote embryonic hemopoiesis from progenitor cells⁸, and mesenchymal stem cells have been shown to sense alterations in compressive elasticity, differentiating according to the stiffness or elasticity of their substrate⁹.

In this study we have taken advantage of the unique extensional elasticity properties of tropoelastin, the most elastic biomaterial known¹⁰, to examine the effects of elasticity on hemopoietic stem and progenitor cell populations ex vivo. Mouse bone marrow cells were cultured on control or tropoelastin-coated tissue culture plates for 3 d (Fig. 1a,b and Supplementary Fig. 1). In the presence or absence of cytokines, there was a significant (P = 0.0071 and P = 0.0051) increase in the percentage of lineage-negative (Lin⁻) cells after culture on tropoelastin (Supplementary Fig. 1c,d). For this study we tested a number of published cytokine cocktails, and the combination of interleukin (IL)-3, IL-6 and stem cell factor was chosen as the optimal mixture. We have previously used this combination to support the culture of repopulating stem cells¹¹⁻¹³. Furthermore, compared to controls, the Lin⁻ cells cultured on tropoelastin-coated plates contained a significantly (P = 0.0001) higher percentage of Sca-1⁺c-Kit⁺ cells, which resulted in a greater number of total Lin⁻Sca-1⁺c-Kit⁺ (LSK) cells (Fig. 1b and Supplementary Fig. 1e,f). There was no evidence of more cell death in the cultured cells (Supplementary Fig. 1a,b); there was also more Sca-1 mRNA and Sca-1 protein on the surface of each cell (Supplementary Fig. 2a,b). There was, however, little effect on the percentage of mature cells, including T cells, B cells, granulocytes and macrophages in this population (Supplementary Fig. 2c). The higher percentage and greater number of LSK cells observed when cells were cultured on tropoelastin-coated plates in the absence of cytokines compared to control (Fig. 1b and Supplementary Fig. 1e) were similar to those observed when cytokines were added to control plates (Fig. 1b and Supplementary Fig. 1f). This suggested that tropoelastin mediated a similar effect on the maintenance of LSK cells as that produced by the combination of IL-3, IL-6 and stem cell factor, and may be able to replace these cytokines for ex vivo culture. However, tropoelastin and the cytokines act through different mechanisms, as the combination of both produced an additive effect on the percentage and number of LSK cells (Fig. 1b and Supplementary Fig. 1e,f).

As the extracellular matrix proteins collagen and fibronectin have previously been used to enhance the growth of hemopoietic cells, we compared their ability to increase the percentage of LSK cells in vitro with that of tropoelastin (Supplementary Fig. 3). When bone marrow cells were cultured on tropoelastin-coated plates for up to 7 d, there was a significant (see figures for P-values) increase in the percentage of LSK cells at each time point compared to controls (Fig. 1c and Supplementary Fig. 3). There was a significant (see figures for P-values) increase in LSK cells after 1 (Supplementary Fig. 3a) and 3 d (Supplementary Fig. 3b) in the tropoelastin-coated plates compared to control, fibronectin- or collagen-coated plates. After 5 (Supplementary Fig. 3c) or 7 d (Supplementary Fig. 3d) the increase in LSK cells with tropoelastin compared to control remained significant (P = 0.0022 and P = 0.0008, respectively).

¹Gene & Stem Cell Therapy Program, Centenary Institute, Camperdown, New South Wales, Australia. ²Sydney Medical School, University of Sydney, Sydney, New South Wales, Australia. ³Graduate School of Biomedical Engineering, The University of New South Wales, Sydney, New South Wales, Australia. ⁴School of Molecular Bioscience, University of Sydney, Sydney, New South Wales, Australia. ⁵School of Physics, University of Sydney, Sydney, New South Wales, Australia. ⁶Department of Neuroscience and Cell Biology, University of Texas Medical Branch, Galveston, Texas, USA. ⁷Cell and Molecular Therapies, Royal Prince Alfred Hospital, Camperdown, New South Wales, Australia. Correspondence should be addressed to J.E.J.R. (j.rasko@centenary.org.au).

Received 16 June; accepted 7 September; published online 3 October 2010; doi:10.1038/nbt.1687

nature biotechnology VOLUME 28 NUMBER 10 OCTOBER 2010 1123


The increased numbers of LSK cells after culture on tropoelastin suggested that these cells were preferentially expanding ex vivo (Fig. 1b). Compared to starting numbers of LSK cells, there was a 5.7-fold increase in cell numbers after culture on tropoelastin without cytokines, and a 14.2-fold increase in cell numbers after culture on tropoelastin with cytokines (Fig. 1b). To further examine this expansion of hemopoietic cells ex vivo, we used carboxyfluorescein succinimidyl ester (CFSE)-labeled bone marrow cells to directly show the division of LSK cells. After 3 d in culture, there was an increase in the number and percentage of LSK cells on tropoelastin-coated plates that had divided and maintained their phenotype, compared to control plates (Fig. 1d and Supplementary Fig. 4a,b).

We next sought to define the effects of tropoelastin on specific primitive hemopoietic cell subsets, including progenitors and repopulating stem cells. To determine the clonogenic potential of cells cultured on tropoelastin, we analyzed equal numbers of cells by colony-forming assay (Fig. 1e,f). A significant (day 3, P = 0.0046; day 5, P = 0.0043) increase in the total number of colonies was observed in cells cultured on tropoelastin-coated plates compared to controls (Fig. 1e,f). No difference in the size or type of colonies was observed (Supplementary Fig. 5). The signaling lymphocytic activation molecule (SLAM) markers CD48 and CD150 can be used to more accurately determine the presence of long-term repopulating hemopoietic cells within the LSK population¹⁴,¹⁵. We observed a significant (P = 0.0039) two- to threefold increase in the CD48⁻CD150⁺ LSK population after culture on tropoelastin-coated plates compared to control (Fig. 1g). To further assess the hemopoietic repopulating stem cells, we measured the engraftment of cultured cells transplanted into irradiated congenic recipient mice. Cells were cultured on control or tropoelastin-coated plates and transplanted into mice, and the percentage of mice with engrafted donor cells was determined. The results showed a higher frequency of repopulating cells after culture on tropoelastin-coated plates compared to control (Fig. 1h; n = 25). The frequency of repopulating cells was also determined using extreme limiting dilution analysis (ELDA) software, showing a significantly higher number of repopulating cells after culture on tropoelastin-coated plates (1 in 2.91 × 10⁶) compared to control (1 in 7.34 × 10⁶; P = 0.0029).

Figure 1 Tropoelastin increased mouse hemopoietic stem and progenitor cells. Mouse bone marrow cells were cultured on control or tropoelastin-coated plates for 7 d. (a,b) On day 3, cells were harvested, counted, and analyzed by flow cytometry (a), and the numbers of LSK cells were compared to those in fresh uncultured bone marrow (baseline), expressed as mean ± s.e.m. (b; n = 4–5). (c) On days 1, 3, 5 and 7, cells were analyzed by flow cytometry, expressed as mean ± s.e.m. (days 1, 5 and 7, n = 3; day 3, n = 10). (d) Cells were labeled with CFSE, cultured for 3 d and analyzed by flow cytometry; results are expressed as mean ± s.e.m., with left y axis denoting undivided cells and right axis denoting cell divisions 1–4 (n = 4). (e,f) Cells cultured for either 3 (n = 4) or 5 d (n = 3) were subsequently cultured in MethoCult medium and colonies enumerated (expressed as mean ± s.e.m.). (g) Cells cultured for 3 d were analyzed for SLAM markers by flow cytometry, expressed as mean ± s.e.m. (n = 4). CD45.1⁺ bone marrow cells cultured for 3 d were injected into irradiated CD45.2⁺ mice and analyzed after 8 weeks for engraftment. (h) The number of transplanted cells was plotted against the percentage of mice with unsuccessful engraftment to determine the frequency of repopulating cells (n = 25 recipients per group from five separate experiments).

[Figure 1 panel axes: (a) lineage markers versus FSC and c-Kit versus Sca-1, control and tropoelastin; (b) LSK (number); (c,d) LSK (% of total); (e,f) colonies (per 10⁴ cells); (g) cells (% of total); (h) negative mice (% total) versus transplanted cells (× 10⁶).]

Mouse and human hemopoietic cells differ in their cell surface markers and in some aspects of their biology. To determine whether tropoelastin could induce similar effects on human hemopoietic progenitor cells to those observed in mouse cells, we cultured human umbilical cord blood cells on control or tropoelastin-coated plates for 3 d. There was a significant (P = 0.0039) increase in the percentage of Lin⁻CD34⁺CD38⁺ cells, as well as a small increase in the level of CD34 and CD38 staining per cell, in tropoelastin-coated compared to control plates (Fig. 2a,b). There was no evidence of more cell death as measured by cell count or flow cytometry (Supplementary Fig. 6). Additionally, human cells were analyzed by colony-forming assay and, consistent with mouse cells, there were significantly (P = 0.0025) more progenitor cells after culture on tropoelastin-coated plates (Fig. 2c). There was no difference in the size or type of colonies observed between the control and tropoelastin-coated plates (Supplementary Fig. 7).
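The repopulating-cell frequencies quoted above (1 in 2.91 × 10⁶ versus 1 in 7.34 × 10⁶) come from limiting dilution analysis. As a rough illustration of the underlying idea, and not a reproduction of the ELDA software, the sketch below fits the single-hit Poisson model, in which a recipient given d cells remains negative with probability exp(−f·d), by maximum likelihood. The function names, the search bounds and the dose table are hypothetical and are not taken from the paper.

```python
import math


def neg_log_likelihood(freq, doses, n_tested, n_negative):
    """Negative log-likelihood of the single-hit Poisson model.

    A recipient given `dose` cells is negative (no engraftment) with
    probability exp(-freq * dose) and positive otherwise.
    """
    nll = 0.0
    for dose, tested, negative in zip(doses, n_tested, n_negative):
        positive = tested - negative
        x = freq * dose
        nll += negative * x  # -log(exp(-x)) = x
        if positive:
            nll -= positive * math.log(-math.expm1(-x))  # log(1 - exp(-x))
    return nll


def estimate_frequency(doses, n_tested, n_negative, lo=1e-9, hi=1e-4, tol=1e-10):
    """Maximum-likelihood frequency via golden-section search on log(freq).

    `lo` and `hi` are assumed search bounds (per cell). If every recipient
    is negative, or every recipient is positive, the estimate runs to a bound.
    """
    def objective(log_f):
        return neg_log_likelihood(math.exp(log_f), doses, n_tested, n_negative)

    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = math.log(lo), math.log(hi)
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if objective(c) < objective(d):
            b = d
        else:
            a = c
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return math.exp((a + b) / 2.0)


# Hypothetical dose-response table (NOT data from the paper).
doses = [1e6, 3e6, 1e7]   # cells transplanted per recipient
n_tested = [10, 10, 5]    # recipients per dose
n_negative = [8, 4, 0]    # recipients without donor engraftment

freq = estimate_frequency(doses, n_tested, n_negative)
print(f"Estimated frequency: 1 in {1.0 / freq:.3g} cells")
```

A dedicated package would also report confidence intervals and test whether two groups differ, which this sketch does not attempt.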


Figure 2 Tropoelastin increased human hemopoietic progenitor cells. Human umbilical cord blood cells were cultured on control or tropoelastin-coated plates. (a,b) After 3 d cells were analyzed by flow cytometry with representative dot plots shown for lineage negative gated co-expression of CD34 and CD38. (b) The percentage of Lin⁻CD34⁺CD38⁺ cells is shown (mean ± s.e.m.; n = 5). (c) After 3 d, cells were cultured in MethoCult medium and colonies enumerated (expressed as mean ± s.e.m.) (n = 4). Statistical significance was determined using a two-tailed Wilcoxon signed rank t test.

[Figure 2 panel axes: (a) lineage markers versus FSC and CD38 versus CD34, control and tropoelastin; (b) Lin⁻CD34⁺CD38⁺ (%), P = 0.0039; (c) colonies (per 10⁴ cells), P = 0.0025.]

To determine the mechanism by which tropoelastin mediates the increase of hemopoietic progenitor cells, we purified three different tropoelastin truncation mutants (ELN27-540, ELN27-365 and ELN297-595) and full-length tropoelastin and used them to coat tissue culture plates (Fig. 3a). Mouse bone marrow cells were cultured in the plates for 3 d to determine whether a subdomain of the protein conferred its functionality. Plates with intact tropoelastin or ELN27-540 had a higher percentage of LSK cells than control plates, whereas plates with ELN27-365 or ELN297-595 alone did not differ from control plates (Fig. 3b). ELN27-365 and ELN297-595 overlap, and together contain all intact amino acid sequences comprising the ELN27-540 mutant. To determine whether the increase in LSK cells was due to domains present in ELN27-540, we coated tissue culture plates with ELN27-365 and ELN297-595 together. The combination of these two truncations, however, did not mediate an increase in the percentage of LSK cells (Fig. 3b). This result shows that a property unique to the intact region ELN27-540, and not any intrinsic ability of the amino acid sequences to bind to cells, was responsible for the cellular effects. To determine whether these effects were due to a loss of extensional elasticity in the truncated tropoelastin proteins, we used single-molecule atomic force microscopy (AFM) to determine their extensibilities. Force-extension measurements confirmed that ELN27-540 was more extensible than ELN27-365 or ELN297-595, although less extensible than full-length tropoelastin (Fig. 3c).

To further confirm that the elastic properties of tropoelastin were required for its biological effects on hemopoietic progenitor cells, we cultured mouse bone marrow cells on full-length tropoelastin that had been cross-linked using glutaraldehyde. At concentrations ≥0.1% glutaraldehyde, most of the biological effects of tropoelastin were lost (Fig. 3d). This result confirmed that the physical properties of tropoelastin were responsible for its cellular effects. To confirm that glutaraldehyde cross-linking had reduced the elasticity of tropoelastin, we performed single-molecule AFM on the cross-linked tropoelastin (Fig. 3c). These data established that there was a correlation between the extent of the elasticity of tropoelastin and maintenance of progenitor cells. For example, in a very low concentration of 0.01% glutaraldehyde, tropoelastin maintained a mean contour length of 186 nm and increased the percentage of LSK cells compared to higher glutaraldehyde concentrations, which abrogated the effects (P = 0.0079).
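According to the Figure 3 legend, the contour lengths discussed above were obtained from AFM force-extension data using the worm-like chain (WLC) model of polymer elasticity. A minimal sketch of that model, using the widely used Marko-Siggia interpolation formula, is given below. The persistence length and temperature are illustrative assumptions, since this section does not report the fitting parameters; only the 186 nm contour length is taken from the text.

```python
BOLTZMANN_PN_NM_PER_K = 0.01380649  # k_B in pN*nm/K (1.380649e-23 J/K)


def wlc_force(extension_nm, contour_length_nm, persistence_length_nm,
              temperature_k=298.15):
    """Marko-Siggia WLC interpolation: force (pN) at a given extension (nm).

    F = (kT / p) * [1 / (4 * (1 - x/L)^2) - 1/4 + x/L]
    """
    if not 0.0 <= extension_nm < contour_length_nm:
        raise ValueError("extension must satisfy 0 <= x < contour length")
    kt = BOLTZMANN_PN_NM_PER_K * temperature_k
    rel = extension_nm / contour_length_nm
    return (kt / persistence_length_nm) * (
        0.25 / (1.0 - rel) ** 2 - 0.25 + rel
    )


# Contour length from the text (0.01% glutaraldehyde); the persistence
# length below is an ASSUMED, illustrative value.
CONTOUR_NM = 186.0
PERSISTENCE_NM = 0.4

for fraction in (0.25, 0.5, 0.75, 0.9):
    x = fraction * CONTOUR_NM
    force = wlc_force(x, CONTOUR_NM, PERSISTENCE_NM)
    print(f"x = {x:6.1f} nm  ->  F = {force:7.1f} pN")
```

In practice the contour length is the fitted parameter: a measured force-extension trace is compared with this curve, and the value of L that best reproduces the steep rise in force near full extension is reported.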
The overall comparisons of the atomic force measurements for truncations and glutaraldehyde cross-linking revealed a threshold for the mean extensional elasticity, showing that extension lengths >125 nm were required to mediate increased percentages of LSK cells (Fig. 3e, dotted line).

Quartz crystal microbalance with dissipation monitoring (QCM-D) analyses were performed to further characterize the tropoelastin coatings, as well as fibronectin and collagen. Collagen type I formed a multilayered structure on the oxidized polystyrene surface, as shown by the low Δ frequency values (Fig. 4a,b). Fibronectin and tropoelastin, however, formed a single-layer configuration with higher Δ frequency values, and lower thickness than collagen type I (Fig. 4a,b). The higher Δ dissipation value for tropoelastin showed increased viscoelasticity compared to fibronectin (Fig. 4a). The high Δ dissipation for collagen type I is a consequence of the highly hydrated multilayer configuration on the oxidized polystyrene surface, which leads to a thickness of >100 nm (Fig. 4b). Intact tropoelastin and ELN27-540 were found to rapidly adsorb onto oxidized polystyrene with differences in the binding evident from the Dƒ plot (Fig. 4a). ELN27-540 bound to the surface and underwent rearrangement as shown by the decrease in dissipation, whereas intact tropoelastin did not undergo rearrangement once adsorbed onto the surface. Modeling of the QCM-D data revealed that intact tropoelastin and ELN27-540 each bound to oxidized polystyrene in a monolayer (Fig. 4b,c). Intact tropoelastin bound to oxidized polystyrene in a thicker layer than ELN27-540 (Fig. 4b) and at a higher mass density (Fig. 4c) owing to the difference in molecular weight of the two proteins (60 and 44.4 kDa, respectively). Both proteins bound to oxidized polystyrene with the same circular footprint (Fig. 4d). Taken together with the thickness data presented in Figure 4b, this suggests that the C-terminal region protruded from the surface and was not required for the mechanical signals that led to expansion of hemopoietic stem and progenitor cells (Fig. 3). Tropoelastin molecules did not appreciably interact as only nanogram amounts were present on the coated surfaces. Tropoelastin intermolecular interactions display a k_a of 1.71 ± 0.31 × 10⁵ M⁻¹ s⁻¹ and a k_d of 3.8 ± 0.22 × 10⁻³ s⁻¹, resulting in a K_D of 2.28 ± 0.29 × 10⁻⁸ M at a χ² of 1.18 (Baldock, C. et al., unpublished data). This low dissociation constant and molecular tethering mean that biomolecular interactions between surface-bound tethered tropoelastin molecules would be transient and rare. Furthermore, these data do not suggest that tropoelastin forms a multilayer structure that could result in increased surface area being presented to the cells.

The AFM image confirmed that nonhydrated tropoelastin coated the polystyrene surface although some ‘holes’ were not coated (Supplementary Fig. 8a). The area coated by tropoelastin was as smooth as the initial polystyrene substrate. Such island-like protein coats also occur for horseradish peroxidase on polystyrene¹⁶. The roughness histogram shows the distribution of surface pixel events of independent random events (Supplementary Fig. 8b). The position of the bottom peak was 1.69 nm, whereas the position of the upper peak was 8.11 nm, giving an average thickness of the tropoelastin coating of 6.42 nm. The roughness of the polystyrene substratum and tropoelastin upper surface was defined by the width of these peaks. The roughness of each surface was less than the thickness


of a single tropoelastin coating. The distribution of tropoelastin on polystyrene was calculated based on the thickness estimation for the tropoelastin coating. The AFM image was analyzed to obtain a distribution of holes >7 nm in depth (Supplementary Fig. 8c). Nonhydrated tropoelastin covered 70–75% of the surface. Therefore, because all cell assays were performed on hydrated tropoelastin, in which >75% coverage would be expected, every cell that was in contact with the culture dish would be expected to interact with many tropoelastin molecules.

Figure 3 Effect of tropoelastin truncations and cross-linking on the ability of mouse hemopoietic cells to respond to tropoelastin. (a) Schematic of full-length tropoelastin and truncations, including both domain structure and amino acid numbering of each construct end. (b,d) Mouse bone marrow cells were cultured on control, tropoelastin-coated or truncated tropoelastin–coated plates (b) or on glutaraldehyde–cross-linked tropoelastin-coated plates (d). After 3 d cells were analyzed by flow cytometry for the absence of lineage-specific surface markers and for coexpression of Sca-1 and c-Kit. Shown is the fold increase in percentage of LSK cells relative to control plates. In b,d, data are mean ± s.e.m. from three to six separate experiments. In c, tropoelastin or truncations were deposited on glass slides (with or without cross-linking with glutaraldehyde) and their extensibilities (contour lengths) were determined by atomic force microscopy using the worm-like chain model of polymer elasticity. Data are the percentage of total events, including a Gaussian nonlinear fit curve. (e) The mean ± s.e.m. of the contour length was compared. A putative threshold of extensional elasticity to retain biological activity is shown (dotted line).

[Figure 3 panel labels: (a) domain key: KP and KA cross-linking domains, VGVAPG hexapeptide domain, hydrophobic domains, integrin binding domain; constructs: tropoelastin, ELN27-365, ELN297-595, ELN27-540. (b,d) Fold increase (% Lin⁻Sca-1⁺c-Kit⁺); glutaraldehyde at 0, 0.01, 0.1 and 1%. (c) Events (% of total) versus contour length (nm); mean contour lengths: tropoelastin, 236 nm; ELN27-540, 149 nm; ELN27-365, 66 nm; ELN297-595, 110 nm; 0.01% glutaraldehyde, 186 nm; 0.1% glutaraldehyde, 87 nm; 1% glutaraldehyde, 99 nm. (e) Contour length (nm).]

The regulation of intracellular and extracellular tension relies on the balance between adhesion and actomyosin contractility, which combine to affect gene expression by the process of mechanotransduction¹⁷. As the truncation and cross-linking studies definitively showed that the physical properties of tropoelastin were required to increase the percentage of LSK cells, we set out to test whether tropoelastin acts through the mechanotransduction machinery. This machinery requires the activity of myosin II, and so we used specific inhibitors of either myosin II heavy chain (blebbistatin) or myosin light chain kinase (ML-7). Both inhibitors led to a significant (blebbistatin, P = 0.0001; ML-7, P = 0.0156) abrogation or reduction in the effects of culturing LSK cells on tropoelastin, with blebbistatin-treated cells on tropoelastin indistinguishable from control cells (Fig. 4e). Neither blebbistatin nor ML-7 had an effect on control cells (Fig. 4e,f).

Molecular signals originating from the ligation of hemopoietic stem cell surface receptors including integrins, growth factor receptors and cytokine receptors have been studied in detail for their ability to maintain quiescence¹⁸, induce proliferation or promote differentiation¹⁹. However, little is known regarding the mechanisms by which the stiffness or elasticity of the microenvironment may affect cellular functions or indeed how the sensing apparatus and transduction of mechanical signals function²⁰,²¹. The most likely mechanism by which cells could monitor substrate elasticity involves two-way interactions with the actin-myosin cytoskeleton, coupled through membrane receptors such as integrins²². The response of hemopoietic cells to tropoelastin that we observed did not require integrin signaling, as truncated tropoelastin, lacking the integrin binding C terminus (Fig. 3a)²³, retained most of the capacity to increase the percentage of LSK cells in vitro (Fig. 3b). Glycosaminoglycans have also been shown to mediate cell binding to tropoelastin²⁴, as has the elastin binding protein/elastin-laminin receptor²⁵. Whereas the elastin binding protein recognition sequence (VGVAPG) hexapeptide repeat present in tropoelastin mutant ELN482-525 may be required for binding of tropoelastin to the cells²⁶, any signals generated by this domain alone were not sufficient for ELN297-595 to maintain LSK cells (Fig. 3b). Like integrins, this receptor has previously been shown to mediate mechanotransduction²⁷,²⁸. Furthermore, culturing cells on plates coated with cross-linked tropoelastin, whose signaling domains are intact but whose elasticity is impaired, significantly (see Fig. 3d for P values) reduced the percentage of LSK cells compared to cells grown on untreated tropoelastin (Fig. 3d). The presence of a monolayer of tropoelastin in the experiments described herein would not permit the cells to exert a compressive effect, because tropoelastin monomers can be extended, but not compressed, from their native form. Taken


together, these data suggest that the property of extensional elasticity itself conferred the increase in LSK cells.

The ability of cells to sense their microenvironment by coupling actin fibers and integrins together may provide a capability to push and pull the matrix, testing elasticity and thus determining cell fate²⁹. Inhibition of myosin II heavy chain and myosin light chain kinase confirmed the role of the actin-myosin cytoskeleton in this process, again suggesting that mechanical signals are involved (Fig. 4e,f). As myosin II is a highly elastic molecule, it is conceivable that elasticity extends continuously from the extracellular tropoelastin deep inside the cell³⁰. It is known that certain disease states in which tissue elasticity is abnormal may result in suboptimal cellular differentiation³¹. The AFM measurements of tropoelastin used in this study showed that a cell would be able to extend tropoelastin up to 200 nm; however, truncations maintaining elasticity of over 125 nm also retained most of the functionality. As membrane tension has also recently been shown to be an important determinant for cell protrusions, our results lend support to a tensegrity model of the stem cell niche³². The stem cell and its ‘niche synapse’ may establish a stable mechanical structure that actively promotes the pluripotent state.

Figure 4 QCM-D analysis of collagen, fibronectin, tropoelastin and ELN27-540 binding to oxidized polystyrene, and the effect of myosin II heavy chain and myosin light chain kinase inhibitors on the ability of mouse hemopoietic cells to respond to tropoelastin. Intact collagen, fibronectin, tropoelastin and ELN27-540 adsorbed onto oxidized polystyrene were monitored by QCM-D for 1 h at 20 ± 0.1 °C. (a) Changes in dissipation (Δ dissipation) versus changes in frequency (Δ frequency) are presented for the third overtone (Dƒ plot). (b–d) These data were analyzed using the Voigt model to determine adsorbed layer thickness (b), mass (c) and protein circular footprint (d). Circular footprint diameter measurements were performed assuming globular proteins of tropoelastin (60 kDa), ELN27-540 (44.4 kDa), fibronectin (440 kDa) and collagen (300 kDa). Data are presented as mean ± s.d. from three separate experiments. (e,f) Mouse bone marrow cells were cultured on control (uncoated) or tropoelastin-coated plates in the presence or absence of blebbistatin (e) or DMSO ± ML-7 (f). On day 3, cells were analyzed by flow cytometry to determine the percentage of LSK cells, relative to control (uncoated) plates. Data are mean ± s.e.m. from three separate experiments. Statistical significance was determined using a two-tailed Wilcoxon signed rank t test.

[Figure 4 panel axes: (a) Δ dissipation (1E-6) versus Δ frequency (Hz); (b) thickness (nm); (c) mass (ng/cm²); (d) circular footprint (diameter, nm); series in a–d: tropoelastin, ELN27-540, fibronectin, collagen type I. (e) Fold increase (% Lin⁻Sca-1⁺c-Kit⁺) for control, tropoelastin and tropoelastin + 50 µM blebbistatin; (f) fold increase (% Lin⁻Sca-1⁺c-Kit⁺) for control, tropoelastin and tropoelastin + 20 µM ML-7.]

Shear stress has recently been shown to enhance mouse embryonic hemopoiesis, suggesting that biomechanical forces are involved in embryonic hemopoietic development⁸. The data presented herein provide evidence that both mouse and human hemopoietic stem and progenitor cells can also respond directly to biomechanical forces through extensional elasticity. Taken together, these data suggest that throughout development, hemopoietic stem and progenitor cells constantly sense and react to the physical signals provided by their niche environments.
As such, biomimetic surface and ex vivo culture design should include consideration of physical properties, including elastic and compressive extensibility and shear stress³³. Furthermore, we produced highly purified good laboratory practice (GLP)-grade tropoelastin, which showed a significant increase in maintenance of hemopoietic progenitor cells compared to laboratory-grade tropoelastin (Supplementary Fig. 9). GLP-grade tropoelastin exhibited an increase in mean extensional length compared head to head with laboratory-grade tropoelastin (229 ± 35 nm versus 198.3 ± 35.6 nm, respectively). Recent data showing that primitive hemopoietic cells emerge from the endothelium in the aortic floor during development suggest that elastin might be in contact with stem or progenitor cells during early blood cell development³⁴⁻³⁸. However, it remains to be determined whether elastin is present in the microenvironmental niche or whether it has a physiological role in hemopoiesis. Either way, biomaterials engineered with unique mechanical properties such as tropoelastin may be used to mimic these niche environments ex vivo. The evidence presented here supports the use of optimal physical substrates that may replace, complement or add to existing approaches to support and expand stem cells³³. In this way, elastic substrates such as tropoelastin may offer an approach to biomaterial design³⁹,⁴⁰ aimed at achieving optimal mechanical culture conditions for the maintenance of stem cells ex vivo.

Methods
Methods and any associated references are available in the online version of the paper at http://www.nature.com/naturebiotechnology/.

Note: Supplementary information is available on the Nature Biotechnology website.

Acknowledgments
We thank D. Vignali for scientific discussion. J.H. received grant support from the Cancer Institute of New South Wales, A.S.W. from the Australian Research Council and National Health and Medical Research Council, A.F.O. received support from the US National Institutes of Health and J.E.J.R. from the National Health and Medical Research Council and the Cell and Gene Trust, Cure The Future.

AUTHOR CONTRIBUTIONS
J.H. and J.E.J.R. designed the experiments and wrote the paper, J.H. and M.S.L. analyzed the data, J.H., S.W., A.F.O., L.M., S.S.E., M.S.L. and A.K. generated the data, A.S.W. and J.E.J.R. provided conceptual input and D.V.B., L.B.N.-S. and S.S.E. provided tropoelastin reagents.

COMPETING FINANCIAL INTERESTS
The authors declare competing financial interests: details accompany the full-text HTML version of the paper at http://www.nature.com/naturebiotechnology/.

Published online at http://www.nature.com/naturebiotechnology/.
Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/.

1. Sauvageau, G., Iscove, N.N. & Humphries, R.K. In vitro and in vivo expansion of hematopoietic stem cells. Oncogene 23, 7223–7232 (2004).


2. Carter, W.G. & Wayner, E.A. Characterization of the class III collagen receptor, a phosphorylated, transmembrane glycoprotein expressed in nucleated human cells. J. Biol. Chem. 263, 4193–4201 (1988).
3. Ghaffari, S., Dougherty, G.J., Lansdorp, P.M., Eaves, A.C. & Eaves, C.J. Differentiation-associated changes in CD44 isoform expression during normal hematopoiesis and their alteration in chronic myeloid leukemia. Blood 86, 2976–2985 (1995).
4. Williams, D.A., Rios, M., Stephens, C. & Patel, V.P. Fibronectin and VLA-4 in haematopoietic stem cell-microenvironment interactions. Nature 352, 438–441 (1991).
5. Wilson, A. & Trumpp, A. Bone-marrow haematopoietic-stem-cell niches. Nat. Rev. Immunol. 6, 93–106 (2006).
6. Janmey, P.A. & McCulloch, C.A. Cell mechanics: integrating cell responses to mechanical stimuli. Annu. Rev. Biomed. Eng. 9, 1–34 (2007).
7. Ainsworth, C. Cell biology: stretching the imagination. Nature 456, 696–699 (2008).
8. Adamo, L. et al. Biomechanical forces promote embryonic haematopoiesis. Nature 459, 1131–1135 (2009).
9. Engler, A.J., Sen, S., Sweeney, H.L. & Discher, D.E. Matrix elasticity directs stem cell lineage specification. Cell 126, 677–689 (2006).
10. Knowles, T.P. et al. Role of intermolecular forces in defining material properties of protein nanofibrils. Science 318, 1900–1903 (2007).
11. Holst, J. et al. Generation of T-cell receptor retrogenic mice. Nat. Protoc. 1, 406–417 (2006).
12. Holst, J., Vignali, K.M., Burton, A.R. & Vignali, D.A. Rapid analysis of T-cell selection in vivo using T cell-receptor retrogenic mice. Nat. Methods 3, 191–197 (2006).
13. Holst, J. et al. Scalable signaling mediated by T cell antigen receptor-CD3 ITAMs ensures effective negative selection and prevents autoimmunity. Nat. Immunol. 9, 658–666 (2008).
14. Kiel, M.J. et al. SLAM family receptors distinguish hematopoietic stem and progenitor cells and reveal endothelial niches for stem cells. Cell 121, 1109–1121 (2005).
15. Foudi, A. et al. Analysis of histone 2B-GFP retention reveals slowly cycling hematopoietic stem cells. Nat. Biotechnol. 27, 84–90 (2009).
16. Gan, B.K., Kondyurin, A. & Bilek, M.M. Comparison of protein surface attachment on untreated and plasma immersion ion implantation treated polystyrene: protein islands and carpet. Langmuir 23, 2741–2746 (2007).
17. Clark, K., Langeslag, M., Figdor, C.G. & van Leeuwen, F.N. Myosin II and mechanotransduction: a balancing act. Trends Cell Biol. 17, 178–186 (2007).
18. Passegué, E., Wagers, A.J., Giuriato, S., Anderson, W.C. & Weissman, I.L. Global analysis of proliferation and cell cycle gene expression in the regulation of hematopoietic stem and progenitor cell fates. J. Exp. Med. 202, 1599–1611 (2005).
19. Kiel, M.J. & Morrison, S.J. Uncertainty in the niches that maintain haematopoietic stem cells. Nat. Rev. Immunol. 8, 290–301 (2008).
20. Wang, N., Tytell, J.D. & Ingber, D.E. Mechanotransduction at a distance: mechanically coupling the extracellular matrix with the nucleus. Nat. Rev. Mol. Cell Biol. 10, 75–82 (2009).
21. Discher, D.E., Mooney, D.J. & Zandstra, P.W. Growth factors, matrices, and forces combine and control stem cells. Science 324, 1673–1677 (2009).
22. Even-Ram, S., Artym, V. & Yamada, K.M. Matrix control of stem cell fate. Cell 126, 645–647 (2006).
23. Bax, D.V., Rodgers, U.R., Bilek, M.M. & Weiss, A.S. Cell adhesion to tropoelastin is mediated via the C-terminal GRKRK motif and integrin alphaVbeta3. J. Biol. Chem. 284, 28616–28623 (2009).
24. Broekelmann, T.J. et al. Tropoelastin interacts with cell-surface glycosaminoglycans via its COOH-terminal domain. J. Biol. Chem. 280, 40939–40947 (2005).
25. Mecham, R.P. et al. Elastin binds to a multifunctional 67-kilodalton peripheral membrane protein. Biochemistry 28, 3716–3722 (1989).
26. Rodgers, U.R. & Weiss, A.S. Cellular interactions with elastin. Pathol. Biol. (Paris) 53, 390–398 (2005).
27. Spofford, C.M. & Chilian, W.M. The elastin-laminin receptor functions as a mechanotransducer in vascular smooth muscle. Am. J. Physiol. Heart Circ. Physiol. 280, H1354–H1360 (2001).
28. Spofford, C.M. & Chilian, W.M. Mechanotransduction via the elastin-laminin receptor (ELR) in resistance arteries. J. Biomech. 36, 645–652 (2003).
29. Galbraith, C.G., Yamada, K.M. & Galbraith, J.A. Polymerizing actin fibers position integrins primed to probe for adhesion sites. Science 315, 992–995 (2007).
30. Schwaiger, I., Sattler, C., Hostetter, D.R. & Rief, M. The myosin coiled-coil is a truly elastic protein structure. Nat. Mater. 1, 232–235 (2002).
31. Puttini, S. et al. Gene-mediated restoration of normal myofiber elasticity in dystrophic muscles. Mol. Ther. 17, 19–25 (2009).
32. Ji, L., Lim, J. & Danuser, G. Fluctuations of intracellular forces during cell protrusion. Nat. Cell Biol. 10, 1393–1400 (2008).
33. Lutolf, M.P., Gilbert, P.M. & Blau, H.M. Designing materials to direct stem-cell fate. Nature 462, 433–441 (2009).
34. Zovein, A.C. et al. Fate tracing reveals the endothelial origin of hematopoietic stem cells. Cell Stem Cell 3, 625–636 (2008).
35. Chen, M.J., Yokomizo, T., Zeigler, B.M., Dzierzak, E. & Speck, N.A. Runx1 is required for the endothelial to haematopoietic cell transition but not thereafter. Nature 457, 887–891 (2009).
36. Bertrand, J.Y. et al. Haematopoietic stem cells derive directly from aortic endothelium during development. Nature 464, 108–111 (2010).
37. Boisset, J.C. et al. In vivo imaging of haematopoietic cells emerging from the mouse aortic endothelium. Nature 464, 116–120 (2010).
38. Kissa, K. & Herbomel, P. Blood stem cells emerge from aortic endothelium by a novel type of cell transition. Nature 464, 112–115 (2010).
39. Mithieux, S.M., Rasko, J.E. & Weiss, A.S. Synthetic elastin hydrogels derived from massive elastic assemblies of self-organized human protein monomers. Biomaterials 25, 4921–4927 (2004).
40. Mitragotri, S. & Lahann, J. Physical approaches to biomaterial design. Nat. Mater. 8, 15–23 (2009).


ONLINE METHODS

Extracellular matrix proteins. We have previously described the expression and purification from bacteria of recombinant human tropoelastin isoforms [41,42]. SHELΔ26A (synthetic human elastin without domain 26A) is the 60-kDa mature form of the secreted protein after removal of the signal peptide (ELN27-724; tropoelastin). This is the most common normal splice variant of human tropoelastin, as exon 26A is expressed only in certain disease states [43,44]. Tropoelastin truncations have been described previously, including ELN27-540 (exons 2–25) [45], ELN27-365 (exons 2–18) [46] and ELN297-595 (exons 17–27) [47] (Fig. 3a). Full-length tropoelastin or truncations (1.5 mg/ml in PBS) were deposited on 6- or 24-well plates (Cellstar tissue culture–treated plates, Greiner) overnight at 4 °C. Plates were washed twice with PBS to remove excess tropoelastin immediately before use. Tropoelastin solution was reused multiple times without any detectable difference in performance. Collagen-precoated six-well plates (Becton Dickinson) were washed twice with PBS before use. Fibronectin (1 mg/ml; Millipore) was deposited on 6- or 24-well plates for 30 min at 37 °C according to the manufacturer’s instructions, and the plates were washed twice with PBS immediately before use. Glutaraldehyde was diluted in PBS and deposited on washed tropoelastin-precoated plates for 1 h at room temperature. The glutaraldehyde was subsequently removed, and the plates were quenched twice with 1 M Tris base solution (pH 7.0) and then washed twice with PBS to remove remaining glutaraldehyde and Tris buffer. Two washes after glutaraldehyde treatment were sufficient to avoid any observable cellular toxicity, and incubation of cells in glutaraldehyde at 0.001% caused no reduction in the percentage of LSKs (data not shown).

Cell culture. C57BL/6J mice were obtained from the Animal Resource Centre (Perth, Australia), and all animal experiments were performed in a specific pathogen–free facility according to national and institutional guidelines. Bone marrow was harvested from the femur, tibia and spine using a mortar and pestle in PBS supplemented with 2% FCS as previously described [11–13]. Bone marrow cells (2.5 × 10⁶ cells per ml) in duplicate or triplicate were incubated at 5% CO₂ in a fully humidified atmosphere in complete Dulbecco’s Modified Eagle’s Medium with 20% FCS (cDMEM), with or without mouse IL-3 (20 ng/ml), human IL-6 (50 ng/ml) and mouse stem cell factor (50 ng/ml; all from Peprotech) in 6- or 24-well plates, with or without precoating with extracellular matrix proteins. Bone marrow cells were harvested using TrypLE (Invitrogen). Inhibitors (blebbistatin (Merck), 50 μM; ML-7 (Merck), 20 μM) were added to the initial culture and again after 48 h. ML-7 was dissolved in DMSO, which was also added to control wells. For CFSE labeling, 5 × 10⁷ cells per ml in DMEM were rapidly mixed with CFSE to a final concentration of 5 μM and incubated at 37 °C for 10 min before being washed four times in cold cDMEM. Human umbilical cord blood mononuclear cells were isolated from fresh samples using Ficoll-Paque density gradient centrifugation following the manufacturer’s instructions. Mononuclear cells were cultured in triplicate in six-well plates, with or without precoating with tropoelastin, harvested after 3 d using TrypLE, washed, and resuspended in PBS with 2% FCS for analysis. Colony assays were performed by plating equal numbers of cells in triplicate in MethoCult medium containing mouse or human cytokines following the manufacturer’s instructions (Stem Cell Technologies). Colonies containing more than ten cells were enumerated and classified after 7–10 d using a light microscope. Recipient mice (C57BL/6J or B6SJL-Ptprcᵃ Pep3ᵇ/BoyJ) were irradiated to 900 cGy using a cesium irradiator before intravenous injection of cultured bone marrow cells (2–4 × 10⁶ cells/mouse). Reconstituted mice were killed after 8 weeks, at which time spleen and peripheral blood were analyzed by flow cytometry for CD45.1 and CD45.2 expression. The frequency of stem cells and the comparison of the two groups were determined by the ELDA web tool for limiting dilution analysis, or using a nonlinear fit algorithm (x-linear, y-log) with analysis at 37% unsuccessful engraftment [48].

Flow cytometric analysis. Bone marrow cells were incubated with lineage-specific antibodies (B220, CD3, Gr-1, Ter119 and CD11b; BioLegend), preconjugated to biotin, together with antibodies against Sca-1–fluorescein isothiocyanate (Sca-1–FITC; BioLegend) or Sca-1–allophycocyanin (Sca-1–APC; BioLegend) and c-Kit–phycoerythrin (c-Kit–PE; Becton Dickinson). SLAM antibody markers used were CD150–APC (BioLegend) and CD48–Pacific Blue (BioLegend). Cells were washed twice in PBS with 2% FCS and stained with streptavidin-PE/indotricarbocyanine (streptavidin PE/Cy5; Becton Dickinson). Peripheral blood and spleen cells from transplanted mice were stained with lineage-specific antibody combinations together with antibodies against CD45.1–Pacific Blue and CD45.2–Alexa Fluor-700 (BioLegend) to detect recipient and donor cells.

Atomic force microscopy. The mechanical properties of tropoelastin proteins were studied using a purpose-built single-molecule atomic force microscope as previously described [49,50]. The spring constant of each individual cantilever (MLCT-AUHW: silicon nitride gold-coated cantilevers; Veeco Metrology Group) was calculated using the equipartition theorem [51]. The r.m.s. force noise (1-kHz bandwidth) was ~10 pN. The pulling speed of the different force-distance curves was in the range of 0.1–0.5 nm/ms. In a typical experiment, purified tropoelastin protein (≤50 μl, 10–100 μg/ml) was adsorbed to a clean glass coverslip for ~10 min and then rinsed with PBS pH 7.4.

Random segments of tropoelastin molecules were then picked up by adsorption to the cantilever tip. Fewer than 30% of AFM pulls showed force-extension curves consistent with either single or multiple tropoelastin molecules. These were separated into single peaks that fit the worm-like chain (WLC) model (Supplementary Fig. 10a) or multiple peaks suggesting that more than one tropoelastin molecule was extended (Supplementary Fig. 10b). The single peak shows the nonlinear relationship of extension length and force. Parameters that are consistent with those for a single full-length tropoelastin molecule include a contour length, Lc, of 210 nm and a persistence length of 0.32 nm. The predicted values for a single 697-amino-acid-long unstructured (random coil) polypeptide chain are 254 nm for Lc and ~0.4 nm for the persistence length. Hence, this recording corresponded to the stretching of a single molecule to almost its full length. Truncated tropoelastin molecules exhibit a shorter extension length and a different WLC curve from that of full-length tropoelastin (Supplementary Fig. 10c). Only single-peak data were used to generate the extension measurements shown in Figure 3c.

Polystyrene films of 100 nm nominal thickness were prepared by spin coating onto (100) silicon substrates (10 × 10 mm) at 2,000 rpm using an SCS G3P-8 Spincoater. The spin coating solution consisted of polystyrene (Austrex 400 from Polystyrene Australia) dissolved to a concentration of 10 g/l in toluene (Sigma Aldrich; purity >99.9%). Solutions produced a homogeneous film thickness over the entire surface of the silicon wafer. The absence of toluene in the resulting polystyrene films was confirmed by FTIR spectroscopy.
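
The worm-like chain parameters reported above for the single-molecule pulls can be explored numerically. The sketch below is only an illustration: the source does not say which form of the WLC model was fitted, so the widely used Marko-Siggia interpolation formula is assumed here, as are the temperature (298 K) and the rise of 0.365 nm per residue, which is simply back-calculated from the 254 nm quoted for 697 residues. The function names are hypothetical.

```python
K_B = 1.380649e-23  # Boltzmann constant in J/K


def wlc_force_pn(extension_nm, contour_nm, persistence_nm, temp_k=298.0):
    """Force (pN) from the Marko-Siggia WLC interpolation formula.

    Assumed form (not stated in the source):
    F = (kB*T / p) * [1 / (4 * (1 - x/Lc)^2) - 1/4 + x/Lc]
    """
    if not 0.0 <= extension_nm < contour_nm:
        raise ValueError("extension must lie in [0, contour length)")
    kbt_pn_nm = K_B * temp_k * 1e21  # convert J to pN*nm
    rel = extension_nm / contour_nm
    return (kbt_pn_nm / persistence_nm) * (0.25 / (1.0 - rel) ** 2 - 0.25 + rel)


def random_coil_contour_nm(n_residues, nm_per_residue=0.365):
    """Contour length of an unstructured chain; rise per residue is an assumption."""
    return n_residues * nm_per_residue


if __name__ == "__main__":
    fitted_lc, fitted_p = 210.0, 0.32          # values reported in the text
    predicted_lc = random_coil_contour_nm(697)  # about 254 nm, as in the text
    print(f"predicted contour length: {predicted_lc:.0f} nm")
    print(f"fitted / predicted: {fitted_lc / predicted_lc:.2f}")
    for x in (50.0, 105.0, 150.0, 190.0):
        print(f"x = {x:5.1f} nm  F = {wlc_force_pn(x, fitted_lc, fitted_p):7.1f} pN")
```

Under these assumptions the fitted contour length is a little over 80% of the random-coil prediction, which matches the statement that the molecule was stretched to almost its full length, and the force rises steeply as the extension approaches Lc.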
After physisorption of tropoelastin, the unbound protein was removed by washing with buffer solution, followed by a brief aqueous immersion before being allowed to dry. AFM images of samples were collected on a Pico SPM instrument in tapping mode, at a scan rate of 0.5 lines/s over areas of 5 × 5 and 1 × 1 μm. Analysis of the AFM images was performed using the WSxM software (version 3, Nanotec Electronica S.L., Spain).

Quartz crystal microbalance. Protein adsorption onto oxidized polystyrene was quantified by quartz crystal microbalance (Q-Sense AB) with dissipation monitoring (QCM-D). Gold QCM-D crystals were spin coated with polystyrene and oxidized as described previously [52]. Experiments were performed at 20 ± 0.1 °C. A stable measurement in PBS (pH 7.2) was established before the addition of tropoelastin (20 μg/ml), ELN27-540 (20 μg/ml), fibronectin (20 μg/ml) or collagen type I (10 μg/ml) for 1 h followed by two rinses with PBS. Adsorbed mass estimates were derived using the Voigt model. Tropoelastin monomers are globular proteins with diameters of ~5–7 nm [53]. Our QCM-D thickness estimate of 4.1 ± 0.3 nm presented in Figure 4b therefore shows that tropoelastin exists as a single monolayer. The tropoelastin monolayer has previously been reported to be approximately 350 ng/cm², which is in agreement with the results presented in this manuscript (Fig. 4c) [54]. Furthermore, tropoelastin was present as a single structural species, as shown by X-ray and neutron scattering (Baldock, C. et al., unpublished data). The circular footprint describes the circular area, expressed as a diameter, covered by each protein molecule. It is calculated from the QCM-D mass and protein molecular weight assuming monolayer adsorption, according to the following equation:

$$\text{circular footprint (cm)} = 2 \times \sqrt{\frac{\mathrm{MW}}{\mathrm{mass} \times N_A \times \pi}}$$

where MW = molecular weight of the protein (ng/mol), mass = mass of adsorbed protein (ng/cm²) and N_A = Avogadro’s number (6.02 × 10²³ mol⁻¹). The circular footprint calculations were used to determine the binding footprint of intact tropoelastin, ELN27-540, fibronectin and collagen.

Statistical analysis. All primary cell culture experiments were performed on duplicate or triplicate samples, with the figure legend showing the number (n) of separate experiments performed. Statistical analysis was performed using GraphPad Prism 5 (GraphPad Software). Statistical significance was determined using a two-tailed Mann-Whitney U test unless stated otherwise in the figure legend.

41. Martin, S.L., Vrhovski, B. & Weiss, A.S. Total synthesis and expression in Escherichia coli of a gene encoding human tropoelastin. Gene 154, 159–166 (1995).
42. Wu, W.J., Vrhovski, B. & Weiss, A.S. Glycosaminoglycans mediate the coacervation of human tropoelastin through dominant charge interactions involving lysine side chains. J. Biol. Chem. 274, 21719–21724 (1999).
43. Debelle, L. & Tamburro, A.M. Elastin: molecular description and function. Int. J. Biochem. Cell Biol. 31, 261–272 (1999).
44. Ostuni, A., Lograno, M.D., Gasbarro, A.R., Bisaccia, F. & Tamburro, A.M. Novel properties of peptides derived from the sequence coded by exon 26A of human elastin. Int. J. Biochem. Cell Biol. 34, 130–135 (2002).
45. Wu, W.J. & Weiss, A.S. Deficient coacervation of two forms of human tropoelastin associated with supravalvular aortic stenosis. Eur. J. Biochem. 266, 308–314 (1999).
46. Rodgers, U.R. & Weiss, A.S. Integrin alpha v beta 3 binds a unique non-RGD site near the C-terminus of human tropoelastin. Biochimie 86, 173–178 (2004).
47. Toonkool, P., Jensen, S.A., Maxwell, A.L. & Weiss, A.S. Hydrophobic domains of human tropoelastin interact in a context-dependent manner. J. Biol. Chem. 276, 44575–44580 (2001).
48. Hu, Y. & Smyth, G.K. ELDA: extreme limiting dilution analysis for comparing depleted and enriched populations in stem cell and other assays. J. Immunol. Methods 347, 70–78 (2009).
49. Miller, E., Garcia, T., Hultgren, S. & Oberhauser, A.F. The mechanical properties of E. coli type 1 pili measured by atomic force microscopy techniques. Biophys. J. 91, 3848–3856 (2006).
50. Greene, D.N. et al. Single-molecule force spectroscopy reveals a stepwise unfolding of Caenorhabditis elegans giant protein kinase domains. Biophys. J. 95, 1360–1370 (2008).
51. Florin, E.L. et al. Sensing specific molecular interactions with the atomic force microscope. Biosens. Bioelectron. 10, 895–901 (1995).
52. Lord, M.S. et al. Monitoring cell adhesion on tantalum and oxidised polystyrene using a quartz crystal microbalance with dissipation. Biomaterials 27, 4529–4537 (2006).
53. Mecham, R.P. & Heuser, J.E. The elastic fiber. in Cell Biology of Extracellular Matrix (ed. Hay, E.D.) 79–110 (Plenum Press, New York, 1991).
54. Yin, Y. et al. Covalent immobilisation of tropoelastin on a plasma deposited interface for enhancement of endothelialisation on metal surfaces. Biomaterials 30, 1675–1681 (2009).

doi:10.1038/nbt.1687
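
The circular footprint calculation described in the quartz crystal microbalance section can be checked with a few lines of arithmetic. The sketch below assumes the equation has the form diameter = 2 × sqrt(MW / (mass × N_A × π)), which follows from dividing the surface area evenly among the adsorbed molecules. The example inputs (a 60-kDa protein and 350 ng/cm²) are the tropoelastin figures quoted in the text, and the function name is hypothetical.

```python
import math

AVOGADRO = 6.02e23  # mol^-1, the value quoted in the text


def circular_footprint_cm(mw_ng_per_mol, mass_ng_per_cm2):
    """Diameter (cm) of the circular area available to each adsorbed molecule.

    Assumes monolayer adsorption, as the text does.
    """
    molecules_per_cm2 = mass_ng_per_cm2 / mw_ng_per_mol * AVOGADRO
    area_per_molecule_cm2 = 1.0 / molecules_per_cm2
    return 2.0 * math.sqrt(area_per_molecule_cm2 / math.pi)


if __name__ == "__main__":
    mw = 60_000 * 1e9  # 60 kDa = 60,000 g/mol, converted to ng/mol
    mass = 350.0       # ng/cm^2, monolayer value quoted in the text
    diameter_nm = circular_footprint_cm(mw, mass) * 1e7  # 1 cm = 1e7 nm
    print(f"circular footprint: {diameter_nm:.2f} nm")
```

With these inputs the footprint comes out at about 6 nm, inside the ~5–7 nm diameter range given for tropoelastin monomers, which is consistent with the monolayer interpretation.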


errata and corrigenda

Erratum: Pfizer explores rare disease path
Catherine Shaffer
Nat. Biotechnol. 28, 881–882 (2010); published online 9 September 2010; corrected after print 22 September 2010
In the version of this article initially published, it was reported that GlaxoSmithKline’s (GSK’s) EpiNova was one of several “biotech-like ideas” that “have been known to fizzle in pharma hands”; in fact, EpiNova has not “fizzled” but is in its second year of operation as a discovery performance unit of GSK focusing on epigenetic approaches to autoimmune disease. The error has been corrected in the HTML and PDF versions of the article.

Erratum: Public biotech 2009—the numbers
Brady Huggett, John Hodgson & Riku Lähteenmäki
Nat. Biotechnol. 28, 793–799 (2010); published online 9 August 2010; corrected after print 13 October 2010
In the version of this article initially published, in Table 6, Acorda was said to have entered into a licensing agreement with Bayer. In fact, Acorda entered into a licensing agreement with Biogen, not Bayer. The error has been corrected in the HTML and PDF versions of the article.

Corrigendum: Food firms test fry Pioneer’s trans fat-free soybean oil
Emily Waltz
Nat. Biotechnol. 28, 769–770 (2010); published online 9 August 2010; corrected after print 13 October 2010
The version of the article originally published states that Monsanto petitioned the USDA for deregulation of two “soybean products with modified oil profiles, one with omega-3 fatty acids for nutrition and the other with enhanced texture and functionality, called high stearic acid soybeans.” The article should have stated that “Monsanto has petitioned for deregulation of Vistive Gold soybeans, with mono-unsaturated fat levels similar to that of olive oil, and saturated fat levels similar to canola oil, which would produce an oil more stable than regular soybean oil at high frying temperatures.” The high stearate soybeans are still in development. The error has been corrected in the HTML and PDF versions of the article.

Corrigendum: Glyphosate resistance threatens Roundup hegemony
Emily Waltz
Nat. Biotechnol. 28, 537–538 (2010); published online 7 June 2010; corrected after print 13 October 2010
The version of the article originally published erroneously states that “Unlike pesticide use, herbicide use is not regulated by the US federal government.” The article should have stated “Unlike insect resistance, the US government does not have a mandated herbicide-resistance program.” The error has been corrected in the HTML and PDF versions of the article.

Corrigendum: Pluripotent patents make prime time: an analysis of the emerging landscape
Brenda M Simon, Charles E Murdoch & Christopher T Scott
Nat. Biotechnol. 28, 557–559 (2010); published online 7 June 2010; corrected after print 13 October 2010
In the version of this article initially published, the authors state: “The patents have been cross-licensed, protecting against unlicensed use of either method. Both the Sakurada and Yamanaka patents are part of the portfolio held by iPierian, a company recently formed by the merger of iZumi Bio, a San Francisco Bay Area biotech and Boston-based Pierian.” This statement is incorrect. The Yamanaka patent (owned by Kyoto University) is not licensed to iPierian. The Sakurada patent (owned by iPierian) is not licensed to Kyoto University. The error has been corrected in the HTML and PDF versions of the article.


careers and recruitment

Portfolio managing for scientists
David Sable

A doctor-turned-portfolio manager finds the ever-changing economic and business environment stimulating.

At 10:44 AM on the Tuesday after Labor Day, seven of the 26 tiny companies currently in the Special Situations Life Sciences Fund are trading up, ten are trading down and eight have not yet traded. The fund has risen by eight basis points (0.08%). The general market indices are trading down for the day and have been flat for the year. I sit in front of six computer screens, four of which are filled with flashing red and green numbers, one showing a research report and a biomedical statistics program and the last filled with e-mail, instant messaging, and Twitter and news feeds. From this mix of data sources I look for situations where I can be right while everyone else is wrong, and triage my investors’ capital to those places where it can do the most good and generate the highest returns. This is not what I envisioned when I graduated from medical school.

There are plenty of white coats hanging in the closets of portfolio managers, analysts, investment bankers and traders. A lot of scientists, doctors and engineers work on Wall Street; the skill sets needed to succeed in any of these fields are very similar. Moreover, given the increasing complexity of the platforms on which many biotech, life sciences and medical technology firms are based, familiarity with the basic vocabulary of molecular biology, epidemiology and biostatistics is essential to making educated investment analyses, recommendations or decisions.

There are several routes from the laboratory to the business world. Most scientists get their starts on Wall Street as ‘sell-side’ analysts, supporting investment bankers who broker sales of companies, stock offerings and merger activity by evaluating the quality of the scientific work underlying potential new client companies, and later publishing written reports on those companies as they progress through their business plans. The scope of the analyses broadens over time from a narrow focus on just the scientific platform of a company to a comprehensive evaluation that also includes critiques of the management, capital structure and other core business considerations.

Experienced analysts are therefore bilingual: conversant in the languages of science and business.

Many analysts later move to the ‘buy-side’, that is, to hedge funds, venture capital or private equity funds and mutual funds, where they help the portfolio manager decide how to invest clients’ money. Some subsequently become portfolio managers themselves.

From bedside to buy-side

I took a somewhat unusual route to being a portfolio manager. After graduating with a degree in economics, I attended medical school, undertook a residency in obstetrics and gynecology and a fellowship in reproductive endocrinology, and practiced medicine for 11 years. During this time I also started and funded a business, performed consulting work for colleagues in venture capital and continued studying finance and accounting. In 2003, I was asked to evaluate the healthcare portfolio of a trading desk at Deutsche Bank, which in turn led to an offer to manage the healthcare portfolio of the Special Situations Funds, a New York hedge fund group. As portfolio manager, I decide which of the almost 3,000 public life science companies to include in our funds, negotiate the terms of structured stock offerings in which we participate, collect industry and company-specific data and formulate our funds’ strategies to respond to general science and healthcare shifts, such as healthcare reform or changes in government funding for stem cell research.

A typical week includes meetings with five or six company management teams, negotiations with bankers over two or three major stock sales and discussions with senior managers of companies in which we have invested, all punctuated by fact checks and on-the-fly opinions with my partners and frequent glances at the flashing red and green numbers showing market activity. I travel a couple of times a month for medical conferences, company site visits or board meetings. I can duplicate my desk by means of the internet from anywhere in the world.

Although there are similarities between the laboratory and the business world, there are fundamental differences as well. Those of us trained in science, having grown accustomed to reproducible outcomes, the scientific method and evidence-based decision making, face an adjustment period: in the business world the standards for data presentation vary with the judgment and intentions of the presenter. The technical and very specialized platforms on which biotech and life science companies are built give senior management teams an advantage when dealing with potential investors, particularly those who lack the science vocabulary to adequately assess the value that these companies may represent. Intentional omissions of relevant data or background material, misleading presentations of data and outright errors in statistical assumptions are not unusual.

Labor Day week has ended. A small biotech company was acquired by a large pharma company, whereas a large biotech company tries to avoid the same fate, or extract a higher price. The National Institutes of Health has resumed stem cell funding. My fund reinvested in newborn health. I spoke at length with the CEO of one of our portfolio companies about funding his clinical trial, then had him walk me through how it was powered. Another company in my fund announced that their pivotal data will be released in a few days, whereas a third pushed their phase 2 proof-of-concept trial back a few months. Many data points to analyze, ideas to generate, decisions to make, outcomes to observe, record and report.

Science after all.

David Sable is at Special Situations Life Sciences Fund, New York, New York, USA.
e-mail: dsable@ssfund.com

COMPETING FINANCIAL INTERESTS
The author declares no competing financial interests.


people

George F. Horner III was named chairman of the board of directors of Luxembourg-based Creabilis. He brings a track record of building successful biotech businesses, most recently as CEO of Prestwick Pharmaceuticals, which was sold to Biovail and Ovation Pharmaceuticals in late 2008. Previously, Horner was the CEO of Vicuron.

Horner said he was “attracted to this opportunity by the company’s rich pipeline of clinical and preclinical drug candidates focused on dermatological indications, its deep understanding of the science behind a number of serious skin disorders, as well as its unique technology platform with broad utility. I am sure that from this strong base we will be able to generate significant value for Creabilis shareholders.”

Agios Pharmaceuticals (Cambridge, MA, USA) has appointed Scott Biller CSO. He joins Agios from Novartis Pharmaceuticals, where he was vice president and head of global discovery chemistry at the Novartis Institutes for BioMedical Research.

Effective on December 31, Dario Carrara will resign from his position as senior vice president and managing director of the pharmaceutical group at Antares Pharma (Ewing, NJ, USA). Carrara will transition to Ferring International Center, which in November 2009 purchased from Antares certain assets and assumed a leased facility in Switzerland along with a majority of the site’s employees.

MethylGene (Montreal) has announced the resignation of Donald F. Corcoran as president, CEO and director. He joined MethylGene in 1997. Charles Grubsztajn, who has been with MethylGene since 2005, most recently serving as vice president, business development, has been appointed president and CEO.

Intarcia Therapeutics (Hayward, CA, USA) has announced the appointment of Kurt Graves as executive chairman of the board. Graves most recently served as executive vice president, head of corporate and strategic development and chief commercial officer at Vertex Pharmaceuticals. Before joining Vertex, he held senior leadership positions at Novartis Pharmaceuticals including US general manager and head of commercial operations and then global head of general medicines and chief marketing officer for the pharmaceuticals division.

Aya Jakobovits was appointed president and CEO of cancer immunotherapy developer Kite Pharma (Los Angeles). She brings over 20 years of experience, previously serving as executive vice president and head of R&D at Agensys, which became an affiliate of Astellas Pharmaceuticals in December 2007. Before joining Agensys in 1999, Jakobovits served as director, discovery research and principal scientist at Abgenix, which was spun out of Cell Genesys in 1996 based on the XenoMouse technology developed under her leadership. Kite Pharma also announced the appointment of Gloria Lee as chief medical officer. Lee comes from Syndax Pharmaceuticals, where she served as vice president of clinical development.

Frank Kelly Jr. has been appointed a director at InVasc Therapeutics (Atlanta). His 40 years of experience include serving as president and CEO of a joint venture between The Coca-Cola Company and Nestlé Refreshments. He is a principal at MFK Global, a marketing and consulting firm.

Steven Lo joined Corcept Therapeutics (Menlo Park, CA, USA) as vice president, commercial operations. Lo has worked 15 years in the pharma and biotech industry, most recently leading the endocrinology marketing and sales organization at Genentech.

Kane Biotech (Winnipeg, Manitoba, Canada) has appointed Philip Renaud to its board of directors. Renaud is managing director of investment advisory firm Church Advisors and was previously a founding partner of Change Capital Partners. He serves on the boards of Yorbeau Resources, Diagnos and Dia Bras Exploration.

Gene Logic (Gaithersburg, MD, USA), an Ocimum Biosolutions company, appointed Norrie J.W. Russell president. Russell has a 30-year history in pharmaceutical R&D holding leadership roles at NovaRx, Invitrogen, Aviva Biosciences and Lynx Therapeutics. He also was formerly global head, biological science and technology at AstraZeneca.

FluoroPharma (Boston) has named Thijs Spoor as CEO and a member of its board of directors. Spoor has 15 years of industry experience, most recently as CFO of Sunstone BioSciences. He previously served in regulatory affairs, new product development and as the global brand head for nuclear cardiology at GE Healthcare.

CytoGenix (New Braunfels, TX, USA) named Cy Stein chairman of the company’s board of directors, replacing Randy Moseley. Stein, a professor of medicine and molecular pharmacology at the Albert Einstein College of Medicine in New York, has been on the CytoGenix board for 7 years and has served as chairman of the scientific advisory board. Additionally, Moseley has stepped down as CFO. His replacement is Steven M. Plumb, former president of Clear Financial Solutions. He also co-founded and served as CFO of Houston Pharma and 3A Pharma.

Marc Tessier-Lavigne, executive vice president for research and CSO at Genentech, will become the next president of Rockefeller University (New York) effective March 1, 2011. His is the first departure from Genentech’s top scientific ranks since its acquisition by Roche in March 2009. Tessier-Lavigne will succeed Paul M. Nurse, who announced his plans in April to become president of the UK’s Royal Society.

George Yu was appointed president and CEO of Sinobiomed (Hong Kong). He most recently served as managing partner of Bay2Peak, a financial advisory and investment management firm. His experience includes small-cap hedge and venture capital funds in emerging markets and investment banking at Lehman Brothers.


EditorialTeetering on the brinkThe US Congress must authorize federal funding of human embryonic stem cell research.© 2010 Nature America, Inc. All rights reserved.With his August 23 preliminary injunction banning federal fundingof research on human embryonic stem cells (hESCs), US DistrictCourt Judge Royce Lamberth has singlehandedly upended one of themost promising fields of biomedical science. On September 9, in responseto an appeal by the Department of Justice, the injunction was temporarilylifted by the US Court of Appeals in Washington, DC. Still to be decidedare the plaintiffs’ appeal of the appellate court’s decision and the originallawsuit, Sherley v. Sebelius. The legal wrangling, which may reach the USSupreme Court, could drag on for some time. In the meantime, a cloudof uncertainty hangs over US hESC research, spreading confusion andanguish among US National Institutes of Health (NIH)-funded scientistsand their collaborators, disrupting careers, damaging the prospects ofcompanies working in the area of regenerative medicine and impedingthe search for new therapies.Lamberth’s ruling rested on his interpretation of the ambiguous Dickey-Wicker amendment, a rider attached annually to the federal appropriationsbill covering the NIH. Introduced in 1996, two years before hESCswere first derived, Dickey-Wicker prohibits the use of federal money for“research in which a human embryo or embryos are destroyed, discarded,or knowingly subjected to risk of injury or death.” Lamberth’s readingof the amendment—that it excludes federal funding of hESC research—contradicts the interpretations of US Presidents Clinton, Bush and Obama;of Congress, which twice passed bills supporting federal funding of hESCresearch and has never acted to ban such funding; and of the NIH, responsiblefor administering grants in accordance with Dickey-Wicker. 
Their view that federally funded hESC research is permissible relied on a 1999 analysis by Harriet Rabb, then the general counsel of the Department of Health and Human Services, who noted that “human pluripotent stem cells […] are not a human embryo within the statutory definition.”

Although some US stem cell scientists and their supporters had worried about Dickey-Wicker for years, most were stunned by the latest turn of events. The field was thriving as never before. The NIH had funded hESC grants since 2002, Obama had removed Bush-era funding restrictions and offered an eloquent defense of hESC research, and support among the public and Congress was continuing to rise. To those familiar with Sherley v. Sebelius, the plaintiffs’ case appeared weak. In hindsight, it is clear that many were lulled into a false sense of security and that Congress miscalculated in failing to codify Obama’s March 2009 Stem Cell Executive Order when the political climate was more favorable. Now, only weeks before an election defined by an anti-incumbent mood that has politicians reluctant to take on controversial issues, the chances for Congressional action seem slim.

The chilling effect from Lamberth’s ruling is likely to include an exodus of US scientists from the stem cell field, the departure of others to continue their studies abroad and loss of US leadership in a field with exceptional therapeutic promise. Although the ruling targets only federally funded scientists, it will surely affect regenerative medicine companies as well, harming efforts to translate basic research on stem cells into therapies.

Research on hESCs has always been whipsawed by politics and inconsistent policy. If the US government abandons the field now, the consequences for tech transfer from federally funded universities and for industry partnerships with NIH-funded academics, as well as questions over the eligibility of hESC-based therapies for reimbursement, will make regenerative medicine even less attractive to investment.
Investors, boards and industry scientists are already skeptical about how close hESCs are to therapies, and in a bad economic environment for biotech generally, a new round of uncertainty may prove to be the field’s undoing. The bright spot in this sorry tale is California, where the California Institute for Regenerative Medicine has provided $351 million for projects that include hESCs (not counting money spent on facilities and training grants). Although this and other smaller state initiatives have come under pressure from severe state budget crunches, such investments now seem prudent given that they have buffered the effects of fickle federal policy.

On scientific grounds, the argument for continued research on hESCs is irrefutable. Different hESC lines behave differently; understanding these differences and developing therapeutic strategies will require comparisons among a large number of lines. Although induced pluripotent stem cells have many advantages and may one day replace hESCs, in the foreseeable future the latter will remain an indispensable cell type for studying pluripotency and differentiation.

During the Bush administration, many US scientists and companies interested in working on hESCs concluded that the obstacles were simply too great. Obama’s executive order placed the presidential imprimatur on this young, controversial science, granting it unprecedented legitimacy and encouraging those who had stayed on the sidelines to proceed. But as the Lamberth decision makes clear, Dickey-Wicker represents a sword of Damocles over the field.

It has often been pointed out that allowing excess in vitro fertilization (IVF) embryos to be discarded while outlawing federally funded research on these embryos is inconsistent. US prohibitions on embryo research reach back as far as 1974, when opponents of Roe v. Wade claimed that the ruling would bring about worst-case scenarios, including indiscriminate embryo experimentation.
For 35 years, federal moratoria on embryo research have made investigation of infertility, early human development, reproductive medicine and pre-natal diagnosis off-limits to most US scientists and clinicians. The US needs laws that will protect free inquiry in these areas, and particularly in the field of hESCs, in accordance with ethical norms and the NIH’s competitive, merit-based formula. On the day he signed his Stem Cell Executive Order, Obama also issued a memorandum requiring the development of “a strategy for restoring scientific integrity to government decision making.” To ensure the scientific integrity of stem cell research, whatever the outcomes of the court cases, the best solution is swift legislative action.


news

in this section
Roche inks stapled peptide deal with Aileron
Pilot biologics plants spring up outside industry
Agronomic researchers free to study Monsanto’s seeds

Geron trial resumes, but standards for stem cell trials remain elusive

As stem cell research findings percolate into the clinic, there is a growing realization that a lack of procedural and regulatory standards is creating huge translational potholes.

“I can tell you based on my experience,” says Tara Clark, general manager of North American clinical operations for Bergisch Gladbach, Germany–based Miltenyi Biotec, “that two investigators from different institutions have submitted a similar stem cell research proposal to the FDA [US Food and Drug Administration]. One came back regulated as a device and the other came back regulated as a biologic.”

For Geron, regulatory confusion has translated into a two-year stop/go/stop/go as it has attempted to embark on the first clinical trials using cells derived from human embryonic stem cells (hESCs) to treat spinal cord injuries (Nat. Biotechnol. 27, 877, 2009). The FDA has now lifted the clinical hold it imposed last August on the biotech after studies revealed that mice used in preclinical work had developed cysts.
After almost a year of delay, the Menlo Park, California–based firm has the go-ahead to start recruiting patients with spinal cord injuries.

Geron is paving the way for human testing of hESC-derived products, but the lack of standards for translational work is seen as so serious and pervasive that a slew of efforts is underway to address the problem. The International Society for Cellular Therapy (ISCT), a researcher/industry organization headquartered in Vancouver, Canada, the International Society for Stem Cell Research (ISSCR) based in Deerfield, Illinois, the European Medicines Agency (EMA) of London, and the California Institute for Regenerative Medicine (CIRM) in San Francisco, all have standardization initiatives in the works.

The reasons underlying the push are multifold. The most immediate is that hESC-derived therapies have now joined adult stem cell therapies under investigation in the clinic. According to a recent estimate, about 68 stem cell-based approaches are currently in clinical development (Stem Cells 6, 517–520, 2010). However, there is no proven approval pathway for either companies or regulators to refer to. “This is the very first time that FDA [has] had to review an IND [investigational new drug] application like this one,” says Anna Krassowska, head of investor and media relations for Geron, which has faced a procession of regulatory red lights and green lights.
“So we and they don’t have a lot of information to go on to say ‘do this, and this and then this,’” Krassowska notes.

Standards are also proving hard to arrive at because, unlike new chemical entities, for example, cellular products are often heterogeneous and difficult to standardize. “Every type of stem cell seems to be different in its behavior and thus in its underlying biology,” remarks Lawrence Goldstein, director of the University of California at San Diego’s Stem Cell Program and an ISSCR spokesperson. Paul de Sousa, senior research fellow, MRC Centre for Regenerative Medicine, University of Edinburgh and chief scientist at Roslin Cells, also in Edinburgh, agrees. “Stem cells are by their nature the epitome of a dynamic entity…they can be one thing one minute and something else another minute,” he says. One consequence of this variability is that the proof of a therapeutic efficacy must be joined to some sort of regulatory process which ensures that the cell type which began the research hasn’t transmogrified into something quite different by the time it is implanted in a person.

Guarantees of the uniformity of the cells are further complicated by the reality that in many cases stem cells are harvested from one patient and then reimplanted in someone else. These cells carry the unique genetic fingerprint of the donor. “Intrinsically the products are different between one patient and another,” says Mohamad Mohty, a professor of hematology at the University of Nantes, France, and chairman of the prospective clinical trials committee of the European Group for Blood and Marrow Transplantation.

Another effect of stem cells’ variability is the lack of a standardized approach to growing them in culture. Jon Rowley, director of cell therapy process development at Lonza, a Basel-based company that specializes in a good manufacturing practice approach to stem cell production, points out that a stem cell line’s phenotypic development can be significantly altered by the medium it is grown in.
[Figure: Geron has clearance to resume its pioneering trial of human ESC-derived oligodendrocyte precursor cell therapy in humans with spinal cord injuries. Credit: Sebastian Kaulitzki/istockphoto]

Unfortunately, the fetal animal serum


NEWS

in brief

China’s $2.4 billion splurge

[Figure: Biopharma projects will receive billions. Credit: c40/ZUMA Press/Newscom]

The Chinese government is pouring an estimated 16 billion yuan ($2.4 billion) to shore up drug development while introducing policies to promote the biotech sector. The new policies (designed to boost seven emerging strategic industries, from sustainable energies to biotech) came under a resolution issued by the State Council, China’s cabinet, on September 8. China’s key new drug R&D scheme was launched in 2009. In its first stage, which will last until 2011, central government will invest nearly 6 billion yuan ($882.5 million) to support more than 900 drug development projects as well as several innovative technology platforms. This is followed by a second stage, running from 2011 to 2015, with an expected 10 billion yuan ($1.47 billion). The biopharma sector is expected to be one of the main beneficiaries of this funding push, although the government’s recent announcement did not provide a breakdown of the investments. Central government plans to couple this financial support with moves to strengthen intellectual property protection, and promote favorable taxation and lending policies. Zailin Yu, chairman and CEO of Tianjin-based protein drug developer SinoBiotech, who is funded by the scheme, says there is no preference for biologics or chemical drugs, as long as the proposals are strong. Mingde Yu, president of China Pharmaceutical Enterprise Management Association, in Beijing, says Chinese firms are unlikely to develop original chemical compounds, and he believes the opportunities lie in developing biotech drugs. But despite this strong governmental support, biopharma researchers complain the money is spread thin among hundreds of projects. The promised funding also arrives late, takes a long time to reach scientists and is too tightly regulated, leaving researchers little flexibility to modify their research plans.
In addition, most contract research organizations (CROs) and large international pharma with facilities in China are not invited to participate in the scheme, despite their expertise manufacturing to international standards. “In China, most of the huge government support goes to academics who lack industrial experience and to state-owned pharmaceuticals because of the gap between the public institutions and privately and foreign-owned industries. This is a big loss to innovative drug development,” says Shoufu Lu, founder and CEO of Shanghai’s Zhangjiang-based start-up Aqbio Pharma. “We CROs charge more, so academics do not accept us. But we are happy to cut our prices in order to be involved in the State-funded projects as long as there is mutual understanding between academics and us,” says the CEO of a leading CRO in Shanghai’s Zhangjiang, who requested anonymity.

Hepeng Jia

mediums used in many university laboratories carry safety risks whereas the push toward animal product–free media during commercial scale-ups can create the phenotypic drift everyone worries about. “And,” says Rowley, “if there is too big a change (in phenotype) you may have to re-run expensive preclinical or even early human clinical trials.”

Clinical trial design is a further challenge. Traditional small molecules and antibodies have a limited life in the body. If you cease administering the drug the body eventually washes it out. But hESCs and other specialized stem cells don’t leave the body; they become part of it in a manner akin to the implantation of a medical device. “In many cases, the introduction of cells into a human patient, at least with current technology, is often an irreversible intervention,” says Goldstein.

Stem cells’ idiosyncratic biology also creates unique intellectual property issues for people looking for ways of standardizing patent claim processes.
“There are patent thickets everywhere,” says Debra Mathews, assistant director for science programs, the Johns Hopkins Berman Institute of Bioethics, and principal investigator in the Hinxton Group Project. Her institute is trying to come up with ways of optimizing stem cell innovation while at the same time ensuring its products reach as many people as quickly as possible (Nat. Biotechnol. 28, 544–546, 2010). “The unique property of [hESCs] makes for a particularly sticky wicket, as a pluripotent stem cell is a gateway technology,” says Mathews. “And patent control over a line of [hESCs] gives the patent holder control over downstream research, such as that which differentiates stem cells into neurons, islet cells, isolate proteins, et cetera,” she says.

And to all of the above must be added what is described as the ‘low hanging fruit’ complication. There are already treatments for most simple conditions, and stem cells are held up as a treatment for the high-hanging and as-yet intractable conditions. Geron’s potential treatment to restore limb movement after a spinal injury is a classic example. Benchmarking effectiveness of a therapy in such a condition is a substantial challenge.

Finally, the plethora of standardization uncertainty can translate into a translational funding paralysis. “It is clear not enough information is available for new investors to make informed decisions,” remarks Robert Deans, senior vice president of Regenerative Medicine of Cleveland-based Athersys, and chair of the ISCT Commercialization Committee.

Unsurprisingly, the multiplicity of issues to be resolved has created a certain caution in those groups seeking to have various parts of the translation process become more standardized. Deans says the ISCT is not at this point programmatic but seeks rather to bring industry and researchers together to arrive at a consensus.
“We want to give regulators, such as the FDA, exposure to certain tests and scientific models and let them hear from a number of academic investigators what the bottom line should be,” he says.

Elona Baum, general counsel for CIRM and the point person in CIRM’s efforts to come up with standards, says that her organization has actively begun to investigate what the standardization priorities should be. Working with the Washington, DC–based lobbying group, the Alliance for Regenerative Medicine, they are looking at what standards and guidelines already exist and are asking key players in the field what should be done and in what order. “We all agree with the need to move ahead, now we are trying to identify what our priorities should be,” she says. Goldstein says ISSCR’s core belief is that “the most important thing is protection of the people who will participate in the trials, or who will potentially purchase marketed therapies.”

With this in mind ISSCR recently created a website to provide information by which patients and physicians can judge the bona fides of stem cell–based cures being promoted on the internet by clinics around the world (Nat. Biotechnol. 28, 885, 2010). In Europe, EMA has been pushing active consultations on various areas of stem cell research and applications that need regularization. It hopes to have a guidance document adopted by November, which should go up on its website soon after.

But with all the push for adopting uniform standards, some in the field fear more regulatory paralysis. University of Nantes’s Mohty points out that a European directive in 2001 that aimed to standardize all clinical trial procedures and thus speed up the approval process actually has had the opposite effect when it comes to stem cells. “It is very difficult, maybe even nearly impossible, to perform clinical trials in the field of hematopoietic stem cell transplantation because this activity cannot be compared with single drugs,” he says.
“Consequently, there has been a big drop in the number of clinical trials performed in Europe after that directive.” Simply put, the regulatory approval bar has been raised too high.

Stephen Strauss Toronto




news

US courts throw ES cell research into disarray

[Figure: Embryonic stem cells are proving hard to standardize. Credit: luismmolina/istockphoto]

Funds for human embryonic stem cell (hESC) research are flowing again following a temporary ban on federal support for such research. A lower court injunction imposed by US District Court Judge Royce Lamberth on August 23 was lifted mid-September by the US Court of Appeals for the District of Columbia. When the injunction was issued, the US National Institutes of Health (NIH) responded by broadly suspending its funding for grants and contracts involving hESC research, including for projects that took shape under the restrictive federal policies of the Bush Administration. For now, that blanket ban is lifted, but the issue is far from resolved. The lawsuit that led to the injunction is still pending. “We are pleased with the Court’s interim ruling, which will allow promising stem cell research to continue while we present further arguments to the Court,” says NIH director Francis Collins. The ongoing legal battle is seen as harmful to hESC research across the world as it will slow progress and stymie collaborations with US researchers.
As Ian Wilmut at the MRC Centre for Regenerative Medicine in the UK puts it: “Any disruption of [hESC] research, such as that imposed by the present injunction, will have a chilling effect on research throughout the world.” According to Elaine Fuchs of Rockefeller University, New York, president of the International Society for Stem Cell Research, “Halting federal funding for such research impedes efforts aimed at ‘translating’ this knowledge into new and improved treatments for patients.”

The lawsuit was brought in part by the Alliance Defense Fund (ADF), a group of “Christian attorneys and like-minded organizations,” based in Scottsdale, Arizona. ADF, acting on the behalf of “doctors opposed to the [Obama] Administration’s [hESC research] policy,” argues that this policy violates the federal Dickey-Wicker Amendment, which prohibits “federal funding of research involving the destruction of human embryos.” The Administration says that its hESC research policy complies with that law because cells from human embryos are donated from private sources and no federal funds are used obtaining them. Congress, with Representative Diana DeGette (D-CO) as a chief sponsor, twice passed legislation that would explicitly permit federal funding for hESC research, but former President Bush vetoed those bills. Although President Obama would surely sign such a bill, moving it through Congress seems unlikely anytime soon.

Jeffrey L Fox Washington, DC

in their words

“This is not a fixer-upper, this is beachfront property.” Genzyme’s CEO Henri Termeer explains why he rejected the unsolicited $69 a share offer from pharma giant Sanofi-aventis (Boston Globe, 1 September 2010).

“I wasn’t looking to move away. In fact, this is probably the only job that could have lured me away from Genentech.” Marc Tessier-Lavigne, who will give up his role of chief scientific officer at Genentech to become president of Rockefeller University.
(The New York Times, 8 September 2010)

“The makers of Viagra would jump at the chance to sponsor the largest pole in North America.” City councillor Howard Moscoe of Toronto, where a 410 foot pole to fly the Canadian flag is being proposed, makes a pitch for corporate support from erectile dysfunction drugmaker Pfizer. (Pharmalot, 27 August 2010)

“What they’ve been doing for years is buying off doctors to sell their products and a doctor’s primary obligation should be to the patient not the pharmaceutical company.” Paul Thacker, an investigator working for Republican Chuck Grassley on the US Senate Finance Committee who recently stepped down to join a non-profit, highlights the Washington perspective on conflicts of interest. (Pharmalot, 23 September 2010)


NEWS

in brief

Drug user fees top $1 million

For the ninth straight year, the US Food and Drug Administration (FDA) is raising the fees companies must pay to have their drugs reviewed. As of October 1, new applications will cost over a million dollars. User fees were instituted in 1992 by the Prescription Drug User Fee Act (PDUFA) to provide funding so that the FDA can conduct timely reviews of drugs. The fees have risen from $100,000 in 1993 to $1,542,000 for a new drug application with clinical trial data. Whether PDUFA has been good for the biotech industry is debatable. Reducing the time to approval (50% reduction since the late 1990s) has meant millions of dollars in revenue, as drugs can be brought to market earlier in their patent lives, according to Mary Olson, at Tulane University in New Orleans. “This expected revenue for most drugs greatly exceeds the user fee even with the proposed increases,” she says. However, Kurt Karst, a lawyer at Hyman, Phelps, and McNamara in Washington, DC, with clients in the biotech industry, says the fees are a concern for smaller companies deciding whether to seek approval for a drug. In a letter to the FDA, the Biotechnology Industry Organization of Washington, DC, pointed out that PDUFA fees now pay a greater share of the budget for drug reviews, almost two-thirds in 2008 up from 42.5% in 2006, and called for transparency on how the fees are used. Laura DeFrancesco

Sugar beets still in the game

Seed producers will be allowed to plant biotech sugar beets again following a September decision from the United States Department of Agriculture’s crop approval arm to allow planting under interim guidelines. The Animal and Plant Health Inspection Service (APHIS) will issue limited permits to seed developers authorizing genetically modified (GM) beet planting this fall as long as the harvested beets are not allowed to flower.
The permits are a legal way around a federal judge’s 13 August decision to ban all commercial farming of Monsanto’s Genuity Roundup Ready sugar beets beyond that date. GM sugar beets planted before the ruling may be harvested, processed and sold without restriction and the beets remain eligible for future commercial approval pending USDA/APHIS’s full environmental review of the beets. A federal judge had revoked APHIS’s beet deregulation and prohibited further planting and sale on the grounds that the agency had not adequately considered the potentially irreparable harm GM beets might cause related species through cross-fertilization (Nat. Biotechnol. 27, 970, 2009). APHIS has announced it will expedite the sugar beets review, which will take about two years. Luther Markwart, of the American Sugarbeet Growers Association and Sugar Industry Biotech Council, Washington, DC, says GM beet farmers, who grow 95% of the US crop, already voluntarily maintain 4-mile isolation from related crops to prevent cross-fertilization. “Most of the interim measures that we’re looking at…are things that we’re already doing,” he says. Lucas Laursen

Roche backs Aileron’s stapled peptides

A company that staples peptides into drugs to target ‘undruggable’ proteins has landed a $1.1 billion deal with Swiss drug maker Roche. The deal signed in August will see Aileron pocket $25 million upfront in technology and access fees and R&D support. More than that, it provides validation from big pharma for Aileron’s stapling platform. Chemically stapled peptides result in helical peptides that reputedly combine high stability with the ability to cross the cell membrane to hit cellular targets. This is the first major industry collaboration for Aileron, although the Cambridge, Massachusetts–based biotech has already received the industry’s collective imprimatur.
Last year, the corporate venture arms of no fewer than four pharmaceutical firms (Roche Venture Fund, Lilly Ventures, Novartis Venture Funds and GlaxoSmithKline-owned SR One) backed the company’s vision of peptide modification with a $40 million investment round. “They’ve looked very hard at this question. In most cases, without naming names, they have tried [to do this themselves]. And I suspect they will continue to try,” says Aileron CEO Joseph Yanchik. Notwithstanding such concerted industry support, converting the promise of stapled peptides into clinically validated drug molecules is going to be a complex and difficult challenge, the scale of which is not lost on its promoters or its investors. “You can imagine we’ve had to run a pretty difficult scientific gauntlet,” says Yanchik. “The length and nature of the due diligence was extraordinary.”

Peptides make more attractive medicines than proteins or nucleic acids. They have evolved in nature to take on highly specific functions, work with great potency and are far smaller than recombinant proteins and antibodies. But they are inherently unstable chains. As soon as a job is done, they are degraded quickly by proteases, a factor that has tended to limit their utility as pharmaceuticals. “The catch-22 is you want the peptide for its biological activity, but you don’t want the peptide for its pharmacological vulnerability,” says Loren Walensky, assistant professor of pediatrics at Harvard Medical School, in Boston, and a member of Aileron’s scientific advisory board.
Exposure of their amide bonds renders peptides susceptible to proteolytic breakdown, and their polarity makes cell penetration difficult. “The major problem in the discovery of peptide-based drugs has been the ability to get robust cell penetration,” says Gregory Verdine, professor of chemistry at Cambridge-based Harvard University and chairman of Aileron’s scientific advisory board. “We’re not the first to stabilize helices.”

[Figure: Atomic structure of a single-turn stapled peptide bound to its target. Stapling locks peptides into stable, biologically active alpha-helices. Credit: Aileron.]

Stapled peptides are locked into an α-helical (and, thus, a biologically active) conformation. To achieve this, hydrocarbon cross-links are added between two non-natural amino acid residues inserted at each end of the target peptide sequence. A ruthenium-catalyzed olefin metathesis reaction generates the hydrocarbon linkages that impart structural stability to the stapled peptide and render it resistant to proteolytic breakdown. The method is general in its scope. “You can apply this to any peptide that is naturally inclined to be helical,” says Walensky.

Stapled peptides have a dual role, serving both as molecular probes for studying biological processes, such as protein-protein interactions, and as drug leads that target those same processes. “This really changes the paradigm, in that we can create bioactive secondary structures and use them in vivo to target a disease and study the biology,” Walensky says. For example, his group has generated a stapled peptide, based on the BH3 domain found in the BCL2 protein family member BID, that can activate apoptosis in human leukemia xenografts (Science 305, 1466–1470, 2004). More recently, they have identified a second function for the pro-apoptotic BCL2 protein BAD, in insulin secretion and beta cell survival. Stapled peptides, based on the BAD BH3 domain, act directly on glucokinase and thereby influence glucose-stimulated insulin secretion (Nat. Med. 14, 144–153, 2008).
Over the summer, the Walensky group also reported on a stapled peptide that was a highly selective inhibitor of MCL1, an anti-apoptotic protein implicated in tumor survival (Nat. Chem. Biol. 6, 595–601, 2010). Similarly, Verdine’s group has used stapled peptides to demonstrate inhibition of the Notch transcription factor complex, a hitherto notoriously difficult target (Nature 462, 182–188, 2009).

Verdine first reported on the stapling technique a decade ago (J. Am. Chem. Soc. 122, 5891–5892, 2000). Ten years on, with the first clinical trials looming, he makes no claims that the technology is the finished article. “Every new drug modality has its own pharmacological challenges,” he says. For that reason, getting large pharma (and its attendant pharmacological expertise) on board at an early stage has been an explicit goal of the company. Verdine likens the situation to Whitehouse Station, New Jersey–based Merck’s acquisition of small-interfering-RNA specialist Sirna Therapeutics, of San Francisco, in 2006. “When Merck acquired Sirna, they took on the forward challenge of learning how to deliver these things,” he says.

The corresponding challenge with stapled peptides is learning how to measure their pharmacodynamic properties. “I don’t think that’s the work of a moment,” says Kevin Johnson, newly appointed life sciences partner at London-based Index Ventures. “You’re getting into small-molecule realms.” And as small molecules, stapled peptides are most likely to be eliminated through the kidneys. “That almost inevitably makes the half-life short,” says Erkki Ruoslahti, distinguished professor at Sanford Burnham Medical Research Institute, in Santa Barbara, California. Ruoslahti, who is developing tumor-penetrating peptides that can increase the efficacy of other drugs, also questions the likely potency of stapled peptides. “One of the things about peptides is they tend to have low affinities,” he says. “That means that a lot of peptide is needed.”

The peptides enter cells by means of what Verdine and co-workers have described as “an active, endocytic, peptide import mechanism.” Serum levels of stapled peptides, therefore, are not a good proxy for intracellular levels, as they are for small-molecule drugs that enter cells by mass action. “The rate of clearance from the cells is very different from the rate of clearance from the blood,” Verdine says. Learning about their routes of metabolism and routes of clearance, and learning how to measure these processes, are all outstanding challenges.

Moreover, a full understanding of the cell-penetrating properties of stapled peptides remains elusive for now, although the issue is the subject of intense scrutiny. “The molecules that have been reported have fantastic properties, and even better, they are perfect tools to help understand how peptide-like molecules gain entry to the cell and [how they] traffic once they do so,” says Alanna Schepartz, who holds chairs in chemistry and in molecular, cellular and developmental biology at Yale University, in New Haven (Table 1). “The biggest questions have to do with how the molecules get out of endosomes rather than how they get in.”

Table 1 Selected therapeutic peptides either registered or in late-stage development.
Company | Product description | Clinical stage
Scios (Fremont, California) | Natrecor (nesiritide; JNS-004, a recombinant B-type brain natriuretic peptide) | FDA approved for acute decompensated congestive heart failure
Novo Nordisk (Bagsvaerd, Denmark) | Victoza (liraglutide, a once-daily subcutaneous modified (Arg34, Lys26-[N-epsilon(gamma-Glu[N-alpha-hexadecanoyl])]) glucagon-like peptide 1 [7-37] analog) | FDA approved for type 2 diabetes
Roche/Trimeris (Durham, North Carolina) | Fuzeon (enfuvirtide, a 36-amino-acid peptide derived from the C-terminal (heptad repeat sequence 2; HR2) domain of human immunodeficiency virus (HIV) gp41) | FDA approved for HIV infection
Amylin (San Diego)/Lilly (Indianapolis) | Byetta (exendin-4, a 39-amino-acid peptide exhibiting 52% structural identity to human GLP-1) | FDA approved for type 2 diabetes and polycystic ovary syndrome
NPS Allelix (Mississauga, Ontario, Canada) | Gattex (teduglutide; ALX-0600, a subcutaneous injectable GLP-2 analog, containing an Ala→Gly substitution at position 2) | Phase 3 for gastrointestinal diseases, including short bowel syndrome, enterocolitis and pediatric disorders
HealOr (Rehovot, Israel) | A topical formulation of peptide pseudosubstrates that bind protein kinase C isoforms | Phase 3 for decubitus ulcers, varicose ulcers and diabetic foot ulcers
Access Pharmaceuticals (Dallas) | Cytolex (pexiganan acetate, a cream formulation of a 22-amino-acid topical peptide based on a protein discovered in frog skin, which disrupts the integrity of bacterial cell membranes) | Phase 3 for the treatment of bacterial infections associated with diabetic foot ulcers
Zensun Sci&Tech (Shanghai, China) | Neucardin (an injectable recombinant peptide fragment of the beta 2a isoform of human neuregulin-1) | Phase 3 for the treatment of chronic heart failure
FDA, US Food and Drug Administration.

The immediate attraction of peptides, particularly those of human origin, is that they generally have a benign toxicity profile. Aileron’s biggest claim in support of stapled peptides is that their cell-penetrating abilities will open up a target universe that was previously off limits to drug developers. Between them, monoclonal antibodies, which can recognize only extracellular binding sites or secreted ligands, and small molecules, which can bind only hydrophobic pockets found in a small fraction of proteins, hit little more than 10% of all possible targets, says Verdine. “That’s the operating theatre of the entire biotechnology industry,” he argues.

Verdine does not claim that Aileron alone will be able to access this newly emerging landscape. “I think in the next ten years we’re going to see a real efflorescence of new drug modalities,” he says.
Schepartz identifies β-peptides, peptoids and miniature proteins (which were invented in her laboratory) among alternative modification technologies that have promise. Cambridge, UK–based Bicycle Therapeutics recently raised seed funding from two of Aileron’s investors (Novartis Venture Funds in Basel and SR One in Conshohocken, Pennsylvania) to develop further its chemically constrained cyclic peptides, for which it claims high target specificity and binding affinity, as well as resistance to proteolytic breakdown. Its cofounders include antibody pioneer Greg Winter.

Despite the large headline value of the Roche deal, the field of peptide modification remains distinctly early stage. Basel-based Roche committed only $25 million in guaranteed funding. A successful outcome to Aileron’s first clinical trial next year (in an as-yet unidentified oncology indication) would therefore provide important momentum to the alliance. “It’s very important for us as a company to get this first clinical trial right. It’s not just the drug being judged, it’s the platform,” says Yanchik. In the meantime, he says, the deal has probably already enlivened the wider field. “I’d be willing to place a bet with the announcement we’ve just made, we’ve got another few companies funded.”
Cormac Sheridan, Dublin


in brief

Life swallows Ion Torrent

Instruments provider Life Technologies has acquired sequencing firm Ion Torrent of Guilford, Connecticut and S. San Francisco in a deal worth $725 million, a price tag that has left some industry observers reeling. In August, the Carlsbad, California–based Life paid $375 million upfront, with potential for an additional $350 million in milestones. The prize is Ion Torrent’s Personal Genome Machine, a system that uses semiconductors rather than optics for sequencing DNA. According to Life, the first-generation system, due in Q4 2010, will cost $50,000, and its potential scalability suggests it could tackle entire genomes relatively soon. This machine cannot readily compete with the multi-gigabase output of San Diego-based Illumina’s HiSeq2000 or Life’s SOLiD 4, and Ion Torrent founder and CEO Jonathan Rothberg stressed at a recent meeting that it is not intended to do so. “In the near term, there could be some virology and pathogen applications, and longer term there could be some clinical diagnostic applications,” says Doug Schenkel, managing director and senior research analyst at Cowen & Company, New York. However, Life’s investment considerably exceeds their target market (estimated at $200 million), suggesting a focus on long-term opportunities. Success is contingent upon both expansion of the sequencing market and the impact of other powerful contenders: newcomers Pacific Biosciences and Complete Genomics have recently filed initial public offerings, and market leader Illumina is unlikely to rest on its laurels.
Michael Eisenstein

Anti-anemics price hike

New payment rules for dialysis services could further erode the use of erythropoietin-stimulating agents (ESAs), already under scrutiny for potential safety risks. The US Centers for Medicare & Medicaid Services are changing how Medicare pays for end-stage renal disease services.
From 1 January 2011, payment will bundle equipment and drugs into a single base rate, which will be increased from $198 to $229.60. This single rate will include injectable ESAs, prescribed to stimulate red blood cell production, which are currently reimbursed separately. “The move could affect prescribing patterns for ESAs and may discourage healthcare providers from using large doses of erythropoietin for patients as it could lead to financial loss,” says Aparna Krishnan, senior research analyst at IHS Global Insight in Lexington, Massachusetts. Makers of all versions of epoetin alpha are likely to be affected. The US Food and Drug Administration already requires a risk evaluation and mitigation strategy for ESAs, following studies linking an increase in tumor growth or risk of cardiovascular events to the drugs (Nat. Biotechnol. 28, 303, 2010). With the new rules, “Companies that manufacture ESAs will be forced to reduce drug prices or risk loss [of] market share,” says Swetha Shantikumar, research associate at Frost & Sullivan, Chennai, India.
Emma Dorey

Genzyme resumes shipping as Sanofi-aventis hovers

[Photo: Henri A. Termeer, Genzyme’s Chairman, President and Chief Executive Officer, has been fending off Sanofi-aventis’ overtures while dealing with manufacturing problems.]

Genzyme is moving towards resolving the manufacturing issues that have curtailed supplies of its biologics to treat Gaucher’s disease and Fabry’s disease for over a year. In late August, in the midst of reacting to a hostile takeover bid from French drug maker Sanofi-aventis, the biotech sent patient communities separate letters detailing the company’s near-term plans for supply of the drugs. In September, people with Gaucher’s disease would receive two full doses of Cerezyme (imiglucerase; recombinant human (rh) β-glucocerebrosidase), the same as before the company had to cut back supplies after discovery of vesivirus 2117 contamination at its Allston, Massachusetts, manufacturing facility (Nat. Biotechnol. 27, 681, 2009).
Individuals treated for Fabry’s disease would receive one full dose of Fabrazyme (agalsidase β; rh α-galactosidase A) in September and another this month, which is double what the Cambridge, Massachusetts–based firm had been supplying, but still below full dosage.

But the company now expects the remediation work at the Allston plant to take four years. This is up from the two to three years it had estimated earlier this year, when it signed a draft consent decree with the US Food & Drug Administration that detailed the process for completing that work and the penalties for missing deadlines (Nat. Biotechnol. 28, 388, 2010). As part of that process, Genzyme is required to complete an initial inspection of the facility later this year.

The good news is that in the end, Genzyme should have a more efficient production process. By introducing a new working cell bank for Fabrazyme, for example, Genzyme has already increased productivity 30%, and hopes to go 30% higher than that. By controlling the process parameters around cell density, “we think we’ll be able to get the additional productivity,” said Scott Canute, newly hired president, global manufacturing and corporate operations, on the conference call.

“Every company emerges from a consent decree in much better shape,” says William Tanner, biotech analyst with Lazard Capital Markets in New York. “Operating under a consent decree, things are going to be tighter, protocols more tightly adhered to. It stands to reason your production costs should go down.” What’s more, the lost revenue from discarded batches of a high-value biologic “far eclipses the cost of having some people on the ground to assure that they are in compliance with the consent decree,” he says.

That said, with competitors aiming at the Gaucher’s and Fabry’s markets, the timing of these problems couldn’t have been worse for Genzyme. Basingstoke, UK–based Shire obtained EU approval for its Vpriv (velaglucerase alfa) Gaucher’s therapy, on the heels of a US approval in March 2010.
It also sells Replagal (agalsidase alfa) for Fabry’s in the EU and other countries (it is under review in the US). And Protalix, in Carmiel, Israel, is partnering with Pfizer, in New York, to commercialize plant-derived glucocerebrosidase (taliglucerase alfa); it is also in early-stage development of a plant-derived enzyme drug to treat Fabry’s (Nat. Biotechnol. 28, 107–108, 2010). “It’s irreparable damage,” says Tanner. His initial projections for Vpriv, for example, were for 10–15% of the market but now, based on physician feedback, they’re at 30–40%.

These issues haven’t stopped Sanofi-aventis, however, from pursuing a takeover of Genzyme. After months of discussions, on August 29, the Paris-based pharma made a formal offer at $69 per share, or $18.5 billion, which Genzyme promptly rejected. However, Tanner estimates that Genzyme lost around $1–1.3 billion in value because of its manufacturing stumbles. “If they were better able to hang onto the Gaucher[’s] and Fabry[’s] franchises,” he says, “fair value would be $4–5 per share higher.” In mid-September, Genzyme sold its Genetic Testing Unit to LabCorp of America Holdings, located in Burlington, North Carolina, for $925 million and, in a cost-cutting exercise, the biotech will implement over 1,000 job cuts.
Mark Ratner, Cambridge, Massachusetts

Credit: Sipra Das/The India Today Group/Getty Images


Cancer research fund launches biologics pilot plant

Cancer Research UK, the country’s largest cancer research funding charity, has opened a £18 ($28) million facility at Clare Hall, Hertfordshire, to serve as a pilot plant for investigational biologics. The charity’s small-scale Biotherapeutics Development Unit (BDU), launched on July 30, will produce small batches of clinical grade material ready for testing, in what could become an attractive new model for industry-academia collaborations.

The BDU is not the first such initiative to take shape: a slew of small-scale manufacturing initiatives at academic institutions and research organizations signals a growing awareness that in-house drug production could avoid the delays that often stall a promising agent at the very early stages. Such pilot manufacturing plants have been adopted by the Mayo Clinic, in Rochester, Minnesota, as well as by the National Cancer Institute (NCI) and the National Institute of Allergy and Infectious Diseases, each of which owns manufacturing facilities in Frederick, Maryland. Other academic institutions with their own manufacturing units include the UK’s University of Oxford and University of Bristol, and the Baylor College of Medicine in Houston.

At the NCI, the Biopharmaceutical Development Program (BDP), the US counterpart to the BDU, plays a dual role.
On the one hand, it supplies the host research institution with timely drugs for medical trials, and on the other, it takes on commercially unattractive projects, whose products may not be picked up for manufacture by big pharma. “Typical projects undertaken by BDP would be higher risk for a commercial entity: rare diseases, pediatric indications, small or uncertain markets, or concepts with significant technical or regulatory challenges or need for proof-of-principle of a first-in-class approach,” explains Joseph Tomaszewski, deputy director, Division of Cancer Treatment and Diagnosis of the NCI in Bethesda, Maryland.

Before the BDP was in place, the manufacture of small, trial-scale batches of new therapies had to be outsourced to big pharma, which was time consuming and expensive. “The little tiny jobs like we do might get bounced out of the queue if [the company] has a commercial job coming up,” says Stephen Creekmore, chief of the Biological Resources Branch of the NCI, which houses the BDP. At Cancer Research UK, BDU head Heike Lentfer agrees. “Having our own BDU will prove to be more efficient and cost effective than outsourcing clinical trial stage drugs to contract manufacturers,” she says. “The new facility allows us to be more flexible in scheduling new projects and to develop in-house expertise in the production of biologics.”

Such facilities can also take on novel therapies that have yet to establish a commercial track record or show significant results in early clinical trials. “There are a lot of projects developed at research labs that need a lot more work, and you really need some sort of lab at the development and early scale-up stage,” notes Creekmore. “That’s very hard to outsource without a lot of money.”

[Photo: Researcher working on monoclonal antibody Chi Lob 4/7, Cancer Research UK’s first attempt to produce small batches of investigational drugs for clinical trials. Credit: Cancer Research UK.]

The first project of the UK-based BDU will be the production of Chi Lob 4/7, an anti-CD40 monoclonal antibody, which will enter phase 1 testing for large B-cell non-Hodgkin’s lymphoma. Plans are also underway to work on five other projects over the coming year. The unit, operated by a 15-member team, is certified to meet current good manufacturing practice and contains two suites to separately work on mammalian and microbial production. “[Projects] come to us when they’re ready for process development and optimization and scale-up,” says Nigel Blackburn, director of Cancer Research UK’s Drug Development Office. As the experimental drugs move through the different clinical trial phases, the BDUs develop large-scale manufacturing protocols gearing up for the agents’ eventual commercial manufacture at external facilities, after regulatory approval.

If phase 1 and phase 2 trials are successful, the units may at that point liaise with commercial entities to take the agent into phase 3 trials and beyond, often through cooperative R&D agreements. “After a novel idea is shown to ‘work’ it usually is not hard to identify commercial interest,” says NCI’s Tomaszewski. NCI collaborated with researchers at the University of California, San Diego, on the monoclonal antibody Erbitux (cetuximab) now marketed by Eli Lilly of Indianapolis. “NCI contractors manufactured purified mouse antibodies that were used by the academic investigators in extensive preclinical experiments, followed by small clinical trials to show that such antibodies could target tumors well enough to image the patient’s cancers,” writes Tomaszewski.
NCI then made a chimeric antibody.

Another small-scale manufacturing facility with an impressive track record is the pilot bioproduction facility at the government-funded Walter Reed Army Institute of Research (WRAIR), in Silver Spring, Maryland. The Walter Reed pilot facility was set up around 1958, and has nursed several projects through phase 1 and 2 clinical trials, partnering with pharma companies to carry out later-stage human testing.

One recent project was the Reed pilot facility’s collaboration with London-based GlaxoSmithKline (GSK) on a dengue virus vaccine now in phase 2 trials. The pilot bioproduction facility at the Walter Reed Army Institute of Research has played a significant role in the GSK-WRAIR dengue vaccine program, notes Katie Moore, director of media, vaccines global public health at GSK. The manufacturing facility is attractive for GSK, Moore says, because of “its experienced personnel and because of its close interactions with the other departments at the institute.” What’s more, the small manufacturing plant has produced the clinical grade material needed to move projects from preclinical research into phase 1/2 clinical trials, she adds.

Another pilot bioproduction facility success is the Japanese encephalitis virus (JEV) purified inactivated vaccine, manufactured and distributed as Ixiaro (inactivated JEV strain SA14-14-2 with aluminum hydroxide adjuvant). Ixiaro received US Food and Drug Administration approval last year, and is now distributed and manufactured in the US by Novartis of Basel under license from Intercell of Vienna.

Ken Eckels, who leads the research team at the Walter Reed pilot facility, has no doubt that biomanufacturing units springing up in publicly funded organizations provide a valuable service. The key, he says, is keeping up with regulatory protocols such as current good manufacturing practice and ensuring that the appropriate quality control and quality assurance checks are in place.
Nidhi Subbaraman, Boston

in brief

Wellcome partners with India

A £45 ($70) million fifty-fifty partnership between the UK’s Wellcome Trust and India’s Department of Biotechnology (DBT) to support development of “affordable healthcare products” is just the kind of boost small Indian biotech companies hankered after. The initiative announced 29 July builds on the existing £80 ($124) million alliance launched in 2008 to strengthen the biomedical research base in India (Nat. Biotechnol. 26, 1202, 2008). The added impetus is for translating research into medical products “that are not totally market driven but are required by people at [an] affordable price,” says DBT secretary Maharaj Kishan Bhan. Venture capitalists usually shy away from backing products that do not have a big market, he says, and the new partnership plugs this gap. Chandrasekhar Nair, director of Bigtec, a Bangalore-based startup that has developed a diagnostic handheld microarray, is investigating biomarker detection for early identification of chronic diseases. Nair says that under the Wellcome-DBT alliance his company may consider sourcing microfluidics capabilities from UK universities to fast-track the device’s development. Banda Ravi Kumar of XCyton Diagnostics, Bangalore, says a government loan enabled the initial development of their diagnostic DNA Macro Chips device. “Thanks to the new initiative, we are looking actively to develop another such platform for oncology with Oxford Biodynamics that has an epigenetics-based technology,” he says.
Killugudi Jayaraman

Hungary eyes biotech jobs

The Hungarian Ministry for National Economy has unveiled a $4.5 billion scheme aimed at creating one million jobs within ten years. The New Széchenyi Development Plan will bolster small and medium enterprises (SMEs) across all industries, including biotech. The launch of a series of consultations, slated for September 2011, will provide SMEs with resources from local government and EU funds by 2013. The key points include developing healthcare and ‘green’ industries, improving science and innovation, promoting business growth, and investing in housing, employment and transport. “What we see is promising, but the plan is only one piece of the policy. We need to see how it will work all together,” notes Ernö Duda, CEO of SOLVO, headquartered in Budapest, and founder and president of the Hungarian Biotechnology Association. “It is still too early to say how much of the funding will go into the biotechnology industry, but we hope that the government will recognize that while biotechnology is a small sector, it is growing: even while Hungary was in recession, the biotechnology sector grew by around 50% a year,” says Duda. The Hungarian Biotechnology Association, which was founded only seven years ago and already has over 100 members, has compiled a strategic report on the biotech industry for the government. “We see the Széchenyi plan as being in line with our strategy, and we feel that this will give the industry a boost,” says Duda.
Suzanne Elvidge

Monsanto relaxes restrictions on sharing seeds for research

[Photo: Agronomic research scientists are now free to study Monsanto’s commercial seeds. Credit: News.com]

Public sector scientists who complained last year that seed companies were curbing their rights to study commercial biotech crops are negotiating research agreements with industry. In August, the Agricultural Research Service (ARS), an agency within the US Department of Agriculture in Washington, DC, finalized an umbrella license with St. Louis–based Monsanto that gives ARS scientists the freedom to study Monsanto’s commercial seeds without asking the company for permission on each project. “[The agreement] is extremely good and specific. ARS will be allowed to do basically everything that could be desired,” says one ARS scientist who asked to remain anonymous.

ARS scientists were part of a group of 26 researchers who lodged an anonymous public complaint in February 2009 that charged that seed companies were thwarting public sector research. They said a legal contract called a “stewardship agreement” forbade research from being conducted on the companies’ crops and seeds, no matter how they were obtained. The scientists said they felt forced to seek permission from the seed companies before conducting studies, even on crops that had been on the market for years (Nat. Biotechnol. 27, 880–882, 2009). “No truly independent research can be legally conducted on many critical questions involving these crops” because of company-imposed restrictions, the scientists wrote in their public comment.

In response to the complaint and the press reports that followed, seed companies reexamined their research agreements with the public sector.
Indianapolis-based Dow AgroSciences, Basel-based Syngenta and Johnston, Iowa–based Pioneer Hi-Bred have all begun discussions with ARS over new umbrella agreements, according to the companies. These industry players, along with Monsanto, have also been working with universities on similar licenses.

The Monsanto-ARS agreement obtained by Nature Biotechnology allows ARS scientists to conduct agronomic research, meaning studies on how crops interact with local environments and which varieties perform best. Studies outside of agronomic research, such as breeding, reverse engineering or characterizing the genetic composition of the crop, require separate contracts with the company. The agreement is nearly identical in scope to Monsanto’s licenses with universities, but is more specific. An appendix included in ARS’s license lists more than 25 examples of the specific types of studies that are considered “agronomic” and therefore permissible, a definition that has been unclear to public sector scientists in the past. “It allows us to do our research under a blanket agreement instead of negotiating everything [with Monsanto] every time,” says Larry Chandler, an area director at ARS who facilitated the negotiations. “This is much more efficient for all parties.”
Emily Waltz, Nashville, Tennessee


NEWS© 2010 Nature America, Inc. All rights reserved.in briefWellcome partners with IndiaA £45 ($70) million fifty-fifty partnershipbetween the UK’s Wellcome Trust and India’sDepartment of Biotechnology (DBT) to supportdevelopment of “affordable healthcare products”is just the kind of boost small Indian biotechcompanies hankered after. The initiativeannounced 29 July builds on the existing £80($124) million alliance launched in 2008 tostrengthen the biomedical research base inIndia (Nat. Biotechnol. 26, 1202, 2008). Theadded impetus is for translating research intomedical products “that are not totally marketdriven but are required by people at [an]affordable price,” says DBT secretary MaharajKishan Bhan. Venture capitalists usually shyaway from backing products that do not have abig market, he says, and the new partnershipplugs this gap. Chandrasekhar Nair, director ofBigtec, a Bangalore-based startup, which hasdeveloped a diagnostic handheld microarrayis investigating biomarker detection for earlyidentification of chronic diseases. Nair says thatunder the Wellcome-DBT alliance his companymay consider sourcing microfluidics capabilitiesfrom UK universities to fast-track the device’sdevelopment. Banda Ravi Kumar of XCytonDiagnostics, Bangalore, says a governmentloan enabled the initial development of theirdiagnostic DNA Macro Chips device. “Thanksto the new initiative, we are looking actively todevelop another such platform for oncology withOxford Biodynamics that has an epigeneticsbasedtechnology,” he says.Killugudi JayaramanHungary eyes biotech jobsThe Hungarian Ministry for National Economyhas unveiled a $4.5 billion scheme aimed atcreating one million jobs within ten years. TheNew Széchenyi Development Plan will bolstersmall and medium enterprises (SMEs) acrossall industries, including biotech. 
The launch of a series of consultations, slated for September 2011, will provide SMEs with resources from local government and EU funds by 2013. The key points include developing healthcare and ‘green’ industries, improving science and innovation, promoting business growth, and investing in housing, employment and transport. “What we see is promising, but the plan is only one piece of the policy. We need to see how it will work all together,” notes Ernö Duda, CEO of SOLVO, headquartered in Budapest, and founder and president of the Hungarian Biotechnology Association. “It is still too early to say how much of the funding will go into the biotechnology industry, but we hope that the government will recognize that while biotechnology is a small sector, it is growing: even while Hungary was in recession, the biotechnology sector grew by around 50% a year,” says Duda. The Hungarian Biotechnology Association, which was founded only seven years ago and already has over 100 members, has compiled a strategic report on the biotech industry for the government. “We see the Széchenyi plan as being in line with our strategy, and we feel that this will give the industry a boost,” says Duda.
Suzanne Elvidge

ments at the institute.” What’s more, the small manufacturing plant has produced the clinical grade material needed to move projects from preclinical research into phase 1/2 clinical trials, she adds.

Another pilot bioproduction facility success is the Japanese encephalitis virus (JEV) purified inactivated vaccine, manufactured and distributed as Ixiaro (inactivated JEV strain SA14-14-2 with aluminum hydroxide adjuvant). Ixiaro received US Food and Drug Administration approval last year, and is now distributed and manufactured in the US by Novartis of Basel under license from Intercell of Vienna.

Ken Eckels, who leads the research team at the Walter Reed pilot facility, has no doubt that biomanufacturing units springing up in publicly funded organizations provide a valuable service.
The key, he says, is keeping up with regulatory protocols such as current good manufacturing practice and ensuring that the appropriate quality control and quality assurance checks are in place.
Nidhi Subbaraman Boston

Monsanto relaxes restrictions on sharing seeds for research

Public sector scientists who complained last year that seed companies were curbing their rights to study commercial biotech crops are negotiating research agreements with industry. In August, the Agricultural Research Service (ARS), an agency within the US Department of Agriculture in Washington, DC, finalized an umbrella license with St. Louis–based Monsanto that gives ARS scientists the freedom to study Monsanto’s commercial seeds without asking the company for permission on each project. “[The agreement] is extremely good and specific. ARS will be allowed to do basically everything that could be desired,” says one ARS scientist who asked to remain anonymous.

[Photo caption: Agronomic research scientists are now free to study Monsanto’s commercial seeds. Credit: News.com]

ARS scientists were part of a group of 26 researchers who lodged an anonymous public complaint in February 2009 that charged that seed companies were thwarting public sector research. They said a legal contract called a “stewardship agreement” forbade research from being conducted on the companies’ crops and seeds, no matter how they were obtained. The scientists said they felt forced to seek permission from the seed companies before conducting studies, even on crops that had been on the market for years (Nat. Biotechnol. 27, 880–882, 2009). “No truly independent research can be legally conducted on many critical questions involving these crops” because of company-imposed restrictions, the scientists wrote in their public comment.

In response to the complaint and the press reports that followed, seed companies reexamined their research agreements with the public sector.
Indianapolis-based Dow AgroSciences, Basel-based Syngenta and Johnston, Iowa–based Pioneer Hi-Bred have all begun discussions with ARS over new umbrella agreements, according to the companies. These industry players, along with Monsanto, have also been working with universities on similar licenses.

The Monsanto-ARS agreement obtained by Nature Biotechnology allows ARS scientists to conduct agronomic research, meaning studies on how crops interact with local environments and which varieties perform best. Studies outside of agronomic research, such as breeding, reverse engineering or characterizing the genetic composition of the crop, require separate contracts with the company. The agreement is nearly identical in scope to Monsanto’s licenses with universities, but is more specific. An appendix included in ARS’s license lists more than 25 examples of the specific types of studies that are considered “agronomic” and therefore permissible, a definition that has been unclear to public sector scientists in the past. “It allows us to do our research under a blanket agreement instead of negotiating everything [with Monsanto] every time,” says Larry Chandler, an area director at ARS who facilitated the negotiations. “This is much more efficient for all parties.”
Emily Waltz Nashville, Tennessee


news maker

Constellation Pharmaceuticals

Replete with investor funds, the Cambridge, Massachusetts–based epigenetics firm is taking aim at methylases and demethylases linked to disease.

Constellation’s $22 million series B financing this summer again drew pundits’ attention toward a field of research that has been tilled with particular vigor over the past year or so. The basic science underpinning epigenetics (manipulating gene expression without altering the sequence itself) is widely regarded as conceptually sound. Four epigenetic drugs have thus far received US Food and Drug Administration (FDA) approval: two that take aim at (among other things) histone deacetylases (HDACs) and two that target DNA methyltransferases (DNMTs), which control chemical tags on histones or cytosines in the gene sequence, respectively. For Constellation, the question is whether its therapeutic focus, histone methylases and demethylases, which modify the proteins that package and order DNA, will prove as successful.

Constellation started out, fueled by $32 million in series A funding, in April 2008. It spent the first year and a half hiring key personnel, building infrastructure and optimizing assays. Part of that initial funding came from Third Rock Ventures, and a partner at the fund, Mark Levin, the former CEO of Cambridge, Massachusetts–based Millennium Pharmaceuticals, served as interim CEO of Constellation. One industry insider says Levin’s reputation at Millennium enabled him to sell the figurative sizzle to investors before there was any clinical ‘steak’. Constellation’s work remains only at the preclinical stage.

Now with 50 employees (up from about 30 this time last year), the company aims to develop drugs targeting a broad range of histone methylation enzymes. So far, Constellation scientists have published evidence that the histone lysine methyltransferase G9a/KMT1C regulates chromatin structure by promoting the methylation of the histone H1.4K26 in vivo in mammals (J. Biol. Chem. 284, 8395–8405, 2009). Constellation claims to have nailed down programs that identify enzymes with specific linkages to disease. The company will not reach the clinic by next year, but could by 2012.

Mark Goldsmith, Constellation’s president and CEO, says the firm’s research is bolstered by enhanced understanding of the role of histone methylation in modulating chromatin through action by enzymes and proteins that act as ‘writers’, ‘readers’ and ‘erasers’ to activate or deactivate genes. ‘Writers’ add chemical groups, ‘readers’ bear binding regions that recognize changes and ‘erasers’ remove the marks. Now that a survey of human methylomes (the map of human methylation patterns) has been published (Nature 462, 315–322, 2009), Goldsmith says the linkage between DNA methylation and biological consequences can be brought into sharper focus.

Skepticism concerning the safety and efficacy of pharmacological interventions in the DNA and chromatin remodeling machinery has receded with the approval of several drugs in hematological cancers: HDAC inhibitors Istodax (romidepsin) and Zolinza (vorinostat), and DNMT inhibitors Vidaza (azacitidine) and Dacogen (decitabine). Indeed, big pharma is investing heavily in the area, with such deals as the $200 million agreement in March between Cambridge, UK–based CellCentric and Takeda Pharmaceutical, of Tokyo. The same month, London-based GlaxoSmithKline inked a $644 million epigenetics pact with Cellzome, of Cambridge, UK. Both deals grew out of existing relationships with nonepigenetic concerns, and say little about whether Constellation can prove itself to suitors as well, but a bubbling epigenetics pot has led would-be partners to discuss potential arrangements, according to Goldsmith.

Meanwhile, as the company holds fast to three programs of special focus, Constellation is casting a wide net to consider target classes beyond those validated so far.
Goldsmith wants to leave no would-be opportunities on the table, he says, but the firm’s determination to mine varied classes of enzymes for their possibilities could become a rate-limiting factor. Indeed, the nascent biology surrounding many of these targets could make a slowdown inevitable, in the view of Jean-Pierre Issa, co-director of cancer epigenetics at M.D. Anderson Cancer Center, in Houston. Despite acknowledging the considerable scientific expertise at Constellation, he suspects the broad approach will mean only plodding progress. Issa likens the needle-in-the-haystack approach to that taken by companies that first began investigating tyrosine kinases, and predicts the road for epigenetics could be similarly fraught with failure. As an example of one target that almost every company is pursuing, Issa points to histone-lysine N-methyltransferase EZH2, an enzyme that in humans is encoded by the EZH2 gene. This histone-modifying enzyme belongs to the polycomb group family, and three papers published in the past year in Nature Genetics (42, 181–185; 665–667; 722–726, 2010) have suggested that EZH2 could act as a tumor suppressor. Constellation would not confirm any work on EZH2, but says the target is interesting.

[Photo caption: Left to right: Constellation’s CEO Mark Goldsmith, and founders Danny Reinberg, Professor of Biochemistry at NYU School of Medicine, and Yang Shi, Professor of Pathology at Harvard Medical School]

Stuart Hwang, director of business development at SuperGen of Dublin, California, says Constellation’s plan to use approaches other than the more popular HDAC and DNMT inhibitors is logical because drugs targeting HDACs do not seem to work against solid tumors, and oral versions bring toxicity, whereas DNMT blockers display only a short half-life, which makes them unsuitable for solid tumors as well.
But histone methyltransferases outside of the two main classes come in many flavors, and it’s an open question whether their pharmacological inhibition will prove successful. Hwang doesn’t think so, mainly because of the problem that has beset EZH2 work: turning on a gene or genes can mean shutting down an equal number of them. Biological benefits are starting to emerge, Hwang says, but outside the two known categories of epigenetic drugs, clinical proof of efficacy could yet lie far off.
Randy Osborne Atlanta, Georgia

[Photo credit: Seacia Pavao]


data page

Drug pipeline: Q310

Wayne Peng

The number of small-molecule approvals declined more sharply than that of biologics over the past decade. However, new targets, such as atrium-specific K+ channel, phosphodiesterase-4 and renal Na+-glucose co-transporter, continue to open up new opportunities. Such novel targets are not without risk, as Eli Lilly found this quarter when its gamma-secretase inhibitor failed to meet its endpoints in Alzheimer’s. Meanwhile, MannKind’s inhaled insulin, Afrezza, demonstrated both efficacy and safety in a key trial. Approvals are also expected for Benlysta (belimumab), ipilimumab and Bydureon (exenatide LAR).

[Figure: FDA approvals by drug molecule type. Fewer small molecules are being approved than before. Vertical axis: number of FDA approvals. Horizontal axis: year, 1995 through 1/1–9/3 2010. Categories: small molecule, peptide, protein, steroid, polyclonal antibody, antisense nucleic acid, carbohydrate, monoclonal antibody, cells/bacteria/viruses. Source: US Food and Drug Administration and BioMedTracker, a service of Sagient Research (http://biomedtracker.com/). Includes vaccines approved by the FDA.]

Notable regulatory approvals (June–September 2010)

Company/drug name | Indication | Approvals | Drug description
Genentech-Roche/Lucentis (ranibizumab) | Retinal venous occlusion | FDA, 6/22/10 (sBLA) | Humanized anti-VEGF monoclonal antibody Fab fragment
Forest Lab/Daxas (roflumilast) | Chronic obstructive pulmonary disease | EMA, 7/6/10 | The first selective phosphodiesterase-4 inhibitor
Shire/Vpriv (velaglucerase alfa) | Gaucher’s disease | EMA, 8/26/10; FDA, 2/26/10 | Gene-activated human glucocerebrosidase
Cardiome Pharma/Kynapid (vernakalant) | Atrial fibrillation | EMA, 9/1/10 | Small-molecule blocker for atrium-specific potassium channel Kv1.5
Savient/Krystexxa (pegloticase) | Gout | FDA, 9/14/10 | PEG-conjugated recombinant human uricase

Source: FDA and EMA. FDA, US Food and Drug Administration. EMA, European Medicines Agency. sBLA, supplemental Biologic License Application. VEGF, vascular endothelial growth factor.

Notable development setbacks (June–September 2010)

Company/drug name | Indication | Setback summary
MedImmune-AstraZeneca/Numax (motavizumab) | Respiratory syncytial virus (RSV) infection | On 6/2/10, an FDA panel voted against approval. On 8/30/10, the FDA issued Complete Response Letter requesting additional trials to support the risk-benefit profile. Motavizumab is a humanized monoclonal antibody against the fusion (F) protein of RSV.
Merck/Peg-Intron (peg-interferon alpha-2b) | Melanoma | Phase 3 trial did not meet either primary or secondary endpoints; treatment is no better than conventional low-dose interferon treatment. (American Society of Clinical Oncology Annual Meeting, 6/05/10, Abstract LBA8506)
Human Genome Sciences (mapatumumab) | Multiple myeloma | Phase 2 study showed that treatment did not significantly improve disease response or progression-free survival. (Company press release, 06/09/10) Mapatumumab is a human monoclonal antibody agonist to TRAIL receptor-1.
Eli Lilly/Semagacestat (LY450139) | Alzheimer’s disease | Company discontinued development of the gamma-secretase inhibitor after interim analysis of phase 3 trial data showed that treatment resulted in worse outcomes than placebo. (Company press release, 8/17/10)
Roche (taspoglutide) | Type 2 diabetes | On 9/10/10, company announced suspension of phase 3 trials for the long-acting glucagon-like peptide-1 analog because serious side effects led too many patients to drop out of the trial.

Source: BioMedTracker, a service of Sagient Research (http://www.biomedtracker.com/)

Notable trial results (June–September 2010)

Company/drug name | Indication | Result summary
Bristol-Myers Squibb and AstraZeneca/dapagliflozin | Type 2 diabetes | Phase 3 study met primary and secondary endpoints after 24 weeks of treatment with this small-molecule inhibitor of the renal sodium glucose co-transporter 2 (SGLT-2). (Diabetes Care, doi:10.2337/dc10-0612)
Morphotek-Eisai/farletuzumab | Ovarian cancer | Phase 2 study met primary endpoints and demonstrated benefits of this humanized monoclonal antibody against folate receptor alpha compared with conventional carboplatin+taxane treatment. (American Society of Clinical Oncology Annual Meeting, 7/07/10, Abstract 5001)
Jerini-Shire/Firazyr (icatibant) | Hereditary angioedema | Phase 3 study showed significant benefit of this selective peptide antagonist of bradykinin B2 receptor. (N. Engl. J. Med. 363, 532–541)
ThromboGenics/ocriplasmin (recombinant microplasmin) | Vitreomacular adhesion | Phase 2 study showed significantly increased nonsurgical resolution of vitreomacular adhesion by intravitreal injection of the recombinant human protein (Retina 30, 1122–1127). Preliminary phase 3 data also met primary endpoint. (American Society of Retina Specialists Annual Meeting, 8/31/10)

Source: BioMedTracker, a service of Sagient Research (http://biomedtracker.com/)

Notable upcoming approvals (Q4 2010)

Company/drug name | Indication | Approval decision
Amylin Pharmaceuticals/Bydureon (exenatide LAR) | Type 2 diabetes | 10/22/10 PDUFA date. Phase 3 trial met primary and secondary endpoints and showed significant superiority over comparators (American Diabetes Association Annual Meeting, 6/25–29/2010). This controlled release form of Byetta (exenatide, a 39-amino-acid peptide agonist of glucagon-like peptide-1, GLP-1) uses the Medisorb technology (microspheres made of polylactide co-glycolide polymer).
Human Genome Sciences/Benlysta (belimumab) | Systemic lupus erythematosus | 12/09/10 PDUFA date, priority review. MAA approval expected in H2 2011. Two phase 3 trials showed significant improvement in patient response after 52 weeks of treatment of this human monoclonal antibody against B-lymphocyte stimulator (BLyS). (European League Against Rheumatism Annual Congress, 6/17/10)
Medarex-Bristol-Myers Squibb (ipilimumab) | Metastatic melanoma | 12/25/10 PDUFA date. Priority review granted on 8/18/10. Ipilimumab is a fully human antibody against cytotoxic T-lymphocyte antigen-4 (CTLA-4). Phase 3 study met primary and secondary endpoints (N. Engl. J. Med. 363, 711–723, 2010)
MannKind/Afrezza (inhaled insulin, dry powder) | Diabetes, types 1 and 2 | 12/29/10 PDUFA date. In phase 3 trial, inhaled insulin was statistically noninferior to injected insulin. Over 52 weeks, there was no difference in pulmonary function between groups. Inhaled insulin was as effective and well tolerated. (The Lancet 375, 2244–2253, 2010)
LG Life Sciences/LB03002 (SR-rHGH) | Growth hormone deficiency | 09/10/10 - 01/03/11 PDUFA date range. MAA approval expected in H2 2010. LB03002 is the sustained release form of recombinant human growth hormone and requires once-weekly injection versus current daily treatment. Phase 3 study results showed significant superiority over placebo in adult patients after 26 weeks of treatment. (Endocrine Society Annual Meeting, 6/11/09, Abstract P2-746)

Other expected approvals in Q4 include Theratechnologies’ Egrifta (tesamorelin) and Novartis’ Gilenia (fingolimod). See Nat. Biotechnol. 28, 640, 2010, for details.
Source: BioMedTracker, a service of Sagient Research (http://biomedtracker.com/). PDUFA, Prescription Drug User Fee Act. MAA, market authorization application. LAR, long-acting release. SR, sustained release.

Wayne Peng, Emerging Technology Analyst, Nature Publishing Group


news feature

Turning the tide in lung cancer

Researchers are testing a slew of targeted therapeutic strategies in lung cancer. Signs are emerging that these therapies are gaining increasing traction in what has long been one of oncology’s minefields. Malorye Allison investigates.

In June, “unprecedented” early trial results for crizotinib, a targeted cancer drug from New York–based Pfizer, brought a sliver of hope to lung cancer research (ref. 1). As good as it was (almost 90% of participants had some measure of disease control), that news was also a reminder that lung cancer is one of the toughest cancers to beat. Only 3–5% of lung cancer patients have the ALK (anaplastic lymphoma receptor tyrosine kinase) gene rearrangement that crizotinib targets. Add to that the number of patients who respond to the already approved epidermal growth factor receptor (EGFR) inhibitors and that adds up to just 14–20% of patients of European descent who are likely to benefit from targeted therapy. (Response rates might be higher among Asians, who have a higher incidence of EGFR mutations, but that still leaves a large population of patients for whom no targeted therapy is yet available.)

On top of that, each bit of good news is invariably accompanied by a stream of late-stage drug failures. This trend continues despite hundreds of trials with dozens of new agents. As a recent editorial in the Lancet lamented, “It is quite disheartening to see that a number of clinical trials with new [lung cancer] drugs have failed to meet even the most modest endpoints” (ref. 2).

Undeterred, biotechs and pharmas alike are going full throttle after new lung cancer drugs. Approximately 100 new compounds are being tested in an estimated 650 lung cancer trials worldwide.
More importantly, lung cancer has become a hotbed of research innovation. Having hit the proverbial wall with traditional approaches, experts in this field are pioneering bold new adaptive trial designs, novel types of diagnostic and prognostic tests and breakthrough tools for early detection. The rewards for success, after all, should be considerable: unmet need remains so high that analysts at Waltham-based Decision Resources are projecting the lung cancer market will double to over $684 million by 2017.

An unmet need with scant solutions

The most common malignancy in the world, lung cancer affects >200,000 people per year in the US alone and kills more people worldwide than any other cancer. One form of the disease, non-small cell lung cancer (NSCLC), causes ~85% of deaths.

Scientists blame the lung cancer drug drought on various factors. First, there was less money, relative to other cancers, early on. “Lung cancer research has been severely underfunded because it’s regarded as a smokers’ disease,” says Alice Shaw, an assistant professor and attending physician of the Thoracic Cancer Program at Massachusetts General Hospital in Boston.

Lung tumors are also typically discovered only late in the course of the disease because it’s difficult to detect them until they are large and symptomatic. About half the patients with advanced disease die within a year, even with treatment.

Finally, it has become increasingly clear that NSCLC is highly heterogeneous, but the move to biomarker-based studies in lung cancer has been slow.
A recent review of clinical trials in NSCLC found that only ~7% of trials in this disease have used biomarkers for patient selection: that’s just 34 out of 493 trials listed on ClinicalTrials.gov (ref. 3).

The lack of biomarkers has been a major stumbling block. Iressa (gefitinib) from AstraZeneca of London was the first targeted therapy approved for lung cancer. It inhibits the tyrosine kinase domain of EGFR (Fig. 1). The drug garnered accelerated approval in Japan in 2002, then in the US in 2003. The overall response of ~11% from a phase 2 study was just enough to let the drug squeak past US regulators. But then phase 3 data suggested that those earlier numbers painted a rosier picture about the drug than deserved, and in 2005 the drug was relabeled in the US and restricted for use in only those patients who had already started treatment with the drug.

Because there were tantalizing leads implying subpopulation effects (within nonsmokers, women and Asians), a biomarker might have returned the drug to the market. But unlike Genentech in South San Francisco, California (now part of Roche of Basel), which had a biomarker in its pocket when Herceptin (trastuzumab) proved ineffective in the general breast cancer population, AstraZeneca had no such option. It took until 2008 to determine that EGFR inhibitors are most effective in lung cancer patients with EGFR mutations.

Figure 1 Target practice. EGFR signaling pathway offers a plethora of potential drug targets. (Reprinted with permission from Nat. Rev. Cancer 10, 618–629, 2010)

During that interval, OSI Pharmaceuticals of Melville, New York, and its partner Genentech, launched the EGFR inhibitor Tarceva (erlotinib). In 2004, Tarceva was approved in the US as a second-line therapy for individuals who didn’t respond to a chemotherapy regimen;


since that time, the drug has claimed most of the US market. In April of this year, it was also approved as a maintenance therapy (Box 1).

Already, many US oncologists are testing for EGFR mutations up front and prescribing Tarceva, sidestepping chemotherapy, which represents a sea change from the typical standard of care. “Until recently, everyone with metastatic lung cancer got chemo,” Shaw says. Among those whose tumors carry EGFR mutations, ~70% respond, getting anywhere from several months to just over an extra year of life. But the response is variable. “I have one patient who has taken erlotinib for seven years before needing something different,” says Roy Herbst, chief, thoracic medical oncology, M.D. Anderson Cancer Center in Houston.

Iressa has not entirely lost the battle, however. In April, the European Medicines Agency approved the drug for use in lung cancer patients with EGFR mutations. AstraZeneca now reports that the drug is approved in 36 countries (ref. 4). Several other tyrosine kinase inhibitors are also making their way through the clinic (Table 1).

Box 1 Moving into maintenance therapy

Another positive development in lung cancer has been the introduction of maintenance therapy. Both Alimta (Lilly of Indianapolis’ anti-folate drug) and Tarceva have been approved for this type of use by the US Food and Drug Administration. But the way this market has shaped up sheds some light on evolving lung cancer market dynamics.

Maintenance is given after an initial treatment round, and before a cancer returns, to prevent a recurrence. “There is very potent data that Alimta, given as a second treatment, can delay recurrences by months,” says Sloan-Kettering’s Kris. Because physicians are becoming more accustomed to giving a targeted therapy up front, rather than chemotherapy, “doctors are putting those two data points together, and saying ‘start it [Alimta] early and keep it going,’” says Kris.

Data from Waltham, Massachusetts–based oncology data firm IntrinsiQ support this trend. “Maintenance in NSCLC was going to be slow to uptake anyway, because it was a big change,” explains Ed Kissel, vice president of quantitative analysis at the company. Previously, doctors would just start one treatment, then switch the patient if that started to fail. Now, says Kissel, there is a push to maintenance therapy, but it’s benefiting Alimta a lot more than Tarceva. “People are using it [Alimta] more creatively than even the data suggested.”

Fine-tuning targeted treatment

The struggle with Iressa’s approval may have also taught companies a valuable lesson. Many are now looking for biomarkers and accepting a smaller initial market share in exchange for a speedier development pathway.

Crizotinib illustrates this approach. A dual MET (mesenchymal-epithelial transition factor)/ALK-fusion inhibitor, the drug is being tested mainly in lung cancer patients whose tumors harbor a fusion of ALK and EML4 (echinoderm microtubule associated protein-like 4), resulting from a translocation. At this June’s annual American Society of Clinical Oncology (ASCO) meeting held in Chicago, researchers reported seeing a dramatic and durable response (some lasting 15 months) in 57% of patients. Another 20–30% of patients responded less well but still benefited from the drug. Although only about 3–5% of patients in the US carry this fusion, it’s found in a higher percentage of people in Asia.

Crizotinib’s development has been rapid fire, in part due to some luck. “This drug was in clinical trials when we first learned about [EML4]-ALK fusions in August 2007,” says Shaw, who is the principal investigator on the phase 2 and 3 trials now underway. The drug was slated for testing in other cancers, but the researchers quickly launched a lung cancer trial. The first patients were identified in November 2007 and enrolled a month later. Skeptics point out that results from a larger trial may temper enthusiasm for the drug, given that the data presented at ASCO included only 82 patients.

Thousand Oaks, California–based Amgen has also been doing up-front biomarker work with motesanib diphosphate (AMG706), an angiogenesis inhibitor that antagonizes vascular endothelial growth factor receptors (VEGFR) 1, 2 and 3, platelet-derived growth factor (PDGF) and c-Kit (stem cell growth factor) receptors (ref. 5), which is being co-developed by Millennium of Cambridge, Massachusetts, and Takeda of Osaka, Japan. The drug competes with Genentech’s Avastin (bevacizumab) and is being tested in combination with chemotherapy. “Many companies are working on antiangiogenic drugs and we need biomarkers to inform treatment decisions,” says David Chang, vice president of global oncology development at Amgen.

At this year’s ASCO meeting, data were presented that looked at five biomarkers in blood samples from patients enrolled in three phase 2 trials of the Amgen drug, including a NSCLC study (ref. 6). Across those studies, individuals who responded to the drug were more likely to have elevated placental growth factor (PlGF) levels after treatment had begun. This kind of marker is only useful after a patient begins treatment with a drug. According to Scott Patterson, Amgen’s executive director of medical sciences, “Like everyone else, we are also looking for baseline biomarkers predictive of response.”

Digging up biomarkers has been arduous work so far, and there’s much left to be done. “We need to find the driving mutation for every cancer and a drug for every mutation,” says Daniel Haber, director of the Massachusetts General Hospital Cancer Center. At this point, no one even knows how many subtypes of NSCLC exist.

The brass ring: early detection

Because a key problem with lung cancer is that the tumors are found late, early diagnosis could be a game changer.
The Canary Foundation of Palo Alto, California, and Victoria, British Columbia, Canada, is supporting work at Stanford University in California that aims to address several hurdles simultaneously, using two different approaches. “Ideally, you’d have a low-cost blood test as the first step followed by molecular imaging,” says Sanjiv (Sam) Gambhir, head of nuclear medicine, Lucile Packard Children’s Hospital at Stanford and director, Canary Center at Stanford for Cancer Early Detection.

Gambhir and his collaborators already have a candidate molecular imaging agent in hand. It binds to alpha V beta 3, an integrin expressed on new blood vessels and tumor cell surfaces (ref. 7). Their investigational new drug application was approved to study this agent using positron emission tomography (PET) and the radiolabel fluorine 18. The agent is “highly specific” for the integrin and with tomography it “gives you a molecular map of the lungs, showing small new lesions and whether they are likely to be malignant or not,” Gambhir says.

The group is also collaborating with researchers at the Seattle-based Fred Hutchinson Cancer Center using proteomics to look for signatures of early lung cancer. Although a better imaging tool alone would be a step forward, Gambhir points out that given the expense of imaging, it will be best if it can be done selectively. But finding a good blood biomarker for early detection is extremely challenging. The limits for tumor detection are not yet even known, so Gambhir and his colleagues have been creating mathematical models relating blood biomarker levels to tumor burden so they can determine how small a tumor they can detect (ref. 8).
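To make the idea of relating blood biomarker level to tumor burden concrete, here is a minimal sketch in Python. It is not the model the Stanford group published; it assumes a simple one-compartment, steady-state balance in which every tumor cell sheds biomarker at a constant rate, a fixed fraction reaches the plasma, and the biomarker is cleared with first-order kinetics. All function names, parameter names and example values are hypothetical and chosen only for illustration, as is the packing density of about one million cells per cubic millimeter.

```python
import math


def plasma_level_ng_per_ml(n_cells, shed_ng_per_cell_per_h, plasma_fraction,
                           half_life_h, plasma_volume_ml):
    """Steady-state plasma concentration for a tumor of n_cells.

    Assumption: influx into plasma equals first-order clearance at steady state.
    """
    k_el = math.log(2) / half_life_h  # elimination rate constant, 1/h
    influx = n_cells * shed_ng_per_cell_per_h * plasma_fraction  # ng/h
    return influx / (k_el * plasma_volume_ml)


def min_detectable_cells(assay_limit_ng_per_ml, shed_ng_per_cell_per_h,
                         plasma_fraction, half_life_h, plasma_volume_ml):
    """Invert the balance above: smallest cell count that reaches the assay limit."""
    k_el = math.log(2) / half_life_h
    return (assay_limit_ng_per_ml * k_el * plasma_volume_ml
            / (shed_ng_per_cell_per_h * plasma_fraction))


def sphere_diameter_mm(n_cells, cells_per_mm3=1e6):
    """Diameter of a spherical tumor, assuming a fixed (hypothetical) cell density."""
    volume_mm3 = n_cells / cells_per_mm3
    return (6.0 * volume_mm3 / math.pi) ** (1.0 / 3.0)


if __name__ == "__main__":
    # Hypothetical illustration values, not measurements.
    params = dict(shed_ng_per_cell_per_h=1e-9, plasma_fraction=0.1,
                  half_life_h=24.0, plasma_volume_ml=3000.0)
    cells = min_detectable_cells(assay_limit_ng_per_ml=0.01, **params)
    print(f"cells needed: {cells:.3g}")
    print(f"tumor diameter: {sphere_diameter_mm(cells):.2f} mm")
```

Under these assumptions the detectable tumor size scales linearly with the assay's detection limit and inversely with the shedding rate, which is why a weakly shed or rapidly cleared biomarker would only flag tumors that are already large.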


Novel therapeutics

Whereas the progress against tumors with EGFR and ALK mutations is encouraging, “for the bulk of lung cancer patients, their disease is more genetically complicated,” says Ira Mellman, vice president of research oncology at Genentech. Smokers’ tumors in particular, he says, “have many more mutations and alterations.” Finding effective treatment for these tumors is likely to be much more difficult.

Genentech is betting on a radically different approach in its next wave of cancer therapeutics: drug-antibody conjugates. According to Mellman, the company has put a lot of effort into choosing optimal antibodies and making sure the toxin is linked securely to the antibody. “The linker is critical in ensuring safety. If the payload breaks off you can get terrific toxicity to the patient,” Mellman says, pointing to Wyeth/Pfizer’s Mylotarg (gemtuzumab ozogamicin; an anti-CD33 humanized monoclonal antibody linked to the cytotoxic agent calicheamicin), which received accelerated approval but was withdrawn a decade later because mortality was higher in follow-up than in earlier trials. Other conjugates have used bacterial toxins, which are highly immunogenic. If the patient’s immune system makes an antibody to the toxin, that can limit the conjugate’s effectiveness.

Genentech’s conjugates use a new kind of payload, microtubule poisons, which will be linked to antibodies targeting tumor cell surface proteins. The company is not revealing the surface proteins being targeted. “The preclinical studies look great, but they always do,” Mellman says. Data from archived tumors show that 85% carry the targeted protein. The trials are being designed “to get a lot of information out of them,” Mellman says. The researchers will monitor levels of the biomarkers in patients and will also use radioactive isotopes and immunoPET imaging studies to show whether the drug is actually getting where it is supposed to.
Nine studies of drug conjugates in cancer will be starting over the next year. One in NSCLC uses the anti-mitotic auristatin as its payload. A clinical study of Herceptin conjugated with an anti-microtubule maytansine derivative, Herceptin-DM1, is also quite far along and has researchers very excited about conjugates as an approach.

Ultimately, lung and other deadly tumors will probably require treatment with multiple, targeted drugs. As Mellman says, “Cancers are protean and resistance mutations spring up quickly.” Just as with HIV, he says, doctors will need to bombard tumors with combination therapies to limit the cells’ options to mutate. “You have to think one step ahead of where the cancer cell is likely to go.”

In keeping with that, researchers are also looking at mutations that confer resistance to the targeted therapies they already have. The EGFR mutation T790M, for example, confers resistance to tyrosine kinase inhibitors of the receptor. “There is a worldwide developmental push to find a drug to target this defect,” says Mark Kris, chief, thoracic oncology service, Memorial Sloan-Kettering Cancer Center, New York.

Better diagnostics

The search for new diagnostics is broad and aggressive. Genes in the EGFR family and related pathways are one focal point in terms of cellular receptors. Overexpression of the EGFR family member ERBB3, for example, has been implicated in poor prognosis and survival among lung cancer patients and is emerging as a key player, according to Gunamani Sithanandam, a staff scientist at the US National Cancer Institute in Bethesda, Maryland. ERBB3 activates phosphoinositide 3 kinase (PI3K)/Akt (protein kinase B) signaling, which is involved in tumorigenesis.
RAS family gene products, including KRAS, act downstream of ERBB3, and KRAS mutations (which 15–30% of patients have) make tumors less responsive to EGFR inhibitors. KRAS mutations are the most common ones found in smokers and are thought by some to be associated with unfavorable outcomes⁹. Sithanandam thinks ERBB3 could also be an important therapeutic target.

“More and more, we are thinking of treatment and prevention along molecular pathways,” says Nita Maihle, a professor in pharmacology and molecular medicine at Yale University in New Haven, Connecticut. The challenge, Maihle says, is to really work out the biology. “We like to simplify things, but many of these biomarkers are extremely complex.” HER2 (also known as ERBB2) testing in breast cancer, she points out, has been fraught with difficulty not just because the quality of testing is so variable, but because the very meaning of ‘HER2 positive’ is now being challenged¹⁰.

There is also a growing sense that mutations, although they make great targets and biomarkers, are not going to explain all the variations in response. “We need to do some of these tests in DNA, RNA and protein if we are focusing on advanced metastatic disease,” says Herbst.

At M.D. Anderson, Herbst’s group is including genomic and proteomic studies in its landmark BATTLE (biomarker-integrated approaches of targeted therapy for lung cancer elimination) trials, which employ a novel adaptive design. The trials are designed to evaluate 11 biomarkers related to four pathways against multiple lung cancer drugs simultaneously. All patients must have a lung biopsy, which undergoes extensive molecular and histological analysis. Individuals are randomly assigned to an experimental therapy and monitored for eight weeks.
If subjects respond, they stay on the drug; if not, they are shifted to a different experimental therapy. BATTLE-II, which is underway, will include combinations of targeted therapies.

“It’s a speed trial, where we reboot each time we reassign a patient to a new treatment regimen,” Herbst says. They hope to quickly amass a range of biomarkers of response, but “expression signatures take a lot of work to validate,” he admits. They found, for example, that 61% of patients with a KRAS mutation who took Nexavar (sorafenib) had disease control at eight weeks, compared with 32% for the other three drugs. Tarceva did best against tumors with EGFR mutations, Zactima (vandetanib; a 4-anilinoquinazoline derivative selective for VEGF receptor) for high VEGF receptor 2 expression, and the combination of Tarceva and Targretin (bexarotene; a retinoid X receptor (RXR)-selective antitumor retinoid) was most effective against tumors with cyclin D1 defects or amplified numbers of EGFR.

Table 1 Selected trials of second-generation tyrosine kinase inhibitors in NSCLC

Drug | Company (location) | Target | Development phase
Zactima | AstraZeneca | EGFR and VEGFR2 | Phase 3
BIBW-2992 | Boehringer Ingelheim (Ridgefield, Connecticut) | EGFR and HER2 | Phase 3
Pelitinib | Wyeth (Madison, New Jersey) | EGFR | Phase 2
Canertinib | Pfizer | EGFR and HER2 | Phase 2
Tykerb (lapatinib) | GlaxoSmithKline (Brentford, UK) | EGFR and HER2 | Phase 2
Votrient (pazopanib) | GlaxoSmithKline (Brentford, UK) | VEGFR1, 2, 3; PDGF and c-Kit | Phase 2
XL647 | Symphony Evolution (Rockville, Maryland) | EGFR and VEGFR2 | Phase 2
Crizotinib | Pfizer | MET/ALK fusion | Phase 1/2
XL184 | Exelixis (S. San Francisco, California) | VEGFR2, MET and RET | Phase 1

Biodesix, a molecular diagnostics company based in Broomfield, Colorado, has what may be the first clinically available mass spectrometry (MS)-based protein assay, the Veristrat test. The test, which was launched in spring 2009, requires only a blood sample and uses a proprietary algorithm to determine how likely it is that a patient will respond to EGFR inhibitor therapy. “We have made mass spec a clinical tool for large protein biomarkers,” says CEO David Brunel. The samples are processed in Biodesix’s CLIA (Clinical Laboratory Improvement Amendments)-approved laboratory. A key advantage of the test is that it doesn’t require a tumor sample, which can be difficult to obtain.

The Biodesix system looks at eight MS peaks comprising protein and peptide fragments. Half of these are variants or isoforms of serum amyloid A, which appear to interact with cancer-related pathways, including MAPK (mitogen-activated protein kinase). The other half have been harder to name. The company will likely have to identify the proteins in the test to convince skeptics of its value.

Amgen’s Patterson says that for now, the more common use of proteomics continues to be for quantifying a few known proteins, as Amgen did in its motesanib/PlGF study, and not for fishing for complex markers and/or signatures. In the motesanib study, the company used a traditional immunoassay-based approach. “A lot of people using this [mass spec-based] approach ended up looking at the same proteins and fragments in the end, and a lot of the signature work has been disproven,” Patterson says.

Gene expression has been beset by similar challenges. If researchers are going to use either of these tools in lung cancer trials, accuracy and reproducibility need to be improved.
Some of the steps that need to be taken are actually surprisingly simple.

For example, researchers at the Cambridge Research Institute in Cambridge, UK, recently published a simple way of better preserving lung cancer biopsies¹¹. Getting enough tissue to test for anything is often a challenge. “Only about 30% of patients are suitable to have their tumor cut out. The other 70% only have tiny fragments removed,” says Malcolm Lawson, one of the Cambridge researchers. RNA, in particular, tends to degrade quickly.

Lawson and his colleagues can increase the levels of RNA preserved by 50–70% just by using a known RNA preservative before freezing the samples. As a result, more of the precious tissue from samples is available later. “Microarray technology is on the cusp of being more widely used,” says Lawson. “We really need to start thinking about these practical steps that can greatly improve our results.” The tools are now less expensive, and there remains an urgent need for standards.

Circulating tumor cells are another promising new strategy for lung cancer diagnostics, but until now, it has been impractical to measure them because they are rare compared to other blood components. “The standard commercial technology is not very sensitive and is harsh on the cells,” says Haber.

His group is developing a microfluidics-based microchip (the circulating tumor cell (CTC) chip) that uses antibodies to epithelial cell adhesion molecule to snag the circulating tumor cells. The core technology was developed by Mehmet Toner, a professor of surgery at the Massachusetts General Hospital. The latest version uses a series of V-shaped indentations to create currents that bring more cells into contact with the walls of the device. Haber hopes that within a year or two, “we’ll be monitoring the genetic status of tumors using CTC [chip]s, in real time.”

S. San Francisco–based Nodality is taking a similar approach.
Developed in the laboratory of geneticist Gary Nolan of Stanford University in Stanford, California, their technology, single-cell network profiling, is based on flow cytometry. Antibodies are used to measure phosphoproteins in signaling pathways.

Nolan developed a means of permeabilizing the cell membrane so that antibodies could enter cells and measure internal pathways. This approach “can characterize not just the surface of cells, but also the activation levels of pathways related to response and resistance cell by cell,” says CEO David Parkinson. The technology, and related algorithms, allows the analysis of hundreds of thousands of individual cells, resting and stimulated. “At its core, personalized medicine is the individual characterization of complex biology,” says Parkinson. It may be that some subsets of cancer patients will not be identified until a deeper level of analysis is applied to them.

Nodality has already used the technology to characterize responsiveness of acute myeloblastic leukemia to standard induction therapy, which works in about 60% of patients¹². The researchers are finishing the validation of that test. They have also characterized subtypes of acute myeloid leukemia based on survival, DNA damage and apoptosis pathways¹³. Circulating tumor cells from lung cancer patients are one of their next targets.

Many of these new approaches are several years, or more, away from realization. Still, there’s finally some optimism filtering into the conversation about lung cancer. “Ten years ago lung cancer was the least studied malignancy and the one with the fewest options. Now, because of genetics, we’re doing better with it than some other cancers,” says Haber. Kris concurs.
“Tens of thousands of people with lung cancer are now having their lives made better, and we are leading the way with mutation testing,” he says.

As more pathways are mapped out, new tools for studying lung cancer will be developed, and these tools will undoubtedly help researchers in other fields of oncology as well. Although there is optimism, the way forward still is not clear and a lot of hope is pinned on new technologies and novel trial designs, like BATTLE’s. For these efforts to be successful, everyone, including big pharma and regulators, will need to get truly on board.

Malorye Allison, Acton, Massachusetts

1. Chustecka, Z. Medscape Today, 7 June 2010.
2. Stinchcombe, T.E. & Govindan, R. Lancet Oncol. 11, 604–605 (2010).
3. Subramanian, J. et al. J. Thorac. Oncol. 5, 1116–1119 (2010).
4. AstraZeneca. IRESSA Label Change Press Release (AstraZeneca, London) (17 June 2005).
5. Polverino, A. et al. Cancer Res. 66, 8715–8721 (2006).
6. Bass, M.B. et al. J. Clin. Oncol. 28, 15s, Suppl; abstr 3037 (2010).
7. Zhang, X. et al. J. Nucl. Med. 47, 113–121 (2006).
8. Gambhir, S.S. et al. PLoS Med. 5, 1287–1297 (2008).
9. Pao, W. et al. PLoS Med. 2, e73 (2005).
10. Allison, M. Nat. Biotechnol. 28, 383–384 (2010).
11. Lawson, M.H. et al. J. Thorac. Oncol. 5, 956–963 (2010).
12. Kornblau, S.M. et al. Clin. Cancer Res. 16, 3721–3733 (2010).
13. Rosen, D.B. et al. PLoS ONE 5, e12405 (2010).
