12.07.2015 Views

Drug marketing and the new media



volume 28 number 5 MAY 2010

Focus on: the Predictive Safety Testing Consortium

editorial
431 Biomarkers on a roll

© 2010 Nature America, Inc. All rights reserved.

Kidney glomeruli. The progress of the Nephrotoxicity Working Group of the Predictive Safety Testing Consortium towards validating markers of kidney damage is presented on p 430. Artwork by Lewis Long.

foreword
432 Research at the interface of industry, academia and regulatory science
William B Mattes, Elizabeth Gribble Walker, Eric Abadie, Frank D Sistare, Jacky Vonderscher, Janet Woodcock & Raymond L Woosley

opinion and comment

COMMENTARY
436 Next-generation biomarkers for detecting kidney toxicity
Joseph V Bonventre, Vishal S Vaidya, Robert Schmouder, Peter Feig & Frank Dieterle
441 Evolution of biomarker qualification at the health authorities
Federico Goodsaid & Marisa Papaluca

NEWS AND VIEWS
444 A roadmap for biomarker qualification
David G Warnock & Carl C Peck

research

perspective
446 Towards consensus practices to qualify safety biomarkers for use in early drug development
F D Sistare, F Dieterle, S Troth, D J Holder, D Gerhold, D Andrews-Cleavenger, W Baer, G Betton, D Bounous, K Carl, N Collins, P Goering, F Goodsaid, Y-Z Gu, V Guilpin, E Harpur, A Hassan, D Jacobson-Kram, P Kasper, D Laurie, B Silva Lima, R Maciulaitis, W Mattes, G Maurer, L Ann Obert, J Ozer, M Papaluca-Amati, J A Phillips, M Pinches, M J Schipper, K L Thompson, S Vamvakas, J-M Vidal, J Vonderscher, E Walker, C Webb & Y Yu
455 Renal biomarker qualification submission: a dialog between the FDA-EMEA and Predictive Safety Testing Consortium
F Dieterle, F D Sistare, F Goodsaid, M Papaluca, J S Ozer, C P Webb, W Baer, A Senagore, M J Schipper, J Vonderscher, S Sultana, D L Gerhold, J A Phillips, G Maurer, K Carl, D Laurie, E Harpur, M Sonee, D Ennulat, D Holder, D Andrews-Cleavenger, Y-Z Gu, K L Thompson, P L Goering, J-M Vidal, E Abadie, R Maciulaitis, D Jacobson-Kram, A F Defelice, E A Hausner, M Blank, A Thompson, P Harlow, D Throckmorton, S Xiao, N Xu, W Taylor, S Vamvakas, B Flamion, B Silva Lima, P Kasper, M Pasanen, K Prasad, S Troth, D Bounous, D Robinson-Gravatt, G Betton, M A Davis, J Akunda, J Eric McDuffie, L Suter-Dick, L Obert, M Guffroy, M Pinches, S Jayadev, E A Blomme, S A Beushausen, Valérie G Barlow, N Collins, J Waring, D Honor, S Snook, J Lee, P Rossi, E Walker & W Mattes

Nature Biotechnology (ISSN 1087-0156) is published monthly by Nature Publishing Group, a trading name of Nature America Inc. located at 75 Varick Street, Fl 9, New York, NY 10013-1917. Periodicals postage paid at New York, NY and additional mailing post offices. Editorial Office: 75 Varick Street, Fl 9, New York, NY 10013-1917. Tel: (212) 726 9335, Fax: (212) 696 9753. Annual subscription rates: USA/Canada: US$250 (personal), US$3,520 (institution), US$4,050 (corporate institution). Canada add 5% GST #104911595RT001; Euro-zone: €202 (personal), €2,795 (institution), €3,488 (corporate institution); Rest of world (excluding China, Japan, Korea): £130 (personal), £1,806 (institution), £2,250 (corporate institution); Japan: Contact NPG Nature Asia-Pacific, Chiyoda Building, 2-37 Ichigayatamachi, Shinjuku-ku, Tokyo 162-0843. Tel: 81 (03) 3267 8751, Fax: 81 (03) 3267 8746. POSTMASTER: Send address changes to Nature Biotechnology, Subscriptions Department, 342 Broadway, PMB 301, New York, NY 10013-3910. Authorization to photocopy material for internal or personal use, or internal or personal use of specific clients, is granted by Nature Publishing Group to libraries and others registered with the Copyright Clearance Center (CCC) Transactional Reporting Service, provided the relevant copyright fee is paid direct to CCC, 222 Rosewood Drive, Danvers, MA 01923, USA. Identification code for Nature Biotechnology: 1087-0156/04. Back issues: US$45, Canada add 7% for GST. CPC PUB AGREEMENT #40032744. Printed by Publishers Press, Inc., Lebanon Junction, KY, USA. Copyright © 2010 Nature Publishing Group. Printed in USA.


Detecting transcript isoforms with RNA-Seq, p 511
Threadbare methylome of the silkworm, p 516
Reprogramming under the microscope, p 521

feature
407 South-South entrepreneurial collaboration in health biotech
Halla Thorsteinsdóttir, Christina C Melon, Monali Ray, Sharon Chakkalackal, Michelle Li, Jan E Cooper, Jennifer Chadder, Tirso W Saenz, Maria Carlota de Souza Paula, Wen Ke, Lexuan Li, Magdy A Madkour, Sahar Aly, Nefertiti El-Nikhely, Sachin Chaturvedi, Victor Konde, Abdallah S Daar & Peter A Singer

patents
417 Open biotechnology: licenses needed
Yann Joly
420 Recent patent applications in fluorescent imaging

NEWS AND VIEWS
421 Advancing RNA-Seq analysis
Brian J Haas & Michael C Zody (see also p 503 and p 511)
423 Haploidy with histones
Gregory P Copenhaver & Daphne Preuss
424 High-content imaging
Arnold Hayer & Tobias Meyer
426 Third-generation sequencing fireworks at Marco Island
David J Munroe & Timothy J R Harris
429 Research highlights

computational biology

analysis
495 GREAT improves functional interpretation of cis-regulatory regions
Cory Y McLean, Dave Bristor, Michael Hiller, Shoa L Clarke, Bruce T Schaar, Craig B Lowe, Aaron M Wenger & Gill Bejerano

research

ARTICLES
503 Ab initio reconstruction of cell type–specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs
M Guttman, M Garber, J Z Levin, J Donaghey, J Robinson, X Adiconis, L Fan, M J Koziol, A Gnirke, C Nusbaum, J L Rinn, E S Lander & A Regev (see also p 421)

letters
511 Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation
C Trapnell, B A Williams, G Pertea, A Mortazavi, G Kwan, M J van Baren, S L Salzberg, B J Wold & L Pachter (see also p 421)
516 Single base–resolution methylome of the silkworm reveals a sparse epigenomic map
H Xiang, J Zhu, Q Chen, F Dai, X Li, M Li, H Zhang, G Zhang, D Li, Y Dong, L Zhao, Y Lin, D Cheng, J Yu, J Sun, X Zhou, K Ma, Y He, Y Zhao, S Guo, M Ye, G Guo, Y Li, R Li, X Zhang, L Ma, K Kristiansen, Q Guo, J Jiang, S Beck, Q Xia, W Wang & J Wang
521 Dynamic single-cell imaging of direct reprogramming reveals an early specifying event
Z D Smith, I Nachman, A Regev & A Meissner

careers and recruitment
527 First quarter resurgence in biotech job postings
Michael Francisco
528 people


in this issue

The Predictive Safety Testing Consortium

Every year, the drug industry loses countless lead candidates to drug-induced organ toxicity, most commonly during preclinical evaluation but sometimes in clinical trials and beyond. Earlier and more reliable detection of drug-induced toxicity in the drug development pipeline would enable drug makers to make more informed decisions about which candidates to move forward in testing, the doses at which these should be used and how best to design clinical trials.

Given the lack of progress in this area, the Critical Path Initiative has set out to establish the Predictive Safety Testing Consortium (PSTC), which has the goal of qualifying the use of previously described biomarkers for detecting organ toxicity in specific contexts [Foreword, p. 432]. This type of close collaborative partnership between the public and private sectors has proven essential for sharing expertise and costs, as well as for ensuring broad acceptance of the outcomes of such efforts by industry and regulatory bodies alike [Commentary, p. 441]. This focus describes the first results of the PSTC, obtained by its Nephrotoxicity Working Group, which had the aim of identifying a process by which kidney safety biomarkers could be qualified for use in regulatory decision making in preclinical settings and proposing how these might be qualified for use in clinical trials.

The kidney is a common site of drug-induced organ damage. Increased levels of serum creatinine (SCr) and blood urea nitrogen (BUN), the two major biomarkers in current practice for detection of nephrotoxicity, only become apparent after considerable kidney damage is evident and cannot pinpoint specific regions of the nephron that are affected. Numerous alternatives to SCr and BUN have been proposed but, until the PSTC, none was approved by the drug regulatory authorities [Commentary, p. 436].

By formulating a set of standard procedures and analyses to systematically screen the sensitivity and specificity of 23 candidate biomarkers in detecting kidney damage in particular contexts [Perspective, p. 446], the PSTC enabled scientists from both industry and academia to work within a common framework to benchmark the capacities of the seven most promising biomarkers against histopathology in rat models of drug-induced nephrotoxicity. The findings submitted to the European Medicines Agency (EMEA; now EMA) and US Food and Drug Administration (FDA) health authorities for particular ‘fit for use’ claims for each biomarker are presented in three research articles [Articles, p. 463, 470, 478]. Three urinary biomarkers (total protein, β2-microglobulin and cystatin C) outperform SCr and BUN in detecting and monitoring drug-induced glomerular injury, whereas four biomarkers (kidney injury molecule-1, albumin, clusterin and trefoil factor-3) could either outperform or add value to levels of SCr and BUN in detecting and monitoring drug-induced tubular damage [News & Views, p. 444].

The FDA and EMEA approved use of the seven biomarkers for providing additional evidence to that offered by SCr, BUN and histopathology in rat studies and recommended use of the biomarkers in clinical trials on a case-by-case basis. These outcomes of the submission process are discussed in the context of the implications of the markers for qualification and approval processes used for applications other than the detection of kidney damage [Perspective, p. 455]. Much remains to be done to generate the data needed to expand the qualification of the biomarkers for general clinical use. In particular, no data were presented demonstrating the use of urinary biomarkers to monitor recovery from drug-induced nephrotoxicity, and there were no data for a blood-based biomarker that reflects general kidney function. A fourth research paper [Articles, p. 486] addresses both of these concerns and shows that a panel of biomarkers enables evaluation not only of renal toxicity, but also of recovery from damage and general renal function.

PH & AM

Spliced transcripts from RNA-Seq

RNA-Seq enables a comprehensive survey of cellular RNA, but until now it has not been possible to elucidate the full-length, spliced structures of transcripts, especially if they originate from unannotated intergenic regions. Guttman et al. and Trapnell et al. devise algorithms for reconstructing transcripts from paired-end short sequencing reads of cDNA. The approach of Trapnell et al. is also able to quantify the abundance of each isoform. Unlike most previous approaches, the two algorithms do not require prior gene annotation, which enables the researchers to discover new isoforms of existing genes as well as unannotated antisense transcripts and other large noncoding RNAs.

Both approaches begin by performing gapped alignment of paired-end reads to the genome, which provides direct evidence for splice junctions. Guttman et al. then build a graph of potentially cotranscribed nucleotides and scan paths in the graph for instances in which significantly more reads than expected are found along a path. This strategy in effect borrows information from adjacent mapped reads to improve the statistical power of the method in small genomic regions or those represented by few reads. Applying the method to an RNA-Seq data set from three mouse cell lines, the authors reconstruct the conserved multi-exonic structures of large intergenic noncoding RNAs (lincRNAs), which were not discernible by previous methods. Trapnell et al. take a different approach by identifying genomic regions with sufficient read depth (which represent transcripts) and then assembling transcripts from graphs of ‘compatible’ reads that could have originated

Written by Kathy Aschheim, Michael Francisco, Peter Hare, Craig Mak, Andrew Marshall & Lisa Melton




Editorial

Sitting up and taking notice

The sheer pace of discovery in genetics is placing companies that pursue an aggressive infringement strategy for gene patents increasingly at odds with innovation.

On March 29, the sound of sabres rattling loudly emanated from a local court in the Southern District of New York. In a case that involved the BRCA1 and BRCA2 genetic tests for familial breast and ovarian cancer developed by Myriad Genetics, Judge Robert W. Sweet handed down a summary judgement that, if supported by higher courts, would not only invalidate Myriad’s composition of matter and method claims but could also undermine many patents on isolated genes.

The plaintiffs in the case won on virtually every count. And the fact that they did so at summary judgement (a stage that usually acts only as rehearsal of the arguments that will be made in court before a jury) means that the judge felt that Myriad had no case to argue.

Despite the clarity of the ruling, any declaration that gene patents are dead is premature. This decision is but the first salvo in what will be many exchanges between proponents and opponents of gene patents in the US courts. Most legal commentators believe that Myriad will appeal the case to the US Federal Circuit court, and most believe that the court will overturn the bulk of Judge Sweet’s rulings.

That is not to say that the biotech sector should be unconcerned about, or dismissive of, the views being expressed by Judge Sweet and the plaintiffs. Indeed, there are many reasons why gene diagnostic businesses should sit up and take notice. The gene patent controversy is not going away; in fact, it is more likely to intensify.

Unrest about gene patents is spreading. In the Myriad case, physicians, patients, clinical geneticists and citizens’ groups all came together to challenge the biotech company. This is an indication not only of dissatisfaction about Myriad’s overzealous pursuit of intellectual property (IP) rights but also of a broader distaste for the way gene inventions have been, and are being, exploited.

The Myriad plaintiffs were joined by the International Center for Technology Assessment, Greenpeace, the Indigenous Peoples’ Council on Biocolonialism and the Council for Responsible Genetics. These ‘friends of the court’ argued that gene patents have negative consequences, such as the privatization of genetic heritage, the creation of private rights of unknown scope and consequences and the violation of patients’ rights. The alignment of physicians’ and patients’ groups with what are, in effect, antibiotech lobbyists is a worrying development.

Broader concerns about gene patents, exclusive licensing and aggressive IP infringement strategies are finding an echo within research. It often seems unfair that the patent system rewards only the last inventive step: the small breakthrough that enables a concept to be realized. The research enterprise, which continually renews itself, especially in rapidly moving areas like genetics, is increasingly at odds with the commercial conservatism of patent monopolies based on gene findings that are obsolescent compared with current art. Despite both cultural and economic incentives for innovation, the difficulty in dislodging incumbent approaches is reinforced by a patent system that insists that any use, however small, of a protected method is infringement. Is it so outrageous to expect that a properly functioning IP system could provide an unobstructed path to the market both for the initial innovators and for subsequent improvers? Surely, a different balance of rights is possible that better serves the society with whom the patent bargain has been struck.

In this regard, Myriad’s influence has been particularly pernicious. Its lawyers have issued cease-and-desist letters to genetics laboratories in universities, hospitals and clinics that offered diagnostic services based on the BRCA1 and BRCA2 genes. Its monopoly thus enforced, Myriad continues to charge around $3,000 per patient for the tests, a price that is difficult to afford and which richly offsets operational costs: Myriad’s fiscal 2009 results show $326 million in revenue from molecular diagnostic testing against $43 million in costs. The technical discomfort with Myriad, and perhaps the popular objections too, reflects not specific malice directed at one company but a more general sense of disconnectedness between invention on the one hand and the availability of improved gene test products on the other.

The more important general point is the perceived impasse between patents that make claims on the use of individual DNA sequences and new diagnostics that look at many different sequences simultaneously. In the United States, the Secretary’s Advisory Committee on Genetics, Health, and Society (SACGHS) has addressed this problem in a report, Gene Patents and Licensing Practices and Their Impact on Patient Access to Genetic Tests. The report, which is currently being finalized, concludes that patented sequences would be infringed not only by microarray and microbead methods using equivalent probes, but also by whole genome sequencing methods. As a solution, the SACGHS report proposes that the pooling of patents or clearinghouses for royalty collection might serve as machetes to allow innovative companies to hack a way through the patent thickets.

Patent pooling and clearinghouse mechanisms are probably not going to emerge in biotech of their own accord. History tells us that, with one exception (the patent pool for Golden Rice), the life sciences have got through thus far without them. It will therefore probably take some form of government or legal coercion to get things moving for gene tests.

As we move from single-gene tests to multiple-gene signature testing and whole genome sequencing, it might also be possible to assign rights according to the importance of any specific gene sequence in the utility of the test. Such a principle, instead of rewarding companies that managed to surround the early gene mutant discoveries (which now look rather trivial) with an impenetrable wall of IP, would incentivize those who continue to develop tests of high medical value with commensurate financial remuneration. That this ideal is implausible within the current petrified patent system and commercial infrastructure doesn’t have to stop the dream, and certainly shouldn’t stop the discussion.


<strong>new</strong>sin this sectionBiotech afterhealthcare reformp385Abbott wins bid forFacet p387Protests overChina’s Bt ricep390Biomarker-led adaptive trial blazes a trail in breast cancer© 2010 Nature America, Inc. All rights reserved.A breast cancer screening study that pairsoncology <strong>the</strong>rapies with biological markers(biomarkers) launched by a consortium ofpublic health agencies, academics <strong>and</strong> companiesis being heralded as a milestone in clinicaltrials. The I-SPY 2 TRIAL, which involves 20US cancer centers, will follow an adaptive trialdesign that promises both time <strong>and</strong> cost savings.Researchers will use genetic or biologicalmarkers from patients to guide decisions aboutwhich drug c<strong>and</strong>idates may be most effectivefor specific types of breast cancer. “The hypo<strong>the</strong>sishere is that one size does not fit all,” saysJanet Woodcock of <strong>the</strong> US Food <strong>and</strong> <strong>Drug</strong>Administration (FDA), one of <strong>the</strong> trial’s manycollaborators.The I-SPY2 breast cancer trial will use MRIimaging <strong>and</strong> genetic biomarkers to screen <strong>and</strong>rapidly identify <strong>the</strong> most promising agents.Pharma companies will donate twelve drugc<strong>and</strong>idates over <strong>the</strong> next five years to <strong>the</strong> program.MEHAU KULYK / SCIENCE PHOTO LIBRARYThe I-SPY 2 TRIAL (investigation of serialstudies to predict your <strong>the</strong>rapeutic response withimaging <strong>and</strong> molecular analysis 2), coordinatedunder <strong>the</strong> auspices of <strong>the</strong> Foundation for <strong>the</strong>National Institutes of Health (NIH; Be<strong>the</strong>sda,MD) Biomarkers Consortium, is adaptivein design—researchers will use informationfrom one set of participants to make informedmodifications as <strong>the</strong> study progresses. 
“We arelearning who is benefiting, <strong>and</strong> we modify <strong>the</strong>r<strong>and</strong>omization to go in that direction,” says biostatisticianDonald Berry, head of <strong>the</strong> Divisionof Quantitative Sciences at MD AndersonCancer Center, in Houston, <strong>and</strong> one of <strong>the</strong> I-SPY2 TRIAL’s principal investigators. Participantsinclude several academic groups, <strong>the</strong> NationalCancer Institute (NCI; Be<strong>the</strong>sda, MD), <strong>the</strong> FDA<strong>and</strong> drug <strong>and</strong> diagnostic companies.Although many in industry have been tiptoeingaround <strong>the</strong> biomarker question, I-SPY 2addresses it head on, using biomarkers to focuson those subjects that will benefit from treatment.The ultimate goal is to quickly pick out<strong>the</strong> best c<strong>and</strong>idate drugs worthy of testing <strong>and</strong>ultimately ramp up success rates for potentialtreatments.The initial five-year phase 2 I-SPY 2 TRIALwill compare five investigational drug c<strong>and</strong>idatesfrom Abbott Labs, of Abbott Park, Illinois;Amgen of Thous<strong>and</strong> Oaks, California; <strong>and</strong>Pfizer of New York (Table 1) with conventional<strong>the</strong>rapy. Those that earn a ‘thumbs up’ will passinto <strong>the</strong> phase 3 study, whereas <strong>the</strong> o<strong>the</strong>rs willbe dropped <strong>and</strong> <strong>new</strong> c<strong>and</strong>idate drugs will becycled in. In addition, when a drug meets <strong>the</strong>required “85% chance of succeeding in phase 3mark,” all <strong>the</strong> women in that study arm will beable to receive it.I-SPY 2 TRIAL has garnered a great deal ofattention, in large part because <strong>the</strong> design has<strong>the</strong> potential to shave several years <strong>and</strong> millionsof dollars off <strong>the</strong> drug development process. 
If successful, the adaptive trial could recast the currently dreadful state of cancer drug development, where almost three-quarters of drugs in development fail in phase 3 (Box 1). It could also change the way advanced breast cancer trials are conducted in the future.

Berry attributes the creation of the I-SPY TRIALs to “two tenacious women: Laura Esserman and Anna Barker.” Esserman, who is one of I-SPY 2’s principal investigators, is director of the Carol Franc Buck Breast Care Center at the University of California, San Francisco (UCSF), whereas Barker is deputy director of the NCI.

The I-SPY 2 TRIAL follows from I-SPY 1 TRIAL, which provided critical data on the utility of multiple molecular biomarkers and MRI in evaluating breast tumors that are treated with chemotherapy before surgery. I-SPY 1 also helped the researchers set up standard methods for collecting core biopsy material for measuring and evaluating gene expression profiles, and for MRI-based tumor evaluation, as well as other processes. In brief, it created the infrastructure to ensure accurate and consistent data collection, capture and sharing to launch the more ambitious trial.

The new trial goes much further. Participants must have large aggressive tumors, which are typically extremely hard to treat. The adaptive design, which uses Bayesian statistical methods, will allow the researchers to more quickly determine if a therapy is working or not.
All trial participants will receive standard therapy (chemotherapy with or without Herceptin depending on their HER2 status) before surgery. Some of the women will also receive investigative agents at that point. This will allow the researchers to measure the tumor and track its response. Esserman estimates this could shave years off the study’s length. “Typically we start studying these agents in women with metastatic disease,” she says. That requires 2–3 years, followed by another 5–10 years before results come in from studies in the adjuvant setting. It therefore takes a very long time for a good drug to reach the widest range of patients it can benefit.

Before entering I-SPY 2, all women will have a core biopsy and a MammaPrint diagnostic test from Amsterdam-based Agendia to determine whether they are at high risk for tumor recurrence (and whether they are eligible for the trial). The MammaPrint test comprises a 70-gene expression profile signature for identifying breast cancer patients at low risk of developing distant metastasis (J. Clin. Oncol. 26, 729–735, 2008).


Box 1 Segmenting lung cancer patients

Just a couple of weeks after I-SPY 2 TRIAL’s launch, another MD Anderson team released the results from a similar trial searching for links between promising therapies and putative biomarkers in non-small cell lung cancer. On April 15, at the 101st Annual Meeting of the American Association for Cancer Research in Washington, DC, principal investigator Edward Kim presented the results of the BATTLE (biomarker-integrated approaches of targeted therapy for lung cancer elimination) phase 2 trial, in which an adaptive randomization approach was used to match four drugs to biomarkers in the tumors of 255 stage-4 non-small cell lung cancer patients who had received between one and nine previous treatments.

The search for effective targeted therapies has been particularly challenging in lung cancer. “Not a week goes by when you don’t hear about another failed lung cancer trial,” says Roy Herbst, chief of Thoracic Oncology at MD Anderson and one of the BATTLE researchers. Frustrated and dismayed by this trend, the researchers designed BATTLE, which began enrolling patients three years ago.

The study mandated a fresh core needle biopsy from each participant to study 11 biomarkers possibly linked to drug response. All of the participants had failed to respond to an initial therapy and were assessed 8 weeks after beginning the new therapy because of the disease’s typically rapid progress.
Several biomarkers were identified that are significantly correlated with response to one or more of the drugs, including epidermal growth factor receptor (EGFR) mutations for OSI Pharmaceuticals’ Tarceva (erlotinib), positive cyclin D1 immunohistochemistry and EGFR fluorescent in situ hybridization amplification for Tarceva and chemotherapy bexarotene, overexpression of vascular endothelial growth factor receptor 2 for London-based AstraZeneca’s Zactima (vandetanib), and absence of EGFR mutation or high polysomy for Nexavar (sorafenib), marketed by Bayer of Leverkusen, Germany. Patients with KRAS mutations also tended to do better on Nexavar.

For such a challenging disease, it is an astonishingly rich set of data. Now BATTLE II, a phase 2 trial, will be launched using adaptive randomization but testing combinations of drugs, rather than single agents. The arms of the trial will include even more novel agents, including AKT and MEK inhibitors. The goal is to find strongly predictive biomarkers for lung cancer therapy.

MA

According to Laura Van’t Veer, chief research officer at Agendia who is taking a position at UCSF to work on the I-SPY 2 TRIAL study, “MammaPrint is the only FDA-approved test for this.” Tumors will also be analyzed using Agendia’s DiscoverPrint whole genome expression test (N. Engl. J. Med. 347, 1999–2009, 2002). These expression profiles will later be analyzed for how they correlate to the patient’s response to treatment.
During the trial, numerous investigative biomarkers of several types will also be included, as well as imaging techniques.

Table 1 I-SPY 2 TRIAL’s first breast cancer candidate drugs
Drug name | Company | Target/mechanism
ABT-888 (veliparib) | Abbott Laboratories | Poly(adenosine-diphosphate-ribose) polymerase (PARP) inhibitor. PARP is normally involved in DNA repair, but cancer cells can use it to their advantage.
AMG 655 (conatumumab) | Amgen | A human mAb that induces apoptosis in cancer cells by binding TRAIL (tumor necrosis factor–related apoptosis-inducing ligand) receptor 2.
AMG 386 | Amgen | A ‘peptibody’ Fc fragment linked to a peptide that inhibits the pro-angiogenic factors angiopoietin-1 (Tie-2) and angiopoietin-2.
CP-751871 (figitumumab) | Pfizer | Insulin-like growth factor receptor (IGFR) inhibitor. IGFR has multiple effects on tumors.
HKI-272 (neratinib) | Pfizer | A pan-ErbB small-molecule drug. Inhibits the HER2 kinase.

Although multiple investigative drugs will be studied at once, each of these compounds has a unique mechanism of action (Table 1) so that competitors aren’t going head to head. Why should pharma and biotech allow their drugs to be tested this way? “Companies are excited about an approach that can bring down the time and cost it takes to evaluate their drugs,” Esserman says. Each drug will emerge from the trial with qualifying biomarkers. If some drugs drop out due to failure, new ones can be added to replace them thanks to a novel master investigational new drug application that the FDA has granted the investigators.
The data from the trial will be shared and made public through a database.

To some observers, it seems it has taken a very long time to reach this point. The FDA and others have for years been calling for greater use of Bayesian approaches, particularly in oncology (Nat. Rev. Drug Discov. 5, 3, 2006). Streamlining clinical trials and adaptive designs was one of the focal points of the FDA’s 2006 Critical Path Initiative, the agency’s blueprint for improving development, manufacture and oversight of FDA-regulated products.

But changing the way a clinical trial is conducted is a daunting task. This is because modifying an ongoing trial’s features is exactly the kind of thing that gets sponsors into trouble with regulators. “Looking at subsets can be very dangerous,” Berry says. And yet looking at response rates within subsets early can substantially change a trial’s outcome. He credits advances in statistical software and biomarker development as two things that have increased the uptake of Bayesian approaches.
“Don Berry is probably the leading authority on adaptive design,” Esserman says.

In a recent review, Berry and colleagues ascertained that 20% of nearly 1,000 protocols used at MD Anderson had Bayesian features (Clinical Trials 6, 205–216, 2009). Unfortunately, the trend has not spread much further afield, and as one observer has written, “While there are certainly some at other centers, the bulk of applied Bayesian clinical trial design in this country is largely confined to a single zip code” (Clinical Trials 6, 203–204, 2009).

MD Anderson researchers have not only pioneered the method, they have left plenty of bread crumbs for anyone who wants to follow them. In their recent article, Berry and colleagues included case studies for particular applications of the Bayesian approach and have also made available software for trial design using the method.

So is this a tipping point for adoption of Bayesian trial design? According to Gary Gordon of Abbott Laboratories, “It could be a turning point, but people will be looking at every novel aspect of this trial and seeing how it actually turns out. The better things work, the more people will follow suit.” Meanwhile, many observers, including patient advocates, are encouraged. “This is a clear sign of progress,” says Frank Burroughs of the Abigail Alliance. “It’s exactly the kind of modern scientific and statistical tools that have been lacking.”

Malorye Allison, Acton, Massachusetts
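For readers who want a feel for what “modifying the randomization” toward the arms that appear to be working could look like, here is a minimal Python sketch. It is an illustration under stated assumptions, not the I-SPY 2 or BATTLE algorithm: the function names, the uniform Beta(1, 1) priors, the interim counts and the use of “probability of beating control” as the graduation score are all hypothetical. The trial’s actual 85% mark refers to predicted success in phase 3, which is a different and more involved calculation.

```python
import random


def prob_beats_control(arm, control, draws=20000, seed=0):
    """Estimate P(arm response rate > control response rate).

    arm and control are (responders, non_responders) tuples. Each rate
    gets a uniform Beta(1, 1) prior (an assumption of this sketch), so
    its posterior is Beta(1 + responders, 1 + non_responders). The
    probability is estimated by Monte Carlo sampling from both posteriors.
    """
    rng = random.Random(seed)
    arm_resp, arm_non = arm
    ctl_resp, ctl_non = control
    wins = 0
    for _ in range(draws):
        p_arm = rng.betavariate(1 + arm_resp, 1 + arm_non)
        p_ctl = rng.betavariate(1 + ctl_resp, 1 + ctl_non)
        if p_arm > p_ctl:
            wins += 1
    return wins / draws


def allocation_weights(arms, control):
    """Randomization weights proportional to each arm's chance of beating control."""
    scores = {name: prob_beats_control(counts, control)
              for name, counts in arms.items()}
    total = sum(scores.values())
    if total == 0:
        # No arm looks better than control: fall back to equal allocation.
        return {name: 1 / len(arms) for name in arms}
    return {name: score / total for name, score in scores.items()}


# Hypothetical interim data: (responders, non-responders) per arm.
control = (5, 15)
arms = {"drug_A": (12, 8), "drug_B": (6, 14)}

# New patients are more likely to be assigned to the arm that looks better.
weights = allocation_weights(arms, control)

# Illustrative stand-in for the article's 85% mark (not the trial's real rule).
GRADUATION_THRESHOLD = 0.85
graduates = [name for name, counts in arms.items()
             if prob_beats_control(counts, control) >= GRADUATION_THRESHOLD]
```

In this toy setup the weights shift toward the arm with the stronger interim response, and only an arm whose estimated probability clears the threshold is flagged. A real design would also condition on biomarker subtype and would be validated by simulation before enrollment.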


Biotechs adjust to new landscape as US healthcare reform takes off

Even before the political uproar surrounding the passage of the Patient Protection and Affordable Care Act (PPACA) in March had subsided, biotech-industry watchers were applauding the passage of the historic health care bill. Among the favorite measures in the legislation are generous exclusivity terms for innovative therapeutics within a newly drafted pathway for biogenerics, a lucrative tax credit for eligible smaller companies developing therapeutics, and a substantial boost (30 million or more) in the number of potential clients for biotech therapeutics due to the expansion of health insurance to so many more Americans.

“The health care reform bill…includes key provisions that will lead to new and improved treatments, cures and cost-savings for patients, while driving job growth in our industry and maintaining our nation’s global leadership in biotech innovation,” says Jim Greenwood, president of the Biotechnology Industry Organization (BIO) in Washington, DC. Peter Pitts, president of the Center for Medicine in the Public Interest (CMPI) in New York, agrees: “This legislation will have a huge impact on biotech companies, the most affected of any industry.”

However, some observers balk at sizing up the impact of the new legislation too quickly.
“There are too many unresolved variables to know whether the position of the biotech industry will be improved under the new health care law,” says Gregory Conko of the Competitive Enterprise Institute (CEI) in Washington, pointing to several “ambiguities over how various provisions will be implemented.”

What’s more, Conko continues, despite the expanded market, “the industry will be overtly penalized by the addition of a tax on pharmaceutical manufacturers, starting at $2.8 billion in 2012, peaking at $4.1 billion in 2018 and then falling again to $2.8 billion annually. And several new cost-cutting programs in the Department of Health and Human Services could result in much lower sales prices.”

[Photo caption: President Obama signs the most sweeping social legislation in decades. The Patient Protection and Affordable Care Act, enacted on March 23, will ensure coverage for almost all Americans.]

That tax on pharmaceuticals reflects an early deal that the Pharmaceutical Research and Manufacturers of America (PhRMA) in Washington forged with Congress and the Administration over healthcare reform. The deal includes, among other matters, a provision to reimburse the government for costs falling within the widely scorned ‘donut hole.’ The donut hole is the term used to describe a coverage gap in the 2003 Medicare Part D health plan for prescription drugs. Many seniors find that they are initially reimbursed for drug expenses up to a certain limit, but on reaching the ‘donut hole’ are left responsible for drug costs until expenses reach the higher catastrophic coverage threshold. For example, in 2009 (reimbursement limits change yearly), Medicare paid for drugs for seniors through the first nearly $2,700 outlay, but then individuals paid out of pocket until a second tier of drug benefits kicked in for costs exceeding about $6,100. In that year, the three top-selling biologic drugs under Part D were Enbrel and Remicade for autoimmune diseases, and the anti-cancer agent Avastin, according to a report by La Merie Business Intelligence.

“The donut hole has been a thorn in the side of seniors,” says Boston-based Glen Giovannetti, global biotech leader for Ernst & Young. As part of a deal to remove that thorn, PhRMA agreed to phase in price reductions and close the donut hole for seniors by paying a special tax for several years, he adds. The impact on companies’ balance sheets is proving hard to fathom. “The excise tax kicks in, and it’s a weird formula that has companies trying to figure out when it hits their PNLs [profits and losses].
The tax, which is based on total share of drugs sold to the government in the prior year, means companies have to pay to play.” He estimates that, overall, the effects of this tax will probably swing positive by 2014 because by then so many more people will be covered by insurance, making up in volume what will be lost in the short term to the new tax and reduced prices.

But Giovannetti and others say that these estimates are crude at best. On the bright side, the biogenerics provisions in PPACA guarantee 12 years of exclusivity to innovator companies for their products and also prohibit manufacturers of follow-on products from using brand names of original products, Pitts of CMPI says. This latter provision is “good for the industry” because it means that the innovator companies “can still make money,” even past those 12 years of exclusivity. Thus, he predicts that many physicians will continue prescribing original brand-name products, particularly if price differentials with biogenerics remain low.

“A lot of companies are salivating at the possibility of biosimilars,” says Washington-based Thomas Sullivan, the founder of the website Policy and Medicine (P&M) and president of Rockpointe in Columbia, Maryland. “But [companies] will have to prove they work the same, and they will be like a sub-branded category.”

Certainly, the current biosimilar pathway has received a less-than-lukewarm reception from traditional generics manufacturers.
The Generic Pharmaceutical Association (GPhA) in Arlington, Virginia, near Washington, calls these provisions “a biogeneric pathway in name only,” and says it gives “false hope to patients who desperately need access to life-saving biogeneric medicines.” GPhA also calls the legislation a “missed opportunity to inject real pharmaceutical cost containment into the US healthcare system” and claims that the new law “locks down indefinite brand product monopolies at a deep cost to patients and taxpayers.”

Another provision in PPACA is the therapeutic discovery tax credit, which according to BIO’s Greenwood could prove “critical” to biotech companies. This new $1 billion program is aimed at research-intensive, small biotech companies, providing them with tax credits equal to 50% of investments in qualified therapeutic discovery projects for 2009 and 2010.

Giovannetti of Ernst & Young calls this tax credit provision a “big win” for firms with fewer than 250 employees. In terms of qualifying for the credit, he says, “There’s not a lot of detail because the criteria are being developed. But companies are very interested, contacting us to learn how to queue up with applications.” Importantly, he adds, unlike an earlier federal measure set up to stimulate the energy sector, this measure steers credit away from large, established corporations and toward “emerging companies. It’s a big win when capital is so tight.”

[Photo credit: AFP Photo/Saul Loeb]


In brief

Genentech, UCSF discovery pact

[Photo caption: UCSF’s Susan Desmond-Hellmann spent 14 years at Genentech.]

Genentech and the University of California, San Francisco (UCSF) announced in February a drug discovery partnership, a union they proclaim is a new model for industry-academic relationships. The deal, which focuses on neurodegenerative diseases, goes beyond providing funds for several groups from the Small Molecule Discovery Center (SMDC) at UCSF. The company is offering the university up to $13 million in development and commercial milestone payments and a share in any resulting royalties. Genentech, of South San Francisco, California, and SMDC scientists will pursue target pathways selected from lines of research on both sides, and the deal builds on their 2005 master agreement that put guidelines in place for future collaboration (it has so far facilitated 15 standard research agreements). The SMDC, which assists UCSF researchers in drug discovery, has a strong industrial bent: it is equipped to perform high-throughput assays and has a library of more than 180,000 compounds. The center also offers experience, as it houses a dedicated core of medicinal chemists and biomedical researchers, many of whom have industrial training in analyzing and advancing hits to lead compounds. The collaboration will perhaps serve as a boost for the San Francisco area after Pfizer pulled out of its Biotherapeutics and Bioinnovation Centre in Mission Bay recently.
The group slated to work with Genentech is “staff, not students or postdocs,” says SMDC director Jim Wells, and it is this expertise that sets the relationship apart from typical collaborations with academic labs. Wells also said that this partnership resembles a “biotech to pharma” arrangement, with the two teams working side by side and having a healthy amount of scientific exchange. “It’s not, ‘You do what they say and that’s it’,” he explains. “And it’s not like you have an asset that you sell off and never see again. There’s real involvement, real give and take.” Those are reasons enough for choosing SMDC for what might be a pilot program that Genentech could duplicate elsewhere, but there are others: Wells worked as a protein engineer at Genentech for 16 years, and the chancellor of UCSF is Susan Desmond-Hellmann, previously Genentech’s president of product development.

Jennifer Rohn

Box 1 Threats to reform: could the Act be struck down?

Serious opposition to the Patient Protection and Affordable Care Act (PPACA) comes at two principal levels. First, in the Congress, the Republican leadership, including Senate minority leader Mitch McConnell of Kentucky and House minority leader John Boehner of Ohio, continues to inveigh against healthcare reform. “We’ve fought on behalf of the American people this week, and we’ll continue to fight until this bill is repealed and replaced with common-sense ideas that solve our problems without dismantling the health care system we have and without burying the American Dream under a mountain of debt,” McConnell said in March.
Similarly, Boehner said, “Let’s repeal this jobs-killing government takeover of health care and start over with common-sense reform to lower health care costs and help small businesses create jobs.”

Elsewhere, individual members of the House and several coalitions of Senators or Representatives, all Republicans, have introduced bills seeking to repeal PPACA, though these are symbolic rather than realistic. In the near term, Republicans lack the votes necessary to enact a repeal, which also would need to withstand a presidential veto. The outcome of general elections next November is expected to shift the political balance in Congress, but by how much no one can say.

The second level of serious opposition comes from 14 state Attorneys General (those of Alabama, Colorado, Florida, Idaho, Louisiana, Michigan, Nebraska, Pennsylvania, South Carolina, South Dakota, Texas, Utah, Virginia and Washington) who have filed lawsuits challenging PPACA on both practical and constitutional grounds. Predicting the outcome of these legal challenges remains impossible, although some experts in constitutional law argue that the American Civil War set the standard for states heeding federal statutes.
In any case, no radical change is expected anytime soon, and the more time available for PPACA to become a practical reality, the less likely it is to remain a hot issue (unless, of course, some of the more dire predictions about its ill effects become a part of that reality).

JF

Additionally, PPACA authorizes the Cures Acceleration Network (CAN), which is intended to help National Institutes of Health (NIH)-funded researchers bridge the gap between basic research and commercial development of treatments, according to Ellen Dadisman of BIO. “This provision also will help expedite Food and Drug Administration (FDA) review of highly innovative safe and effective treatments for patients,” she says. “If funded, CAN would significantly enhance the quality of health care for the American people by speeding up our ability to transition research originating from NIH.”

Yet, along with these benefits for researchers and innovative companies, there could be some heavy lifting in store, says Giovannetti of Ernst & Young. Therapeutic agents “will need to be as good or better and also cheaper,” he says. “We’re seeing this in collaborations’ milestones being set between pharma and biotech companies. Safe and effective might not be good enough; a product also has to be seen as gaining reimbursement [status].
Over the long term, this should play well for biotech companies that are truly innovative.”

Of course, just how or whether reimbursement practices change, particularly with an aim of curbing costs, is one of the uncertainties embedded in healthcare reform. And comparative effectiveness research will surely be part of this new equation, according to Conko of CEI. “It’s not obvious how programs like the new Independent Payment Advisory Board for Medicare will work, how it and other programs will internalize comparative effectiveness research results from the new Patient-Centered Outcomes Research Institute, or what effect the ‘value-based purchasing’ program or the pilot programs for ‘bundling’ payments will have on drug and biologics prescribing,” he says.

Yet another potential drag on innovation included in the healthcare reform is stringent reporting requirements for physicians and others who consult with industry, according to Sullivan of P&M. These are not “restrictions” as such, but the “paperwork will be burdensome,” he says. “It doesn’t stop people from consulting, but regulators will want to know exactly what it looks like, and it may have some effect on biotechs when investment firms can see who all the consultants are.”

Says Pitts of CMPI, “Industry lobbied hard for a good bill, but this bill is flawed in so many ways” (Box 1).
However, he adds, “It’s time to realize that it’s no longer just about selling drugs, but for providing healthcare, and companies must walk the walk.” Nonetheless, Sullivan says, the biotech industry can “look forward to having more patients who can afford treatments, especially for orphan diseases. And dropping insurance caps will totally help the industry as well as patients and their families.”

Jeffrey L Fox, Washington, DC


Abbott outbids Biogen for Facet’s multiple sclerosis antibody

Abbott Laboratories has made a bid for a slice of the multiple sclerosis (MS) market, through its $450 million cash acquisition of Facet Biotech. The deal, announced in March, gives Abbott a stake in Zenapax (daclizumab), a potential MS treatment poised to move into phase 3 testing, as well as a portfolio of early and mid-stage cancer compounds. But for the Abbott Park, Illinois–based pharma, the move seems more like a toe-in-the-water exercise than a headlong plunge.

The scale of the transaction is minuscule when set against Abbott’s recent €4.5 billion ($6.2 billion) acquisition of Brussels, Belgium–based Solvay Pharmaceuticals or its $6.9 billion purchase of Ludwigshafen, Germany–based Knoll Pharmaceuticals in 2000. The latter deal, which gave it ownership of Humira (adalimumab), has paid off handsomely: the tumor necrosis factor alpha (TNF-α) inhibitor racked up around $5.5 billion in sales last year. The Facet purchase even appears relatively modest compared with the $170 million (plus another potential $20 million in milestones) Abbott lavished on a single phase 1 antibody, a nerve-growth-factor inhibitor called PG110, which it acquired from PanGenetics, of Utrecht, The Netherlands, last year.

Nevertheless, Abbott’s $27 per share offer substantially trumped the $17.50 offered by Facet’s development partner Biogen Idec, of Cambridge, Massachusetts. In return, Abbott is getting partial ownership of a clinical pipeline, for which Biogen Idec and New York–based Bristol-Myers Squibb also have substantial claims, plus a set of protein engineering capabilities for optimizing antibody performance (see Table 1).
It is not, however, getting its hands on a portfolio of lucrative antibody humanization patents held by PDL BioPharma, of Fremont, California, which spun out Redwood City, California–based Facet in December 2008. Whether Abbott’s investors have obtained good value for their money remains for now an open question (see Box 1).

[Photo caption: Facet’s daclizumab, an anti-IL-2 receptor monoclonal antibody, is considered the main driver in the deal between Abbott and the biotech firm, whose Redwood City, California headquarters are pictured. Photo: Facet Biotech]

Although neither Abbott nor Facet officials were available for comment, Zenapax, which is about to start a phase 3 trial in MS, is generally regarded as the main driver for the deal. A humanized monoclonal antibody that blocks interleukin-2 (IL-2) signaling by binding to the alpha subunit (CD25) of the IL-2 receptor (IL-2R), it gained FDA approval for preventing kidney transplant rejection back in 1997. Basel, Switzerland–based Roche sold the drug as Zenapax, but withdrew it from the market in 2003 for commercial reasons. Novartis, also of Basel, continues to market Simulect (basiliximab), a chimeric antibody directed at the same target and indication that was approved in 1998. Daclizumab has also been tested extensively in other indications involving abnormal T-cell responses, including the inflammatory


in brief

FDA crackdown on Genzyme

Genzyme’s Allston Landing Facility in Massachusetts, one of the world’s largest cell culture manufacturing plants, has become the focus of an enhanced enforcement action in what is perhaps a sign of an increasingly tough stance at the US Food and Drug Administration (FDA) on manufacturing standards. The action, announced in March, has led to a draft consent decree from the FDA that requires Genzyme to pay a $175 million “up-front disgorgement of past profits,” the company said. If the Allston plant continues to miss deadlines for domestic and exported products, the draft also calls for an 18.5% disgorgement of revenues from products produced and distributed from the plant, and it could include heavy fines ($15,000 per day per violation) if overall cGMP compliance is not met in coming years. The 185,000-square-foot Allston facility produces Genzyme’s therapeutic enzymes for rare genetic diseases, products that bring in more than one-third of Genzyme’s $4.5 billion in annual revenues. A February 2009 warning letter from the FDA and several ‘483 citations’ (formal notices to a manufacturer of a violation) have documented problems at the plant that impact product quality and show a lack of written procedures, training, system maintenance and environmental testing.
Genzyme, based in Cambridge, Massachusetts, has responded to the latest FDA action by bringing in The Quantic Group, a Livingston, New Jersey–based quality consulting firm, and moving its fill and finish operations to Hospira, a contract service company in Lake Forest, Illinois. In February, it also hired Scott Canute, formerly of Indianapolis, Indiana–based Eli Lilly, as president of global manufacturing and corporate operations. This followed the recruitment in January of Ron Branning (formerly with Gilead Sciences of Foster City, California) as senior vice president of global product quality. Until two years ago, FDA personnel had regularly inspected the Genzyme facility and had no complaints. It was only after a new inspector began to tour the facility that things changed. “It was like night and day,” says a person familiar with the situation, who spoke to Nature Biotechnology on condition of anonymity. “Initially, the company didn’t know what to think or how to respond.” Genzyme’s response took too long and fell short of the FDA’s expectations. The FDA’s move toward greater oversight and more stringent adherence to GMP is possibly the result of criticisms leveled following the heparin contamination debacle (Nat. Biotechnol. 26, 589, 2008) and other food and drug safety problems.
In the 2010 budget, the agency received an increase of more than a half-billion dollars, up to $3.2 billion, with an emphasis on improving product safety.
Keith L Carson

Box 1 Weighing up the bids

Although Biogen Idec may technically be viewed as the underbidder on the Facet deal, the jury is out on whether its valuation of Facet’s assets was more accurate than Abbott’s. “Time will tell whether Biogen Idec was offering too little or too much at $17.50 [per share],” says Eric Schmidt, biotech analyst at Cowen and Company in New York. “Many of us are surprised that Abbott bid so much more than Biogen Idec because they [Biogen] have the inside track here,” he says. “If I were an outside observer, I would certainly trust Biogen Idec’s view of this asset because they know this drug better, and they know this market better.” Schmidt dismisses any suggestions that Biogen management was discouraged from bidding any higher because of the attentions of investor Carl Icahn, who has, up until recently, been pushing for a sale of the Cambridge, Mass.–based company or its division into two separate firms, focused on neurology and oncology, respectively.

Instead, Schmidt interprets Biogen’s decision not to raise its bid beyond its final offer as simply an example of management maintaining its financial discipline. “I think it’s kind of refreshing,” he says. Conversely, Bret Holley, biotech analyst at Oppenheimer & Company in New York, believes Biogen might have been taking another approach: trying to pull off an acquisition at a heavily discounted price.
“I think Biogen was trying to steal Facet on the cheap because of its cash position.”

Schmidt is also unconcerned about the current safety problems besetting ocrelizumab, a next-generation successor to Rituxan (rituximab), which Biogen Idec is co-developing with Roche, of Basel, Switzerland. On March 8, the two firms announced a clinical hold on phase 3 trials of the anti-CD20 antibody in rheumatoid arthritis and lupus erythematosus, following the observation of serious and, in some cases, fatal infections in patients. A phase 2 trial in multiple sclerosis is ongoing, however. “No one cares about ocrelizumab,” says Schmidt. Although the drug has the potential to extend or replace Biogen Idec’s Rituxan franchise (which it also shares with Roche), its share of the profits would be lower. Termination of ocrelizumab’s development is unlikely to have major negative consequences, therefore. “It could [even] be a positive,” says Schmidt.

Biogen Idec’s biggest issue lies elsewhere. “The principal concern and really the principal variable is Tysabri, and what they can do in the face of mounting PML [progressive multifocal leukoencephalopathy] cases,” says Holley, adding that he is sceptical of the value of the viral assay that Biogen Idec and its partner Elan, of Dublin, are promoting to reduce the risk of patients on Tysabri developing PML.
CS

eye condition uveitis, T-cell leukemia, human T-cell lymphotropic virus (HTLV)-1 associated myelopathy/tropical spastic paraparesis, asthma and chronic immune thrombocytopenia.
But its biggest commercial potential lies in MS, says Thomas Waldmann of the National Cancer Institute, in Bethesda, Maryland. Back in 1981, Waldmann produced a murine predecessor to Zenapax, anti-Tac, and along with his National Institutes of Health (NIH; Bethesda, Maryland) colleagues has built up a substantial body of clinical evidence on Zenapax in multiple indications (J. Clin. Immunol. 27, 1–18, 2007).

In MS, the antibody was initially thought to selectively stop patients’ activated T cells, as they express high levels of the CD25 receptor subunit. Resting T cells, in contrast, rarely express CD25. Antibody binding to CD25 prevents the subsequent recruitment of the beta (CD122) and gamma (CD132) subunits of IL-2R, which are necessary for IL-2–mediated signal transduction and further T-cell activation and proliferation. However, one important line of evidence, originally put forward by Waldmann’s NIH colleague Bibiana Bielekova, suggests that the efficacy signals seen in MS patients treated with Zenapax are not due to the direct suppression of an abnormal T-cell response (which is generally considered to be the main pathological feature of the condition). Instead, administration of the antibody appears to result in an expansion of immunoregulatory CD56-bright natural killer (NK) cells, which then suppress the activated T-cell population (Proc. Natl. Acad. Sci. USA 103, 5941–5946, 2006). The precise details of how CD25 inhibition stimulates CD56-bright NK cell growth, however, are not clear.
“The issue of how daclizumab works is a continuing story,” Waldmann says.

Market expectations surrounding the drug appear modest, notwithstanding recently reported efficacy data from a phase 2 trial in which the drug was administered in combination with interferon-beta (interferon-β; Lancet Neurol. 9, 381–390, 2010). Patients given high-dose Zenapax plus interferon-β developed 72% fewer new lesions than those on interferon alone. “It’s fairly easy to get good efficacy data in autoimmune disease,” says Eric Schmidt, biotech analyst at Cowen and Company in New


Table 1 Facet Biotech pipeline
Drug | Description | Indication | Status | Partner
Daclizumab | Humanized monoclonal antibody that binds alpha subunit of IL-2 receptor | MS | Phase 2 | Biogen Idec
Volociximab | Chimeric monoclonal antibody that binds α5β1 integrin | Solid tumors | Phase 2 | Biogen Idec
Elotuzumab | Humanized monoclonal antibody that binds CS1 glycoprotein | Multiple myeloma | Phase 1 | Bristol-Myers Squibb (BMS)
PDL192 | Humanized monoclonal antibody that binds TweakR (tumor necrosis factor–like weak inducer of apoptosis receptor) | Oncology | Phase 1 | –
PDL241 | Humanized monoclonal antibody that binds CS1 glycoprotein | Multiple myeloma | Preclinical | BMS (a)
(a) BMS retains an option on this program.

York. “The real hurdle to drug discovery in MS has been good efficacy coupled with a good safety profile.”

“There’s very little doubt, based on its use in other indications, that it will have side effects,” says Bret Holley, biotech analyst at Oppenheimer & Co. in New York.
“It’s very tough to see how it differentiates against other MS therapies that are on the market or in the pipeline.” As clinical data are limited, the real test will be its effect on relapse rate and its long-term safety profile in a large population. But so far, Holley says, Zenapax appears to offer efficacy intermediate between that of the older, so-called ABCR drugs (Avonex, Betaseron, Copaxone and Rebif) and newer, more potent therapies, such as Tysabri (natalizumab), Gilenia (fingolimod/FTY720), which the FDA has under priority review, and cladribine, to which the FDA gave an initial rebuff last year.

in their words

Illumina reaches Hollywood
[Photo by Adam Scull, PHOTOlink]
San Diego–based Illumina has sequenced the genome of actress Glenn Close, whose family has a history of mental illness. She took advantage of the $48,000 service in the hope that it would help destigmatize the disease and aid efforts to find a cure for these ailments. Close’s husband is a biotech entrepreneur.

“The environment to launch new product…is going to be tougher, the pricing is going to be tougher, the probability (of drug approvals) is probably going to be more challenging.” Biogen Idec’s James C. Mullen, who is leaving the firm in June, paints a less-than-rosy future for biotechs after healthcare reform. (The Boston Globe, 31 March 2010)

Given the drug’s relatively mild immunosuppressive effects, it could find a niche as part of a combination therapy.
“Daclizumab has been used with lots of other immunosuppressive agents, so it might be of value there,” says Waldmann. The recent phase 2 study did not definitively show that the combination was responsible for the benefit seen, as the trial did not include a Zenapax-only arm. Moreover, some patients who developed neutralizing antibodies against interferon-β therapy still derived benefit. “There’s really been essentially no large trial that has shown that combination therapy was better than each of the components individually,” says Jeffrey Cohen, of the Cleveland Clinic, in Cleveland. Even so, Cohen also predicts that the drug could have a future, even if it’s a modest one. “We still need additional therapeutic options in MS,” he says. “Almost any additional option in our repertoire is good.”
Cormac Sheridan, Dublin

“Right now your family history may be your best bet and it doesn’t cost anything,” Francis Collins, director of the US National Institutes of Health and leader of the Human Genome Project, downplays the impact of gene-based tests such as those offered by Navigenics, 23andMe and DecodeMe. (Reuters, 31 March 2010)

“I personally believe that Becky McClain is really the canary in the coal mine.” Jeremy Gruber, from the Council for Responsible Genetics, on the recent $1.4 million in compensation awarded to a former Pfizer scientist who claimed a genetically engineered virus had caused her paralyzing illness, stresses that safety regulations have not kept up with the pace of research.
(New York Times, 2 April 2010)

“Merck is now a bigger beast to feed.” Merck’s Margaret Beer urges biotechs gathered at a recent conference in London to approach the newly expanded company, as it is still actively searching for opportunities. (PharmaTimes, 29 March 2010)

in brief

Ariad’s NF-κB blow

The US Court of Appeals for the Federal Circuit in March ruled for Eli Lilly in Indianapolis, Indiana, and against Ariad Pharmaceuticals, affirming an earlier decision by a three-judge panel and dealing a possible death blow to Ariad’s broad claims on the nuclear factor κB (NF-κB) pathway (Nat. Biotechnol. 27, 494, 2009). A 2006 jury ruling that Lilly’s Evista (raloxifene) and Xigris (activated protein C) infringed Cambridge, Massachusetts–based Ariad’s NF-κB patent alarmed much of the drug development world, stoking fears that broad patent claims on biological pathways would stifle drug development. The March opinion again invalidated Ariad’s claims and affirmed that patents must meet a written description requirement separate from an enablement requirement, an issue that has divided the appeals court since a 1997 ruling established written description, dubbed the Lilly doctrine (Nat. Biotechnol. 16, 87, 1998). Ariad is considering petitioning for Supreme Court review. But the Supreme Court has “bigger fish to fry with patentable subject matter right now,” says University of Michigan law professor Rebecca Eisenberg, alluding to Association for Molecular Pathology v. US Patent and Trademark Office (the Myriad Genetics gene patenting case, seemingly destined for Supreme Court review), and Prometheus v. Mayo, another dispute over the patentability of ‘natural processes’.
Ariad also lost an NF-κB patent infringement case against Amgen, of Thousand Oaks, California, and the US Patent and Trademark Office invalidated most of Ariad’s patent claims in a separate review (Ariad has appealed), suggesting the NF-κB patent has little life left.
Ken Garber

Orphan drug workshops

In an effort to increase the number of drugs available to treat rare diseases and to help make the US Food and Drug Administration (FDA) more approachable, the FDA is hosting a series of workshops to encourage regulatory submissions for orphan drug designation for drugs aimed at treating rare diseases. The agency’s Office of Orphan Products Development (OOPD) is holding these events to help academics, biotech companies and those unfamiliar with the process complete the best application possible. The first workshop, held in February at the Claremont, California–based Keck Graduate Institute, resulted in 14 submissions from the 29 potential sponsors who attended. Timothy Coté, director of the OOPD, explains that the workshops are “a way to demystify the process,” which is sometimes deemed to be daunting. “Sponsors approach the FDA with considerable fear and loathing. And that’s not a good thing,” he says. Though orphan drug status does not ensure a drug will be approved for sale, the designation typically helps attract investor interest and provides other benefits, such as seven years of market exclusivity and tax credits.
Coté hopes that these workshops will be the “beginning of a more candid relationship” between the FDA and potential sponsors and that they will increase the chances of rare-disease therapies reaching the clinic.
Kirsten Dorans


in brief

Texas splurges on cancer

Texas doled out the first round of grants from a $3 billion publicly funded program to boost in-state cancer research. Almost all of the initial $61 million went to in-state academic institutions like the University of Texas, Rice University and Baylor College of Medicine. Two private companies have also received money: InGeneron, a developer of cell separation and diagnostics tools based in Houston, and Visualase, a designer of precision lasers used to ablate brain tumors, also based in Houston. In order to boost the state’s private sector, the fund’s managing body, the Cancer Prevention & Research Institute of Texas (CPRIT), closed a parallel round of applications in March exclusively for companies. CPRIT hopes the money will foster a fledgling biotech industry, attract top researchers and lure new business to Texas. To show its commitment, CPRIT states that it will pay half an institutional endowment with “no limit” to draw a senior scientist. The program’s chief scientific officer, Alfred Gilman, hopes the granting process will make the state more attractive to venture capitalists. The vetting from CPRIT’s review council, made up of directors from the nation’s top cancer research centers, “should be a big vote of confidence” for potential investors, he says. CPRIT funded 66 out of 881 applications in its first round. Of the grants, two-thirds had translational components, many in genetics, epigenetics and imaging.
“We need to find young entrepreneurial CEOs who are willing to go anywhere to chase good, promising science,” Gilman says.
Daniel Grushkin

in their words

“After spending $1.4 billion of shareholders’ money, maybe it’s best for Cell Therapeutics to return what’s left to shareholders and call it a day.” David Miller, of Seattle-based Biotech Stock Research, comments on the company’s failure to persuade the US Food and Drug Administration to approve its lymphoma drug, pixantrone, despite burning through a fair amount of investors’ cash. (Xconomy, 23 March 2010)

“We’ve been fighting this war on cancer since Nixon’s time, but we’ve only had the human genome for about a decade.” Victor Velculescu, co-director of cancer biology at Johns Hopkins Kimmel Cancer Center, responds to critical comments that too many people are still dying of the disease. (Bloomberg, 16 March 2010)

Chinese green light for GM rice and maize prompts outcry

Biosafety certificates for genetically modified (GM) rice and maize issued by the Chinese Ministry of Agriculture late last year have prompted a protest from over a hundred intellectuals and prominent public officials. This represents one of the most high-profile challenges to China’s aggressive policy for the adoption of transgenic crops.
Even so, proponents of the technology say that opposition is likely neither to block the path to commercialization of GM rice nor to stall development of an approach that Chinese government officials have long recognized as a key to addressing the country’s growing demand for food.

In early March, 120 Chinese scholars (mostly in the humanities and social sciences) signed a public petition asking the Ministry of Agriculture to withdraw the two safety licenses issued last November. The petition, presented during the annual plenary meeting of China’s legislature, the National People’s Congress, was reinforced by a motion from the Zhigong Party, chaired by China’s Science Minister Wan Gang. The motion, introduced to the Chinese People’s Political Consultative Conference, China’s Upper House, urges a cautious approach to GM crop development.

[Photo caption] China’s homegrown GM rice could soon reach local markets, but critics are voicing strong concerns over the nation’s staple crop.

Over the past two decades, China has maintained a positive attitude to the development of GM organisms. Just two years ago, the country invested a colossal $3.5 billion in its GM seed program, with the intention of becoming a leading international player capable of creating its own GM crops to ensure security of the food supply. Thus far, several locally developed GM crops, including sweet pepper, papaya and poplar, have been approved and are currently sold in the country.
Bacillus thuringiensis toxin (Bt)-producing cotton is also cultivated widely in China, and the country’s own transgenic varieties of rice and maize are likely to follow within several years (Nat. Biotechnol. 28, 8, 2010). The safety licenses that triggered the recent outcry were issued for two pest-resistant Bt rice varieties (Table 1) developed by Qifa Zhang of Wuhan-based Huazhong Agricultural University (also rendered in English as Central China Agricultural University), and a maize expressing phytase developed by Yun-Liu Fan of the Beijing-based Chinese Academy of Agricultural Sciences (CAAS) that helps livestock digest phosphorus in animal feed (and that also potentially reduces pollution from animal waste).

(Photo credit: Peter Parks/AFP/Getty Images)

Table 1 Chinese GM rice varieties currently approved for field testing or under development
Trait | Variety | Stage
Pest resistance | Bt Shanyou 63 and Huahui No. 1 | Biosafety certificate issued
 | 209, Zhuanghui 11, Xiushui 11, Minghui 81, Minghui 63, IR 72, Zhongguo 91, GM Minghui 63, Minghui 86, D297B, Zhuxian B, Minghui 63, Zhenshan 97A and Maxie A | In development
Herbicide resistance | Jingyin 119, 87203, Eyi 105, Xiushui 11, Qiufeng, Youfeng and Hanfeng | In development

Such biosafety certificates provide authorization to commence field testing of a new variety; commercial release of a crop can take another five years or more of field trials. In the case of Bt rice and phytase maize, the certificates are valid from August 2009 to August 2014, during which time the crops will be planted on farmland in central China’s Hubei and Shandong provinces, respectively. Kongming Wu, a biosafety scientist at CAAS, who is a member of the National Biosafety Committee of GM Food and advises the government on the issuing of biosafety certificates, says, “The procedures with which we approve the GM rice and maize are the same as those adopted in most developed countries.” The evaluation includes environmental and food-safety testing as well as toxicology assessment undertaken by independent research institutes. Some involved in the production of GM varieties are thus somewhat puzzled by the recent outcry. “There is no established scientific evidence to prove any potential harm of GM crops to health and to environment,” says Dafang Huang, former director of the Institute of Biotechnology under CAAS.

But opponents of GM technology refuse to accept such reassurances. What’s more, there appears to be confusion about the significance of the biosafety certificates.
Critics are failing to distinguish between the green light for field-testing and the go-ahead to commercialize. Thus, the petition states “the approval for the commercialization of GM rice and maize enables China to become the world’s first country to plant a GM staple food, threatening the national safety.” But the certificates issued so far are only for field trials assessing safety; further studies would be needed before commercial release would be considered (and, in any case, China would not be the first country to plant a GM staple given that the US has been planting Bt maize for the past 15 years).

Apart from the precautionary concerns over the impact of GM varieties on human and environmental health, opponents argue that transgenic rice and maize represent a threat to small-holding farmers in China. “In the cases of commercialized GM crops, most of the benefits go to big GM seeding companies, such as Monsanto, and farmers remain losers because they have no other choices and they cannot obtain conventional non-GM seeds,” says Lifeng Fang, an anti-GM campaigner of Beijing-based Greenpeace China.

But studies assessing the benefits, especially increased yields, associated with commercialized varieties of Bt cotton and Bt maize in developing countries have overwhelmingly demonstrated benefits for small farmers (Nat. Biotechnol. 28, 319–321, 2010).
And according to the annual report of the International Service for the Acquisition of Agri-biotech Applications (ISAAA; New York), Bt rice has the potential to create an estimated benefit of $4 billion per year for up to 440 million rice farmers in China; similarly, maize engineered to express phytase could enable savings on livestock feed and reduce pollution from undigested phosphorus.

Evidence on the ground also indicates that Chinese farmers are receptive to GM technology. Since its approval in 1997, Bt cotton has been adopted to the extent that by 2009, 68% of the total cotton planted in China was transgenic. And even though this represented a slight reduction in the area of transgenic cultivation over the previous year (to 3.7 million hectares compared with 3.8 million hectares in 2008), Ruifa Hu, a senior researcher at the Beijing-based Centre for Chinese Agricultural Policies (CCAP) of the Chinese Academy of Sciences, thinks this reflects recent economic and environmental conditions rather than a cooling reception for GM technology. “It is mainly a result of lower prices for cotton that have reduced the total planting area of the crop,” he says. In addition, the cotton borer worm population, which is targeted by Bt varieties, has dropped significantly in recent years, and farmers may have opted to save money last year by planting conventional non-GM varieties.
“The normal market and fluctuation in cultivation area will not impact the future commercialization of GM rice,” Hu believes.

Additional concerns for GM varieties in China relate to admixture and outcrossing with conventional crops and to the pernicious stranglehold of Western multinationals like Monsanto and Basel-based Syngenta on intellectual property rights (IPR) covering transgenic technology. In terms of outcrossing, opponents are particularly concerned about the possibility that transgenic crops currently unauthorized for mass planting could transfer traits to conventional crops cultivated on farms or admix with them. Greenpeace reported in late March that the Bt protein had been detected in rice sold in Changsha, in southern China, through what is suspected to have been a release from Huazhong Agricultural University (Central China Agricultural University) in Wuhan. The environmental group has made similar reports repeatedly since 2005. In the European market, rice imported from China has also been found to contain Bt ingredients (http://www.nature.com/news/2006/060904/full/news060904-5.html). Zhang admits that the unintentional flow of GM rice is possible. “In 1999, when there was no strict biosafety regulation and we had poor IPR awareness, some of our GM rice seed samples may have been stolen at a national scientific achievement show. It is possible that illegal plantations of GM rice could have resulted,” Zhang told Nature Biotechnology.

Opponents say that the cultivation of unauthorized varieties of Bt rice is a sign of lax oversight, an indication that GM rice cannot be properly monitored and controlled once commercialized.
“It could pollute nearby non-GM crops by outcrossing,” says Dayuan Xue, a biodiversity professor at the Central University of Nationalities in Beijing.

Protests against the lack of transparency in the decision process are flooding the Chinese media. For instance, the Ministry of Agriculture has admitted that the biosafety certificates for GM rice and corn had actually been issued a year before their formal announcement last November. The neutrality and credibility of scientists involved in the development of GM crops is also under scrutiny. Some are even being accused of pursuing their own financial interests, an allegation that Zhang disputes: “You cannot say doing research projects is for self-interest, as we cannot profit from commercialization because the IPR belongs to the state.”

Despite increasing resistance to cultivation of GM crops in China, Huang of CAAS reveals that Chinese policymakers are likely to continue the push toward commercialization of GM rice. “Under pressure, there could be some pauses, but science should play its role,” says Huang. The ripples from China’s decisions are likely to be felt internationally. “We Asian nations are closely watching China. What China does [in GM crop commercialization], other nations will follow,” says Bhagirath Choudhary, Delhi-based ISAAA Indian national coordinator.

Hepeng Jia, Beijing


data page

Above water in Q1
Walter Yang

Biotech stocks remain buoyant, and although funding dipped compared with the preceding two quarters, 1Q10 remained above the dire levels seen last winter. Excluding US partnership monies, the industry pulled in $5.5 billion last quarter, down 31% from 4Q09. Six initial public offerings (IPOs) were completed, bringing funding for public floats up 45% to $391.7 million.

Stock market performance
The biotech indices were up >11%, whereas the Dow, S&P 500 and NASDAQ were up by only 4–6%.
[Line chart: index value by month, 12/08 to 3/10, for BioCentury 100, Dow Jones, S&P 500, NASDAQ, NASDAQ Biotech and Swiss Market.]

Global biotech initial public offerings
Six IPOs trickled in last quarter, raising a total of $391.7 million.

Amount raised ($ millions)
Region | 1Q09 | 2Q09 | 3Q09 | 4Q09 | 1Q10
Asia-Pacific | $0.0 | $0.0 | $14.8 | $50.4 | $31
Europe | $0.0 | $0.0 | $7.4 | $151 | $0.0
North America | $0.0 | $0.0 | $635 | $70 | $361

Number of IPOs
Region | 1Q09 | 2Q09 | 3Q09 | 4Q09 | 1Q10
Americas | 0 | 0 | 2 | 2 | 4
Europe | 0 | 0 | 1 | 2 | 0
Asia-Pacific | 0 | 0 | 1 | 2 | 2
Source: BCIQ: BioCentury Online Intelligence

Global biotech industry financing
The industry raised $11.6 billion in 1Q10, 49% less than in 4Q09; only IPOs were up.

Amount raised ($ billions); rows are matched to quarters by cross-checking against the IPO and venture capital figures on this page
Quarter | Partnership | Debt and other financing | Venture capital | Follow-on financing | PIPEs | IPOs
1Q09 | 4.8 | 2.4 | 1.3 | 0.5 | 0.4 | 0.0
2Q09 | 8.0 | 2.6 | 1.1 | 0.8 | 0.7 | 0.0
3Q09 | 9.4 | 2.3 | 1.2 | 2.4 | 0.6 | 0.7
4Q09 | 14.8 | 3.1 | 1.6 | 2.3 | 0.7 | 0.3
1Q10 | 6.1 | 2.1 | 1.3 | 1.3 | 0.5 | 0.4
Partnership figures are for deals involving a US company. Source: BCIQ: BioCentury Online Intelligence, Burrill & Co.

Global biotech venture capital investment
Private biotechs raised $1.3 billion in 1Q10, about the same as a year ago, but down 18% from 4Q09.

Amount raised ($ millions)
Region | 1Q09 | 2Q09 | 3Q09 | 4Q09 | 1Q10
Asia-Pacific | $38 | $0 | $6 | $9 | $24
Europe | $323 | $175 | $104 | $512 | $326
North America | $962 | $932 | $1,076 | $1,062 | $947
Source: BCIQ: BioCentury Online Intelligence

Notable Q1 deals

IPOs
Company (lead underwriters) | Amount raised ($ millions) | Change in stock price since offer | Date completed
Ironwood (JPMorgan, Morgan Stanley, Credit Suisse) | $215.6 | 21% | 2-Feb
Aveo (JP Morgan, Morgan Stanley) | $89.7 | 0% | 11-Mar
Anthera (Deutsche Bank, Piper Jaffray) | $42.0 | 0% | 1-Mar
CellSeed (Nomura) | $24.8 | –16% | 16-Mar
CorMedix (Maxim) | $13.5 | 0% | 25-Mar
CBio | $6.2 | –68% | 8-Feb

Venture capital
Company (lead investors) | Amount raised ($ millions) | Round number | Date closed
Archimedes (Novo Growth) | $98.5 | NA | 2-Mar
NGM Bio (Tichenor, Column Group) | $51.0 | 2 | 15-Mar
Alnara (MPM) | $35.0 | 2 | 28-Jan
Eleven Bio (Flagship, Third Rock) | $35.0 | 1 | 17-Feb
Genetix (Third Rock) | $35.0 | 2 | 12-Mar
Merus (Novartis, Pfizer, Bay City, Life Sciences Partners) | $30.7 | 2 | 29-Jan
Source: BCIQ: BioCentury Online Intelligence

Mergers and acquisitions
Target | Acquirer | Value ($ millions) | Date announced
Millipore | Merck KGaA | $5,600 | 28-Feb
Facet Biotech | Abbott | $722 | 9-Mar
Ception | Cephalon | $250 | 23-Feb

Licensing/collaboration
Researcher | Investor | Value ($ millions) | Deal description
Isis | GlaxoSmithKline | $1,500 | Discover and develop RNA-targeted therapeutics for rare and infectious diseases
Rigel | AstraZeneca | $1,300 | Worldwide rights to develop and commercialize fostamatinib (R788)
Transgene | Novartis | $963 | Option to obtain an exclusive, worldwide license to develop and commercialize TG4010 for cancer
Galapagos | Roche | $589 | Discover and develop treatments for chronic obstructive pulmonary disease (COPD)
Basilea | Astellas | $514 | Co-develop broad-spectrum, azole antifungal isavuconazole worldwide, excluding Japan

Walter Yang is Research Director at BioCentury.
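The headline percentages on this data page can be roughly cross-checked against the rounded values printed in the industry financing chart. The Python sketch below is illustrative only and is not part of the original page. It assumes that the chart rows "14.8, 3.1, 1.6, 2.3, 0.7, 0.3" and "6.1, 2.1, 1.3, 1.3, 0.5, 0.4" belong to 4Q09 and 1Q10 respectively (the extracted text does not label them), and the names `pct_change` and `summarize` are hypothetical helpers introduced for this example.

```python
# Rounded amounts raised ($ billions), in the chart's category order:
# Partnership, Debt and other financing, Venture capital,
# Follow-on financing, PIPEs, IPOs.
# Assumption: these two rows correspond to 4Q09 and 1Q10; the extracted
# chart text does not label its rows explicitly.
Q4_2009 = [14.8, 3.1, 1.6, 2.3, 0.7, 0.3]
Q1_2010 = [6.1, 2.1, 1.3, 1.3, 0.5, 0.4]


def pct_change(new: float, old: float) -> float:
    """Percentage change going from `old` to `new`."""
    return 100.0 * (new - old) / old


def summarize(prev: list, curr: list) -> dict:
    """Compare two quarters, with and without the partnership column."""
    total_prev, total_curr = sum(prev), sum(curr)
    # Index 0 is the partnership column; drop it for the ex-partnership view.
    ex_prev, ex_curr = sum(prev[1:]), sum(curr[1:])
    return {
        "total_curr": total_curr,
        "total_change_pct": pct_change(total_curr, total_prev),
        "ex_partnership_curr": ex_curr,
        "ex_partnership_change_pct": pct_change(ex_curr, ex_prev),
    }


if __name__ == "__main__":
    for name, value in summarize(Q4_2009, Q1_2010).items():
        print(f"{name}: {value:.1f}")
```

With the rounded chart values, this gives a 1Q10 total of about $11.7 billion, roughly 49% below 4Q09, and about $5.6 billion excluding partnerships, roughly 30% down. The small gaps from the reported $11.6 billion, $5.5 billion and 31% are consistent with rounding in the chart labels.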


news feature

How green biotech turned white and blue

Argentina has blazed a trail as one of the leading genetically modified (GM) crop producers. Can other developing countries import the seeds of its success? Lucas Laursen investigates.

This year, midway through Argentina’s 2005–2015 Strategic Plan for Biotechnology, a long-stalled update of the Seed Law circulating in Buenos Aires may finally reach the legislative floor. The current law, which facilitated the rapid boom of transgenic crops in Argentina in the 1990s (60% of Argentina’s soy crop was genetically modified for herbicide resistance within three years of the introduction of Roundup Ready soy), is a source of conflict over intellectual property rights, as it permits farmers to retain seeds without paying royalties (Box 1).

However, the meteoric rise in GM crop production was not solely a function of the Seed Law. Compatible agricultural practices in the early 1990s and a welcoming government contributed. Critics and fans alike say it’s a model from which other developing countries can learn important lessons.
Critics warn of agribusiness’s disproportionate influence on government, an influence they say has created an explosion of monoculture that jeopardizes the businesses and health of small farmers. Conversely, Argentine farmers and investors continue betting on GM varieties, arguing that the increased yields and financial returns have helped prop up the country’s ailing economy. The question now is whether other countries will continue to look to Argentina as a role model in the adoption of GM crops.

Fertile ground

Moisés Burachik, a senior scientist at the Buenos Aires-based National Commission for Agricultural Biotechnology Assessment (CONABIA) and part of a team responsible for assessing the risks of GM crops, worked through his recent summer vacation to get through a backlog of applications. Together with his counterparts at the National Service for Food Health and Quality (SENASA, Buenos Aires), who study the impact of new products on human health, Burachik has a growing to-do list and brimming calendar.

Burachik is proud of the group’s performance in enabling Argentina’s biotech boom, but he is concerned that understaffing and outdated regulations are holding back field trials and commercialization. And although Argentina was once second in the world only to the United States in terms of transgenic acreage, this year the country slipped into third place behind Brazil, which has been expanding cultivation of biotech crops.
Bureaucratic hurdles are not the only things slowing down GM crop adoption; there is also a lack of public investment in agricultural research in Argentina. And although Argentinean regulators approved a new variety of maize (the Swiss-based Syngenta’s Bt11xGA21 GM maize), which represents the next generation of transgenic crops, in Brazil a national research group recently independently produced its own herbicide-resistant form of GM soybean, something Argentina has yet to accomplish. In some ways, it’s surprising that Argentina has been such a trailblazer for biotech crops; part of the reason for that was the willingness of politicians and their scientific advisers nearly two decades ago to create the necessary infrastructure.

In 1991, when representatives from the California company Calgene (now Monsanto), Nidera of Rotterdam, Holland, and Swiss giant Ciba-Geigy (now Syngenta) approached Argentine government officials about running field trials of herbicide- and insect-resistant cotton, maize and soy plants, they found that Argentina had no regulatory framework in place. But, Burachik says, “The technical staff convinced the bosses that this was a new green revolution.” The government invited industry groups to join a newly formed commission, the predecessor to CONABIA, to make technical recommendations on field trial protocols and GM crop approval.
“Of course, there were conflicts of interest, but the industry representatives were there on behalf of their sector, not their companies,” and recused themselves from discussions relating to their firms, Burachik recalls.

Ties between the Argentine government and large agricultural landowners have a long history, dating back to the 19th century, when the government attempted to pay down its debt by taxing exports of crops such as wheat and maize. But the close ties are also a source of criticism: groups such as the Grupo de Reflexión Rural of Buenos Aires (http://www.grr.org.ar/) claim that a revolving door between large-scale agricultural firms and government gives the firms informal contacts, insider knowledge and undue sway over regulatory proceedings. Although strong connections and influence between agribusiness and government predate the arrival of biotech, they have played an important role in paving the way for its swift adoption, says Peter Newell, an international development researcher at the University of East Anglia, UK, and author of a recent study of the politics of Argentina’s biotech boom¹.

The early trickle of commercial field trial applications turned into a stream, reaching 90 in 1998, the year Europe stopped approving new GM crops for import (Fig. 1). Today, new applications for field trials are often for crops containing multiple transgenes stacked in the same plant.
In some countries, these would also be required to undergo fresh field trials, but in an effort to streamline the process, in 2007 Argentina began evaluating such crops based on each separate GM crop’s field trials.

Figure 1 GM plant field trial approvals in Argentina: number of applications per year, granted and not granted, 1991 to 2009. (Source: CONABIA, Buenos Aires.)

Despite these and other government efforts to keep up with new technology at the commission level, Monsanto’s corporate affairs director Pablo Vaquero in Buenos Aires says that reorganizations at the Ministry of Agriculture, where final decisions on new commercial crops are made, have slowed down trial applications. It is typically 5 to 6 years from applying for a field trial to the first commercial plantings, but even a relatively short delay at the wrong time of year can add an entire year to the cycle, agrees Burachik.

The Argentine agricultural industry in the 1990s was ripe for biotech soy for other reasons, too. Following the example of a few pioneers in the 1970s and 1980s, many Argentine landowners in the mid-1990s began adopting low-till or no-till practices, in which seeds are drilled by machinery directly into the ground. Argentina has about 2.5 million hectares under no-till practices today, according to the Argentine National Institute of Agricultural Technology. Also known as direct seeding, the process allows farmers to plant soy crops on the same fields as wheat in the off-season, permitting a massive increase in the amount of cultivation possible. Crucially, the method required heavy machinery but little labor, handing an advantage to large landowners with capital and economies of scale at about the same time as the first GM soy crop reached the Argentine market. Larger landowners began buying up individually owned plots, leading to violent confrontations.
Farmers eagerly turned over land previously devoted to cattle or domestic food crops and cut down forests to plant Monsanto’s Roundup Ready soy, which resists the herbicide glyphosate and was the first GM crop approved in Argentina. Ecological luck also played a role: “The single fact that is probably most important in accelerating the speed of approval [in Argentina] is that there are no wild relatives of soy,” notes Val Giddings, president of PrometheusAB, a biotech consultancy in Silver Spring, Maryland.

In part because most GM products were exported for animal feed and in part because of consumer apathy, say observers, there was little public reaction in the first years of biotech plantings. Today, when the Buenos Aires office of the US Department of Agriculture’s Foreign Agricultural Service (FAS) runs public events promoting GM crops, environmental or consumer groups don’t bother showing up, says Andrea Yankelevich, who works for the FAS in Buenos Aires. Newell says that the national media portrays growers as heroic engines of economic growth, often siding with agribusiness against government attempts at regulation or taxation.

[Photo caption] Moisés Burachik, Director of Biotechnology, Ministry of Agriculture, Buenos Aires. (Source: Moisés Burachik).

Royalty-free cultivation

Argentina’s intellectual property laws helped to lower the cost of adoption. Argentina adheres to the 1978 International Convention for the Protection of New Varieties of Plants (UPOV 1978), which permits creators of new plants to charge an initial license fee, but exempts growers from paying annual fees for new seeds.
For maize, creators are able to earn their R&D costs back because the plants are not self-fertilizing and growers must buy the seeds each year. Soy is self-fertilizing and although Argentine farmers may not legally distribute seeds, under UPOV 1978, they are permitted to retain seeds for their own use. UPOV updated its terms in a 1991 convention to limit this practice, but Argentina and its partners in the Southern Common Market (Mercosur) have not signed on to the new convention.

When Roundup Ready soy arrived in Argentina, it was under license to Asgrow Argentina, a multinational owned at the time by the US-based Upjohn Company of Kalamazoo, Michigan, and subsequently acquired by seed and grain importer/exporter Nidera of Buenos Aires. Nidera spread the seeds widely and legally throughout the country, but illegal trade, nicknamed ‘white bag’, had already begun. During that time, Monsanto made much of its Argentine income from selling the patented Roundup herbicide that accompanied Roundup Ready soybeans. By the time Monsanto applied for a revalidation patent on its Roundup Ready soy in 1995, Argentina had signed TRIPS, the international “trade-related aspects of intellectual property rights” agreement that does not recognize revalidation patents. Argentine courts could deny the Monsanto application on the principle that the transgenic seed was already widely distributed and part of the public domain.
In 2003, Monsanto withdrew its soy business from Argentina, though the firm still sells various formulations of Roundup herbicide there and reported $183 million in gross receipts from Argentina in its fiscal 2008–2009 year, making Argentina its third-biggest regional market².

A consequence of the Argentinean legal environment was that the price of legitimately licensed seeds fell, giving Argentine exporters a small but noticeable advantage in global markets. This prompted the US government and the American Soybean Association, headquartered in St. Louis, to put pressure on developing countries like Brazil not to import Roundup Ready soybeans from Argentina. By then, however, the trade in illegal seeds had spread beyond Argentina’s borders into agriculturally similar parts of Brazil and Paraguay.

Monsanto also took its case to importers of Argentine products in countries where Monsanto did have a patent. Its lawyers claimed that the use of Argentine Roundup Ready soy on which no royalty had been paid was illegal under its agreements with importers in Spain, the UK, The Netherlands and Denmark. Argentina argued that the agreements applied only when the transgene was used for the patented function (protecting living plants from Roundup herbicide) but that soy derivatives such as oil were not protected. Monsanto lost its court cases in Spain and the UK in 2007, and the Dutch case was referred to the European Court of Justice in 2009. That trial’s ruling, which is expected this year, will probably guide any other rulings in Europe.
Paolo Mengozzi, advocate general, wrote an opinion for the Court in March favoring Argentina, though the Court has yet to make its final ruling.

Vaquero says that by one calculation, Roundup Ready soy generated as much as $20 billion for the Argentine economy from 1996 to 2006, of which ~80% stayed with producers. “Our intention is to return our soy business to Argentina when a mechanism exists that will permit us to recoup our investment,” Vaquero says.

But Roundup Ready is now decades old, and Argentina’s Supreme Court ruling against granting Monsanto a patent for it is several years old, too. In the absence of an unexpected legal shift in Argentina, Monsanto is trying to forge private agreements covering a second-generation Roundup Ready transgenic soybean with Brazilian producers and exporters. Monsanto claims that the new seed performs about 10% better than first-generation Roundup Ready soy. Still, Vaquero notes, “We need agreement with the producers, not just a new law, because laws like this are hard to enforce without cooperation.”

Roxanna Blasetti, director of international relations for the Ministry of Agriculture, says that the government has its hands tied with regard to issuing Monsanto a patent for the original Roundup Ready soy today. “The government cannot recognize private ownership of a public property,” she says, and because the Roundup Ready gene is in the public domain in Argentina it would set an “unimaginable” precedent.

EU holdup

As it became a major producer of GM soy, Argentina had to tackle the issue of trade barriers to its exports. In 1998, under pressure from consumer and environmental groups, the European Union (EU; Brussels) stopped approving GM agricultural products for commercial use. The US then brought a case against the EU before the World Trade Organization (WTO) in Geneva. Canada, another large GM crop exporter, joined the case, which argued that the EU was taking too long to approve new transgenic crops for import. For Argentina, the case for joining the fight was less clear-cut.

“Argentina did not have as many [GM] products in the pipeline as Canada and the United States, so its primary motivation was to encourage compliance with WTO rules,” says Blasetti, who was involved in Argentina’s negotiations. Some Argentine exporters feared the repercussions of a drawn-out trade conflict with one of their biggest customers. Ultimately, the risk of the precedent set by the European moratorium prevailed over fears of a trade conflict and the Argentine government decided to join the North American plaintiffs before the WTO.

By August 2003, some European nations began approving transgenic crops again, though WTO rulings would take until late 2006 to come down in favor of the American plaintiffs.
The argument hinged not on whether GM crops were safe for Europe, but whether Europe’s approval process was consistent with its obligations under existing trade agreements. Canada and Argentina have since agreed on calendars for approving the backlog of transgenic crops that accumulated during the moratorium and information sharing to ease new approvals.

Yet during the moratorium, Argentina was already asserting more trade independence thanks to its ties with other markets, including China and India. For example, China and Argentina signed a memorandum of understanding in 2004 that lets China import soy and Argentina import herbicide. Argentina also exports unrefined soy oil to India.

Thus, Argentina’s alliances, widespread trade network and farming infrastructure are all part of the story of its early success with biotech crops.

Internal tensions

In the years since GM crop cultivation took hold, internal conflicts have started to influence the country’s biotech crop production capacity. Most recently in 2008, under pressure to redistribute some of the wealth generated by high commodity crop prices, Argentine president Cristina Kirchner instituted a new floating export tax that increases when international prices are high and decreases when prices are low. Farmers blocked highways and camped out in Buenos Aires in protest.
“There is a populist view of the world coming from the government in which there is a certain confrontation with business and farming,” observes Eduardo Trigo, a biotech analyst in Buenos Aires for consulting firm CEO.

Farmers may unite against taxes, but they realign against each other in other debates. Large landowners are more willing than small ones to cede license fees to Monsanto, for instance. Consumer and environmental groups have also grown more vocal in recent years. Gonzalo Girolami, a Greenpeace spokesperson, points to a 2007 forestry law, passed thanks to a Greenpeace campaign, that requires provincial-level approval before landowners can cut down forests. “Soy accelerated deforestation,” says Girolami, “but no longer are forests at the mercy of their owners.”

Accusations that soy has taken over Argentine politics are interchangeable with arguments about beef producers 30 years ago or wheat growers 150 years ago. As Trigo notes, “government power is very centralized here,” and has always been close to landowners, despite spats over taxation and regulation.

Exporting lessons

Argentina’s neighbors Paraguay and Brazil began making up for lost time a few years ago after taking a wait-and-see stance during biotech’s first decade in Argentina. They contain regions geographically similar to some in Argentina and were beneficiaries of illegal seed smuggling in the late 1990s.

Public debate there, although strong in the first decade of GM crops, has mostly died down as farmers have grown accustomed to the smuggled GM seeds.
“It’s a non-issue” in Paraguay, Trigo says, and regulatory authorities have developed a nimbler set of rules for approval and implementation of new transgenic crops there. Paraguay also reinvests a fraction of its biotech export tariffs in domestic biotech R&D, notes Yankelevich, unlike Argentina, where most biotech R&D is privately funded.

Brazil has now approved about a dozen biotech crops for commercial use and, according to Burachik, it is approving trials at a faster rate than Argentina. In fact, Giddings argues that “one of the things that’s driving the Argentines has been competitive pressure vis-à-vis their colleagues across the Rio Plata.”

Climate and soil conditions vary widely across other developing countries, even in South America, which contains mountain desert environments in the Andes, semi-arid plains in the Argentine Pampas and tropical rainforest in the Amazon Basin. Yet the majority of biotech crop R&D still focuses on temperate climates like that of North America. “It’s very difficult to believe that biotech soy’s success in Argentina will be repeated with another crop or in another country,” Trigo says.

Further funding of domestic research in Argentina would help, Trigo says, but the approval process at CONABIA “has a resource problem.” Burachik agrees that 20 staffers aren’t enough to keep up with the flow of applications, which tripled from 1999 to 2009, but says that the bigger problem for Argentina and other developing countries is that their potential markets approve products out of synchrony.
“I have tried to create links with other regulatory agencies to start a technical dialog about sharing biosafety information,” Burachik says, “but I fear the problem isn’t really there: it’s political.”

The debate over GM crops is much louder in other developing countries. In Peru, which still lacks regulation to enforce its biotech law, opponents have called for a moratorium on the import of biotech products and claimed to detect transgenes in cultivated crops. A scientist who contested these claims is currently facing criminal charges for defamation (Nat. Biotechnol. 28, 110, 2010). Greenpeace is sponsoring a “Brazil Better Without Transgenic” advertising campaign and some consumer-facing food processors and retailers are hesitant to adopt biotech products, though they remain popular with producers³.

With growing markets in China, India and elsewhere, Argentina and its neighbors will continue trying to capitalize on their competitive advantages growing soy, cotton and maize. The new seed law under consideration in Buenos Aires may open the door to more private investment if international firms, such as Monsanto, are satisfied that their royalties will be more secure than under today’s system. But the cost of distribution will depend heavily on international agreements, such as the pending EU approval schedules. Those challenges, which Argentina has navigated thus far, might be enough to make other countries think twice about how to implement their own biotech crop plans, but at least in Argentina, Yankelevich says, “there’s no going back.”

Lucas Laursen, Madrid

1. Newell, P. J. Latin Amer. Studies 41, 27–57 (2009).
2. http://www.monsanto.com/pdf/pubs/2009/annual_report.pdf
3. Silva, J.F. Brazil Biotechnology Annual Agricultural Biotechnology Report 2008 (US Department of Agriculture FAS, Brasilia, 2008).


building a business

Avoiding capital punishment

Justin Chakma, Eliot Forster & Thomas E Hughes

In an industry with a lengthy product development timeline, capital efficiency is paramount. But successful capital-efficient strategies require a different approach to thinking, working and fundraising.

Capital efficiency is generally defined as doing more with less. The idea is particularly relevant now, but biotechs should always try to be efficient with their capital because it’s often too late to tighten up the budget once resources become scarce.

In our view, capital efficiency is most closely related to strategies for spending and resourcing; in other words, achieving more with greater flexibility and precision, and using minimal resources. It means making it absolutely clear to your potential investors that you will carefully monitor your money (Box 1). And this is especially relevant to small biotechs for which acquisition is typically the ultimate objective.

Being a capital-efficient company (Box 2) has more to do with how you spend your money than with anything else. But it also deals with exploiting the many intrinsic, capital-efficient advantages associated with small biotechs. The first advantage is the lack of a costly ‘legacy infrastructure,’ comprising equipment or talent once needed but later underused due to strategy or scope changes. Lacking unneeded infrastructure means that companies can rent talent as needed and therefore access top-tier capabilities at a discounted cost. The second is
The second is<strong>the</strong> greater flexibility in R&D spending, with afocus on getting <strong>the</strong> program to its key valueinflectionpoint, typically proof of concept.And <strong>the</strong> third is <strong>the</strong> alignment of performanceincentives <strong>and</strong> <strong>the</strong> company’s mission—whencompensation packages focus more on companyequity, it helps attract collaborative teamJustin Chakma is an analyst at MaRS Innovation,Toronto, Ontario, Canada. Eliot Forster ispresident <strong>and</strong> CEO of Solace Pharmaceuticals,Boston, Massachusetts, USA. Thomas E. Hughesis president <strong>and</strong> CEO of Zafgen, Cambridge,Massachusetts, USA.e-mail: justin.chakma@utoronto.ca,eliotforster@solacepharma.com orthughes@zafgen.complayers who are focused on <strong>the</strong> company’s goals.In <strong>the</strong> following article, we discuss how <strong>the</strong>seadvantages can be achieved in practice whensetting up a life science venture.The right size, <strong>the</strong> right peopleCapital efficient organizations keep hiring lean,with most communication occurring throughemail <strong>and</strong> phone conferences to accommodatetravel, coordinate efforts with vendors, <strong>and</strong> tofacilitate after-hours work for teams that maybe distributed across geographical regions. Youneed to ensure that your hires are comfortablewith this sort of collaborative, real-timeworkflow. Without constant supervision, youremployees will need to ask <strong>the</strong> right questionson <strong>the</strong>ir own <strong>and</strong> learn quickly. Make sure <strong>new</strong>hires have this ability.Your employees also should plan on oddschedules, <strong>and</strong> <strong>the</strong> company should be up frontabout this. For example, one of us (T.E.H.) 
heads a team based in Cambridge, Massachusetts, that often holds night teleconferences with development colleagues in Australia as well as early morning calls with the drug discovery and formulation teams in the UK. Crying babies, washing machines and barking dogs are often the sound track to these meetings. It is a different way of life and a different way of doing business; most entrepreneurial people are fine with it, but any CEO of a startup should broach the topic with any potential new hire.

In general, you’ll need employees who are capable of asking questions on their own. You’ll want adaptable people who can learn quickly, people who are comfortable both running projects and making their own coffee. In short, you need people who can network and who also have a level of autonomy and the ability to independently problem solve.

This raises the question: where does one find these sorts of hires? Personal contacts are important, but another way is to tap into the network of pharmaceutical advisors and contract research organizations (CROs). These individuals are already accustomed to the capital-efficient lifestyle and workflow; moreover, they are also solid networkers themselves and can help to drive further recruitment.
Ask around and network; contacts, whether at CROs or elsewhere, are a good source of suggestions for potential employees.

What’s more, working in virtual organizations across various time zones, as is often required to tap the best talent and maximize use of human capital, means that you are accountable for face time: as a founder or CEO, you’ll need to travel and get out of the cave to communicate with your staff and your vendors. Phones will only take you so far. This means that although an organization needs to keep infrastructure costs to a minimum, a hub (with telephones, chairs and decent coffee) is also important for occasionally hosting partners and sharing ideas; it helps to build a sense of a team effort. Plan to meet with staff face to face and then communicate as the project plan dictates. For instance, when a company representative is required to attend a critical meeting in person, you make sure they get there. Last year, one of us (T.E.H.) sent an employee to Australia for a 15-minute meeting. It was worth it.

Order out

Given that the majority of drug development costs occur during clinical trials, this area is ripe for conserving, and CROs often can produce results more cheaply than a small biotech ever could. To decide if your work is appropriate for outsourcing, determine if your project or set of experiments can be defined specifically (exploratory biology may not be well suited for a capital-efficient model, for example) and then look for available CROs, send out a request for proposals with project outcomes specified, and shortlist organizations based on those defined parameters.
Finally, interview teams to assess the quality of their previous work and examine their background via reference checks, case studies and statistical models. It’s also important to observe how the CRO interacts with your own team to determine whether it truly understands the business.

For the sake of expediency, it is often best to have relationships with CROs before you need them. One way to do this is to retain pharmaceutical advisors who have experience across a wide spectrum of skills, be it medical chemistry, toxicology or clinical trials. Those people and their networks should help you identify, and then get inside, the best shops quickly.

To help select a CRO, it’s critical to understand what results you seek, so that your CRO partners and advisors truly understand what you’re doing. Successful capital-efficient biotechs spend a great deal of time making sure that planning is correct and communicating those plans effectively, including specifying the output. You’ll get only disaster if you walk into a CRO with a generic request for a clinical trial. Besides, agreeing to precise specifications is a great way to minimize scope change (resulting in ‘change orders’). About 25% of CRO revenue comes from change orders, so proper due diligence and output specifications in a contract can save you big money. One final thing: when committing to a contract in a foreign country, consider hedging strategies in which you can secure an exchange rate (favorable or de-risked against adverse fluctuations) by buying in advance a portion of the currency needed to complete the deal.

There are many qualified CROs out there, but we have successfully worked with these: INC Research in Raleigh, North Carolina; ICON, based in Dublin, Ireland; Nucleus Network in Melbourne, Australia; Trident Clinical Research in Port Adelaide, Australia; and Q-Pharm in Queensland, Australia.

Box 1 Pitching for capital efficiency

For better or worse, every CEO is going to boast about his or her company being capital efficient. But how do you truly show that your biotech’s capital-efficient model is real?

Proof of capital efficiency comes from your full-time employee count and financial statements. When investors ask about your financing history, point to your, say, six or fewer employees and mention a specific achievement, such as hitting all your milestones with this small group. You’ll also want to show investors your financial statements and business plan, then describe what you have delivered and what you intend on delivering. If your past or future infrastructure to operational spending ratio is greater than 1:1, you have a problem. Your COO should determine whether your infrastructure cost is less than your operational spending before releasing any money to spend on, for example, a new hire. One test to apply would be to ask what your fixed costs (including payroll) represent as a fraction of your total running cost and what it would cost in time and dollars to shut down the enterprise.

The bottom line in the biotech world is: ‘the more you hire, the more you fire.’ It’s best to avoid that by being capital efficient from the very beginning and making that clear to your venture capitalists. It also helps to articulate a pathological hatred of large infrastructure.

Also, be honest about your expenses and show that you’ve done due diligence to reduce them. This means determining where and when it is appropriate to be capital efficient. Determine the vendors in your arena, get quotes and have each expense item accounted for with an appropriate source for your business plan. You should lay out multiple scenarios for clinical trials and present their true and complete costs to give the full scope of possibilities. You have to show venture capitalists that you know what it means to be capital efficient, not just tell them you know.

If possible, it’s best to approach venture capital firms that truly understand the capital-efficient model. These are the firms that build a portfolio project by project and take a ‘shots on goal’ approach, where the risk of each program is overseen at the fund level rather than the firm level. The management teams of venture capital firms are mobile and are aligned to kill bad projects.

Some venture capitalist firms that follow this process are Atlas Ventures, based in Waltham, Massachusetts, and Scale Venture Partners in Foster City, California, both of which often integrate management teams into their network and involve them as advisors across the breadth of the portfolio; InterWest in Menlo Park, California; Polaris in Waltham; New Enterprise Associates, based in Menlo Park; Sofinnova in Paris; New Leaf Ventures in Menlo Park; and Third Rock Ventures in Boston.

Before man

Most early-stage biotechs are first created as virtual operations in academic labs, so there is a temptation to outsource preclinical work to the academic labs instead of to CROs. This makes sense, and some universities now support laboratory facilities that can be accessed by external groups at a fair price. In fact, Zafgen in Cambridge, Massachusetts, has tapped into several such core facilities to support studies requiring gene expression analysis and specialized clinical chemistry readouts. That said, in our experience, it’s rare that an academic lab can offer the support and terms provided by specialist, professional vendors. Universities are notorious for taking a long time to negotiate contracts, and experiments can be held up for 6–12 months due to paperwork at technology transfer offices. Issues can arise around publication, inability to draw contracts, turnover of tech transfer staff and intellectual property concerns.

Also, universities often are not adept at performing studies that require repetition, as is frequently needed in drug discovery. (In contrast, good CROs repeat the same task reliably time after time.)
It’s difficult to know when to use a university, but the partnership usually works best when the academic motivation for conducting new research and the company’s motivation for drug discovery progress align. In these cases, companies can access years of academic experience. Capital-efficient biotechs must weigh the pros of experience and a low cost against the cons of potential delays and the different culture of academic labs.

But there are other parameters to consider beyond time and cost. Academic collaborators often are best equipped and best able to help answer the tough ‘how’ questions: How does manipulation of this drug target impact the disease state at the whole animal, tissue, cell or pathway level? How useful is this approach at different stages of the disease process? Or, How does this treatment stack up against standard of care or competing emerging agents? Professional contract organizations are better at answering the ‘what’ questions: What is the potency of this molecule? What is the distribution of this molecule in different tissues?
Or, What is the impact of treatment on a range of standardized endpoints?

Your decision about preclinical work should also consider which approach will provide the greatest assurance to big pharma decision makers. In this regard, professional organizations might provide the greatest comfort when clearly documented results are needed concerning the mechanics of the program (drug exposure and metabolism, safety and efficacy, and so on). Academic collaborations help provide comfort on the hairier question of whether the therapeutics strategy is attractive from an investment standpoint. Publications from trusted investigators can go a long way with pharma and can reduce their stress when deciding to go with an expensive and risky in-licensing deal.


We recommend these CROs for preclinical work: Galapagos in Mechelen, Belgium; Evotec, based in Hamburg, Germany; GVK Biosciences Private in Hyderabad, India; Cerep in Paris, France; and RenaSci in Nottingham, UK.

Making the call

As hinted in the previous section, managers must ensure the research is sufficiently focused. Can you define a sequence of experiments that will signal a ‘go’ or ‘no go’ for your project? To do this, you must establish priorities: conduct due diligence to eliminate risk factors such as intellectual property, then assess the accessibility of starting materials and so on. This will help you not only reduce unexpected or unnecessary deviations from the business strategy but also identify questions whose answers may not be immediately available and determine potentially rate-limiting activities.

In the experience of two of us (T.E.H. and E.F.), the rate-limiting activities are set by identifying the most important piece of data needed for a project. The firms then plan out the entire sequence of activities that will produce that piece of data. It becomes a priority, with other components being run more quickly or more slowly around it.
Setting priorities in this way prevents a company from having many projects going on at once, which can lead to an inability to control third parties, resulting in cost and time overruns. Keep an eye on what’s important, and change it only in rare circumstances.

Part of prioritizing is weighing the urgency of the activity versus a willingness to pay for it, so you should be able to explicitly define the conditions in which you will outsource. Is the task defined? Is the task time sensitive? Being able to answer these questions easily depends on having conducted sufficient due diligence and having a hard strategy. For instance, always trying to negotiate a lower price may bring added costs by increasing the time it takes to complete the project. Is speed or cost more important to the health of your company? It’s critical to understand how your different pieces fit together.

A project management program, such as web-based software like Tenrox or Microsoft Project, can help you track outstanding financial and project controls, as well as give you a database of contracts and collaborations. Gantt charts (horizontal bar charts that represent the duration of tasks set against the progression of time for resource allocation, annotated with key decision points and criteria) are helpful here and are also useful for identifying responsibilities.

Finally, it’s likely in your small firm that the decision-making group is only the executive team and the board members. Use this as an advantage and avoid creating too many committees or firm processes.
Think each problem through but make decisions firmly and quickly.

Box 2 Defining capital efficiency

Capital expenditures (CapEx) are expenditures that create future benefits. Examples include the development of infrastructure and acquisition of equipment that has a long, useful life span. These types of expenditures acquire assets that will usually last five years or more and are depreciated or amortized accordingly. If you are working in an organization with relatively short-term objectives, such as a venture capital–funded biotech, or a biotech in which there are changing priorities or efforts, then you should minimize your CapEx and manage the activities through operational expenditures (OpEx). Keeping a low CapEx to OpEx ratio is one way of being capital efficient.

Capital efficiency should also be applied during cash-flush times. Remember that raising incrementally larger amounts of capital delivers less attractive returns for your venture capitalists. Capital-efficient companies also tend to have fewer shareholders and simpler governance in the board room, both of which can mean a more collaborative and intimate investor involvement.

Get organized

With the continuing globalization of biotech, it is important to identify and leverage expertise in multiple regions. But this task is not simple. Understanding where and when you intend to use a region for a certain task from the outset is critical in shaping how you organize your biotech.

Two of us (T.E.H. and E.F.) have direct experience trying to complete biology work in emerging markets, such as India, but have had little success (so far).
Mainly that’s because outsourcing of biological work remains primarily the province of the US and Europe. However, chemistry, especially routine chemical synthesis, is conducted successfully and cost effectively in India. For clinical trials, the location of the CRO is not as important as its international reach, because increasingly it’s undesirable to conduct phase 2 and 3 trials in just one country, due to competition for certain patient populations among other reasons.

To leverage the global nature of contract research work, Zafgen and Solace Pharmaceuticals, based in Boston, Massachusetts, have taken different approaches. Solace set up facilities in both Cambridge, Massachusetts, and Canterbury, UK, to allow the company to work together with CROs from India and China, as well as with the West Coast of the US, in a single (long) working day. Zafgen, on the other hand, opted for a fully integrated model with only a small number of CROs supporting its drug discovery program. By using a ‘full-stop shop’ with molecular modeling, chemistry, assay work and optimization, Zafgen employs a simple organizational structure and has reduced its dependence on building a supply and data management structure. The logistics for sending samples and managing data between CROs are complicated and can lead to delays and errors if not carefully managed.

Which organizational structure you opt for will depend on your stage of development and your therapeutic area.
For preclinical work, the headquarters of your company or vendor is critical for optimizing global workflow. Full-stop shops are more amenable to optimizing new compounds for established targets than to exploiting new drug targets. For clinical development work, most capital-efficient companies opt to go with a single global CRO, so the diversification of location matters less. But clinical trials for niche diseases are more likely to be sourced to specialized CROs, meaning managers need to plan and adapt their organizations accordingly.

Conclusions

Executing a capital-efficient model in biotech is challenging but worthy of consideration. In the past, big pharma may have valued infrastructure in biotechs, but it is moving in the opposite direction today. These days, a slimmer biotech is more attractive.

Individuals in a capital-efficient organization need to be able to translate complex scientific needs into concise and clear project specifications for third-party CROs and advisors. The next cadre of successful bioentrepreneurs will need to be good scientists and have the wherewithal to mold their vision into great work plans for others. Ultimately, of course, it is results that matter. Biotech projects that both advance new treatments and release investor value will lead the way. Those types of firms are best formed through capital efficiency.

To discuss the contents of this article, join the Bioentrepreneur forum on Nature Network: http://network.nature.com/groups/bioentrepreneur/forum/topics
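
The ratio tests described in Box 1 and Box 2 come down to simple arithmetic, and it may help to see them written out. The following is only a minimal sketch: the function names and the dollar figures are hypothetical, and treating total running cost as infrastructure plus operational spending is our assumption for the example, not a definition given in the boxes.

```python
def infrastructure_to_operational_ratio(infrastructure: float, operational: float) -> float:
    """Ratio of infrastructure spending to operational spending (the 1:1 test in Box 1)."""
    if operational <= 0:
        raise ValueError("operational spending must be positive")
    return infrastructure / operational


def has_infrastructure_problem(infrastructure: float, operational: float) -> bool:
    """True when the ratio is greater than 1:1, the warning sign named in Box 1."""
    return infrastructure_to_operational_ratio(infrastructure, operational) > 1.0


def fixed_cost_fraction(fixed_costs: float, total_running_cost: float) -> float:
    """Fixed costs (including payroll) as a fraction of total running cost."""
    if total_running_cost <= 0:
        raise ValueError("total running cost must be positive")
    return fixed_costs / total_running_cost


if __name__ == "__main__":
    # Hypothetical annual figures in US dollars, for illustration only.
    infrastructure = 400_000
    operational = 1_600_000
    fixed_costs = 600_000  # includes payroll
    # Assumption: total running cost is infrastructure plus operational spending.
    total_running_cost = infrastructure + operational

    ratio = infrastructure_to_operational_ratio(infrastructure, operational)
    print(f"Infrastructure : operational ratio = {ratio:.2f}")
    print(f"Ratio above 1:1? {has_infrastructure_problem(infrastructure, operational)}")
    print(f"Fixed-cost fraction = {fixed_cost_fraction(fixed_costs, total_running_cost):.2f}")
```

A check like this could be rerun before each new spending commitment, such as a hire, to see whether it would push the ratio past 1:1.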


correspondence

Natural variation in crop composition and the impact of transgenesis

To the Editor:
Compositional equivalence of crops improved through biotech-derived transgenic, or genetically modified (GM), traits and their conventional (non-GM) comparators is an important criterion in breeding as well as a key aspect of risk assessments of commercial candidates. We present here an analysis of compositional data on GM corn and GM soybean varieties grown across a range of geographies and growing seasons, with the aim of not only assessing the relative impact of transgene insertion on compositional variation in comparison with the effect of environmental factors but also reviewing the implications of these results for the safety assessment process. Specifically, our analysis includes evaluation of seven GM crop varieties from a total of nine countries and eleven growing seasons. On the basis of our data, we conclude that compositional differences between GM varieties and their conventional comparators were encompassed within the natural variability of the conventional crop and that the composition of GM and conventional crops cannot be disaggregated.

Plant breeding programs expect to either maintain compositional quality during enhancement of other agronomic traits or improve crop compositional quality through intended changes in the levels of key nutrients or antinutrients.
Over the past two decades, one of the most successful approaches to enhancing agronomic traits in crops has been the insertion of trait-encoding genes using the techniques of modern biotech. Compositional equivalence between GM crops and conventional (non-GM) comparators is an important breeding goal but is also often considered to provide an “equal or increased assurance of the safety of foods derived from genetically modified plants” (ref. 1). Comparative compositional studies are therefore included as a significant component of risk assessments of new GM crops. As a consequence, a large body of high-quality compositional data generated according to principles outlined in the Organization for Economic Cooperation and Development (OECD; Paris) consensus documents (ref. 2) is available. On a product-by-product basis, compositional equivalence of GM crops and their conventional comparators has been demonstrated in potato, cotton, soybean, corn, rice, wheat and alfalfa (for a list of references describing compositional and omics comparisons of GM crops and non-GM comparators, see Supplementary References). In addition to the compositional studies conducted within regulatory programs, biochemical studies on GM crops have been extensively pursued by public and private research sectors.
Although there are complexities in the interpretation of modern profiling technologies, and no standardized framework for comparisons, the lack of variation between GM crops and their conventional comparators at the transcriptomic, proteomic and metabolomic level has been independently corroborated. These profiling evaluations extend to a wide range of plants including wheat, potato, soybean, rice, tomato, tobacco, Arabidopsis and Gerbera (see Supplementary References).

Figure 1 Summary of amino acid levels in conventional and GM corn from a total of eight growing seasons. Each vertical bar represents the range of values for the corresponding amino acids as measured in studies listed in Supplementary Table 1. See Supplementary Table 20 for further details and Supplementary Figures 1–11 for summarized data on other nutrient and antinutrient components in corn and soybean. Panel title: Amino acid components - corn; y axis: percent dry weight; x axis: amino acids (ALA to VAL); series: non-GM and GM.

These and other studies (e.g., refs. 3–5) have also suggested a high degree of natural variability inherent to crop biochemical and metabolite composition.
It is therefore reasonable to ask if changes in composition associated with modern transgenic breeding practices are different in scope from those attributable to natural genotypic and environmentally mediated variation. We reasoned that a systematic analysis encompassing published compositional data generated under OECD guidelines on several GM products grown in a range of geographies, under different regional agronomic practices and over multiple seasons would provide an effective overview of the relative impacts of transgenesis-derived agronomic traits and natural variation on crop composition.

GM corn and GM soybean now represent 30% and 53%, respectively, of global production (ref. 6). Our analysis therefore evaluated compositional data reported on grain and seed harvested from different GM corn and GM soybean products, because these now represent a significant percentage of global production of these crops and provide an abundance of compositional data from diverse climates and growing regions. The


compositional data described in this study were generated under OECD guidelines as part of the comparative safety assessment process used to support regulatory approvals and commercialization. Components analyzed included proximates, macro- and micronutrients, toxicants and antinutrients as well as other crop-specific secondary metabolites.

The GM products evaluated in this study represented a range of traits conferring insect protection, herbicide resistance or drought tolerance and originated from a total of nine countries (France, Italy, Spain, Germany, Romania, the United States, Argentina, Brazil and Chile) over a total of eleven growing seasons (Supplementary Table 1). Nutritionally enhanced GM products (that is, those with intentionally altered compositional and metabolite profiles) are not included in this assessment.

Experimental designs in all growing regions included multiple replicated field sites (Supplementary Methods). In earlier published compositional analyses of each of the GM products described in this study (Supplementary Table 1), the mean values and ranges of compositional components measured across all individual replicated field sites (referred to as a combined-site analysis) were presented. To support the analysis presented here, statistical differences between the GM and conventional components within each of the individual sites were additionally evaluated (Supplementary Notes and Supplementary Tables 2–19).
Overall, for corn, a total of 2,350 (number of sites × number of compositional components; see Supplementary Table 1) statistical comparisons between the GM varieties and their corresponding conventional controls were conducted. Of these, 91.5% were not significantly different (P > 0.05).

Figure 2 Hierarchical cluster analysis and principal component analysis of compositional data generated on the harvested seed of insect-protected MON 87701 and glyphosate-tolerant MON 89788 soybean grown in the northern and southern regions of Brazil during the 2007–2008 season. The sample codes are as follows. The first three characters indicate the sample: 77T, MON 87701; R2T, MON 89788; and 77C, conventional control for both MON 87701 and MON 89788. The remaining characters indicate the sites: Cachoeira Dourada, Minas Gerais (BrNCD); Sorriso, Mato Grosso (BrNSR); Nao-Me-Toque, Rio Grande do Sul (BrSNT); and Rolandia, Parana (BrSRO). BrN indicates the northern region; BrS indicates the southern region. The axes of the principal component plot are Component 1 and Component 2.

In most, if not all, cases the statistically significant differences between the GM and conventional components represented modest differences in relative magnitude. In 2000, the Nordic Council of Ministers (ref. 7) recommended that if a GM component differed from the conventional control by ±20%, additional analyses of the GM crop were warranted.
This approach is not generally recognized by the international regulatory community, but the 20% figure does form a reasonable threshold for arithmetical comparisons. It is apparent from our analysis that differences of this magnitude between GM crops and their conventional comparators are rarely observed. Fewer than 1% of all comparisons, where a significant difference (P < 0.05) was observed, had a relative magnitude difference >20%. For soybean, of a total of 1,840 statistical comparisons between the GM products and the corresponding conventional controls (Supplementary Table 1), 88.5% were not significantly different (P > 0.05). As with corn, the statistically significant differences between the GM and conventional soybean components generally represented modest differences in relative magnitude.

Regardless of the respective merits of a statistical or strictly arithmetic approach to comparative assessments, both must recognize the extent of natural compositional variation found in conventional crop populations.
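The two screening criteria discussed here, statistical significance (P < 0.05) and a relative difference of more than 20% from the conventional control, can be sketched as a small classification routine. This is a minimal illustration and not the procedure used in the analysis: the `Comparison` record, the `classify` function and every numerical value below are hypothetical, chosen only to show how the two criteria combine.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    """One GM-versus-control comparison for a single component at one site."""
    component: str
    gm_mean: float
    control_mean: float
    p_value: float


def relative_difference_pct(c: Comparison) -> float:
    """Difference of the GM mean from the control mean, as a percentage of the control."""
    return 100.0 * (c.gm_mean - c.control_mean) / c.control_mean


def classify(c: Comparison, alpha: float = 0.05, threshold_pct: float = 20.0) -> str:
    """Apply the significance test first, then the arithmetic 20% threshold."""
    if c.p_value >= alpha:
        return "not significant"
    if abs(relative_difference_pct(c)) > threshold_pct:
        return "significant, exceeds threshold"
    return "significant, within threshold"


# Hypothetical values for illustration only (not taken from the supplementary tables).
examples = [
    Comparison("protein", gm_mean=10.2, control_mean=10.0, p_value=0.40),
    Comparison("oleic acid", gm_mean=23.5, control_mean=22.6, p_value=0.01),
    Comparison("vitamin E", gm_mean=13.0, control_mean=10.0, p_value=0.02),
]

for c in examples:
    print(f"{c.component}: {relative_difference_pct(c):+.1f}% -> {classify(c)}")
```

Under these assumptions a comparison is flagged for follow-up only when it is both statistically significant and larger than the 20% figure, which is the combination the letter reports as rare.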
As demonstrated in Figure 1 (see also Supplementary Figures 1–11 and Supplementary Table 20), the range of values observed in these studies for components evaluated in OECD consensus–based safety assessments is extensive and encompasses the threshold values suggested by reference 7. Furthermore, there is a remarkably extensive overlap in the values for the conventional and GM components, which suggests that overall, GM and conventional composition cannot be disaggregated.

Multivariate analyses (principal components analysis (PCA) and hierarchical clustering analysis (HCA)) were conducted on each of the compositional data sets generated in studies listed in Supplementary Table 1 and are summarized in Supplementary Figures 12–24. An illustrative example of the relative contributions of modern biotech and natural variation to crop composition is that of a compositional study of MON 87701 and MON 89788 grown in distinct geographic regions of Brazil (northern and southern; Fig. 2). Along with the United States and Argentina, Brazil is one of the top three soybean producers in the world. Two of its major growing regions (northern and southern Brazil) are separated by geography and climate and require different germplasms adapted for growth in each respective region.
Thus, for insect-protected MON 87701 and glyphosate-tolerant MON 89788, both GM-derived traits are introgressed into the conventional variety Monsoy 8329, which is adapted for cultivation in the northern region, and into the conventional variety A5547 for cultivation in the southern region.

Figure 2 presents HCA and PCA of compositional data generated on MON 87701, MON 89788 and their respective region-specific controls grown at two replicated field sites in each of the northern and southern growing regions. It is apparent that cultivation in different regions contributes more than genetic modification to compositional differences recorded in this study. A detailed review of the data revealed that while differences in mean values of test and control fatty acids and isoflavones were either statistically insignificant (P > 0.05) or of small relative magnitude, there was a remarkable difference in the fatty acid and isoflavone profiles of the two region-specific controls (Supplementary Notes and Supplementary Tables 14, 15 and 20).
For example, for the northern region control (Monsoy 8329), the mean values for oleic acid and linoleic acid were 40.43% and 39.73% of total fatty acids, respectively, whereas corresponding values for the southern region control (A5547) were 22.60% and 52.23% of total fatty acids, respectively. Mean values for the major isoflavone daidzein were over four times higher (1,014 p.p.m.


versus 234 p.p.m.) in A5547 relative to Monsoy 8329. These differences were substantially greater in magnitude than differences observed in test and control comparisons (Supplementary Table 21). All multivariate analyses presented in Supplementary Figures 12–24 were consistent with this conclusion that differences in growing location and/or genetic background contributed more to compositional variation than transgene insertion.

This analysis reviewed compositional data, generated under OECD guidelines, from a total of seven different GM crop varieties grown over a wide geographic area and a total of eleven growing seasons. It presents the most comprehensive compilation of GM crop composition data that we are aware of (Supplementary Table 20) and reveals that the compositional data for agronomically equivalent transgenic and conventional crops fall indistinguishably within the same space. It can be concluded that incorporation of biotech-derived agronomic traits has had little impact on natural variation in crop composition and that most compositional variation is attributable to growing region, agronomic practices and genetic background. This also supports the hypothesis that the compositional quality, and by extrapolation, the nutritional quality of GM crops with enhanced agronomic traits is consistently maintained.

Several considerations follow from our observations.
For example, at least one study (ref. 8) has recommended that compositional assessments of new crop varieties, and not the breeding technologies adopted in their development, serve as the basis for regulatory evaluation. The results presented here imply that if regulatory scrutiny is to be commensurate with the potential for compositional deviation, there is no reason to prioritize crops on the basis of genetic modification via transgenesis over crops genetically modified via conventional breeding, chemical mutagenesis or irradiation. This is consistent with the product-based regulatory principle that “products, substances and tangibles” should be the basis of risk assessments and not the processes involved in creating those products (for a discussion, see refs. 9,10). It is noteworthy that two recent commentaries on the application of transgenic technology presented in Nature Biotechnology have discussed the impact of prioritizing GM breeding strategies in the regulatory approval process as leading to curtailing “agbiotech product quality innovation” (ref. 11) and “strangling at birth” forest biotech (ref. 12).

More detailed compositional analysis and documentation of crop material used in animal feeding studies conducted for safety assessments may also be warranted. The results presented here emphasize the need to understand natural variation in providing biological context to pair-wise differences in any recorded toxicological or nutritional profiles during comparisons of animals fed diets containing GM plant material (often grain) with diets containing the comparator conventional plant material.
Reference crops are typically used in feeding studies in regulatory safety assessments as a means to encompass compositional variation in the test crop, yet are often excluded in other published reports. The unfortunate consequences of omitting references and eschewing careful compositional analyses in feeding studies have been documented in a non-peer-reviewed Feature article in this journal (ref. 13).

The economic and environmental impacts of the global adoption of GM crops over the past decade have been reviewed (refs. 6,14,15). GM crops are credited with increased yields, decreased pesticide and fuel use and, particularly in the case of herbicide-tolerant crops, with facilitating conservation tillage practices. It can be concluded that these derived benefits have been accompanied by consistent compositional quality and that compositional quality implies a very broad range of compositions and endogenous levels of single constituents. The findings from the analysis reported here may prove relevant to research strategies and public policy evaluations of the safety and nutritional value of GM and conventional crops.

Note: Supplementary information is available on the Nature Biotechnology website.

COMPETING FINANCIAL INTERESTS
The authors declare competing financial interests: details accompany the full-text HTML version of the paper at http://www.nature.com/naturebiotechnology/.

George G Harrigan, Denise Lundry, Suzanne Drury, Kristina Berman, Susan G Riordan, Margaret A Nemeth, William P Ridley & Kevin C Glenn
Product Safety Center, Monsanto Company, 800 North Lindbergh Blvd., St. Louis, Missouri, USA.
e-mail: george.g.harrigan@monsanto.com

1. OECD. Report of the OECD Workshop on the Toxicological and Nutritional Testing of Novel Foods (Organisation for Economic Co-operation and Development, Paris, 1998).
2. OECD. An Introduction to the Food/Feed Safety Consensus Documents of the Task Force (Organisation for Economic Co-operation and Development, Paris, 2006).
3. Lehesranta, S.J. et al. Proteomics 7, 597–604 (2007).
4. Reynolds, T.L., Nemeth, M.A., Glenn, K.C., Ridley, W.P. & Astwood, J.D. J. Agric. Food Chem. 53, 10061–10067 (2005).
5. Harrigan, G.G. et al. J. Agric. Food Chem. 55, 6177–6185 (2007).
6. James, C. Executive Summary of Global Status of Commercialized Biotech/GM Crops: 2008. ISAAA Briefs No. 39 (International Service for the Acquisition of Agri-Biotech Applications, Ithaca, New York, 2008).
7. Hothorn, L.A. & Oberdoerfer, R. Regul. Toxicol. Pharmacol. 44, 125–135 (2006).
8. National Research Council and Institute of Medicine of the National Academies. Safety of Genetically Engineered Foods. Approaches to Assessing Unintended Health Effects (The National Academies Press, Washington, DC, 2004).
9. McHughen, A. Nat. Biotechnol. 25, 725–727 (2007).
10. Bradford, K.J., Van Deynze, A., Gutterson, N., Parrott, W. & Strauss, S.H. Nat. Biotechnol. 23, 439–444 (2005).
11. Graff, G.D., Zilberman, D. & Bennett, A.B. Nat. Biotechnol. 27, 702–704 (2009).
12. Strauss, S.H., Tan, H.M., Boerjan, W. & Sedjo, R. Nat. Biotechnol. 27, 519–527 (2009).
13. Marshall, A. Nat. Biotechnol. 25, 981–987 (2007).
14. Brookes, G. & Barfoot, P. AgBioForum 11, 21–38 (2008).
15. Kleter, G.A. et al. Pest Manag. Sci. 63, 1107–1115 (2007).

GM crops and gender issues

To the Editor:
Correspondence in the December issue by Jonathan Gressel (ref. 1) not only states that gender issues in rural settings have not been adequately addressed with respect to weed control biotech but also asserts that such technology can increase the quality of life of rural women in developing countries. Improved weed control is a labor-saving technology that can result in less employment in a labor-surplus rural economy. Often in rural areas, wage income is the main source of income and an important determinant of the quality of life, particularly where employment opportunities are generally limited (ref. 2). Apart from soil preparation, planting and weeding, harvesting is also ‘femanual’ work that can generate more employment if yields are higher. Biotech can enhance the quality of life of women but only if the technology is associated with overall generation of rural employment.


On the basis of these issues, we feel that Gressel presents only part of the story and that quality of life for women in developing countries depends not only on the ‘femanual’ work but also on the incomes they earn. Thus, addressing gender issues in biotech requires rigorous analysis and a comprehensive evaluation beyond that outlined by Gressel. Here we summarize recent research by two of us (A.S. & M.Q.) (refs. 3,4) on the gender effects of insect-resistant Bacillus thuringiensis toxin (Bt) cotton in India, which indicates that this technology generates more employment for females, who happen to earn much more than males.

Since its commercialization in India in 2002, the area in which Bt cotton is cultivated had increased to 7.6 million hectares by 2008 (ref. 5). Several studies show sizable direct benefits of the technology and also indirect benefits from spillovers to other rural markets and sectors (refs. 6–8), but no studies have analyzed the gender aspect of this technology. To analyze the gender implications of Bt cotton adoption, we carried out two household surveys (refs. 3,4).

The first survey was undertaken in one village where we collected comprehensive data on household characteristics and interactions across various markets. The study village, Kanzara, is located in the Akola district of Maharashtra, the state with the largest area under cotton in India. Kanzara can be considered a typical setting for small-holder cotton production in the semi-arid tropics (ref. 9).
Interviews with all village households and institutions were conducted in 2004, capturing all household economic activities and transactions for the 12-month period between April 2003 and March 2004. Of the total 305 village households, 102 are landless; the other 203 own land suitable for agricultural production. The average farm size of land-owning households in the village is 1.9 hectares. All farm households cultivate at least some cotton, mostly alongside a number of food and fodder crops for subsistence consumption and for sale.

Figure 1 Returns to labor from Bt cotton and conventional cotton in rural India (US $ per hectare; categories: family male, family female, hired male, hired female and all labor). Family laborers are household members working on their own farm. Hired labor refers to farm work performed by landed and landless households on others’ farms for wages. Returns to non-farm labor are not included here. Simulation I modeled an increase in Bt cotton area by 1 hectare; simulation II modeled an increase in conventional cotton area by 1 hectare. Both simulations are based on a SAM multiplier model (for more details, see Supplementary Information).

This information was updated using the second survey: panel data from a farm sample survey conducted over a period of 5 years (ref. 10). We used this more representative survey data to further improve the robustness of the results (refs. 3,4).
Based on these two data sources, we developed a social accounting matrix (SAM) for Kanzara, which represents the flows of all economic transactions that take place within the village economy (Supplementary Table 1 and Supplementary Methods). Over the 2003–2004 period, the gross domestic product of the village was about $0.53 million. Village SAMs have been developed and used previously in different contexts (refs. 11–13). Yet, our SAM is distinct in two respects. First, unlike previous SAMs, which are all based on sample surveys, our SAM builds on a village census. Because a SAM by construction requires both receipts and payments of all transactions, availability of census data reduces the problem of unbalanced markets and thus of biased results. Second, our SAM explicitly considers both Bt cotton and conventional cotton as two different activities, which allows us to evaluate both technologies’ distributional impacts.

Even so, the SAM as such is a static representation of the village economy and does not allow statements to be made about income distribution effects of individual activities like Bt cotton. To do this requires a SAM multiplier model, which we refined (Supplementary Methods and Supplementary Fig. 1) and used for different simulations.
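The idea behind a SAM multiplier model can be sketched with the standard accounting-multiplier form, in which the income changes y of the endogenous accounts follow from an exogenous injection x as y = (I − A)^-1 x, where A holds the average expenditure shares among those accounts. The example below is a minimal sketch under that textbook assumption. It is not the refined model described in the Supplementary Methods, and the three accounts, the share matrix and the injection are all hypothetical.

```python
def solve(matrix, rhs):
    """Solve matrix * y = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    y = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(aug[i][j] * y[j] for j in range(i + 1, n))
        y[i] = (aug[i][n] - known) / aug[i][i]
    return y


def sam_multiplier_effects(shares, injection):
    """Return y = (I - A)^-1 x without forming the inverse explicitly."""
    n = len(shares)
    i_minus_a = [[(1.0 if i == j else 0.0) - shares[i][j] for j in range(n)]
                 for i in range(n)]
    return solve(i_minus_a, injection)


# Hypothetical three-account village economy (illustration only).
# shares[i][j] = fraction of account j's outlays received by account i.
accounts = ["cotton activity", "hired labor", "households"]
shares = [
    [0.0, 0.0, 0.2],  # households spend 20% of income on the cotton activity's output
    [0.3, 0.0, 0.0],  # the activity pays 30% of its outlays to hired labor
    [0.4, 1.0, 0.0],  # households receive 40% of activity outlays and all wages
]
injection = [1.0, 0.0, 0.0]  # one extra unit of exogenous demand for the activity

effects = sam_multiplier_effects(shares, injection)
for name, value in zip(accounts, effects):
    print(f"{name}: {value:.3f}")
```

In this toy economy, a one-unit expansion of the cotton activity raises household income by about 0.81 units once the indirect rounds of spending are included. Giving each technology its own activity column, as the village SAM does for Bt and conventional cotton, is what lets two such injections be compared.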
In particular, we ran two simulation experiments (‘simulation I’ for Bt cotton and ‘simulation II’ for conventional cotton), both modeling an expansion in the village cotton area by 1 hectare.

Using a village modeling approach taking into account both direct and indirect benefits, our study found that Bt cotton technology generates not only higher income but also more employment, especially for hired female labor (refs. 3,4). Compared with conventional cotton (Fig. 1; simulation II), Bt cotton (Fig. 1; simulation I) generates additional employment, raising the total wage income by $40 per hectare (ref. 4). The largest increase is for hired females with a gain of 55% from Bt cotton. This translates to about 424 million additional employment opportunities for female earners for the total Bt cotton area in India. Increase in returns to hired female labor is mostly related to higher yields in Bt cotton, due to the additional labor employed for picking the increased production of cotton (harvesting of cotton is primarily a female activity in India).

For family female labor, additional income from Bt cotton leads to withdrawal of in-house females from farming activities, raising the quality of life of women. Although reduced pesticide application in Bt cotton is labor saving, the returns to family male labor, which largely carries out this activity, are higher (Fig. 1). Even so, some of the saved family male labor involved in scouting and spraying for pests is reallocated to other household economic activities, previously carried out by female family members, increasing the returns to this labor category.
Overall, therefore, Bt cotton enhances the quality of life of women through increasing income and reducing ‘femanual’ work.

Note: Supplementary information is available on the Nature Biotechnology website.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

Arjunan Subramanian 1,2, Kerry Kirwan 1, David Pink 2 & Matin Qaim 3
1 University of Warwick, Warwick Manufacturing Group, Coventry, UK. 2 University of Warwick, WHRI, Warwickshire, UK. 3 Georg-August University of Goettingen, Goettingen, Germany.
e-mail: s.arjunan@warwick.ac.uk

1. Gressel, J. Nat. Biotechnol. 27, 1085–1086 (2009).
2. Subramanian, A. Distributional Effects of Agricultural Biotechnology in a Village Economy: The Case of Cotton in India (Cuvillier Verlag, Goettingen, Germany, 2007).


3. Subramanian, A. & Qaim, M. World Dev. 37, 256–267 (2009).
4. Subramanian, A. & Qaim, M. J. Dev. Stud. 46, 295–311 (2010).
5. Marshall, A. Nat. Biotechnol. 27, 221 (2009).
6. Huang, J., Hu, R., Rozelle, S. & Pray, C. Science 308, 688–690 (2005).
7. Gomez-Barbero, M., Berbel, J. & Rodriguez-Cerezo, E. Nat. Biotechnol. 26, 384–386 (2008).
8. Qaim, M., Subramanian, A. & Sadashivappa, P. Nat. Biotechnol. 27, 803–804 (2009).
9. Walker, T. & Ryan, J. Village and Household Economies in India’s Semi-Arid Tropics (The Johns Hopkins University Press, Baltimore, Maryland, 1990).
10. Sadashivappa, P. & Qaim, M. AgBioForum 12, 172–183 (2009).
11. Adelman, I., Taylor, E. & Vogel, S. J. Dev. Stud. 25, 5–24 (1988).
12. Subramanian, S. & Sadoulet, E. Econ. Dev. Cult. Change 39, 131–173 (1990).
13. Parikh, A. & Thorbecke, E. Econ. Dev. Cult. Change 44, 351–377 (1996).

BIO’s track record on emerging companies

To the Editor:
As executives at emerging biotech companies and chairs of the Biotechnology Industry Organization’s (BIO; Washington, DC) Board of Directors (S.S.) and Emerging Companies Governing Board (R.K.), we were pleased to see that your editorial in the February issue (ref. 1) recognized that BIO is the “only advocate for the smaller, younger, nonrevenue-driven [biotech] companies” but have to disagree that our voice on behalf of these small firms is not “always loud and clear.”

BIO consistently and effectively advocates for expanding available funding for emerging biotech companies, which compose ~90% of our core membership. These companies have no products on the market and revenues of […] 40% of BIO’s Health Section Governing Board, which is the entity within BIO’s governance structure that formally develops and sets BIO’s positions on major healthcare issues.

BIO seeks public policy outcomes that help encourage investment in small, research-intensive biotech companies and advocates for public policies that expand access to, and the availability of, public funding for research conducted by these companies.

Over the past year, BIO has worked tirelessly with its members to advocate successfully for a provision included in the healthcare reform legislation recently signed into law that will provide $1 billion in therapeutic discovery project tax credits.
This credit will provide relief to investment-starved small biotech research companies by helping to offset a portion of resources spent on therapeutic development activities, such as hiring scientists and conducting clinical studies.

BIO’s continuing work to restore eligibility to majority venture-backed small biotech companies to compete for Small Business Innovation Research (SBIR) grants has resulted in an important discussion on Capitol Hill about the nature of our sector’s funding. We have won informed and passionate support in both the House and Senate and significant legislative progress. We remain optimistic that the SBIR/STTR Reauthorization Act of 2009 will address our concerns.

BIO successfully advocates for large and small companies alike by addressing issues specific to company size and business sector, as well as those that affect the industry as a whole. Emerging companies depend on the success of established biotech companies to get innovative new therapies approved and reimbursed at reasonable rates to attract investment. Our advocacy efforts on healthcare reform have exemplified our success in shaping public policy so that it continues to incentivize innovation, benefiting biotech companies, both large and small.

The healthcare reform law also includes language to establish a pathway for the approval of biosimilars, which will ensure patient safety, expand competition, reduce costs and provide necessary and fair incentives for continued biomedical innovation.
BIO spent the past 3 years tirelessly educating members of the House and Senate on the complexity of biologics and the importance of providing a significant period of data exclusivity to allow biotech companies to recover their expenses and provide an adequate return on investment. Without the guarantee of this return on investment, firms such as ours would have great difficulty in raising funds to finance the next-generation innovative therapies.

BIO also has been a leading player in advocating meaningful patent reform legislation that will help promote continued biotechnology innovation and help drive US economic growth. Patents are often the sole assets of many BIO members. As such, strong and predictable patent protection enables the flow of risk capital that is vital to achieving biotechnology’s promise. While patent reform legislation continues to wind its way through Congress, BIO has successfully advocated for several key provisions that will strengthen the US patent system and enhance patent quality.

Perhaps as crucial as the issues that BIO’s board chooses to advocate for is our approach. BIO has been, and will continue to be, policy led. Our engagements with members of Congress are oriented around the key facts about our industry, without regard to party or politics.
The industry that BIO represents is based on cutting-edge science, and our efforts are supported by data and facts.

In addition to its advocacy efforts on behalf of companies, BIO hosts industry-leading investor and partnering meetings in the United States and around the world to provide emerging companies with investment, licensing and other partnership opportunities.

BIO is committed to being the voice of all biotech companies, whether small, medium or large. Although the difficulty of doing so is not lost on us, the voice of small biotech is both loud and clear and, we are happy to report, being heard.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

Rachel K King 1 & Stephen A Sherwin 2
1 GlycoMimetics, Inc., Gaithersburg, Maryland, USA. 2 Ceregene, Inc., San Diego, California, USA.

1. Anonymous. Nat. Biotechnol. 28, 103 (2010).


FEATURE

South-South entrepreneurial collaboration in health biotech

Halla Thorsteinsdóttir, Christina C Melon, Monali Ray, Sharon Chakkalackal, Michelle Li, Jan E Cooper, Jennifer Chadder, Tirso W Saenz, Maria Carlota de Souza Paula, Wen Ke, Lexuan Li, Magdy A Madkour, Sahar Aly, Nefertiti El-Nikhely, Sachin Chaturvedi, Victor Konde, Abdallah S Daar & Peter A Singer

A survey of entrepreneurial collaborations among health biotech firms in developing countries reveals a surprisingly high level of collaboration but a lack of emphasis on new or improved health biotech products and processes.

Halla Thorsteinsdóttir, Christina C. Melon, Monali Ray, Sharon Chakkalackal, Michelle Li, Jan E. Cooper, Jennifer Chadder, Abdallah S. Daar and Peter A. Singer are at the McLaughlin Rotman Centre for Global Health, University of Toronto and University Health Network, Toronto, Ontario, Canada. Halla Thorsteinsdóttir and Abdallah S. Daar are also at the Dalla Lana School of Public Health, University of Toronto, Ontario, Canada. Tirso W. Saenz and Maria Carlota de Souza Paula are at the Centre for Sustainable Development, University of Brasilia, Brazil. Wen Ke and Lexuan Li are at the Institute of Policy and Management, Chinese Academy of Sciences, Beijing, China. Magdy A. Madkour is at the Arid Lands Agricultural Research Institute, Ain Shams University, Cairo, Egypt. Sahar Aly and Nefertiti El-Nikhely are at the Center for Special Studies and Programs, Bibliotheca Alexandrina, Alexandria, Egypt. Sachin Chaturvedi is at the Research and Information System for Developing Countries, India. Victor Konde is at the University of Zambia, Lusaka, Zambia.
e-mail: halla.thorsteinsdottir@mrcglobal.org

Photo caption: Members of the team studying South-South entrepreneurial collaboration in health biotech. From left to right: Sachin Chaturvedi, May Sanaee, Magdy A Madkour, Wen Ke, Halla Thorsteinsdóttir, Victor Konde, Tirso W Saenz, Monali Ray, Christina Melon, Nefertiti El-Nikhely, Heba Maram.

In recent decades, developing countries have sought to reduce their reliance on trade with the economically and politically dominant northern, or developed, countries, favoring instead South-South partnerships that synergize strengths and bolster competitiveness. Entrepreneurial firms in developing countries are increasingly aware of the opportunities in one another’s markets, as is evident from the 12.5% increase in the rate of South-South trade each year 1.

Emerging economies, such as China and India, have experienced unprecedented growth and increased global trade 2. Furthermore, developing countries have been setting up mechanisms to encourage increased trade with one another by establishing free trade zones, such as the Association of Southeast Asian Nations Free Trade Area, the Southern Common Market (Mercosur/Mercosul) in Latin America and the Common Market for Eastern and Southern Africa.

Developing countries have also been targeting science and technology sectors as key areas for encouraging South-South collaboration and are forging a growing number of bilateral, multilateral and regional agreements with this aim 3. South Africa and Malawi, for example, have formed an agreement directed at accelerating economic growth and reducing poverty through the adoption of current global technologies 4. In addition, there are significant science and technology components in regional collaboration efforts in developing countries, such as those organized by the New Partnership for Africa’s Development (http://www.nepad.org/), and the IBSA network organized by India, Brazil and South Africa (http://www.ibsa-trilateral.org/). Health biotech provides a substantial scope for collaboration between developing countries as several developing countries have built up capacity in the field, including private-sector development 5–9.

At the same time, analysts have called for increased South-South collaboration to address shared health problems 10. Developing countries are increasingly aware of the importance of doing so through joint efforts with one another, and they have set up networks to deal with malaria, tuberculosis, HIV/AIDS and other common diseases. Brazil, China, Cuba, Nigeria and Thailand, together with Russia and Ukraine, are working in a network that jointly promotes research and development (R&D) aimed at developing innovative diagnostics kits, drugs and vaccines for HIV/AIDS prevention and treatment 11. In addition, 24 manufacturers of vaccines in developing countries have come together to form the Developing Countries Vaccine Manufacturers Network (http://www.dcvmn.com/), which ensures a consistent and sustainable supply of quality vaccines to developing countries at an affordable price and encourages R&D efforts to meet the emerging vaccine needs in the developing world.

Table 1 Number of health biotech firms surveyed and their response rates

Country        Number of firms surveyed   Number of responses   Response rate
Brazil         110                        72                    66%
China          139                        83                    60%
Cuba           11                         8                     73%
Egypt          22                         15                    68%
India          121                        68                    56%
South Africa   64                         42                    66%
Total          467                        288                   62%

Although South-South collaboration in science and technology has been high on developing countries’ agenda since the 1960s 12, there is only a limited amount of empirical evidence that examines these collaborations. In health biotech, for example, we are not aware of any work confirming that developing countries’ firms have heeded the call for South-South collaboration, or that they are to any significant degree working together. In this article, we aim to fill this knowledge gap and provide empirical data on South-South collaboration. We refer to partnerships between health biotech firms in developing countries (that is, low- and middle-income countries) as ‘South-South firm collaboration’.
Collaboration between firms in developing and developed countries (high-income countries) is called ‘South-North firm collaboration’.

Figure 1 Extent of international collaboration of health biotech firms in developing countries and comparisons of their South-South versus South-North collaborations. Pie chart segments: North-South only, 32%; South-South only, 6%; both, 21%; neither, 41%.

Rationale for South-South collaboration
One reason why firms in health biotech, both in developing countries and elsewhere, may want to work together is to minimize costs and risk. The commercialization of new health products and services in biotech is characterized by high costs and high risks 13. Even though preclinical work may produce promising medicines, attrition of products remains high, with many lead candidates rejected after costly human clinical testing.

Another reason why collaborations are attractive is that they provide a conduit to new and foreign markets. Alliances between firms are often necessary to expand their markets 14. Firms in small countries are particularly dependent on exporting their products to survive, and collaborative arrangements with firms in other countries are typically needed to obtain this access.

A third rationale for collaboration is to gain enhanced access to strategic knowledge or specific technical skills 13–17.
Both scientific and product development knowledge in health biotech is highly specialized, making it nearly impossible for small firms or institutions in developing countries to harness it all. Collaborations therefore become a means by which firms can obtain access to a wide spectrum of knowledge, technologies and skills, allowing them to implement new and relevant findings in their field. This knowledge can be required at various phases of health biotech development. For instance, for many small firms that are taking their first steps in product development, access to knowledge about regulatory authorities and processes in local and foreign markets is particularly important.

If developing countries can cultivate ways to work effectively together, they may be able to harness a more relevant model of promoting innovation than the traditional model of relying on linkages with developed countries. By pooling their expertise and resources, they could strengthen their capability to address shared problems that may neither affect the developed world nor capture the interest of companies there.
If successful, South-South collaboration could increase capacity in science-intensive fields by allowing participants to learn from each other, improve the ability of developing countries to address their own problems, and contribute to economic development and quality of life in developing countries.

To examine the level and characteristics of South-South collaboration, we sent a brief survey to 467 health biotech firms in six developing countries that have relatively strong health biotech sectors (Brazil, China, Cuba, Egypt, India and South Africa) and asked about their linkages with all other developing countries. We selected these countries on the basis of our previous research identifying them as regional leaders in this field 5. The survey was sent to all the dedicated biotech firms that we could identify in these countries, to pharmaceutical firms active in biotech and to other organizations heavily involved in commercialization activities in the health biotech field (see Supplementary Methods for a discussion on how we identified health biotech firms). We asked the firms whether they collaborated with firms or organizations in other low- and middle-income countries, and if so, to name their collaborators and provide an overview of each partnership. Data collected included the reasons for the collaboration, the activities involved and the output of the collaboration.
We presented the firms with a broad definition of ‘collaboration’, including in that definition any work jointly undertaken by firms and organizations that contributes to the production of knowledge, products or services in health biotech.

A total of 288 firms completed the survey, a response rate of 62% (Table 1). We feel this is a solid response rate, given that participation was voluntary and the nature of the sector can make it challenging to get responses from firms. The sector is fluid, with companies frequently merging or going bankrupt. In biotech surveys by the Organisation for Economic Co-operation and Development (Paris) involving mandatory responses, only response rates under 50% are considered low 18.

In the following sections, we describe the extent of South-South health biotech collaborations, map where the main linkages lie and explore the main characteristics and outputs of the collaborations.

Figure 2 Percentages of firms in the countries we surveyed that engage in South-South health biotech collaboration. Bar chart: x axis, country (Brazil, China, Cuba, Egypt, India, South Africa); y axis, South-South collaboration (percentage), 0 to 80.

Figure 3 Collaboration network of health biotech firms in South-South collaborations. The size of each node represents the total number of South-South collaborations for the country, while the width of each line represents the number of collaborations between the two linked countries. For clarity, only linkages of two or more collaborations were included on this map. Countries shown as nodes: Argentina, Bolivia, Botswana, Brazil, China, Colombia, Cuba, Dominican Republic, Ecuador, Egypt, Ghana, Guatemala, India, Indonesia, Iraq, Jordan, Kenya, Lesotho, Libya, Malawi, Malaysia, Mexico, Mozambique, Namibia, Nigeria, Pakistan, Paraguay, Peru, Philippines, South Africa, Sri Lanka, Sudan, Swaziland, Thailand, Turkey, Uganda, Venezuela, Yemen and Zimbabwe.

Extent of South-South collaboration
The results show that South-South firm collaboration is substantial, with more than a quarter (27%) of the health biotech firms that responded reporting collaborations of this type (Fig. 1). South-North collaboration is still more prevalent, however, with over half (53%) of the firms reporting collaborations with developed countries. A proportion of the firms in our sample (21%) indicated they engaged in both South-South and South-North collaborations.

We looked at the proportion of firms involved in South-South collaboration in each of the countries we studied (Fig. 2).
Those countries in our sample with the smallest populations, Cuba and South Africa, are the most active in South-South collaborations, with almost half of the South African firms and three-quarters of the Cuban entrepreneurial organizations reporting involvement in this type of collaboration. This is in stark contrast to the more populous countries, such as China, where just over 10% of the firms report South-South collaborations, and India, with fewer than 20% of firms engaged in such partnerships.

According to our findings, almost all the countries studied are more active in South-North collaborations than South-South collaborations. Egypt was the only country that showed a lower rate of South-North collaboration, with 39 South-South collaborations compared with 30 South-North collaborations (Table 2).

Most of the firms that are active in South-South collaboration are engaged in several collaboration initiatives. The total number of South-South collaborations reported in this study is 279. It is important to note, however, that some collaborations may have been double-counted; that is, a particular partnership between an Indian firm and a South African firm may have been counted twice, once for India and once for South Africa, if both firms responded to the survey and reported all of their collaborations. We attempted to address this issue by asking the respondents to provide the names of their partnering firms; however, many opted to keep this information confidential, thereby limiting our ability to adjust the number of collaborations accordingly.
In such cases, the firms reported, for example, that they collaborated with ‘firm A’ in India and ‘firm B’ in China. This may inflate the aggregate number of South-South collaborations.

On average, the firms active in South-South collaboration reported taking part in 3.5 collaborations, with responses ranging from 2.8 collaborations per firm for Brazil to 5.7 collaborations for Cuba. Brazil has the largest number of South-South collaborations of the countries we surveyed, with 64 collaborations. Even though the countries with the smallest populations, Cuba and South Africa, have relatively low numbers of health biotech firms, they are so active in South-South collaborations that comparing their collaborations with those of large countries is still likely to produce valid results. South Africa has the second-highest number of collaborations of the countries in this study, and Cuba has slightly more collaborations than the population giant China.

Table 2 Number of international collaborations reported

               South-South collaborations   North-South collaborations   Total collaborations
Country        Number   Average per company   Number   Average per company   Number   Average per company
Brazil         64       0.9                   127      1.8                   191      2.7
China          27       0.3                   99       1.2                   126      1.5
Cuba           34       4.3                   63       7.9                   97       12.1
Egypt          39       2.6                   30       2.0                   69       4.6
India          54       0.8                   126      1.9                   180      2.6
South Africa   61       1.5                   66       1.6                   127      3.0
Total          279      1.0                   511      1.8                   790      2.7
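The per-company averages in Table 2 can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes that each average is the number of reported collaborations divided by the number of responding firms in Table 1, and that values are rounded half-up to one decimal place. Neither assumption is stated in the article, although together they reproduce the published figures. The names `responses`, `south_south`, `north_south`, `per_firm` and `table2_rows` are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

# Responding firms per country (Table 1) and reported collaborations (Table 2).
responses = {"Brazil": 72, "China": 83, "Cuba": 8,
             "Egypt": 15, "India": 68, "South Africa": 42}
south_south = {"Brazil": 64, "China": 27, "Cuba": 34,
               "Egypt": 39, "India": 54, "South Africa": 61}
north_south = {"Brazil": 127, "China": 99, "Cuba": 63,
               "Egypt": 30, "India": 126, "South Africa": 66}


def per_firm(collaborations: int, firms: int) -> Decimal:
    """Average collaborations per responding firm, rounded half-up to one decimal.

    Assumption: the denominator is the number of survey responses, not the
    number of firms surveyed.
    """
    ratio = Decimal(collaborations) / Decimal(firms)
    return ratio.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


def table2_rows() -> dict:
    """Return (South-South, North-South, total) averages for each country."""
    rows = {}
    for country, firms in responses.items():
        ss, ns = south_south[country], north_south[country]
        rows[country] = (per_firm(ss, firms),
                         per_firm(ns, firms),
                         per_firm(ss + ns, firms))
    return rows


if __name__ == "__main__":
    for country, averages in table2_rows().items():
        print(country, *averages)
```

Exact decimal arithmetic is used here because Cuba's South-South average is exactly 4.25, and binary floating-point rounding in Python would give 4.2 rather than the 4.3 shown in the table.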


We asked the firms to indicate who initiated the collaborations: themselves, their partners, government agencies, international organizations, expatriates or any other intermediary. Their answers indicate that the firms themselves typically initiated the collaborations. Governments or other local or international organizations seldom played this role, with only 17 of the 279 reported collaborations said to have been initiated by such organizations. Respondents from Cuba and Brazil were most likely to indicate governmental influence, typically targeting public research organizations that are heavily involved in entrepreneurial activities. Follow-up interviews in developing countries revealed that firms find it challenging to identify appropriate collaborative partners in other developing countries and to initiate the collaboration. Finding enough detailed information about potential partners is a difficult task, and building trust can also be challenging. Thus, there is scope for governments and other third parties to take a more proactive role in initiating collaborations. It is also notable that only one of the collaborations was reported to be initiated by expatriates who have moved between the collaborating countries. One explanation for this may be a relatively low migration rate of professionals between developing countries. It would be interesting to see whether expatriates are more important in South-North health biotech collaboration.

In addition, we asked the respondents to indicate whether they had set up formal arrangements with their collaborators, and to elaborate on the nature of those arrangements where applicable. We found that most (almost 90%) of the collaborations involved at least one type of formal arrangement among participants, ranging from supply agreements to R&D cooperation agreements to marketing and distribution agreements. Licensing agreements were commonly cited, with around 19% of the collaborations having formal licensing contracts, whereas joint ventures were established in only around 8% of the collaborations overall. South Africa (seven joint ventures) and Cuba (six joint ventures) had the highest numbers of joint ventures reported.

Figure 4 Distribution of the activities involved in the South-South entrepreneurial collaborations for all the countries we surveyed. Bar chart: x axis, collaborative activity (distribution, marketing, providing supplies, manufacturing, R&D, training, clinical trials, using supplies, lab services, contract research, other); y axis, number of collaborations, 0 to 250.

Geography of collaborations
To map South-South collaborations in health biotech, we drew a diagram of the main linkages reported by the firms using the Ucinet 6 program (http://www.analytictech.com/ucinet/; Fig. 3). The countries we surveyed directly appear as hubs involved in various collaboration networks; it is not surprising that they are featured centrally. In contrast, this map is likely to under-represent the collaborations of countries we did not survey, such as Mexico, Nigeria and Malaysia.
Nevertheless, the map provides an approximate overview of South-South collaboration in health biotech and shows that the strongest linkages of the countries we surveyed are with one another.

Chinese companies collaborate mainly with those in Brazil and India, Indian companies have close linkages with those in South Africa, and Brazilian companies have close linkages with firms in Cuba. The only other pairs of countries where companies are involved in a similar level of South-South collaboration are Brazil and Argentina, and South Africa and Botswana. As Argentina and Botswana are active in forming partnerships with other developing countries, surveying them would have provided an even fuller picture of South-South firm collaboration in this field. Our data, however, reinforce the notion that we surveyed the strongest countries in health biotech and that they collaborate with one another despite substantial distances.

The map of South-South collaborations also reflects the regional nature of health biotech partnerships between firms in developing countries. Every country in our survey has collaborations with other countries within its continent.
For example, South Africa has numerous ties with other sub-Saharan countries, Egypt collaborates with Middle Eastern and North African countries, and there are many linkages of Brazil and Cuba with other Latin American countries.

Characteristics of collaborations
To get a deeper understanding of South-South collaborations, we asked the firms what activities were involved in the collaborations, what the reasons for partnering were and what outputs had arisen from these deals.

Collaborations involve mostly commercialization. We asked the firms to specify the activities they were pursuing jointly in South-South collaborations, choosing from a wide selection of activities that are typically undertaken by health biotech firms, from research-intensive activities to end-stage commercialization activities such as distribution and marketing. We considered activities to be innovative if they focused on the research and development of new products or services, or of production processes.
This includes, for instance, clinical trials and laboratory services. Conversely, we regarded collaborations involving simply the packaging of products or their export between countries (that is, marketing and distribution) as noninnovative activities. We indicated to the firms that they should choose all the activities that were applicable to their collaborations, and we offered the option to add any other activities not included on our list.

The resulting responses show that the majority of the collaborations (60%) involve two or more activities. For example, rather than creating collaboration solely around distribution, partnership deals more usually involve distribution and another activity, such as providing supplies. It is also clear that most of the South-South collaborations involve end-stage commercialization activities, with around 200 (72%) involving distribution and 95 (34%) involving marketing activities (Fig. 4). Innovative activities were much less frequently cited by the firms that responded: R&D was part of only 36 (13%) of the collaborations, clinical trials just 25 (9%), and contract research only 9 (3%). It is noteworthy that the third most frequently cited collaboration activity was providing supplies, with 53 (19%) of the South-South collaborations involving such provisions.
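The percentages quoted above are shares of the 279 reported South-South collaborations, and because a single collaboration can involve several activities, the shares add up to more than 100%. The short sketch below illustrates that arithmetic under stated assumptions: the counts are taken from the text, the value of 200 for distribution is the article's own approximation ("around 200"), and the names `activity_counts`, `share` and `shares` are hypothetical.

```python
TOTAL_COLLABORATIONS = 279  # South-South collaborations reported in the survey

# Counts quoted in the text; distribution is given only as "around 200".
activity_counts = {
    "distribution": 200,
    "marketing": 95,
    "providing supplies": 53,
    "R&D": 36,
    "clinical trials": 25,
    "contract research": 9,
}


def share(count: int, total: int = TOTAL_COLLABORATIONS) -> int:
    """Percentage of all collaborations that involve a given activity."""
    return round(100 * count / total)


# One collaboration may be counted under several activities,
# so these shares are not expected to sum to 100.
shares = {activity: share(count) for activity, count in activity_counts.items()}

if __name__ == "__main__":
    for activity, pct in shares.items():
        print(f"{activity}: {pct}%")
    print("sum of shares:", sum(shares.values()))
```

Treating each activity as a separate yes-or-no attribute of a collaboration, rather than as mutually exclusive categories, is what allows distribution (72%) and marketing (34%) to coexist in the same tally.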
Supply activity can vary from providing plant material from which active pharmaceutical ingredients are derived for drug development to providing active pharmaceutical ingredients.

The relatively slight emphasis on R&D activities in South-South firm collaboration contrasts with that reported in an analysis
of North-North collaboration in biotech 17. From the mid- to late 1990s, more than 20% of biotech collaborations between developed countries involved R&D, up from around 6% in the 1970s. It will be of interest to repeat this survey in a few years to detect whether R&D collaborations between developing countries also increase.

Figure 5 The network of collaborations involving end-stage commercialization versus R&D. (a) Collaborations involving end-stage commercialization. (b) Collaborations involving R&D. As in Figure 3, node size and line width denote numbers of collaborations. For clarity, only linkages of two or more distribution and marketing collaborations are included in a; all of the linkages are shown in b.

Box 1 Vaccines for Africa’s meningitis belt

To counter a meningitis outbreak in 2007 in the so-called ‘meningitis belt’ of Africa, the World Health Organization (WHO) decided to assess the status and production capacity of polysaccharide manufacturers worldwide. This assessment identified Bio-Manguinhos (Rio de Janeiro), in collaboration with the Finlay Institute (Havana), as the most suitable suppliers. Through South-South collaboration, they could quickly provide the needed products to address the outbreak at a lower price than that of alternative suppliers.

The meningitis belt in Africa stretches across the continent from Senegal in the west to Ethiopia in the east and covers several low-income countries with an estimated population of ~300 million. Samples from meningitis-infected individuals showed that the cases were caused by Neisseria meningitidis serogroup A, which is the most common serogroup in Africa but exists in neither Brazil nor Cuba.

The Finlay Institute has a long history of meningitis research and managed to control a meningitis outbreak in Cuba in the mid-1980s, developing a purified meningococci vaccine that was the first of its kind worldwide. Bio-Manguinhos also has extensive experience in vaccine research and manufacturing, and has developed an efficient scale-up process using lyophilization. By collaborating and relying on their respective strengths, these two organizations were able to supply, in a timely fashion, a meningitis A vaccine capable of combating the African meningitis outbreak.

For its part, the WHO also facilitated the collaboration by making it possible for ANVISA, the regulatory agency in Brazil, to collaborate with the Cuban regulatory agency CECMED. The agencies were able to exchange information about their respective regulatory systems, which made it possible for them to align the collaborative process. Neither Bio-Manguinhos nor the Finlay Institute alone would have been able to respond so quickly and efficiently to this request. This example therefore demonstrates how South-South collaboration can be harnessed to address a health threat when spurred by demand and funding from an international organization. It also shows how South-South collaboration can contribute toward improving global health (http://www.who.int/mediacentre/news/notes/2007/np12/en/index.html).

We then explored where the collaboration linkages lie for the different types of activities (Fig. 5). Some of the activities represent only a few collaboration linkages, which certainly limits the possibility of generalizing from these results. As distribution and marketing are closely related activities, we graphed them together as ‘end-stage commercialization’. There are relatively strong end-stage commercialization linkages between the leading developing countries in health biotech (Fig. 5a), with, for example, active distribution and marketing collaborations between Brazil and China, Brazil and Cuba, India and China, and India and South Africa. They probably form linkages to reach each other’s markets. Also striking are the widespread regional commercialization collaborations in health biotech. South African firms, for example, have distribution and marketing collaborations with well over 20 African countries, including relatively strong linkages with Botswana, Namibia and Nigeria. Egypt has distribution and marketing collaborations with around 10 African countries and widely within the Middle East. India has commercialization collaborations with other Asian countries, such as Sri Lanka and Pakistan.
Brazil has a relatively large number of commercialization collaborations with other Latin American countries, and it should be noted that its only commercialization collaborations in Africa are with Portuguese-speaking countries such as Angola and Mozambique. According to our survey, Brazil and South Africa do not have distribution and marketing linkages in health biotech with each other, nor do Egypt and South Africa.

We further found that China has frequent collaborations with both India and Brazil in providing supplies. It is also notable that South Africa mainly provides supplies to other sub-Saharan countries. This may indicate that its collaborations are focused on providing necessary products or ingredients for biotech development, including active pharmaceutical ingredients, to countries with limited capacity in this field. Our follow-up case study research has supported this notion.

The survey data suggest that India and China are most active in manufacturing collaborations, which is not surprising, as manufacturing in general is an area of strength for both countries 19–21.
Their manufacturing collaborations appear mainly to be intercontinental, between the leading developing countries, with relatively strong ties between China and Brazil, India and South Africa, and India and Egypt. The large markets in China and India are attractive to companies in smaller countries, and this leads these firms to create Chinese and Indian joint ventures allowing local manufacture, thereby facilitating market entry and reducing the cost of transportation from the smaller country.

R&D collaborations are limited and center around a few countries. It is obvious from Figure 5b that R&D collaborations are not nearly as numerous as end-stage commercialization collaborations. The main linkages in R&D are between firms in the leading developing countries in health biotech. Most of these partnerships are between companies in Brazil and Cuba, India and Egypt, Cuba and India, and India and South Africa. An exception is collaborations between companies in Cuba and India, which seem to be relatively strong in R&D compared with end-stage commercialization. Other active R&D linkages were found between enterprises in South Africa and Indonesia. Firms in Cuba, India and China also have a few R&D collaborations with companies in other countries; in the case of Cuba, these are mostly regional collaborations with other Latin American countries, whereas India’s collaborations are cross-continental and involve companies in several African countries.
In addition, it is notable that China and India seem to be more heavily involved in collaborations surrounding end-stage commercialization than in R&D partnerships.

Developing countries conduct joint R&D for several types of products. Vaccines are key to preventative health care in developing countries, and by working together on shared health problems, companies in the South can strengthen their potential for developing cost-effective products. Cholera is a shared health problem in Bangladesh and eastern India. The International Centre for Diarrhoeal Disease Research (Dhaka, Bangladesh) has been conducting leading research on cholera vaccine candidates, and its collaboration with the Indian firm Biological E (Hyderabad, India) has further facilitated the development of a cholera vaccine candidate. If the vaccine originating from the institute in Bangladesh proves efficacious and safe, the partners can gear up toward manufacturing of the vaccine by the Indian firm. Another example of vaccine R&D involves Bio-Manguinhos (Rio de Janeiro) in collaboration with the Finlay Institute (Havana). These two institutions exploited each other’s respective strengths to develop and manufacture a bivalent meningitis AC vaccine to address a meningitis outbreak in Africa (Box 1). This is a good example of how developing countries can use their assets in biotech to address health problems of other countries in need. And these types of collaborations extend beyond vaccines to more experimental types of therapy.
For example, the South African firm Altis Biologics (Pretoria, South Africa) is partnered with the First Affiliated Hospital of Xinjiang Medical University (Xinjiang, China), which is carrying out animal testing of Altis’s allogeneic human bone extract enriched in bone morphogenetic proteins, intended for use in implants for complex fractures and bone disease.

Although our survey results indicate that South-South collaborations rarely include clinical trials (another developmental activity), there are some interesting exceptions. Of the countries we examined, Cuba seems to have the greatest number of active clinical trial collaborations. Some of these are South-South-North collaborations. CIMAB (the entrepreneurial arm of the Cuban institute Center of Molecular Immunology; Havana), with its partner YM BioSciences (Mississauga, Canada), has spearheaded the establishment
of a global clinical consortium to test cancer therapeutics that are based on innovation from Cuba (Box 2). The network includes partners from 20 developing countries and thus has a heavy emphasis on South-South collaboration. China is also involved in South-South collaboration focused on clinical trials. For instance, the Chinese firm SH-IDEA Pharmaceutical Company (Yuxi, China) and the Kunming Institute of Botany (Kunming, China) are working with Thailand’s Ministry of Public Health (Bangkok) on clinical trials of an HIV/AIDS treatment (Box 3). The study stems from original research from the Kunming Institute of Botany based on Chinese traditional medicine and local biodiversity, but the clinical trials were carried out on Thai patients.

It should also be noted that according to our survey, the South-South collaboration of Indian firms in clinical trials is limited. As India is known for active international collaborations involving clinical trials 20–22, its lack of clinical trial partnerships with other developing countries perhaps reflects the greater allure of relationships with multinational pharmaceutical firms or with developed countries.

Box 2 Global South-South-North consortium for clinical trials

To carry out cost-effective clinical trials, CIMAB, the commercial arm of Cuba’s Center of Molecular Immunology (Havana), and its partner YM BioSciences (Mississauga, Canada), have established a consortium of firms around the world for testing the humanized monoclonal antibody nimotuzumab in the treatment and diagnosis of patients with cancers of epithelial origin. The consortium (http://www.ymbiosciences.com/products/nimotuzumab/codevelopment.php) has partners from 20 developing countries as well as 7 developed countries, including Argentina, Brazil, Colombia, Mexico, Peru, Paraguay and Uruguay from Latin America, Algeria, Egypt, Morocco, Nigeria and South Africa from Africa, and China, India, Indonesia, Malaysia, Pakistan and the Philippines from Asia. Asia is especially strong in the consortium, with Japan, Singapore and South Korea as developed-country participants. Other high-income countries in the network are Saudi Arabia and Germany. The consortium thus reflects a South-South-North collaboration with strong participation from developing countries. Examples of southern firms in the consortium are Biocon Biopharmaceuticals (Bangalore, India), Biotech Pharmaceutical Co. (Beijing), Eurofarma (Sao Paulo, Brazil) and Laboratorios PiSA (Guadalajara, Mexico).

Nimotuzumab is a Cuban innovation from the Center of Molecular Immunology that targets epidermal growth factor receptor. It is aimed at various epithelial cancer types, including non–small cell lung, glioma, esophageal, brain metastasis, colorectal, pancreatic, prostate, cervical and breast cancers. To date, the consortium has tested nimotuzumab in 9,842 patients in Cuba, Argentina, Brazil, Canada, China, Colombia, Germany, India, Indonesia, Japan, Malaysia, Mexico, Singapore, South Africa, South Korea, Thailand and the Philippines. Trials are also being conducted in Europe, Japan and North America. CIMAB and YM BioSciences work to ensure that the network of firms follows the regulatory guidelines of the International Conference on Harmonisation/Good Clinical Practice. The consortium’s clinical trial results are collected in a central depository. Aggregating patient data from sites in the various countries increases the statistical power and quality of the clinical trials. By amassing data gathered under internationally recognized norms from the collaborating sites, the partners are able to submit a stronger drug application to their national regulatory authorities. Gaining approval from one regulatory agency can pave the way for other agencies to be able to approve the product. Currently, nimotuzumab has been approved for marketing as a treatment for head and neck cancers and glioma in 23 countries worldwide, including Argentina, Brazil, China, India, Indonesia, Mexico and Ukraine. The consortium members license the drug from CIMAB and market it in their home countries.

Running clinical trials in developing countries among several partners has a number of advantages. Economies are obtained through the lower personnel and infrastructure costs and by sharing clinical trial expenses across several partners. Patient recruitment is faster, even for rare cancer indications, owing to the large patient populations, who previously lacked access to treatments. Thus, not only are costs reduced, but trials are completed at a faster pace. The example of nimotuzumab shows that a consortium of enterprises consisting primarily of small biotech firms from developing countries can complete these studies at the same speed as, and at lower cost than, big pharma. By including a South-South collaboration strategy, biotech firms have an alternative to partnering with pharma companies in clinical development and can potentially retain greater presence in the later stages of a product’s development and a greater share of revenue stream.

Bidirectional knowledge flow is an important reason for collaboration. To better understand the motivations for South-South firm collaboration, we asked respondents to indicate the reasons for each of their collaborations. Again, we note the multifaceted nature of South-South collaborations, with respondents reporting several reasons for single collaborations. In line with the heavy emphasis on end-stage commercialization, ‘access to markets’ was the main reason given for the collaborations (207 or 74% of the collaborations). It was an important reason for commercial collaborations in all the countries we surveyed; firms in developing countries are clearly working together to gain export markets for their products and services. The second most commonly cited reason for the collaborations was to ‘provide knowledge’ (72 or 26%), followed by ‘gain knowledge’ (52 or 19%). A relatively high proportion of Cuban respondents (68%) cited ‘provide knowledge’ as a reason for the collaboration. Brazilians also cited this reason fairly often, but they more frequently than the Cubans reported knowledge gain as a reason for their collaborations.

There is mention of clinical access as a reason, with ‘access to patients’ stated for 28 (10%) of the collaborations, mainly by Chinese and Cuban respondents. Finally, ‘provide patients’ was a factor in 13 (5%) of the reported collaborations. What is notable is how infrequently financial reasons were given for the collaborations, with ‘access to financing’ cited as a reason for only 15 (5%) of the collaborations, and ‘provide financing’ cited only four times (1%). Cubans stood out again in citing ‘access to financing’ relatively frequently as a reason for their collaborations, as well as ‘provide technology/equipment’. This may indicate that they have collaborations that involve licensing access to their technologies to other developing countries.

It is noteworthy how frequently ‘provide knowledge’ and ‘gain knowledge’ are cited as reasons for collaborations, especially given how rarely activities related to R&D were reported in our study. It points to a strong capacity building role for the collaborations, as seen in examples of technology-transfer initiatives (Box 4). This may mean that South-South collaboration is still in its infancy, though its aim is future knowledge-generation activities. The discrepancy may also reflect the different types of knowledge that are required in health biotech. South-South collaboration may be used to gain access to knowledge about each other’s markets, to deal with regulatory affairs, and so on.

Some of the reasons reported here align well with reasons attributed to North-North or North-South collaborations 13–17,23. Access
to markets and knowledge are both consistent incentives. Even so, given the findings from developed countries, where the need to access financing and minimize costs regularly stimulates collaboration, we expected access to financing to be cited more often as a reason for South-South collaborations than we found. We therefore cannot conclude that the South-South collaborations were fuelled by motivations to minimize costs.

Collaborations are strongly product focused. We asked the respondents of the survey to report the outputs of their South-South collaborations. The majority of collaborations, roughly 65%, have resulted in some specific output. The collaborations are strongly product focused, with 70 (25%) collaborations leading to a joint product in the market and 16 (6%) leading to a joint product in the pipeline. Thus, these types of partnerships facilitate the end-stage commercialization of health biotech products produced by firms in developing countries and increase the availability of these products in developing countries.

Even so, very few collaborations result in the joint development of products; instead, these types of commercial relationships are confined to licensing arrangements. Thus, only 16 (6%) collaborations led to joint products in the pipeline, and joint patents were reported as an outcome for only 12 (4%) of the collaborations. Cuban and Brazilian enterprises were the only ones that reported joint patenting as an outcome of their collaborations.
Not surprisingly, South-South firm collaboration seems rarely to result in a joint scientific publication (reported only once as an output of collaboration). Other reported outputs included the following: clinical/scientific research results, human resource training, separate product development, and technology transfers.

Our analysis also reveals that more than half of partnerships involving R&D had joint products on the market, and a quarter of them had joint products in the pipeline. Even though there is generally a limited emphasis on product development in the South-South collaborations examined here, product development and end-stage commercialization activities are closely linked. Several developing countries are currently signatories of the TRIPS (trade-related aspects of intellectual property rights) agreement, and firms in these countries have started to place an increasing emphasis on R&D and developing ‘new to the world’ innovation 6,24,25. Our survey results suggest that those firms may be relying, in part, on their commercialization linkages with other developing countries to jointly strengthen their R&D activities. This is a promising sign that South-South collaborations will, in the future, become important in strengthening health biotech innovation within developing countries.

Box 3 A South-South approach to dealing with HIV/AIDS based on local biodiversity

China and Thailand are working together to develop a remedy against HIV/AIDS based on Chinese biodiversity and knowledge from traditional Chinese medicine. The collaboration involves both public and private-sector institutions. The start of a collaboration between the two neighbors was marked in 1997, when a memorandum of understanding was signed by their ministries of public health. As a part of this collaboration, an official partnership was established between the Department of Medical Science within Thailand’s Ministry of Public Health (Bangkok) and the Kunming Institute of Botany (Kunming, China) of the Chinese Academy of Sciences (http://stats.yuxi.gov.cn/showitem.asp?id=2006120717303184815).

Thailand has a higher reported prevalence of HIV/AIDS than China, making it a preferred partner for China. The Thai government was highly motivated to address the rising health threat of HIV/AIDS, and its larger patient base facilitated clinical trial testing. Interest in this collaboration was spurred by a visit of Thai officials to the lab of Luo Shide at the Kunming Institute. In the late 1990s, Shide had carried out a series of experiments analyzing ex vivo the pharmacological and toxicological properties of a mixture of flavones and triterpenoids with inhibitory activity against HIV protease and reverse transcriptase, originally purified from a Chinese traditional remedy, Ke’ Aite. After initiation of the collaboration, a team of researchers in Thailand repeated the preclinical work in preparation for the commencement of clinical trials. To scale up and manufacture the therapeutic candidate, the two groups struck up a collaboration with the Chinese firm SH-IDEA Pharmaceutical Company (Yuxi, China). The resulting product, Complex SH, is the first herbal-based anti-HIV drug to have undergone phase 1, 2 and 3 testing in China and Thailand 28. The product is patented and has received regulatory approval in both China and Thailand.

In light of controversy over the pricing and availability in developing countries of small-molecule inhibitors of HIV protease and reverse transcriptase marketed by Western drug companies, it is noteworthy that South-South collaboration can harness an alternative solution to address a local health threat. This example also shows how governmental will can cultivate South-South collaboration, enabling two countries to develop a therapeutic based on knowledge from the South.

Conclusions

Our analysis indicates that South-South entrepreneurial collaboration in health biotech is substantial and that firms in developing countries are actively working together.
These types of collaborations are on the political agenda of many developing countries’ governments, and, as mentioned above, developing countries are increasingly signing collaborative agreements and setting up initiatives to promote scientific and technological collaboration among themselves. Our results show that in the health biotech sector, at least, firms have moved beyond the rhetoric of South-South collaboration. They are actively boosting trade in their countries by forming relationships with firms in other developing economies; to a lesser degree, they are working together to boost innovation, as seen in the development of new products or processes.

Apart from providing insight into the current extent and characteristics of South-South collaboration, our survey also establishes a baseline for future studies. As such, it can provide important information for evaluating the effects of policies and programs aiming to promote collaboration in developing countries.

As with any survey, our study has limitations. For logistical reasons, we had to limit our data collection to a few countries: those that are likely to contain the bulk of developing countries’ firms active in this field. Furthermore, we have not been able to receive information from every firm active in health biotech in the countries we focused on, and some firms may not have reported the extent and characteristics of all their South-South collaborations.
Even so, as we obtained a relatively high response rate, we believe that the results represent the main characteristics of South-South firm collaboration in the health biotech field.

In summary, our findings lead us to several conclusions. First, we can see that South-South collaboration has become a widely chosen path for health biotech firms. One in every four firms that responded to our survey reported an active collaboration with other developing countries. Furthermore, developing countries’ firms that engage in South-South collaboration are likely to be involved in several initiatives at a given
feature© 2010 Nature America, Inc. All rights reserved.time. South-South collaboration has <strong>the</strong>reforebecome a reality of <strong>the</strong> health biotech sector—a well-trodden route firms take in <strong>the</strong>ir entrepreneurialactivities. None<strong>the</strong>less, South-Northcollaborations are even more prevalent, withjust over one in every two firms being activein collaboration with at least one developedcountry. There were also differences in <strong>the</strong>extent of South-South entrepreneurial healthbiotech collaborations depending on <strong>the</strong> location;countries with <strong>the</strong> smallest populationswere most active in collaborating with o<strong>the</strong>rdeveloping countries. This probably reflects<strong>the</strong> fact that small home markets can create<strong>the</strong> need to collaborate for <strong>the</strong> sake of a firm’sviability.Second, this survey shows that most collaborationsinvolve linkages between <strong>the</strong> leadingdeveloping countries in health biotech. Despitedistances, working toge<strong>the</strong>r may amplify <strong>the</strong>competitiveness of relatively advanced developingcountries. In addition, <strong>the</strong> results showa considerable number of regional collaborationsbetween firms. Firms in South Africa, forexample, have active linkages with o<strong>the</strong>r sub-Saharan countries, <strong>and</strong> enterprises in bothBrazil <strong>and</strong> Cuba had active collaborations inLatin America. Thus, South-South collaborationshave a dual purpose: to amplify <strong>the</strong>global competitiveness of leading developingcountries in health biotech <strong>and</strong> to streng<strong>the</strong>nregional ties in health biotech.Third, <strong>the</strong> health biotech collaborationsbetween developing countries involve mainlyend-stage commercialization activities ra<strong>the</strong>rthan R&D. 
Commercialization activities such as distribution and marketing were by far the most common South-South collaboration activities, and more common than any research and developmental activities. This is true for all the countries surveyed in this study. The focus on end-stage commercialization is in line with ‘access to markets’ being the most common reason given for South-South collaborations and reflects a need for companies to export their products to other developing countries. The fact that the countries with the smallest populations were most active in South-South collaborations underscores this finding. Considering that some developing countries have proven track records in producing relatively affordable health biotech products²⁶, South-South health biotech partnerships may increase the availability of relatively inexpensive health biotech products in developing countries’ markets, as well as the accessibility of health biotechnologies in general.

Box 4 Extending health biotech capacity through South-South collaboration

Technology transfer features centrally in South-South collaboration in health biotech and can lead to substantial capacity building in countries that lack technological proficiency in certain areas. In one example, an Egyptian company has forged a collaboration with a Chinese firm to enable the production of recombinant insulin in Egypt, which was previously imported and as a result was often in short supply in the Middle Eastern country. The partnership involved the transfer of technology to produce recombinant insulin from the Chinese company Dongbao (Shanghai) to the Holding Company for Biological Products and Vaccines (VACSERA) in Giza, Egypt. As a result, Egypt now has a facility that can produce recombinant insulin locally, and diabetics in the country have a reliable and readily accessible supply of insulin that is cheaper than the imported product. The technology transfer from China has thus considerably benefitted the Egyptian health system. As economic and political turmoil can lead to an unsteady supply of important health products, self-sufficiency is far from being a trivial goal for developing countries.

Elsewhere, India has transferred technology for diagnosing infectious diseases to South Africa. East Coast Rapid Diagnostics (now split into Tulip South Africa and Life Assay, both of Durban, South Africa) is a joint venture between the publicly funded LIFElabs in South Africa (Durban) and the Indian Tulip Group Diagnostics (Bambolim, India). Under the agreement, the Indian company transfers several diagnostic technologies to South Africa, including rapid malaria diagnostic kits and pregnancy diagnostic kits, together with substantial capacity and technical assistance. These diagnostic kits are stable at high temperatures and are thus suitable for application in Africa, where cooling can be hard to achieve in supply chains. In return for the technology transfer, LIFElabs will commercialize and market the kits in other African countries with high incidences of malaria and other infectious diseases.

These two examples show that South-South technology transfer can lead to a stronger supply of essential health products in developing countries, more affordable than the imported alternatives and well-adapted to the needs of local populations. Such collaborations are thus a cost-effective and efficient way of promoting global health.

Fourth, these collaborations contribute only marginally to innovation in health biotech. Few of the South-South collaborations reported in the survey involved knowledge-creation activities tied to innovation. For example, only 13% of the reported collaborations involve R&D and only 9% involve clinical trials. This may indicate that many of the firms we surveyed are not active in health biotech innovation. Instead, they may be licensing products from firms that are innovators in the field, typically from developed countries. Nevertheless, some firms from China, Cuba and India have increasingly been applying their innovative capabilities to the health biotech field⁵⁻⁷.
It will be of interest to repeat the survey in the future to see whether South-South collaboration will make a richer contribution toward innovation. It is also notable that collaboration involving R&D activities has a strong commercial side, with ‘joint product on market’ being the most frequently cited output for the R&D collaborations. This reflects the sizable product focus of R&D collaborations, which may translate into a stronger innovation track record once more firms have been able to build up innovation capacity.

Fifth, South-South collaboration is typically initiated by the participating firms themselves. The results of the survey show that little collaboration has been initiated by governmental organizations or by any other outside party; international organizations and expatriates have also had a limited role in encouraging South-South collaborations. As research on South-North collaboration between firms has suggested that a major challenge of health biotech collaboration is establishing the initial linkages with possible collaborators²⁷, it seems likely that this challenge is also experienced by the firms of developing countries. Our results may indicate an opportunity for greater governmental involvement. The Brazil-Cuba collaboration on meningitis AC vaccine for Africa exemplifies the important role that international organizations can play in facilitating South-South collaboration.
The involvement of other international organizations or philanthropic organizations might also be warranted to accelerate the formation of collaborations that provide affordable options for improving health in developing countries.

On the basis of our research, we can make several recommendations. Firms in developing countries should consider South-South collaboration as a way to expand their markets. Market demand has been expanding in many developing countries, and it is thus an increasingly lucrative strategy to target those markets². Setting up a collaboration with a firm in another developing country that has knowledge of the local regulations relating to product quality and product manufacture, as well as an established product distribution network, is an important first step toward accessing these markets. Firms in developing countries should realize that by working together they can leverage each other’s strengths and develop more cost-effective products. In doing so, they can expand their markets considerably in the developing world, where a large proportion of the population can afford only low-priced health products. Firms in developing countries can start their cooperation by focusing on marketing and distribution, but as their collaboration deepens and trust is built, they can start to pursue further innovative activities with commercial partners.

Governments in developing countries should continue to place an emphasis on South-South collaboration. As more developing countries have built up capacity in health biotech, they now can use collaboration with other developing countries to build capacity in areas where knowledge is lacking. Technology transfer between developing countries can be a promising strategy to gain access to technologies that are typically more affordable and appropriate to developing countries’ needs than the technologies from developed countries. Such collaborations can strengthen the capacity of firms based in countries currently weak in health biotech and can start bridging the divides between developing countries in this field.

Our survey also shows that even though South-South firm collaborations in health biotech are widespread and numerous, they rarely involve innovation. Developing countries are not yet reaping the full benefits of such commercial partnerships. With an increased innovation focus, developing countries could leverage their individual strengths and increase the pool of resources to address their shared problems. We thus recommend that governments in developing countries integrate South-South collaboration more closely in their innovation policies and provide support to firms from other developing countries that want to promote joint innovation in health biotech. To smooth the process of innovation, these governments may need to consider how their regulatory offices can work together to make the process of cross-border innovation easier and faster.

Finally, our survey shows that governments and international organizations have had a limited role in initiating South-South collaboration. Promoting a stronger innovation focus in South-South health biotech collaborations should not be dependent solely on the activities of enterprises in developing countries; supportive activities that directly target the development of health biotech products and services are called for from both governments in developing countries and the international community. International organizations and philanthropic organizations that are engaged in promoting global health should pay attention to the power of South-South commercial collaborations in providing affordable health products. When health biotech firms in developing countries pool their respective strengths, there is potential for such collaborative efforts to be more cost effective and relevant than the work of health biotech companies in developed countries; thus, South-South collaborations may be able to provide health products that reach more poor people in the developing world.

Note: Supplementary information is available on the Nature Biotechnology website.

ACKNOWLEDGMENTS
The authors thank all the firms that responded to the survey and generously shared their expertise and time. We also thank J. Clark and K. MacDonald for comments on the manuscript. This project was funded by Genome Canada through the Ontario Genomics Institute and by the International Development Research Centre, and was supported by the McLaughlin-Rotman Centre for Global Health, an academic center at the University Health Network and University of Toronto. H.T. is supported by a New Investigator Award from the Canadian Institutes of Health Research. M.R. is supported by a Canadian Institutes of Health Research Training Award.

COMPETING FINANCIAL INTERESTS
The authors declare competing financial interests: details accompany the full-text HTML version of the paper at http://www.nature.com/naturebiotechnology/.

1. Anonymous. South-south trade: vital for development (Policy brief) (Organisation for Economic Co-Operation and Development, Paris, 2006).
2. Anonymous. Global economic prospects: crisis, finance, and growth (The International Bank for Reconstruction and Development & The World Bank, Washington, DC, 2010).
3. Hassan, M.H. Building capacity in the life sciences in the developing world. Cell 131, 433–436 (2007).
4. Mkoka, C. South African scientists welcome Malawi on board. SciDev.Net (17 August 2007).
5. Thorsteinsdóttir, H., Quach, U., Daar, A.S. & Singer, P.A. Conclusions: promoting biotechnology innovation in developing countries. Nat. Biotechnol. 22 suppl., DC48–DC52 (2004).
6. Frew, S.E. et al. India’s health biotech sector at a crossroads. Nat. Biotechnol. 25, 403–417 (2007).
7. Frew, S.E. et al. Chinese health biotech and the three billion patient market. Nat. Biotechnol. 26, 37–53 (2008).
8. Rezaie, R. et al. Brazilian health biotech – fostering crosstalk between public and private sectors. Nat. Biotechnol. 26, 627–644 (2008).
9. Al-Bader, S. et al. Small but tenacious: South Africa’s health biotech sector. Nat. Biotechnol. 27, 427–445 (2009).
10. Morel, C.M. et al. Health innovation networks to help developing countries address neglected diseases. Science 309, 401–404 (2005).
11. Lemle, M. Nations team up to share R&D skills in HIV/AIDS battle. SciDev.Net (28 February 2005).
12. Ohiorhenuan, J.F.E. & Rath, A. in Designing the Future: South-South Cooperation in Science and Technology (eds. Zhou, Y. & Gitta, C.) (United Nations Development Programme, New York, 2000).
13. Pisano, G.P. Science Business: The Promise, the Reality, and the Future of Biotech (Harvard Business School Press, Boston, 2006).
14. Hagedoorn, J. Inter-firm R&D partnerships: an overview of major trends and patterns since 1960. Res. Policy 31, 477–492 (2002).
15. Faulkner, W. & Senker, J. Knowledge Frontiers: Public Sector Research and Industrial Innovation in Biotechnology, Engineering Ceramics and Parallel Computing (Oxford University Press, 1995).
16. Lee, C.W. Strategic alliances influence on small and medium firm performance. J. Bus. Res. 60, 731–741 (2007).
17. Roijakkers, N. & Hagedoorn, J. Inter-firm R&D partnering in pharmaceutical biotechnology since 1975: Trends, patterns, and networks. Res. Policy 35, 431–446 (2006).
18. van Beuzekom, B. & Arundel, A. OECD biotechnology statistics (Organisation for Economic Co-Operation and Development, Paris, 2006).
19. Yusuf, S., Nabeshima, K. & Perkins, D. in Dancing with Giants: China, India and the Global Economy (eds. Winters, L.A. & Yusuf, S.) 35–66 (The World Bank, Washington, DC, and the Institute of Policy Studies, Singapore, 2007).
20. Chaturvedi, K., Chataway, J. & Wield, D. Policy, markets and knowledge: strategic synergies in Indian pharmaceutical firms. Technol. Anal. Strateg. Manage. 19, 565–588 (2007).
21. Bower, D.J. & Sulej, J.C. The Indian challenge: the evolution of a successful new global strategy in the pharmaceutical industry. Technol. Anal. Strateg. Manage. 19, 611–624 (2007).
22. Maiti, R. & Raghavendra, M. Clinical trials in India. Pharmacol. Res. 56, 1–10 (2007).
23. Ray, M., Daar, A.S., Singer, P.A. & Thorsteinsdóttir, H. Globetrotting firms. a survey of Canada’s health biotechnology collaboration with developing countries. Nat. Biotechnol. 27, 806–814 (2009).
24. Kale, D. & Little, S. From imitation to innovation: the evolution of R&D capabilities and learning processes in the Indian pharmaceutical industry. Technol. Anal. Strateg. Manage. 19, 589–609 (2007).
25. Simonetti, R. & Archambault, E. The dynamics of pharmaceutical patenting in India: evidence from USPTO data. Technol. Anal. Strateg. Manage. 19, 625–642 (2007).
26. Thorsteinsdóttir, H. The role of the health system in health biotechnology in developing countries. Technol. Anal. Strateg. Manage. 19, 659–675 (2007).
27. Taylor, A.D. et al. North–South partnerships: a study of Canadian firms. Nat. Biotechnol. 25, 978–979 (2007).
28. Sangkitporn, S. et al. Efficacy and safety of zidovudine and zalcitabine combined with a combination of herbs in the treatment of HIV-infected Thai patients. Southeast Asian J. Trop. Med. Public Health 36, 704–708 (2005).


Patents

Open biotechnology: licenses needed

Yann Joly

Open biotechnology may be the ideal solution to ensure scientific progress and the realization of the common good, but it has yet to deliver on its promises.

In the last few decades, the application of the patent system to the field of biotech has faced an increasing amount of criticism from scientific researchers, ethicists and lawyers alike¹. According to these critiques, the broad utilization of the patent system in this scientific field leads to counterproductive results²,³, is unethical⁴,⁵ and of dubious legal validity⁶. Evidence has yet to be found that patents have a widespread negative impact on research⁷. However, most researchers agree that patents, the threat of patents or restrictive patent licenses have at times generated specific problems in the field of biotech, for example, problems of access to new genetic tests by clinicians in the case of Myriad Genetics’ breast cancer gene patents or problems linked to broad patents such as those for embryonic stem cells⁸⁻¹⁰. The growing unpopularity of biotech patents has motivated researchers to find alternative or complementary solutions that would foster the development of, and facilitate access to, new biotech goods.
One of the most promising solutions, inspired by the open source movement in the field of information technology (IT) (Box 1), as well as by the already existing open science ideal within the academic community, is open biotechnology.

Box 1 The success of open source informatics

The open source project in the field of IT was developed by idealist programmer/hacker Richard Stallman in the early 1980s in resistance to the increasing commercialization of computer software. Stallman created the Free Software Movement (FSM) and helped develop the copyleft license to protect the open nature of the various informatics tools developed by the FSM. Since Stallman’s early successes, the popularity of open source has been growing continuously and has led to the creation of the Open Source Initiative. The greatest success of the open source movement remains the development of the GNU/Linux kernel in the early 1990s. The Linux operating system now has >30 million users worldwide, whereas collaborators to the Open Source Initiative were estimated to be >1.5 million in 2008.

In recent years, an impressive number of open projects have been developed in several spheres of activity associated with biotech research. It would be difficult, if not impossible, to find a definition that would encompass the many radically different open projects currently existing in the field. Because there is no source code involved in open biotechnology projects, they will likely be quite different from those observed in IT. The term ‘open biotechnology’ has been used to refer to such different projects as an open journal (e.g., Public Library of Science), a new bioinformatic tool (e.g., the BioMoby messaging standard), a database (e.g., NIH dbGaP), a big science project (e.g., HapMap or the Human Genome Project), a project to facilitate access to biotech research tools (Cambia BiOS) or a combination of these. In this confusing environment, projects having little to do with open biotechnology have even been presented as such by dishonest entrepreneurs hoping to piggyback on the movement’s popularity. It is thus becoming increasingly important to agree on some broad criteria that would allow us to separate genuine open projects from imitations. Based on an in-depth analysis of the literature, we propose that, at a minimum, an open biotechnology project should meet the following criteria:

1. Make use at one stage or another of the internet and other information technologies (e.g., to promote quicker dissemination of results, promote collaboration and/or to improve project coordination).
2. Be designed in a way that will permit other members of the scientific community to collaborate on the project.
3. Include a strategy to ensure rapid public dissemination of the information and research results it generates.
4. Permit members of the scientific community to use its results without having to conclude restrictive agreements that would limit research freedom and integrity.
5. Not use intellectual property (IP) to limit access to the project, its results or to discriminate between different uses or different users.

Possibly, such a project could also include a mechanism to allow the initial researchers to recuperate reasonable production costs invested in its realization. However, this mechanism should not impede the open nature of the project.

It can be seen from these broad criteria that open biotechnology is not necessarily antagonistic to IP and that it is possible to develop an open source project that would make use of the patent system. A variety of licensing schemes with or without IP (e.g., patent pool, non-assertion covenants, public domain, protected commons agreement, contractual licenses) can theoretically be used as the engine to support the open nature of the project (Table 1).

Yann Joly is at the Centre of Genomics and Policy, McGill University and Genome Quebec Innovation Centre, Montreal, Quebec, Canada. e-mail: yann.joly@mail.mcgill.ca


Table 1 Possible open biotechnology licensing strategies
Patent pools (IP): An arrangement between at least two patent owners to license their patents to one another or to third parties. A governance structure can be set up to administer the pool for the patent owners.
Open IP licenses (IP): Inventions or creations protected by IP rights and made accessible through open licenses often based on the original open source copyleft model.
Contract (access agreement) (Non-IP): A legal agreement whereby two or more parties bind themselves.
Public domain (including defensive publishing) (Non-IP): Inventions or creations not protected by IP rights and disclosed to the public generally through the internet or scientific publications.
Non-assertion covenants (IP): Agreement or unilateral promise by an IP owner not to enforce its IP against third parties in certain predetermined circumstances.

Open licenses: new models needed

The central element that will determine the success or failure of any open biotechnology model is its license. A license is a contract with a series of conditions, financial or otherwise, that will allow the licensee the use of a licensed good. Open licenses are at the heart of any open project. They are the legal tools used to guarantee that the project remains accessible for all users and customers. Additional clauses can also permit researchers to ensure that their goods are used efficiently and ethically by members of the scientific community. In the field of IT, the open source movement has relied on a series of copyright licenses based on Richard Stallman’s ‘copyleft’ model to ensure open access to the software codes by the broad computing community¹¹. The legal validity of some open source copyright licenses as well as that of similar Creative Commons copyright licenses has recently been confirmed by courts of law in a variety of countries¹²,¹³. This legal recognition has given legitimacy to the open source project and that of Creative Commons.

In the field of biotech, things are very different. Unlike in IT, where most software is protectable through copyright, products of biotech are usually protected through the patent system. Moreover, several biotech developments initially thought to be protectable through the patent system have been found not deserving of such reward in recent legal decisions, forcing developers to rely on other weaker IP rights (e.g., copyrights, sui generis database rights), contractual law or commercial secrecy for protection¹⁴,¹⁵. Accordingly, it is extremely difficult to develop simple license models to ensure the openness of a given project and even more challenging to develop model licenses that could be used for a variety of projects.

In the case of potentially patentable goods, the central question is, can the patent system be used, as copyright is, to ensure open development and access? Although this is theoretically feasible, the high cost and legal uncertainty associated with genetic patents would seriously jeopardize the viability of such an approach.
Indeed, patents are very expensive to obtain, maintain and defend¹⁶,¹⁷. This means that any inventor relying on an open patent license would need to charge a sufficient cost to its licensees to recuperate its investment in the patent (including, prospectively, a part of the cost of defending its patent in court against potential infringers). This amount alone could be sufficient to deter potential users from obtaining a license. Moreover, many small projects (private or public) simply cannot afford the cost of patents and prefer to rely on commercial secrecy to protect their inventions.

One suggested solution to the cost issue is an umbrella organization that could assume the responsibility for maintaining and protecting donated patents for researchers¹⁸. This organization could be financed through voluntary donations, membership fees and licensing revenues obtained from users. However, getting sufficient numbers of interested parties to contribute to the development of such an organization has proven an insurmountable obstacle so far. Another potential strategy involves using the international patent filing system to postpone both national patent applications and part of the financial burden of patenting in most countries by 30 months after the priority date.
Following that delay, because of the fast-paced rate of biotech innovation, patent protection will often no longer be necessary.

Because of the high cost of patents and of the uncertainty concerning the patentability of a growing number of basic research findings, an increasing number of scientists have turned to contractual licenses (often referred to as access agreements) to ensure open or controlled access to the fruits of their research to members of the scientific community¹⁹,²⁰. Purely contractual licenses, although less expensive and easier to design than patent licenses, are not particularly efficient against use by third parties to the original contract. A growing tendency to use these licenses to protect goods that are not protectable through IP (e.g., natural phenomena or raw data) has also been recently observed in biotech²¹. Although sometimes warranted by the need to better protect the identity of research participants²², such use of contractual licenses could have the counterproductive and paradoxical effect of limiting access to an already public good to protect open access. Finally, a third strategy, leaving the good in the public domain unprotected, although appealing, remains vulnerable to abuse from more commercially minded parties. Large biopharmaceutical companies could access the good, modify it in small ways and use IP to control and market it, restricting its future use by members of the scientific community²³.

An additional problem common to most open biotechnology projects has to do with the sheer complexity of existing licenses and access agreements¹⁸.
Because most scientists are not legal experts, it makes sense that access licenses should be short and simply written so as to encourage wide use of a good. Sadly, this is generally not the case and many access agreements and licenses developed with the best intentions have ended up much more complicated than the traditional IP licenses they were seeking to replace.

Discussion

Early setbacks designing satisfactory licenses should not be seen as a sign of failure for the open biotechnology movement. The free software movement in the field of informatics took 20 years to blossom into a strong, competitive force. Open biotechnology is still in its infancy. However, the dynamism of the open biotechnology movement can be seen not only in the increasing number of open projects but also in the growing support and interest of policy makers, nongovernmental organizations and research funders, which bodes very well for the future of open biotechnology 22.

One of the main obstacles on the road to success for biotech proponents will be the need to develop simple, efficient and legally valid open licenses to support their projects. Such licenses will give the open biotechnology movement the credibility and strength it needs to foster collaboration, transparency and access on a large-scale basis. Work has already begun on this challenging task, albeit in a rather uncoordinated manner.
To streamline and standardize current efforts, the creation of an international association where researchers interested in open biotechnology licensing could discuss common problems and harmonize their efforts would be very beneficial. The creation of similar informal groups has been a key to the success of the open source movement in informatics.

Open biotechnology is desirable to ensure the quick and efficient development and integration of genomic research, but also as a much needed reward to thank the impressive number of volunteers who have contributed altruistically to the progress of this highly prospective scientific field. Hopefully, the current problems designing suitable open licenses will only prove a minor impediment on the way to democratizing biotechnological research.

ACKNOWLEDGMENTS
The author would like to thank F. Hemmings and B.M. Knoppers for reviewing the manuscript, E.R. Gold for comments on an earlier version of the draft and Genome Canada/Genome Quebec for their financial support of the PRIVAC project Genomics Applied to the Discovery and Development of Vaccines and Immunotherapies.

COMPETING FINANCIAL INTERESTS
The author declares no competing financial interests.

1. Boyle, J. in Perspectives on Properties of the Human Genome Project (ed. Kieff, F.S.) 97 (Elsevier Academic Press, St. Louis, USA, 2003).
2. Heller, M.A. & Eisenberg, R.S. Science 280, 698–701 (1998).
3. Merges, R.P. & Nelson, R. Columbia Law Rev. 90, 839–916 (1990).
4. Kass, L. Toward a More Natural Science (Free Press, New York, 1985).
5. Kass, L. Public Interest 107, 65–86 (1992).
6. Greenfield, D. Santa Clara Comput. High Technol. Law J. 25, 467–538 (2009).
7. Caulfield, T., Cook-Deegan, R.M., Kieff, F.S. & Walsh, J.P. Nat. Biotechnol. 24, 1091–1094 (2006).
8. Cho, M.K., Illangasekare, S., Weaver, M.A., Leonard, D.G.B. & Merz, J.F. J. Mol. Diagn. 5, 3–8 (2003).
9. Matthijs, G. & Halley, D. Eur. J. Hum. Genet. 10, 783–785 (2002).
10. Murray, F. N. Engl. J. Med. 356, 2341–2343 (2007).
11. St.-Laurent, A.M. Understanding Open Source and Free Software Licensing (O’Reilly Publishing, Sebastopol, California, 2004).
12. Curry v. Audax, District Court of Amsterdam Case no. 334492/KG 06–176 SR (2006).
13. Jacobsen v. Katzer, 535 F.3d 1373 (Fed. Cir. 2008).
14. In re: Dane K. Fisher and Rughunath v. Lalgudi 421 F.3d 1365 (Fed. Cir. 2005).
15. In re: Marek Z. Kubin and Raymond G. Goodwin No. 09–667,859 (Fed. Cir. April 3, 2009).
16. US General Accounting Offices. Report to Congressional Requesters (GAO-02–789) (US General Accounting Offices, Washington, DC, 2002).
17. Malakoff, D. Science 291, 1194 (2001).
18. Guadamuz Gonzàlez, A. NCJL & Tech. 7, 321–366 (2006).
19. Data Access Policy for the International HapMap Project (policy no longer in use).
20. Wellcome Trust Case Control Consortium.
21. Cromer, J.D. UMKC Law Rev. 76, 505–523 (2007).
22. Birney, E. et al. Nature 461, 168–170 (2009).
23. Cottrell, C.R. Wake Forest Intell. Prop. L.J. 7, 251–274 (2007).


Recent patent applications in fluorescent imaging

Patent number: DE 102008049878, WO 2010037487
Description: A sample high-resolution imaging method for laser scanning microscopes involving recording fluorescent light and obtaining an optimal adjustment for a parameter of illumination and/or parameter of recording, e.g., wavelength of illumination, pulse sequence of illumination, wavelength range of recording, exposure time of recording and amplification of recording.
Assignee: Carl Zeiss Microimaging (Jena, Germany)
Inventor: Kempe M, Kleppe I, Krampert G, Wolleschensky R
Priority application date: 9/30/2008
Publication date: 4/1/2010, 4/8/2010

Patent number: JP 2010071662
Description: A spectral image processing method involving calculating the contribution of fluorescent materials with respect to the position of the material to be observed based on intrinsic emission spectrum and measurement spectral image.
Assignee: Nikon (Tokyo)
Inventor: Mimura M
Priority application date: 9/16/2008
Publication date: 4/2/2010

Patent number: WO 2010032306
Description: A fluorescence image detecting apparatus for imaging cells, e.g., cancer cells, with a fluorescence side filter comprising interference and absorption filters arranged in a series along the fluorescence light–advancing direction.
Assignee: Shimadzu (Kyoto, Japan)
Inventor: Hizume K, Oda I, Tsunazawa Y, Yajima A
Priority application date: 9/18/2008
Publication date: 3/25/2010

Patent number: US 20100068752
Description: New substituted quinolinium compounds useful for labeling, detecting or quantifying target molecules, e.g., proteins and nucleic acids, identifying specific organelles or regions in cells of interest and multi-color imaging.
Assignee: Cox HJ, Pande P, Patton WF, Xiang Y
Inventor: Cox HJ, Pande P, Patton WF, Xiang Y
Priority application date: 5/24/2005
Publication date: 3/18/2010

Patent number: WO 2010030119, KR 2010030194
Description: Fluorescent silica nanoparticles useful for the detection of lymph nodes, preferably sentinel nodes, for in vivo imaging of lymph nodes, monitoring cell lines, etc.
Assignee: SNU R&DB Foundation (Seoul), Seoul National University Foundation (Seoul), Intellectual Property & Technology Licensing Program (Riyadh, Saudi Arabia)
Inventor: Aiarfaj NA, Airessayes SI, Aitamimi SA, Alothman ZA, Choi G, Choi K, Chung D, Gang G, Jeon Y, Jeong D, Kang K, Kim Y, Kwon P, Park J, Piao J, Quan B
Priority application date: 9/9/2008
Publication date: 3/18/2010, 3/10/2010

Patent number: WO 2010030120, KR 2010030195
Description: Fluorescent silica nanoparticles useful for detecting positron emission tomography and fluorescence dual imaging, comprising radioisotope labeling.
Assignee: SNU R&DB Foundation (Seoul), Seoul National University Foundation (Seoul), Intellectual Property & Technology Licensing Program (Riyadh, Saudi Arabia)
Inventor: Ahmed AYH, Aimajid AM, Alothman AA, Alothman ZA, Choi G, Choi K, Chung D, Gang G, Jeon Y, Jeong D, Kang K, Kim Y, Kwon P, Park J, Piao J, Quan B
Priority application date: 9/9/2008
Publication date: 3/18/2010, 3/18/2010

Patent number: US 20100062429, WO 2010028349
Description: A new fluorescent compound, 1,4-bis(2-(dimethylamino)ethylamino)-2,3-difluoro-5,8-dihydroxyanthracene-9,10-dione, for identifying the location or position of nuclei of cells.
Assignee: Donegan JJ, Endo Life Sciences (New York), Li Z, Pande P, Patton WF, Rabbani E, Xiang Y
Inventor: Donegan JJ, Li Z, Pande P, Patton WF, Rabbani E, Xiang Y
Priority application date: 9/8/2008
Publication date: 3/11/2010

Patent number: US 20100055701, CN 101659705
Description: A new imaging agent comprising a fusion protein having DNA binding domains and fluorescent domains; useful for imaging a cell and in screening assays for testing the activity of biological effector molecules.
Assignee: An X, Tong Y, Zhang X, Beijing Institute of Microbiology & Epidemiology (Beijing)
Inventor: An X, Tong Y, Zhang X
Priority application date: 8/27/2008
Publication date: 3/4/2010, 3/3/2010

Patent number: FR 2934954, WO 2010018216
Description: An indocyanine green formulation in nanoemulsion form used as a diagnostic agent for imaging fluorescence, comprising a continuous aqueous phase and dispersed oily phase comprising the indocyanine green, amphiphilic lipid and lipid solubilizer.
Assignee: Atomic Energy and Alternative Energies Commission (Gif-sur-Yvette, France)
Inventor: Goutayer M, Navarro YGF, Texier NI
Priority application date: 8/14/2008
Publication date: 2/19/2008, 2/18/2009

Patent number: JP 2010014469
Description: A method of manufacturing a radiological image conversion panel, e.g., imaging plate and scintillator panel, involving forming a fluorescent substance layer on a substrate and maintaining the initial temperature and final temperature of the substrate.
Assignee: Fuji Film (Tokyo)
Inventor: Isoda Y, Takasu A
Priority application date: 7/2/2008
Publication date: 1/21/2010

Source: Thomson Scientific Search Service. The status of each application is slightly different from country to country. For further details, contact Thomson Scientific, 1800 Diagonal Road, Suite 250, Alexandria, Virginia 22314, USA. Tel: 1 (800) 337-9368 (http://www.thomson.com/scientific).


news and views

Advancing RNA-Seq analysis
Brian J Haas & Michael C Zody

New methods for analyzing RNA-Seq data enable de novo reconstruction of the transcriptome.

Brian J. Haas and Michael C. Zody are at the Broad Institute, Cambridge, Massachusetts, USA. e-mail: bhaas@broadinstitute.org or mczody@broadinstitute.org

Sequencing of RNA has long been recognized as an efficient method for gene discovery 1 and remains the gold standard for annotation of both coding and noncoding genes 2. Compared with earlier methods, massively parallel sequencing of RNA (RNA-Seq) 3 has vastly increased the throughput of RNA sequencing and allowed global measurement of transcript abundance. Two reports in this issue introduce approaches for RNA-Seq analysis that capture genome-wide transcription and splicing in unprecedented detail. Trapnell et al. 4 describe a software package, Cufflinks, for simultaneous discovery of transcripts and quantification of expression levels and apply it to study gene expression and splicing during the differentiation of mouse myoblast cells. Taking a similar approach, Guttman et al. 5 use software called Scripture to reannotate the transcriptomes of three mouse cell lines, defining complete gene models for hundreds of new large intergenic noncoding RNAs (lincRNAs) 6.

Although transcript sequencing has been possible for nearly 20 years, until recently it required the construction of clone libraries. Projects to determine full-length gene structures for human, mouse and other important models have taken years to complete 7. With new sequencing technologies, no cloning is needed, allowing direct sequencing of cDNA fragments. In a matter of days and at a small fraction of the cost of earlier projects, one can achieve reasonably complete coverage of a transcriptome 8. But this approach has been hindered by a substantial challenge: without cloning, one cannot know a priori which reads came from which transcripts. Recent studies analyzed gene expression and alternative splicing by mapping short RNA-Seq reads to previously known or predicted transcripts 9,10. Although highly informative, such studies are inherently limited to known genes and to alternative splicing across previously identified splice junctions. To fully leverage RNA-Seq data for biological discovery, one should be able to reconstruct transcripts and accurately measure their relative abundance without reference to an annotated genome.

Figure 1 Strategies for reconstructing transcripts from RNA-Seq reads. The ‘align-then-assemble’ approach (left) taken by Trapnell et al. 4 and Guttman et al. 5 first aligns short RNA-Seq reads to the genome, accounting for possible splicing events, and then reconstructs transcripts from the spliced alignments. The ‘assemble-then-align’ approach (right) first assembles transcript sequences de novo, that is, directly from the RNA-Seq reads. These transcripts are then splice-aligned to the genome to delineate intron and exon structures and variations between alternatively spliced transcripts. As de novo assembly is likely to work only for the most abundant transcripts, the align-then-assemble method should be more sensitive, although this warrants further investigation. RNA-Seq reads are colored according to the transcript isoform from which they were derived. Protein-coding regions of reconstructed transcript isoforms are depicted in dark colors. (Figure labels: Genome; RNA-Seq reads; Align reads to genome; Assemble transcripts from spliced alignments; Assemble transcripts de novo; Align transcripts to genome; More abundant; Less abundant.)

Previous efforts to reconstruct transcripts from short RNA-Seq reads have followed two general strategies (Fig. 1). The first, a de novo assembly approach implemented in the ABySS software 11, reduces the annotation problem to that of aligning full-length cDNAs, which is well handled by several algorithms. This method is also applicable to the discovery of transcripts that are missing or incomplete in the reference genome and to RNA-Seq data from organisms lacking a genome reference.


However, assembly of short reads is itself difficult, and only the most abundant transcripts are likely to be fully assembled.

The second strategy involves splice-aware alignment of individual short RNA-Seq reads to the genome followed by transcript reconstruction 12,13. This is the approach taken by Trapnell et al. 4 in Cufflinks and by Guttman et al. 5 in Scripture. Both programs use the TopHat aligner 14 to generate spliced alignments to the genome. Whereas earlier RNA-Seq experiments produced 25–32-base reads, the 75-base or longer reads now available can be aligned in segments, allowing reads whose ends are anchored in different exons to define splice sites without relying on prior annotations. Both programs then build directed graphs and traverse the graphs to identify distinct transcripts, using paired end information to link sparsely covered transcripts and filter out unlikely isoforms.

There are also notable differences in the details of the algorithms. For example, Cufflinks uses a rigorous mathematical model to identify the complete set of alternatively regulated transcripts at each locus and to assign coverage to each transcript; Scripture employs a statistical segmentation model to distinguish expressed loci and filter out experimental noise.
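The general idea of building a directed graph from spliced alignments and traversing it to list candidate transcripts can be illustrated with a toy sketch. This is not the algorithm of Cufflinks or Scripture, both of which add statistical modelling, paired-end information and filtering; the function names, the exon coordinates and the representation of an alignment as a list of exon blocks are all assumptions made for illustration.

```python
from collections import defaultdict


def build_splice_graph(spliced_alignments):
    """Return a directed graph: exon -> set of exons that directly follow it.

    Each alignment is a non-empty list of (start, end) exon blocks in genomic
    order, standing in for one spliced read (a simplifying assumption).
    """
    graph = defaultdict(set)
    for exons in spliced_alignments:
        for upstream, downstream in zip(exons, exons[1:]):
            graph[upstream].add(downstream)
        graph.setdefault(exons[-1], set())  # register the final exon as a node
    return graph


def enumerate_paths(graph):
    """List every path from a start exon (no incoming edge) to an end exon
    (no outgoing edge); each path is one candidate transcript."""
    with_incoming = {exon for targets in graph.values() for exon in targets}
    starts = sorted(exon for exon in graph if exon not in with_incoming)
    paths = []

    def walk(exon, prefix):
        path = prefix + [exon]
        if not graph[exon]:
            paths.append(path)
        for following in sorted(graph[exon]):
            walk(following, path)

    for start in starts:
        walk(start, [])
    return paths


# Hypothetical alignments: the second one skips the middle exon.
alignments = [
    [(100, 200), (300, 400), (500, 600)],
    [(100, 200), (500, 600)],
]
graph = build_splice_graph(alignments)
isoforms = enumerate_paths(graph)
for isoform in isoforms:
    print(isoform)
```

With these two hypothetical alignments the graph yields two candidate isoforms, one that includes the middle exon and one that skips it. Listing every path grows quickly at complex loci, which is one reason real tools need additional criteria to decide which paths to report.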
More extensive testing of Cufflinks, Scripture and de novo assembly methods such as ABySS will be required to determine whether some methods perform better in certain applications.

Strikingly, despite the extensive prior annotation of the mouse genome (which was based on millions of expressed sequence tags (ESTs) and thousands of full-length cDNAs), both studies identify thousands of novel transcripts, including novel isoforms of known genes and completely novel coding and noncoding genes.

Trapnell et al. 4 discover 3,724 high-confidence isoforms of known genes that are absent from existing automated and manually curated gene sets. They also demonstrate that independently determining the expression of each isoform with high accuracy is an important prerequisite for subsequent analysis. It has been shown that RNA-Seq can accurately detect gene expression levels over a wide dynamic range 3,9, but previous experiments have relied on known or predicted isoforms. By reconstructing all isoforms directly from the RNA-Seq read alignments and accurately classifying individual paired read fragments according to their isoform of origin, Trapnell et al. 4 are able to measure the expression levels of individual isoforms within a single gene with high accuracy. They further show that correct assignment of RNA-Seq fragments to novel isoforms can substantially affect the computed expression levels of known isoforms of the same gene.

Measuring the expression of individual isoforms makes it possible to study regulatory changes in greater detail than was previously feasible.
Regulatory changes may be transcriptional, indicated by isoforms with different transcription start sites, or post-transcriptional, indicated by isoforms with the same start site that show alternative internal splicing. Trapnell et al. 4 identify large numbers of genes that undergo significant changes of both types over the time course of their experiment. The ability to examine regulation of expression at such a fine scale over an entire genome allows important new insights into genome function. For example, data at this level of detail could vastly improve our ability to model regulatory networks or to infer regulatory motifs based on correlation of the expression and splicing of individual isoforms rather than genes.

Guttman et al. 5 also identify a number of novel splice isoforms but focus their analysis on novel transcripts, particularly lincRNAs. Previous work using ChIP-Seq and whole-genome tiling arrays 6 identified loci that encode lincRNAs but lacked the resolution to produce accurate models. With the Scripture predictions, Guttman et al. 5 were able to construct gene models for 609 known loci and identify and generate structures for over 1,000 novel lincRNAs. They also identified 469 antisense transcripts of protein-coding genes.

Determining gene models for these noncoding RNAs opens the door to functional analysis. For example, Guttman et al. 5 examined the conservation levels of transcripts. Consistent with previous observations, the lincRNAs were typically more conserved than intronic sequences but less conserved than protein-coding genes.
Conversely, the antisense transcripts showed no conservation outside of that resulting from overlap with coding exons, suggesting that these two classes of transcripts have very different functions and constraints. The RNA-Seq data also revealed expression patterns of noncoding transcripts and showed that the lincRNAs are not only less abundant than protein-coding genes but also less broadly expressed, with a greater fraction showing tissue specificity compared with protein-coding genes in the same cell lines. More generally, the determination of precise gene models and expression patterns for noncoding RNAs will facilitate their inclusion in regulatory network and gene interaction models, an important step toward understanding their functions.

The number of novel transcripts discovered by Trapnell et al. 4 and Guttman et al. 5 may leave us wondering: why do existing annotations fall so short? Known isoforms account for almost 80% of the RNA-Seq fragments in the Trapnell et al. 4 data, indicating that these are highly expressed genes that were easily identified from clone-based cDNA sequencing (Guttman et al. 5 do not provide an identical breakdown, but the high level of coverage shown for the most abundant transcripts suggests similar numbers). Another 11% of fragments map to novel isoforms of known genes, 62% of which are supported by previous EST or mRNA sequence but are not annotated as distinct transcripts.
These less abundant isoforms may have been sampled sparsely in previous studies, or may not have been fully sequenced or annotated because of similarity to known transcripts at the same locus. Similarly, 43% of the novel lincRNAs found by Guttman et al. 5 were found in a previous mouse cDNA project 15. Given the apparent tissue-specificity of lincRNAs, the remainder may not have been seen previously due to relatively limited tissue sampling. The emphasis of earlier large-scale transcript sequencing projects on protein-coding genes also explains the absence of annotation for most of these features, even where evidence has existed. Clear definition of these novel coding and noncoding transcripts is made possible by the unbiased nature of RNA-Seq combined with the unbiased discovery methods of Trapnell et al. 4 and Guttman et al. 5.

Cufflinks, Scripture and similar tools provide a great opportunity to improve the annotation of both well-studied genomes and poorly annotated genomes that have not received extensive traditional EST and full-length mRNA sequencing. However, there are still substantial challenges in using RNA-Seq for annotation. A large number of transcripts identified by Cufflinks and Scripture were consistent with known isoforms but incomplete due to lack of coverage. Just as RNA-Seq allows reconstruction of transcripts that are only weakly supported by EST data, many less highly or less broadly expressed transcripts are only weakly or incompletely supported by current RNA-Seq.

As technology allows increasingly deeper sequencing of the transcriptome, it will be possible to identify more transcripts with higher confidence.
However, more sophisticated methods for separating functional low-abundance transcripts from transcriptional noise and process artifacts will be needed. Also, although Cufflinks and Scripture will be useful tools for annotating new genomes, different genomes may pose different algorithmic challenges owing to variation in characteristics such as gene density, intron content and length, and prevalence of alternative splicing. It remains to be seen how well Cufflinks and Scripture will perform


on genomes that are very different from mouse.

Massively parallel sequencing technology has already revolutionized the way we study genomes, and the capacity and quality of sequencing data continue to improve at a rapid pace. Trapnell et al. 4 and Guttman et al. 5 have demonstrated the power of RNA-Seq combined with novel transcript discovery to greatly improve the annotation of an already well-studied genome and to add substantially to our understanding of transcriptional and post-transcriptional regulation. By making their software available, they provide powerful tools that will facilitate future RNA-Seq studies.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

1. Adams, M.D. et al. Science 252, 1651–1656 (1991).
2. Haas, B.J. et al. Genome Biol. 3, RESEARCH0029 (2002).
3. Nagalakshmi, U. et al. Science 320, 1344–1349 (2008).
4. Trapnell, C. et al. Nat. Biotechnol. 28, 503–519 (2010).
5. Guttman, M. et al. Nat. Biotechnol. 28, 511–515 (2010).
6. Guttman, M. et al. Nature 458, 223–227 (2009).
7. Temple, G. et al. Genome Res. 19, 2324–2333 (2009).
8. Wang, Z., Gerstein, M. & Snyder, M. Nat. Rev. Genet. 10, 57–63 (2009).
9. Mortazavi, A., Williams, B.A., McCue, K., Schaeffer, L. & Wold, B. Nat. Methods 5, 621–628 (2008).
10. Wang, E.T. et al. Nature 456, 470–476 (2008).
11. Birol, I. et al. Bioinformatics 25, 2872–2877 (2009).
12. Denoeud, F. et al. Genome Biol. 9, R175 (2008).
13. Yassour, M. et al. Proc. Natl. Acad. Sci. USA 106, 3264–3269 (2009).
14. Trapnell, C., Pachter, L. & Salzberg, S.L. Bioinformatics 25, 1105–1111 (2009).
15. Carninci, P. et al. Science 309, 1559–1563 (2005).

Haploidy with histones
Gregory P Copenhaver & Daphne Preuss

An engineered centromere-specific histone could enable homozygous diploid lines to be generated at high frequency, simplifying crop breeding.

Gregory P. Copenhaver is in the Department of Biology and the Carolina Center for Genome Sciences, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA. Daphne Preuss is at Chromatin, Inc., Chicago, Illinois, USA. e-mail: dpreuss@chromatininc.com

Sexually reproducing plants carrying a set of chromosomes from each parent are the rule in nature, but, for crop breeders, haploid plants represent a more useful resource. Arising either spontaneously at very low frequencies or generated by protracted crossbreeding or tissue-culture methods, haploid plants allow fully homozygous lines to be screened for desirable traits in one generation. A recent study in Nature reports that haploid plants can now be rapidly produced through the introduction of a single genetic alteration. Ravi and Chan 1 show that perturbing a centromeric histone in the model plant Arabidopsis thaliana makes it possible to reliably create haploid plants and ‘doubled haploid’ progeny from those plants. If this approach can be translated to crop species, it would find immediate application in agricultural biotechnology, shortening crop breeding programs by years.

In most eukaryotic organisms, the movement of a chromosome during cell division is regulated by its centromere, which is bound by the centromeric histone H3 (CENH3), a variant of the more ubiquitous histone H3. After DNA replication, CENH3 is loaded onto the newly formed daughter strands, targeting epigenetic marks in the centromere region 2,3. In a zygote, the centromeres of the maternal and paternal chromosomes are bound by CENH3 proteins from the maternal and paternal germ cells, respectively. Normally, these two sets of CENH3 help to move the maternal and paternal chromosomes with equal efficiency in the first few mitotic divisions that form the developing embryo. Ravi and Chan 1 show that altering CENH3 from one parent can induce targeted elimination of the chromosomes inherited from that parent (Fig. 1).

The authors modify CENH3 in two ways. In the first, green fluorescent protein (GFP) is fused to the N terminus of CENH3. In the second, the N-terminal tail of CENH3 is replaced with the corresponding domain from histone H3, and GFP is fused to the new tail (Fig. 1a). Both the H3 and CENH3 N-terminal tails are targets for multiple post-translational modifications and are thought to regulate chromatin structure.
The modified CENH3s do retain some function, but their recognition of the chromosome segregation machinery is diminished. As a result, the only chromosomes in the zygote that are moved properly are those that harbor CENH3 from the wild-type parent (Fig. 1b).

As new histone synthesis takes place within a developing embryo, one would expect that DNA strands are loaded with a mixture of CENH3 proteins encoded by the maternal and paternal alleles. Consistent with this view, Ravi and Chan 1 find that the distinction between chromosomes having maternal or paternal CENH3 is lost after the first few divisions and the remaining divisions are able to proceed normally throughout development, resulting in a haploid plant. These haploids can produce diploid (doubled-haploid) progeny, presumably either through somatic chromosome doubling or rare non-reductional divisions during meiosis.

For nearly a century, crop breeders have recognized that haploid plants can be used to accelerate the development of new inbred lines 4. In a typical program, genetically diverse parents are crossed to create hybrids (F1), and populations of their offspring (F2, F3, F4 and so on) are surveyed to identify desirable traits and to select individual plants for further propagation. After several generations, the traits under selection become fixed, and the inbred line is typically homozygous for chromosomal regions of interest.
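A back-of-the-envelope calculation shows why conventional inbreeding needs several generations to reach near-complete homozygosity. The sketch below assumes strict self-fertilization starting from an F1 that is heterozygous at the loci of interest, independent Mendelian segregation at each locus and no selection; these assumptions and the function name are illustrative and are not taken from the article. Under them, the expected heterozygous fraction halves with every generation of selfing.

```python
def heterozygous_fraction(generations_of_selfing):
    """Expected fraction of initially heterozygous loci that are still
    heterozygous after the given number of selfing generations."""
    return 0.5 ** generations_of_selfing


# F2 is produced by one round of selfing the F1, F3 by two rounds, and so on.
for n in range(1, 7):
    homozygous = 1 - heterozygous_fraction(n)
    print(f"F{n + 1}: {homozygous:.1%} of loci expected to be homozygous")
```

Under these assumptions roughly half of the loci are still heterozygous in the F2 and about 98% are homozygous by the F7, whereas a doubled haploid is homozygous at every locus as soon as it is produced.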
Incorporating doubled haploids into a breeding program has the advantage of saving considerable time by achieving homozygosity more quickly; however, this strategy requires that more lines be planted and screened in a single generation, allowing a sufficiently complete survey of genetic combinations.

Although haploids occur spontaneously in many crop species, they are extremely rare, often forming prezygotically from gametophyte cells that develop into a mature plant. Haploids can be formed at a higher (albeit still extremely low) frequency from ‘inducer’ lines, from gametophytes cultured in vitro, or from intra- or interspecies hybrids that undergo post-zygotic chromosome elimination. What is most exciting about the breeding approach described by Ravi and Chan 1 is the high frequency at which they recover haploid plants from a diploid parent (~1–10% of a normal seed set in A. thaliana). In addition, they show that the same scheme can be used to create diploid plants from tetraploids, which may be useful for breeding crops with complex ploidy, such as hexaploid wheat.

These results raise several questions about chromosome dynamics during cell division. What is the nature of the competition between centromeres bound to different CENH3s? The authors suggest that the modified CENH3s


<strong>new</strong>s <strong>and</strong> viewsabZygoteAfter DNA syn<strong>the</strong>sisAfter mitosisH3iCENH3DiploidheterozygoteiiGFP-tailswapDiploidheterozygoteGFP-CENH3iiiHaploidDoubled-haploidhomozygote© 2010 Nature America, Inc. All rights reserved.Figure 1 Manipulation of CENH3 structure perturbs chromosome segregation in plants. (a) Centromere-specific histone variants. Centromere-specificCENH3 differs from <strong>the</strong> ubiquitous histone H3 at its N-terminal tail. Ravi <strong>and</strong> Chan 1 modify CENH3 by fusing GFP to <strong>the</strong> N termini of CENH3 (GFP-CENH3) or of a CENH3 variant whose tail has been replaced by <strong>the</strong> tail of H3 (GFP-tailswap). (b) Inheritance patterns of chromosomes bearing normal <strong>and</strong>modified CENH3 in <strong>the</strong>ir centromeres. Self-pollination of plants bearing normal (i) or GFP-tagged CENH3 (ii) generates zygotes that replicate <strong>and</strong> transmitchromosomes normally. A cross between a plant with normal CENH3 <strong>and</strong> a plant with GFP-tagged CENH3 (iii) generates chromosome str<strong>and</strong>s that remainprimarily decorated with <strong>the</strong>ir respective parental CENH3 variants. Wild-type CENH3 has an advantage in promoting chromosome segregation to daughtercells, resulting in haploid plants that can be selfed to form doubled-haploid, homozygous progeny.may slow <strong>the</strong> kinetics of interaction withcellular machinery, leading to <strong>the</strong> loss ofchromosomes bound mostly by modifiedCENH3s. O<strong>the</strong>r possible explanations includedifferences in <strong>the</strong> interactions with o<strong>the</strong>r centromere-bindingproteins (as many as 19 havebeen identified) or in <strong>the</strong> physico-mechanicalproperties of histone-bound centromeres 5,6 .Is CENH3 <strong>the</strong> only component of <strong>the</strong> centromerewhose variants can compete in thismanner? 
For example, another centromere component, CENP-C, is functionally distinct from CENH3 but shares the quality of significant diversity at the amino acid level in phylogenetic analyses, suggesting that variants of CENP-C might also have different competitive efficiencies⁷.

Ravi and Chan¹ have shown that a single genetic change can alter the efficiency of haploid induction in plants. Translating this technology to crops will require overcoming a few hurdles. First, appropriate CENH3 alleles must be identified: a null mutation in CENH3 will be required, and a stable line encoding a suitably altered form of CENH3 will have to be generated. Second, because most crop plants have more chromosomes (and often fewer seeds) than does A. thaliana, it is not clear how efficiently the set of chromosomes contributed by the CENH3 mutant parent will be eliminated. Despite these questions, the potential benefits for crop breeding coupled with the broad conservation of CENH3 across plant families clearly justify commercial investment in this approach.

COMPETING FINANCIAL INTERESTS
The authors declare competing financial interests: details accompany the full-text HTML version of the paper at http://www.nature.com/naturebiotechnology/.

1. Ravi, M. & Chan, S.W. Nature 464, 615–618 (2010).
2. Morris, C.A. & Moazed, D. Cell 128, 647–650 (2007).
3. Lermontova, I. et al. Plant Cell 18, 2443–2451 (2006).
4. Dunwell, J.M. Plant Biotechnol. J. 8, 377–424 (2010).
5. Przewloka, M.R. & Glover, D.M. Annu. Rev. Genet. 43, 439–465 (2009).
6. Bloom, K. & Joglekar, A. Nature 463, 446–456 (2010).
7. Malik, H.S. Prog. Mol. Subcell. Biol. 48, 33–52 (2009).

High-content imaging
Arnold Hayer & Tobias Meyer

Multiparametric imaging of siRNA screening data sheds light on endocytosis.

Arnold Hayer and Tobias Meyer are in the Department of Chemical and Systems Biology, Clark Center Bio-X, Stanford University, Stanford, California, USA. e-mail: tobias1@stanford.edu

Gaining a systems-level understanding of complex cellular processes will require new analytic approaches that account for the effects of perturbations on a large number of functional parameters with high resolution and high throughput. A recent study by Collinet et al.¹ in Nature provides an instructive example of how this might be achieved. Focusing on endocytosis, the authors combine multiparametric imaging with a genome-wide RNA interference (RNAi) screen in HeLa cells to analyze many parameters of the endocytic system in unprecedented detail.

Endocytosis allows eukaryotic cells to remove signaling receptors from their surfaces and to take up extracellular molecules. Internalized cargoes are shuttled through a maze of intracellular sorting and transport stations until they reach their destinations. Primary endocytic vesicles fuse with early endosomes, from where cargo is either recycled back to the plasma membrane or sorted into the endo-lysosomal pathway for degradation. Clathrin-mediated endocytosis is a major endocytic route used by transferrin, growth-factor receptors and pathogenic viruses during infectious entry. Although clathrin-dependent uptake is the best-studied endocytic pathway, a systems-level understanding of the dynamic and interconnected endocytic pathways remains elusive.

Figure 1 Workflow of the high-content siRNA screen developed by Collinet et al.¹. HeLa cells are treated with an siRNA or endoribonuclease-prepared siRNA from one of three genome-wide libraries, followed by a pulse of two fluorescently labeled endocytic cargos. The cells are then fixed, and images are recorded using automated confocal microscopy. Custom image analysis software measures 62 different parameters (such as endosome size, cargo content, and distance from each other and from the nucleus) in each of the images. The parameters are used to assemble a phenotypic profile for each of the targeted genes. Genes that show similar effects on all of the parameters are predicted to be involved in similar endocytic processes. [Panel labels: RNAi perturbation; pulse with fluorescent ligands; confocal imaging; automated analysis of 62 parameters; clustering of genes according to observed phenotypes. Credit: Katie Vicari.]

Earlier large-scale, imaging-based RNAi approaches have probed the endocytic system using transferrin or viruses as endocytic cargo to identify novel regulators²⁻⁵. Owing to the inherent noise in RNAi screens, these studies sought to obtain a small number of validated hits rather than to define the function of every tested gene.
Typically, the high-throughput nature of such approaches required relatively low-resolution images and therefore allowed the evaluation of only a small number of parameters.

In contrast, Collinet et al.¹ aimed to determine the role of all genes in the endocytic system with high accuracy. They began by pulsing HeLa cells with two ligands that enter cells by clathrin-mediated endocytosis: fluorescently tagged transferrin and epidermal growth factor (Fig. 1). Once endocytosed, these ligands and their receptors follow distinct routes inside the cell: transferrin and transferrin receptor recycle back to the plasma membrane, whereas epidermal growth factor and its receptor enter the degradation pathway. For RNAi perturbations, the authors used three genome-wide libraries, that is, 7–8 small interfering RNAs (siRNAs) or endoribonuclease-prepared siRNAs per gene, yielding ~161,000 knockdown conditions in total. High-resolution images of fixed cells were acquired by automated spinning disc confocal microscopy, allowing visualization of subcellular structures and intracellular cargo distribution.

During their life cycle, endosomes typically travel from the cell periphery toward the cell center while changing shape and the extent of their tubular extensions in accordance with ongoing sorting processes. In an effort to comprehensively describe this system, Collinet et al.¹ extracted 62 parameters from the high-resolution images.
These included the total amount of internalized cargo as well as parameters that define endosomal shape, number and distribution. Using these parameters, they generated phenotypic profiles for all genes and then analyzed the profiles to identify 4,609 genes whose knockdown significantly altered the state of the endocytic system for one or both of the endocytic ligands. These hits were clustered into 14 groups according to their phenotypic profiles (Fig. 1).

As expected, established players in endocytic trafficking were well represented. But the screen also identified genes not previously associated with endocytic trafficking, such as those encoding components of the transforming growth factor beta, Wnt and Notch signaling pathways, and many genes of unknown function. Among the various classes of genes identified, those that regulate endocytosis of transferrin and epidermal growth factor differently are of special interest. Although both ligands enter cells by a clathrin-dependent mechanism, there is evidence that they use distinct populations of vesicles⁶. Collinet et al.¹ now provide a catalog of genes whose products selectively regulate endocytosis of one or the other ligand, further demonstrating the plasticity of clathrin-mediated endocytosis.

Future studies could investigate the potential therapeutic relevance of these results.
For example, uncontrolled cell growth caused by defects in receptor internalization might be corrected by specifically stimulating the degradation of these receptors. In the context of infectious disease, it may be possible to selectively block infection by pathogenic viruses that rely on clathrin-mediated endocytosis. Ideally, such strategies would target the disease-related subtype of clathrin-mediated endocytosis while allowing the cell to take up nutrients and remain healthy.

Previous large-scale siRNA screens studying similar or other mammalian systems often produced hit lists with relatively poor overlap. Divergent screening strategies may partly account for this effect, but off-target effects of individual siRNAs and variability in cell-culture systems remain major concerns. True validation of the current data set will ultimately come from detailed follow-up studies that establish protein function of individual hits at a mechanistic level.

Nevertheless, the work of Collinet et al.¹ provides a road map of how to generate a comprehensive genetic data set of the mammalian endocytic system and other cellular processes. Their screening data are readily accessible online (http://gwsdisplayer.mpi-cbg.de/), allowing interrogations of single genes or groups of genes.
By combining this data set with complementary multiparametric genome-wide data on other endocytic processes, it should be possible to construct a comprehensive endocytic database. Ultimately, this database, if standardized in format and quality, could be combined with analogous data on other cellular processes such as mitosis⁷ or the secretory pathway to create a repository for mammalian loss-of-function screening data similar to existing resources for sequence, proteomics and microarray data. Such databases have proven very useful in other model organisms (http://www.flyrnai.org/, http://www.wormbase.org/).

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

1. Collinet, C. et al. Nature 464, 243–249 (2010).
2. Galvez, T. et al. Genome Biol. 8, R142 (2007).
3. Pelkmans, L. et al. Nature 436, 78–86 (2005).
4. Karlas, A. et al. Nature 463, 818–822 (2010).
5. Konig, R. et al. Nature 463, 813–817 (2010).
6. Leonard, D. et al. J. Cell Sci. 121, 3445–3458 (2008).
7. Neumann, B. et al. Nature 464, 721–727 (2010).
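As an illustration of the profile-clustering step described in the article above (genes with similar multiparametric phenotypes are grouped together), the following is a minimal sketch only. The article does not say which clustering algorithm Collinet et al. used, so the method here (linking genes whose profiles have a Pearson correlation above a cutoff) is an assumption, as are the gene names, the four-parameter profiles (the screen used 62), the values and the `min_corr` threshold.

```python
from math import sqrt

# Hypothetical phenotypic profiles: one vector of normalized parameter
# deviations per knocked-down gene. The real screen measured 62 parameters;
# four made-up ones are used here to keep the example readable.
PROFILES = {
    "gene_A": [2.0, 1.0, -1.0, -2.0],
    "gene_B": [1.8, 1.2, -0.9, -2.1],
    "gene_C": [-2.0, -1.0, 1.0, 2.0],
    "gene_D": [-1.9, -1.1, 1.2, 1.8],
    "gene_E": [1.0, -1.0, 1.0, -1.0],
    "gene_F": [0.9, -1.1, 1.1, -0.9],
}


def pearson(a, b):
    """Pearson correlation between two equally long parameter vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat profile carries no shape information
    return cov / sqrt(var_a * var_b)


def cluster_by_correlation(profiles, min_corr=0.8):
    """Group genes whose profiles correlate at or above min_corr.

    This is simple single-linkage grouping via union-find, chosen for
    brevity; it is not claimed to be the method used in the screen.
    """
    genes = sorted(profiles)
    parent = {g: g for g in genes}

    def find(g):
        # Walk up to the group representative, compressing the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            if pearson(profiles[g1], profiles[g2]) >= min_corr:
                parent[find(g1)] = find(g2)

    groups = {}
    for g in genes:
        groups.setdefault(find(g), []).append(g)
    return sorted(groups.values())


if __name__ == "__main__":
    for group in cluster_by_correlation(PROFILES):
        print(group)
```

With these invented values the sketch pairs gene_A with gene_B, gene_C with gene_D, and gene_E with gene_F. Correlation compares the shape of a profile rather than its magnitude, which is one plausible reading of "similar effects on all of the parameters"; a distance-based method would be an equally reasonable choice.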


Third-generation sequencing fireworks at Marco Island
David J Munroe & Timothy J R Harris

Advances in sequencing platforms promise to make this technology more accessible.

David J. Munroe and Timothy J. R. Harris are in the Advanced Technology Program, SAIC-Frederick, Inc., National Cancer Institute–Frederick, Frederick, Maryland, USA. e-mail: dmunroe@ncifcrf.gov

It was unseasonably cold in Florida during the Advances in Genome Biology and Technology (AGBT) meeting on Marco Island, on 24–27 February, but there was no cooling the enthusiasm and excitement of meeting participants over the new developments and innovations that continue to drive DNA sequencing technology. Even the lavish firework display could not upstage the sequencing pyrotechnics on offer from the newest generation of instruments showcased during the meeting.

Over the course of the past 5 years, the development of so-called ‘next’- or ‘second’-generation DNA sequencing, and the applications that this enabled, have firmly established DNA sequencing as the preeminent technology driving future developments in genomics. As reported at AGBT, the dominant second-generation sequencing platforms, HiSeq from Illumina (San Diego, CA) and SOLiD from Life Technologies (Foster City, CA), have been optimized so that, by year’s end, they will not only have substantially reduced hands-on sample preparation time but also have their throughput increased to ≥100 Gb of mappable sequence per run. Improvements in the new Illumina platform (HiSeq 2000) include reagent optimization, the use of two flow cells and a dual surface imaging system, whereas the new SOLiD platform (SOLiD 4) makes use of a newly engineered DNA ligase, smaller bead size, reagent optimization, and improved software for bead detection and color calling. In addition to increased throughput, the SOLiD 4 boasts a >99.9% accuracy rate.

Last year, these platforms were joined by the commercial launch of another system, the arrayed nanoball system of Complete Genomics (Mountain View, CA), which is an iteration of the sequencing-by-ligation approach. Unlike the Illumina and Life Technologies sequencing businesses, which were positioned as instrument vendors, the Complete Genomics business model is to operate as a sequencing service rather than sell instrumentation and consumables. The Complete Genomics platform uses a proprietary combinatorial probe–anchor ligation strategy to sequence amplified DNA templates that are self-assembled into DNA nanoballs anchored onto patterned nanoarrays¹. The ligation chemistry is complex, as is the data analysis inherent to all short-read platforms, two features that together translate into long turnaround times. Even so, a recent report detailing the sequencing of three human genomes demonstrates that this platform is highly accurate and is capable of generating average coverage of 45- to 87-fold at a consumables cost of $4,400 per genome¹.

Although improvements to the second generation continue to impress, perhaps the greatest ‘buzz’ at AGBT and elsewhere has been about the development of so-called third-generation DNA sequencing platforms. Designed to complement second-generation sequencing, third-generation platforms have several characteristics that distinguish them from their predecessors, including single-molecule templates, lower cost per base, easy sample preparation, significantly faster run times and simplified primary data analysis. Long read lengths (hundreds of base pairs or more) enable de novo sequencing and simplify data analysis. In particular, a long read length simplifies sequence assembly and facilitates a variety of data analysis functions such as detection of copy number variations (CNVs), translocations, splice variation, chimeric transcripts and haplotype phasing. The use of single-molecule templates translates into simplified template preparation and typically reduces the amount of sample needed for analysis. Third-generation sequencing platforms also have significantly faster run times compared with second-generation instruments (minutes as opposed to days). These short run times will facilitate application development and open the door to the routine use of sequencing as a diagnostic tool. Currently, several such platforms are in various stages of development.
Four distinguish themselves from the rest: Pacific Biosciences (PacBio; Menlo Park, CA), Life Technologies (Carlsbad, CA), Oxford Nanopore (Oxford, UK) and Ion Torrent (Guilford, CT). The representatives of these companies were decked out in their brightly colored company regalia at the meeting, with each ensconced in their respective rooms like the pits of a Formula One race.

Of the emerging third-generation technologies, the PacBio and Life Technologies platforms are the most similar and closest to commercial release, with early-access partnerships scheduled for midyear and year end, respectively. The similarities between these two platforms confer a shared set of strengths and weaknesses. Both the PacBio and Life Technologies instruments use DNA polymerase and terminal phosphate–labeled nucleotides² that allow long read lengths (1 kb and 1.5 kb, respectively) and short run times (15 min and 20 min, respectively). They both also use a charge-coupled device (CCD) array detection system³. This means that the throughput of these platforms is restricted by the current state of the art in CCD array technology.
Simply put, these cameras have a finite amount of data-recording capacity. Until this capacity is increased, the per-run throughput of these platforms will be limited to a level no higher than that of the Illumina and SOLiD second-generation sequencers.

But it is the differences, rather than the similarities, between the PacBio and Life Technologies platforms that are most pertinent. The reactions in the PacBio RS sequencer are performed in 80,000 zero-mode waveguide (ZMW) ‘wells’, each holding 20 zeptoliters (1 zeptoliter = 10⁻²¹ liters)⁴⁻⁷ (Fig. 1a). In addition to de novo sequencing capabilities, the first release of the PacBio instrument will also offer redundant re-sequencing and strobe-sequencing applications. Redundant sequencing generates multiple independent reads of each template molecule, resulting in accuracy rates exceeding 99.9%. The second application, strobe sequencing, is a simplified alternative to second-generation sequencing’s mate-pair application. Strobe sequencing was developed as a solution to the problem that continuous illumination by the excitation laser inflicts photodamage on the polymerase in the ZMW wells, thus limiting read lengths. Strobe sequencing addresses this issue by periodically ‘turning off’ the excitation laser. While the laser is ‘off’, no sequence data can be collected, but the polymerase can continue to traverse the template molecule without incurring damage; the distant sequence is then read when the laser is turned back on. The net effect is that multiple sequence reads (totaling an average of 1 kb) can be collected across longer stretches of each contiguous template molecule.

In contrast to PacBio, the Life Technologies platform covalently binds the end of the DNA template molecule to a glass array surface (Fig. 1b). The DNA polymerase used in the Life Technologies system is modified with a quantum dot fluorescent donor molecule that enables a fluorescence resonance energy transfer (FRET)–based labeling strategy offering two distinct advantages. First, light emission can only emanate from labeled nucleotides as they are being incorporated, leading to a significantly lower background. Second, because a FRET-based system does not require continuous high-energy laser excitation, significantly less photodamage is inflicted on the polymerase, which should ultimately lead to much longer read lengths. With the initial release of this platform, Life Technologies will also offer a redundant sequencing application that will push accuracy rates to >99.9%. Currently in development is an ultra-long-read-length application (>100 kb), in which single template molecules are stretched in nanotubes and sequenced by several polymerase molecules simultaneously. As this platform comes closer to commercial release, we will see to what extent these differences translate into advantages.

Figure 1 Third-generation sequencing platforms. (a) Pacific Biosciences SMRT (single-molecule real-time) DNA sequencing method. The platform uses a DNA polymerase anchored to the bottom surface of a ZMW (pictured in cross section). Differentially labeled nucleotides enter the ZMW via diffusion and occupy the ‘detection volume’ (white translucent halo area) for microseconds. During an incorporation event, the labeled nucleotide is ‘held’ within the detection volume by the polymerase for tens of milliseconds. As each nucleotide is incorporated, the label, located on the terminal phosphate, is cleaved off and diffuses out of the ZMW. (b) Life Technologies FRET sequencing platform uses base fluorescent labeling technology, a DNA polymerase modified with a quantum dot and DNA template molecules immobilized onto a solid surface. During an incorporation event, energy is transferred from the quantum dot to an acceptor fluorescent moiety on each labeled base. Light emission can only emanate from labeled nucleotides as they are being incorporated. (c) The Oxford nanopore sequencing platform uses an exonuclease coupled to a modified α-hemolysin nanopore (purple, pictured in cross section) positioned within a lipid bilayer. As sequentially cleaved bases are directed through the nanopore, they are transiently bound by a cyclodextrin moiety (blue), disturbing current through the nanopore in a manner characteristic for each base. (d) The Ion Torrent sequencing platform uses a semiconductor-based high-density array of microwell reaction chambers positioned above an ion-sensitive layer and an ion sensor. Single nucleotides are added sequentially, and incorporation is recorded by measuring hydrogen ions released as a by-product of nucleotide chain elongation.

Slightly further from commercial release is the Oxford Nanopore Technologies instrument. Rather than using a sequencing-by-synthesis method, this technology employs an exonuclease-based ‘sequencing by deconstruction’ approach. At the heart of this technology is an exonuclease coupled to a modified α-hemolysin nanopore (Fig. 1c). The modified nanopores are positioned within a lipid bilayer over a microwell that contains a pair of electrodes, one on either side of the lipid bilayer. When an electrical potential is applied, the high intrinsic resistance of the bilayer directs a cation-modulated current through the nanopore. As a DNA sample is introduced, the exonuclease functions to ‘capture’ the DNA molecule and direct the sequentially cleaved bases through the nanopore. As each cleaved base traverses the nanopore, the current is disturbed in a manner characteristic for each base, creating an ‘electrical trace’ unique to each nucleotide⁸. Distinct advantages of this system include a low instrument fabrication and operation cost due to the lack of labeled nucleotides and optical detection systems (that is, laser and CCD camera).
In addition, the Oxford Nanopore platform is compatible with direct RNA sequencing and the detection of modified bases⁸ by virtue of each individual base’s characteristic ability to disturb electrical current, which should enable epigenetics applications. A clear disadvantage, however, is that because the template molecule is digested during sequencing, redundant sequencing (and the associated high accuracy) is not possible. This drawback could be eliminated by simply replacing the exonuclease coupled to the nanopore with a DNA polymerase. Several other noteworthy groups, including GE Healthcare (Little Chalfont, UK), are also developing nanopore-based sequencing platforms, the details of which have not yet been made public.

Arguably, though, the most heat at AGBT was generated by the Ion Torrent Systems platform. This technology uses a semiconductor-based high-density array of microwells that function as reaction chambers (Fig. 1d). As DNA polymerase traverses each single-molecule template, nucleotide incorporation events are recorded using a unique and imaginative readout system that measures hydrogen ions released as a natural by-product of chain elongation, a kind of sequencing pH meter. Like nanopore-based technologies, the Ion Torrent platform has the advantage of low instrument fabrication and operation costs owing to the lack of labeled nucleotides and optical detection systems. Ion Torrent currently claims 100–200 base reads in 1–2 h on an instrument the size of a typical microwave oven with a projected sales price of ~$50,000. Although the instrument is highly anticipated, no release date has yet been scheduled.
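The Ion Torrent readout described above (nucleotides added one at a time, with incorporation detected through released hydrogen ions) can be illustrated with a toy model. This is a sketch under stated assumptions, not the vendor's algorithm: the flow order `TACG`, the idea that the signal in each flow scales with the number of bases incorporated, and every function name below are hypothetical and do not come from the article.

```python
# Hypothetical repeating order in which single nucleotide species are flowed
# over the wells; the article does not specify one.
FLOW_ORDER = "TACG"


def simulate_flows(strand, flow_order=FLOW_ORDER, max_flows=400):
    """Return the number of bases incorporated during each nucleotide flow.

    `strand` is the sequence being synthesized. In each flow, the polymerase
    extends through every consecutive base that matches the flowed
    nucleotide, so a run of identical bases is incorporated in one flow.
    """
    signals = []
    pos = 0
    flow = 0
    while pos < len(strand) and flow < max_flows:
        base = flow_order[flow % len(flow_order)]
        count = 0
        while pos < len(strand) and strand[pos] == base:
            count += 1
            pos += 1
        signals.append(count)  # assumed proportional to H+ released
        flow += 1
    return signals


def call_bases(signals, flow_order=FLOW_ORDER):
    """Turn per-flow signals (possibly noisy) back into a base sequence."""
    return "".join(
        flow_order[i % len(flow_order)] * round(signal)
        for i, signal in enumerate(signals)
    )


if __name__ == "__main__":
    flows = simulate_flows("TTAGC")
    print(flows)
    print(call_bases(flows))
    # Slightly noisy measurements are rounded to whole incorporations.
    print(call_bases([2.1, 0.9, 0.1, 1.2, 0.0, 0.2, 0.8]))
```

In this model a flow that matches no base yields a zero signal, and a run of identical bases yields a proportionally larger one, which is why the decoder rounds each signal to an integer count.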


Today, we stand at the edge of an era when new sequencing technologies and the greatly reduced cost of generating sequencing data will open up a host of possibilities in basic research, translational medicine and diagnostics that were unimaginable a decade ago. Up until this point, the complexity and cost of large-scale capillary and second-generation DNA sequencing largely limited its practice to large, specialized centers. Third-generation sequencing technology promises to remove these barriers. The simple sample preparation, short run times and relative ease of operation inherent to single-molecule sequencing make it significantly more accessible and will translate into many more genomes or parts of genomes being sequenced. This will require continuing very substantial investments in data storage and analysis to keep pace with the sequencing machines.

Perhaps the greatest impact, though, will be felt in clinical medicine and personalized healthcare, given that the characteristics of third-generation sequencing make these platforms particularly well suited to molecular diagnostics. Areas in which we might expect to see this new sequencing technology playing a more immediate role include haplotyping, mutation detection, companion diagnostics and real-time monitoring of pathogen evolution.
Costs aside, it is clear that third-generation DNA sequencing is likely to produce fireworks lasting considerably longer than the ones at AGBT.

Acknowledgements
This project has been funded in whole or in part with federal funds from the National Cancer Institute, National Institutes of Health, under contract HHSN261200800001E. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products or organizations imply endorsement by the US Government.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

1. Drmanac, R. et al. Science 327, 78–81 (2010).
2. Korlach, J. et al. Nucleosides Nucleotides Nucleic Acids 27, 1072–1083 (2008).
3. Lundquist, P.M. et al. Optics Lett. 33, 1026–1028 (2008).
4. Levene, M.J. et al. Science 299, 682–686 (2003).
5. Foquet, M. et al. J. Appl. Phys. 103, 034301 (2008).
6. Korlach, J. et al. Proc. Natl. Acad. Sci. USA 105, 1176–1181 (2008).
7. Eid, J. et al. Science 323, 133–138 (2009).
8. Clarke, J. et al. Nat. Nanotechnol. 4, 265–270 (2009).


Research highlights

Weighing in on single cell growth rates
Manalis and colleagues monitor the growth of single cells in real time by developing a fluidic control system for a previously described suspended microchannel resonator. By trapping a cell in the channel and then continuously alternating the direction in which fluid flows through the device, they are able to repeatedly measure over time the cell’s buoyant mass, a parameter analogous to its dry mass. The high mass precision achieved by the resonator enables growth rates to be measured over intervals much shorter than the cell division time (typically ~5 min for Escherichia coli, Bacillus subtilis and Saccharomyces cerevisiae, and ~20 min for mouse lymphoblasts). For all four of these cell types, heavier cells grow faster than lighter ones and there is substantial variation in the ‘instantaneous’ growth rate, even for cells of similar mass. Coupling this approach with fluorescent reporters of molecular and cellular activities and/or transgenes encoding products of biotechnological interest may provide insights into disease mechanisms or the modes of action of drugs. (Nat. Methods, published online 11 April 2010; doi:10.1038/nmeth.1452) PH

RNA-binding site mapping
Cellular RNAs are bound by a variety of RNA-binding proteins. Identifying the targets of these proteins and determining the functional consequences of RNA-protein interactions have been the focus of intensive research.
Tuschl and colleagues now present a method that enables efficient isolation of RNAs bound to a specific protein and high-resolution mapping of the interaction sites within each RNA species. The method is based on the incorporation of the uridine analog 4-thiouridine, which can be covalently cross-linked to proteins by UV light. After immunoprecipitation of the protein of interest, the bound RNAs are identified by Illumina sequencing. As cross-linked 4-thiouridine is detected as cytosine in the sequencing reaction, regions of an RNA with a high frequency of T to C conversion indicate sites where protein binds. The authors use this technology, termed PAR-CLIP, to map sites targeted by miRNAs and RNA-binding proteins. Surprisingly, 50% of the miRNA target sites occur within the coding regions of mRNAs, although coding sequence target sites appear to be less efficient in destabilizing mRNAs than more traditional 3′ untranslated region sites. The data also suggest that RNA-binding proteins and miRNAs bind to a large percentage of the total cellular RNA species (5–30%), providing the basis for a complex combinatorial mode of post-transcriptional regulation of gene expression. (Cell 141, 129–141, 2010) ME

Written by Kathy Aschheim, Laura DeFrancesco, Markus Elsner, Peter Hare & Craig Mak

Single tomato gene linked to yield
Crops with superior agricultural traits can often be created by crossing different varieties, but few examples exist of single genes that determine the genetic basis of this effect, known as heterosis. These so-called single-gene overdominant loci are highly desirable in plant breeding as they would facilitate rational design of hybrid lines.
By examining 33 hybrid tomato lines, Zamir and colleagues identify the gene SINGLE FLOWER TRUSS (SFT) as a determinant for increased yield in hybrids. The combination of a defective and a functional allele in a hybrid plant leads to a reduced dose of the gene product of SFT, which together with a related gene, SELF PRUNING (SP), regulates the balance between vegetative growth and the development of flowers. These results should spur the search for other single genes that control heterosis for desirable traits and suggest that tuning the balance between SFT and SP may be a general strategy applicable to other crops. (Nat. Genet., published online 28 March 2010; doi:10.1038/ng.550) ME & CM

Notch receptors dissected
Notch signaling plays important roles in both development and tumorigenesis. However, attempts to modulate Notch signaling for therapeutic benefit through inhibition of gamma secretase have been problematic owing to redundancy of the pathways (four Notch receptors exist) and lack of knowledge of the function of individual Notch receptors. Using phage display technology, Wu and colleagues report in Nature the isolation of antibodies specific for either Notch-1 or Notch-2 (anti-NRR1 and anti-NRR2). Testing each antibody separately, they found that the antibodies affected particular and different T-cell populations. In T-cell acute lymphoblastic leukemia (T-ALL) cells, where mutations in Notch receptors are common, anti-NRR1 inhibited signaling in cell lines bearing the three most common mutations.
In xenograft models, anti-NRR1 induced regression of even well-established tumors. In a related study in PLoS ONE, Aste-Amézaga et al. isolated Notch1-specific antibodies that bind two regions of the Notch1 receptor, the ligand-binding region and the negative regulatory region. Both inhibit Notch signaling in T-ALL cells carrying particular mutations. Whereas clinical application will require careful testing because of the potential side effects, these reagents should be immediately useful for further study of Notch pathways. (Nature 464, 1052–1057, 2010; PLoS One 5, e9094, 2010) LD

Calming the storm
Cytokine storm is a destructive overreaction of the innate immune system to infections or other conditions. When produced in excess, inflammatory cytokines can lead to vascular leakage, tissue edema, organ failure, shock and death. Therapeutic approaches are often based on damping various parts of the immune system, but these have had limited success owing to the complexity of the immune response in cytokine storm. A recent paper by London et al. proposes a new treatment strategy focused on strengthening the vascular barrier. Vascular hyperpermeability mediated by vascular endothelial growth factor was known to be antagonized by signaling of Slit family proteins through the endothelial-specific receptor Robo4. The authors tested the therapeutic utility of the active fragment of Slit in several disease models, including bacterial endotoxin exposure, polymicrobial sepsis and H5N1 influenza.
The Slit fragment reduced vascular permeability, multiorgan edema and death in all of these models, suggesting that stabilization of the endothelial barrier could be beneficial in a wide variety of infectious diseases. (Science Transl. Med., published online 6 April 2010; doi:10.1126/scitranslmed.3000678) KA
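The PAR-CLIP readout described in the RNA-binding site mapping highlight above (positions with a high frequency of T to C conversion mark likely protein-binding sites) can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name, the gapless-alignment input format and the toy reads are all assumptions made for illustration.

```python
from collections import Counter


def t_to_c_fraction(reference, reads):
    """Fraction of reads showing C at each reference T position.

    `reads` is a list of (start, sequence) pairs, assumed (hypothetically)
    to be gapless alignments against `reference`, 0-based.
    """
    coverage = Counter()     # reads covering each reference T
    conversions = Counter()  # reads showing C at that T
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if pos >= len(reference) or reference[pos] != "T":
                continue
            coverage[pos] += 1
            if base == "C":
                conversions[pos] += 1
    return {pos: conversions[pos] / coverage[pos] for pos in sorted(coverage)}


# Toy data (invented): three reads aligned to a seven-base reference.
reference = "GATTACA"
reads = [(0, "GATCACA"), (1, "ATCAC"), (2, "TTAC")]
print(t_to_c_fraction(reference, reads))
```

In this toy input, position 3 is converted in two of the three reads that cover it, whereas position 2 is never converted, so position 3 would be the candidate cross-link site. A real analysis would also need to account for sequencing errors and SNPs, which this sketch ignores.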


Editorial

Biomarkers on a roll

A consortium of industry, nonprofit institutions and regulators outlines a rolling biomarker qualification process, providing the first clear path for translation of such markers from discovery to preclinical and clinical practice.

This issue presents the results of the first set of studies by the Predictive Safety Testing Consortium (PSTC), a collaborative effort of scientists from 15 pharmaceutical companies and 2 biotech companies, four academic institutions, the Critical Path Institute, the Food and Drug Administration (FDA) and the European Medicines Agency (EMEA; now EMA). These studies provide data supporting the utility of seven renal biomarkers in safety testing in the preclinical setting. They have now been formally accepted by the US and European regulatory authorities, with a decision expected from the Japanese Pharmaceuticals and Medical Devices Agency next month.

From an industry standpoint, drug-induced toxicity is a serious issue, killing 30% of compounds overall, from leads in the preclinic all the way to marketed products. The availability of better preclinical toxicity biomarkers thus remains a key strategic goal.

What makes a good safety biomarker?
In essence, there are three important technical attributes: first, the marker must be present in peripheral body tissue and/or fluid (e.g., blood, urine, saliva, breath or cerebrospinal fluid); second, it must be easy to detect or quantify in assays that are both affordable and robust; and third, its appearance must be associated as specifically as possible with damage of a particular tissue, preferably in a quantifiable manner. Existing renal damage biomarkers such as serum creatinine (SCr) and blood urea nitrogen (BUN) meet the first two criteria. However, regulators have now accepted that in preclinical testing, at least, six other renal drug safety biomarkers (Kim-1, albumin, total protein, β2-microglobulin, cystatin C and clusterin) outperform the traditional markers in specificity and sensitivity.

A ‘good’ biomarker, therefore, can be defined technically. But a more interesting question is, what makes a ‘qualified’ biomarker? In other words, what does it take to convince a regulator of a biomarker’s utility? This is the question that the PSTC set out to answer.

Under the coordination of the nonprofit Critical Path Institute, the PSTC was formed in 2006 and has grown to encompass around 190 industry and government scientists. After preliminary discussions among all the participants, 23 urinary biomarkers were selected, and 33 rat studies conducted at Novartis, Merck and the FDA then correlated the levels of seven biomarkers, as well as SCr and BUN, with histopathological assessments of different kidney lesions.
Between June 2007 and January 2008, these data were presented to the authorities, which by April 2008 had accepted that these biomarkers outperformed the current standards.

Agreeing upon multiple nephrotoxicity biomarkers at the same time is, of course, an important achievement in its own right. But the larger contribution of the PSTC is that there is now a formal, standardized regulatory review process for the qualification of biomarkers. A biomarker can be qualified by the regulatory authorities as long as it is supported by appropriate data. In the case of the PSTC’s nephrotoxicity biomarkers, the FDA and EMEA regard the tests as ‘fit for purpose’ in preclinical research only because the data presented are from animal toxicity testing. Under the new ‘rolling’ qualification process, the aim is that some or all of these urinary biomarkers could subsequently be ‘qualified’ for clinical drug-induced nephrotoxicity once further supportive human data are submitted. Similarly, other groups at the PSTC are hoping to generate preclinical data in the coming months on drug-induced hepatotoxicity, myopathy, vascular injury and nongenotoxic carcinogenicity in rodents.

Importantly, the PSTC process is both cooperative and transparent.
One group of regulatory representatives acted as advisors to the pharma teams. Separate teams within the regulatory agencies then assessed the data submissions, providing specific feedback on the need for more experimental data at additional time points, proper blinding of the samples during the assessment of kidney tissue sections by pathologists and additional types of statistical analysis of the data set.

This leaves the question of why it has taken so long for regulators and industry to agree upon standards for such a fundamental piece of data. After all, all of the newly qualified markers had been known to be associated with kidney damage for years, some of them for decades. Furthermore, the limitations of BUN and SCr have long been appreciated.

One explanation is the inadequacy of biomarker research and development. The literature throws up dozens of new potential biomarkers each month but too many of these studies lack sufficient rigor for translation into drug development, let alone regulatory qualification. Too often, studies lack adequate description of the sampling, data generation or statistical analyses.
Others are underpowered, inadvertently biased or identify biomarkers on the basis of cherry-picked portions of data.

But a larger part of the answer lies in the fact that cooperative relationships between regulators and drug companies are a relatively new development. The April 2008 announcement of the approval of the PSTC’s renal biomarkers was the first ever cooperative decision by the FDA and EMEA made on the basis of a joint data submission. Pan-industry research collaborations are also new. The FDA’s Critical Path Initiative started in 2004, the PSTC in 2006 and the Innovative Medicines Initiative in 2007 (operationally in 2008). Until the formation of these structures with a clear mandate to address toxicity markers, industry had no framework to engineer cooperative initiatives. The PSTC provides that framework, allowing participants to work under a legal agreement that covers intellectual property, confidentiality and material transfer.

The PSTC is undoubtedly a major step forward in rationalizing the development of toxicity biomarkers. Industry now has a clear path to qualify biomarkers in the preclinical and clinical settings. The jury remains out on whether pioneer pharmaceutical companies will share knowledge on novel biomarkers with their competitors.
But for existing biomarkers that are widely accepted within industry and detailed in the literature, the PSTC shows how open and cooperative precompetitive research among large pharmaceutical companies can benefit the entire industry.


Foreword

Research at the interface of industry, academia and regulatory science

William B Mattes¹, Elizabeth Gribble Walker¹, Eric Abadie², Frank D Sistare³, Jacky Vonderscher⁴, Janet Woodcock⁵ & Raymond L Woosley¹

A medicine is defined as ‘a substance or preparation used in treating disease’. Society expects that the benefits of medicines should substantially exceed their risks, and this expectation has been translated into governmental policy around the world. Part of the mission of the US Food and Drug Administration (FDA) is to protect the public health by assuring the safety and efficacy of medicines [1]. The FDA has carried out its mission by relying upon the best current scientific knowledge and practice [2]. By definition, gaps in current scientific knowledge and practice limit the ability of regulatory agencies, such as the FDA and the European Medicines Agency (EMEA; London), to carry out their mission.
Current gaps include a limited ability to extrapolate animal data to humans [3–5], the difficulty of evaluating genetic and carcinogenic risks [6,7], and our poor understanding of gender-specific responses [8]. It is hoped that new knowledge, technologies and tools can address these and other gaps and improve the evaluation of new drugs and medicines [9–12].

In this context, the FDA has advocated a ‘Critical Path Initiative’ [13,14] to intentionally address gaps in applied and regulatory science. The initial report and subsequent listing of specific opportunities [15] called attention to research and tools needed to improve the process of drug development that extends from preclinical testing to ultimate regulatory registration. Although this area is vital for improving the development of new medicines and getting them to the public, it receives little academic, public or legislative attention and, thus, little funding. Rather, the focus of both academic research and news organizations is often on novel discoveries and/or the risks and benefits of drugs after they have reached the marketing phase. Nevertheless, a great deal of essential work must be accomplished between discovery and delivery (that is, in the critical path) to deliver safe and effective medicines to the public.
With the goal of improving that process, the FDA has not only identified gaps in ‘Critical Path Research’ but also suggested that an effective approach to address these gaps would be to form consortia of industry, academic and regulatory scientists to share resources, expertise and experience toward accomplishing specific shared objectives.

¹Critical Path Institute, Tucson, Arizona, USA. ²European Medicines Agency, Canary Wharf, London, UK. ³Department of Laboratory Sciences and Investigative Toxicology, Safety Assessment, Merck Research Laboratories, West Point, Pennsylvania, USA. ⁴Novartis, San Diego, California, USA. ⁵Food and Drug Administration, Silver Spring, Maryland, USA.
e-mail: wbmattes@gmail.com

Consortia have played key roles in addressing technological problems common to a competitive industry. For instance, the Sematech consortium, formed in 1987 and comprising 14 leading US semiconductor producers, addressed common issues in semiconductor manufacture and increased R&D efficiency by avoiding duplicative research [16]. Sematech demonstrates that consortia provide the opportunity for industry scientists to share their experiences in identifying and solving problems, to pool their expertise and to collectively consider mutual questions.
To create similar models in drug, diagnostic and device development, the Critical Path Institute (C-Path) was incorporated as a “neutral, third party” to serve as a consortium organizer [14] and interface between industry members and the FDA [17].

One of the first consortia formed by C-Path to address one of the Critical Path gaps was the Predictive Safety Testing Consortium (PSTC) [18,19]. As noted in the Critical Path Opportunities list, there is a need for “preclinical biomarkers that predict human liver or kidney toxicity” and “collaborations among sponsors to share what is known about existing safety assays” [15]. Indeed, the preamble to the legal agreement that binds PSTC members notes that “the parties to this Agreement also recognize the importance of validated safety biomarkers to pharmaceutical and biotechnology research and development efforts and wish…to conduct research and development projects, under the coordination of C-Path, to identify and validate such biomarkers to increase drug safety.” Thus, the PSTC is committed to cooperative research resulting in tools beneficial to both pharmaceutical development and regulatory science (termed Critical Path Research). Of course, these tools could also be valuable in medical situations where improved monitoring for drug safety would improve outcomes.

The PSTC legal agreement furnishes not only a clear set of goals and deliverables that provide guidance for actions and decisions of the consortium, but also a framework to address issues such as antitrust, intellectual property and confidentiality. This assures open data sharing and collaboration in a manner consistent with applicable legal requirements. In particular, the confidentiality provisions also assure that publications (which are encouraged) respect member contributions, again fostering openness and participation. As noted above, C-Path provides executive functions and contributes overall scientific leadership, whereas members lead strategic and technical execution of the scientific working groups pursuing biomarkers of several critical toxicities where understanding of new biomarkers is desired. Members also participate in an advisory committee that, among other functions, reviews new proposals and ongoing projects and guides their scope and growth.


A key component of Critical Path Research is the participation and critical evaluations of the very regulatory scientists who will later rely on the results obtained with these new tools as they are applied to the development of new pharmaceuticals. Participation of FDA scientists in PSTC is made possible by a memorandum of understanding between C-Path and the FDA. In addition, the PSTC has representatives from the EMEA who, like FDA scientists, serve to advise the target-organ biomarker working groups (e.g., the Nephrotoxicity Working Group and the Hepatotoxicity Working Group); as experts in their respective fields, these advisors bring not only their expertise but also the experience of how problems of a given target-organ toxicity will need to be confronted in a regulatory setting.
The biomarker data generated by a working group are ultimately reviewed by a different set of regulators, thus safeguarding an impartial scientific evaluation and recommendations for how the biomarkers may be used in regulatory decision making.

Implicit in the formation and the goals of the PSTC is the realization that the current approach to the discovery, development, industry uptake and regulatory acceptance of new safety biomarkers is simply too slow and too inefficient to meet the growing needs of the worldwide healthcare system. For example, serum alanine aminotransferase was described as a marker for liver damage in the early 1960s and now is widely used for that purpose [20]. Even so, it has never been rigorously evaluated as a nonclinical or clinical marker for hepatocellular damage (e.g., by receiver operating characteristic curve analysis [21]), its specificity for detecting such damage remains in question, and defined cut-off values for patient monitoring in clinical trials are only now gaining consensus agreement [22,23]. Newly discovered biomarkers suffer from a similar liability in not having a clear or expedient path for reaching a consensus as to their value and specific terms of use.

Thus, one goal of the PSTC is to establish an intentional process for developing data sets that would support the use of a given biomarker for a specific purpose. This process, appropriately termed biomarker ‘qualification’ [24], should be distinguished from technical validation of a biomarker assay [25].
Wagner [26] describes this qualification process as the “fit-for-purpose evidentiary process of linking a biomarker with biological processes and clinical endpoints,” and notes that a certain body of data may support one purpose, whereas a larger body of data may support a broader purpose [26]. Clearly, this process must entail interaction between those developing the data set and regulatory scientists, and a framework for beginning that dialog has now been created [27]. Importantly, the result of such an exchange would be a clear statement or guidance from regulatory authorities as to the acceptable uses of a given biomarker in support of medical product development and registration. Furthermore, the process should allow the expansion of those qualified uses after the development of a larger, relevant body of biomarker data, aptly described as “progressive qualification.”

The papers in this issue describe critical recent accomplishments of the PSTC for the regulatory qualification of kidney safety biomarkers for preclinical applications. In particular, urinary biomarkers were considered, as this fluid passes unmodified through the ureter and bladder to the exterior, is easy to archive and has contents that offer a monitor of kidney function. The standard biomarkers for kidney injury, serum creatinine (SCr) and blood urea nitrogen (BUN), are widely recognized as highly insensitive, and thus measures with improved sensitivity are desired [28].
Several PSTC companies had internal experience with other biomarkers for kidney toxicity, and, after sharing these data, determined that seven biomarkers in particular showed promise for higher sensitivity than SCr and BUN and had technically sound assays available for their measurement. Furthermore, histopathology in animal models of nephrotoxicity was tractable as a metric against which the urine biomarker performance could be compared.

An open collaboration among 17 pharmaceutical/biotech companies, regulatory bodies and academia has generated a data set supporting the qualification of several new biomarkers of drug-induced kidney injury. In addition, this effort, with the involvement of the FDA and EMEA, explored pilot processes for optimization of content, structure of presentation and expectations for regulatory review of similar data sets. The collaboration extends beyond that between scientists in competing companies, academic scientists and regulatory scientists, to that between regulatory scientists in different jurisdictions. The power of these collaborations has as its proof the speed at which the data set was developed, the process of review put into place and the establishment of an initial model that future biomarker qualification efforts can follow. PSTC is also a testimony to the benefits that can be derived from productive open collaborations between academia, regulatory agencies and the private sector.
For an area of research long neglected, these accomplishments are all the more noteworthy.

COMPETING FINANCIAL INTERESTS
The authors declare competing financial interests: details accompany the full-text HTML version of the paper at http://www.nature.com/naturebiotechnology/.

1. Borchers, A.T., Hagie, F., Keen, C.L. & Gershwin, M.E. Clin. Ther. 29, 1–16 (2007).
2. Miller, S.A. J. Nutr. 123, 279–284 (1993).
3. Collins, J.M. Chem. Biol. Interact. 134, 237–242 (2001).
4. Peters, T.S. Toxicol. Pathol. 33, 146–154 (2005).
5. Voisin, E.M., Ruthsatz, M., Collins, J.M. & Hoyle, P.C. Regul. Toxicol. Pharmacol. 12, 107–116 (1990).
6. Jacobs, A. & Jacobson-Kram, D. Toxicol. Sci. 81, 260–262 (2004).
7. Jacobson-Kram, D. & Contrera, J.F. Toxicol. Sci. 96, 16–20 (2007).
8. Miller, M.A. Int. J. Toxicol. 20, 149–152 (2001).
9. MacGregor, J.T. Toxicol. Sci. 75, 236–248 (2003).
10. Tong, W. et al. Environ. Health Perspect. 111, 1819–1826 (2003).
11. Gutman, S. & Kessler, L.G. Nat. Rev. Cancer 6, 565–571 (2006).
12. Lesko, L.J. Clin. Pharmacol. Ther. 81, 807–816 (2007).
13. Anonymous. Stagnation: Challenge and Opportunity on the Critical Path to New Medical Products (FDA, Washington, DC, 2004; accessed 23 April 2010).
14. Woosley, R.L. & Cossman, J. Clin. Pharmacol. Ther. 81, 129–133 (2007).
15. Anonymous. US Food and Drug Administration. Critical Path Opportunities List - March 2006 (FDA, Washington, DC, 2006; accessed 23 April 2010).
16. Irwin, D.A. & Klenow, P.J. Proc. Natl. Acad. Sci. USA 93, 12739–12742 (1996).
17. The Food and Drug Administration. Fed. Regist. 70, 74823–74826 (2005).
18. Marrer, E. & Dieterle, F. Chem. Biol. Drug Des. 69, 381–394 (2007).
19. Mattes, W.B. Methods Mol. Biol. 460, 221–238 (2008).
20. Kim, W.R., Flamm, S.L., Di Bisceglie, A.M. & Bodenheimer, H.C. Hepatology 47, 1363–1370 (2008).
21. Zweig, M.H. & Campbell, G. Clin. Chem. 39, 561–577 (1993).
22. Senior, J.R. Clin. Liver Dis. 11, 507–524 (2007).
23. The Food and Drug Administration. Fed. Regist. 72, 60681–60682 (2007).
24. Wagner, J.A., Williams, S.A. & Webster, C.J. Clin. Pharmacol. Ther. 81, 104–107 (2007).
25. Lee, J.W. et al. Pharm. Res. 23, 312–328 (2006).
26. Wagner, J.A. Annu. Rev. Pharmacol. Toxicol. 48, 631–651 (2008).
27. Goodsaid, F.M., Frueh, F.W. & Mattes, W. Toxicology 245, 219–223 (2008).
28. Schetz, M., Dasta, J., Goldstein, S. & Golper, T. Curr. Opin. Crit. Care 11, 555–565 (2005).


Glossary

Acute kidney injury. Rapid damage to cells of the kidney, resulting in loss of function. Acute kidney injury may be caused by nephrotoxic drugs, insufficient blood flow to the kidneys (resulting in ischemia) or other insults. It is functionally defined by the Acute Kidney Injury Network (an international interdisciplinary group of nephrologists and critical care physicians) as being characterized by a rapid time course, an increase in SCr of 50% or more, and a reduction in urine output lasting more than 6 hours. The initial histomorphological changes in acute kidney injury may include changes in cell morphology or architecture (degeneration), including dilation and cell death (necrosis). Several days after the initial insult, tubular epithelial cells respond to epithelial cell loss and damage by regeneration or proliferation. Severe acute kidney injury or prolonged insults, termed chronic kidney injury, can result in progressive toxicity or, typically, a cascade of inflammation and fibrosis that irreversibly damages kidney integrity and function.

Area under the curve (AUC) for a ROC curve (see ‘Receiver operating characteristic curve’) is a metric to summarize the ability of a classifier to discriminate between two outcomes. As the name suggests, it can be calculated by integrating the receiver operating characteristic curve. It can be loosely interpreted as the sensitivity averaged across the levels of specificity.

Biomarker. 
A biological marker (DNA, RNA, protein, protein modification or metabolite) that reflects a biological state (see also ‘Safety biomarker’, ‘Diagnostic biomarker’, ‘Prognostic biomarker’, ‘Prodromal biomarker’ and ‘Predictive biomarker’). It is a characteristic that is objectively measured and evaluated as an indicator of normal biologic processes, pathogenic processes or pharmacologic responses to a therapeutic intervention (Clin. Pharmacol. Therapeut. 69, 89–95, 2001). Typically, the development process of a biomarker can be described as a pipeline similar to drug development (see ‘Biomarker discovery’, ‘Biomarker qualification’ and ‘Biomarker verification’).

Biomarker discovery. The phase of research in which candidate biomarkers are identified, often with the help of ‘-omics’ technologies, such as genomics, proteomics or metabonomics, or genetics. These approaches allow a nontargeted discovery of biomarkers correlated to certain biological processes or states.

Biomarker qualification. The process of accumulating evidence about the utility and limitations of a biomarker for use in a specific context. The term biomarker validation refers to the same concept as biomarker qualification but is increasingly outdated, as it does not imply the fit-for-purpose concept of qualification (intended use) but rather means ‘all or nothing’. Biomarker validation is also often confused with the concept of analytical assay validation, which is the validation of the analytical performance of an assay.

Biomarker verification. 
The phase of research in which the correlation of the biomarker candidate with biological processes or states is reproduced with additional investigations, often with a more targeted technology (e.g., reverse transcriptase (RT)-PCR assays or protein assays).

Chronic kidney disease. A progressive loss of renal function over a period of months to years. It is often diagnosed by increases in levels of serum creatinine. There are many causes and types of renal diseases, such as diabetic nephropathy, inflammatory glomerular injury, also called glomerulonephritis, or hypertensive nephropathy. Renal injury may also result from hereditary diseases such as autosomal dominant polycystic kidney disease or focal segmental glomerulosclerosis.

Cortex. Part of the kidney that makes up the outer layer and the bulk of the organ and encloses a smaller inner layer of the kidney (see ‘Medulla’). The cortex contains glomeruli (see ‘Glomeruli’).

Diagnostic biomarker. Reports the concurrent presence or absence of injury.

Dilation or dilatation. An abnormal distension of the tubule lumen (see ‘Tubules’).

Diuretic response. The increase in urine flow, resulting in abundant urine or polyuria, as the kidney responds to ongoing toxicity.

Exclusion analysis. Statistical analysis method that excludes samples from animals treated with a toxicant that did not exhibit the anticipated histomorphological changes and samples from control animals that were unexpectedly positive for these histomorphological changes. 
This is in contrast to inclusion analysis, in which samples from these animals were included. The primary motivation for using exclusion analysis is to avoid penalizing a marker that might be prodromal (see ‘Prodromal biomarker’) or more sensitive than the histomorphological assessment.

Glomerular filtration rate. The flow rate of fluid filtered through the kidney. The glomerular filtration rate is a common measure of the functional state of the kidney. It is often approximated by the creatinine clearance rate, which is the volume of blood plasma that is cleared of the waste product creatinine per unit time. Creatinine clearance rate is measured by timed urine and plasma determinations of creatinine or estimated by serum creatinine levels. Similarly, the level of blood urea nitrogen is a common parameter to estimate the glomerular filtration rate.

Glomeruli. Located in the cortex, glomeruli filter blood through capillary tufts surrounded by specialized epithelial cells called podocytes that cover the glomerular basement membrane and function as a blood filter. This filtrate then enters Bowman’s space, which is continuous with a series of tubules (see ‘Tubules’) that collectively comprise the remainder of the nephron (see ‘Nephron’).

H&E staining. Routinely used staining approach (hematoxylin & eosin Y) to help visualize tissue features. Hematoxylin is a blue dye that stains basophilic structures such as nuclei, and eosin Y is a red dye that stains eosinophilic structures such as cytoplasm and other protein-rich materials.

Histopathology. 
A process for visual examination of animal tissues to determine whether there are microscopic changes. Routine histopathology in pharmaceutical safety studies is conducted by fixing tissues in formalin, sectioning the tissues using a microtome, fixing these to microscope slides and staining the tissues before microscopic evaluation (see ‘H&E staining’).
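As a worked illustration of the ‘Glomerular filtration rate’ entry above, the measurement of creatinine clearance from timed urine and plasma determinations can be sketched with the standard clearance relation (urine concentration × urine flow rate ÷ plasma concentration). This is a minimal sketch only: the function name, the units and the sample values are assumptions chosen for illustration and do not come from the glossary or from any PSTC data set.

```python
def creatinine_clearance_ml_per_min(urine_creatinine_mg_dl,
                                    urine_volume_ml,
                                    collection_minutes,
                                    plasma_creatinine_mg_dl):
    """Return creatinine clearance in ml/min from a timed urine collection.

    Clearance = (urine concentration x urine flow rate) / plasma concentration.
    Both concentrations must use the same unit (mg/dl here) so that it cancels.
    """
    if collection_minutes <= 0 or plasma_creatinine_mg_dl <= 0:
        raise ValueError("collection time and plasma creatinine must be positive")
    urine_flow_ml_per_min = urine_volume_ml / collection_minutes
    return urine_creatinine_mg_dl * urine_flow_ml_per_min / plasma_creatinine_mg_dl


# Hypothetical 24 h collection (1,440 min, 1,440 ml of urine).
baseline = creatinine_clearance_ml_per_min(100.0, 1440.0, 1440.0, 1.0)

# Same urinary excretion, but plasma creatinine has doubled.
after_injury = creatinine_clearance_ml_per_min(100.0, 1440.0, 1440.0, 2.0)

print(f"baseline clearance:     {baseline:.1f} ml/min")
print(f"after doubling of SCr:  {after_injury:.1f} ml/min")
```

With urinary creatinine excretion held constant, doubling the plasma concentration halves the computed clearance, which is why a doubled serum creatinine is usually read as roughly a halving of filtration.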


Glossary (continued)

Immunoassay. A biochemical assay that enables the concentration of a substance to be measured by exploiting the specific binding between an analyte and the corresponding detection antibody. The analyte can be a relatively simple chemical substance, such as a drug, or a complex entity such as a protein or a virus in biological fluids. Different variants of immunoassays exist, which can be characterized by the measurement steps (e.g., sandwich or competitive assays) and the use of nonlabeled or labeled reagents (e.g., enzyme-linked immunosorbent assay).

Immunohistochemistry. A method to localize a biomarker protein or other antigen using an antibody that recognizes that antigen. The use of labeled (e.g., chromogen, fluorochrome, enzyme) antibodies allows localization of biomarker proteins at the organ, cellular and subcellular level.

In situ hybridization. A method for staining the mRNA encoding a protein of interest to determine which cells express that mRNA; a labeled complementary RNA (riboprobe) sequence is used to hybridize to the target sequence of interest.

Loop of Henle. See ‘Tubules’.

Medulla. Part of the kidney that is enclosed by the cortex (see ‘Cortex’) and which contains the renal pelvis (see ‘Renal pelvis’). The inner medulla is thus enriched for the Loops of Henle (see ‘Tubules’) and the larger collecting ducts that coalesce to form papillae.

Nephron. Functional unit of the kidney. 
The kidney comprises many nephrons, which filter small waste products from the blood for excretion in urine, recover excess water and useful solutes, and regulate kidney and vascular function through the production of hormones.

Papillae. See ‘Medulla’.

Predictive biomarker. Appears in the absence of any injury with an ability to foretell future injury with some certainty.

Prodromal biomarker. Represents a symptom of the initial stage of onset of an injury before any observation of certain injury.

Prognostic biomarker. Predicts the course or outcome (e.g., end, stabilization or progression) of an injury.

Renal pelvis. A central space into which the large collecting ducts of the papillae (see ‘Medulla’) empty urine. This in turn empties into the urinary bladder through the ureter. The renal pelvis, ureter and urinary bladder are lined with transitional epithelium.

Receiver operating characteristic curve (ROC). A graphical plot to assess the ability of a classifier to discriminate between two outcomes. For a given classifier, sensitivity is plotted against (1 – specificity) or, equivalently, the true-positive rate versus the false-positive rate. This allows the assessment of classifier performance across the entire range of decision rules. As classifiers with higher sensitivity for a given specificity are preferred over those with lower sensitivity, a higher ROC curve is considered to denote better performance (see also ‘Area under the curve (AUC)’ and ‘Exclusion analysis’).

Safety biomarker. Biomarkers typically used to monitor organ safety and diagnose or predict onset or reversibility of injury.

Tubular basophilia. 
Areas of regeneration in tubular epithelial cells that appear blue-purple in H&E-stained sections (see ‘Histopathology’) due to regenerating cells and/or increased density of nuclei in these tubules.

Tubules. Kidney tubular structures, surrounded by the tubulointerstitium, that recover proteins as well as organic and inorganic solutes from glomerular filtrate. Connected to a glomerulus (see ‘Glomeruli’), a proximal tubule begins in the cortex (see ‘Cortex’) as a straight tubule that then changes into the Loop of Henle in the kidney medulla (see ‘Medulla’). The thick ascending Loop of Henle rises back into the cortex, past the glomerulus, and transitions into the distal convoluted tubule, which in turn transitions into the collecting duct.
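The glossary entries for the receiver operating characteristic curve and the area under the curve can be made concrete with a small calculation. The sketch below is illustrative only: the function names are hypothetical, the biomarker scores and histopathology labels are invented for the example, and the trapezoidal rule is simply one common way to integrate the curve, not necessarily the procedure used in the PSTC analyses.

```python
def roc_points(scores, labels):
    """Return (false-positive rate, true-positive rate) pairs.

    scores: biomarker values, where higher means "more likely injured".
    labels: 1 if histopathology showed injury, 0 otherwise.
    One point is produced per decision threshold, starting at (0, 0).
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one injured and one uninjured sample")
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        true_pos = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        false_pos = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        # Sensitivity on the y axis, (1 - specificity) on the x axis.
        points.append((false_pos / negatives, true_pos / positives))
    return points


def area_under_curve(points):
    """Integrate the ROC curve with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Invented example: eight animals, four with histopathological injury.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

auc = area_under_curve(roc_points(scores, labels))
print(f"AUC = {auc:.4f}")
```

An AUC of 1.0 would mean the marker separates injured from uninjured animals perfectly at some threshold, whereas 0.5 corresponds to no discrimination. An exclusion analysis, as defined in the glossary, would amount to removing the discordant animals from `scores` and `labels` before calling `roc_points`.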


COMMENTARY

Next-generation biomarkers for detecting kidney toxicity

Joseph V Bonventre, Vishal S Vaidya, Robert Schmouder, Peter Feig & Frank Dieterle

There is a paucity of biomarkers that reliably detect nephrotoxicity. The Predictive Safety Testing Consortium (PSTC) faced several challenges in identifying novel safety biomarkers in the renal setting.

The kidney is a major site of organ damage caused by drug toxicity. This frequently manifests during drug development and/or in standard clinical care. Nephrotoxicity resulting from drug exposure has been estimated to contribute to 19–25% of all cases of acute kidney injury (AKI, the currently preferred term for the clinical disorder formerly called acute renal failure) in critically ill patients¹. Given the societal cost of nephrotoxicity and the insensitivity of current methods to detect it, sensitive methods for prediction of toxicity in preclinical studies and identification of injury in humans are extremely important for patient safety in clinical practice and in all stages of the drug-development process. 
It is in the interest of patients, physicians, the drug industry and health regulatory bodies to prevent new nephrotoxic drugs from entering the market or, when the medical need dictates use of such an agent, to be able to identify early and best manage nephrotoxicity.

This article discusses the purview of the first effort of the PSTC, a collaboration of the biotech and pharmaceutical industry, the US Food and Drug Administration (FDA; Rockville, MD), the European Medicines Agency (EMEA; London, UK) and academia, to facilitate the qualification of renal biomarkers for safety in drug development. It brings together expertise from a variety of disciplines to organize and/or create evidentiary datasets to present to the regulatory agencies for qualification decision-making. Although this first published effort describes the rationale of the PSTC’s Nephrotoxicity Working Group for identifying new renal safety biomarkers, the consortium also has working groups focused on hepatotoxicity, vascular injury and myotoxicity as well as genetic signatures for carcinogenicity. Much of what we discuss in the context of traditional small molecules also applies to nephrotoxicity arising from the use of alternative and complementary therapies, including herbs, natural products and nutritional supplements, especially when they are combined with conventional drugs².

Joseph V. Bonventre and Vishal S. Vaidya are in the Renal Division, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, USA; Robert Schmouder is in Translational Sciences, Novartis Institutes for BioMedical Research, East Hanover, New Jersey, USA; Peter Feig is in Cardiovascular Clinical Research, Merck Research Laboratories, Rahway, New Jersey, USA; and Frank Dieterle is in Translational Sciences, Novartis Institutes for BioMedical Research, Basel, Switzerland.
e-mail: joseph_bonventre@hms.harvard.edu

Box 1 Ideal features of biomarkers used to detect drug-induced kidney toxicity
The PSTC Nephrotoxicity Working Group considered several criteria as key characteristics of a renal safety biomarker. These were as follows:
• Identifies kidney injury early (well before the renal reserve is dissipated and levels of serum creatinine increase)
• Reflects the degree of toxicity, in order to characterize dose dependencies
• Displays similar reliability across multiple species, including humans
• Localizes site of kidney injury
• Tracks progression of injury and recovery from damage
• Is well characterized with respect to limitations of its capacities
• Is accessible in readily available body fluids or tissues

The need for renal biomarkers
The most efficient way to prevent or mitigate nephrotoxicity is to have sensitive and specific biomarkers that can be used in animals early in drug development, well before clinical studies are underway. 
These biomarkers should be able to sensitively predict toxicity in preclinical models and clinical situations so that they can be used to efficiently guide drug developers to modify or discard the potential therapeutics and replace them with variants that affect the same target without the toxicity. However, it is important to recognize that safety concerns must always be incorporated into a general ‘risk-benefit’ analysis and that toxicity of a drug does not necessarily mean that it should not be developed or approved. Some examples of nephrotoxic drugs that have provided a very high therapeutic benefit are the aminoglycoside antibiotics, the cancer drug cisplatin and the antiviral tenofovir.

Some ideal attributes of markers of AKI are summarized in Box 1. The most useful biomarkers are those that can be used in animals and humans. These ‘translational’ biomarkers can be rigorously studied in animals, thereby establishing well-defined relationships between biomarker levels and kidney histopathology. One of the most notable challenges in assessing drug nephrotoxicity in humans is that we do not have tools capable of predicting nephrotoxicity across species boundaries.

Normally, when kidney injury is found in preclinical studies of one species and not in another, the compound being tested is not developed. The development of Bristol Myers Squibb’s (Princeton, NJ) Sustiva (efavirenz) provides a good example of a situation in which the abandonment of a drug owing to species-specific differences in nephrotoxicity would have prevented many patients from benefiting from use of this non-nucleoside reverse transcriptase inhibitor for treating HIV infection. Sustiva causes renal epithelial cell necrosis in rats, but not in cynomolgus monkeys or humans³. Its toxicity in rats arises from a species-specific nephrotoxic glutathione-conjugated metabolite³. Unfortunately, however, when an explanation like this cannot be found, otherwise compelling drug candidates are routinely abandoned before introduction to humans.

Kidney injury associated with drug toxicity
The human kidney is a complex organ with approximately 1 million functional units called nephrons. The nephrons of two normal kidneys are collectively responsible for filtering approximately 150–180 liters of plasma per day and then processing the filtrate to regulate fluid, electrolyte and acid-base balance while eliminating waste products. The kidneys also produce hormones important for cardiovascular, hematologic and skeletal muscle homeostasis. 
The particular susceptibility of the kidney to drug toxicity can largely be attributed to its anatomy and function. As the filtrate moves along the complex tubular structure of each nephron, its components can be concentrated in excess of threefold in the proximal tubule, and in some cases to much higher levels (>100-fold) in the distal tubule and collecting duct. These high intratubular concentrations, together with the avid tubular uptake mechanisms, particularly in the proximal tubule, enhance intracellular concentrations. In addition, basolateral uptake of toxic agents delivered at high rates from the peritubular capillaries can contribute to intracellular accumulation. Biotransformation of drugs to toxic metabolites also potentiates toxicity to tubular epithelial cells⁴. Furthermore, nephrotoxins can accumulate to high concentrations in the medulla as a result of the countercurrent exchange function of the medullary vasculature. The hypoxia of the medulla also increases the susceptibility of tubular cells to nephrotoxicants when the toxin results in enhanced oxygen metabolism.

One approach to the early detection of kidney injury involves defining different biomarkers that rely on the mechanisms of toxicity of each drug or drug class. 
However, this approach can be problematic for the many clinically useful agents for which the mechanism of toxicity is not well established. An alternative approach, to which we subscribe, involves finding a limited number of biomarkers that identify injury to primary sites in the kidney, such as the glomerulus or the proximal tubule, which together represent the major sites of toxicity related to >90% of drugs. Drugs with different mechanisms of toxicity frequently affect different parts of the kidney, as is evident from Figure 1, which shows the primary sites of nephron toxicity for various drugs. The most likely explanation for this observation is that different regions of the nephron are characterized by different transporters, metabolic characteristics, blood flow characteristics and oxygen tensions. Most drug-induced renal injuries affect the proximal tubules. Drug toxicity initially targeted to the glomerulus or more distal parts of the nephron may also cause secondary injury to proximal tubules. Detection of proximal tubule injury might thus provide a sensitive way to monitor most, but not all, toxicities. After these markers of glomerular and proximal tubule injury are established, additional ones can be added to reflect abnormalities of the distal and collecting tubules and ducts or papillary injury.

Histopathological changes in the kidney are associated with drug toxicity. These changes have been well characterized in commonly used experimental animals, and they currently remain as the ‘gold standards’ against which biomarkers from body fluids are measured. 
Although histopathology is the gold standard to detect renal injury, it is not without its shortcomings, even in animals where the entire organ can be examined. For example, it does not identify non–histopathology-associated types of kidney disturbances, such as either inhibition of transporters in the proximal tubule (resulting in glucosuria, aminoaciduria or hyperuricosuria) or inhibition of vasopressin action in the collecting duct (resulting in diabetes insipidus). Furthermore, a degree of subjectivity is associated with histopathological evaluation. Finally, use of histopathology invariably introduces a delay in appearance of injury; following exposure to nephrotoxicants, levels of at least some biomarkers are reported to appear before obvious changes in histology are evident.

The use of histopathology as a benchmark for kidney injury in humans is usually impractical, except in relatively rare instances when a kidney biopsy is justified. Even in such instances, however, the pathophysiology of the toxicity is associated with spatial variability in tissue injury due to vascular factors and variation in susceptibility of the tubules to injury. As biopsies usually permit only limited sampling of kidney tissue, these factors complicate the interpretation of the histopathology. Furthermore, in humans there are frequently coincident pathophysiological processes, which complicate the interpretation of biomarker data. For example, a blood or urine marker that is produced by an organ other than the kidney, which enters the bloodstream and is filtered by the kidney, can be misinterpreted as reflecting kidney injury. 
Increased urinary levels of a marker that is expressed by vascular or blood cells in addition to kidney tubules may reflect systemic perturbation rather than kidney injury. The strong foundation provided by detailed understanding of the sensitivity and specificity of a biomarker in various contexts of injury is thus critical to its appropriate use in animals and/or humans.

Existing biomarkers for detecting kidney injury
Two serum biomarkers, serum creatinine (SCr) and blood urea nitrogen (BUN), are commonly used to detect kidney toxicity in preclinical and clinical studies as well as in routine clinical care. Both, however, have severe limitations relating to sensitivity and specificity.

Most of the >35 different definitions of AKI in the published literature⁵ rely on changes in SCr, which are insensitive for the detection of histological injury in preclinical toxicity studies, as has been demonstrated in rats, in this issue⁶, as well as in humans. This is particularly true for patients with a substantial renal reserve, defined by the fact that a relatively large amount of injury can occur without producing a change in glomerular filtration rate as reflected by increases in SCr, the standard biomarker used for evaluation of kidney dysfunction. 
Likewise, in rodents <strong>and</strong> o<strong>the</strong>ranimals in which drug safety experiments areconducted, with st<strong>and</strong>ard approaches baselineSCr levels are often at <strong>the</strong> lower end of<strong>the</strong> detectable range, <strong>and</strong> <strong>the</strong>re needs to besubstantial injury before SCr levels increaseoutside <strong>the</strong> ‘normal’ range.Thus, in humans as well as in experimentalanimals, a measurable change in glomerularnature biotechnology volume 28 number 5 MAY 2010 437


COMMENTARY© 2010 Nature America, Inc. All rights reserved.filtration rate (GFR) or o<strong>the</strong>r measures of kidneyfunction may be evident only after considerableinjury has occurred. For example,a 53% incidence of nephrotoxicity in a studyinvolving amphotericin 7 was determinedusing <strong>the</strong> criterion of a doubling of SCr levels.This represents a 50% decrease in GFR ifwe assume creatinine production is constant.In comparison, recent definitions of AKI relyabProximal tubulesKim-1ClusterinNGALGST-αβ2-microglobulinα1-microglobulinNAGOsteopontinCystatin C (urinary)Netrin-1RBPIL-18HGFCyr61NHE-3Exosomal fetuin-AL-FABPAlbuminGlomerulusTotal proteinCystatin C (urinary)β2-microglobulinα1-microglobulinAlbuminProximal tubulesCyclosporineTacrolimusCisplatinVancomycinGentamicinNeomycinTobramycinAmikacinIb<strong>and</strong>ronateZoledronateHydroxyethyl starchContrast agentsFoscarnetCidofovirAdefovirTenofovirIntravenous immuneGlobulinGlomerulusDoxorubicin(Adriamycin)PuromycinGoldPamidronatePenicillaminePapillaPelvisUreteron changes in SCr of as little as 0.3 mg/dl 8 ,representing far less than a 50% reduction inGFR in adults. These small changes in SCr areassociated with significant effects on mortality9 . The limitations of using SCr as a sensitiveindicator of nephrotoxicity are fur<strong>the</strong>r underscoredby bearing in mind that loss of musclemass in ill patients means that an even greaterreduction in GFR is necessary to doubleCortexMedullaDistal tubulesOsteopontinClusterinGST-µ/πNGALH-FABPCalbindin D28Collecting ductCalbindin D28Loop of HenleOsteopontinNHE-3Distal tubulesCyclosporineTacrolimusSulfadiazineLithium (chronic)Amphotericin BCollecting ductAmphotericin BAcyclovirLithium (acute)Loop of HenleAnalgesics (chronic)Figure 1 The utility of biomarkers to detect injury to specific nephron segments affected by variousnephrotoxicants. (a) Nephron segment-specific biomarkers of kidney injury. 
(b) <strong>Drug</strong>s that elicit sitespecifictoxicity in <strong>the</strong> kidney 12,13 .SCr concentration. As SCr is affected not onlyby GFR, but also by <strong>the</strong> systemic productionof creatinine <strong>and</strong> <strong>the</strong> tubular secretion ofcreatinine, changes in SCr concentration arenot specific to tubular injury.Serum creatinine concentration may resultin a very delayed signal even after considerablekidney injury. Large changes in GFR may beassociated with relatively small changes in SCrin <strong>the</strong> first 24–48 h following AKI, resultingnot only in delayed diagnosis <strong>and</strong> interventionbut also in underestimation of <strong>the</strong> degreeof injury 10 . It is not until SCr reaches a <strong>new</strong>steady state that it becomes a reasonable measureof <strong>the</strong> <strong>new</strong> GFR. Moreover, when renalfunction improves, SCr underestimates GFRuntil a <strong>new</strong> steady state is reached. Finally,considerable variability among patients in <strong>the</strong>correlation between SCr <strong>and</strong> baseline GFR,<strong>the</strong> magnitude of functional renal reserve,<strong>and</strong> rates of creatinine syn<strong>the</strong>sis means thatrenal injury of comparable magnitude mayresult in disparate alterations in creatininekinetics <strong>and</strong> steady-state values in differentindividuals.BUN is ano<strong>the</strong>r widely used measure ofrenal function, but it is not a reliable measureof kidney injury because many factors mayaffect its concentration. BUN is freely filteredby <strong>the</strong> glomerulus, but urea is <strong>the</strong>n reabsorbedto varying degrees by o<strong>the</strong>r parts of<strong>the</strong> nephron. Therefore, an increase in BUNcan be seen with volume depletion in <strong>the</strong>absence of any tubular injury. 
Furthermore, increased levels of BUN can be observed if urea production is increased, as occurs with exogenous (protein supplementation) or endogenous (catabolic states or blood in the gastrointestinal tract) protein loads.

The inherent flaws in SCr and BUN not only delay the recognition of nephrotoxicity in preclinical drug development but also limit the ability to monitor for drug toxicity in humans. There is also a resultant delay in the diagnosis of AKI, which prevents timely patient-management decisions, such as withdrawal or reduction in dose of the offending agent or administration of agents to mitigate the toxicity.

Second-generation biomarkers for acute kidney injury

Several alternatives to SCr and BUN have been proposed in response to the urgent need for biomarkers that predict human nephrotoxicity in preclinical studies, allow more timely diagnosis of AKI in humans and ideally localize the injury to a specific nephron site. Although many biomarker candidates have failed to show sufficient specificity and sensitivity for clinical use, several promising


Table 1 Urinary biomarkers of kidney toxicity (refs. 12,13)

Biomarker | Preclinical model | Clinical model | Nephron segment | Comments
Albumin | Nephrotoxic AKI or ischemic AKI | Nephrotoxic AKI, ischemic AKI or septic AKI | Glomerulus and proximal tubule | Increased urinary excretion may reflect alterations in glomerular permeability and/or defects in proximal tubular reabsorption; increased urinary levels in the setting of fever, exercise, dehydration, diabetes, hypertension, etc., limit specificity for AKI
α-GST | Nephrotoxic AKI | Nephrotoxic AKI, septic AKI, ischemic AKI or renal transplantation | Proximal tubule | Samples require stabilization buffer for appropriate quantification; clinical data are limited
α1-microglobulin | Nephrotoxic AKI | Nephrotoxic AKI, ischemic AKI, septic AKI or renal transplantation | Proximal tubule | Clinical applicability limited by lack of standardized reference levels; increased urinary levels in the setting of a number of non-renal disorders may limit specificity; and levels may predict adverse outcome (renal replacement therapy (RRT, dialysis) requirement)
β2-microglobulin | Nephrotoxic AKI | Nephrotoxic AKI, ischemic AKI, septic AKI or renal transplantation | Proximal tubule | Clinical applicability limited by instability in urine
Clusterin | Nephrotoxic AKI, ischemic AKI, unilateral ureteral obstruction or subtotal nephrectomy | No AKI clinical studies to date | Proximal tubule and distal tubule | Increased urinary levels observed in rat models of tubular proteinuria but not glomerular proteinuria
Cysteine-rich protein | Ischemic AKI | Ischemic AKI | Proximal tubule | Urinary levels do not reflect progressive injury; levels assessed via immunoblotting (semiquantitative)
Cystatin C | Nephrotoxic AKI | Nephrotoxic AKI, ischemic AKI or septic AKI | Glomerulus and proximal tubule | Urinary levels may predict adverse outcome (RRT requirement)
Exosomal fetuin-A | (see below) | (see below) | Proximal tubule | Levels assessed via immunoblotting (semiquantitative); limited clinical data (n = 3)
Heart-type fatty acid-binding protein | (see below) | (see below) | Distal tubule | Increased urinary levels in the setting of heart disease may limit specificity
Hepatocyte growth factor | (see below) | (see below) | Proximal tubule and distal tubule | Urinary levels may predict adverse outcomes (death or RRT); may play an important role in renal repair and regeneration following AKI
Interleukin-18 | (see below) | (see below) | Proximal tubule | Urinary levels may predict adverse outcomes (death)
Kidney injury molecule-1 | (see below) | (see below) | Proximal tubule | Levels may predict adverse outcome (death or RRT)
Liver-type fatty acid-binding protein | (see below) | (see below) | Proximal tubule | Levels may predict adverse outcome (death or RRT); increased urinary levels in acute liver injury may limit specificity
N-Acetyl-β-glucosaminidase | (see below) | (see below) | Proximal tubule | Levels may predict adverse outcome (death/RRT); decreased activity in the presence of heavy metals may limit sensitivity for AKI; and increased urinary levels in the setting of several non-renal disorders may limit specificity
Netrin-1 | (see below) | (see below) | Proximal tubule | Levels assessed via immunoblotting (semiquantitative); limited clinical data (n = 14)
Neutrophil gelatinase-associated lipocalin | (see below) | (see below) | Proximal tubule and distal tubule | Levels may predict severity of AKI and adverse outcome (RRT); increased levels in the setting of urinary tract infections or sepsis may limit specificity
Osteopontin | (see below) | (see below) | Proximal tubule, loop of Henle and distal tubule | Increased urinary levels observed in rat models and humans following nephrotoxicity
Retinol-binding protein | Nephrotoxic AKI | Nephrotoxic AKI, septic AKI, ischemic AKI or renal transplantation | Proximal tubule | Decreased sensitivity may be observed in vitamin A–deficient states
Sodium/hydrogen exchanger isoform 3 | Nephrotoxic AKI | Nephrotoxic AKI, septic AKI, ischemic AKI or renal transplantation | Proximal tubule and loop of Henle | Levels assessed via immunoblotting (semiquantitative)

Model entries for the ten biomarkers from exosomal fetuin-A to osteopontin, in extracted order (row assignment not recoverable from the source): Nephrotoxic AKI or ischemic AKI; Nephrotoxic AKI or ischemic AKI; Nephrotoxic AKI, ischemic AKI or unilateral nephrectomy; Nephrotoxic AKI or ischemic AKI; Nephrotoxic AKI or ischemic AKI; Nephrotoxic AKI, ischemic AKI or unilateral ureteral obstruction; Nephrotoxic AKI or ischemic AKI; Nephrotoxic AKI or ischemic AKI; Nephrotoxic AKI or ischemic AKI; Nephrotoxic AKI, ischemic AKI or unilateral ureteral obstruction; Septic AKI or ischemic AKI; Nephrotoxic AKI or renal transplantation; Nephrotoxic AKI, ischemic AKI, septic AKI or renal transplantation; Nephrotoxic AKI, ischemic AKI, septic AKI or renal transplantation; Nephrotoxic AKI, ischemic AKI, septic AKI or renal transplantation; Nephrotoxic AKI or ischemic AKI; Septic AKI or renal transplantation; Nephrotoxic AKI, ischemic AKI, septic AKI or renal transplantation; Nephrotoxic AKI, ischemic AKI or septic AKI; Nephrotoxic AKI, ischemic AKI or septic AKI; No AKI clinical studies to date.


candidates have emerged recently (Table 1). These include urinary kidney injury molecule-1 (KIM-1), neutrophil gelatinase-associated lipocalin (NGAL), interleukin-18 (IL-18), cystatin C, clusterin, fatty acid binding protein–liver type (L-FABP) and osteopontin. Not only do these biomarkers have the potential to both transform the way we detect and quantify nephrotoxicity and prevent the development and entry into the market of nephrotoxic drugs, but they may also allow the continued development of potentially useful drugs that, without the help of biomarkers, would be erroneously believed to be toxic on the basis of a particular preclinical model.

It is important to consider that biomarkers for one type of kidney toxicity may not be as useful in another. A good biomarker for injury may not reliably indicate delayed repair; a biomarker that detects inflammation effectively may not be as sensitive in detecting early proximal tubule toxicity in the absence of inflammation. A biomarker of injury might not detect a functional defect, such as is observed in Fanconi syndrome or nephrogenic diabetes insipidus. And a biomarker useful in an animal model may or may not be useful in the same way in humans. Another question is whether panels of biomarkers will be more informative than a single biomarker. At first, this might seem logical because different biomarkers might be more sensitive or specific for different forms of injury.
Nonetheless, if multiple biomarkers are used to detect a similar form of injury, an adjudication process will be necessary if the biomarkers suggest different outcomes.

Conclusions

Drug-induced nephrotoxicity plays an important role in the high incidence and prevalence of AKI and may serve as an important contributor to chronic renal disease. Current metrics, such as SCr and BUN, lack the sensitivity and/or specificity to adequately detect nephrotoxicity before significant loss of renal function. Better biomarkers will allow drug developers to make more informed decisions about which products to move forward in testing, the doses at which they should be used, and ways to design clinical trials that will provide clear information about product benefit and safety. Besides facilitating drug development, biomarkers shown to reliably predict kidney injury in experimental animals should eventually be evaluated for their utility in humans to promote patient safety and guide therapeutic decisions in the clinic.

The results and knowledge gained from the PSTC Nephrotoxicity Working Group and the resulting biomarker qualification process described in this issue (ref. 11) promise to enable earlier identification of nephrotoxicity in preclinical studies, provide translational markers to monitor patient responses when there is a concern about toxicity, reduce the current high rate of attrition during clinical drug development and post-marketing, prevent or reduce the entry of nephrotoxic drugs into the market, and eventually facilitate the early management of patients who suffer kidney injury.

COMPETING FINANCIAL INTERESTS
The authors declare competing financial interests: details accompany the full-text HTML version of the paper at http://www.nature.com/naturebiotechnology/.

1. Mehta, R.L. et al. Kidney Int. 66, 1613–1621 (2004).
2. Blowey, D.L. Adolesc. Med. Clin. 16, 31–43 (2005).
3. Mutlib, A.E. et al. Toxicol. Appl. Pharmacol. 169, 102–113 (2000).
4. Perazella, M.A. Clin. J. Am. Soc. Nephrol. 4, 1275–1283 (2009).
5. Kellum, J.A., Levin, N., Bouman, C. & Lameire, N. Curr. Opin. Crit. Care 8, 509–514 (2002).
6. Vaidya, V.S. et al. Nat. Biotechnol. 28, 478–485 (2010).
7. Wingard, J.R. et al. Clin. Infect. Dis. 29, 1402–1407 (1999).
8. Molitoris, B.A. et al. J. Am. Soc. Nephrol. 18, 1992–1994 (2007).
9. Chertow, G.M., Burdick, E., Honour, M., Bonventre, J.V. & Bates, D.W. J. Am. Soc. Nephrol. 16, 3365–3370 (2005).
10. Waikar, S.S. & Bonventre, J.V. J. Am. Soc. Nephrol. 20, 672–679 (2009).
11. Dieterle, F. et al. Nat. Biotechnol. 28, 455–462 (2010).
12. Vaidya, V.S., Ferguson, M.A. & Bonventre, J.V. Annu. Rev. Pharmacol. Toxicol. 48, 463–493 (2008).
13. Ferguson, M.A., Vaidya, V.S. & Bonventre, J.V. Toxicology 245, 182–193 (2008).


COMMENTARY

Evolution of biomarker qualification at the health authorities

Federico Goodsaid & Marisa Papaluca

By streamlining the qualification process for biomarkers, coordinated protocols recently implemented at the different regulatory agencies can facilitate progress and provide impetus to novel biomarker discovery and validation.

Since the sequencing of the human genome was first announced in 2000, regulatory agencies in the United States (the Food and Drug Administration, FDA; Rockville, MD), Europe (the European Medicines Agency, EMEA; London) and Japan (the Pharmaceuticals and Medical Devices Agency, PMDA; Tokyo) anticipated the potential impact of this new knowledge on drug development and together initiated a series of fact-finding international conferences with the objective of obtaining input from the pharmaceutical industry and other stakeholders. Several initiatives have since been developed to address the need for sharing knowledge and risk associated with the use of the new genomic methodologies in drug research and development.

In 2003, under the framework of the bilateral confidentiality agreements between the European Union (EU; Brussels) and the FDA (ref. 1), FDA and EMEA scientists held joint discussions with sponsors on Voluntary Genomic Data Submission (VGDS) packages.
The success of the initial experience with these meetings led in 2004 to an expanded VGDS process (ref. 2), including the option for sponsors to have joint FDA-EMEA VGDS briefing meetings. A joint document (ref. 3) explains how such requests are received, processed and reviewed by the agencies. In 2005, regulatory agencies in the United States (ref. 4), the EU (ref. 5) and Japan (ref. 6) issued guidelines or requests for the submission of genomic information from the R&D pipelines. These documents served several purposes: they encouraged voluntary submission of genomic data by sponsors to these agencies; they described how the agencies process VGDS data (that is, submissions that are not required as part of a regulatory submission) and the associated discussion meetings; and they emphasized that voluntary submissions are used to help the agencies gain an understanding of genomic data and are not part of the regulatory decision-making processes.

Photo caption: Three drug regulatory agencies (FDA in Rockville, MD, shown on left; EMEA in London, shown on right; and PMDA in Tokyo) have encouraged the establishment of public-private partnerships and consortia to advance the qualification of new biomarkers. Credits: Mark Thomas/Science Photo Library; Newscom/Dennis Brack.

Federico Goodsaid is at the Food and Drug Administration, Silver Spring, Maryland, USA, and Marisa Papaluca is at The European Medicines Agency, London, UK. e-mail: federico.goodsaid@fda.hhs.gov or Marisa.Papaluca@ema.europa.eu

Over recent years, VGDS meetings and other interactions with sponsors at the FDA, EMEA and PMDA have suggested extensive progress in the development of exploratory biomarkers. The FDA and EMEA consider that many research activities have been under way within the pharmaceutical and biotech industry to qualify biomarkers, but that many of the data generated by these activities remain within the firewalls of individual companies. These data are shared neither among companies nor with regulatory agencies.

In this article, we describe the efforts of the various regulatory agencies to establish a mechanism to facilitate the sharing of biomarker data. By encouraging the establishment of public-private partnerships and consortia, these efforts have served as a catalyst for noncompetitive pooling of data with the objective of achieving a critical mass of data, enhanced knowledge about biomarkers


and a consensus on how biomarkers should be applied both at the preclinical stage and ultimately in the clinic.

Setting the stage

Several strategic documents, such as the FDA’s Critical Path Initiative (ref. 7), the EMEA’s “Road Map to 2015” (ref. 8), and the recent report from the European Medicines Agency Innovation Think-Tank group (ref. 9), have focused on the importance of support by regulators in this area, with the ultimate objective of ensuring that new technologies are taken up in pharmaceutical R&D to promote the development of safe and efficacious new medicines for the benefit of patients. Several consortia, such as the Critical Path Predictive Safety Testing Consortium (ref. 10) and the EU Innovative Medicines Initiative (ref. 11), are today generating substantial data that may overlap or complement each other and also influence regulatory standards, which require proper regulatory appraisal to encourage their application in R&D.
Regulatory agencies not only have been deeply involved in supporting biomarker integration in pharmaceutical R&D through scientific advice starting from the early stages of product development but also aim to provide a scientifically robust and predictable set of requirements for the evaluation of data in Marketing Authorization Applications (MAAs), Investigational New Drug Applications (INDs), New Drug Applications (NDAs) or Biologic License Applications (BLAs).

At the international level, the joint activities of the EMEA Pharmacogenomics Working Party and the FDA Interdisciplinary Pharmacogenomics Review Group have established a working model for global regulatory review of exploratory biomarker data. On this basis, and in view of the advances in the field, the regulatory agencies have developed dedicated processes to deal with biomarker qualification. These biomarker qualification processes address the need of individual organizations and consortia asking for a regulatory qualification of the results obtained from the ongoing collaborative efforts.
Such a path has been tested in these biomarker qualifications. This process is focused on the specific needs of the regulatory environment to ensure scientifically accurate and clinically (or preclinically) useful decision-making.

The biomarker qualification process

The Predictive Safety Testing Consortium (PSTC) application for the qualification of seven new renal biomarkers as predictors of drug-mediated nephrotoxicity is the first experience of this new joint-agency review process put in place by US and European drug regulators. At this time, the qualification of these biomarkers covers voluntary submission of these data for rat studies. On a case-by-case basis, the FDA and the EMEA will also consider other possible applications of these biomarkers in early clinical trials. The tests measure levels of seven key proteins or biomarkers that scientists from the FDA and EMEA believe provide important new safety information about the effect of drugs on the kidney. When reviewing INDs, NDAs or BLAs, both regulatory agencies can now consider the test results in addition to blood urea nitrogen (BUN) and serum creatinine (SCr) data (ref. 12).

The long-term implementation of this process will reflect knowledge and experience gained as additional biomarker qualification submissions are received and reviewed. The process itself is likely to succeed if qualification data are selected and submitted to accurately support a specific context of use.
The most difficult part of this process will be to define incremental contexts of use and the corresponding evidence with which biomarkers may be qualified.

The desired goals in regard to public health in this case would be to obtain better biomarkers of nephrotoxicity for routine clinical use as quickly as the data will allow, but intermediate qualification contexts and data need to be defined so that investment in biomarker qualification studies will be productive both for clinical use and for the pharmaceutical and biotech industry.

Initial studies proposed by consortia are unlikely to match a clear context for qualification for a full clinical application of biomarkers. What intermediate contexts for qualification can we define? What study characteristics can we propose for qualification in these intermediate contexts of use? Several authors (e.g., see refs. 13,14) have proposed evidentiary recommendations for biomarker qualification. In contrast to this incremental process for biomarker qualification, papers on evidentiary recommendations often propose all-or-nothing qualification contexts, where if the ultimate goal is a clinical qualification, no intermediate qualification contexts are expected to be defined or qualified. This approach is not only time-consuming but also unlikely to encourage the investment needed to generate data for biomarker qualification. At each stage, whether the context of use for a biomarker is to be in vitro, in a nonclinical animal model or in the clinic, a company or consortium proposing the qualification of a biomarker will likely seek a quick return on that qualification once data are available to qualify the biomarker in a specific context in drug development.
An effective process for biomarker qualification should include incremental application context steps, so that these incremental steps can fit into and benefit the drug development process.

Steps in submission for biomarker qualification

The first step in drafting a submission for qualification of a biomarker is to determine its context of use, in advance of specific decisions on applicable structure and format. The context of use for a biomarker is (i) the general area of biomarker application, (ii) the specific applications and implementations and (iii) the critical factors that define where a biomarker is to be used and how the information from measurement of this biomarker is to be integrated into drug development and regulatory review. To demonstrate the alignment between proposed context and data, the initial context proposal must be supported by data available at the initial application step or expected to be available throughout the data evaluation process. There is a convergent relationship between an initial qualification context and the data supporting it.

The initial gap between proposed context and data may need to be filled throughout the qualification process. Initial context proposals, however, should project a significant improvement over currently available biomarkers and/or endpoints.
The context of a biomarker drives data requirements to demonstrate its qualification for the intended application.

The structure of a submission document ensures that the context and data can be submitted in a package that is consistent for consortia submitting qualification as well as for reviewers in regulatory agencies evaluating a qualification package. The structure of a qualification submission is independent of the context of this submission but must also be flexible enough to deal with the specific requirements of each context. The format of data required to qualify a biomarker may vary significantly with the context in which it is to be used and with specificities of each biomarker considered.

In addition to joint efforts by individual regulatory agencies, an effort to harmonize the submissions for qualification of genomic biomarkers has been initiated through the International Conference on Harmonization (ICH) framework. The ICH E16 Working Group has developed a draft guideline (ref. 15) summarizing how the context of use for the biomarker may be defined and how the structure of the submission and data formats are to be integrated. The harmonization of documents such as this is likely to have an impact well


beyond the document itself. The harmonization for the submission document is a precedent that should help address the fear that often permeates the discussion of biomarker qualification. This precedent will also facilitate future harmonization efforts for other aspects of qualification.

Past, present and future of biomarker qualification

Biomarker qualification is not a new concept. Biomarkers have been accepted through several ad hoc pathways in each regulatory agency. At both the FDA and the EMEA, biomarkers have been qualified in recent years on a case-by-case basis, in which the application context of use for the biomarker is always drug dependent. Biomarker qualification is also implicitly integrated in regulatory review of drug-test co-development (ref. 16). Finally, a sui generis qualification is also implicit when biomarker information is added to a preexisting drug label. This experience is reflected in the “Table of Valid Genomic Biomarkers in the Context of Approved Drug Labels” (ref. 17) on the FDA website.
In this case, the context of use for the biomarker is explicitly linked to text in the labels for one or more drugs.

The VXDS at the FDA, the Joint FDA-EMEA VXDS briefing meetings with sponsors, and the dedicated qualification procedures implemented at the FDA and the EMEA (ref. 18) open opportunities for the qualification of biomarkers not only directly connected to an individual product development but also with a wider relevance in the assessment of drug efficacy and safety. These processes will need to be tested in the near future with submissions for qualification of biomarkers from a diverse range of platforms, nonclinical and clinical areas, biotech and pharmaceutical companies, diagnostic companies and academic institutions. The evolution of this process and its usefulness for drug development will accelerate as new examples of novel biomarkers are brought through for qualification.

Competing Interests Statement
The authors declare no competing financial interests.

1. European Agency for the Evaluation of Medicinal Products. Public Statement: EU–US FDA Bilateral Agreement (EMEA, London, UK, 2003; accessed 8 September 2009).
2. The Food and Drug Administration. Guidance for Industry: Pharmacogenomic Data Submissions (FDA, Rockville, Maryland, USA, March 2005; accessed 8 September 2009).
3. European Agency for the Evaluation of Medicinal Products. Medicines and Emerging Science (EMEA, London, UK; accessed 8 September 2009).
4. The Food and Drug Administration. Fed. Reg. 69, 48876–48877 (2004).
5. The European Medicines Agency. EMEA/CHMP/PGxWP/20227/04 Guideline on Pharmacogenetics Briefing Meetings (EMEA, London, UK, 2004; accessed 8 September 2009).
6. Uyama, Y. Nippon Yakurigaku Zasshi 126, 432–435 (2005).
7. The Food and Drug Administration. Critical Path Initiative (FDA, Rockville, Maryland, USA, 2004). http://www.fda.gov/ScienceResearch/SpecialTopics/CriticalPathInitiative/default.htm
8. The European Medicines Agency. The European Medicines Agency Road Map to 2015: Preparing the Ground for the Future (EMEA, London, UK, 2010; accessed 8 September 2009).
9. The European Medicines Agency. Innovative Drug Development Approaches (EMEA/127318/2007): Final Report of the EMEA/CHMP Think-Tank on Innovative Drug Development (EMEA, London, UK, 2007; accessed 8 September 2009).
10. Goodsaid, F.M., Frueh, F.W. & Mattes, W. Toxicology 245, 219–223 (2008).
11. Hunter, A.J. Drug Discov. Today 13, 371–373 (2008).
12. The European Medicines Agency. Final Report on the Pilot Joint EMEA/FDA VXDS Experience on Qualification of Nephrotoxicity Biomarkers (EMEA, London, UK, May 2008; accessed 8 September 2009).
13. Altar, C.A. et al. Clin. Pharmacol. Ther. 83, 368–371 (2008).
14. Lathia, C.D. et al. Clin. Pharmacol. Ther. 86, 32–43 (2009).
15. The Food and Drug Administration. E16 Genomic Biomarkers Related to Drug Response: Context, Structure, and Format of Qualification Submissions (FDA, Rockville, Maryland, USA; accessed 8 September 2009).
16. The Food and Drug Administration. Fed. Reg. 69, 42060–42061 (2004; accessed 8 September 2009).
17. The Food and Drug Administration. Table of Valid Genomic Biomarkers in the Context of Approved Drug Labels (FDA, Rockville, Maryland, USA; accessed 8 September 2009).
18. The European Medicines Agency. Guidance Document on the Qualification of Novel Methodologies for Drug Development (EMEA, London, UK, January 2009; accessed 8 September 2009).


news and views

A roadmap for biomarker qualification

David G Warnock & Carl C Peck

A collaborative effort between pharmaceutical companies, regulatory agencies and academia to qualify biomarkers for kidney toxicity provides a model for investigating and identifying reliable safety markers for preclinical applications.

The dependence of preclinical screens on histopathology and weakly informative biomarkers causes considerable delays and inefficiency in transitioning new drugs into human testing. This delays confirmation of the safety and effectiveness of new therapies. Four papers (refs. 1–4) in this issue describe the utility of previously described markers of kidney damage to specifically assess renal damage in rats exposed to a range of nephrotoxic agents. Two additional manuscripts (refs. 5,6) further describe the protocols used to qualify these biomarkers and explain the broader implications of the assessments issued by two major regulatory bodies, the Food and Drug Administration (FDA) and European Medicines Agency (EMEA; London). Together, the papers document progress toward establishing a formal process that will hopefully emerge as a model for developing better biomarkers for predicting a range of toxicities frequently encountered during drug development.

The work described in this collection of papers was done by the Nephrotoxicity Working Group of the Predictive Safety Testing Consortium (PSTC) (ref. 7), which was created as part of the FDA’s Critical Path Initiative (ref. 8).
David G. Warnock is in the Division of Nephrology, University of Alabama at Birmingham, Birmingham, Alabama, USA. Carl C. Peck is at the Center for Drug Development Science, Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, UC Washington Center, Washington DC, USA. e-mail: dwarnock@uab.edu or carl@carlpeck.com

Other PSTC groups are currently involved in qualifying biomarkers to detect hepatotoxicity, vascular injury, nongenotoxic carcinogenicity and myopathy. The PSTC aims to pioneer a process framework to critically vet a range of previously reported candidate safety biomarkers for various organ and tissue types, qualify them for preclinical applications and eventually assess their feasibility for use in humans.

The lack of reliable in vitro systems and preclinical models to predict nephrotoxicity in humans is a major impediment to developing and using new drugs. The limitations of using detectable changes in serum creatinine (SCr) or blood urea nitrogen (BUN) are well recognized, and even histopathology, which is widely regarded as the “gold standard” for animal studies, has inadequate sensitivity and specificity for certain applications [9]. In the context of this challenge, the Nephrotoxicity Working Group of the PSTC selected 23 urinary biomarkers and systematically evaluated the utility of the most promising biomarkers in several rat models of kidney injury.
The immediate intent of the collaborative effort was to apply the patterns of renal injury discerned using these biomarkers to developing a knowledge base that eventually permits preclinical results to predict potential renal injury in a clinical setting before frank nephrotoxicity becomes apparent.

This culminated in the detailed presentation of data for seven safety biomarkers (kidney injury molecule 1 (Kim-1), albumin, total protein, β2-microglobulin, cystatin C, clusterin and trefoil factor-3 (TFF3)) for consideration by the FDA and EMEA. Until now, none of these markers could be used to support drug applications. A notable aspect of the analyses [1–4] is formal evaluation of the sensitivity and specificity for each of the biomarkers. This was accomplished by using histologic scoring as a benchmark for renal injury and rigorous analyses employing the area under the receiver operating characteristic (ROC) curve method (Fig. 1). In this figure, the dashed diagonal line, which represents identity between the true-positive rate and the false-positive rate, signifies when the test is not informative. The area under the curve (0 < AUC < 1.0) represents the overall probability that the disease state being investigated (e.g., the presence or absence of drug-induced renal injury) of a randomly chosen subject is correctly identified by the test [10].
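To make the AUC interpretation concrete, the following minimal Python sketch computes the area under the ROC curve through its rank-based equivalent: the fraction of (injured, uninjured) animal pairs in which the injured animal shows the higher biomarker value, with ties counted as one half. The function name `roc_auc` and the biomarker readings are hypothetical illustrations, not code or data from the PSTC studies, and the sketch assumes that higher values indicate injury.

```python
from itertools import product


def roc_auc(injured_scores, uninjured_scores):
    """Return the AUC as the share of (injured, uninjured) pairs ranked correctly.

    Ties contribute one half, so a marker that cannot separate the
    two groups at all yields 0.5.
    """
    wins = 0.0
    for pos, neg in product(injured_scores, uninjured_scores):
        if pos > neg:
            wins += 1.0
        elif pos == neg:
            wins += 0.5
    return wins / (len(injured_scores) * len(uninjured_scores))


# Hypothetical urinary biomarker readings (arbitrary units); the grouping
# stands in for the histopathology benchmark described in the text.
injured = [4.1, 3.8, 2.9, 5.0]
uninjured = [1.2, 2.9, 1.8, 0.9]

auc = roc_auc(injured, uninjured)
print(f"AUC = {auc:.3f}")
```

With these made-up values, 15.5 of the 16 pairs are ranked correctly, so the script prints `AUC = 0.969`. A marker with no discriminating power would give a value near 0.5, which corresponds to the dashed diagonal in Fig. 1.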
These analyses are especially valuable for comparing the costs and benefits of single test measures with panels of tests that include more than one diagnostic measure or test. Thus, ROC curves can be used to interpret the interplay of the sensitivity and specificity of each candidate biomarker in isolation and, even more informatively, together with others. Members of the consortium were able to take advantage of the fact that issues such as timing, extent and specific location(s) of the injury (e.g., whether it occurs in the glomerulus or kidney tubule), together with progress of the recovery phase, can be assessed with multiple biomarkers, each of which may provide unique temporal information about each of these injury phases.

[Figure 1 plot: true-positive rate (sensitivity) on the vertical axis against false-positive rate (1 – specificity) on the horizontal axis, both running from 0 to 1.0.]

Figure 1 Receiver operating characteristics (ROC) curve analysis. ROC curves provide a comprehensive and visually attractive way to summarize the accuracy of predictions. Each point on the curve represents the true-positive rate and false-positive rate associated with a particular test value. The AUC provides a useful metric to compare different tests (indicator variables). Whereas an AUC value close to 1 indicates an excellent diagnostic test, a curve that lies close to the diagonal (AUC = 0.5) has no information content and therefore no diagnostic utility. More than one ROC curve can be presented in the same plot, and the absolute areas under each curve compared to determine which test, or combination of tests, has the better diagnostic performance. The ability to superimpose curves, as shown here, permits tests to be chosen based on considerations such as cost and availability. Modified from.

Dieterle et al. [1] use this approach to show that urinary clusterin outperforms SCr and BUN in detecting proximal tubular injury and that total protein, cystatin C and urinary β2-microglobulin each outperform either SCr or BUN in detecting glomerular injury. Their findings suggest that some biomarkers may perform better with glomerular rather than tubular injury. Yu et al. [2] show that urinary albumin is superior to either SCr or BUN in detecting tubule damage and that urinary TFF3 abundance complements the capacity of combined SCr and BUN levels to detect renal injury. Vaidya et al. [3] show that changes in levels of Kim-1 clearly outperform changes in the abundances of SCr, BUN or N-acetyl-β-D-glucosaminidase for detecting kidney damage induced in rats by a range of nephrotoxic agents.

These efforts culminated in the recommendation by biomarker qualification review teams of the FDA [11] and EMEA [12] that voluntary measurement of these seven kidney biomarkers be regarded as acceptable evidence of nephrotoxicity in rat studies. Moreover, they are deemed to be of value in complementing information obtained from measuring levels of SCr and BUN. Both agencies recommended that studies in different species and models be undertaken to enhance understanding of the generality of the rat findings for preclinical toxicity testing. At this time, there is no intent to replace histological assessments in the preclinical models. Nonetheless, the limitations of bridging from traditional animal findings to the clinical setting, where histological assessments are rarely available, cannot be overemphasized.

Data from a fourth study, by Ozer et al. [4], were not part of the initial submission but address key issues related to evaluating recovery from injury as well as the severity of the initial nephrotoxic injury. They find that a panel of urinary biomarkers enables the progression of renal injury and subsequent repair and recovery to be monitored after exposure of rats to either of two nephrotoxic agents. The authors complement this study by demonstrating that serum cystatin C is more sensitive than SCr and BUN in monitoring general renal failure caused by drug exposure.

Overall, the clinical relevance of these findings must be viewed as suggestive because they are based on preclinical models that were chosen to emphasize different injury patterns that may not pertain to clinical settings where ‘injury’ is often multifactorial and frequently progresses from one compartment of the kidney to another. Furthermore, not all markers were evaluated in all of the injury models, and combinations of markers would also be worth further consideration. The regulatory agencies have encouraged the community to undertake additional collaborative clinical studies that provide more information about the utility of these and other biomarkers in humans. Because histopathology is not usually an option for most clinical applications, physicians and clinical investigators currently rely on safety biomarkers that are insensitive to the initiation of an injury phase as well as to its extent and recovery [9]. Measuring levels of SCr and BUN, along with other traditional urinary measurements (volume flow, epithelial cell loss, changes in concentrating ability and sodium absorption), has not fulfilled the need in the clinical setting for early predictors of renal damage and compromised function. Another benefit of the application of the well-defined preclinical findings to the clinical setting is the possibility that the currently available traditional markers of kidney injury could be better defined in their timing and application to specific clinical settings, which could in turn optimize the timing and application of the biomarker measurements.

The FDA has concluded that although none of the seven biomarkers are broadly qualified to be used as primary renal monitoring tests or dose-stopping criteria, their use may be appropriate on a case-by-case basis. In each case, risks and benefits must be carefully evaluated for monitoring and providing assurances of kidney safety in patients and therefore enabling early clinical investigations of promising therapeutic agents.

We look forward to seeing how many of the validated biomarkers from the preclinical initiatives are eventually brought forward to the clinic. At this point, a fairly wide net has been cast because the ideal set of biomarkers in the preclinical studies may not be the same set that will be validated in the clinical setting, accounting for the obvious difficulties of defining the true gold standard for kidney injury in the clinical studies. The most telling progress along these lines will be made by exploiting the model developed for the Critical Path Initiative, which involves close collaboration between the pharmaceutical companies, the regulatory agencies and nephrologists involved in both the basic and clinical research arenas. A well-defined, drug-induced nephrotoxic event where the dosing schedule is prospectively defined would be a logical model for predictive biomarker testing. Examples of such models could be nephrotoxicity due to intravenous radio-contrast agents or nephrotoxicity resulting from cisplatin chemotherapy. Monitoring of clinically relevant measures of kidney function along with the candidate biomarkers seems the most obvious first step along this path.

But beyond these accomplishments and the remaining challenges to improve early detection of nephrotoxicity in humans, these studies introduce a model collaborative process and set new standards for scientific and regulatory qualification of safety biomarkers in general. Until now, both the FDA and EMEA required pharmaceutical companies to submit the results of renal toxicity biomarker qualification tests separately. However, the new framework established as a result of this initiative will simplify submission of such data to both the FDA and EMEA, as both agencies have found the qualification procedure to be acceptable. The successful collaboration of fiercely competitive pharmaceutical companies (overcoming substantial intellectual property barriers) with scientists from academia and regulatory bodies is particularly notable.
If the momentum generated by this pilot biomarker qualification process can be sustained to translate this rigorous safety biomarker qualification process to human testing, and the predictive value of novel biomarkers is clinically confirmed, we will have realized the ultimate goal of ensuring safer new therapeutic agents.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

1. Dieterle, F. et al. Nat. Biotechnol. 28, 463–469 (2010).
2. Yu, Y. et al. Nat. Biotechnol. 28, 470–477 (2010).
3. Vaidya, V.S. et al. Nat. Biotechnol. 28, 478–485 (2010).
4. Ozer, J.S. et al. Nat. Biotechnol. 28, 486–494 (2010).
5. Sistare, F.D. et al. Nat. Biotechnol. 28, 446–454 (2010).
6. Dieterle, F. et al. Nat. Biotechnol. 28, 455–462 (2010).
7. http://www.fda.gov/oc/initiatives/criticalpath/projectsummary/consortium.html
8. http://www.fda.gov/oc/initiatives/criticalpath/
9. Bonventre, J.V. et al. Nat. Biotechnol. 28, 436–440 (2010).
10. Hanley, J.A. & McNeil, B.J. Radiology 143, 29–36 (1982).
11. The Food and Drug Administration Biomarker Qualification Review Team. Review of Qualification Data for Biomarkers of Nephrotoxicity Submitted by the Predictive Safety Testing Consortium (FDA CDER, 21 February 2008).
12. EMEA. Biomarkers Qualification: Guidance to Applicants (doc. ref. EMEA/CHMP/SAWP/72894/2008-CONSULTATION, 24 April 2008).


Perspective

Towards consensus practices to qualify safety biomarkers for use in early drug development

Frank D Sistare¹, Frank Dieterle², Sean Troth¹, Daniel J Holder¹, David Gerhold¹, Dina Andrews-Cleavenger³, William Baer⁴, Graham Betton⁵, Denise Bounous⁶, Kevin Carl², Nathaniel Collins⁷, Peter Goering⁸, Federico Goodsaid⁸, Yi-Zhong Gu⁷, Valerie Guilpin⁹, Ernie Harpur⁹, Alita Hassan⁴, David Jacobson-Kram⁸, Peter Kasper¹⁰, David Laurie², Beatriz Silva Lima¹¹, Romaldas Maciulaitis¹⁰, William Mattes¹², Gérard Maurer², Leslie Ann Obert¹³, Josef Ozer¹³, Marisa Papaluca-Amati¹⁰, Jonathan A Phillips¹⁴, Mark Pinches⁵, Matthew J Schipper⁴, Karol L Thompson⁸, Spiros Vamvakas¹⁰, Jean-Marc Vidal¹⁰, Jacky Vonderscher¹⁵, Elizabeth Walker¹², Craig Webb⁴ & Yan Yu¹

Application of any new biomarker to support safety-related decisions during regulated phases of drug development requires provision of a substantial data set that critically assesses analytical and biological performance of that biomarker. Such an approach enables stakeholders from industry and regulatory bodies to objectively evaluate whether superior standards of performance have been met and whether specific claims of fit-for-purpose use are supported.
It is therefore important during the biomarker evaluation process that stakeholders seek agreement on which critical experiments are needed to test that a biomarker meets specific performance claims, how new biomarker and traditional comparators will be measured and how the resulting data will be merged, analyzed and interpreted.

¹Merck Research Laboratories, Safety Assessment, West Point, Pennsylvania, USA. ²Novartis Pharma AG, Basel, Switzerland. ³Amgen, Inc., Thousand Oaks, California, USA. ⁴ClinXus, and Van Andel Research Institute, Grand Rapids, Michigan, USA. ⁵AstraZeneca Pharmaceuticals, Cheshire, England. ⁶Bristol-Myers Squibb, Princeton, New Jersey, USA. ⁷Schering-Plough Research Institute, Summit, New Jersey, USA. ⁸US Food and Drug Administration, Silver Spring, Maryland, USA. ⁹Sanofi-aventis, Malvern, Pennsylvania, USA. ¹⁰European Medicines Agency, London, UK. ¹¹iMED.UL, Lisbon University, Portugal. ¹²Critical Path Institute, Tucson, Arizona, USA. ¹³Pfizer Inc., Groton, Connecticut, USA. ¹⁴Boehringer Ingelheim Pharmaceuticals, Inc., Ridgefield, Connecticut, USA. ¹⁵Hoffman La Roche, Basel, Switzerland. Correspondence should be addressed to F.D.S. (e-mail: frank_sistare@merck.com).
Published online 10 May 2010; doi:10.1038/nbt.1634

A safety biomarker can be defined, at least in the context of drug development, as any analyte that can be quantified to indicate an adverse response to a test agent. Important terminology for biomarker discussions has been established previously [1] and reviewed recently [2]. Whereas validation refers to the process of assessing the measurement performance characteristics of the biomarker’s assay, qualification is the fit-for-purpose process of linking a biomarker with biological processes and clinical (animal and/or human) endpoints. Because the time and resources that could be invested in answering every important question regarding the use of a new safety biomarker under all possible circumstances would be prohibitive, the most rational approach to identify and implement the use of safety biomarkers in drug development involves aligning stakeholders to prioritize the critical answers needed, standardizing the approach and agreeing to the amount of effort needed to sufficiently qualify a safety biomarker for regulatory purposes.

Historically, efforts to expand the safety biomarker toolbox for drug development have not been met with the same enthusiasm as attempts to develop biomarkers of new drug-target engagement, disease progression, disease mitigation and drug efficacy [3,4]. Some may attribute this lag in the introduction of new safety biomarkers to a view that the qualification of safety biomarkers is more an applied science problem for private industry than a high priority for public funding through academic grants. Even so, such an effort represents a sizeable resource burden for any single pharmaceutical company and is thus difficult to tackle alone, especially as most would agree that establishing the biological sensitivity and specificity of a new safety biomarker is a daunting task.
Moreover, the effort would need to address long-standing deficiencies in biomarker development, making it difficult for any one company to justify diverting resources to such a systemic problem. There is also concern that more sensitive, but poorly established, new safety biomarkers could be forced into use prematurely by well-meaning regulatory authorities and that this could complicate drug development. A premature implementation into an early clinical trial of a safety biomarker that may not have been sufficiently qualified could yield either false-negative or false-positive conclusions and negatively affect both patient welfare and the drug candidate’s future development path, to a far greater extent than an insufficiently qualified efficacy biomarker. For safety biomarkers deemed sufficiently qualified to be applied in an early regulatory drug trial setting, both regulatory authorities and drug sponsors must have sufficient confidence that stable levels of that biomarker indicate that the drug is safe at that dose and, conversely, that significant change in the safety biomarker represents an adverse effect.

Despite these difficulties, the past few years have witnessed increased interest in the development and qualification of safety biomarkers. This interest has been fueled by recent scientific advances in analytical ‘-omics’ technologies and animal models, as well as by the growing realization of the promise of these biomarkers to facilitate drug development. The Critical Path Initiative, launched by the US Food and Drug Administration (FDA) in 2004 (ref. 5), and the European Medicines Agency/Committee for Medicinal Products for Human Use (EMEA/CHMP; London) Think-Tank Report [6] further highlight the regulatory viewpoints on the importance of biomarker development as a way to modernize how drugs are developed and evaluated, and they express a commitment of regulatory support to foster progress. The subsequent creation of enabling frameworks under which private industry could partner with regulatory authorities has provided a way for the various stakeholders to work together to advance the development and qualification of safety biomarkers for drug development.

The Critical Path Institute (Tucson, AZ, USA) Predictive Safety Testing Consortium (PSTC) Nephrotoxicity Working Group (NWG) represents one such enabling framework that joins industry, academic and regulatory scientist stakeholders. Several PSTC working groups were formed to focus on qualifying safety biomarkers for different organs and drug-induced injuries, including kidney, liver, vascular system, carcinogenesis and myopathy [7].
Drawing on the pioneering experiences of the PSTC NWG as an example, this report provides recommendations from the Critical Path Institute’s PSTC NWG, EMEA and FDA for establishing procedures to meet their common goal of qualifying safety biomarkers as tools for drug development that are appropriate for regulatory decision making. To ensure objectivity, research scientists and other regulatory authorities from the FDA and EMEA who contributed to data generation and submission were excluded from the EMEA and FDA Biomarker Qualification Review Team (BQRT) evaluations (Table 1).

In this article, we describe core principles and mutual decisions, as well as yet unresolved issues, which emerged in the course of the PSTC’s successful efforts to qualify seven new biomarkers of drug-induced renal injury to support regulatory decision-making during early drug development. We outline the procedures defined from this effort in the hope that this may guide other collaborative groups seeking to qualify safety biomarkers for other purposes.

Setting expectations and core principles for the qualification process

The need for a defined qualification process that meets regulatory and industry requirements is exemplified by the slow adoption of the cardiac troponins as new serum biomarkers of cardiac injury for drug development applications.
Thirteen years elapsed from when cardiac troponins were first reported to possess benefits over other cardiac injury biomarkers [8], to when the American College of Cardiology and the European Society of Cardiology declared these biomarkers a gold standard for diagnosing ischemic cardiac injury in 2000 (ref. 9). In this case, the acceptance of serum cardiac troponins by healthcare providers for broad-based diagnoses of cardiac disease outpaced any systematic acceptance for drug development use and regulatory decision-making.

The initial steps taken toward establishing and implementing a practical process map for biomarker qualification were for the Critical Path Institute PSTC NWG, EMEA and FDA to first establish mutual understanding and acceptance of four core principles.

First, newly qualified biomarkers must be implemented safely. The results of efforts to successfully qualify and then implement safety biomarkers must not place patients in clinical trials at additional risk. Instead, they should improve upon the most critical shortcomings of current biomarker use in drug development and be implemented judiciously in animal toxicology studies that are used to support the safe conduct of clinical trials, and then only in those clinical trials where the risk-to-benefit ratio is deemed appropriate.
It is critical that there be general agreement that the data generated will satisfactorily support specifically stated qualification claims before implementation.

Second, initial goals of biomarker qualification must be directed at highly specific fit-for-purpose limited contexts. Every biomarker is expected to demonstrate strengths and limitations in any carefully defined context of use. No single biomarker is expected to become a surrogate endpoint of organ health. Therefore, at the outset, it is necessary to set appropriate expectations of success, frame the ultimate specific claims for biomarker utility, chart the initial experimental qualification strategy and define very specific application contexts in drug development involving regulatory decision making.

Third, additional data will eventually expand biomarker utility and strengthen confidence in the use of biomarkers for applications beyond initial qualification claims. The initial data set will largely be dedicated to testing sensitivity and negative predictivity in animal toxicology studies against a current benchmark with its own limitations, using a carefully chosen limited set of known test agents. This will require a comparatively smaller investment than would be needed to thoroughly assess specificity, for example. Additional data will be expected to more rigorously assess specificity and expand knowledge of biomarker use and applicability to broader purposes. The idea of developing evidentiary standards [3,10,11] tailored to use of a specific biomarker has emerged fairly recently as an important concept in biomarker development. This ‘fit-for-purpose’ approach enables qualification for a specific use based on more limited data with the potential to broaden this qualification as new data emerge. Furthermore, in a very practical sense, initial qualification is expected to drive commercial development of analytically validated improved assays on high-throughput platforms, critical for expanding the scope of evaluation.

And fourth, biomarkers that are qualified should be used in second-tier tests, at least initially. Given the higher costs associated with new tests, together with the large data set needed to more fully evaluate specificity and causes for potential false-positive test results, qualified biomarkers should be reserved initially as second-tier tests, rather than deployed on a routine basis as a first-tier test. They should serve as follow-up tools in carefully chosen situations when routine study data define their specific need to support development for a given compound.

Table 1 Steps in the regulatory qualification of new safety biomarkers for PSTC
Input is listed for each step as: (a) industry and academic consortium members; (b) regulatory BQRT members; (c) other regulatory research scientist contributors.
1. Set expectations and core principles, and precisely define the goals, objectives and limited new biomarker qualification claims. (a) Yes; (b) Yes; (c) Yes
2. Evaluate candidate safety biomarkers against strength-of-evidence criteria (Table 2). (a) Yes; (b) No; (c) Yes
3. Assess the utility of any existing available data, study samples and assays. (a) Yes; (b) No; (c) Yes
4. Complete gap analysis: prioritize biomarker candidates; specify analytical assay validation needs; set general design of new studies; identify new biomarkers to be measured in existing samples. (a) Yes; (b) No; (c) Yes
5. Define research plan to address gaps: define fit-for-purpose assay validation plans; define study protocols and specific studies to test biomarker performance claims; align on processes, procedures, lexicons for collection of gold standard measurements; align on the statistical analysis plan. (a) Yes; (b) Yes; (c) Yes
6. Resolve unforeseen issues in ongoing manner. (a) Yes; (b) Yes; (c) Yes
7. Execute research plan and submit results and conclusions for BQRT review. (a) Yes; (b) No; (c) Yes

A specific fit-for-purpose context and progressive qualification framework for nephrotoxicity biomarkers

In the context of kidney injury, histopathological lesions that develop in two test species in response to a test compound at relevant human therapeutic doses that are not associated with elevations in serum creatinine (SCr) or blood urea nitrogen (BUN) raise legitimate concerns about the potential for safe development of that compound [12]. However, the human relevance is less clear when such toxicity findings are seen only at very high therapeutic doses in animals or only in one of several test species.
This common experience among industry collaborators helped focus the highest priority goal for the PSTC NWG: to qualify accessible biomarkers that could outperform BUN and SCr for ensuring safe monitoring of the kidney. Delays in development that result from abandoning promising drug development programs at pivotal times before any proof-of-concept evidence for the class or target has been established in the clinic not only divert resources and lengthen development times but also slow the process of bringing important new drugs to patients. For any new renal safety biomarkers to gain regulatory acceptance and industry uptake, the biomarkers need to demonstrate increased sensitivity for early detection of drug-induced injury and reduce the false-negative prediction rate of BUN and SCr for monitoring kidney safety.
Furthermore, as other mechanisms that do not involve kidney injury can elicit increases in BUN and SCr, we need new biomarkers that can resolve such ambiguities. For other organs, the safety monitoring improvements sought from new biomarkers will similarly depend upon the glaring weaknesses of current biomarkers in conventional use.

The agreed upon research goal of the original Critical Path Institute PSTC NWG initiative, therefore, was to qualify accessible translational biomarkers for regulatory decision making that improve monitoring of specific kidney tubule and glomerular safety concerns in toxicology test species and early human clinical trials to facilitate early drug development. We decided to focus initially on establishing biomarker performance metrics in the rat and to use knowledge gained with that species to then build on any publicly available human data and bridge to human biomarker qualification studies. To achieve this, the goal for the NWG initiative was defined as described below.

First, the investment and structure of the PSTC were designed to meet the needs of both industry and regulators.
Thus, although renal biomarkers would be expected to be useful for internal lead optimization or compound selection decision making, the consortium aimed to establish biomarker utility in regulated toxicology studies supporting the safe conduct of clinical trials in a manner that would facilitate mutual acceptance by both industry and regulatory authorities.

Second, PSTC initially focused specifically on establishing ‘monitorability’ of the onset of more acute drug-induced kidney injuries, which are seen within the first 4 weeks of drug dosing. These studies were not designed and selected to be so broad as to attempt to qualify renal biomarkers for monitoring late-occurring injuries seen only after chronic dosing, or for general medical uses such as for monitoring progression of kidney injury associated with diseases such as hypertension or diabetes, or for monitoring kidney transplant rejection or guarding against rare and idiosyncratic kidney injuries. The specific fit-for-purpose need, the initial goal and the study designs must be synchronized and focused, and not broadened in an attempt to address all deficiencies in current conventional biomarkers.

Third, the strengths and limitations of the new renal biomarkers were defined so that they could be used together with BUN and SCr to add value and improve on the ability of those routine classical parameters alone to monitor for kidney injury and proper function. The aim was not to establish new surrogate endpoints to replace SCr, BUN or the need for animal histopathology in regulatory toxicology studies.
Any attempt to establish surrogacy and replace current tools with new biomarkers would require more effort and accrued experience.

And fourth, it was important to initially establish claims describing the utility of biomarkers in monitoring tubular and glomerular injuries rather than every kidney histopathological lesion reported. The role of biomarkers in monitoring other less commonly seen kidney toxicities could be determined subsequently. Thus, a clear focus and careful definition and/or limitation of initial qualification goals are essential for any biomarker qualification project.

The vision for safe implementation of the new biomarkers during drug development was for sponsors to be able to demonstrate for their specific development test compound that certain of these new biomarkers respond sufficiently early and with sufficient sensitivity in appropriately designed animal toxicology studies to deem any significant histologic finding to be monitored safely and detected when still reversible. Such results would then provide a foundation that drug developers and regulatory authorities could use to build a tailored case-by-case strategy for safe implementation of the biomarker in an early clinical study once relevant clinical experience with the biomarkers becomes available and the risk/benefit ratio is deemed appropriate.
The ultimate vision is to provide sponsors with a toolbox of qualified safety biomarkers that perform well for a drug candidate in animal studies, such that the same biomarker(s) could be used on the same drug candidate to monitor clinical safety.

Although the goals and vision were clear and focused, it was important to acknowledge that success would lead to further opportunities to expand the utility of the biomarkers into different situations such as other types of kidney injury, other species, and subacute and chronic injuries, with the eventual aim of establishing more precise definitions of clinical monitoring thresholds and even utility in human disease mitigation.

To set expectations and facilitate planning of this process, a ‘Progressive Qualification Framework’ was developed. This concept defines the critical core set of data needed to support narrowly defined fit-for-purpose and focused initial qualification claims and defers broader objectives while keeping dialog and evidence gathering continually open. The biomarker qualification files remain active and transparent with the regulatory authorities, so that the strength and the scope of the qualification claims can be continually and incrementally expanded by any group with new data to support additional safety claims or to modify earlier ones.
In this case, for example, clinical data are expected to strengthen clinical qualification claims for new kidney safety biomarkers. This allows regulatory authorities to anticipate that additional data and further evaluations are expected and desired and, furthermore, positions regulatory authorities to play a leading role in helping to define and broadly communicate achievements as well as additional needs and opportunities to other stakeholders.


A pragmatic approach to biomarker qualification

We took an experimental approach to maximize support for the initiative among consortium participants and optimize the probability for success, while minimizing animal, human and capital resource expenditure as well as the time to achieve demonstrable success. This involved three fundamental considerations.

Use of existing data and study samples. A key strategic agreement was to share, wherever possible, existing or already planned and ongoing study samples to minimize additional animal, human and financial resources. Consortium participants contributed a full tabulation of completed and ongoing animal toxicology studies. Once the PSTC had obtained study designs, summary histopathology findings, data for any new biomarkers in those studies and an accounting of which samples were still appropriately maintained in freezer storage, an inventory of >60 studies involving >30 kidney toxicants and >20 kidney nontoxicants was compiled. A conservative estimate for the cost to run studies to generate the samples together with all supporting clinical chemistry and histopathology metadata is in excess of $4 million.

Defining the biomarkers for qualification. We invited all members of the NWG, including EMEA and FDA observers, to nominate promising biomarkers. The collaborative effort was not oriented toward the discovery of new biomarkers.
All agreed to focus initially on those biomarkers for which at least some participants had some experience and sufficient confidence in the biomarker to justify further investigation. Consortium members shared early data from animal toxicology studies yielding kidney injuries and encompassing measurements of 23 different biomarkers (Table 2) together with BUN, SCr and histopathology outcomes.

Table 2 The 23 urinary protein biomarkers initially proposed by PSTC NWG as safety biomarkers of drug-induced tubular or glomerular injury

Biomarker: Selected for qualification
Albumin: Yes
β2-microglobulin: Yes
Calbindin d28: No
Clusterin: Yes
Cystatin C: Yes
Epidermal growth factor (EGF): No
Glutathione S-transferase α (GSTα): No
Glutathione S-transferase µ (GSTµ): No
Kim-1: Yes
Lipocalin2 (NGAL): No
N-acetyl-β-glucosaminidase (NAG): No
Osteoactivin: No
Osteopontin: No
Podocin: No
Renal papillary antigen 1 (RPA1): No
TFF3: Yes
Timp1 (tissue inhibitor metalloproteinase type-1): No
Total protein: Yes
Uromodulin (Tamm-Horsfall): No
Vascular endothelial growth factor (VEGF): No
Macrophage migration inhibitory factor: No
Monokine induced by interferon gamma: No
Interferon-γ induced 10 kDa protein: No

Although not necessarily limited upfront to only proteins in urine, the 23 biomarkers for which animal study data were discussed were all urinary proteins. Any consortium member could choose to share their data files and to further study any of the biomarkers in existing samples or in samples from ongoing or planned studies. All participants agreed to make assays available to other members and to contribute resulting data to a biomarker qualification data submission.
The list was reduced to the seven biomarkers for which acceptable assay performance data existed and promising biological performance data were shared and deemed sufficiently convincing to warrant additional experimentation. The concepts we used to evaluate whether or not a biomarker was initially deemed sufficiently promising for further investigation (Box 1) have been summarized previously [10,13].

Once we had identified gaps in the data available for the seven biomarkers, we pursued additional investigations to sufficiently establish biomarker sensitivity and specificity. We then pooled studies of 18 nephrotoxicants and 11 non-nephrotoxicants and used receiver operating characteristic (ROC) curve analyses to determine the relative ability of the biomarkers to outperform BUN and SCr [12]. We used binary and ordinal logistic regression analyses to assess whether the new biomarkers provided additional information regarding kidney toxicity and severity relative to these current standards. Studies were also designed to investigate whether the new biomarkers could be used to monitor the onset of kidney pathology such that it would be detected at a stage when injury is mild and also shown to be fully reversible. Such approaches to identifying strengths and gaps in existing data and for prioritizing new experimental efforts are generally applicable for any biomarker qualification strategy.

Validation of analytical assays.
We adopted principles outlined by the US National Institutes of Health (NIH) Chemical Genomics Center [13] and the Bioanalytical Method Validation Guidance for Industry (http://www.fda.gov/downloads/drugs/guidancecomplianceregulatoryinformation/guidances/ucm070107.pdf). Assays within sites were validated based on sensitivity, specificity, robustness, accuracy (70–130%), precision (


Designing the study protocol

To evaluate the full capacity of a biomarker, it is essential that the study protocol generate samples that mimic the most pressing problem that the biomarker is intended to address. Appropriate selection of the doses and times of sample collection for both the biomarker and the histological examinations or another comparator gold standard measure is critical. For example, it makes little sense to exclude early study time-point collections and focus only on late terminal study sample collections if a biomarker is desired to provide an early signal. It makes little sense to design a study with only high doses of test agent where conventional biomarker changes are robust if an evaluation of the new biomarker is needed to assess whether it can be more sensitive and detect toxicities at lower doses than current tests. Similarly, samples taken at multiple time points from time-course studies are of critical relevance to demonstrate whether new biomarkers can detect the onset of lesions earlier than current routine analytes as well as monitor the regression of lesions during recovery. The study design should ensure that sufficient quantities of blood and urine samples are collected to enable analysis of multiple new analytes, as well as the more traditional analytes.
This will enable accurate comparisons of the performances of new biomarkers relative to those used in current practice.

As discussed below, we used statistical analyses of ROC curves to compare the performances of biomarkers. These are absolutely dependent upon the protocol design and pattern and range of histological lesions observed. The observed lesions in a study are themselves critically dependent upon the choices made in test agents, sampling times, sample collection details, dose levels, dosing frequencies and dose durations. Moreover, other major organs and alternative reasons for observed alterations in biomarkers should be examined to ensure an appropriate attribution and linkage to the changes seen in the target organ. Because food and water intake of individual animals may affect levels of biomarkers, these should also be assessed and documented.

Consensus on histopathological evaluations for assessing biomarker performance

In our studies to qualify safety biomarkers for nephrotoxicity, we used the histopathological examination of target tissues from animal toxicology studies as the basis for the biomarker response ROC curve. There was a need therefore to standardize three critical aspects of the histopathological assessment process across participating institutions so as to ensure objectivity, sensitivity, specificity and consistency.

Common diagnostic terminology, thresholds and grading assessments provide a framework for identifying and binning histologic alterations for statistical analyses.
To meet the needs for a standardized nomenclature for accurate description of kidney histopathology lesions and definitions of severity grading, pathologists created a lexicon and grading structure of histopathological changes to organize all findings. This lexicon was structured hierarchically to organize data with different levels of complexity and to enable statistical evaluation of the data (Supplementary Table 1). At the highest level, the lexicon was organized by anatomic location of the renal histomorphologic change such as cortex, medulla, papillae or renal pelvis. Within each of the general locations, the lexicon allows stepwise sublocalization of the histomorphologic change (e.g., glomerulosclerosis occurs within the glomerulus, which is found within the cortex). This organization enabled investigators to evaluate how specific anatomical patterns of kidney damage correlate with specific patterns of biomarker signal elevations. The hierarchical lexicon was crafted and agreed upon by pathologists from the PSTC representative companies to unify preexisting data that used disparate vocabularies.

Box 1 Strength-of-evidence criteria for evaluating biomarkers

In line with previous work carried out elsewhere [10,24], the PSTC considered several criteria in initial selection of renal biomarkers for investigation. These criteria are outlined below.
• Availability of a sufficiently validated analytical assay
• Biological plausibility of the association of the biomarkers with injury to the organ of interest
• Understanding of the molecular mechanism of the biomarker response
• Strong association of changes in biomarker levels to pathological outcomes and superior performance relative to currently accepted biomarkers
• Consistent response across mechanistically diverse toxicants, sexes, strains and species
• Both dose-response and temporal relationships relating the magnitude of biomarker alterations to the severity of injury, and the onset of and recovery from injury
• Adequate specificity to ensure that the biomarker does not respond to injury of other organs or to benign activation of physiological processes in the organ of interest

Pathologists calibrate their observations with control slides from rats of the same source, age and strain as the actual study animals to allow them to view the expected background variations.
This assessment of ‘normal variations’ in background histomorphology is important for identifying thresholds to distinguish and diagnose remarkable treatment-related lesions above the underlying background variations and to understand whether certain background histomorphology or spontaneously induced abnormalities can alter biomarker values. Criteria were established for defining and reporting grading thresholds on a severity scale ranging from 0 to 5.

The assessment of histology was reported in a standardized fashion using the established lexicon and grading system to include the following: a general description of the ‘normal’ unremarkable background variations seen across the study, a listing by animal of uncommon abnormal spontaneous background lesions unrelated to treatment and a listing by animal of all remarkable test-related lesions. Inclusion of a narrative summary of the ‘normal’ background findings provides some insight into differences among studies across institutions in biomarker background control levels. Reporting uncommon spontaneous background lesions unrelated to treatment allows the opportunity to further investigate their potential associations with alterations in new biomarkers.

An objective statistical evaluation of the biomarker data demanded a framework for ‘binning’ terms that describe parts of a common process to form composites.
The process composite scores selected were as follows: ‘Necrosis and Degeneration’, ‘Regeneration’, ‘Glomerulopathy’, ‘Other renal changes’ and the combined ‘Necrosis and Degeneration, and Regeneration’ composite called the ‘Max Composite’ (Table 3). The biologically based interdependence of these terms, deduced from prior experience of the pathologists independently and in advance of the results of these experiments, means that terms can be binned using the highest pathology score for any binned category to represent the composite score for each animal. Binning terms into ‘composite scores’ also reduces the multiplicity of endpoints for statistical analyses.

By understanding the link between the specific anatomic location of acute organ injury and the increase or decrease in biomarker release, we can hope to more precisely define the region of the organ that leads to changes in biomarker levels. For example, papillary damage elicited weaker Kim-1 and trefoil factor 3 (TFF3) signals when compared with tubular damage, suggesting that alterations in Kim-1 and TFF3 release are more likely to reflect tubular epithelial damage than papillary damage [15–17]. In addition, within each anatomic site, findings are further organized by the type of histomorphologic change. This lexicon structure also allows evaluation of links between changes in biomarker levels and different types of histomorphologic changes (e.g., necrosis versus regeneration). For example, tubular basophilia caused by cyclosporine A appears to correlate with Kim-1 and clusterin but not with albumin in urine [16–18]. It is important to note that although formal hypothesis testing does not preclude subsequent retrospective analysis to detect unanticipated specific pairings of biomarker-histologic events to refine biomarker evaluation, such analysis should be treated as post hoc and appropriate multiplicity considerations applied.

Optimization of sample collection and preparation. Adequate organ sampling and the preparation of high-quality slides that capture for examination all major components of the target organ are essential to fully evaluate whether biomarker levels in either serum or urine accurately reflect tissue injury. At the conclusion of a particular study, rigorous operating procedures should be implemented to assure that sample collection and tissue processing are conducted in a manner that will yield high-quality samples without introducing bias. In paired organs, such as the kidney, evaluation of each organ should be performed.
The study pathologist exercises professional judgment to determine whether the plane of section was optimized, whether the slides were of sufficient quality and whether additional sections are needed, especially for lesions that are focal in nature or limited to very specific anatomically distinct regions of a target organ. Objective criteria are set for any collection of additional sections beyond those defined in the protocol to avoid introducing bias.

On occasion, there may be a substantial number of animals in certain studies with alterations in new biomarker measurements that are not reflected by histopathological observations in the target organs. Although this is likely to be an uncommon situation, it should be investigated on a case-by-case basis, particularly to resolve what appear to be systematic discrepancies in the study. There are multiple potential underlying explanations that could be considered. These include prodromal biomarker release that precedes histologic alterations, the presence of focal histological changes or an alternative tissue source(s) of biomarkers. Additional molecular endpoint analyses using immunohistochemical or in situ hybridization approaches on sections with no histologic alterations by standard staining techniques might be conducted, for example, to investigate discordance between molecular alterations and standard microscopic analyses. Problem-solving activities could also include taking additional sections of archived tissue blocks to search for evidence of focal changes, although any such assessment should also include a matched number of analyses from control animals to avoid introducing bias.

Table 3 Hierarchical organization and binning of PSTC kidney histologic injury lexicon

Columns: Category; Description; PSTC lexicon (Primary designation, Secondary lesion, Tertiary segments)

Category: Tubular necrosis and degeneration
• Description: Degeneration/necrosis of tubular epithelium
• Primary designation: Tubular cell degeneration/necrosis
• Secondary lesion: Degeneration; Necrosis
• Tertiary segments: No precise location possible; Proximal convoluted tubule; Thick descending tubule; Loop of Henle; Distal convoluted tubule

Category: Tubular cell regeneration
• Description: Tubular basophilia; Tubular regeneration, epithelial
• Primary designation: Tubular cell regeneration
• Secondary lesion: Basophilia; Mitosis increased
• Tertiary segments: No precise location possible; Proximal convoluted tubule; Thick descending tubule; Loop of Henle; Distal convoluted tubule

Category: Glomerulopathy
• Description: Glomerulopathy
• Primary designation: Glomerular alteration
• Secondary lesion: Bowman’s space decr.; Bowman’s space incr.; Mesangial prolif./expansion; Glomerular vacuolation
• Tertiary segments: Glomerulus

Category: Other renal injury
• Description: Tubular dilatation; Fibrosis
• Primary designation: Tubular dilation; Fibrosis
• Tertiary segments: Cortex; Medulla; Papilla

Category: Other
• Description: Pelvis dilation; Nephropathy; Mineralization; Inflammation; Infiltration; Cast
• Primary designation: Pelvis dilation; Nephropathy; Mineralization; Inflammation; Infiltration; Intratubular cast
• Secondary lesion: Acute, chronic; Crystalline, hyaline, granular
• Tertiary segments: Cortex; Medulla; Papilla; Pelvis

Category: Nonrenal tissues
• Description: Liver damage composite score; Quadriceps damage composite score; Soleus damage composite score; Heart damage composite score

Diverse descriptors of kidney histology were given hierarchical designations to conform to a standardized, hierarchical PSTC Renal Lexicon. Each type of kidney injury was then binned into one of four categories: tubular necrosis/degeneration, tubular cell regeneration, glomerulopathy or other renal injury. ‘Other renal injury’ comprised two histological findings that are generally treatment related, whereas ‘Other’ histological changes are occasionally observed in untreated animals and may thus be unrelated to treatment. The scores in each of the four categories are then summarized as a composite score, taken as the largest grade for any row in that category.
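The binning rule in the Table 3 legend (each category's composite score is the largest grade recorded for any finding in that category) can be sketched in a few lines of Python. This is only an illustration: the finding names, the `CATEGORY_OF` mapping and the `composite_scores` function are hypothetical, and the ‘Max Composite’ is assumed here to be the larger of the necrosis/degeneration and regeneration composites.

```python
from typing import Dict

# Hypothetical mapping from lexicon findings to the four binned categories.
# The finding names are illustrative and not the full PSTC lexicon.
CATEGORY_OF: Dict[str, str] = {
    "tubular degeneration": "necrosis_degeneration",
    "tubular necrosis": "necrosis_degeneration",
    "tubular basophilia": "regeneration",
    "mitosis increased": "regeneration",
    "mesangial expansion": "glomerulopathy",
    "tubular dilatation": "other_renal_injury",
    "fibrosis": "other_renal_injury",
}

CATEGORIES = (
    "necrosis_degeneration",
    "regeneration",
    "glomerulopathy",
    "other_renal_injury",
)


def composite_scores(grades: Dict[str, int]) -> Dict[str, int]:
    """Bin one animal's severity grades (0 to 5) into composite scores.

    Each composite is the highest grade of any finding in its category;
    a category with no recorded finding scores 0.
    """
    scores = {category: 0 for category in CATEGORIES}
    for finding, grade in grades.items():
        if not 0 <= grade <= 5:
            raise ValueError(f"grade for {finding!r} must be between 0 and 5")
        category = CATEGORY_OF[finding]
        scores[category] = max(scores[category], grade)
    # Assumption: 'Max Composite' is the larger of the two tubular composites.
    scores["max_composite"] = max(
        scores["necrosis_degeneration"], scores["regeneration"]
    )
    return scores


example = composite_scores(
    {"tubular necrosis": 3, "tubular degeneration": 1, "tubular basophilia": 2}
)
```

Taking the maximum rather than a sum keeps each composite on the same 0 to 5 scale as the underlying grades, which is what allows one score per animal to be carried into the statistical analyses.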


Implementation of blind studies to minimize bias while performing histomorphological assessment. Standard industry practice as recommended by the Society of Toxicologic Pathology (STP) guideline [14] was followed in the performance of histopathological evaluations of studies conducted to support biomarker qualification. Although we debated the question of whether pathologist awareness was preferable to having pathologists blind to knowledge of treatment group, this published guidance states that knowledge of the treatment group during the evaluation process along with other available study information favors the finding of all treatment-related effects and improves diagnostic accuracy. There is general agreement that in biomarker qualification studies the pathologist should be blind to knowledge of new and conventional biomarker measurement results. There is also agreement that the analytical scientists generating new biomarker data should have no knowledge of dose group, study findings or histopathological outcome before completion of sample data generation.

The interpretation of obvious histomorphological changes across the higher doses and durations compared to the matched controls is fairly straightforward. Even so, variation in the interpretation by a pathologist of subtle histomorphological changes at early time points and at low doses may affect biomarker performance interpretations.
Minimal alterations identified in a dose group at the margin of histomorphological detection have the potential to overlap with similar findings in the concurrent or historical controls. In this case, the decision by a pathologist to diagnose a change, whether spontaneous or treatment related, could have a major impact. Rigorous operating procedures described in the STP guidance¹⁹ are designed to ensure that microscopic interpretations of subtle histomorphological changes are conducted in a manner that will minimize bias. Recommendations²⁰ are given in the STP guidance¹⁹ for minimizing study bias by first, following a pre-established set of rigorous operating procedures; second, having additional masked visual analyses conducted by the study pathologist of randomly ordered control and treated samples as necessary to resolve diagnostic ambiguities; third, conducting tests, using peer review with targeted blinding by a second pathologist naive to the study, of randomly selected samples from the dosed and control groups to test if major discrepancies exist between pathologists; and fourth, enlisting a Pathology Working Group to resolve discrepancies in diagnoses and grading scores between the study pathologist and peer reviewer.

During the course of the project, the introduction of potential study bias in the collection of histopathological data emerged as a concern²¹.
This issue arose over concern that a study pathologist having knowledge of the treatment group of origin of the study slides before the microscopic examination might be influenced by unintentional observer bias²⁰. As there is no current consensus²¹ on whether fully blinded reads by the study pathologist provide more or less accurate histomorphological assessments of tissues for biomarker qualification purposes, we deferred to the STP guidance favoring initial knowledge of the treatment group. One proposal to address the issue is to select an adequate set of study samples where a possible difference in pathologist sensitivity to detect subtle lesions could have the greatest potential to affect biomarker performance interpretations, and to engage a Pathology Working Group at an independent testing site to conduct a fully blinded study evaluation. The rationale behind this proposal is to provide added assurances that biomarker study outcomes are not complicated by any bias or error introduced unconsciously during histopathological evaluation. Although some differences in severity grading between individual animals are to be expected, ROC analyses can be conducted and statistical significance compared to assess the impact of any differences. Could such alternative proposals to routine blinding address possible bias in histomorphological assessment for biomarker qualification data? Additional steps are in progress to reach consensus in this area.

Statistical considerations for assessing relative biomarker performance and establishing fit-for-purpose utility

Statistical tests of the performance of a biomarker should be considered very early within the context of its proposed uses.
A statistical comparison approach that incorporated histopathology as the ‘gold standard’ and BUN and SCr as the current commonly used comparators was needed. Agreement was reached on using statistical analyses of ROC curves. For ROC curve analysis to provide a fair comparison, it is important that it be based on a set of samples for which all of the biomarkers being compared have been measured. The area under the ROC curve (AUC) metric provides a way to average the sensitivity measures over the entire range of specificity and thus provides a more global assessment of performance than consideration at only a few thresholds. A plot of the entire ROC curve, furthermore, allows graphical comparison of sensitivity at any desired specificity level. ROC analysis is routinely used, relatively simple to understand and, because it is typically nonparametric, does not have the parametric assumptions of alternatives like logistic and probit regression.

The standard for determination of kidney injury in the rat is careful examination of kidney morphology by a qualified toxicologic pathologist. Although highly accurate, this determination is not perfect, as pathologists cannot examine every section of both kidneys or all sections of every other organ, molecular signals may precede the ability to observe structural damage, and some level of variability on an individual animal-by-animal basis between the subjective evaluations of pathologists is expected.
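The AUC comparison described above can be sketched as follows. This is a minimal illustration, not the consortium's analysis code. It computes the nonparametric AUC as the fraction of (injured, uninjured) animal pairs in which the injured animal has the higher marker value, and it restricts the comparison to animals in which every marker was measured. The data, field names and function names are hypothetical, and no significance test between AUCs is implemented.

```python
def roc_auc(negatives, positives):
    """Nonparametric AUC: fraction of (injured, uninjured) pairs in which the
    injured animal has the higher marker value; ties count as one half."""
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def complete_cases(animals, markers):
    """Keep only animals for which every marker being compared was measured."""
    return [a for a in animals if all(a.get(m) is not None for m in markers)]


def auc_per_marker(animals, markers):
    """AUC of each marker against the histopathology call, on shared samples."""
    kept = complete_cases(animals, markers)
    result = {}
    for m in markers:
        neg = [a[m] for a in kept if not a["injury"]]
        pos = [a[m] for a in kept if a["injury"]]
        result[m] = roc_auc(neg, pos)
    return result


# Hypothetical data: histopathology call plus two marker readings per animal.
animals = [
    {"injury": False, "kim1": 1.0, "scr": 0.9},
    {"injury": False, "kim1": 1.2, "scr": 1.0},
    {"injury": False, "kim1": 0.9, "scr": 1.1},
    {"injury": False, "kim1": 2.0, "scr": 1.0},
    {"injury": True, "kim1": 1.8, "scr": 1.0},
    {"injury": True, "kim1": 3.5, "scr": 1.1},
    {"injury": True, "kim1": 5.0, "scr": 1.2},
    {"injury": True, "kim1": 8.0, "scr": 1.5},
    {"injury": True, "kim1": 6.0, "scr": None},  # SCr missing: dropped for both markers
]
aucs = auc_per_marker(animals, ["kim1", "scr"])
print(aucs)  # {'kim1': 0.9375, 'scr': 0.84375}
```

Dropping the animal with a missing SCr value from both calculations keeps the two AUCs comparable, which is the fairness condition stated in the text.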
To address the possibility of such ‘errors’ in the histologic determination of kidney injury, two types of analyses were performed and both presented in the biomarker qualification data submission to regulatory authorities. Each has value and limitations. In the ‘exclusion’ analysis, only samples with the highest level of confidence in the injury determination were considered: samples from animals not treated with a nephrotoxicant and samples from animals treated with nephrotoxicants that developed histopathological lesions. Specifically, samples from animals given known nephrotoxicants that did not display kidney histopathology were excluded. ‘Inclusion’ analysis, in which all samples were considered, was also performed. The ‘inclusion’ analysis is more objective and treats all of the data nonselectively. Although generally both types of analyses yielded similar comparative performance among the biomarkers, inclusion analysis yielded higher thresholds, and thus lower sensitivity, than exclusion analysis for the same level of specificity. As it yields greater sensitivity to detect injury, the exclusion analysis might be considered a safer choice for setting thresholds. Another reason to consider conclusions from an exclusion analysis is that biomarkers that signal before the onset of histopathology are highly desirable and the inclusion analysis treats such findings as penalties or false positives. Differences between the inclusion analysis and exclusion analysis are illustrated in Figure 1.

For the tubule damage markers, the analyses were driven by a priori hypotheses.
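The effect of the inclusion and exclusion analyses on thresholds can be shown with a small numerical sketch. It assumes that a sample is called positive when its marker value exceeds a threshold chosen so that at least 95% of the histopathology-negative samples fall at or below it. All values are hypothetical fold-changes, and the function names are illustrative.

```python
def threshold_at_specificity(negatives, specificity=0.95):
    """Smallest observed value t such that at least `specificity` of the
    negative samples are <= t (samples above t are called positive)."""
    ordered = sorted(negatives)
    n = len(ordered)
    for i, value in enumerate(ordered):
        if (i + 1) / n >= specificity:
            return value
    return ordered[-1]


def sensitivity(positives, threshold):
    """Fraction of injured samples whose marker value exceeds the threshold."""
    return sum(v > threshold for v in positives) / len(positives)


# Hypothetical marker fold-changes.
controls = [0.6, 0.7, 0.7, 0.8, 0.8, 0.9, 0.9, 0.9, 1.0, 1.0,
            1.0, 1.0, 1.1, 1.1, 1.2, 1.2, 1.3, 1.4, 1.6, 2.2]
treated_no_lesion = [1.5, 2.5, 3.0, 4.0]      # dosed, no histopathology
treated_with_lesion = [1.5, 2.4, 2.8, 3.5, 5.0, 7.0, 9.0, 12.0]

# Exclusion analysis: dosed animals without lesions are left out.
excl_thr = threshold_at_specificity(controls)
# Inclusion analysis: they are kept and counted as histopathology-negative.
incl_thr = threshold_at_specificity(controls + treated_no_lesion)

excl_sens = sensitivity(treated_with_lesion, excl_thr)
incl_sens = sensitivity(treated_with_lesion, incl_thr)
print(excl_thr, excl_sens)  # 1.6 0.875
print(incl_thr, incl_sens)  # 3.0 0.625
```

In this made-up example the dosed animals without lesions carry elevated marker values. Counting them as negatives raises the threshold from 1.6 to 3.0 and lowers sensitivity from 0.875 to 0.625, the direction of change described in the text.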
Previous research had suggested these proteins to be markers of kidney tubule damage. The markers were compared with a specific set of morphological alterations predetermined within the established lexicon to be associated with tubule injury. Accordingly, the tests involving the tubule damage markers were not adjusted for multiplicity. In contrast, markers of glomerular damage were tested as candidate markers for several different grouped collections of alterations, and not solely glomerular changes, and so were subjected to adjustment for the multiple tests performed.

The ability of a compound to cause any damage was felt to be more important for human risk assessment than distinguishing between different grades of injury. Accordingly, analysis focused on distinguishing samples without kidney injury from those with injury, initially without consideration of severity. ROC curves were used to depict sensitivity and specificity across all possible decision rules. The ROC curve AUC was used as the metric for statistical assessment of relative performance as it is easily interpreted and allows for statistical significance tests²² indicating that one marker outperforms another. The sensitivity determined at 95% specificity was calculated as well from the ROC curve to allow further comparisons between markers, but in an attempt to keep the number of tests low, tests of statistical significance were not applied to this parameter. The performance of biomarkers with respect to different grades of injury was evaluated by additional ROC analyses using only subsets of histopathology grades (control animals versus animals with grade 1 injury only or grade 1 and grade 2, etc.). These additional analyses especially served to illustrate the stronger relative merits of new biomarkers, such as Kim-1, for example, versus SCr in animals where renal injury was more subtle¹⁶.

A biomarker that supplies information complementing the standard SCr and BUN measures may be valuable, even if it is not superior to the standard measures. Nested logistic models were used to assess whether a marker adds information in this way.
Specifically, a likelihood ratio test comparing a logistic model containing both the candidate marker and the standard markers to a model with only the standard markers was used to assess whether the candidate marker accounts for more variability than would be expected by chance²². Reaching statistical significance with this test for a biomarker was the basis for a claim that the biomarker in question ‘adds value’ to current alternatives²³. Careful consideration of multiplicity testing correction issues, a balanced view of the merits and deficiencies of both inclusion and exclusion ROC analysis, and defined approaches to support claims of ‘outperformance’ and ‘adds value’, are all important elements to build into the statistical analysis of new biomarker performance evaluations.

Figure 1 Urinary Kim-1 levels after cisplatin treatment¹⁶. Termination time point is labeled for each animal. The symbol and color represent the histopathology grading for proximal tubular injury. The magenta line represents the Kim-1 threshold for 95% specificity based on ~200 control animals. One animal in this study represents a false positive (encircled). Animals within the gray boxes are removed for the exclusion analysis in contrast to the inclusion analysis. A number of animals in this box show significantly higher urinary Kim-1 levels (marked with arrows) than control animals. Axes: dose (mg/kg; 0, 0.5, 1 and 3) versus urinary Kim-1 level (fold-change).

Conclusions

Having first decided to form a consortium of stakeholders from industry, academia and regulatory bodies to expand the tool box of translational safety biomarkers appropriate for use in regulated phases of early drug development, we forged a new process to define success, align goals against an experimental strategy and establish rules for collecting and interpreting safety biomarker performance data. This consisted of a research plan, containing a very narrow initial fit-for-purpose context built on clear evidentiary standards. This was itself contained within a progressive qualification framework for subsequent expansion of biomarker claims. Agreement to the use of pre-existing study samples accelerated experimental progress and saved over $4 million in estimated animal, human and other associated study expenses.
Data from these studies were shared across a list of 23 potential biomarkers of drug-induced kidney injury and formed the basis for the decision to advance 7 biomarkers with statistically supported claims of outperforming or adding value to the conventional biomarkers BUN and SCr for monitoring drug-induced acute renal tubule and glomerular injury. Both inclusion and exclusion ROC curve analyses proved valuable for comparing statistical performance levels between biomarkers against the gold standard of kidney histopathology and rules were followed for statistical multiplicity testing adjustments. Because histopathological diagnoses and scoring were such essential elements to the ROC-based evaluations of biomarker performance, we placed particular emphasis on ensuring alignment of histopathological evaluation practices across contributing laboratories. A standardized hierarchical nomenclature or lexicon was established, enabling binning of histological diagnostic terms in a logical manner. Criteria were established for defining and reporting grading thresholds. We proposed an STP-established masked or blinded targeted reanalysis practice of randomly ordered slides from control and treated animals. A set of rigorous operating procedures for ensuring slide quality was established and a process of reanalyzing subsets of slides, using peers naive to the study, was suggested to check that observer bias was minimized during the unblinded initial reads or slide processing steps. We hope that these principles and procedures guide other collaborative groups engaged in safety biomarker fit-for-purpose qualification initiatives.

Note: Supplementary information is available on the Nature Biotechnology website.

DISCLAIMER
The views presented are those of the individuals and may not be understood or quoted as being made on behalf of or reflecting the position of the EMEA and the FDA.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

Published online at http://www.nature.com/naturebiotechnology/.
Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/.

1. Biomarker Definitions Working Group. Biomarker and surrogate endpoints: preferred definitions and conceptual framework. Clin. Pharmacol. Ther. 69, 89–95 (2001).
2. Wagner, J.A. Strategic approach to fit-for-purpose biomarkers in drug development. Annu. Rev. Pharmacol. Toxicol. 48, 631–651 (2008).
3. Wilson, C., Schulz, S. & Waldman, S.A. Biomarker development, commercialization, and regulation: individualization of medicine lost in translation. Clin. Pharmacol. Ther. 81, 153–155 (2007).
4. Sistare, F.D. & DeGeorge, J.J. Preclinical predictors of clinical safety: opportunities for improvement. Clin. Pharmacol. Ther. 82, 210–214 (2007).
5. Food and Drug Administration. Innovation or Stagnation: Challenge And Opportunity on the Critical Path to New Medical Products. http://www.fda.gov/oc/initiatives/criticalpath/whitepaper.pdf (16 March 2004).
6. European Medicines Agency. Evaluation of Medicines for Human Use Innovative Drug Development Approaches Final Report from the EMEA/CHMP-Think-Tank Group on Innovative Drug Development http://www.emea.europa.eu/pdfs/human/itf/12731807en.pdf (22 March 2007).
7. Mattes, W.B. et al. Research at the interface of industry, academia and regulatory science. Nat. Biotechnol. 28, 432–433 (2010).
8. Cummins, B., Auckland, M.L. & Cummins, P. Cardiac-specific troponin-I radioimmunoassay in the diagnosis of acute myocardial infarction. Am. Heart J. 113, 1333–1344 (1987).
9. Alpert, J.S. et al. Myocardial infarction redefined – a consensus document of the joint European Society of Cardiology/American College of Cardiology Committee for the Redefinition of myocardial infarction. J. Am. Coll. Cardiol. 36, 959–969 (2000).
10. Altar, C.A. et al. A prototypical process for creating evidentiary standards for biomarkers and diagnostics. Clin. Pharmacol. Ther. 83, 368–371 (2008).
11. Wagner, J.A., Williams, S.A. & Webster, C.J. Biomarkers and surrogate end points for fit-for-purpose development and regulatory evaluation of new drugs. Clin. Pharmacol. Ther. 81, 104–107 (2007).
12. Bonventre, J.V., Vaidya, V.S., Schmouder, R., Feig, P. & Dieterle, F. Next-generation biomarkers for detecting kidney toxicity. Nat. Biotechnol. 28, 436–440 (2010).
13. Inglese, J. et al. Eli Lilly and Company and the NIH Chemical Genomics Center Assay Guidance Manual. (Eli Lilly and Company and the NIH Chemical Genomics Center, 2008). http://www.ncgc.nih.gov/guidance/section2.html
14. Lee, J.W. et al. Fit-for-purpose method development and validation for successful biomarker measurement. Pharm. Res. 23, 312–328 (2006).
15. Ichimura, T., Hung, C.C., Yang, S.A., Stevens, J.L. & Bonventre, J.V. Kidney injury molecule-1: a tissue and urinary biomarker for nephrotoxicant-induced renal injury. Am. J. Physiol. Renal Physiol. 286, F552–F563 (2004).
16. Vaidya, V. et al. Kidney injury molecule-1 outperforms traditional biomarker of kidney injury in multi-site preclinical biomarker qualification studies. Nat. Biotechnol. 28, 478–485 (2010).
17. Yu, Y. et al. Urinary biomarkers trefoil factor 3 and albumin enable early detection of kidney tubular injury. Nat. Biotechnol. 28, 470–477 (2010).
18. Dieterle, F. et al. Urinary clusterin, cystatin C, β2-microglobulin and total protein as markers to detect drug-induced kidney injury. Nat. Biotechnol. 28, 463–469 (2010).
19. Crissman, J.W. et al. Best practices guideline: toxicologic histopathology. Toxicol. Pathol. 32, 126–131 (2004).
20. Ransohoff, D.F. Bias as a threat to the validity of cancer molecular-marker research. Nat. Rev. Cancer 5, 142–149 (2005).
21. Dieterle, F. et al. Renal biomarker qualification submission: a dialog between the FDA-EMEA and Predictive Safety Testing Consortium. Nat. Biotechnol. 28, 455–462 (2010).
22. DeLong, E.R., DeLong, D.M. & Clarke-Pearson, D.L. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 44, 837–845 (1988).
23. Harrell, F.E. Jr. Regression Modeling Strategies (Springer, New York; 2001).
24. Hill, A.B. The environment and disease: association or causation? Proc. R. Soc. Med. 58, 295–300 (1965).


Renal biomarker qualification submission: a dialog between the FDA-EMEA and Predictive Safety Testing Consortium

Frank Dieterle¹, Frank Sistare², Federico Goodsaid³, Marisa Papaluca⁴, Josef S Ozer²,²⁸, Craig P Webb⁵,⁶, William Baer⁵,⁷, Anthony Senagore⁵,⁸, Matthew J Schipper⁵,⁹, Jacky Vonderscher¹⁰, Stefan Sultana⁵, David L Gerhold², Jonathan A Phillips¹¹, Gérard Maurer¹, Kevin Carl¹, David Laurie¹, Ernie Harpur¹², Manisha Sonee¹³, Daniela Ennulat¹⁴, Dan Holder¹⁵, Dina Andrews-Cleavenger¹⁶, Yi-Zhong Gu¹⁷,²⁹, Karol L Thompson³, Peter L Goering³, Jean-Marc Vidal⁴, Eric Abadie⁴, Romaldas Maciulaitis⁴,¹⁸, David Jacobson-Kram³, Albert F Defelice³, Elizabeth A Hausner³, Melanie Blank³, Aliza Thompson³, Patricia Harlow³, Douglas Throckmorton³, Shen Xiao³, Nancy Xu³, William Taylor³, Spiros Vamvakas⁴, Bruno Flamion⁴, Beatriz Silva Lima⁴, Peter Kasper⁴, Markku Pasanen⁴,¹⁹, Krishna Prasad⁴, Sean Troth²⁰, Denise Bounous²¹, Denise Robinson-Gravatt²², Graham Betton²³, Myrtle A Davis²⁴, Jackie Akunda²⁵, James Eric McDuffie¹³, Laura Suter¹⁰, Leslie Obert²², Magalie Guffroy¹², Mark Pinches²³, Supriya Jayadev¹¹, Eric A Blomme²⁶, Sven A Beushausen²², Valérie G Barlow¹², Nathaniel Collins¹⁷,²⁹, Jeff Waring²⁶, David Honor²⁶, Sandra Snook¹³, Jinhe Lee²⁶, Phil Rossi²⁷, Elizabeth Walker²⁷ & William Mattes²⁷

The first formal qualification of safety biomarkers for regulatory decision making marks a milestone in the application of biomarkers to drug development. Following submission of drug toxicity studies and analyses of biomarker performance to the Food and Drug Administration (FDA) and European Medicines Agency (EMEA) by the Predictive Safety Testing Consortium’s (PSTC) Nephrotoxicity Working Group, seven renal safety biomarkers have been qualified for limited use in nonclinical and clinical drug development to help guide safety assessments. This was a pilot process, and the experience gained will both facilitate better understanding of how the qualification process will probably evolve and clarify the minimal requirements necessary to evaluate the performance of biomarkers of organ injury within specific contexts.

¹Novartis Pharma AG, Basel, Switzerland. ²Department of Investigative Laboratory Sciences, Safety Assessment, Merck Research Laboratories, West Point, Pennsylvania, USA. ³US Food and Drug Administration, Silver Spring, Maryland, USA. ⁴European Medicines Agency, London, UK. ⁵ClinXus, Grand Rapids, Michigan, USA. ⁶Van Andel Research Institute, Grand Rapids, Michigan, USA. ⁷Grand Valley Medical Specialists, Grand Rapids, Michigan, USA. ⁸Spectrum Health, Grand Rapids, Michigan, USA. ⁹Innovative Analytics, Inc., Kalamazoo, Michigan, USA. ¹⁰Hoffman-La Roche, Basel, Switzerland. ¹¹Boehringer Ingelheim Pharmaceuticals, Inc., Ridgefield, Connecticut, USA. ¹²Sanofi-Aventis, Malvern, Pennsylvania, USA. ¹³Johnson & Johnson, San Diego, California, USA. ¹⁴GlaxoSmithKline, King of Prussia, Pennsylvania, USA. ¹⁵Department of Biometrics, Merck Research Laboratories, West Point, Pennsylvania, USA. ¹⁶Amgen, Inc., Thousand Oaks, California, USA. ¹⁷Schering-Plough Research Institute, Summit, New Jersey, USA. ¹⁸Nephrology Clinic of Kaunas Medical University Clinics, Department of Basic and Clinical Pharmacology of Kaunas Medical University and State Medicines Control Agency, Kaunas, Lithuania. ¹⁹University of Kuopio, Department of Pharmacology and Toxicology, Kuopio, Finland. ²⁰Department of Pathology, Safety Assessment, Merck Research Laboratories, West Point, Pennsylvania, USA. ²¹Bristol-Myers Squibb, Princeton, New Jersey, USA. ²²Pfizer, Inc., Groton, Connecticut, USA. ²³AstraZeneca Pharmaceuticals, Alderley Park, Macclesfield, UK. ²⁴National Cancer Institute, National Institutes of Health, Bethesda, Maryland, USA. ²⁵Eli Lilly and Co., Indianapolis, Indiana, USA. ²⁶Abbott Laboratories, Abbott Park, Illinois, USA. ²⁷Critical Path Institute, Tucson, Arizona, USA. ²⁸Present address: Dynamics and Metabolism, PGRD, Pfizer, Andover Laboratories, Andover, Massachusetts, USA. ²⁹Present affiliation: Merck Research Laboratories, Summit, New Jersey, USA. Correspondence should be addressed to F.D. (e-mail: frank.dieterle@novartis.com).

Published online 10 May 2010; doi:10.1038/nbt.1625

A Voluntary eXploratory Data Submission was initiated on 15 June 2007 for seven urinary renal safety biomarkers: kidney injury molecule-1 (KIM-1), clusterin (CLU), albumin, total protein, β2-microglobulin, cystatin C and trefoil factor 3 (TFF3). The submission to the EMEA and the FDA contained data, interpretations and the proposed contexts of use for each of the biomarkers. This submission was followed by two face-to-face meetings on 12 July 2007 and 9 October 2007 between FDA, EMEA and PSTC members (the Pharmaceuticals and Medical Devices Agency of Japan also participated in an observational capacity) and several joint telephone conferences. New processes were established at the FDA and EMEA following the review process (see ref. 1 by F.G. and M. Papaluca). Through these collegial communications, experts addressed data gaps in the initial submission, and the consortium responded by providing a large amount of additional data in the form of nine follow-up submissions.

In this article, we summarize the PSTC renal biomarker submission, analyses and conclusions, and then go on to discuss the new standards and optimal practices identified through the qualification review process at the FDA and the EMEA. We also provide details of the corresponding regulatory agency reviews and provide an overview of the dialog between the PSTC and the FDA-EMEA. By providing detailed documentation of the review process, we hope to provide guidance for future regulatory submissions by other parties, not only for kidney biomarkers but also for biomarkers of other organs and tissues (liver, vascular and muscle) currently under investigation by the PSTC.

The PSTC renal biomarker submission

All data, analyses and integrated summaries associated with the PSTC submission were accompanied with key conclusions and proposed preclinical and clinical context of use claims for each of the biomarkers. Agreement from regulators was sought for each of the PSTC conclusions. These conclusions made claims related to the performance and the proposed use of the biomarkers in both the nonclinical and clinical settings (see Fig. 1 and Table 1). In addition to the specific biomarker claims, the PSTC submission also proposed a progressive (‘incremental’, ‘rolling’) biomarker qualification concept. This was proposed because of PSTC concerns that advancement from a ‘probable valid’ to a ‘valid’ biomarker would be hindered if valid were to be interpreted by regulatory agencies as ‘valid in every context’ (see ref. 2 for an explanation of the different stages of qualification).

Specific biomarker claims. Three claims were made for the performance of the seven urinary biomarkers. First, urinary KIM-1, CLU and albumin can individually outperform and add information to blood urea nitrogen (BUN) and serum creatinine (SCr) assays as early diagnostic biomarkers of drug-induced acute kidney tubular alterations in rat toxicology studies.
Second, urinary TFF3 can add information to BUN and SCr assays in rat toxicology studies as an early diagnostic biomarker of drug-induced acute kidney tubular alterations. Third, total urinary protein, cystatin C and β2-microglobulin can individually outperform SCr assay and add information to BUN and SCr assays as early diagnostic biomarkers in rat toxicology studies of acute drug-induced glomerular alterations or damage resulting in impairment of kidney tubular reabsorption.

In terms of nonclinical uses of the urinary biomarkers, the PSTC put forward two proposals. First, urinary KIM-1, CLU, albumin and TFF3 are individually qualified for regulatory decision making as biomarkers that may be used by sponsors on a voluntary basis to demonstrate that drug-induced acute kidney tubular alterations can be monitored in good laboratory practice (GLP) rat studies used to support the safe conduct of early-phase clinical trials. And second, total urinary protein, cystatin C or β2-microglobulin are individually qualified for regulatory decision making as biomarkers that may be used by sponsors on a voluntary basis to demonstrate that acute drug-induced glomerular alterations or damage resulting in impairment of kidney tubular reabsorption are monitorable in GLP rat studies used to support the safe conduct of early-phase clinical trials.

In addition to the above nonclinical uses, the PSTC posited that the biomarkers could also be of utility in a clinical translational context. Specifically, urinary KIM-1, albumin, total protein, cystatin C and β2-microglobulin can individually be considered qualified for regulatory decision making as clinical bridging biomarkers appropriate for use by sponsors on a voluntary basis in phase 1 and 2 clinical trials for monitoring kidney safety when animal toxicology findings generate a concern for tubular injury or glomerular alterations.

Figure 1 Flow charts explaining the proposed limited clinical translational use of the new renal biomarkers. This is in the context of permitting the progression of a compound into human testing, which requires the demonstration of reversibility upon drug cessation in an animal study. It is not uncommon for a compound to be associated with histopathological evidence of drug-induced glomerular or proximal tubular injury in animal toxicology studies without an observed change in BUN and SCr.
Clinical application of tubular markers. Preclinical: tubular toxicity confirmed by histopathology in one or several species including rat; BUN and SCr levels in control range. Measure BUN, SCr, KIM-1 and albumin in urine samples of GLP study in animal species showing tubular toxicity to demonstrate reversibility, interim urine samplings and periodic histopathological assessments. KIM-1, albumin diagnostic?* If yes (clinical): phase 1/2 clinical trial: monitor KIM-1, albumin, BUN, SCr; base decisions on best preclinical marker among KIM-1, albumin**. If no: nonmonitorable kidney toxicity; clinical trial delayed unless mechanistic understanding can be developed to address human irrelevance. *Sponsor can voluntarily measure albumin or Kim-1 alone or both markers together. **Preclinical best marker means marker with the best diagnostic performance compared to histopathology.
Clinical application of glomerular markers. Preclinical: glomerular toxicity confirmed by histopathology in one or several species including rat; BUN and SCr levels in control range. Measure BUN, SCr, β2-microglobulin, cystatin C and total protein in urine samples of GLP study in animal species showing glomerular toxicity to demonstrate reversibility, interim urine samplings and periodic histopathological assessments. Cystatin C, β2-microglobulin, total protein diagnostic?* If yes (clinical): phase 1/2 clinical trial: monitor cystatin C, β2-microglobulin, total protein, BUN, SCr; base decisions on best preclinical marker among β2-microglobulin, cystatin C, total protein**. If no: nonmonitorable kidney toxicity; clinical trial delayed unless mechanistic understanding can be developed to address human irrelevance. *Sponsor can voluntarily measure cystatin C, β2-microglobulin, total protein alone or several of these markers. **Preclinical best marker means marker with the best diagnostic performance compared to histopathology.

Caveats associated with the claims. With respect to the submitted data and proposed claims, four important aspects of the submission should be stressed. First, as used by the PSTC, ‘monitoring’ and ‘monitorable’ refer to the detection of the onset of lesions and injury but not to the regression of lesions and injury. Because only two studies with limited biomarker data and analyses demonstrating the correlation of biomarker levels with regression and reversibility of lesions were submitted in the qualification submission³, the term monitoring did not include monitoring the reversibility of injury.

Second, although the novel renal biomarkers have greater sensitivity than the current reference standard methods for the detection of specific renal injury (that is, glomerular versus tubular), it is important to understand that increased novel renal biomarker levels might


perspectiveTable 1 Summary of claims submitted by <strong>the</strong> PSTC to <strong>the</strong> FDA <strong>and</strong> EMEA on seven biomarkers associated with nephrotoxicityBiomarkerQualifiedpreclinicallyAdds information toSCr <strong>and</strong> BUN a,bOutperforms SCr <strong>and</strong>/or BUN a–dAnalytically validatedassayWidely availableassayQualified forclinical use eKIM-1 Yes Yes a Yes a Yes Pending YesAlbumin Yes Yes a Yes a Yes Yes YesCLU Yes Yes a Yes a Yes Yes PendingTFF3 Yes Yes a No Yes Pending PendingTotal protein Yes Yes b Yes b,c Yes Yes YesCystatin C Yes Yes b Yes b,c Yes Yes Yesβ2-microglobulin Yes Yes b Yes b,c Yes Yes Yesa Acute tubular alterations. b Acute glomerular injury with acute tubular reabsorption impairment. c Biomarker outperformed SCr. d If an inclusion ROC analysis is considered, instead of anexclusion ROC analysis, cystatin C <strong>and</strong> β2-microglobulin outperform not only SCr but also blood urea nitrogen with respect to <strong>the</strong> prediction of histopathologically confirmed kidney injury(see text for fur<strong>the</strong>r details of <strong>the</strong> ROC analysis). e Qualified for clinical use refers to a ‘case-by-case’ context <strong>and</strong> not to a broad general qualification.© 2010 Nature America, Inc. All rights reserved.also reflect o<strong>the</strong>r intrarenal <strong>and</strong> extrarenal injury (that is, <strong>the</strong>y maybecome elevated when o<strong>the</strong>r types of lesions occur). For example,increases in <strong>the</strong> tubular biomarkers might also be observed in <strong>the</strong>context of o<strong>the</strong>r kidney lesions that occur concomitantly, such asinterstitial fibrosis, proximal tubular dilatation or glomerular alterations.Therefore, it can often be difficult to determine whe<strong>the</strong>r <strong>the</strong>selesions also contribute to <strong>the</strong> increase in tubular biomarkers. 
For example, primary glomerular injury can lead to subsequent tubular injury and interstitial fibrosis due to hemodynamic changes or protein overload. As a result, the concentrations of a ‘tubular’ biomarker may become elevated, even if the primary lesion is glomerular. In this case, the tubular biomarker result is positive because there is real tubular injury. Thus, simple light microscopy is not sufficient to determine whether glomerular injury might also be causing the elevation of the tubular biomarker concentration. As a consequence, although the biomarkers identified by the PSTC may be sensitive for the lesions for which they are qualified, and specific with respect to being minimally detectable or undetectable in control animals with uninjured kidneys, their presence in elevated amounts when other kidney lesions are present may make them appear to be less specific. In situ hybridization and immunohistochemistry techniques may be good methods for identifying the origin of the altered protein in a specific region of the kidney so that site specificity of the biomarkers can be better characterized.

Third, the PSTC also emphasized to the FDA and the EMEA the importance of the findings in the context of our current understanding of the mechanism of action of the urinary biomarkers under study. Traditionally, urinary albumin has mostly been regarded as a glomerular biomarker (ref. 4). In the PSTC’s qualification submission, however, urinary albumin was proposed as a tubular injury biomarker, which is consistent with recent literature demonstrating that a considerable amount of albumin filters through the glomeruli and is reabsorbed in the tubules in noninjured kidneys (ref. 5). Further investigations are thus required to examine the sensitivity of albumin for glomerular alterations in rat and other nonclinical models of glomerular injury.

Finally, β2-microglobulin in the clinical literature is recognized as a tubular biomarker (ref. 6), whereas β2-microglobulin in the PSTC qualification submission was proposed as a biomarker for glomerular injury that results in subsequent impairment of tubular reabsorption. The PSTC emphasized that these are not viewed as contradictory claims but instead reflect the proposed sequence of injuries in the kidney (glomerular injury leading to subsequent protein overload in the tubules, resulting in tubular injury and increased biomarker levels). In the qualification submission, no data from these studies supporting this sequence of events were submitted, but reference to the literature was made. Subsequent to the biomarker submission, supporting immunohistochemistry and in situ hybridization data for β2-microglobulin and cystatin C in these animals were generated, and these data confirm this mechanistic understanding (see ref. 7).

The rolling biomarker qualification concept. The PSTC submission also proposed the rolling or incremental concept that ‘valid’ should be interpreted as “qualified in a defined context, the breadth of which is determined by the available data.” In this context, a new biomarker ‘incremental qualification’ (or ‘rolling qualification’) process was defined as starting with a data-driven qualification claim in a narrow, well-defined context. This rolling qualification process allows drug development scientists to apply the biomarkers as regulatory tools very early within this context. The context can be subsequently revised, expanded or both, in the light of new scientific results in the toxicology-biomedical fields that provide the following: additional insight into specificity and sensitivity claims; additional manifestations of tissue pathologies or dysfunctions; evidence of effective use of a biomarker in a panel of other markers (that is, composites); or expansion of use to other species, including humans (ref. 2).

FDA-EMEA conclusions and recommendations

The PSTC biomarker qualification data package was reviewed by the Biomarker Qualification Review Teams (BQRTs) at the FDA (F.G., D.J.-K., A.F.D., E.A.H., M.B., A.T., P.H., D.T., S.X., W.T. and N.X.) and the EMEA (M. Papaluca, J.-M.V., E.A., R.M., S.V., B.F. and B.S.L.), with support from the EMEA Pharmacogenomic Working Party (P.K., M. Pasanen and K.P.).
As a result of their deliberations, the BQRTs drew several conclusions.

First, the biomarkers described in the submission could be used voluntarily as additional evidence in conjunction with traditional biomarkers (e.g., BUN and SCr) and histopathology for detecting the drug-induced lesions proposed by the PSTC for each biomarker (urinary KIM-1, albumin, TFF3 and CLU for tubular alterations and urinary total protein, cystatin C and β2-microglobulin for alterations or damage of glomeruli and impairment of kidney tubular reabsorption) in the rat.

Second, most of the biomarkers in the PSTC submission show better sensitivity and specificity than BUN and SCr, and all add additional complementary information to assays of BUN and SCr.

Third, to gather additional data to characterize the usefulness of these renal biomarkers in monitoring drug-induced renal toxicity in humans, the use of biomarkers in clinical trials may be considered on a case-by-case basis. Using such novel renal biomarkers in early clinical trials for renal toxicity monitoring may represent a reasonable risk for the development of promising therapies that would otherwise be abandoned, depending on a risk-benefit analysis. For the present, though, sponsors of a submission and regulators need to decide on a case-by-case basis how best to implement these biomarkers in the clinical development program, with the prerequisites of preclinical demonstration of reversibility of both biomarker levels and histopathology and the establishment of prespecified cutoff values (thresholds).

Finally, it was stressed that the general use of these biomarkers for monitoring nephrotoxicity in the clinical setting cannot be recommended, at least until further data become available that demonstrate the correlation of these biomarkers with the evolution of drug-induced lesions and their reversibility. Although the translation of these new renal biomarkers to humans is a final goal, they are not currently qualified to be used as primary renal injury monitoring tests or dose-stopping criteria.

PSTC response to FDA-EMEA recommendations

The BQRT review of the PSTC submission not only revealed some specific needs for full understanding of the seven renal safety biomarkers; it also provided valuable pointers applicable to biomarker qualification in general.
The limitations of the submission as determined by the BQRTs during review and the responses of the PSTC are detailed below.

One criticism was that insufficient data were provided to address the temporal relationship between biomarker levels and the emergence and recovery of the histopathological alterations. The PSTC designed the studies underlying the submitted data packages with different termination time points, which allow temporal correlations between biomarker levels and pathologies within each study. All data were submitted, but no specific data analyses investigating temporal effects or chronology of biomarker changes were performed. Instead, data analyses focused on investigating the diagnostic performance independent of time.

In light of this issue, the PSTC intends to provide more data and analyses geared toward investigating the temporal effects of biomarkers in future submissions. At the time of the original submission, only two studies on a few biomarkers were included that demonstrated reversibility of the biomarkers in parallel to the recovery of renal injury. Subsequently, additional studies with recovery time points have been performed. The data will be part of a future biomarker qualification submission (ref. 3).

Other limitations identified by the BQRTs primarily related to insufficiencies in the histopathological data. There was a lack of data on use of lower doses of nephrotoxicants that would cause less extensive histopathological damage, and there was a lack of immunohistochemistry or other data showing localization of the biomarker to specific regions of the kidney. The BQRTs recommended the inclusion of data with additional multiple nephrotoxic and non-nephrotoxic compounds to broaden the understanding of the generality of the conclusions using doses that would cause lesser degrees of toxicity and additional immunohistochemistry studies to demonstrate the localization of the biomarkers to the damaged areas of the kidney. Only a few nephrotoxic compounds were included in the analyses.

Although the PSTC members agree with these recommendations, it was felt that immunohistochemistry and in situ hybridization data provide value to support the mechanistic evidence and claims for the localized monitoring of injury only in the case of de novo expression biomarkers. For functional biomarkers and leakage biomarkers, the localization is of limited to no particular value, except for ruling out de novo intrarenal synthesis. Also, for a limited set of specific qualification studies, several slides of different tissue per organ might be provided to ensure that focal lesions are not overlooked.

The BQRTs pointed to gaps in the PSTC submission data in several other areas: first, there was a lack of data on types of kidney disturbances not related to histopathology (e.g., inhibition of transporters in the proximal tubule) but resulting in glucosuria or aminoaciduria; second, the variety of compounds and tissues tested was insufficient to establish the specificity of the biomarkers; and third, only two nephrotoxicant drugs and no non-nephrotoxicant drugs were assessed at all three sites. The BQRTs recommended that future submissions should include an evaluation of performance characteristics by exposing the same strains of rats to additional nephrotoxicant drugs and non-nephrotoxic drugs from different mechanistic classes and in altered physiological conditions.

The PSTC members responded by agreeing that the data submitted showed limitations in terms of the number of studies of drug-induced toxicity and particularly in the number of studies looking at organ specificity and kidney disturbances other than proximal tubular and glomerular injuries. Even so, the PSTC hopes that current interest in renal biomarkers will promote the implementation of the biomarkers in new drug studies and generate further supportive evidence. In the spirit of the incremental qualification concept, additional data will help to increase the confidence in use of these biomarkers, their context of use and also the awareness of their limitations. With the current available evidence for these biomarkers, the PSTC proposed a voluntary use of the new biomarkers in addition to the current standards (SCr and BUN) and in addition to standard histopathology assessment for nonclinical evaluation of new drugs.

As stated already, the new renal biomarkers described in the PSTC submission are qualified only for use in the rat and not any other animal species. The PSTC believes that the widespread use of the biomarkers in rat and human studies should encourage different stakeholders to systematically develop and extend the use of assays for other species and to perform studies supporting the qualification of the biomarkers in other species. The qualification of biomarkers for rats and humans may ease the burden of evidence needed for the qualification in other species.
For example, a limited number of bridging studies might be used to support the use of the biomarkers in other species.

Proposed best practices for biomarker qualification

The PSTC submission established several practices for qualifying the renal biomarkers under consideration. These best practices are detailed below.

First, the results from studies were summarized using receiver operating characteristic (ROC) curves, which are plots of true positive rate (sensitivity) against the rate of false positives (1 – specificity) for a continuous variable against a specific reference standard. The corresponding area under the curve (AUC) is a measure of the diagnostic performance of a biomarker. Using histopathology as the reference standard, the diagnostic performance, as measured by the AUC of the ROC curve, of the proposed biomarkers and of the current diagnostic clinical standards SCr and BUN was evaluated and compared. For the claim that a marker “outperforms SCr and BUN,” the FDA, EMEA and PSTC agreed to apply a statistical method for comparison of AUCs originally published by DeLong et al. (ref. 8). It was also agreed that supplementary ROC analyses limited to subsets of histopathology grades (controls versus low grades) allowed systematic evaluations of the performance of the biomarkers for subtle grades of toxicity, although sample numbers could be limiting for some subset analyses.

It should be noted that the evaluation of the biomarker performance via AUC is independent from specific (pre-)defined sensitivity-specificity combinations and thus is most appropriate for an evaluation independent of a specific context of use. A specific context of use, by contrast, typically implies a fixed biomarker threshold with an associated sensitivity and specificity. For specific contexts of use, it is crucial that the necessary tradeoff between false positives (decreased specificity) and true positives (increased sensitivity) is explicitly addressed and a judgment is made reflecting a specific choice of specificity-sensitivity combination. For example, a specific target sensitivity (or ranges thereof) might be predefined, and the corresponding specificity (which can be derived from the ROC curve) is a more accurate measure of the performance of a biomarker for the intended use with the specific target sensitivity. In addition, for a diagnostic application in broad clinical contexts, taking the prevalence of organ injuries or diseases into account may be a better way of assessing the diagnostic utility of a test, especially in low-prevalence conditions (e.g., drug-induced toxicity), because misclassifications are more likely in diseases of low prevalence. In this way, decision thresholds based on positive-likelihood ratios (true positive rate/false positive rate) or negative-likelihood ratios (false negative rate/true negative rate) can be defined.
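The quantities named above can be illustrated with a minimal Python sketch. It computes an AUC as a rank statistic, the sensitivity and specificity at one fixed threshold, the two likelihood ratios, and a prevalence-adjusted post-test probability. The function names, the biomarker values and the 5% prevalence are hypothetical and are not taken from the PSTC submission; the DeLong comparison of two AUCs is not implemented here.

```python
from typing import Dict, Sequence


def roc_auc(scores: Sequence[float], injured: Sequence[bool]) -> float:
    """AUC as the probability that an injured animal has a higher biomarker
    value than a control (ties count one half)."""
    pos = [s for s, y in zip(scores, injured) if y]
    neg = [s for s, y in zip(scores, injured) if not y]
    if not pos or not neg:
        raise ValueError("need at least one injured and one control animal")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def threshold_metrics(scores: Sequence[float], injured: Sequence[bool],
                      threshold: float) -> Dict[str, float]:
    """Sensitivity, specificity and likelihood ratios when values at or
    above the threshold are called positive."""
    pairs = list(zip(scores, injured))
    tp = sum(1 for s, y in pairs if y and s >= threshold)
    fn = sum(1 for s, y in pairs if y and s < threshold)
    fp = sum(1 for s, y in pairs if not y and s >= threshold)
    tn = sum(1 for s, y in pairs if not y and s < threshold)
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one injured and one control animal")
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    # Positive LR = true positive rate / false positive rate.
    lr_pos = sens / (1 - spec) if spec < 1 else float("inf")
    # Negative LR = false negative rate / true negative rate.
    lr_neg = (1 - sens) / spec if spec > 0 else float("inf")
    return {"sensitivity": sens, "specificity": spec,
            "lr_pos": lr_pos, "lr_neg": lr_neg}


def post_test_probability(prevalence: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability (prevalence) into a post-test
    probability using odds times likelihood ratio."""
    pre_odds = prevalence / (1 - prevalence)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Hypothetical urinary biomarker values and histopathology calls.
values = [0.2, 0.4, 0.5, 0.9, 1.1, 1.8, 2.5, 3.0]
lesion = [False, False, False, True, False, True, True, True]

auc = roc_auc(values, lesion)
metrics = threshold_metrics(values, lesion, threshold=1.0)
ppv_like = post_test_probability(0.05, metrics["lr_pos"])
print(auc, metrics, ppv_like)
```

With these made-up values the AUC is 0.9375, while the single threshold of 1.0 yields a sensitivity and specificity of 0.75 each. The AUC summarizes all thresholds at once, whereas a context of use fixes one of them, and at a 5% prevalence a positive-likelihood ratio of 3 still leaves a post-test probability well below one half.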
To assess whether a newly identified biomarker adds information to current standards when used together, the PSTC opted to assess the degree of improvement by employing likelihood ratio tests of logistic models of the current standards with and without the new biomarker (ref. 9).

Second, an approach was agreed on as to how to correct for multiple statistical testing in the context of the nonclinical biomarker qualification studies. Ultimately, the PSTC and BQRTs agreed that for the early assessment of the performance characteristics of several biomarkers within the same sample for the same site of histopathological injury, no multiplicity correction would be required. This agreement was made on the basis that the biomarkers in the PSTC submission were evaluated as separate entities. Each biomarker was analyzed using the same urine samples and the same histological specimens. This approach has the advantage that it allows reuse of stored samples for further qualifications of additional biomarkers and the exchange of samples for cross-validation purposes without statistical penalties. Conversely, in situations wherein the performance of a single biomarker is tested using the same data or sample set for various pathologies (e.g., for testing a biomarker for glomerular injury, tubular injury, collecting duct injury or interstitial inflammation), the PSTC and BQRTs agreed that a multiplicity correction should be considered.
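The text does not state which multiplicity correction was applied. Purely as an illustration of what correcting one biomarker's results across several pathologies can look like, the sketch below implements two standard adjustments, Bonferroni and Holm. The choice of these methods, the function names and the p-values are assumptions made for this example.

```python
from typing import List, Sequence


def bonferroni_adjust(p_values: Sequence[float]) -> List[float]:
    """Multiply every p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, m * p) for p in p_values]


def holm_adjust(p_values: Sequence[float]) -> List[float]:
    """Holm step-down adjustment: the smallest p-value is multiplied by m,
    the next by m - 1, and so on, keeping the adjusted values monotone."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[i])
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Hypothetical p-values for ONE biomarker tested against four pathologies.
pathologies = ["glomerular", "tubular", "collecting duct", "interstitial"]
raw_p = [0.004, 0.03, 0.20, 0.04]

for name, raw, holm in zip(pathologies, raw_p, holm_adjust(raw_p)):
    print(f"{name}: raw p = {raw:.3f}, Holm-adjusted p = {holm:.3f}")
```

Holm is never more conservative than Bonferroni, which is why it is shown alongside it; either would count as one possible correction in the situation described, not as the method the consortium used.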
To comply with this agreement on multiplicity, the PSTC corrected the test results of the glomerular biomarkers in its first qualification submission for multiple testing because the analyses assessed both tubular and glomerular injuries. It should be noted that ‘binning’ pathologies (integration of subterms and subfindings) does not require a multiplicity adjustment per se.

Third, agreement was also needed as to the optimal method for normalizing urinary biomarkers to account for different dilutions of urine (which exhibit biological variation of up to 20-fold). The PSTC proposed the normalization of urinary biomarker values to urinary creatinine concentrations in the same sample, which is also standard in clinical practice. Literature reviews by PSTC and BQRTs led to the conclusion that creatinine normalization is commonly used and would therefore be an acceptable method, to be interpreted in the light of certain conditions (e.g., non–dilution-related variations of urinary creatinine due to muscle breakdown, dietary restrictions, changes of tubular creatinine secretion and renal injury, even moderate levels of which can result in a decrease in urinary creatinine concentration). Because renal injury can result in a decrease in urinary creatinine, biomarkers whose amounts decrease with renal injury are not recommended to be normalized relative to urinary creatinine concentrations (as was the case observed by the PSTC for the renal biomarker TFF3).

Fourth, the PSTC selected analytical methods of optimal reliability and reproducibility for the task at hand.
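Returning briefly to the third practice, the creatinine normalization it describes amounts to a simple ratio. The following sketch shows why the ratio cancels urine dilution and how a marker that decreases with injury might be left unnormalized. The function names, units and concentrations are hypothetical.

```python
def normalize_to_creatinine(biomarker_conc: float, creatinine_conc: float) -> float:
    """Biomarker amount per unit of urinary creatinine in the same sample."""
    if creatinine_conc <= 0:
        raise ValueError("creatinine concentration must be positive")
    return biomarker_conc / creatinine_conc


def reported_value(biomarker_conc: float, creatinine_conc: float,
                   decreases_with_injury: bool) -> float:
    """Skip normalization for markers that fall with renal injury, because
    urinary creatinine can fall with injury too and would mask the change."""
    if decreases_with_injury:
        return biomarker_conc
    return normalize_to_creatinine(biomarker_conc, creatinine_conc)


# Same hypothetical animal, two urine samples that differ only by 4-fold dilution.
dilute = normalize_to_creatinine(biomarker_conc=2.0, creatinine_conc=0.5)
concentrated = normalize_to_creatinine(biomarker_conc=8.0, creatinine_conc=2.0)
print(dilute, concentrated)  # both ratios are equal
```

The flag `decreases_with_injury` is an assumption of this sketch; it encodes the recommendation given for TFF3 rather than any procedure documented in the submission.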
The purpose of analytical method validation is to demonstrate that a particular method used for quantitative measurement of a parameter is analytically reliable and reproducible for its intended use. In the case of the renal biomarker submission, validation of assays was consistent with the Bioanalytical Method Validation Guidance for Industry (http://www.fda.gov/downloads/drugs/guidancecomplianceregulatoryinformation/guidances/ucm070107.pdf), but importantly, the PSTC elected to alter the acceptance criteria for accuracy and precision. This validation process is in general agreement with the recently proposed concept of a fit-for-purpose validation (ref. 10). On a case-by-case basis, performance decisions should account for the relationship of assay imprecision to the proposed ranges for normal and ‘positive’ values and how defined imprecision affects the sensitivity and specificity performance of the assay. For example, imprecision of up to 30% around the mean might be considered acceptable for a biomarker assay if there is a large fold difference between positive and negative biomarker values and minimal-to-moderate biological variability; conversely, it may not be acceptable when the difference between positive and negative values is small. In general, the acceptable range of analytical imprecision may be determined by its influence on the confidence of diagnostic toxicity prediction.
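The interplay between imprecision and fold difference can be made concrete with a deliberately simplified calculation. The sketch below treats imprecision as a symmetric band around each mean and asks whether the bands of negative and positive values overlap. This overlap criterion, the function names and the numbers are assumptions for illustration only, and biological variability is ignored.

```python
def bands_overlap(negative_mean: float, positive_mean: float,
                  imprecision: float) -> bool:
    """True if the upper edge of the negative band reaches the lower edge
    of the positive band, with imprecision given as a fraction of the mean."""
    negative_upper = negative_mean * (1 + imprecision)
    positive_lower = positive_mean * (1 - imprecision)
    return positive_lower <= negative_upper


def min_fold_for_separation(imprecision: float) -> float:
    """Smallest positive/negative fold difference at which the two bands
    just touch under the simplified criterion above."""
    if not 0 <= imprecision < 1:
        raise ValueError("imprecision must be in [0, 1)")
    return (1 + imprecision) / (1 - imprecision)


# 30% imprecision with a 10-fold versus a 1.5-fold difference (made-up means).
print(bands_overlap(1.0, 10.0, 0.30))   # large fold difference
print(bands_overlap(1.0, 1.5, 0.30))    # small fold difference
print(min_fold_for_separation(0.30))
```

Under this criterion a 30% imprecision needs a fold difference of about 1.86 before the bands separate, which is one way to read the statement that the same imprecision can be acceptable for one marker and unacceptable for another.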
Key parameters for the analytical validation are accuracy, precision, selectivity, sensitivity, reproducibility, stability, measurements in the biological matrices with analyte spiking and testing of the most relevant cross-reactants and interferences. Interference testing should be determined on a case-by-case basis because some potential interference may not have to be tested in a fit-for-purpose context.

Fifth, the PSTC developed a common standardized histopathology lexicon (ref. 11) to both maximize reproducibility and comparability between different studies and sites and streamline communication of results with the regulatory agencies. This harmonization was crucial in this biomarker qualification, in that biomarker performance was evaluated using histopathology as a gold standard.

Sixth, data generated by toxicity studies, such as standard clinical chemistry parameters and in vivo data (e.g., body weight), were found to be helpful in supporting the qualification of biomarkers. If these data are available, they should be submitted not only as part of study reports (as in the case of the PSTC qualification submission) but electronically as well.
If, on <strong>the</strong> o<strong>the</strong>r h<strong>and</strong>, qualifiedbiomarkers are to be used in regular drug development studies (e.g.,GLP studies), which are not part of a biomarker qualification package,such data would not be required (or needed for <strong>the</strong> evaluationof biomarkers changes per se).Finally, data <strong>and</strong> reports should be available in a format that isst<strong>and</strong>ardized, readily interpretable <strong>and</strong> accessible, facilitating <strong>the</strong>irassessment by reviewers at regulatory agencies. This is particularlynoteworthy in cases of multiparty qualification efforts, such as consortia,where multiple entities with varying data submission originscontribute to <strong>the</strong> qualification submission. Text <strong>and</strong> tables in reportsshould facilitate copying by regulatory reviewers for preparationof <strong>the</strong>ir reports. The forthcoming International Conference forHarmonization E16 guidance should help to harmonize <strong>the</strong> formatof biomarker qualification data submissions. Raw data should beprovided if requested.Consensus is needed with respect to <strong>the</strong> optimal format for <strong>the</strong>submission of raw nonclinical <strong>and</strong> clinical biomarker data. For <strong>the</strong>PSTC submission, a common spreadsheet format (e.g., MicrosoftExcel) was employed. However, this format is an interim, suboptimalsolution until a common database <strong>and</strong> data submission format can benature biotechnology volume 28 number 5 may 2010 459
established for nonclinical and clinical biomarker data. A first dialog between key stakeholders from health authorities, industry, academic and patient organizations has noted that such a data submission format and database should comprise much more than raw biomarker values, such as subject-level data, study protocols, demographics, index drug or disease conditions, concomitant medications, traditional and novel kidney biomarker performance data, assay information and histopathology data.

Outstanding issues

Much of the debate between the regulatory agencies and the PSTC over the renal biomarkers submission centered on the need for blinded histopathology assessments and the exclusion from data analyses of animals that might yield false-positive results. For both points, the discussions aided and advanced the development of several scientific proposals and procedures on the best path forward; however, no consensus has yet been reached.
Therefore, further research and open discussions in the scientific community are anticipated around these two key issues.

The reviewers from the regulatory agencies stated concerns over potential study bias at the point of data collection involving the histopathological evaluation of the toxicology study sample slides. In evaluating a new biomarker, as opposed to a drug, the BQRTs caution against possible bias that could result from the pathologist having knowledge of the study design (e.g., treatment assignment) or results. Even if a pathologist is ‘blinded’ to specific novel biomarker results, any additional data, such as comparator biomarker results or treatment, provided to the pathologist may impart clues that could influence, consciously or subconsciously, the evaluation of the slides. The BQRTs thus hold the evaluation of the histopathology for biomarker qualification to a different standard from that used for safety determinations in drug development programs. For this reason, the BQRTs consider it critical that blinded evaluations of both histopathology and biomarkers be used in prospective studies of these biomarkers.

The PSTC does not share this concern about study pathologist bias, provided the pathologist is working with clear objective criteria for lesion diagnosis and grading, and is fully trained and experienced in veterinary toxicological pathology.
The PSTC believes, in line with the guideline published by the Society of Toxicologic Pathology, that knowledge of dose-group assignment, gross observations at necropsy, organ weight changes, in-life data and results from clinical chemistry are important for the most precise histological evaluation of slides for biomarker qualification studies. However, the pathologist should be blinded to the data on the biomarkers undergoing qualification to provide an unbiased analysis of biomarker diagnostic performance. For biomarker qualification studies, the PSTC proposes the standard Society of Toxicologic Pathology procedure for the evaluation of histopathology slides. In addition, an assessment of bias to provide assurances that no significant error has been introduced can be conducted on full sets of slides by mutually chosen experienced third-party pathologist(s) from a subset of qualification studies of interest, which would be selected by reviewers from the relevant health authorities. Importantly, this procedure would also facilitate the use of samples and histopathology assessments from standard GLP animal toxicology studies conducted to support product marketing authorizations in biomarker qualification exercises.
This would eliminate the need to conduct additional customized animal studies, which would unnecessarily delay qualification submissions and increase animal use and overall costs.

As to contention over the exclusion from data analyses of animals that could yield false-positive results, the PSTC submission contained the results of two types of statistical performance analyses, termed the inclusion analysis and the exclusion analysis. The inclusion analysis includes samples of all animals, whereby animals with histopathology findings are treated as positives and animals without histopathology findings are treated as negatives, independent of treatment groups. In the exclusion analysis, samples from nephrotoxicant-treated animals that did not have a positive composite kidney histopathology score were excluded to avoid misinterpretation as to (i) whether a biomarker is exhibiting prodromal characteristics (biomarkers providing a more sensitive and earlier signal relative to histopathological findings), (ii) whether histopathology is providing a false-negative result (e.g., due to lesions outside of the section used for histopathology examination) or (iii) whether the biomarkers are providing a false-positive signal. This systematic and unbiased approach might avoid the challenges associated with resolving these discrepancies when there is a biomarker signal in the absence of histopathological signs of renal injury in this subset of animals.
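The difference between the two analyses can be illustrated with a small sketch. It is not the PSTC's actual statistical pipeline: the `Animal` record, its field names, the toy biomarker values and the rank-based AUC estimate are all assumptions made for illustration. It only shows how removing treated animals without a histopathology finding changes the set of negatives against which a biomarker is scored.

```python
from dataclasses import dataclass

@dataclass
class Animal:
    # Hypothetical record; field names are illustrative, not from the submission.
    treated: bool         # received a nephrotoxicant
    histo_positive: bool  # positive composite kidney histopathology score
    biomarker: float      # urinary biomarker level (arbitrary units)

def inclusion_set(animals):
    """Inclusion analysis: keep every animal; histopathology alone
    defines positives and negatives, independent of treatment group."""
    return list(animals)

def exclusion_set(animals):
    """Exclusion analysis: drop nephrotoxicant-treated animals that have
    no positive histopathology score."""
    return [a for a in animals if a.histo_positive or not a.treated]

def auc(animals):
    """Rank-based AUC estimate: fraction of (positive, negative) pairs in
    which the positive animal has the higher value; ties count as 0.5."""
    pos = [a.biomarker for a in animals if a.histo_positive]
    neg = [a.biomarker for a in animals if not a.histo_positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative animal")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data (invented): two controls, two treated animals with lesions, and
# one treated animal with a raised biomarker but no histopathology finding.
animals = [
    Animal(treated=False, histo_positive=False, biomarker=1.0),
    Animal(treated=False, histo_positive=False, biomarker=1.2),
    Animal(treated=True, histo_positive=True, biomarker=3.0),
    Animal(treated=True, histo_positive=True, biomarker=2.5),
    Animal(treated=True, histo_positive=False, biomarker=2.8),
]

print("inclusion AUC:", auc(inclusion_set(animals)))
print("exclusion AUC:", auc(exclusion_set(animals)))
```

In this toy data set the single treated animal with a biomarker signal but no lesion counts as a false positive in the inclusion analysis and is simply absent from the exclusion analysis, which is why the two AUC values differ.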
The intent of this exclusion approach is to avoid penalizing biomarkers of renal injury that may be giving earlier and more sensitive signals and to deliver more conservative thresholds mainly determined by the variance of biomarker levels in control animals.

In contrast, the BQRT reviewers felt that “no data established in a sufficient number of animals evaluated with an adequate number of histopathology sections indicates that positive values for a biomarker are predictive of subsequent histopathology. Therefore, the BQRT preferred to draw conclusions based on the PSTC ‘inclusion’ analysis in which all animals were evaluated.”

The PSTC and BQRT reviewers discussed three additional types of investigations and studies that could be carried out to demonstrate whether biomarkers can be earlier indicators of toxicity than histopathology. The first of these would be a time-course study with different necropsy time points and with urine collections performed before this point in time in all animals. In addition, groups of animals with later necropsy time points could be sampled longitudinally in parallel and allowed to progress to functional renal injury. This would establish whether at early time points, biomarker signals are already present in the absence of histopathological evidence, and that these precede pathological-functional findings more classically associated with renal injury.

A second type of investigation would be to expand the histopathology studies.
The combination of immunohistochemistry, in situ hybridization, or both, with hematoxylin and eosin staining of adjacent slides from animals with negative histopathology, but positive biomarker readout, may demonstrate that, on a molecular level, considerable changes have already occurred and relate to biomarker changes, which are not visible by standard light microscopy.

A final line of investigation would be to perform a stepwise sectioning of the organ(s) of interest with subsequent histopathology assessment in selected animals with increased biomarker levels but negative initial histopathology assessment in one or several specific studies. This could help demonstrate whether focal lesions explain increased biomarker values, which appear prodromal with single-slide histopathology assessment. If it can be demonstrated that biomarkers do reflect a prodromal injury pattern, the exclusion analysis may become the more accepted tool for biomarker analysis and qualification.

Implications for preclinical and clinical contexts

The regulatory qualification of the renal toxicology biomarkers has broad implications for the use of the biomarkers in many phases of
drug development. In the longer term, they may also prove of use in the clinic.

Preclinical-translational contexts. The successful qualification of biomarkers will encourage their adoption by pharmaceutical companies in non-GLP and GLP animal toxicology studies as well as in certain clinical trial settings. In non-GLP studies the biomarkers can support the selection of drug candidates or help provide a better mechanistic understanding of drug-induced renal effects. In GLP studies, the biomarkers can help in assessing the safety of drugs as well as provide support for the translation of drugs into early human studies. In such a case, the biomarkers could permit the development of a drug formerly thought not to be viable because of the inability of available tests to adequately detect and monitor early renal injury. For example, a drug development candidate may cause renal toxicity in animals, but the relevance for toxicity in humans is unknown. If a sponsor can provide preclinical evidence that the biomarker signal can detect early signs of renal injury when full reversibility of the lesions upon cessation of the drug is possible, the progression of the drug development candidate into human trials might be facilitated after careful evaluation of the risk/benefit ratio, in discussion with health authorities.
In that case, the biomarkers would be carefully monitored in first-in-human studies, and appropriate actions could be taken on the basis of the biomarker signals and established decision thresholds.

The regulatory qualification of these renal biomarkers also means that prospective measurements of these biomarkers implemented in GLP studies in support of a medicinal product’s development must be reported to the health authorities (Guidance for Industry: Pharmacogenomic Data Submissions). In the case-by-case transition of a potentially nephrotoxic new drug from the preclinical setting into early human trials using these biomarkers, a close interaction with health authorities will be needed to evaluate the risk/benefit ratio and to discuss the optimal implementation of the new biomarkers into clinical trials.

Clinical contexts. The amount of clinical experience with the seven renal biomarkers in the PSTC submission as measures of acute drug-induced kidney injury varies widely 12. The FDA and EMEA concluded that the data included in the PSTC submission were insufficient to support their adoption in a general clinical application context. To move toward this goal, the PSTC, in consultation with the regulatory agencies, intends to develop and implement a series of clinical studies (both observational and interventional trials) in support of the incremental biomarker qualification process and attempt to expand the context of use for specific clinical applications.
The initial studies will include an observational longitudinal (6–12 months follow-up) study to assess the intrasubject and intersubject variability in baseline biomarker levels across well-characterized age- and gender-matched normal volunteers and ambulatory patients with underlying conditions known to predispose them to drug-induced nephrotoxicity. A subset of patients with currently active chronic kidney injury from multiple causes will also be included for comparative analysis. This study will provide a consolidated sample set from patients with well-characterized disease through use of long-standing electronic medical records. Results from these standardized and validated assays will in turn provide key data relating to biomarker baseline values and natural longitudinal variance of biomarkers within different subject populations, which are essential for the application of these biomarkers in future interventional trials.

The PSTC and regulatory authorities have also held discussions on the optimal design of studies using marketed drugs with known nephrotoxicity (e.g., intravenous contrast medium, gentamicin and cisplatin (Platinol)); it is anticipated that these studies will be initiated in parallel with clinical assay development and validation. Cross-validation of biomarker thresholds from these different studies could provide compelling evidence regarding the general use of biomarker signals across a range of patient populations and drug exposures.
Initial feedback on an observational study of biomarkers in the context of contrast medium–induced nephrotoxicity in patients undergoing cardiac catheterization (high-risk patients) was provided at the first PSTC-FDA protocol development workshop on clinical biomarker qualification held on 10 October 2008. A key issue with respect to the design and analysis of these clinical studies relates to the poor sensitivity and specificity of the currently accepted gold standard, SCr, as a marker of acute kidney injury. In contrast to preclinical studies, kidney biopsies are not generally considered standard of care in disease populations that would be eligible for future clinical trials evaluating the accuracy of biomarker signals and decision thresholds. As such, much discussion has focused on the definition of acute kidney injury, the need to collect data on other traditional markers of renal injury (e.g., urine analysis and microscopy and urine electrolytes) and a possible adjudication process for identifying cases of likely acute tubular injury (as distinguished from a prerenal or hemodynamically mediated rise in creatinine).
Given the lack of sensitivity of SCr to renal injury, discussion has also focused on the meaning and interpretation of biomarker elevations that occur in the absence of an increase in SCr.

As confidence in these new biomarkers and corresponding emerging decision thresholds increases, clinical trial designs addressing the intended use (that is, first-in-human studies in which a potential renal toxicity and safety concern regarding an investigational drug has to be closely monitored) will be more commonly developed and tested. Early discussions on the design and implementation of such trials are now underway. The EMEA has recently put in place a dedicated process to provide both for scientific advice on future protocols for further development of biomarkers and for biomarker qualification opinions (see http://www.emea.europa.eu/pdfs/human/biomarkers/7289408en.pdf). Approaches on how to best implement these biomarkers in clinical studies for biomarker qualification and in further nonclinical and clinical development programs will be discussed on a case-by-case basis in the context of the new EMEA pathway for qualification scientific advice and with the FDA in the context of biomarker qualification data submissions.
It is hoped that these trials will soon be initiated through continued collaboration between the PSTC, regulatory agencies and other experts and interested parties.

Conclusions

Consistent with fit-for-purpose or progressive biomarker qualification, the PSTC submission has resulted in the regulatory qualification of seven renal safety biomarkers in the specific contexts of use supported by the submitted data. Going forward, this early qualification will allow the generation and submission of further data to expand the preclinical and clinical contexts of use of the biomarkers in incremental steps. The qualification of several of the biomarkers for a translational context of use on a case-by-case basis will also foster their implementation in preclinical studies and may facilitate the development of promising drug candidates that have the potential to address unmet medical needs but were previously thought to be ineligible for development because the current diagnostic methods fall short in their ability to detect early development of renal injury. Implementation of these biomarkers as exploratory biomarkers in
clinical trials may also provide further supportive evidence of their clinical utility. It is expected that qualification efforts for other novel kidney biomarkers and biomarkers for monitoring the safety of other organs will follow this first formal biomarker qualification.

Author Contributions

Members of the PSTC Nephrotoxicity Working Group compiling the submission for biomarker qualification: F.D., F.S., J.S.O., C.P.W., W.B., A.S., M.J.S., J.V., S.S., D.L.G., J.A.P., G.M., K.C., D.L., E.H., M.S., D.E., D.H., D.A.-C., Y.-Z.G., K.L.T., P.L.G., J.-M.V., S.T., D.B., D.R.-G., G.B., M.A.D., J.A., J.E.MD., L.S.-D., L.O., M.G., M. Papaluca, S.J., E.A.B., S.A.B., V.G.B., N.C., J.W., D.H., S.S., J.L., P.R., E.W. and W.M.; members of the FDA Biomarker Qualification Review Team, reviewing the submission for biomarker qualification: F.G., D.J.-K., A.F.D., E.A.H., M.B., A.T., P.H., D.T., S.X., W.T. and N.X.; members of the EMEA Biomarker Qualification Review Team, reviewing the submission for biomarker qualification: M. Papaluca, J.-M.V., E.A., R.M., S.V., B.F., B.S.L., P.K., M.
Pasanen and K.P.

Disclaimer

The views expressed in this article are the personal views of the authors and may not be understood or quoted as being made on behalf of or reflecting the position of the institutions, companies, the FDA and EMEA, or one of its committees or working parties.

COMPETING FINANCIAL INTERESTS

The authors declare competing financial interests: details accompany the full-text HTML version of the paper at http://www.nature.com/naturebiotechnology/.

Published online at http://www.nature.com/naturebiotechnology/.
Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/.

1. Goodsaid, F. & Papaluca, M. Evolution of biomarker qualification at the health authorities. Nat. Biotechnol. 28, 441–443 (2010).
2. Altar, C.A. et al. A prototypical process for creating evidentiary standards for biomarkers and diagnostics. Clin. Pharmacol. Ther. 83, 368–371 (2008).
3. Ozer, J.S. A panel of urinary biomarkers to monitor reversibility of renal injury and a serum marker with improved potential to assess renal function. Nat. Biotechnol. 28, 486–494 (2010).
4. Varghese, S.A. et al. Urine biomarkers predict the cause of glomerular disease. J. Am. Soc. Nephrol. 18, 913–922 (2007).
5. Comper, W.D., Hilliard, L.M., Nikolic-Paterson, D.J. & Russo, L.M. Disease-dependent mechanisms of albuminuria. Am. J. Physiol. Renal Physiol. 295, F1589–F1600 (2008).
6. Trof, R.J., Di Maggio, F., Leemreis, J. & Groeneveld, A.B. Biomarkers of acute renal injury and renal failure. Shock 26, 245–253 (2006).
7. Dieterle, F. et al. Urinary clusterin, cystatin C, β2-microglobulin and total protein as markers to detect drug-induced kidney injury. Nat.
Biotechnol. 28, 463–469 (2010).
8. DeLong, E.R., DeLong, D.M. & Clarke-Pearson, D.L. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 44, 837–845 (1988).
9. Harrell, F. Regression Modeling Strategies (Springer, New York, 2001).
10. Lee, J.W. et al. Fit-for-purpose method development and validation for successful biomarker measurement. Pharm. Res. 23, 312–328 (2006).
11. Sistare, F. et al. Towards consensus practices to qualify safety biomarkers for use in early drug development. Nat. Biotechnol. 28, 446–454 (2010).
12. Bonventre, J.V., Vaidya, V.S., Schmouder, R., Feig, P. & Dieterle, F. Next-generation biomarkers for detecting kidney toxicity. Nat. Biotechnol. 28, 436–440 (2010).


Articles

Urinary clusterin, cystatin C, β2-microglobulin and total protein as markers to detect drug-induced kidney injury

Frank Dieterle, Elias Perentes, André Cordier, Daniel R Roth, Pablo Verdes, Olivier Grenet, Serafino Pantano, Pierre Moulin, Daniel Wahl, Andreas Mahl, Peter End, Frank Staedtler, François Legay, Kevin Carl, David Laurie, Salah-Dine Chibout, Jacky Vonderscher & Gérard Maurer

Earlier and more reliable detection of drug-induced kidney injury would improve clinical care and help to streamline drug development. As the current standards to monitor renal function, such as blood urea nitrogen (BUN) or serum creatinine (SCr), are late indicators of kidney injury, we conducted ten nonclinical studies to rigorously assess the potential of four previously described nephrotoxicity markers to detect drug-induced kidney and liver injury. Whereas urinary clusterin outperformed BUN and SCr for detecting proximal tubular injury, urinary total protein, cystatin C and β2-microglobulin showed a better diagnostic performance than BUN and SCr for detecting glomerular injury. Gene and protein expression analysis, in situ hybridization and immunohistochemistry provide mechanistic evidence to support the use of these four markers for detecting kidney injury to guide regulatory decision making in drug development. The recognition of the qualification of these biomarkers by the EMEA and FDA will significantly enhance renal safety monitoring.

Drug-induced kidney injury frequently accounts for the loss of considerable money and time invested in drug development.
Moreover, more sensitive and noninvasive approaches to monitor and manage nephrotoxicity in clinical situations where kidney damage cannot be avoided would considerably improve on current options for patient care. However, the current standards to detect nephrotoxicity, SCr and BUN, are insensitive and of poor diagnostic value 1. Clearly, regulatory approval of better biomarkers for nephrotoxicity could not only help clinicians to modify treatment regimes for patients, but should also accelerate the rate with which much-needed drugs for many indications will enter the market 1,2.

Several renal safety biomarkers, mostly accessible as proteins in urine, have been proposed for monitoring renal safety in nonclinical settings and for human use 3–6. Nonetheless, although some reports demonstrate that these new biomarkers appear earlier and are more sensitive than BUN and SCr in specific situations, they are not accepted for regulated drug development, much less clinical use. As members of the Critical Path Institute’s Predictive Safety Testing Consortium Nephrotoxicity Working Group 7, we set out to address this issue by evaluating the potential of four previously described biomarkers of nephrotoxicity (urinary clusterin, urinary β2-microglobulin, urinary cystatin C and urinary total protein) for sensitive and specific detection of kidney toxicity.
These biomarkers monitor different renal functions and compartments of the kidney, which renders them ideal when used as a panel to monitor renal safety on a broad basis, as different mechanisms of nephrotoxicity can affect different parts of the kidney 1.

The first of the biomarkers we studied, the secreted isoform of clusterin, is a 76–80 kDa glycosylated protein with extensive post-translational modifications, such as glycosylation, cleavages and dimerization. In the context of kidney injury, clusterin has been suggested to play an anti-apoptotic role and to be involved in cell protection, lipid recycling, cell aggregation and cell attachment 8. Clusterin gene overexpression was induced by different types of kidney injury in glomeruli, tubules and papillae of rats and dogs as a result of drug nephrotoxicity 9–11, surgery and ischemia 12–15, and renal diseases 16. Changes of protein levels of clusterin have been observed in kidneys and in the urine in a few animal studies 11,12,15,16, as well as in humans 17,18. Although clusterin is expressed in several tissues, its molecular size prevents a filtration of clusterin in the kidney, thus rendering its urinary levels specific to kidney injury.

The second marker we studied, total urinary protein, has been highlighted as a diagnostic marker and as a factor predicting progressive loss of renal function in clinical and nonclinical contexts 19.
Increased urinary excretion of protein, typically referred to as proteinuria, results from alterations of the glomerular filtration barrier usually associated with damage of the glomerular podocytes 20.

Novartis Institutes for BioMedical Research, Novartis, Basel, Switzerland. Correspondence should be addressed to F.D. (frank.dieterle@novartis.com).
Received 13 October 2009; accepted 22 March 2010; published online 10 May 2010; doi:10.1038/nbt.1622

β2-Microglobulin, the third biomarker we investigated, is a 12-kDa polypeptide chain that is constantly synthesized throughout the body. It is filtered by the glomeruli and nearly completely reabsorbed and catabolized in the tubules so that only 0.3% of the filtered β2-microglobulin is found in the urine. Impairment of tubular uptake elevates urinary β2-microglobulin concentration up to several hundred fold. This occurs by two mechanisms. First, specific glomerular alterations, damage and disease cause high molecular weight protein leakage. This results in a high protein load in the tubules, which competes with tubular uptake of β2-microglobulin
and increases its excretion into urine. Second, tubular reabsorption can be directly impacted by treatment with drugs or caused by different tubular diseases 21,22.

The last of the markers we studied is cystatin C, a non-glycosylated low-molecular-weight protein of 13 kDa continuously produced by all nucleated cells 23. Cystatin C is directly filtered from blood in the glomerulus, and its serum levels are an ideal estimator of the glomerular filtration rate 24–27. Virtually all filtered cystatin C is reabsorbed and metabolized by the tubules (e.g., 99.5% in rats) 28. An impairment of reabsorption in proximal tubules by the same mechanisms described for urinary β2-microglobulin can lead to a several-hundred-fold increase of urinary levels in humans and rats 28–31.

We developed and validated assays for these biomarkers and measured their abundances in ten mechanistic time-course rat studies with eight nephrotoxicants known to induce different types of renal lesions and with two hepatotoxicants to investigate specificity.
The data, results and interpretations we report were submitted to the European Medicines Agency (EMEA) and US Food and Drug Administration (FDA) to qualify urinary clusterin for monitoring drug-induced proximal tubular injury, and urinary total protein, cystatin C and β2-microglobulin for monitoring drug-induced glomerular injury. We also report additional findings that support our understanding of the biomarker characteristics and their proposed contexts of use. These investigations were not part of the regulatory authority submission material, but were proposed during the regulatory review process to gain further mechanistic supporting evidence.

RESULTS

We assessed histopathology and levels of BUN and SCr for 739 animals in the ten studies and used multiplexed protein assays to measure urinary biomarker levels. A variety of renal lesions were induced by treatment with the eight nephrotoxicants, reflecting different modes of toxicity. In contrast, the two hepatotoxicants (alpha-naphthylisothiocyanate (ANIT) and the antihistamine methapyrilene) did not induce kidney injury (Supplementary Table 1).
The nephrotoxic drugs comprise different drug classes such as aminoglycosides (gentamicin), antibiotics (puromycin and vancomycin), chemotherapeutics (cisplatin and doxorubicin (Doxil, Adriamycin)), diuretics (furosemide (Lasix)), immunosuppressants (tacrolimus (Protopic, Prograf)) and lithium carbonate (Eskalith) used to treat bipolar disorders. Kidney injury was assessed by histopathology in which a systematic grading system (grade 0–5) and a controlled lexicon were applied to describe the types of lesions and their exact localization (ref. 32).

Diagnostic performance of urinary clusterin as proximal tubular injury biomarker

The diagnostic performances of the biomarkers BUN and SCr were summarized by receiver operating characteristic (ROC) curves, which are plots of the true-positive rate (sensitivity) against the false-positive rate (1 – specificity) for a continuous variable (biomarker) against a specific reference standard (histopathology). The corresponding area under the curve (AUC) is a measure of the diagnostic performance of the corresponding biomarker, whereby a perfect biomarker corresponds to an AUC of 1, and a biomarker not better than random guessing corresponds to an AUC of 0.5 (ref. 32). Using the samples of all studies, urinary clusterin had the highest diagnostic power (AUC of the exclusion analysis ROC of 0.88 (inclusion analysis, 0.85)) for the detection of proximal tubular injury and statistically outperformed the current peripheral standards SCr (0.73; inclusion analysis, 0.72) and BUN (0.79; inclusion analysis, 0.75) with P < 0.05 (Fig. 1 and Supplementary Tables 2 and 3). Additionally, when marker levels of animals with only low-grade kidney injury histopathology findings were compared with those observed in control animals (grade 1 and 2 versus grade 0, and grade 1 versus grade 0), urinary clusterin showed a better diagnostic performance than BUN and SCr (Fig. 1 and Supplementary Table 2). For a practical application with a decision threshold and associated sensitivity and specificity (and an easy assessment of false- and true-positive and false- and true-negative rates), a threshold for all three parameters was established to achieve a minimum specificity of 95%. For 95.2% specificity, clusterin showed a sensitivity of 69.7% with a threshold of 1.85-fold increase (473 ng/g SCr), BUN showed a sensitivity of 50.8% for a threshold of 1.20-fold increase, and SCr showed a sensitivity of 40.2% for a threshold of 1.15-fold increase for the exclusion analysis.

For a graphical assessment of the marker levels with respect to tubular injury on an animal-by-animal basis for all 739 animals in the ten studies, the concentration levels of clusterin, BUN and SCr are displayed in Figure 2c–e, expressed as fold-changes. Clusterin levels generally correlated with the severity grades of proximal tubular injury. Compared to SCr and BUN, urinary clusterin detected most animals having proximal tubular injury (non-red dots above the threshold) in the cisplatin, gentamicin, vancomycin, tacrolimus, puromycin and doxorubicin studies. In addition, a number of treated animals (mainly low doses and early time-points) without proximal tubular injury observed by histopathology were above the clusterin threshold, suggesting that increased urinary clusterin levels occurred earlier and possibly with lower doses than positive histopathology readouts. Systematically increased levels of clusterin and SCr for the high-dosed lithium group indicate that clusterin can also be a sensitive marker for the drug-induced collecting-duct injury in this study. Finally, clusterin did not show false-positive results in the specificity studies with the hepatotoxicants methapyrilene and ANIT, in contrast to SCr, whose false-positive serum elevation can be attributed to a drug-induced muscle breakdown.

Figure 1 Receiver operating characteristic (ROC) analyses for animals from all ten rat studies demonstrating sensitivity and specificity of BUN, SCr and urinary clusterin with respect to a composite histopathology score for tubular injury. (a–c) Data included were from all histopathology grades (a), histopathology grades 0–2 (b) and histopathology grades 0 and 1 (c). (d) AUC of BUN, SCr and urinary clusterin compared to the ‘gold standard’ histopathology, for the different histopathology grade subsets and corresponding standard errors represented by error bars. Animal numbers, n. Negative: n = 289. Positive: all, n = 132; 0 to 2, n = 129; 0 to 1, n = 94.
[Panels a–c plot sensitivity against 1 – specificity. AUC values in the panel legends: (a) random 0.500, SCr 0.733, BUN 0.788, clusterin 0.877; (b) random 0.500, SCr 0.726, BUN 0.783, clusterin 0.879; (c) random 0.500, SCr 0.669, BUN 0.755, clusterin 0.843.]
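The ROC quantities used in this section can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the analysis pipeline behind the submission: the function names `roc_auc` and `threshold_for_specificity` are hypothetical, the fold-change values are invented toy data, and an animal is assumed to be called positive when its marker value exceeds the cutoff.

```python
from typing import Sequence, Tuple


def roc_auc(negatives: Sequence[float], positives: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive animal has a higher value than a randomly chosen negative
    animal (ties count one half)."""
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def threshold_for_specificity(
    negatives: Sequence[float],
    positives: Sequence[float],
    min_specificity: float,
) -> Tuple[float, float, float]:
    """Lowest cutoff whose specificity reaches min_specificity.

    Assumption: a value strictly above the cutoff is a positive call.
    Returns (cutoff, specificity, sensitivity).
    """
    for cutoff in sorted(set(negatives)):
        specificity = sum(v <= cutoff for v in negatives) / len(negatives)
        if specificity >= min_specificity:
            sensitivity = sum(v > cutoff for v in positives) / len(positives)
            return cutoff, specificity, sensitivity
    raise ValueError("no cutoff reaches the requested specificity")


# Invented fold-changes: animals without and with a histopathology finding.
controls = [0.8, 0.9, 1.0, 1.0, 1.1, 1.2, 1.3, 1.5, 1.7, 2.0]
injured = [1.4, 1.9, 2.5, 3.0, 6.0]

print(roc_auc(controls, injured))                         # 0.92
print(threshold_for_specificity(controls, injured, 0.9))  # (1.7, 0.9, 0.8)
```

In the toy data, requiring at least 90% specificity places the cutoff at a 1.7-fold change, at which four of the five injured animals are detected. The same trade-off underlies the 95%-specificity thresholds reported above.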


Figure 2 Marker levels for proximal tubular injury. (a–e) Correlation of clusterin mRNA in kidney (a), of clusterin protein in kidney (b), of clusterin protein in urine (c), of BUN (d) and of SCr (e) with severity grades of histopathology for 739 animals in all ten studies. All values are represented as fold-changes versus the average values of study-matched and time-matched control animals on a logarithmic scale (a–c) and on a linear scale (d,e). The animals are ordered by study, within each study by dose group (with increasing doses) and within each dose group by termination time point (with increasing time). The symbols and the colors represent the histopathology readout for proximal tubular damage for the corresponding animal, from grades 0–3, on a 5-grade scale, with 0 denoting no histopathology finding observed. The magenta lines represent the thresholds determined for 95% specificity in the ROC exclusion analysis of peripheral biomarkers for all histopathology grades (1.854 for urinary clusterin, 1.203 for BUN and 1.148 for SCr).
[Panel titles: (a) clusterin gene expression in kidney; (b) clusterin protein expression in kidney; (c) clusterin protein levels in urine.]

Diagnostic performance of urinary cystatin C, β2-microglobulin and total protein to detect glomerular injury

We observed drug-induced glomerular alterations in the puromycin and doxorubicin studies. For urinary cystatin C, β2-microglobulin, BUN and SCr, the results of the ROC analyses are displayed in Figure 3 and in Supplementary Tables 3 and 4.
In terms of AUC of the ROC analysis, all three urinary biomarkers, cystatin C (0.92; inclusion, 0.90), β2-microglobulin (0.89; inclusion, 0.89), and total protein (0.86; inclusion, 0.86) had a higher diagnostic power than the current standards BUN (0.80; inclusion, 0.74) and SCr (0.53; inclusion, 0.55) for all histopathology grades observed. All three urinary biomarkers showed a better diagnostic performance than the current standards when only animals with grade 1 histopathology were compared with control animals (Fig. 3 and Supplementary Table 4). The three biomarkers significantly outperformed SCr (P < 0.05; inclusion and exclusion analysis), and both urinary cystatin C and β2-microglobulin outperformed BUN in the inclusion analysis (Supplementary Table 3).

The goals of the practical application of these biomarkers with a decision threshold are twofold.
First, these markers should be applied as sensitive tools to detect glomerular injury with subsequent impairment of tubular reabsorption, especially as drug-induced glomerular damage is often not reversible. Second, these markers should also be specific to glomerular injury with subsequent inhibition of tubular reabsorption in contrast to tubular injury alone. Therefore, two different thresholds were established, determined by (i) a minimal specificity of 99% and (ii) a minimal sensitivity of 85%. The corresponding thresholds, sensitivities and specificities are listed in Supplementary Table 4.

[Figure 2, panels d (BUN levels) and e (serum creatinine levels); study groups on the x-axis: cisplatin, gentamicin, vancomycin, tacrolimus, puromycin, doxorubicin, lithium, furosemide, methapyrilene and ANIT.]
[Figure 3, panels a and b plot sensitivity against 1 – specificity. AUC values in the panel legends: (a) random 0.500, SCr 0.532, BUN 0.800, total protein 0.858, β2-microglobulin 0.889, cystatin C 0.915; (b) random 0.500, SCr 0.551, BUN 0.762, total protein 0.828, β2-microglobulin 0.865, cystatin C 0.897.]

Urinary total protein is the most specific marker (77.5% sensitivity for 99.3% specificity, 1.90-fold threshold; 1.46 mg/g SCr threshold), whereas β2-microglobulin (91.7% specificity for 85.0% sensitivity, 1.97-fold threshold; 17 µg/g SCr threshold) and cystatin C (89.0% specificity for 85.0% sensitivity, 1.60-fold threshold; 708 ng/g SCr threshold) are the most sensitive markers (all values are for the exclusion analysis).
We also investigated whether the markers, when used in parallel with SCr and BUN, increased detection of glomerular injury, using likelihood ratio test statistics of logistic models of current standards with and without the new biomarkers. Each of the three biomarkers added significant information to the current standards (Supplementary Table 5; P < 0.05).

Plots of all concentration levels of urinary cystatin C (Fig. 4b), urinary β2-microglobulin (Fig. 4d), urinary total protein (Fig. 5a), BUN (Fig. 5b) and SCr (Fig. 5c) of all 739 animals in the ten studies show that all three urinary biomarkers detected most of the drug-induced glomerular alterations and damage (green and black dots above the thresholds) in the puromycin and doxorubicin studies. In particular, the three markers detected all animals with a grade 2 injury. In contrast, BUN detected glomerular injury in the puromycin study but not in the doxorubicin study.

Figure 3 ROC exclusion analysis for animals from all ten studies demonstrating sensitivity and specificity of BUN, SCr, urinary cystatin C, urinary β2-microglobulin and urinary total protein with respect to a composite histopathology score for glomerular alterations and/or damage. (a,b) Data included were from all histopathology grades (a) and histopathology grade 0 to 1 (b). (c) AUC of BUN, SCr, urinary cystatin C, urinary β2-microglobulin and urinary total protein compared to the ‘gold standard’ histopathology, for the different histopathology grade subsets and corresponding standard errors represented by error bars. Animal numbers, n. Negative: n = 291. Positive: all, n = 40; 0 to 1, n = 33.

Figure 4 Marker levels for glomerular injury. (a–d) Correlation of cystatin C protein in kidney (a), cystatin C protein in urine (b), β2-microglobulin protein in kidney (c) and β2-microglobulin in urine (d) with severity grades of histopathology for 739 animals in ten studies. All values are represented as in Figure 2, except that all are on a logarithmic scale. The animals are ordered as in Figure 2. The symbols and the colors represent the histopathology readout for glomerular alterations, similarly to Figure 2 for tubular damage. The magenta lines represent the thresholds determined for 99% specificity in the ROC analyses of peripheral biomarkers for all histopathology grades (3.108 for urinary cystatin C and 3.594 for urinary β2-microglobulin) and the yellow lines represent the thresholds determined for 85% sensitivity in the ROC exclusion analyses of peripheral biomarkers for all histopathology grades (1.601 for urinary cystatin C and 1.966 for urinary β2-microglobulin).
[Panel titles: (a) cystatin C protein expression in kidney; (b) cystatin C protein levels in urine; (c) β2-microglobulin protein expression in kidney; (d) β2-microglobulin protein levels in urine.]

As the plots also represent 332 animals with lesions other than glomerular lesions, the specificity of the markers for glomerular lesions versus other lesions can be investigated. Total urinary protein was elevated only in the puromycin and doxorubicin studies, demonstrating its high specificity for glomerular injury. Although they had overall good specificity, cystatin C and β2-microglobulin systematically misclassified the high-dosed samples from the gentamicin study bearing grade 2–4 tubular lesions.
Histopathology showed mainly tubular necrosis, basophilia, hypertrophy, enlargement and hyaline droplet formation in the proximal convoluted tubules. Increased urinary β2-microglobulin levels during gentamicin treatment have already been reported for humans and rats (refs. 33,34), and it is assumed that gentamicin causes an inhibition of protein reabsorption by the tubules, similar to that induced by polycationic proteins. On the other hand, it has been suggested that increases of β2-microglobulin in urine after treatment with polycationic drugs such as gentamicin should be interpreted with caution and may not necessarily indicate kidney injury (refs. 33,35,36). It can be assumed that the increases in urinary levels of cystatin C in animals dosed with gentamicin can be explained similarly, owing to the same reabsorption and catabolism processes in the tubules.

Mechanistic investigations further characterize clusterin as a proximal tubular injury biomarker

As clusterin is a de novo expression marker in the kidney, it is possible to follow it from mRNA expression in the kidney, to translation into protein and finally to excretion in urine. Clusterin mRNA and protein were extracted from kidney, and protein levels in urine were measured and compared animal by animal.
Using Pearson correlations (Supplementary Table 6), a generally high correlation between all three entities was observed, demonstrating the close connection between the three levels (r = 0.66 for mRNA and kidney protein, r = 0.70 for kidney protein and urinary protein and r = 0.81 for mRNA and urinary protein).

On an animal-by-animal basis, the levels of kidney clusterin mRNA, kidney clusterin protein, and urinary clusterin protein are displayed in Figure 2a–c, where the horizontally aligned dots belong to the same animal. An extremely good correlation between mRNA in kidney, protein in kidney and protein in urine demonstrates the tight link between the three levels. In general, clusterin gene expression was slightly more sensitive and occurred earlier than clusterin protein expression in urine.

Immunohistochemistry and in situ hybridization investigations were performed to determine the localization of clusterin. In kidneys from untreated animals, clusterin was not detectable either by immunohistochemistry (Fig. 6a) or in situ hybridization (Fig. 6b). In kidneys from vancomycin-treated animals, a clear expression of clusterin was found by immunohistochemistry (Fig. 6c) and in situ hybridization (Fig. 6d) in the medullary rays and in the medulla, corresponding to the segments of the nephron showing drug-induced lesions (S3 and thick ascending limb; Supplementary Table 1).

Figure 5 Marker levels for glomerular injury. (a–c) Correlation of total protein in urine (a), BUN (b) and SCr (c) with severity grades of histopathology for 739 animals in ten studies. All values are represented as fold-changes versus the average values of study-matched and time-matched control animals on a logarithmic scale (a) and on a linear scale (b,c). The animals are ordered by study, within each study by dose-group (with increasing doses) and within each dose-group by termination time point (with increasing time). The symbols and the colors represent the histopathology readout for glomerular alterations/damage for each animal (red = no histopathology finding observed, green = grade 1, black = grade 2 on a 5-grade scale). The magenta lines represent the thresholds determined for 99% specificity in the ROC analyses for all histopathology grades (1.904 for urinary total protein, 1.293 for BUN and 0.914 for SCr) and the yellow lines represent the thresholds determined for 85% sensitivity in the ROC exclusion analyses of peripheral biomarkers for all histopathology grades (0.872 for urinary total protein, 0.919 for BUN and 1.401 for SCr). The unbiased ROC analysis treats SCr as a marker negatively correlated with glomerular injury histopathology scores to obtain an AUC > 0.5 (0.5 = random).
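The fold-change normalization described in the figure legends (each value divided by the average of study-matched and time-matched control animals) can be sketched as follows. This is an illustrative sketch only: the function name `fold_changes`, the record keys `study`, `day`, `dose` and `value`, the convention that dose 0 marks a control animal, and all numbers are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def fold_changes(records):
    """Return one fold-change per record, relative to matched controls.

    Each record is a dict with the hypothetical keys 'study', 'day',
    'dose' and 'value'. Controls (dose 0) are matched on (study, day),
    mirroring study-matched and time-matched control animals.
    A KeyError is raised if a record has no matched control group.
    """
    control_values = defaultdict(list)
    for rec in records:
        if rec["dose"] == 0:
            control_values[(rec["study"], rec["day"])].append(rec["value"])
    control_mean = {key: mean(vals) for key, vals in control_values.items()}
    return [
        rec["value"] / control_mean[(rec["study"], rec["day"])]
        for rec in records
    ]


# Invented measurements for one study with two termination time points.
toy = [
    {"study": "cisplatin", "day": 3, "dose": 0, "value": 100.0},
    {"study": "cisplatin", "day": 3, "dose": 0, "value": 120.0},
    {"study": "cisplatin", "day": 3, "dose": 1, "value": 220.0},
    {"study": "cisplatin", "day": 7, "dose": 0, "value": 90.0},
    {"study": "cisplatin", "day": 7, "dose": 0, "value": 110.0},
    {"study": "cisplatin", "day": 7, "dose": 1, "value": 350.0},
]

print([round(fc, 2) for fc in fold_changes(toy)])
# [0.91, 1.09, 2.0, 0.9, 1.1, 3.5]
```

Control animals scatter around 1 by construction, so a fold-change threshold such as those drawn in the figures can be read directly against the spread of the controls.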


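The Pearson correlations reported for clusterin mRNA, kidney protein and urinary protein compare paired values animal by animal. A minimal sketch of the coefficient is given below. The function name `pearson_r` and the paired values are hypothetical, and whether the reported correlations were computed on raw or log-transformed fold-changes is not stated in this section, so the sketch simply uses the values it is given.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Invented paired values for four animals (same animal at the same index).
kidney_mrna = [1.0, 2.0, 3.0, 4.0]
urinary_protein = [1.0, 3.0, 2.0, 4.0]

print(round(pearson_r(kidney_mrna, urinary_protein), 2))  # 0.8
```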
Figure 6 Localization of clusterin, β2-microglobulin and cystatin C in kidneys. (a–d) Localization of clusterin by immunohistochemistry (a,c) and in situ hybridization (b,d) in control (a,b) and vancomycin-treated (c,d) animals. (e–j) Localization of β2-microglobulin in control animals (e,f) and animals treated with puromycin (g), gentamicin (h), doxorubicin (i) and vancomycin (j). (k–p) Localization of cystatin C in control animals (k,l) and animals treated with puromycin (m), gentamicin (n), doxorubicin (o) and vancomycin (p). Bars, 500 µm (a–d); 200 µm (e,k); 50 µm (f–j,l–p).

Although clusterin expression highly correlates with proximal tubular injury, it is not solely expressed in the proximal tubules. Therefore, clusterin can also be a sensitive marker for lesions in other compartments of the nephron. This was additionally demonstrated by increased urinary levels of clusterin in animals with lithium-induced lesions of the collecting ducts (Fig. 2).

Mechanistic investigations show urinary cystatin C and β2-microglobulin are glomerular injury biomarkers

Cystatin C and β2-microglobulin protein levels were measured in kidney homogenates. As expected, the correlation between the protein content in the kidneys and the urinary levels was very weak for both functional markers (r = 0.10 for cystatin C and r = 0.16 for β2-microglobulin). Plots of the kidney protein levels on an animal-by-animal basis (Fig. 4a,c) and immunohistochemistry localization (Fig. 6e–p) support the observed urinary protein levels and the known molecular mechanisms of β2-microglobulin and cystatin C. In particular:

1. There were increased urinary β2-microglobulin and cystatin C levels owing to glomerular injury. In both studies with drug-induced glomerular lesions and increased urinary β2-microglobulin and cystatin C levels, the corresponding kidney protein content was slightly below the levels in the corresponding control animals (Fig. 4a,c). Immunohistochemistry investigations show a similar picture: in control animals, the normal immunoreactivity for β2-microglobulin was found in the proximal tubule whereas the rest of the cortex was devoid of labeling (Fig. 6e). The subapical distribution of the labeling confirms the known mechanism of β2-microglobulin reabsorption by the subapical endosomes (Fig. 6f). Puromycin toxicity reduced the immunoreactivity of β2-microglobulin in the proximal tubules heterogeneously according to the extent of injury (Fig. 6g). The severe changes induced by doxorubicin in the proximal tubules were accompanied by a complete absence of immunoreactivity for β2-microglobulin (Fig. 6i). Control animals showed the normal immunoreactivity for cystatin C in the proximal tubules (Fig. 6k,l). Puromycin toxicity reduced the immunoreactivity of cystatin C in the proximal tubules heterogeneously, causing diffuse focal labeling (Fig. 6m). Doxorubicin treatment resulted in a moderate reduction of immunoreactivity for cystatin C (Fig. 6o).
In summary, both glomerular toxicants reduced the content of both cystatin C and β2-microglobulin in the proximal tubules. This suggests a reabsorption deficiency due to protein overload resulting in increased urinary levels of these markers.

2. There were increased urinary cystatin C and β2-microglobulin levels upon gentamicin treatment. Cystatin C and β2-microglobulin protein contents were systematically higher in the kidneys of gentamicin-treated animals, compared to controls, similarly to urinary levels (Fig. 4). The intensity of β2-microglobulin and cystatin C immunoreactivity was globally increased in the cortex after gentamicin treatment (Fig. 6h,n). This suggests that the known reabsorption of gentamicin interferes not only with the reabsorption of these small proteins, but also with their catabolism, resulting in increased β2-microglobulin and cystatin C protein levels in kidneys and urine.

3. Changes in urinary β2-microglobulin and cystatin C levels were specific for glomerular alterations. With the exception of the gentamicin study, no systematic increase of either protein in either urine or kidney was observed in animals treated with tubular toxicants. As an example, the immunoreactivity of β2-microglobulin and of cystatin C was not changed for an animal with severe tubular injury (vancomycin study, high-dosed animal; Fig. 6j,p).

DISCUSSION

Drug-induced kidney injury is a serious and not uncommon adverse event in drug development. A number of new peripheral biomarkers have been proposed to detect renal injury earlier than the current standards. The lack of nonclinical and clinical qualification and acceptance by regulatory authorities prevents the use of these biomarkers to demonstrate the absence or presence of nephrotoxicity, to change the treatment regimens of patients (doses and schedules), to terminate treatment or to select individuals for treatment during drug development and clinical use. Hence, important lifesaving drugs with potential inherent adverse effects on the kidney might not be developed and will never enter the market without the availability of improved tools accepted by regulatory authorities to monitor renal safety.

We report the preclinical data, analyses, data evaluations and interpretations behind this submission for four of the seven urinary renal safety biomarkers submitted to the FDA and EMEA as part of the first formal application to gain acceptance of safety biomarkers for nonclinical use in regulatory animal studies as well as in early clinical studies in a translational context. We demonstrate that urinary clusterin is a more sensitive marker than the current standards BUN and SCr to detect drug-induced proximal tubular injury.
Urinary clusterin not only outperforms SCr and BUN for the whole spectrum of severity grades of tubular injury in all ten rat studies, but it also shows a better diagnostic performance for minimal tubular injury, which is the most important point with respect to translational and clinical use. Besides proposing clusterin as a proximal tubular safety biomarker, we present the submission data and analyses of three urinary biomarkers for detecting and monitoring drug-induced glomerular injury (with subsequent inhibition of tubular reabsorption). We show that all three biomarkers perform better in identifying animals with drug-induced glomerular injury than the current standards, even when the intensity of the injury is minimal, and that the use of all three markers, each taken on its own, adds substantial information to the parallel use of SCr and BUN. We also show that the three markers are very specific to glomerular injury and do not increase in the case of renal injury in the absence of glomerular involvement, with the sole exception of gentamicin treatment with its known tubular reabsorption interference.

These data, analyses and results were submitted to the FDA and EMEA with the following proposed claims for the preclinical use of the four biomarkers.
(i) Urinary clusterin can outperform and add information to BUN and SCr as an early diagnostic biomarker in rat toxicology studies of drug-induced acute kidney tubular alterations.
(ii) Urinary clusterin is qualified for regulatory decision making as a biomarker that may be used by sponsors on a voluntary basis to demonstrate that drug-induced acute kidney tubular alterations are monitorable in good laboratory practice (GLP) rat studies that are used to support the safe conduct of clinical trials.
(iii) Total urinary protein, β2-microglobulin and urinary cystatin C outperform SCr and add information to BUN and SCr as early diagnostic biomarkers in rat toxicology studies of acute drug-induced glomerular alterations and/or damage resulting in impairment of kidney tubular reabsorption.
(iv) Total urinary protein, β2-microglobulin and urinary cystatin C are qualified for regulatory decision making as biomarkers that may be used by sponsors on a voluntary basis to demonstrate that acute drug-induced glomerular alterations and/or damage resulting in impairment of kidney tubular reabsorption can be monitored in GLP rat studies that are used to support the safe conduct of clinical trials.

In addition, expert reviews of the clinical literature on urinary total protein, urinary cystatin C and urinary β2-microglobulin to detect drug-induced renal injury and renal diseases in humans, together with a clinical implementation plan, support a fifth proposed clinical claim for the three biomarkers.
(v) Total urinary protein, β2-microglobulin and urinary cystatin C can be considered qualified for regulatory decision making as clinical bridging biomarkers appropriate for use by sponsors on a voluntary basis in phase 1 and 2 clinical trials for monitoring kidney safety when animal toxicology findings generate a concern for glomerular alterations/damage with associated tubular impairment.

After assessment, both regulatory agencies came to these conclusions (ref. 37).
1. The renal biomarkers submitted were acceptable in the context of nonclinical drug development for detection of acute drug-induced renal toxicity.
2. The renal biomarkers provide additional and complementary information to the currently available standards.
3. The use of renal biomarkers in clinical trials is to be considered on a case-by-case basis to gather further data to qualify their usefulness in monitoring drug-induced renal toxicity in humans.

The regulatory authorities also came to the same conclusions for the other submitted kidney biomarkers, urinary albumin (ref. 38), TFF3 (ref. 38) and Kim-1 (ref. 39).

The assessment by the regulatory authorities also identified some gaps in the data submitted. Among these deficiencies was the absence of data, such as localization of markers at the mRNA and protein level, to establish supporting mechanisms. These are now presented here. In addition, protein levels in the kidney for all four markers and gene expression in kidney for clusterin in all animals were determined to follow the ‘molecular lifetime’ of the biomarkers.
These additional investigations deepened and confirmed the mechanistic understanding of these markers, supporting the claims for nonclinical and clinical use. The four biomarkers described here cover two different renal toxicities, proximal tubular injury and glomerular injury with subsequent impairment of tubular reabsorption. Monitoring both types of injury is crucial, because on one hand, proximal tubular injury is the most frequent direct or indirect drug-induced renal injury, and on the other hand, glomerular injury often lacks reversibility. As the new renal safety biomarkers are either locally expressed or locally reabsorbed in contrast to the nonlocalized functional parameters BUN and SCr, only a panel of new biomarkers can provide all-encompassing safety monitoring of all compartments of the kidney in a more sensitive way than the current standards. Therefore, the combination of clusterin with one or several of the glomerular injury biomarkers can be considered as the starting point of a qualified biomarker panel for renal safety monitoring.

The results of these investigations and the recognition of these biomarkers should also be put in the context of the nonclinical and clinical use of these biomarkers and published studies.
Only urinary total protein is a biomarker often used in nonclinical and in clinical studies, even though discussion of its clinical utility has been controversial owing to its widespread qualitative use with test strips, which is often not appropriate for sensitive monitoring. For the other four urinary biomarkers, very few nonclinical investigations have been reported, partly because of the lack of sensitive and widely available assays for use in preclinical studies (in contrast to commercially available sensitive assays for human use for all four biomarkers). In addition, no positive results for the clinical use of clusterin have been reported, and the clinical application of urinary cystatin C has been limited to kidney diseases. Therefore we believe that our systematic preclinical investigation, which resulted in the recognition of the nonclinical and clinical utility for a translational context, will open a new chapter for use of these renal biomarkers.

The availability of sensitive and analytically validated rat protein assays for urinary cystatin C, β2-microglobulin and clusterin together with the extensive qualification data will facilitate their preclinical use. The presented qualification data can be seen as application rules for these biomarkers, in relation to the context of pathology, thresholds with associated sensitivity and specificity. This will render questions such as “Is a twofold increase significant?” unnecessary.
The qualification of the biomarkers for GLP rat studies and their clinical utility in drug development studies in a case-by-case translational context are of high interest for drug development. The absence, presence and development of glomerular and tubular injury can now be detected with these sensitive and qualified tools. In addition, these biomarkers offer the potential to enable development of promising therapies that may otherwise be abandoned because of reservations about being able to adequately detect acute renal injury early with current standards. If a sponsor can demonstrate that drug-induced tubular or glomerular injury of a new compound can be detected at an early stage and managed preclinically with the help of these new biomarkers, the sponsor may be able to bring this candidate on a case-by-case basis into early clinical trials using the new biomarkers to monitor human renal safety. Thus, the clinical utility of this compound could be evaluated in humans, along with the question of whether early-stage nephrotoxicity observed preclinically also occurs in human subjects.

With the formal qualification of these new safety biomarkers and with the availability of validated assays, it is fully expected that their preclinical and clinical use in drug development programs will increase, further supplementing the nonclinical and clinical evidence of their utility but also further highlighting their limitations.
Clinically, the biomarkers will allow a timely intervention for an optimized therapy on an individual basis (personalized medicine) and may find an application in daily clinical care for identifying and staging renal diseases and renal impairment.

Methods
Methods and any associated references are available in the online version of the paper at http://www.nature.com/naturebiotechnology/.

Note: Supplementary information is available on the Nature Biotechnology website.

Acknowledgments
S. Leuillet and B. Palate from CIT are acknowledged for performing the in-life studies and the histopathology assessment. J. Mapes and D. Eisinger from Rules-Based Medicine are acknowledged for the development of the Luminex assays. We thank D. Moor and P. Brodmann from Biolytix for the validation and measurements of the RT-PCR assays.

AUTHOR CONTRIBUTIONS
F.D. supervised the project, performed the data analysis and prepared the manuscript, E.P. designed the studies, supervised the histopathology and edited the manuscript, A.C. designed the studies and edited the manuscript, D.R.R. designed the studies, supervised the histopathology and edited the manuscript, P.V. performed the data analyses and edited the manuscript, O.G.
performed the genomic analyses, S.P. performed the in situ hybridization and immunohistochemistry analyses and edited the manuscript, P.M. performed the in situ hybridization and immunohistochemistry and edited the manuscript, D.W. designed the database, A.M. designed the studies and edited the manuscript, P.E. performed the protein extraction and edited the manuscript, F.S. designed the studies and edited the manuscript, F.L. supervised the protein analyses, K.C. performed the regulatory submission and edited the manuscript, D.L. performed the regulatory submission and edited the manuscript, S.-D.C. supervised the project, J.V. supervised the project and G.M. supervised the project, designed the studies and edited the manuscript.

COMPETING FINANCIAL INTERESTS
The authors declare competing financial interests: details accompany the full-text HTML version of the paper at http://www.nature.com/naturebiotechnology/.

Published online at http://www.nature.com/naturebiotechnology/.
Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/.

1. Vaidya, V.S. et al. Next-generation biomarkers for detecting kidney toxicity. Nat. Biotechnol. 28, 436–440 (2010).
2. Chertow, G.M., Burdick, E., Honour, M., Bonventre, J.V. & Bates, D.W. Acute kidney injury, mortality, length of stay, and costs in hospitalized patients. J. Am. Soc. Nephrol. 16, 3365–3370 (2005).
3. Dieterle, F. et al. Monitoring kidney safety in drug development: emerging technologies and their implications. Curr. Opin. Drug Discov. Devel. 11, 60–71 (2008).
4. Parikh, C.R. & Devarajan, P. New biomarkers of acute kidney injury. Crit. Care Med. 36 Suppl, S159–S165 (2008).
5. Endre, Z.H. & Westhuyzen, J. Early detection of acute kidney injury: emerging new biomarkers. Nephrology 13, 91–98 (2008).
6. Ferguson, M.A., Vaidya, V.S. & Bonventre, J.V. Biomarkers of nephrotoxic acute kidney injury. Toxicology 245, 182–193 (2008).
7. Mattes, W.B. & Walker, E.G. Translational toxicology and the work of the predictive safety testing consortium. Clin. Pharmacol. Ther. 85, 327–330 (2009).
8. Rosenberg, M.E. & Silkensen, J. Clusterin and the kidney. Exp. Nephrol. 3, 9–14 (1995).
9. Kharasch, E.D., Schroeder, J.L., Bammler, T., Beyer, R. & Srinouanprachanh, S. Gene expression profiling of nephrotoxicity from the sevoflurane degradation product fluoromethyl-2,2-difluoro-1-(trifluoromethyl)vinyl ether (“compound A”) in rats. Toxicol. Sci. 90, 419–431 (2006).
10. Rached, E. et al. Evaluation of putative biomarkers of nephrotoxicity after exposure to ochratoxin a in vivo and in vitro. Toxicol. Sci. 103, 371–381 (2008).
11. Correa-Rotter, R. et al. Induction of clusterin in tubules of nephrotic rats. J. Am. Soc. Nephrol. 9, 33–37 (1998).
12. Tsuchiya, Y. et al. Investigation on urinary proteins and renal mRNA expression in canine renal papillary necrosis induced by nefiracetam. Arch. Toxicol. 79, 500–507 (2005).
13. Yoshida, T. et al. Monitoring changes in gene expression in renal ischemia-reperfusion in the rat. Kidney Int. 61, 1646–1654 (2002).
14. Correa-Rotter, R., Hostetter, T.H., Manivel, J.C., Eddy, A.A. & Rosenberg, M.E. Intrarenal distribution of clusterin following reduction of renal mass. Kidney Int. 41, 938–950 (1992).
15. Ishii, A., Sakai, Y. & Nakamura, A. Molecular pathological evaluation of clusterin in a rat model of unilateral ureteral obstruction as a possible biomarker of nephrotoxicity. Toxicol. Pathol. 35, 376–382 (2007).
16. Hidaka, S., Kränzlin, B., Gretz, N. & Witzgall, R. Urinary clusterin levels in the rat correlate with the severity of tubular damage and may help to differentiate between glomerular and tubular injuries. Cell Tissue Res. 310, 289–296 (2002).
17. Rosenberg, M.E. & Silkensen, J. Clusterin: physiologic and pathophysiologic considerations. Int. J. Biochem. Cell Biol. 27, 633–645 (1995).
18. Ghiggeri, G.M. et al. Depletion of clusterin in renal diseases causing nephrotic syndrome. Kidney Int. 62, 2184–2194 (2002).
19. D’Amico, G. & Bazzi, C. Pathophysiology of proteinuria. Kidney Int. 63, 809–825 (2003).
20. Shankland, S.J. The podocyte’s response to injury: role in proteinuria and glomerulosclerosis. Kidney Int. 69, 2131–2147 (2006).
21. Thielemans, N., Lauwerys, R. & Bernard, A. Competition between Albumin and low-molecular-weight proteins for renal tubular uptake in experimental nephropathies. Nephron 66, 453–458 (1994).
22. Gatanaga, H. et al. Urinary beta2-microglobulin as a possible sensitive marker for renal injury caused by tenofovir disoproxil fumarate. AIDS Res. Hum. Retroviruses 22, 744–748 (2006).
23. Mussap, M. & Plebani, M. Biochemistry and clinical role of human cystatin C. Crit. Rev. Clin. Lab. Sci. 41, 467–550 (2004).
24. Madero, M., Sarnak, M.J. & Stevens, L.A. Serum cystatin C as a marker of glomerular filtration rate. Curr. Opin. Nephrol. Hypertens. 15, 610–616 (2006).
25. Dharnidharka, V.R., Kwond, C. & Stevens, G. Serum cystatin C is superior to serum creatinine as a marker of kidney function: a meta-analysis. Am. J. Kidney Dis. 40, 221–226 (2002).
26. Shlipak, M.G., Praught, M.L. & Sarnak, M.J. Update on cystatin C: new insights into the importance of mild kidney dysfunction. Curr. Opin. Nephrol. Hypertens. 15, 270–275 (2006).
27. Herget-Rosenthal, S. et al. Early detection of acute renal failure by serum Cystatin C. Kidney Int. 66, 1115–1122 (2004).
28. Tenstad, O., Roald, A.B., Grubb, A. & Aukland, K. Renal handling of radiolabelled human cystatin C in the rat. Scand. J. Clin. Lab. Invest. 56, 409–414 (1996).
29. Collé, A., Tavera, C., Laurent, P., Leung-Tack, J. & Girolami, J.P. Direct radioimmunoassay of rat cystatin C: increased urinary excretion of this cysteine proteases inhibitor during chromate nephropathy. J. Immunoassay 11, 199–214 (1990).
30. Herget-Rosenthal, S., van Wikj, J. & Bröcker-Preuss, M. Increased urinary cystatin C reflects structural and functional renal tubular impairment independent of glomerular filtration rate. Clin. Biochem. 40, 946–951 (2007).
31. Conti, M. et al. Urinary cystatin C as a specific marker of tubular dysfunction. Clin. Chem. Lab. Med. 44, 288–291 (2006).
32. Sistare, F.D. et al. Towards consensus practices to qualify safety biomarkers for use in early drug development. Nat. Biotechnol. 28, 446–454 (2010).
33. Bernard, A., Viau, C., Ouled, A., Tulkens, P. & Lauwerys, R. Effects of gentamicin on the renal uptake of endogenous and exogenous protein in conscious rats. Toxicol. Appl. Pharmacol. 84, 431–438 (1986).
34. Rybak, M.J., Frankowski, J.J., Edwards, D.J. & Albrecht, L.M. Alanine aminopeptidase and beta 2-microglobulin excretion in patients receiving vancomycin and gentamicin. Antimicrob. Agents Chemother. 31, 1461–1464 (1987).
35. Trollfors, B., Bergmark, J., Hiesche, K. & Jagenburg, R. Urinary alanine aminopeptidase and β2-microglobulin as measurements of aminoglycoside-associated renal impairment. Infection 12, 20–22 (1984).
36. Kaye, W.A. et al. The significance of beta-2 microglobulinuria associated with gentamicin therapy. Ann. Clin. Lab. Sci. 11, 530–537 (1981).
37. Biomarker Website, E.M.E.A.
38. Gerhold, D.L. et al. Urinary biomarkers trefoil factor 3 and albumin enable early detection of kidney tubular injury. Nat. Biotechnol. 28, 470–477 (2010).
39. Vaidya, V.S. et al. Kidney injury molecule-1 outperforms traditional biomarkers of kidney injury in preclinical biomarker qualification studies. Nat. Biotechnol. 28, 478–485 (2010).


ONLINE METHODS
In-life study design. Ten specifically designed studies to generate dose- and time-dependent nephrotoxicity and hepatotoxicity with eight nephrotoxicants and two hepatotoxicants were conducted. All studies followed a generic study design with specific details and differences listed in Supplementary Table 7. For each study, three groups of 24 male Wistar Han rats (11 to 12 weeks old) were allocated to three different dose groups (Supplementary Table 7). An additional group of 24 males receiving the vehicle under the same experimental conditions served as the control group. The study schedule included a daily check for mortality and clinical signs and regular body weight and food consumption records. The human dosing route was the preferred route when feasible except for a few compounds with historical internal experience. All Novartis studies were performed at the contract research organization CIT in compliance with Animal Health regulations, in particular with the Council Directive No. 86/609/EEC of 24th November 1986.

Blood collection, urine collection and clinical chemistry. On completion of the treatment periods, six non-fasted animals per group at each time point (four termination time points) were sampled for blood biochemistry, urinalysis and histopathology, 3 h after dosing. In fasted animals, urine was collected into tubes from 2:00 p.m. to 8:00 p.m. and from 8:00 p.m. to 6:00 a.m. on the days listed in Supplementary Table 7. The urine was collected at ~4 °C during the collection period.
The collected urine fractions were split into 2-ml aliquots and centrifuged at 4 °C for 30 min at 10,000g. Urinalysis and urinary biomarker analysis (separate aliquots) were subsequently performed on the samples collected overnight. The maximum blood volume (at least 5 ml) was taken from the retro-orbital sinus of the animals, immediately before scheduled necropsy, under light isoflurane anesthesia, and collected into tubes containing the appropriate anticoagulant (heparin tubes for blood biochemistry and Na2EDTA tubes for biomarker assays).

Clinical chemistry analyses of urine and blood were performed with an Advia 1650 device (Bayer). A complete validation of the clinical chemistry analyses was performed evaluating the repeatability (replicate analysis of each parameter, under the same conditions on a single day) and reproducibility (replicate analysis on different days). The validation resulted in the following working ranges and coefficients of variation (CV) for the reproducibility for the listed assays (only the highest CVs in the working range are reported):
BUN: Advia 1650/Urease UV (Bayer); 0.33–116 mmol/l; 4.0%
SCr: Advia 1650/Jaffe (Bayer); 0–2210 µmol/l; 4.5%
Urinary creatinine: Advia 1650/Jaffe (Bayer); 1500–14140 µmol/l; 5.0%
Total protein: Advia 1650/Pyrogallol red (Bayer); 0.05–4 g/l; 7.0%

Histopathology. The animals were euthanized and submitted to a macroscopic post-mortem examination.
At necropsy, kidneys, liver and brain were taken from all animals, weighed and fixed in neutral phosphate-buffered formalin. The right kidneys of all animals and the liver of animals of the control and high-dose groups were processed histologically and examined microscopically. Additionally, the livers of the low-dose and mid-dose groups were processed histologically and examined microscopically in the two hepatotoxicant studies. The kidney evaluation was done according to the Predictive Safety Testing Consortium (PSTC) Nephrotoxicity Working Group histopathology lexicon and scoring system (localized lesions with a 5-grade system; ref. 32). The histopathology assessment was performed by an experienced toxicology pathologist at CIT and peer reviewed by another experienced pathologist. Subsequently, about 30% of the slides were reviewed by an experienced Novartis toxicology pathologist. If deemed necessary, discrepancies were resolved by discussion among all involved Novartis and CIT pathologists. All pathologists were blinded to any biomarker results. To assess the performance of clusterin as a proximal tubular injury marker, the lesions of the type “tubular degeneration,” “necrosis,” “apoptosis” and “cell sloughing” in the segments S1 to S3 were integrated into the histopathology composite category “proximal tubular injury” using the highest grade of each lesion.
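The composite scoring rule just described can be illustrated with a minimal sketch. It assumes that each animal's findings are available as a mapping from lesion type to the highest grade seen in segments S1 to S3; that data layout, the function name and the example grades are hypothetical and are not taken from the study database.

```python
from typing import Mapping, Sequence

# Lesion types that the Methods integrate into "proximal tubular injury".
PROXIMAL_TUBULAR_LESIONS = (
    "tubular degeneration",
    "necrosis",
    "apoptosis",
    "cell sloughing",
)


def composite_grade(
    lesion_grades: Mapping[str, int],
    lesions: Sequence[str] = PROXIMAL_TUBULAR_LESIONS,
) -> int:
    """Return the highest grade among the listed lesion types.

    Lesions that were not recorded for an animal count as grade 0
    (an assumption of this sketch).
    """
    return max((lesion_grades.get(name, 0) for name in lesions), default=0)


# Hypothetical animal: grade 1 degeneration, grade 2 necrosis, no apoptosis.
example_animal = {"tubular degeneration": 1, "necrosis": 2, "apoptosis": 0}
print(composite_grade(example_animal))  # prints 2
```

Taking the maximum means that a single high-grade lesion determines the composite category, regardless of how many lower-grade lesions accompany it.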
The drug-induced glomerular alterations encompassing a spectrum of lesions (“mesangial cell proliferation/enlargement,” “glomerular vacuolation,” and “interstitial Bowman’s capsule fibrosis”) were integrated into the composite histopathology category “glomerular alterations/damage”. Representative photos of typical lesions observed are available in the Supplementary Data.

Development, validation and evaluation of biomarker assays. The clusterin, β2-microglobulin and cystatin C assays in the present investigation used xMAP technology from Luminex Corporation to perform high-content immunoassay panels known as Multi-Analyte Profiles (MAPs). Briefly, xMAP uses fluorescently addressable microsphere sets that are “dyed” into 100 distinct sets. Each set carries the reactants of an immunoassay on its surface. Each assay results in the formation of a fluorescent complex of antigen and antibody on the microsphere’s surface, which is measured by excitation/emission with a green laser as it passes through an interrogation zone in a liquid flow cell. In the same zone, a red laser excites the fluorescent dyes impregnated within the microsphere, which identifies the microsphere set and hence the assay in question.

The cystatin C and clusterin assays were developed by Rules-Based Medicine as part of a 3-analyte multiplex panel together with osteopontin. The β2-microglobulin assay (competitive assay design) was established as part of a 7-analyte multiplex together with GST-α, GST-µ, VEGF, podocin, lipocalin-2 and Timp-1.
The multiplex panels were validated for measurements of urine, serum and protein extracts from kidney. Capture antibodies (Upstate, 06-458 for cystatin C; R&D Systems, AF2747 for clusterin; and Antibody by Design, AbyD 03403 for β2-microglobulin) were covalently coupled through their free amino groups by standard EDC/NHS chemistry onto one set of the fluorescently encoded, carboxylated microspheres as per standard operating procedure (SOP). Detection antibodies specific for each of the analytes in the multiplexes (R&D Systems, AF1238, 6 µg/ml for cystatin C; R&D Systems, BAF2747, 6 µg/ml for clusterin; proprietary custom-made detection antibodies and competitive antibodies from Antibody by Design for β2-microglobulin) were biotinylated using an NHS-biotin procedure as per SOP. The biomarker assays are available via Rules-Based Medicine.

The validation of the assay followed accepted procedures recommended in The Bioanalytical Method Validation Guidance for Industry (http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM070107.pdf) with the exception that (i) for the accuracy at the lower limit of quantification (LLOQ) and upper limit of quantification (ULOQ) a mean deviation of 30% instead of the recommended 20% was used and (ii) for the accuracy in between LLOQ and ULOQ a mean deviation of 20% instead of 15% was accepted for the quality controls.
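The two relaxed accuracy limits can be written down as a small check. This is only a sketch under stated assumptions: "mean deviation" is read here as the relative deviation of the replicate mean from the nominal concentration, and the function names and example values are hypothetical rather than part of the validation protocol.

```python
from statistics import mean
from typing import Sequence

# Tolerances stated in the text (relative to the nominal concentration).
LIMIT_TOLERANCE = 0.30  # at LLOQ and ULOQ (30% instead of the recommended 20%)
RANGE_TOLERANCE = 0.20  # between LLOQ and ULOQ (20% instead of 15%)


def mean_relative_deviation(replicates: Sequence[float], nominal: float) -> float:
    """Relative deviation of the replicate mean from the nominal value (assumed reading)."""
    return abs(mean(replicates) - nominal) / nominal


def passes_accuracy(
    replicates: Sequence[float], nominal: float, at_quantification_limit: bool
) -> bool:
    """Apply the 30% limit at LLOQ/ULOQ and the 20% limit inside the working range."""
    tolerance = LIMIT_TOLERANCE if at_quantification_limit else RANGE_TOLERANCE
    return mean_relative_deviation(replicates, nominal) <= tolerance


# Hypothetical quality-control replicates with a nominal value of 1.0 (arbitrary units).
print(passes_accuracy([1.2, 1.3], 1.0, at_quantification_limit=True))   # prints True
print(passes_accuracy([1.2, 1.3], 1.0, at_quantification_limit=False))  # prints False
```

The same 25% deviation passes at a quantification limit but fails inside the working range, which is the practical consequence of the two-tier criterion.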
The assay validation covered inter-day, inter-operator, and inter-instrument reproducibility, linearity, parallelism, spike recovery, freeze-thaw stability (3 cycles), short-term stability (1 h to 24 h at 25 °C), long-term stability (5 weeks, 3 months), matrix interferences and cross-reactivity (against each of the 66 antigens measured in the Rules-Based Medicine rodent MAP version 1.6). In the working range between LLOQ and ULOQ, at least 66% of all determinations for each validation criterion cited above had to fulfill the acceptance criteria. In particular, the following working ranges were determined:
Clusterin in urine: 0.023 µg/ml–11 µg/ml
Clusterin in extracts from homogenates: 0.33 µg/ml–120 µg/ml
Cystatin C in urine: 1.5 ng/ml–839 ng/ml
Cystatin C in extracts from homogenates: 8 ng/ml–4500 ng/ml
β2-Microglobulin in urine: 0.56 µg/ml–58 µg/ml
β2-Microglobulin in extracts from homogenates: 0.72 µg/ml–250 µg/ml
All biomarkers proved to be stable in the short-term (at 25 °C) and the long-term (1 year at −70 °C) stability tests and for three freeze-thaw cycles.

For the measurements of the studies, samples were thawed and centrifuged at 6,000g for 10 min. Using automated pipetting, each sample was introduced into one of the capture microsphere multiplexes. These mixtures were thoroughly mixed and incubated at 25 °C for 1 h. A multiplexed cocktail of biotinylated reporter antibodies was then added, thoroughly mixed and incubated for 1 h at 25 °C. Assays were then developed using an excess of streptavidin-phycoerythrin, which was evenly mixed into each multiplex and incubated for 1 additional hour at 25 °C.
The volume of each multiplexed reaction was reduced by vacuum filtration and then increased by dilution into matrix buffer for analysis (50-fold dilution for cystatin C and clusterin in urine, twofold for β2-microglobulin in urine, 500-fold for clusterin and cystatin C in homogenates, fivefold for β2-microglobulin in homogenates). The analysis was performed in a Luminex 100 instrument and the resulting data stream was interpreted using proprietary data analysis software developed at Rules-Based Medicine. For each multiplex, calibrators and quality controls were included on each microtiter plate (R&D Systems 1238pi as antigen standard for the cystatin C assay, Biovendor RD372034100 for the clusterin assay, and custom-made antigen by Antibody by Design for the β2-microglobulin assay). Eight-point calibrators were assayed in the first and last column of each plate, and quality controls at 3 or 4 concentration levels spanning the complete working range were included in duplicate. A value for each of the analytes localized in a specific multiplex was determined using an RBM custom-written data analysis package available commercially from Qiagen.

doi:10.1038/nbt.1622

Protein extractions from kidneys. A snap-frozen half of the kidney (upper part of the left kidney) was rapidly weighed (a minimum of 300 mg and a maximum of 700 mg). Lysis buffer (20 mM Tris-HCl, pH 7.4, 1% CellLytic-M (Sigma), Roche protease inhibition cocktail “complete” (1 tablet for 50 ml buffer) and 0.7 µg/ml pepstatin) was added at a ratio of 1 g tissue to 9 ml buffer. The tissue was never allowed to thaw before buffer was added. As soon as the buffer had been added, the tissue was homogenized using a Polytron-type homogenizer with a saw-tooth generator. During the homogenization process, the tube was kept submersed in a water-ice bath to maintain the sample at 4 °C. After homogenization, the sample was centrifuged for 3 min in a microcentrifuge at 10,000g.
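As a worked example of the buffer ratio stated above (1 g tissue to 9 ml lysis buffer, with tissue pieces between 300 mg and 700 mg), the following sketch computes the buffer volume for a weighed sample. The function name and the decision to reject out-of-range weights are assumptions made for illustration only.

```python
MIN_TISSUE_MG = 300.0  # minimum tissue mass stated in the text
MAX_TISSUE_MG = 700.0  # maximum tissue mass stated in the text
BUFFER_ML_PER_G = 9.0  # 1 g tissue to 9 ml lysis buffer


def lysis_buffer_volume_ml(tissue_mass_mg: float) -> float:
    """Return the lysis buffer volume (ml) for a kidney piece of the given mass (mg)."""
    if not MIN_TISSUE_MG <= tissue_mass_mg <= MAX_TISSUE_MG:
        # Rejecting out-of-range samples is a choice of this sketch, not of the protocol.
        raise ValueError("tissue mass outside the 300-700 mg range used in the protocol")
    return tissue_mass_mg * BUFFER_ML_PER_G / 1000.0


print(lysis_buffer_volume_ml(500))  # prints 4.5
```

A 500-mg piece therefore receives 4.5 ml of buffer, and the permitted weight range corresponds to buffer volumes between 2.7 ml and 6.3 ml.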
Making sure not to disturb <strong>the</strong> pellet, <strong>the</strong> supernatantwas aspirated, divided into aliquots <strong>and</strong> placed in a 1.5-ml microcentrifugetube <strong>and</strong> frozen at 0 wasdoi:10.1038/nbt.1622nature biotechnology


considered ‘positive’ for all samples. Thus, all positive grades of histopathology (grades 1 to 3) were treated with equal weight for the initial ROC analysis. The ROC methods described were also applied to specific subsets of samples based on the severity grade of the histopathologic alteration score. The subsets used in the analyses were the following:

1. All samples,
2. Only samples with maximum composite histopathology scores of 0, 1, or 2,
3. Only samples with maximum composite histopathology scores of 0 or 1.

Two types of ROC analyses were performed and are reported in this publication, the so-called inclusion and exclusion analyses.

In the inclusion analysis, data from all animals were used. Animals with a histopathology score of 0 were treated as negative cases and animals with a histopathology score >0 were considered positive cases.

In the exclusion analysis, animals treated with vehicles or non-kidney toxicants (ANIT and methapyrilene) that had a reported kidney histopathology grade of 0 were considered negative cases and animals treated with kidney toxicants that had a histopathology grade >0 were considered positive cases. Samples from animals treated with a nephrotoxicant that did not have a positive composite kidney histopathology score were excluded in this model.
The reason for the exclusion is to avoid ambiguity where markers and histopathology disagree for those animals: the animals may be prodromal (markers might respond earlier than histopathology), the histopathology may be falsely negative or the markers may be falsely positive³².

The AUC from each ROC curve, the sensitivity at a predefined specificity and the specificity at a predefined sensitivity were calculated and stated for subset 1, together with the comparisons to BUN and SCr and the results of significance tests for these comparisons to support a claim that the new biomarkers “outperform” BUN/SCr. In addition, the AUC, sensitivity and specificity for the other subsets restricted to lower histopathology grades were determined and plotted.

The ROC analyses were implemented in an unbiased way without prespecifying whether biomarker values are positively or negatively correlated with histopathology scores. In a first step, a positive correlation is assumed. If the resulting AUC is <0.5 (an AUC of 0.5 corresponds to a non-informative or random marker), the algorithm assumes a negative correlation, the ROC calculations are repeated and all results are updated accordingly.

Likelihood ratio test statistics.
To support a claim that a marker “adds information to” SCr and BUN, the likelihood ratio test statistic was calculated comparing (i) a logistic model that included an intercept and terms for the marker, SCr, BUN and the SCr*BUN interaction with (ii) a logistic model that included the same terms except for the marker. It is known (see, for example, ref. 43) that the likelihood ratio statistic = −2*log(likelihood[reduced model]/likelihood[full model]) is asymptotically χ² distributed with degrees of freedom equal to the difference in the number of parameters in the two models. In our case, this test statistic has one degree of freedom. As with the ROC analyses, two types of analyses were performed, the exclusion and inclusion analyses.

Availability of data. All presented data are available online (Supplementary Table 9).

40. Sing, T., Sander, O., Beerenwinkel, N. & Lengauer, T. ROCR: visualizing classifier performance in R. Bioinformatics 21, 3940–3941 (2005).
41. Hanley, J.A. & McNeil, B.J. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143, 29–36 (1982).
42. DeLong, E.R., DeLong, D.M. & Clarke-Pearson, D.L. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 44, 837–845 (1988).
43. Harrell, F.E. Regression Modeling Strategies (Springer, New York, 2001).
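As an illustration only, the direction-agnostic ROC step described in the methods above (assume a positive correlation first, then repeat with the opposite sign if the AUC falls below 0.5) can be sketched in Python. The function names and the marker values are hypothetical and are not taken from the study, and the AUC is computed with the rank-based (Mann-Whitney) estimate rather than with the ROCR package listed in the references.

```python
def auc(positives, negatives):
    """Rank-based (Mann-Whitney) estimate of the area under the ROC curve.

    Counts the fraction of positive/negative pairs in which the positive
    case has the higher marker value; ties count as half a win.
    """
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def direction_agnostic_auc(positives, negatives):
    """Return (AUC, sign) without prespecifying the marker direction.

    sign is +1 if higher values indicate injury, -1 if lower values do.
    """
    area = auc(positives, negatives)
    if area >= 0.5:
        return area, +1
    # AUC < 0.5: assume a negative correlation and repeat on negated values.
    flipped = auc([-v for v in positives], [-v for v in negatives])
    return flipped, -1


def sensitivity_at_specificity(positives, negatives, sign=+1, specificity=0.95):
    """Sensitivity at the smallest cutoff that reaches the requested specificity."""
    pos = [sign * v for v in positives]
    neg = sorted(sign * v for v in negatives)
    # Smallest cutoff leaving at least `specificity` of negatives at or below it.
    cutoff = next(c for c in neg
                  if sum(n <= c for n in neg) / len(neg) >= specificity)
    sensitivity = sum(p > cutoff for p in pos) / len(pos)
    return sensitivity, sign * cutoff  # cutoff reported on the original scale


# Hypothetical values for a marker that decreases with injury.
controls = [100, 90, 95, 110, 105]   # histopathology score 0
injured = [20, 30, 98, 10]           # histopathology score > 0
area, sign = direction_agnostic_auc(injured, controls)
print(area, sign)
print(sensitivity_at_specificity(injured, controls, sign=sign, specificity=0.8))
```

Negating the values is equivalent to taking 1 − AUC, but repeating the calculation keeps the cutoff and sensitivity consistent with the chosen direction, which mirrors the "all results are updated accordingly" step.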


Articles

Urinary biomarkers trefoil factor 3 and albumin enable early detection of kidney tubular injury

Yan Yu¹, Hong Jin¹, Daniel Holder², Josef S Ozer¹,⁸, Stephanie Villarreal³, Paul Shughrue³, Shu Shi¹, David J Figueroa⁴, Holly Clouse⁵, Ming Su¹, Nagaraja Muniappa⁶, Sean P Troth⁶, Wendy Bailey¹, John Seng⁷, Amy G Aslamkhan¹, Douglas Thudium¹, Frank D Sistare¹ & David L Gerhold¹

© 2010 Nature America, Inc. All rights reserved.

The capacities of urinary trefoil factor 3 (TFF3) and urinary albumin to detect acute renal tubular injury have never been evaluated with sufficient statistical rigor to permit their use in regulated drug development instead of the current preclinical biomarkers serum creatinine (SCr) and blood urea nitrogen (BUN). Working with rats, we found that urinary TFF3 protein levels were markedly reduced, and urinary albumin levels were markedly increased, in response to renal tubular injury. Urinary TFF3 levels did not respond to nonrenal toxicants, and urinary albumin faithfully reflected alterations in renal function. In situ hybridization localized TFF3 expression in tubules of the outer stripe of the outer medulla. Albumin outperformed either SCr or BUN for detecting kidney tubule injury and TFF3 augmented the potential of BUN and SCr to detect kidney damage. Use of urinary TFF3 and albumin will enable more sensitive and robust diagnosis of acute renal tubular injury than traditional biomarkers.

Early detection of acute kidney injury remains a challenge in both preclinical research and clinical practice.
In the context of drug development, more sensitive and specific markers of nephrotoxicity are needed both for preclinical toxicology studies and for safely monitoring human patients to prevent drug-induced kidney injury in clinical trials. More informed decisions early during the drug-development pipeline should not only prevent entry of nephrotoxic drugs into the market but also enable better decisions about which products to move forward in testing.

The current standards for detecting nephrotoxicity, SCr and BUN, have clear limitations in terms of both their sensitivity and specificity. Typically, renal injury must abrogate over half of renal function to result in elevated SCr and BUN¹,². To improve on these benchmark biomarkers, we compared the utility of TFF3 and albumin to the merits of using traditional BUN and SCr as biomarkers of kidney tubule injury. Urine is more accessible than blood and several urinary biomarker proteins have been reported to be sensitive indicators of kidney injury¹. We chose to work with rats because this model system enables controlled studies with histopathology end points, has been well characterized for use in toxicological assessments and enables use of sufficient numbers to ensure statistical rigor in analyses.

Trefoil factor peptides 1, 2 and TFF3 are small peptide hormones secreted by mucus-producing cells, and by epithelial cells of multiple tissues, in mammals³. TFF3 plays essential functions in both mucosal surface maintenance and restitution⁴.
By inhibiting apoptosis and promoting survival and migration of epithelial cells into lesions, TFF3 facilitates restoration of intestinal epithelium as a protective barrier against injury⁵. TFF3 also plays a role in inducing airway epithelial ciliated cell differentiation⁶. Rat kidney is also a major site of Tff3 mRNA expression⁷, although until now, the overall distribution of TFF3 protein in rat kidney was not well characterized. Histochemical localization using a labeled TFF3 fusion protein detected binding sites in the collecting ducts of the rat kidney⁸ and aging was correlated with decreased renal expression of Tff3 transcript in rats⁹.

Albumin is a major serum protein and is often the most abundant protein found in urine during renal injury. The quantity of albumin appearing in urine is important for distinguishing the etiology of renal disease. Historically, substantial albuminuria (nephrotic range >3.5 g/d) has primarily been associated with glomerular damage in humans¹⁰. Normally, a small fraction of serum albumin (


Figure 1 Cisplatin kidney toxicity study. (a–c) Cisplatin was administered in a single intraperitoneal dose at 0, 0.5, 3.5 and 7 mg/kg. Levels of SCr (a), urinary TFF3 (b) and urinary albumin/urinary creatinine (alb/uCr) (c) were measured in rats on days 3 and 8. Day and dose are indicated at the bottom. Circles indicate biomarker values from individual animals. Kidney histopathological assessment was performed on days 3 and 8. Grades 0–5 were indicated by white, yellow, orange, red, blue and black, respectively. Line indicates average for each group.
[Plot axes: SCr (mg/dl), urinary TFF3 (ng/ml) and urinary alb/uCr (µg/mg); histo severity grade 0–5.]

RESULTS

Urinary TFF3 and albumin protein responses to renal and nonrenal toxicants

Having first observed marked Tff3 downregulation at the gene expression level as a result of toxicant treatments (data not shown), we next generated antibodies and developed an enzyme-linked immunosorbent assay (ELISA) to quantify TFF3 levels in rat urine samples. Both the TFF3 assay and the commercial albumin ELISA were validated to establish intra- and interday reproducibility, limits of quantification, matrix effects and lack of interference by urine constituents and heavy metals (Supplementary Assay Validation).

We evaluated the performance of the biomarkers in rats treated with kidney toxicants to assess responses to renal injury, as well as nonrenal toxicants and diuretic treatments to assess specificity (Supplementary Data).
We used an automated immuno-turbidimetric assay or a commercial ELISA to measure changes in urinary albumin in response to 11 renal toxicants, six nonrenal toxicants and three diuretics. According to the literature, the renal toxicants included six proximal tubule toxicants, cisplatin (Platinol), gentamicin, carbapenem A, thioacetamide, hexachlorobutadiene (HCB) and D-serine; two papillary toxicants, N-phenylanthranilic acid (NPAA) and propyleneimine; one glomerular toxicant, doxorubicin (Adriamycin); one renal vascular toxicant, cyclosporin A; and one kidney tubulo-interstitial toxicant, allopurinol (Zyloprim) (Supplementary Table 1). TFF3 was assayed in a ten-study subset of the 20 studies that included four proximal tubule toxicants, one renal vascular toxicant, three nonrenal toxicants and three diuretic treatments. Toxicants were used at three dose levels in most studies (Supplementary Table 1), and groups of four to six rats treated with each dose were euthanized on each of several study days to assess histopathology. In this way, biomarkers were assessed in animals with kidney toxicity of different histopathological grades and stages of progression. Low-grade ‘very slight’ toxicities were induced in some samples to explore the sensitivity of the biomarkers.

Our findings for several renal and nonrenal toxicant studies are described in detail for both TFF3 and albumin, and all study results are summarized through receiver operating characteristic (ROC) plots and calculations, as described elsewhere in this issue¹⁵. Photomicrographs illustrating typical histomorphologic findings with descriptions and severity grades are provided in Supplementary Results.
Detailed descriptions of study dose groups refer to whether a particular biomarker exceeded the 95% specificity threshold for animals with pathology. Additional statistical analyses evaluating biomarker performance metrics for specific compounds, doses and time points were not performed as they would have narrowed the universality of the statistical results presented in the combined ROC area under the curve (AUC) analyses.

Cisplatin is an antineoplastic drug known for its direct proximal tubular nephrotoxicity in both humans and animals¹⁶,¹⁷. We performed necropsies on rats that had been administered a single intraperitoneal dose of cisplatin at 0, 0.5, 3.5 or 7 mg/kg on both 3 and 8 d after dosing. Tubular epithelial degeneration and necrosis were the only lesions observed on day 3, preceding the development of tubular epithelial regeneration and dilatation on day 8. No pathologic change was observed in the kidneys of control and low-dose animals (Fig. 1). Also, no pathologic change was observed in the livers of treated animals. Urinary TFF3 levels decreased in rats with tubular pathology, enabling detection of renal tubular toxicity. One out of five animals in the low-dose group showed a decreased level of TFF3 on day 3, and by day 8 showed a trend toward decreased urinary TFF3 relative to vehicle controls (Fig. 1). This change in the low-dose group occurred below the dose causing histopathologic injury and was not accompanied by changes in levels of either SCr or urinary albumin.
Although this response could be a false-positive signal, the histopathologic toxicity observed in higher dose groups suggests that it is a prodromal response. Eight- to tenfold reductions of urinary TFF3 concentration were observed in day 3 mid- and high-dose groups, and TFF3 dropped further to the lower limit of quantification (LLOQ) range

nature biotechnology VOLUME 28 NUMBER 5 MAY 2010 471


in day-8 mid- (14-fold) and high-dose (sixfold) groups, all of which showed pathology changes corresponding to severity grades 2–5 (Fig. 1). Urinary albumin levels (normalized by urine creatinine) increased for middle- and high-dose groups on both days, except that one animal in the middle-dose group on day 8 showed no response. Both TFF3 and albumin levels showed robust dynamic ranges of response.

Gentamicin is an aminoglycoside antibiotic widely used for the treatment of Gram-negative bacterial infections¹⁸. Nephrotoxicity is a major complication associated with aminoglycoside administration, most likely as a consequence of its ability to accumulate within proximal tubule cells in lipid complexes by means of phospholipidosis¹⁹. We administered to male Sprague-Dawley rats a daily dose of gentamicin at 0, 20, 80 or 240 mg/kg for up to 14 d and observed, first, renal tubular degeneration and necrosis, and then tubular regeneration and dilatation, on days 9, 12 and 15. We did not observe liver histologic lesions at any of these times. The 240 mg/kg high-dose group was euthanized on the twelfth day after dosing began, owing to signs of distress and several mortalities. A rise in SCr was observed for middle- and high-dose groups at days 9, 12 and 15. Urinary albumin showed moderate-to-large increases in middle- and high-dose groups on days 3, 9 and 12, but recovered by day 15 such that four of seven animals that showed histopathological renal damage at low- and middle-dose groups showed a return to normal levels (Fig. 2). In contrast, TFF3 levels were clearly reduced not only in middle- and high-dose groups early on day 3, but also trended strongly downward in the low-dose group (three to five out of five animals) on days 9 and 15. Most of these animals did not show any histologic lesions (Fig. 2). Thus, TFF3 demonstrated an apparent prodromal signal before the onset of tubular lesions. TFF3 levels were further reduced at middle- and high-dose groups later on days 9 and 15, which correlated well with the severity of the renal histopathology. All of the animals with any kidney histopathologic lesions showed decreased levels of urinary TFF3.

Figure 2 Gentamicin kidney toxicity study. Gentamicin sulfate was administered intraperitoneally with a dosing volume of 5 ml/kg at 0, 20, 80 or 240 mg/kg/d to groups of five rats for up to 15 d. (a–c) Levels of SCr (a), urinary TFF3 (b) and urinary albumin/urinary creatinine (alb/uCr) (c) were measured. Day and dose are indicated at the bottom. Circles indicate biomarker values from individual animals. Kidney histopathological assessment was performed on days 3, 9, 12 and 15. Grades 0–5 were indicated by white, yellow, orange, red, blue and black, respectively. Line indicates average for each group. The 240 mg/kg/d gentamicin sulfate day-15 group was euthanized early at drug day 12 owing to signs of physical distress.
[Plot axes: SCr (mg/dl), urinary TFF3 (ng/ml) and urinary alb/uCr (µg/mg) by gentamicin dose (mg/kg/d) on days 3, 9, 12 and 15; histo severity grade 0–5.]

Beta-lactam antibiotics, such as carbapenem A, can induce acute proximal tubular necrosis in humans and animals²⁰,²¹.
Male Sprague-Dawley rats received daily intravenous injections of carbapenem A at 0, 75, 150 or 225 mg/kg/d. Necropsies were performed 3, 8 and 14 d after exposure to the drug. On day 3, renal tubular necrosis and dilatation were pronounced in middle- and high-dose groups, followed by substantial tubular regeneration on day 8 (Fig. 3). On day 14, only grade 1 tubular regeneration and dilatation were observed at the middle dose. At the low-dose level, no histopathology lesions were observed on any of the three necropsy days, except that one animal showed tubular necrosis on day 3. Liver histopathologic lesions were not observed in this study. Both SCr and urinary albumin were elevated in middle- and high-dose groups on day 3, but the elevations were less pronounced on day 8 and values returned almost to basal level on day 14 (Fig. 3). A substantial reduction of urinary TFF3 concentration was seen for the low-dose group on days 3, 8 and 14. This TFF3 reduction preceded the observation of tubular necrosis, dilatation and regeneration at higher doses, as well as any increase in SCr or urinary albumin (Fig. 3). TFF3 levels dropped further at middle and high doses on days 3 and 8, and remained low on day 14. Both SCr and urinary albumin values were unaltered in animals that showed grade 1 tubular regeneration and dilatation at the middle dose on day 3 or 9 and day 14, whereas TFF3 levels showed a marked reduction for these animals (Fig. 3).
Several animals without histopathologic kidney lesions in the low-dose group on days 8 and 14 showed elevated urinary albumin levels, suggesting a prodromal sign of toxicity. These albumin signals were less marked than similar changes observed with TFF3. Hence, TFF3 surpassed albumin and SCr as a kidney toxicity biomarker for this study. Furthermore, both TFF3 and albumin demonstrated superior sensitivity over SCr and BUN in early detection of renal tubular toxicity caused by thioacetamide in rats (Supplementary Fig. 1).

Genipin and isoproterenol were used to produce nonrenal organ toxicity in rats²²,²³. Genipin treatment elicited severe liver damage at both 75 and 150 mg/kg/d on days 2 and 3, as well as grade 2 renal tubular degeneration only at 150 mg/kg/d on day 2 (TFF3 and urinary


albumin were not assayed on day 2). Although no renal damage was observed on day 3, twofold elevations in levels of SCr and BUN, and a 50% reduction of glomerular filtration rate on day 3 (Supplementary Data) indicated functional renal impairment. An upward trend in albumin levels in the genipin study (Fig. 4) reflected this functional renal impairment, likely secondary to the pre-renal liver event observed. Only one animal treated with genipin on day 3 exhibited a slightly decreased level of TFF3 compared to the controls (Fig. 4a). TFF3 did not respond to this functional change, which demonstrates its specificity to intrinsic kidney tissue injury. Isoproterenol caused only skeletal muscle and heart histopathologic toxicities, observed on days 3 and 8. Figure 4c,d shows that TFF3 and albumin remained at the basal level across all dose groups on day 8, as did SCr and BUN. No renal functional changes were observed in rats treated with isoproterenol. Additional specificity studies using the muscle toxicant cerivastatin and three diuretic challenges (Supplementary Data) demonstrated that TFF3 did not respond to these treatments, supporting its kidney selectivity. Overall, the functional renal biomarker urinary albumin responded faithfully to functional renal alterations either in the absence or presence of intrinsic renal tissue damage.

Figure 3 Carbapenem A kidney toxicity study. Carbapenem A was administered intravenously at doses of 0, 75, 150 or 225 mg/kg/d to groups of five rats. (a–c) Levels of SCr (a), urinary TFF3 (b) and urinary albumin/urinary creatinine (alb/uCr) (c) were measured in rats. Day and dose are indicated at the bottom. Circles indicate biomarker values from individual animals. Kidney histopathological assessment was performed on days 3, 8 and 14. Grades 0–5 were indicated by white, yellow, orange, red, blue and black, respectively. Line indicates average for each group. The 225 mg/kg/d day-14 group was euthanized on day 9 owing to signs of physical distress and toxicity.
[Plot axes: SCr (mg/dl), urinary TFF3 (ng/ml) and urinary alb/uCr (µg/mg) by carbapenem A dose (mg/kg/d) on days 3, 8 and 14; histo severity grade 0–5.]

ROC analysis

We performed ROC analysis based on a binary histopathologic classification to assess whether each biomarker indicated kidney toxicity in each animal. Nephrotoxicant-treated animals without histology lesions were treated neither as true positives nor as false positives, but instead were excluded from analysis owing to the possibility of early subtle tissue or functional damage at the cellular or molecular level undetectable by traditional histopathology. Results of both inclusion and exclusion analyses¹⁵ are shown in Supplementary Table 2a. Threshold levels for 95% specificity were determined for each biomarker. Using the ten studies with complete data for TFF3, SCr and BUN, the AUCs for the exclusion analyses (and inclusion analyses including all treated animals lacking histopathologic changes) were: 0.931 for TFF3 ng/ml (inclusion analysis, 0.887), 0.917 for TFF3 excretion (inclusion analysis, 0.849), 0.900 for TFF3 normalized by urinary creatinine (inclusion analysis, 0.828), 0.896 for SCr (inclusion analysis, 0.880) and 0.901 for BUN (inclusion analysis, 0.876; Fig. 5a). ROC analysis on the 20 studies for urinary albumin is illustrated in Figure 5b. Exclusion AUC values were: albumin/urinary creatinine (UCr) 0.901 (inclusion analysis, 0.843), SCr 0.766 (inclusion analysis, 0.742) and BUN 0.822 (inclusion analysis, 0.752). Comparison of AUCs between each pair of markers using the exclusion criterion indicated that albumin performance exceeded that of SCr (P = 1.0 × 10⁻⁹; inclusion analysis, P = 8.9 × 10⁻⁷) and that albumin performance exceeded that of BUN (P = 1.5 × 10⁻⁴; inclusion analysis, P = 1.1 × 10⁻⁵). The corresponding TFF3 AUC comparisons using the exclusion criterion were not significant: for example, TFF3 concentration compared to SCr, P = 0.074 (inclusion analysis, P = 0.96) or to BUN, P = 0.267 (inclusion analysis, P = 0.69; Supplementary Table 2b).

Statistical analysis addressed whether TFF3 or albumin complements the standard SCr and BUN measures, regardless of whether they are statistically superior by themselves. Nested logistic models were used to assess whether a marker adds information in this way. Specifically, a likelihood ratio test, comparing a logistic model containing both the candidate marker and the standard markers to a model with only the standard markers, was used to assess whether the candidate marker accounts for more variability than would be expected by chance²⁴.
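A minimal sketch of the likelihood ratio comparison just described, assuming the two nested logistic models have already been fitted and their maximized log-likelihoods are available. The function name and the log-likelihood values are hypothetical, not results from the study. Because the full model differs from the reduced model by a single marker term, one degree of freedom is assumed, for which the χ² tail probability is P(χ² > x) = erfc(√(x/2)).

```python
import math


def likelihood_ratio_test(loglik_reduced, loglik_full):
    """Likelihood ratio statistic and P value for two nested models.

    loglik_reduced: maximized log-likelihood of the model with SCr and BUN only.
    loglik_full:    maximized log-likelihood of the model that adds the marker.
    Assumes the models differ by exactly one parameter (one degree of freedom).
    """
    # -2 * log(L_reduced / L_full), written in terms of log-likelihoods.
    statistic = -2.0 * (loglik_reduced - loglik_full)
    # Guard against tiny negative values caused by numerical noise in fitting.
    statistic = max(statistic, 0.0)
    # Upper tail of the chi-square distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical log-likelihoods for illustration only.
statistic, p_value = likelihood_ratio_test(loglik_reduced=-60.0, loglik_full=-50.0)
print(statistic, p_value)
```

With more than one added parameter the closed-form tail above no longer applies, and a general χ² survival function from a statistics library would be needed instead.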
The P values assessing whether each biomarker adds value to combined BUN and SCr are: TFF3 concentration using the exclusion analysis, P < 1.0 × 10⁻⁵ (inclusion analysis, P < 1.0 × 10⁻⁵), TFF3/UCr using the exclusion analysis, P < 1.0 × 10⁻⁵ (inclusion analysis, P < 1.0 × 10⁻⁵), TFF3 excretion using the exclusion analysis, P < 1.0 × 10⁻⁵ (inclusion analysis, P < 1.0 × 10⁻⁵) and albumin/UCr using the exclusion analysis,


P < 1.0 × 10⁻⁵ (inclusion analysis, P < 1.0 × 10⁻⁵). Thus, we conclude that both TFF3 and albumin add value to SCr and BUN by either exclusion or inclusion criteria.

Figure 4 Genipin liver toxicity study and isoproterenol muscle and heart toxicity study. (a–d) Levels of urinary TFF3 (ng/ml) or urinary albumin/urinary creatinine (alb/uCr) were measured on day 3 after intraperitoneal administration of genipin at 75 mg/kg/d (a,b) and day 8 after intravenous doses of isoproterenol at 0, 0.064, 0.25 or 1 mg/kg/d (c,d). Day and dose are indicated at the bottom. Circles indicate biomarker values from individual animals. Liver or heart and muscle histology grades 0–5 are indicated by white, yellow, orange, red, blue and black, respectively. Line indicates average for each group. No kidney histopathology was observed. mkd, mg/kg/d.

Additional analyses, fitting binary logistic regression models to the kidney histopathology response (0, >0) and ordinal logistic regression models to histopathological severity (0, 1, 2, 3, >3), are given for SCr+BUN models with and without each marker using both inclusion and exclusion approaches.
With the addition of each marker, the performance measure showed significant improvement with P < 1.0 × 10⁻⁵ (Supplementary Table 3), which is the basis for the ‘adds value’ claims.

In situ hybridization to determine TFF3 localization in rat kidney

To determine the source of TFF3 in the kidney, we performed in situ hybridization on caudal sections (cross sections) of kidneys from four control animals and four animals treated with carbapenem A. Examination of the gross anatomical distribution of Tff3 mRNA showed strong labeling of the outer medulla of control kidneys, with little or no labeling above background level in papillae or the cortex, except for labeling that extended into the cortex in apparent medullary rays (Fig. 6a). Toxicant-treated kidneys showed markedly diminished hybridization (Fig. 6b), in agreement with diminished urinary TFF3 protein levels (Fig. 3b) and with Tff3 mRNA quantification using quantitative RT-PCR in these animals (data not shown). Kidney sections were subsequently overlaid with photographic emulsion, exposed for 5 d before development and then stained with hematoxylin to visualize cell nuclei. Darkfield and brightfield microscopy were then used to visualize regional and cellular localization of Tff3 mRNA in the kidney.
Microscopic observation revealed selective labeling of tubules that are abundant in the outer stripe of the outer medulla, probably proximal straight tubule cells (Fig. 6c–f). Despite the diminished staining intensities, toxicant-treated kidneys maintained the same gross and cellular distribution seen in control sections. Specifically, most outer medullary tubules showed much reduced or undetectable expression of Tff3 mRNA in toxicant-treated kidneys, whereas a minority of outer medullary tubules maintained significant expression of Tff3 mRNA (Fig. 6b,d). A western blot was performed using the ELISA capture antibody to probe manually dissected tissues from a control male rat. This experiment confirmed the presence of TFF3 protein selectively in dissected medulla relative to cortex or papillae (data not shown).

Figure 5, panel a (axes: sensitivity versus 1 – specificity; 95% specificity marked)

Marker        AUC     SENS    Threshold
TFF3 ng/ml    0.931   0.867   2.47
TFF3 ng       0.917   0.775   2.15
TFF3/uCr      0.900   0.784   2.01
BUN           0.901   0.748   1.30
SCr           0.896   0.756   1.19

Figure 5, panel b (axes: sensitivity versus 1 – specificity; 95% specificity marked)

Marker        AUC     SENS    Threshold
Alb/uCr       0.901   0.711   2.23
BUN           0.822   0.614   1.26
SCr           0.766   0.484   1.22

Figure 5 ROC curves for TFF3 and for albumin compared with those for BUN and creatinine. (a) ROC curves for TFF3, BUN and creatinine from ten rat studies (gentamicin, cisplatin, cyclosporin, and thioacetamide plus carbapenem A-DRS and -TS renal toxicant studies and for isoproterenol, genipin, cerivastatin, and diuresis). (b) ROC curves for albumin, BUN, and creatinine from 20 rat studies. The biomarkers are rank ordered for performance from top to bottom.
The broken arrow marks 95% specificity. Sensitivity at 95% specificity (SENS) and the threshold (fold cutoff relative to concurrent controls) needed to achieve 95% specificity are shown. Note that all animals with grade 0 histopathology despite treatment with a kidney toxicant were excluded from this analysis.

DISCUSSION

A series of rat studies with model nephrotoxicants revealed striking time- and dose-dependent urinary TFF3 decreases and urinary albumin increases, which were diagnostic for the onset of renal tubular toxicity observed by histological examination. Whereas urinary TFF3 decreased through a gene regulatory response to tubular toxicity, urinary albumin levels increased, possibly reflecting impaired proximal tubule albumin recovery. Our observations recommend TFF3 as a sensitive, specific, dynamic and potentially prodromal biomarker. First, reductions in TFF3 levels may be prodromal at low treatment doses. The level of urinary TFF3 was decreased significantly in three dose groups: the carbapenem A 75 mg/kg/d dose groups on days 3, 8 and 14, and gentamicin 20 mg/kg/d and 80 mg/kg/d dose groups on day 9. Histopathologic lesions were not observed in these dose groups, except in one animal treated with 75 mg/kg/d carbapenem A on day 3. This result raised the question of whether this response was a false-positive response, or a prodromal signal associated with an incipient true toxicity. The histologic


toxicity observed at moderately higher doses or at the same doses but at later time points in the same studies suggests that these were prodromal responses. This prodromal signal was also seen at early time points. For example, in the 80 and 240 mg/kg/d dose groups of the gentamicin study on day 3, no change was seen in SCr, but pathology was noted in three of ten animals dosed with 80 or 240 mg/kg/d. In contrast, on day 3, both TFF3 and albumin responded at the 80 and 240 mg/kg/d dose levels before the elevated SCr and more severe histopathology observed on day 9.

Figure 6 Determination of the renal source of Tff3 mRNA by in situ hybridization. (a–f) ³⁵S-labeled antisense cRNA for rat Tff3 was hybridized to cross-sections of rat kidneys from vehicle-treated control rats (a,c,e) or rats treated with carbapenem A for 11 d (b,d,f). a and b represent entire sections exposed to film. c and d represent dark-field images expanded from outer medullary regions from a and b such that Tff3 hybridization shows white against a dark background of kidney tissue. e and f show brightfield images of the regions in rectangles from c and d. Scale bars, 2.4 mm (a,b); 240 µm (c,d); 40 µm (e,f).
Second, despite the exclusion of samples manifesting a putative prodromal signal of TFF3, ROC analysis confirmed that TFF3 concentration was a sensitive biomarker in these studies relative to BUN, SCr or urinary albumin. Nonetheless, TFF3 did not significantly outperform BUN and SCr when these were used together.

Ordinal logistic regression analysis established that TFF3 added value to use of SCr and BUN together based upon a regression model using TFF3+SCr+BUN that accounts for more variability than would be expected by chance 24 relative to SCr+BUN alone. This improvement was better than expected by chance with P < 1 × 10⁻⁵ regardless of whether exclusion or inclusion data sets were used in the analysis (Supplementary Table 3). Third, TFF3 suppression to


Although our findings demonstrate a striking correlation between a decline in urinary TFF3 and acute tubular injury, more renal toxicants should be tested to extend the qualification of urinary TFF3 to the context of glomerular, papillary, interstitial and vascular injuries. Moreover, the urinary TFF3 protein in control female rats of the same age was approximately sixfold less abundant. Nevertheless, decreases in urinary TFF3 levels were also observed in female rats in response to renal tubular toxicants (data not shown). More studies are also necessary to confirm the value of urinary TFF3 in organ differential diagnosis, as well as to explore its performance in chronic kidney injury. It will also be important to compare the performance of urinary TFF3 as a marker for acute kidney injury relative to other kidney injury biomarkers, such as KIM-1, urinary neutrophil gelatinase-associated lipocalin (NGAL) and glutathione S-transferase alpha (GST-α), across more studies 1,2. Finally, we have identified Tff3 mRNA and protein expression in human kidney (unpublished results). The potential for the application of TFF3 as a clinical biomarker warrants testing in clinical settings to monitor drug safety, drug efficacy and disease progression. To extend these observations, we set out to determine the localization pattern of TFF3 in rat kidneys.

In situ hybridizations localized Tff3 mRNA to abundant tubules of the outer stripe of the outer medulla, a site enriched for proximal straight tubules. This raises the question of which cells are likely to respond to the TFF3 hormone.
Cells that respond may be proximal straight tubular or other epithelial cells distal to the site of synthesis, such as cells lining the loop of Henle, distal convoluted tubule, collecting ducts or transitional epithelium of the renal pelvis and distal sites. The specific stimulus that regulates TFF3, in the context of acute tubular injury, is unknown. Inflammatory cytokines, including tumor necrosis factor-alpha (TNFα), play important roles in the pathogenesis of toxicant-induced kidney injury and ischemia-reperfusion injury in mice 31,32. Evidence from gastrointestinal models in vitro and in vivo suggests that TFF3 can be downregulated by inflammatory cytokines, such as TNFα, and the interleukins IL-1β and IL-6 33,34. Further work is required to investigate the regulation and function of TFF3 in the rat kidney.

In contrast to TFF3, albumin has long been used in the clinic for detection of renal or cardiac dysfunction. Normally, there are small amounts of albumin present in rat urine. The average albumin content in normal control male rat urine was about 17 µg/mg of creatinine, based on a total of 49 control rats in the four studies. A >2.23-fold elevation of urinary albumin was a reliable and specific indicator of renal tubular lesions. ROC analysis of a total of 20 studies demonstrated that urinary albumin added value and significantly outperformed BUN and creatinine. In addition, urinary albumin manifested a putative prodromal signal in several studies (Figs. 1–3).
Overall, urinary albumin demonstrated a robust signal and superior sensitivity over SCr and BUN in early detection of renal tubular toxicity. It should be noted that albuminuria is not specific to renal disease and cannot be used alone to diagnose renal injury, suggesting its utility only with panels of other biomarkers. In the clinic, albuminuria is a known biomarker of the development and progression of renal or cardiovascular disease as well as acute renal toxicant injury. Urinary albumin excretion is also prognostic for chronic kidney disease and cardiovascular disease, such as the increased risk for the development of diabetes or hypertension in normotensive subjects 10,35. Extra-renal albuminuria can also result from inflammation, hemorrhage or infection of the lower urinary tract, fever or stress 10. Given that we are able to exclude the other sources of albumin in controlled rat studies, urinary albumin serves as a sensitive marker for renal tubular injury.

We propose that our studies qualify albumin and TFF3 as biomarkers of acute kidney injury, with each providing complementary information relative to BUN and SCr and to one another. Although either biomarker can improve sensitivity to detect tubular injuries compared to conventional BUN or SCr, the contrasting information provided by TFF3 versus albuminuria enables a 'fit-for-purpose' model of biomarker utilization. In accordance with this model, increased albuminuria reveals functional impairment in tubular protein recovery, alone or in combination with functional impairment in glomerular filtration selectivity.
In contrast, decreased TFF3 reveals a biological regulatory response to intrinsic proximal tubule injury.

These studies provide a step toward pursuit of formal qualification decisions from worldwide regulatory authorities. The biomarker qualification submission to the US Food and Drug Administration (FDA) and European Medicines Agency (EMEA) started in June 2007. In spring 2008, FDA and EMEA acknowledged that both TFF3 and albumin are considered qualified for detection of acute renal injury in pre-clinical contexts and albumin is qualified for use, on a case-by-case basis, in translational clinical contexts. Clinical studies are planned to assess a panel of biomarkers including albumin and TFF3 in the context of several sources of acute kidney injury.

Methods

Methods and any associated references are available in the online version of the paper at http://www.nature.com/naturebiotechnology/.

Note: Supplementary information is available on the Nature Biotechnology website.

Acknowledgments

The authors wish to thank the participants in the Nephrotoxicity Predictive Safety Testing Consortium and the Merck Kidney Biomarker Working Group. We thank J. Mardi, J. Flor, A. Smith and D. Harner for photographic and editorial assistance in assembling the histopathology supplement.

AUTHOR CONTRIBUTIONS

Y.Y., D.H., J.S.O., P.S., S.P.T., W.B., A.G.A., F.D.S. and D.L.G. designed and analyzed experiments. Y.Y., H.J., S.V., D.J.F., H.C., M.S., J.S., N.M., S.P.T. and S.S. performed experiments. Y.Y., D.H., J.S.O., P.S., A.G.A., D.T., F.D.S. and D.L.G.
wrote and edited the manuscript.

COMPETING FINANCIAL INTERESTS

The authors declare competing financial interests: details accompany the full-text HTML version of the paper at http://www.nature.com/naturebiotechnology/.

Published online at http://www.nature.com/naturebiotechnology/.
Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/.

1. Devarajan, P. Emerging biomarkers of acute kidney injury. Acute Kidney Injury 156, 203–212 (2007).
2. Bonventre, J.V., Vaidya, V.S., Schmouder, R., Feig, P. & Dieterle, F. Next-generation biomarkers for detecting kidney toxicity. Nat. Biotechnol. 28, 436–440 (2010).
3. Madsen, J., Nielsen, O., Tornoe, I., Thim, L. & Holmskov, U. Tissue localization of human trefoil factors 1, 2, and 3. J. Histochem. Cytochem. 55, 505–513 (2007).
4. Taupin, D. & Podolsky, D.K. Trefoil factors: initiators of mucosal healing. Nat. Rev. Mol. Cell Biol. 4, 721–732 (2003).
5. Kinoshita, K., Taupin, D.R., Itoh, H. & Podolsky, D.K. Distinct pathways of cell migration and antiapoptotic response to epithelial injury: structure-function analysis of human intestinal trefoil factor. Mol. Cell. Biol. 20, 4680–4690 (2000).
6. LeSimple, P. et al. Trefoil factor family 3 peptide promotes human airway epithelial ciliated cell differentiation. Am. J. Respir. Cell Mol. Biol. 36, 296–303 (2007).
7. Suemori, S., Lynch-Devaney, K. & Podolsky, D.K. Identification and characterization of rat intestinal trefoil factor: tissue-specific and cell-specific member of the trefoil protein family. Proc. Natl. Acad. Sci. USA 88, 11017–11021 (1991).
8. Chinery, R., Poulsom, R., Elia, G., Hanby, A.M. & Wright, N.A.
Expression and purification of a trefoil peptide motif in a beta-galactosidase fusion protein and its use to search for trefoil-binding sites. Eur. J. Biochem. 212, 557–563 (1993).
9. Debata, P.R., Panda, H. & Supakar, P.C. Altered expression of trefoil factor 3 and cathepsin L gene in rat kidney during aging. Biogerontology 8, 25–30 (2007).


10. Venkat, K.K. Proteinuria and microalbuminuria in adults: significance, evaluation, and treatment. South. Med. J. 97, 969–979 (2004).
11. Christensen, E., Birn, H., Rippe, B. & Maunsbach, A.B. Controversies in nephrology: renal albumin handling, facts, and artifacts! Kidney Int. 72, 1192–1194 (2007).
12. Tugay, S., Bircan, Z., Caglayan, C., Arisoy, A.E. & Gokalp, A.S. Acute effects of gentamicin on glomerular and tubular functions in preterm neonates. Pediatr. Nephrol. 21, 1389–1392 (2006).
13. Koch Nogueira, P.C. et al. Long-term nephrotoxicity of cisplatin, ifosfamide, and methotrexate in osteosarcoma. Pediatr. Nephrol. 12, 572–575 (1998).
14. Kern, W. et al. Microalbuminuria during cisplatin therapy: relation with pharmacokinetics and implications for nephroprotection. Anticancer Res. 20, 3679–3688 (2000).
15. Sistare, F. et al. Towards consensus practices to qualify safety biomarkers for use in early drug development. Nat. Biotechnol. 28, 446–454 (2010).
16. Safirstein, R., Winston, J., Moel, D., Dikman, S. & Guttenplan, J. Cisplatin nephrotoxicity: insights into mechanism. Int. J. Androl. 10, 325–346 (1987).
17. Winston, J.A. & Safirstein, R. Reduced renal blood-flow in early cisplatin-induced acute renal-failure in the rat. Am. J. Physiol. 249, F490–F496 (1985).
18. Martinez-Salgado, C., Lopez-Hernandez, F.J. & Lopez-Novoa, J.M. Glomerular nephrotoxicity of aminoglycosides. Toxicol. Appl. Pharmacol. 223, 86–98 (2007).
19. Feldman, S., Wang, M.Y. & Kaloyanides, G.J. Aminoglycosides induce a phospholipidosis in the renal cortex of the rat: an early manifestation of nephrotoxicity. J. Pharmacol. Exp. Ther. 220, 514–520 (1982).
20. Tune, B.M. & Hsu, C.Y. Mechanisms of beta-lactam antibiotic nephrotoxicity. Toxicol. Lett.
53, 81–86 (1990).
21. Tune, B.M. Renal tubular transport and nephrotoxicity of beta-lactam antibiotics: structure-activity relationships. Miner. Electrolyte Metab. 20, 221–231 (1994).
22. Yamano, T. et al. Hepatotoxicity of geniposide in rats. Am. J. Pathol. 74, 575–519 (1974).
23. York, M. et al. Characterization of troponin responses in isoproterenol-induced cardiac injury in the Hanover Wistar rat. Toxicol. Pathol. 35, 606–617 (2007).
24. Harrell, F.E., Lee, K.L., Califf, R.M., Pryor, D.B. & Rosati, R.A. Regression modeling strategies for improved prognostic prediction. Stat. Med. 3, 143–152 (1984).
25. Ozer, J. et al. A panel of urinary biomarkers to monitor reversibility of renal injury and a serum marker with improved potential to assess renal function. Nat. Biotechnol. 28, 486–494 (2010).
26. Brooks, D.P., Drutz, D.J. & Ruffolo, R.R. Prevention and complete reversal of cyclosporine A-induced renal vasoconstriction and nephrotoxicity in the rat by fenoldopam. J. Pharmacol. Exp. Ther. 254, 375–379 (1990).
27. Goldman, L. & Bennett, C. Cecil Textbook of Medicine, edn. 21 (W.B. Saunders, 2000).
28. Loeb, F.W. & Quimby, W.F.P. Clinical Chemistry of Laboratory Animals, edn. 2 (CRC, 1999).
29. Schwab, S.J., Christensen, R.L., Dougherty, K. & Klahr, S. Quantitation of proteinuria by the use of protein-to-creatinine ratios in single urine samples. Arch. Intern. Med. 147, 943–944 (1987).
30. Ginsberg, J.M., Chang, B.S., Matarese, R.A. & Garella, S. Use of single voided urine samples to estimate quantitative proteinuria. N. Engl. J. Med. 309, 1543–1546 (1983).
31. Ramesh, G. & Reeves, W.B. Inflammatory cytokines in acute renal failure. Kidney Int. 66, S56–S61 (2004).
32. Zhang, B., Ramesh, G., Norbury, C.C. & Reeves, W.B. Cisplatin-induced nephrotoxicity is mediated by tumor necrosis factor-alpha produced by renal parenchymal cells. Kidney Int. 72, 37–44 (2007).
33.
Dossinger, V., Kayademir, T., Blin, N. & Gott, P. Down-regulation of TFF expression in gastrointestinal cell lines by cytokines and nuclear factors. Cell. Physiol. Biochem. 12, 197–206 (2002).
34. Loncar, M.B. et al. Tumour necrosis factor alpha and nuclear factor kappa B inhibit transcription of human TFF3 encoding a gastrointestinal healing peptide. Gut 52, 1297–1303 (2003).
35. Sarafidis, P.A. Proteinuria: natural course, prognostic implications and therapeutic considerations. Minerva Med. 98, 693–711 (2007).


ONLINE METHODS

Animal subjects and study design. Male Sprague-Dawley rats, Rattus norvegicus, ~10 weeks of age and weighing 275–325 g, were purchased from Charles River Laboratories. They were maintained in a room with controlled temperature (21 °C) and a 12:12 h light/dark cycle. Animals received 22 g/day of Purina Mills certified rodent diet and water ad libitum. The use of animals and study procedures was approved by the Merck IACUC Committee and in accordance with the NIH Guide for the Care and Use of Laboratory Animals (NIH, 1999). The general design of model nephrotoxicant, organotoxicant and diuresis studies is summarized in Supplementary Table 1 for 20 studies using cisplatin, gentamicin, carbapenem A, cyclosporin A, thioacetamide 36, HCB 37, allopurinol 38, NPAA 39, d-serine 40, propyleneimine 41 and adriamycin 42, isoproterenol (also known as isoprenaline) 43, furan 44, genipin 22, cerivastatin 44, CCl4 or BrCCl3 45,46, and diuresis using H2O or 2% NaCl or 4% sucrose 47–49.

Details of the design of model nephrotoxicant and organotoxicant studies are given in Supplementary Table 1, and validation of the TFF3 and albumin ELISA is described in Supplementary Assay Validation.

TFF3 ELISA. Recombinant rat TFF3 protein was expressed in the BL21 Escherichia coli strain by subcloning full-length rat Tff3 cDNA into pRSETA (Invitrogen) at the 5′-BamHI-HindIII-3′ restriction sites. Protein was purified using Ni-NTA chromatography and used to generate a standard curve in diluent buffer (0–39 µg/ml).
To produce the capture antibody, we used the C-terminal (CSIPNVPWCFKPLQETECTF) and middle (CNYPTVTSEQCNNRGC-CONH2) peptides of TFF3. An aliquot of 2.5 µg/ml polyclonal rabbit anti-rat TFF3 antibody in coating buffer (0.1 M carbonate buffer, pH 9.6) was used to coat the plate overnight at 4 °C. Diluent buffer (5% Tween 20 and 0.00625% BSA in 1× PBS) was used to block the plate for 1 h at 20–24 °C. Urine samples were desalted using Amicon YM3 columns (Millipore). Eight µl of eluent plus an additional 42 µl of diluent buffer was used per sample in ELISA (2 h incubation, 22 °C). After four washes in PBS with 0.5% Tween 20, 1:100 goat anti-mouse ITF (M-18) (Santa Cruz) 3 was used as the detection antibody (1 h at 22 °C). Mouse anti-goat/sheep monoclonal antibody (1:10,000) (Sigma) was used for 1 h at 20–24 °C followed by horseradish peroxidase (HRP) substrate incubation. Immediately after addition of stop solution, the samples were analyzed at OD450 using a Molecular Devices spectrophotometer. Based on the assay validation results, the TFF3 lower limit of quantification (LLOQ) values were set to 19 ng/ml, and the upper limit of quantification (ULOQ) values were set at 1,220 ng/ml.

Albumin ELISAs. The AssayMax rat albumin ELISA kit from AssayPro, which is a competitive ELISA assay, was used to detect urinary albumin. Sample urines were diluted 1:20 for this assay (2.5 µl urine/assay). Twenty-five µl of albumin standard or samples was used per well, plus the addition of 25 µl of biotinylated albumin to each well. The mixture was placed at 20–24 °C for 2 h to incubate. After five washes in 1× wash buffer, 50 µl of streptavidin-peroxidase conjugate was added to each well and incubated for 30 min.
Then the plate was washed five times followed by 5–7 min of chromogen substrate. Immediately after addition of stop solution, samples were analyzed at OD450 using a Molecular Devices SpectraMax M5 Spectrophotometer. Based on the assay validation results, the albumin LLOQ values were set to 12 µg/ml, and the ULOQ values were set at 1,500 µg/ml. The following studies were assayed using this ELISA: cisplatin, gentamicin, carbapenem A (two studies), thioacetamide, cyclosporin A, isoproterenol and genipin.

Albumin immunoturbidimetric assays. The Tina-quant albumin kits (Roche) were used for our indicated studies. This immunoturbidimetric assay was performed on automated clinical analyzers that were calibrated with rat albumin standards (Sigma) and used to detect urinary albumin. Sample urines were assayed undiluted (15 ml urine/assay) or diluted as necessary to fall within the analytical range. Four-parameter log-log analyses were performed on all data sets. Tina-quant albumin LLOQ values were set at 10 µg/ml. All samples above the ULOQ were diluted and repeated to obtain accurate albumin determinations. The following studies were assayed using this immunoturbidimetric assay: allopurinol, d-serine, HCB, gentamicin, NPAA, diuretic treatments, BrCCl3 and CCl4, adriamycin, propyleneimine and furan. The carbapenem A study was repeated using this assay, as well as the ELISA, yielding quantitatively similar results.

Urine and blood collection and analysis. Before necropsy, animals were fasted and urine was collected overnight (18 h) on dry ice using metabolic cages. Urine was stored at −80 °C until thawing for urinalysis and ELISA determinations.
Typically, 2.5 ml urine samples were used for urinalysis (Roche Modular Analyzer) including sodium, potassium and chloride (expressed in mmol/l); specific gravity; pH; urinary blood; glucose; and protein (dipstick test). Glomerular filtration rate was calculated as endogenous creatinine clearance, C.Cr = (U.Cr × V / S.Cr) × 1,000 / (18 h × 60 min) µl/min, where U.Cr is the urinary concentration of creatinine, V is the urine volume for an 18 h collection, and S.Cr is the serum concentration of creatinine. At the end of the urine collection at necropsy, animals were bled from the vena cava with ~2 ml of blood collected into a serum separator tube (1,500 g, 10 min, 4 °C) and 2 ml collected into an EDTA collection tube to isolate plasma (900 g, 15 min, 4 °C). Blood urea nitrogen (BUN), creatinine and electrolytes were measured using a standard clinical chemistry analyzer (Roche Modular).

Histology. Rats were euthanized on the necropsy days of each study. The left quadriceps (3-mm section including all four muscle groups), left kidney (5-mm section including the papilla, cortex and medulla), right lateral lobe of the liver, and the vasocranial aspect of the heart were removed from each animal and placed in 10% neutral buffered formalin. The tissues were fixed in neutral buffered formalin for 24 h, and processed to paraffin.
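As an illustration of the creatinine clearance calculation given under urine and blood collection, the following minimal Python sketch applies the formula to one animal. It is not part of the original methods: the function and argument names are hypothetical, and it assumes that urine volume is recorded in ml (so that the factor of 1,000 converts to µl) and that U.Cr and S.Cr share the same concentration unit.

```python
# Sketch only: names and unit conventions are assumptions, not the authors' code.
COLLECTION_MINUTES = 18 * 60  # 18 h overnight urine collection, in minutes


def creatinine_clearance_ul_per_min(urine_cr, urine_volume_ml, serum_cr,
                                    minutes=COLLECTION_MINUTES):
    """Endogenous creatinine clearance, C.Cr, in µl/min.

    urine_cr and serum_cr must be in the same concentration unit so that
    their ratio is unit-free; urine_volume_ml is assumed to be in ml.
    """
    if serum_cr <= 0 or minutes <= 0:
        raise ValueError("serum creatinine and collection time must be positive")
    # (U.Cr x V / S.Cr) gives ml cleared; x 1,000 converts ml to µl;
    # dividing by the collection time in minutes gives µl/min.
    return (urine_cr * urine_volume_ml / serum_cr) * 1000.0 / minutes


if __name__ == "__main__":
    # Hypothetical values chosen only to exercise the arithmetic.
    print(creatinine_clearance_ul_per_min(100.0, 10.8, 0.5))
```

Keeping the collection time as a parameter makes the 18 h assumption explicit and lets the same function be reused if a different collection window is recorded.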
Embedded tissues were cut into 4–6-µm sections and stained with hematoxylin and eosin. Kidneys from control and high-dose animals, and organs with test article–related renal changes from lower-dose groups, were examined microscopically by a Merck pathologist (blinded from biomarker data) and results were reviewed by another supervising pathologist. A scale of 0 to 5 was used to rate the severity of pathological lesions: 0 (no observable pathologic change), 1 (very slight), 2 (slight), 3 (moderate), 4 (marked) or 5 (severe). Renal lesions were categorized according to the Critical Path Institute's Predictive Safety Testing Consortium 50 lexicon categories including: tubular epithelial degeneration, tubular epithelial necrosis, tubular epithelial regeneration, tubular dilatation and inflammation. The single highest pathology score in any of these tubular injury categories was assigned as the score for the total kidney histopathology composite for individual animals 15. Liver, heart, quadriceps and soleus were observed for histological changes in animals treated with genipin or isoproterenol. Organ damage at high dose was followed sequentially by examination of middle dose, then low-dose samples, until no damage was observed in a dose group.

In situ hybridization. Twenty-micrometer thick caudal sections were prepared from frozen kidneys of four control and four treated animals from the carbapenem A study. ³⁵S-labeled RNA transcripts of rat Tff3 mRNA (Tff3pExpress1, Invitrogen) were used as hybridization probes.
T7 RNA polymerase and an EcoRI digest were used to make the antisense strand, or Sp6 polymerase and a XhoI digest to make the sense strand. Methods for in situ hybridization (ISH) followed a published protocol 51.

Statistics. For TFF3, total excretion (concentration × urine volume), values normalized to urinary creatinine (TFF3/UCr, ng/mg) and non-normalized concentration values (ng/ml) were analyzed. Albumin values were normalized to µg of albumin per mg of creatinine (Alb/UCr). Values below the lower limit of quantification were replaced with LLOQ-1 before data transformation. All analysis of marker values was performed on the log scale. To obtain the relative fold or percentage changes for each marker (BUN, SCr, albumin normalized to urinary creatinine (Alb/UCr) and urinary TFF3 concentration), measurements from individual animals were divided by the control group geometric means measured in the same study on the same study day. ROC analysis 52 was conducted to evaluate the performance of each marker in detecting the absence or presence of kidney toxicity defined by the total kidney histopathology composite (0 or ≥1). Animals treated with a kidney toxicant, but with a histology score of 0, were set aside in the analyses. This was done to avoid declaring false positives in such animals owing to possible superior biomarker sensitivity relative to histology. For the markers (BUN, SCr, Alb/UCr, TFF3 ng/ml, TFF3 ng, TFF3/UCr) to be compared, only samples that had values for all of the markers were used. AUC values between each pair of markers were compared using


a χ² statistic test. Complementarity testing was performed using a likelihood ratio test to compare pairs of logistic models. These pairs included all animals for which necessary biomarker normalization data were available; for example, the TFF3/UCr versus SCr comparison used all animals with data for TFF3 concentration, urinary creatinine and SCr.

36. Barker, E.A. & Smuckler, E.A. Nonhepatic thioacetamide injury. II. The morphologic features of proximal renal tubular injury. Hepatogastroenterology 54, 1339–1344 (2007).
37. Boroushaki, M. Development of resistance against hexachlorobutadiene in the proximal tubules of young male rat. Comp. Biochem. Physiol. C-Toxicol. Pharmacol. 136, 367–375 (2003).
38. Ansari, N.H. & Rajaraman, S. Allopurinol-induced nephrotoxicity - protection by the antioxidant, butylated hydroxytoluene. Res. Commun. Chem. Pathol. Pharmacol. 75, 221–229 (1992).
39. Nguyen, T.K.T., Obatomi, D.K. & Bach, P.H. Increased urinary uronic acid excretion in experimentally-induced renal papillary necrosis in rats. Ren. Fail. 23, 31–42 (2001).
40. Williams, R.E. & Lock, E.A. d-serine-induced nephrotoxicity: possible interaction with tyrosine metabolism. Toxicology 201, 231–238 (2004).
41. Halman, J., Miller, J., Fowler, J.S.L. & Price, R.G. Renal toxicity of propyleneimine: assessment by noninvasive techniques in the rat. Toxicology 41, 43–59 (1986).
42. Zoja, C., Perico, N. & Remuzzi, G. Abnormalities in arachidonic-acid metabolites in nephrotoxic glomerular injury. Toxicol. Lett. 46, 65–75 (1989).
43. Sirica, A.E. Biliary proliferation and adaptation in furan-induced rat liver injury and carcinogenesis. Toxicol. Pathol. 24, 90–99 (1996).
44. Kaufmann, P. et al. Toxicity of statins on rat skeletal muscle mitochondria. Cell. Mol. Life Sci. 63, 2415–2425 (2006).
45. Masuda, Y.
Learning toxicology from carbon tetrachloride-induced hepatotoxicity.Yakugaku Zasshi 126, 885–899 (2006).46. Mehendale, H.M. Mechanism of <strong>the</strong> lethal interaction of chlordecone <strong>and</strong> CCl 4 atnon-toxic doses. Toxicol. Lett. 49, 215–241 (1989).47. Thulesen, J., Jorgensen, P.E., Torffvit, O., Nexo, E. & Poulsen, S.S. Urinary excretionof epidermal growth factor <strong>and</strong> Tamm-Horsfall protein in three rat models withincreased renal excretion of urine. Regul. Pept. 72, 179–186 (1997).48. Croxatto, H.R., Huidrobro, R., Rojas, M., Roblero, J. & Albertini, R. Effect of watersodium overloading <strong>and</strong> diuretics upon urinary kallikrein. Agents Actions 6, 420(1976).49. Baracho, N.C.V., Simoes-e-Silva, Khosla, M.C. & Santos, R.A.S. Effect of selectiveangiotensin antagonists on <strong>the</strong> antidiuresis produced by angiotensin-(1–7) in waterloadedrats. Braz. J. Med. Biol. Res. 31, 1221–1227 (1998).50. Mattes, W.B. & Walker, E.G. Translational toxicology <strong>and</strong> <strong>the</strong> work of <strong>the</strong> predictivesafety testing consortium. Clin. Pharmacol. Ther. 85, 327–330 (2009).51. Ky, B. & Shughrue, P.J. Methods to enhance signal using isotopic in situhybridization. J. Histochem. Cytochem. 50, 1031–1037 (2002).52. DeLong, E.R., Delong, D.M. & Clarkepearson, D.I. Comparing <strong>the</strong> areas under 2 ormore correlated receiver operating characteristic curves a nonparametric approach.Biometrics 44, 837–845 (1988).© 2010 Nature America, Inc. All rights reserved.doi:10.1038/nbt.1624nature biotechnology


Articles

Kidney injury molecule-1 outperforms traditional biomarkers of kidney injury in preclinical biomarker qualification studies

Vishal S Vaidya 1, Josef S Ozer 2,8, Frank Dieterle 3, Fitz B Collings 1, Victoria Ramirez 1, Sean Troth 4, Nagaraja Muniappa 4, Douglas Thudium 2, David Gerhold 2, Daniel J Holder 5, Norma A Bobadilla 6, Estelle Marrer 3, Elias Perentes 3, André Cordier 3, Jacky Vonderscher 3, Gérard Maurer 3, Peter L Goering 7, Frank D Sistare 2 & Joseph V Bonventre 1

1 Renal Division, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, USA. 2 Department of Investigative Laboratory Sciences, Safety Assessment, Merck Research Laboratories, West Point, Pennsylvania, USA. 3 Translational Sciences, Novartis Institutes for BioMedical Research, Basel, Switzerland. 4 Department of Pathology, Safety Assessment, Merck Research Laboratories, West Point, Pennsylvania, USA. 5 Department of Biometrics, Merck Research Laboratories, West Point, Pennsylvania, USA. 6 Molecular Physiology Unit, Instituto de Investigaciones Biomédicas, Universidad Nacional Autónoma de México and Instituto Nacional de Ciencias Médicas y Nutrición Salvador Zubirán, Mexico City, Mexico. 7 Center for Devices and Radiological Health, US Food and Drug Administration, Silver Spring, Maryland, USA. 8 Present address: Pharmacokinetics, Dynamics, and Metabolism, PGRD, Pfizer, Andover Laboratories, Andover, Massachusetts, USA. Correspondence should be addressed to: V.S.V. (vvaidya@partners.org) or J.S.O. (josef_ozer@merck.com) or F.D. (frank.dieterle@novartis.com).

Received 8 October 2009; accepted 22 March 2010; published online 10 May 2010; doi:10.1038/nbt.1623

© 2010 Nature America, Inc. All rights reserved.
478 VOLUME 28 NUMBER 5 MAY 2010 nature biotechnology

Kidney toxicity accounts both for the failure of many drug candidates and for considerable patient morbidity. Whereas histopathology remains the gold standard for nephrotoxicity in animal systems, serum creatinine (SCr) and blood urea nitrogen (BUN) are the primary options for monitoring kidney dysfunction in humans. The transmembrane tubular protein kidney injury molecule-1 (Kim-1) was previously reported to be markedly induced in response to renal injury. Owing to the poor sensitivity and specificity of SCr and BUN, we used rat toxicology studies to compare the diagnostic performance of urinary Kim-1 to BUN, SCr and urinary N-acetyl-β-D-glucosaminidase (NAG) as predictors of kidney tubular damage scored by histopathology. Kim-1 outperforms SCr, BUN and urinary NAG in multiple rat models of kidney injury. Urinary Kim-1 measurements may facilitate sensitive, specific and accurate prediction of human nephrotoxicity in preclinical drug screens. This should enable early identification and elimination of compounds that are potentially nephrotoxic.

Acute kidney injury (AKI) is a common and devastating clinical problem with an in-hospital mortality of 40–80% in the intensive care setting 1. Drug-induced nephrotoxicity plays a major role in the high incidence and prevalence of AKI in both hospitalized and nonhospitalized individuals 2. Nephrotoxicity seen in animal toxicology studies is also a major factor in the failure of drug candidates because of the lack of good kidney biomarkers for monitoring kidney injury. Traditional markers of renal injury, SCr, BUN, urine sediment and urinary indices (e.g., fractional excretion of sodium and urine osmolality), lack the sensitivity and/or specificity to adequately detect nephrotoxicity before considerable loss of renal function. NAG, a proximal tubular brush border lysosomal enzyme, is released into the urine after renal proximal tubule injury and has been proposed to be a sensitive and robust indicator of kidney damage in rodents and humans 3–7. Given “renal reserve” and the sensitivity of each of these tests, minimal histopathologic findings are often undetectable using these traditional biomarkers. There is thus an urgent need for improved and noninvasive renal biomarkers to permit early detection of AKI, assess the severity of injury, and aid in predictive safety assessment during drug development by resolving ambiguities between humans and animal test species 8.

Kim-1 (also known as T cell immunoglobulin and mucin (TIM-1) and hepatitis A virus cellular receptor 1 (HAVCR-1)) is a type I cell membrane glycoprotein containing a unique six-cysteine immunoglobulin-like domain and a mucin-rich extracellular region that is conserved across species in zebrafish, rodents, dogs, primates and humans 9. Kim-1 is a phosphatidylserine receptor on renal epithelial cells that recognizes apoptotic cells, directing them to lysosomes and thereby converting the normal proximal tubule cell into a phagocyte 10. Kim-1 mRNA levels are elevated more than those of any other known gene across these species after initiation of kidney injury, and the protein is localized at very high levels on the apical membrane of the proximal tubule in that region where the tubule is most affected 9,11. After injury, the ectodomain of Kim-1 is shed from proximal tubular kidney epithelial cells into urine in rodents 3,9,11–14 and humans 4,5,15,16. Urinary Kim-1 has been shown to be a sensitive and early diagnostic indicator of renal injury in a variety of acute and chronic rodent kidney injury models, resulting from drugs 3,14,17, environmental toxicants 13,14,18, ischemia 3 and protein overload 19. However, studies thus far have lacked the ability to systematically evaluate the performance characteristics of urinary Kim-1 along with traditional biomarkers using different grades of histopathological damage as a benchmark of kidney damage.

The primary objective of this study was to comprehensively evaluate the relative sensitivity and specificity of urinary Kim-1 as


a nephrotoxicity marker, relative to BUN, SCr and urinary NAG. To support the generalizability and reproducibility of the results, rodent toxicology studies with unique and overlapping nephrotoxicants were conducted at two different sites, Novartis (Basel) and Merck Research Laboratories (West Point, Pennsylvania, USA). Using a clinically relevant model of bilateral renal ischemia/reperfusion injury, as well as ten well-established nephrotoxicants, three hepatotoxicants and a cardiotoxicant, we correlated the diagnostic performance of urinary Kim-1, BUN, SCr and urinary NAG with histopathology as a benchmark. Composite area under the receiver operating characteristics curve (AUC-ROC) analysis enabled us to evaluate the relative sensitivities and specificities of urinary Kim-1, BUN, SCr and NAG over subsets of histomorphologic scores using different ranges of severity grades. Our secondary objective involved testing a microbead-based assay for quantifying Kim-1 abundance, with the goal of increasing throughput of analyses involving this biomarker 20. We conclude that the use of Kim-1 as a marker of nephrotoxicity could help to reduce the rate of attrition during clinical drug development, as well as aiding post-marketing surveillance of drug-related nephrotoxicity.

RESULTS

Kim-1 gene and protein expression

We first used the Rat Genome 230 2.0 Array and Affymetrix MicroArray Suite 5.0 (MAS5) normalization to measure changes in Kim-1 gene expression in 48 organs/structures, blood and bone marrow obtained from control and high-dosed animals at various time points in numerous studies that included up to 45 active drug entities (many reference compounds and drugs known to be toxic to liver, cardiac, skeletal muscle, central nervous system, gastrointestinal, lung, bone and testis). Average raw intensity values and standard deviations for Kim-1 expression were very low (“absent” according to Affymetrix standards) across all organs analyzed in the control animals (baseline) (Supplementary Fig. 1). Only blood cells, lymph nodes, spleen and lachrymal glands had reliably detectable baseline Kim-1 expression. The baseline level of Kim-1 was very low in kidney, and only after kidney toxicity, as detected by histopathology, was a >100-fold increase of Kim-1 expression evident (Fig. 1a). Kim-1 expression did not change in any of the other organs, demonstrating the specificity of Kim-1 for kidney injury (Supplementary Fig. 1).

[Figure 1 panels: (a) Kim-1 mRNA levels in kidney; (b) Kim-1 protein levels in kidney; (c) urinary Kim-1 (fold-change); (d) SCr (fold-change); (e) BUN (fold-change); (f) urinary NAG (fold-change). Horizontal axis: cisplatin, gentamicin, vancomycin, tacrolimus, puromycin, doxorubicin, lithium, furosemide, methapyrilene, ANIT. Legend: grade 0, grade 1, grade 2, grade 3.]

Figure 1 Correlation of Kim-1 mRNA and protein levels in the kidney and urine, and comparison of urinary Kim-1 levels with SCr, BUN and urinary NAG with severity grades of histopathology following a dose response and time course in ten Novartis rat toxicology studies. (a–c) Male Han Wistar rats (n = 739) were dosed with a low, medium or high dose of eight mechanistically distinct nephrotoxicants and two hepatotoxicants, and renal Kim-1 mRNA (a), renal Kim-1 protein (b) and urinary Kim-1 protein levels (c) were measured. (d–f) Conventional markers for kidney toxicity including SCr (d), BUN (e) and urinary NAG (f) were also measured and compared to different grades of kidney tubular histopathology. All values are represented as fold changes versus the average values of study-matched and time-matched control animals on a logarithmic scale. The animals are ordered by study, within each study by dose group (with increasing doses) and within each dose group by termination time point (with increasing time). The symbols and the colors represent the histopathology readout for proximal tubular damage (red = no histopathology finding observed, green = grade 1, blue = grade 2, black = grade 3 on a 5-grade scale). For each toxicant the animals are ordered left to right by dose group (low to high). For each dose the animals are ordered from left to right by termination time point. The magenta lines represent the thresholds determined for 95% specificity in the ROC analysis for all histopathology grades. ANIT, α-naphthyl isothiocyanate.
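The fold-change representation described in the Figure 1 legend (each value divided by the average of study-matched and time-matched controls, then shown on a logarithmic scale) can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the record layout and the key names `study`, `day`, `group` and `value` are hypothetical, and the arithmetic mean is assumed because the legend says only "average".

```python
import math
from collections import defaultdict
from statistics import mean


def log10_fold_changes(records):
    """Return log10(value / mean of matched controls) for every record.

    Each record is a dict with hypothetical keys: 'study', 'day',
    'group' ('control' or a dose label) and 'value'.
    Assumption: controls are averaged with the arithmetic mean.
    """
    # Collect control values per (study, day) so that each animal is
    # compared only with study-matched and time-matched controls.
    controls = defaultdict(list)
    for r in records:
        if r["group"] == "control":
            controls[(r["study"], r["day"])].append(r["value"])
    baseline = {key: mean(vals) for key, vals in controls.items()}
    return [
        math.log10(r["value"] / baseline[(r["study"], r["day"])])
        for r in records
    ]


# Illustrative numbers only, not data from the study.
example = [
    {"study": "cisplatin", "day": 3, "group": "control", "value": 1.0},
    {"study": "cisplatin", "day": 3, "group": "control", "value": 3.0},
    {"study": "cisplatin", "day": 3, "group": "high", "value": 20.0},
]
example_result = log10_fold_changes(example)
```

In this toy input the control mean is 2.0, so the high-dose animal has a tenfold change, which is 1.0 on the log10 axis.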


A dose-response and time-course study involving eight nephrotoxicants and two hepatotoxicants confirmed that Kim-1 gene expression (Fig. 1a) in the kidney was correlated with Kim-1 protein levels in the kidney (Fig. 1b) and urine (Fig. 1c). These studies were conducted at Novartis using gentamicin, cisplatin (Platinol), vancomycin (Vancocin) and tacrolimus (Protopic, Prograf), all proximal tubular toxicants; puromycin and doxorubicin (Doxil, Adriamycin), both glomerular toxicants; furosemide (Lasix) and lithium (Eskalith), both tubular and collecting duct toxicants; and the two hepatotoxicants α-naphthyl isothiocyanate and methapyrilene.

We developed a microbead-based assay to measure Kim-1 protein using a pair of epitopically distinct mouse monoclonal antibodies against rat Kim-1 ectodomain. An important advantage of the microbead-based assay over previously established enzyme-linked immunosorbent assay (ELISA) methods 3 is the expanded dynamic range (from 4 pg/ml to 40,000 pg/ml), which eliminates the need to dilute the urine samples. Other advantages of this assay include the ability to quantify Kim-1 using only 30 µl of undiluted urine samples and reducing the assay time from 6 h to 3.5 h, while maintaining inter- and intra-assay variability between Kim-1 values


and thioacetamide). Elevations in urinary Kim-1 were closely correlated with gentamicin-induced damage to kidney tubules detected by histopathology. On days 9 and 15 of treatment, mid-dose (80 mg/kg/d)-treated animals showed mean normalized urinary Kim-1 levels that were elevated 11- and 40-fold, respectively, compared to mean concurrent control values. High-dose (240 mg/kg/d)-treated animals on days 3, 9 and 12 showed 23-, 117- and 163-fold increased levels of urinary Kim-1, respectively (Fig. 3). Tubular degeneration, necrosis and regeneration observed at days 9 and 12 in animals treated with high-dose gentamicin corresponded to an ~100-fold elevation of urinary Kim-1. By comparison, lower doses (20 mg/kg/d) were associated with a lower incidence and severity of tubular degeneration, necrosis and regeneration, as well as smaller elevations in Kim-1 levels at 20 mg/kg/d at day 15, or at 80 mg/kg/d as early as day 3, which persisted to days 9 and 15. After treatment with the high dose of gentamicin, normalized NAG activity was elevated nearly tenfold on day 3 (Fig. 3). At days 12 and 15, NAG activity levels were elevated for the mid- and high-dose animals, with corresponding histomorphologic severity grades of 2 and 5, respectively (Fig. 3). Only animals treated with the high dose showed markedly elevated SCr levels (>1.5 mg/dl) at day 9, but SCr levels were normal at day 3. Similarly, BUN elevations were seen only in high-dose animals at day 9.

Urinary and serum biomarker elevations with cisplatin-induced nephrotoxicity

Normalized urinary Kim-1 levels were elevated after mid-dose (3.5 mg/kg) treatment with cisplatin at both days 3 and 8 (20- and 97-fold, respectively), where severity grade 2 and 4 overall tubular damage scores were observed (Fig. 4). Similarly, normalized urinary Kim-1 was elevated in high-dose (7 mg/kg) cisplatin-treated animals both at day 3 (11-fold) and day 8 (48-fold), corresponding with severity grade 2 and 5 overall tubular damage scores, respectively (Fig. 4). A mean increase of approximately ninefold was seen in low-dose (0.5 mg/kg) animals at day 3, which trended down at day 8. With cisplatin treatment, normalized NAG values were elevated over twofold at day 3 in animals showing grade 2 overall tubular damage at the high dose, but not in animals with grade 2 overall tubular damage at the mid-dose. At day 8, urinary NAG activity did not change in mid- and high-dose-treated animals with severity grade 4 to 5 overall tubular damage (Fig. 4). There were treatment-related increases in BUN and SCr in the mid- and high-dose treatment groups at both days 3 and 8, with markedly higher elevations for both at day 8 after high-dose treatment.

Urinary and serum biomarker elevations after nephrotoxicity induced by cyclosporine A and thioacetamide

For rats treated with 0, 6, 30 or 60 mg/kg/d cyclosporine A for 3, 9 or 15 d, we observed subtle tubular basophilia of the regenerative type at severity grades 1 and 2 in most mid-dose animals at day 15,

[Figure 3 panels: BUN (mg/dl); SCr (mg/dl); Kim-1/uCr (ng/mg); NAG/uCr (IU/mg) × 10^4. Horizontal axis: gentamicin (mg/kg/d) at 0, 20, 80 and 240, grouped by Day 3, Day 9 (12) and Day 15. Legend: severity scores 0, 1, 2, 4, 5.]

Figure 3 Correlation of BUN, SCr, urinary Kim-1 and urinary NAG with severity grades of histopathologic change after gentamicin treatment in the Merck study. Male Sprague Dawley rats were administered gentamicin sulfate intraperitoneally at 0, 20, 80 or 240 mg/kg/d to groups of five rats/dose/time point and the animals were euthanized on days 3, 9 or 15 for toxicity evaluation, which included serum clinical chemistry (BUN, SCr), urinary Kim-1 and NAG levels and renal histopathology (H&E staining). Open squares indicate grade 0 pathology and the composite tubular severity score is color coded yellow (1), orange (2), purple (4) and blue (5). Black circles indicate average values of dose groups.

[Figure 2 panels: (a–c) ROC curves, sensitivity versus 1 – specificity; (d) AUC from ROC and (e) sensitivity from ROC, each by histopathology grade subset (ALL, 0 to 2, 0 and 1) for Kim-1, NAG, BUN and SCr.
Inset values, given as AUC and sensitivity (SENS):
(a) Kim-1 0.91, 0.79; NAG 0.82, 0.52; BUN 0.79, 0.52; SCr 0.73, 0.44.
(b) Kim-1 0.91, 0.79; NAG 0.82, 0.52; BUN 0.78, 0.51; SCr 0.73, 0.43.
(c) Kim-1 0.88, 0.71; NAG 0.80, 0.47; BUN 0.76, 0.48; SCr 0.67, 0.29.]

Figure 2 ROC analysis for Novartis studies. (a–c) ROC curves from eight different nephrotoxicant studies and two different hepatotoxicant studies from Novartis, demonstrating sensitivity and specificity of BUN, SCr, urinary Kim-1 and NAG with respect to a composite histopathology score that included all histopathology grades (a), histopathology grade 0 to 2 (b) and histopathology grade 0 to 1 (c). (d,e) Area under the curve (d) and sensitivity (e) (at 95% specificity) compared to the ‘gold standard’, histopathology. Urinary Kim-1 and NAG were normalized to urinary creatinine. Animal numbers, n. Negative: n = 283. Positive: all, n = 132; 0 to 2, n = 129; 0 to 1, n = 94.
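The AUC values quoted for the ROC analysis have a simple rank interpretation: the AUC is the probability that a randomly chosen histopathology-positive animal has a higher marker value than a randomly chosen negative animal, with ties counted as one half. The sketch below computes that quantity directly. It illustrates the statistic only and is not the software used in the study; the function name and the numbers are made up for the example.

```python
def roc_auc(negatives, positives):
    """Area under the ROC curve via pairwise comparison.

    negatives: marker values for animals without histopathology findings.
    positives: marker values for animals with histopathology findings.
    Counts the fraction of (positive, negative) pairs in which the
    positive animal has the higher value; ties count as 0.5.
    """
    concordant = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(positives) * len(negatives))


# Illustrative fold-change values only, not data from the study.
control_like = [0.9, 1.0, 1.1, 1.3]
injured = [1.2, 5.0, 40.0]
auc_example = roc_auc(control_like, injured)
```

Here 11 of the 12 positive-negative pairs are correctly ordered, so `auc_example` is 11/12. A marker that separates the two groups perfectly gives 1.0, and an uninformative marker gives about 0.5.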


all high-dose animals on day 15 and one high-dose animal at day 9 (Supplementary Fig. 2). Elevations of Kim-1 were seen in all animals with histomorphologic changes (Supplementary Fig. 2), whereas elevated NAG activity was seen in nearly half of the animals with tubular regeneration. With cyclosporine A treatment, modest elevations in BUN were observed in all animals with histomorphologic changes on day 15 after high-dose treatment. In contrast, we did not observe increases in SCr associated with histomorphologic changes.

Thioacetamide (TAA) has been reported as a model nephrotoxicant of proximal tubule injury 23. A 2- and 3-d TAA study was performed using single administrations of either 50, 100 or 200 mg/kg (Supplementary Fig. 3). We observed both liver and kidney histomorphologic changes, including tubular degeneration and necrosis, at all doses and on both days. The observed tubular histopathologic changes were severity grade 1 and 2 for day-2 animals and increased in a dose-dependent manner on day 3. Urinary Kim-1 was the most sensitive biomarker of toxicity, with 34- and 36-fold increases in concentration at days 2 and 3, respectively, already seen with low-dose treatment (Supplementary Fig. 3). At the mid-dose, Kim-1 levels were increased 12- and 6-fold at days 2 and 3, respectively, and about 18-fold at the high dose on both days. Urinary NAG activity increased in a dose-dependent manner at both days 2 and 3 (Supplementary Fig. 3). Significant elevations in BUN and SCr were observed only on day 3 after mid- and high-dose treatment (P < 0.05).

Sensitivity and specificity of urinary Kim-1 and NAG, BUN and SCr with respect to kidney histopathology caused by four nephrotoxicants

The performance of Kim-1 in the Merck studies as measured by AUC from the ROC analysis (Fig. 5 and Supplementary Table 2) was consistently high (>0.99 by exclusion analysis and >0.96 by inclusion analysis) for all of the histomorphologic severity grade subsets (Fig. 5a,e) (Supplementary Table 2). For the analysis that uses all of the samples, the difference in AUC between Kim-1 and NAG was 0.12 by both exclusion and inclusion analyses (P < 0.001). The difference in AUC between Kim-1 and SCr was 0.15 by exclusion analysis (P < 0.001) and 0.10 by inclusion analysis (P < 0.001), and the difference in AUC between Kim-1 and BUN was 0.09 by exclusion analysis (P < 0.001) and 0.12 by inclusion analysis (P < 0.001) (Supplementary Table 3). The difference in AUC between Kim-1 and NAG increased from 0.12, when all nephrotoxicity samples were used, to 0.26, when those with a severity grade 0 and 1 were used; between Kim-1 and BUN it increased from 0.09, using all nephrotoxicity samples, to 0.22, using only the severity grade 0 and 1 nephrotoxicity sample subsets. This indicates a lower correlation of NAG and BUN to histomorphologic change when a more sensitive morphologic metric was employed (Fig. 5a,d,e). Similarly, the difference in AUC between Kim-1 and SCr increased from 0.15, using all nephrotoxicity samples, to 0.37, using only the severity grade 0 and 1 nephrotoxicity sample subsets (Fig. 5a,d,e).

The sensitivity, or proportion of positives correctly identified at a threshold that yields 95% specificity, for Kim-1 was 0.99 for all of the nephrotoxicity histopathology severity grade subsets (Fig. 5a,d,f). The sensitivity for NAG decreased from 0.56 using all nephrotoxicity samples to 0.20 for the severity grade 0 and 1 subset (Fig. 5f and Supplementary Table 2). Similarly, the sensitivity for BUN and SCr also decreased from 0.71 and 0.68, respectively, using all nephrotoxicity samples, to 0.45 and 0.20, respectively, using severity grade 0 and 1 nephrotoxicity sample subsets (Fig. 5f and Supplementary Table 2). Both inclusion and exclusion analysis show that, unlike BUN, SCr and NAG, the performance of Kim-1 is uniformly high within the full range of nephrotoxicity subsets analyzed (Supplementary Tables 2 and 3).

[Figure 4 panels: BUN (mg/dl); SCr (mg/dl); Kim-1/uCr (ng/mg); NAG/uCr (IU/mg) × 10^4. Horizontal axis: cisplatin (mg/kg) at 0, 0.5, 3.5 and 7, grouped by Day 3 and Day 8. Legend: severity scores 0, 1, 2, 4, 5.]

Figure 4 Correlation of BUN, SCr, urinary Kim-1 and urinary NAG with severity grades of histopathologic change after cisplatin treatment in the Merck study. Male Sprague Dawley rats were administered cisplatin intraperitoneally (n = 5/dose/time point) at doses of 0, 0.5, 3.5 or 7 mg/kg and rats were killed on days 3 and 8 for toxicity evaluation, which included serum clinical chemistry (BUN, SCr), urinary Kim-1 and NAG levels and renal histopathology (H&E staining). Open squares indicate grade 0 pathology and the composite tubular severity score is color coded yellow (1), orange (2), purple (4) and blue (5). Black circles indicate average values of dose groups. uCr, urinary creatinine.

Specificity of urinary Kim-1 as a biomarker for kidney injury

We measured urinary Kim-1 levels in well-established models of hepato- and cardiotoxicity at Merck to assess the specificity of increases in Kim-1 levels associated with renal damage. Bromotrichloromethane (CBrCl3) induced substantial hepatotoxicity in rats on days 2 and 4 at both low and high doses, as assessed by plasma levels of alanine aminotransferase (ALT) and aspartate aminotransferase (AST) proteins and histopathology scoring (necrosis and degeneration). Nonetheless, it produced no treatment-related kidney toxicity (Supplementary Table 4). Urinary Kim-1 levels were similar between controls and CBrCl3-treated animals with liver injury. Isoproterenol induced necrosis and degeneration of both cardiac and skeletal muscle with histomorphologic changes at 3 and 8 d after a dose of 1 mg/kg/d, yet did not cause increases in urinary Kim-1 levels. Our observation that changes in urinary Kim-1 levels were unremarkable after toxicant-induced hepatotoxicity and cardiotoxicity in rats further supports the specificity of Kim-1 for renal damage (Supplementary Table 4).
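The sensitivity figures reported above for the Merck studies are read off at a threshold chosen to give 95% specificity. A minimal sketch of that two-step calculation follows, under stated assumptions: a sample is called positive when its fold-change exceeds the threshold, the threshold is the smallest observed negative value at or below which at least 95% of negatives fall, and the function names are hypothetical. The study's actual software and tie-handling conventions are not described in this section.

```python
def threshold_at_specificity(negatives, specificity=0.95):
    """Smallest observed negative value t such that the fraction of
    negatives with value <= t is at least `specificity`.

    Assumption: a sample is called positive when value > t, so the
    negatives at or below t are the true negatives.
    """
    n = len(negatives)
    for candidate in sorted(negatives):
        if sum(v <= candidate for v in negatives) / n >= specificity:
            return candidate
    raise ValueError("negatives must not be empty")


def sensitivity_at_threshold(positives, threshold):
    """Fraction of histopathology-positive samples above the threshold."""
    return sum(v > threshold for v in positives) / len(positives)


# Illustrative values only, not data from the study.
negatives_example = list(range(1, 21))   # 20 negative animals
positives_example = [5, 25, 30, 40]      # 4 positive animals
cutoff = threshold_at_specificity(negatives_example)
sens = sensitivity_at_threshold(positives_example, cutoff)
```

With 20 negatives, 19 of them must fall at or below the cutoff to reach 95% specificity, so the cutoff is 19 and three of the four positives exceed it (sensitivity 0.75). Because the cutoff depends only on the negative animals, it is driven by the spread of control values, which is the point the article makes when it compares thresholds between sites.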


Figure 5 ROC analysis for Merck studies. (a–d) ROC curves (sensitivity versus 1 – specificity) from four different nephrotoxicant studies showing sensitivity and specificity of BUN, SCr, urinary Kim-1 and NAG with respect to a composite histopathology score that included all histopathology grades (a), histopathology grade 0 to 3 (b), histopathology grade 0 to 2 (c) and histopathology grade 0 and 1 (d). (e,f) Area under the curve (e) and sensitivity (f) (at 95% specificity) of BUN, SCr, urinary Kim-1 and NAG compared to the gold standard, histopathology, plotted by histopathology grade subset. Urinary Kim-1 and NAG were normalized to urinary creatinine. Animal numbers, n. Negative: n = 45. Positive: all, n = 75; 0 to 3, n = 54; 0 to 2, n = 49; 0 to 1, n = 20.

Inset values in the ROC panels (AUC / sensitivity), by histopathology grade subset:
All grades: Kim-1 1.00 / 0.99; NAG 0.88 / 0.56; BUN 0.90 / 0.71; SCr 0.85 / 0.68
Grades 0 to 3: Kim-1 1.00 / 0.98; NAG 0.85 / 0.46; BUN 0.88 / 0.63; SCr 0.79 / 0.56
Grades 0 to 2: Kim-1 1.00 / 0.98; NAG 0.85 / 0.43; BUN 0.87 / 0.59; SCr 0.77 / 0.51
Grades 0 and 1: Kim-1 1.00 / 1.00; NAG 0.73 / 0.20; BUN 0.78 / 0.45; SCr 0.62 / 0.20

Comparison of urinary Kim-1 with other markers in the rat model of renal ischemia/reperfusion injury (I/R)

We used a rat model of 20-min bilateral renal I/R to show an approximately three- and sixfold increase in urinary Kim-1 as compared to sham 3 h and 6 h after reperfusion, respectively. Urinary Kim-1 levels peaked at 24 h (700-fold increase) and plateaued to levels persistently above baseline at 96 and 120 h (~70-fold increase) after reperfusion (Fig. 6). This time course correlated with the histological changes of the kidney with grade 1 proximal tubular damage at 3 h and 6 h, and single cell necrosis, tubular dilation and sloughing of cells in tubules of the outer stripe of the outer medulla at 9 h after I/R. At 12 and 24 h there was substantial proximal tubular necrosis with associated inflammation and cast formation classified as grade 4 and 5 histopathology, respectively. Modest and transient elevations in BUN (~1.4-fold), SCr (~1.5-fold) and NAG (3.2-fold) were observed only at earlier time points (between 3 and 9 h) after reperfusion. Statistically significant increases in urinary NAG activity were observed at 12 h after reperfusion with ~5.5-fold elevation and at 18 h for BUN (~2.1-fold) and SCr (~2.4-fold) (Fig. 6).

Threshold determination comparisons

For the purposes of obtaining uniformity in data interpretation, it is important that the thresholds derived from different biomarker study data sets for a specific predefined sensitivity or specificity should be the same for practical general utility. For the Novartis studies, ROC analysis defined a threshold for 95% specificity of 1.87-fold increase (by exclusion analysis and 3.9-fold by inclusion analysis) (Supplementary Table 2). For the Merck studies, ROC analysis defined a threshold for 95% specificity of 1.88-fold increase (Supplementary Table 2). Similarly for SCr, the threshold cutoffs from the Novartis and Merck studies were 1.14-fold and 1.2-fold, for BUN 1.2- and 1.3-fold and for NAG 1.4- and 2.4-fold, respectively. As the threshold for 95% specificity is mainly determined by the variance of the control animals, one can conclude that despite different rat strains, study designs and assay setups, the urinary Kim-1, SCr and BUN levels across control animals are highly reproducible, and the magnitude response necessary to signal that a significant deviation from normal equates to pathology is also consistent.

Logistic regression models were fit to assess whether Kim-1 and NAG add information to models that rely on SCr and BUN. The results show that both Kim-1 and NAG provide substantial additional information (Supplementary Table 5). Using exclusion analysis with the binary logistic model on the Novartis data, the addition of Kim-1 was statistically significant (P < 1.0E-05) and increased the concordance probability (C, equivalent to AUC from ROC in this case) by 0.159, the R² by 0.37 and IDI by 0.35.
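The added-value measures quoted above can be illustrated with a small sketch. It assumes two sets of predicted injury probabilities are already available, one from a baseline SCr+BUN model and one from a model extended with an additional marker; the probabilities, outcomes and function names below are hypothetical illustrations and are not taken from the study data or its software. The concordance probability is computed as the fraction of correctly ranked injured/uninjured pairs (ties count one half), and the IDI as the change in the difference between mean predicted probabilities of injured and uninjured animals.

```python
from statistics import mean


def concordance(probs, outcomes):
    """Fraction of (injured, uninjured) pairs ranked correctly; ties count 0.5."""
    injured = [p for p, y in zip(probs, outcomes) if y == 1]
    uninjured = [p for p, y in zip(probs, outcomes) if y == 0]
    score = 0.0
    for p_inj in injured:
        for p_un in uninjured:
            if p_inj > p_un:
                score += 1.0
            elif p_inj == p_un:
                score += 0.5
    return score / (len(injured) * len(uninjured))


def discrimination_slope(probs, outcomes):
    """Mean predicted probability in injured minus that in uninjured animals."""
    injured = [p for p, y in zip(probs, outcomes) if y == 1]
    uninjured = [p for p, y in zip(probs, outcomes) if y == 0]
    return mean(injured) - mean(uninjured)


def idi(p_baseline, p_extended, outcomes):
    """Integrated discrimination improvement of the extended over the baseline model."""
    return (discrimination_slope(p_extended, outcomes)
            - discrimination_slope(p_baseline, outcomes))


# Hypothetical example: 1 = histopathology-positive, 0 = negative.
outcomes = [1, 1, 1, 1, 0, 0, 0, 0]
p_baseline = [0.75, 0.5, 0.5, 0.25, 0.5, 0.25, 0.25, 0.0]   # assumed SCr+BUN model
p_extended = [1.0, 0.75, 0.75, 0.5, 0.25, 0.25, 0.0, 0.0]   # assumed model with added marker

delta_c = concordance(p_extended, outcomes) - concordance(p_baseline, outcomes)
delta_idi = idi(p_baseline, p_extended, outcomes)
```

In this toy data set the extended model ranks every injured animal above every uninjured one, so both `delta_c` and `delta_idi` are positive; real values would come from fitted logistic models.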
The P-value from a likelihood ratio test, the concordance probability, C (ref. 24), an R² statistic (ref. 25) and the integrated discrimination improvement index, IDI (ref. 26), were used to evaluate the improvement gained by the addition of each marker to a model containing SCr and BUN. The addition of NAG to the SCr+BUN model was statistically significant (P < 1.0E-05) and increased C by 0.052, R² by 0.105 and IDI by 0.08. For the Merck data, the addition of Kim-1 was statistically significant (P < 1.0E-05) and increased C by 0.059, R² by 0.279 and IDI by 0.267. In the Merck data, the addition of NAG was statistically significant (P = 0.021) and increased C by 0.004, R² by 0.028 and IDI by 0.019. Ordinal logistic regression tended to give similar results (Supplementary Table 5).

Figure 6 Comparison of Kim-1 with routinely used biomarkers as an early diagnostic indicator of kidney injury after 20-min bilateral I/R. Male Wistar rats were subjected to 0 (sham) or 20 min of bilateral ischemia by clamping the renal pedicles for 20 min and then removing the clamps and confirming reperfusion. Two hours after reperfusion the rats were placed in metabolic cages and urine, blood and tissue collected at 3, 6, 9, 12, 18, 24, 48, 72, 96 and 120 h after reperfusion. Urinary Kim-1, BUN, SCr and urinary NAG were measured and these levels were correlated to histopathology (H&E staining). Open squares indicate grade 0 pathology and the composite tubular severity score is color coded yellow (1), orange (2), red (3), purple (4) and blue (5). Panels plot BUN (mg/dl), SCr (mg/dl), Kim-1/uCr (ng/mg) and NAG/uCr (IU/mg) against hours after bilateral renal I/R injury.

DISCUSSION

In an effort to evaluate the capacity of Kim-1 to identify nephrotoxicity, we compared the performance of four biomarkers in assessing kidney injury in eleven structurally and mechanistically different models of renal tubular injury in rats. Regardless of whether the kidney injury was induced by well-established kidney toxicants or ischemia, urinary Kim-1 outperformed BUN, SCr and urinary NAG, which are conventional markers for assessing renal injury. Moreover, the ROC-AUC values of 0.91 to 0.99 obtained for urinary Kim-1 demonstrate that urinary Kim-1 measurements are highly sensitive, specific and accurate in diagnosing drug-induced kidney tubular necrosis, degeneration and/or dilatation, as well as regenerative basophilia, whether lesions are subtle with little organ involvement or very severe with disturbed renal function. We further show by exclusion analysis that a threshold increase of 1.87-fold of urinary Kim-1 concentration for 95% specificity derived from one laboratory was similarly and independently defined in other laboratories using other study designs for kidney injury. In this set of 17 studies conducted at three sites, the increase in urinary Kim-1 was compared to histopathology, which is considered the best available benchmark for assessing preclinical renal injury. The AUC and the sensitivity of Kim-1 were nearly 1, irrespective of the mechanism of kidney injury, and remained >0.9 whether the entire histopathology grade range of 0 to 5 was included or whether the analyzed group was restricted to histopathology grade scores of 0 and 1. We demonstrate that current markers for assessing nephrotoxicity, BUN and SCr, are effective only with more severe histopathological grades (greater than grade 2) in preclinical studies. For example, the sensitivity of SCr was remarkably low at 0.20 for histology grades 0 to 1 and increased to only 0.56 with severity grades of 0 to 3 in the Merck studies. In contrast, urinary Kim-1 was sensitive and specific for assessing subtle forms of proximal tubular damage (histology grade 0 to 1). These AUC-ROC values represent the exclusion data analysis approach. Although exclusion and inclusion analyses generally yielded similar comparative performance among the biomarkers, inclusion analysis approximately doubles the Kim-1 threshold indicative of injury, to 3.9-fold, and thus yields lower sensitivity than exclusion analysis for the same level of specificity (Supplementary Table 2).

On the basis of the striking evidence of the performance of urinary Kim-1 as a highly sensitive and specific marker of drug-induced kidney injury, we made the following claims for the preclinical use of Kim-1 in the Voluntary eXploratory Data Submission we forwarded to the European Medicines Agency (EMEA) and US Food and Drug Administration (FDA). (i) Urinary Kim-1 can outperform and add information to BUN and SCr as an early diagnostic biomarker of drug-induced acute kidney tubular alterations in rat toxicology studies. (ii) Urinary Kim-1 is qualified for regulatory decision making as a biomarker that may be used by sponsors on a voluntary basis to demonstrate that drug-induced acute kidney tubular alterations are monitorable in good laboratory practice rat studies, which are used to support the safe conduct of clinical trials. (iii) Urinary Kim-1 can be considered qualified for regulatory decision making as a clinical bridging biomarker appropriate for use by sponsors on a voluntary basis in phase 1 and 2 clinical trials for monitoring kidney safety when animal toxicology findings generate a concern for tubular alterations.

Both FDA and EMEA agreed on the preclinical claims. Moreover, because of the potential clinical utility of KIM-1, both agencies proposed the use of Kim-1 as a translational safety biomarker on a case-by-case basis under certain mutually agreeable conditions.

Human studies have yielded promising results for potential utility of urinary KIM-1 as a diagnostic biomarker for AKI. One study showed marked expression of KIM-1 in kidney biopsy specimens from six patients with acute tubular necrosis and elevated urinary levels of KIM-1 after an initial ischemic renal insult before the appearance of casts in the urine (ref. 15). Another measured urinary KIM-1 and NAG in 201 patients with established AKI and demonstrated that elevated levels of urinary KIM-1 and NAG were significantly associated with the clinical composite endpoint of death or dialysis requirement, even after adjustment for disease severity or comorbidity (ref. 4). Researchers compared the tissue expression of KIM-1 with histopathological and functional parameters of acute tubular injury (ATI) and acute cellular rejection (ACR) in renal transplant biopsies from 62 patients (ref. 27). KIM-1 expression was present in all biopsies from patients with histological changes showing ATI and in 92% of kidney biopsies from patients with ACR. KIM-1 staining sensitively and specifically identified proximal tubular injury and significantly correlated with declining renal function. A longitudinal prospective study reported that elevated urinary KIM-1 serves as an independent predictor of long-term graft loss in renal transplant recipients (n = 145 patients) independent of donor age, creatinine clearance and proteinuria (ref. 16). In a study comparing nine urinary biomarkers (KIM-1, NGAL, IL-18, NAG, protein, HGF, VEGF, IP-10 and cystatin C) in 204 patients with or without acute kidney injury, we showed that urinary KIM-1 had an AUC-ROC of 0.93 and was significantly higher in patients who progressed either to death or to requirement for renal replacement therapy (RRT) when compared to those who survived and did not require RRT (ref. 5). More recently, we have also reported the development of a rapid point-of-care diagnostic dipstick assay for measuring Kim-1 in rodent and human urine samples within 15 min (ref. 28). Qualification of KIM-1 as a biomarker for clinical applications will involve a systematic evaluation of the diagnostic performance of KIM-1 in well-controlled observational and/or interventional clinical protocols using both standard-of-care agents with known nephrotoxic properties and/or exploratory agents with renal safety concerns. The opportunity to use the same translational marker, such as Kim-1, in both the preclinical and clinical setting facilitates clinical monitoring of toxicity that has been demonstrated at higher doses in preclinical development or in a single test species when human relevance is suspected.

In summary, we report that urinary Kim-1 levels correlate with different grades of kidney tubular histopathologies in 11 well-established rat models of acute kidney injury. Using either exclusion or inclusion data analysis, Kim-1 had the highest AUC-ROC (>0.88) when compared with BUN, SCr and NAG. Especially for low-grade toxicity (grade 1), Kim-1 was the only marker of the four capable of consistently detecting renal tubular injury. Urinary Kim-1 outperformed (P < 0.01) SCr, BUN and urinary NAG as biomarkers of renal tubular injury in these mechanistically distinct models of kidney injury performed at three different sites. Binary and ordinal logistic regression models for exclusion and inclusion data analysis showed that addition of Kim-1 represented a statistically significant improvement and increased the concordance probability to histopathology.
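The 95%-specificity thresholds and the sensitivities summarized above can be illustrated with a short sketch. It assumes hypothetical fold-change values for histopathology-negative (control) and histopathology-positive animals and uses a simple empirical rule, taking the smallest observed control value at or below which at least 95% of controls fall; the study's actual ROC software and cutoff rule are not described here, so the function names and data are illustrative assumptions only.

```python
import math


def threshold_at_specificity(control_values, specificity=0.95):
    """Smallest observed cutoff with at least `specificity` of controls at or below it."""
    ordered = sorted(control_values)
    # Number of controls that must not be flagged; the small epsilon guards
    # against floating-point round-up of products such as 0.95 * 20.
    k = math.ceil(specificity * len(ordered) - 1e-9)
    return ordered[k - 1]


def fraction_above(values, cutoff):
    """Share of animals whose marker value exceeds the cutoff."""
    return sum(v > cutoff for v in values) / len(values)


# Hypothetical fold changes relative to the control mean (not study data).
control_fold = [0.6, 0.7, 0.75, 0.8, 0.85, 0.9, 0.9, 0.95, 1.0, 1.0,
                1.05, 1.1, 1.1, 1.2, 1.25, 1.3, 1.4, 1.5, 1.8, 2.4]
injured_fold = [1.5, 2.0, 3.5, 12.0, 40.0, 1.9, 0.9, 6.0, 25.0, 2.2]

cutoff = threshold_at_specificity(control_fold)           # driven only by the controls
specificity = 1.0 - fraction_above(control_fold, cutoff)  # controls correctly not flagged
sensitivity = fraction_above(injured_fold, cutoff)        # injured animals flagged
```

Because the cutoff is computed from control animals alone, it shifts only when the spread of control values changes, which is the point made in the text about reproducibility across strains and sites.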


Thus, urinary Kim-1 measurement is anticipated to significantly aid in the prediction of human nephrotoxicity during preclinical studies by early identification, monitoring and elimination of compounds that are potentially nephrotoxic and may also allow nephrotoxicity to be monitored in humans.

Methods

Methods and any associated references are available in the online version of the paper at http://www.nature.com/naturebiotechnology/.

Note: Supplementary information is available on the Nature Biotechnology website.

Acknowledgments

Part of this work was presented at the American Society of Nephrology meeting in Philadelphia, November 7–11, 2005 and the Society of Toxicology meeting in Charlotte, North Carolina, March 4–9, 2007. This work was supported by National Institutes of Health grants ES016723 to V.S.V.; DK39773, DK72831 and DK74099 to J.V.B., and by research grants G34511M and CO1-40182A-1 from the Mexican Council of Science and Technology (CONACYT) and DGAPA IN208602-3 of National University of Mexico to N.A.B. We thank T.W. Forest, B. Sacre-Salem and T.E. Adams for providing histomorphologic readings for the Merck studies. The Novartis Biomarker CRADA team is acknowledged for contributing to the project, in particular D.R. Roth, A. Mahl, F. Staedtler, P. Verdes, D. Wahl, F. Legay, P. End and S.-D. Chibout. We thank P. Bernd for performing the protein homogenization. S. Leuillet and B. Palate from CIT are acknowledged for performing the Novartis in-life studies and the histopathology assessment. J. Mapes from Rules Based Medicine is acknowledged for the Kim-1 measurements of the Novartis studies. We thank D. Moor and P. Brodmann from Biolytix for the validation and measurements of the RT-PCR assays. We thank M. Topper, W. Bailey, G. Miller and P. Srinivasa for helpful comments on the manuscript. We thank K. Thompson from Center for Drug Evaluation and Research, US FDA for critically reviewing the manuscript.

AUTHOR CONTRIBUTIONS

V.S.V., J.S.O., N.A.B., F.D.S., F.D., J.V., G.M. and J.V.B. designed research; V.S.V., J.S.O., F.B.C., V.R., S.T., N.M., D.T., D.G., D.J.H., E.P. and A.C. performed research; V.S.V., J.S.O., S.T., D.J.H., N.A.B., F.D.S. and J.V.B. contributed new reagents/analytic tools; V.S.V., J.S.O., S.T., N.M., D.T., D.G., D.J.H., N.A.B., F.D.S., E.M., F.D. and J.V.B. analyzed data; and V.S.V., J.S.O., N.A.B., F.D.S., E.M., F.D., P.L.G. and J.V.B. wrote the paper.

COMPETING FINANCIAL INTERESTS

The authors declare competing financial interests: details accompany the full-text HTML version of the paper at http://www.nature.com/naturebiotechnology/.

Published online at http://www.nature.com/naturebiotechnology/.
Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/.

1. Chertow, G.M., Burdick, E., Honour, M., Bonventre, J.V. & Bates, D.W. Acute kidney injury, mortality, length of stay, and costs in hospitalized patients. J. Am. Soc. Nephrol. 16, 3365–3370 (2005).
2. Choudhury, D. & Ziauddin, A. Drug-associated renal dysfunction and injury. Nat. Clin. Pract. Nephrol. 2, 80–91 (2006).
3. Vaidya, V.S., Ramirez, V., Ichimura, T., Bobadilla, N.A. & Bonventre, J.V. Urinary kidney injury molecule-1: a sensitive quantitative biomarker for early detection of kidney tubular injury. Am. J. Physiol. Renal Physiol. 290, F517–F529 (2006).
4. Liangos, O. et al. Urinary N-acetyl-beta-(d)-glucosaminidase activity and kidney injury molecule-1 level are associated with adverse outcomes in acute renal failure. J. Am. Soc. Nephrol. 18, 904–912 (2007).
5. Vaidya, V.S. et al. Urinary biomarkers for sensitive and specific detection of acute kidney injury in humans. Clin. Transl. Sci. 1, 200–208 (2008).
6. Emeigh Hart, S.G. Assessment of renal injury in vivo. J. Pharmacol. Toxicol. Methods 52, 30–45 (2005).
7. Price, R.G. The role of NAG (N-acetyl-beta-D-glucosaminidase) in the diagnosis of kidney disease including the monitoring of nephrotoxicity. Clin. Nephrol. 38 Suppl 1, S14–S19 (1992).
8. Bonventre, J.V., Vaidya, V.S., Schmouder, R., Feig, P. & Dieterle, F. Next-generation biomarkers for detecting kidney toxicity. Nat. Biotechnol. 28, 436–440 (2010).
9. Ichimura, T. et al. Kidney injury molecule-1 (KIM-1), a putative epithelial cell adhesion molecule containing a novel immunoglobulin domain, is up-regulated in renal cells after injury. J. Biol. Chem. 273, 4135–4142 (1998).
10. Ichimura, T. et al. Kidney injury molecule-1 is a phosphatidylserine receptor that confers a phagocytic phenotype on epithelial cells. J. Clin. Invest. 118, 1657–1668 (2008).
11. Amin, R.P. et al. Identification of putative gene based markers of renal toxicity. Environ. Health Perspect. 112, 465–479 (2004).
12. Bailly, V. et al. Shedding of kidney injury molecule-1, a putative adhesion protein involved in renal regeneration. J. Biol. Chem. 277, 39739–39748 (2002).
13. Prozialeck, W.C. et al. Kidney injury molecule-1 is an early biomarker of cadmium nephrotoxicity. Kidney Int. 72, 985–993 (2007).
14. Zhou, Y. et al. Comparison of kidney injury molecule-1 and other nephrotoxicity biomarkers in urine and kidney following acute exposure to gentamicin, mercury, and chromium. Toxicol. Sci. 101, 159–170 (2008).
15. Han, W.K., Bailly, V., Abichandani, R., Thadhani, R. & Bonventre, J.V. Kidney Injury Molecule-1 (KIM-1): a novel biomarker for human renal proximal tubule injury. Kidney Int. 62, 237–244 (2002).
16. van Timmeren, M.M. et al. High urinary excretion of kidney injury molecule-1 is an independent predictor of graft loss in renal transplant recipients. Transplantation 84, 1625–1630 (2007).
17. Perez-Rojas, J. et al. Mineralocorticoid receptor blockade confers renoprotection in preexisting chronic cyclosporine nephrotoxicity. Am. J. Physiol. Renal Physiol. 292, F131–F139 (2007).
18. Ichimura, T., Hung, C.C., Yang, S.A., Stevens, J.L. & Bonventre, J.V. Kidney injury molecule-1: a tissue and urinary biomarker for nephrotoxicant-induced renal injury. Am. J. Physiol. Renal Physiol. 286, F552–F563 (2004).
19. van Timmeren, M.M. et al. Tubular kidney injury molecule-1 in protein-overload nephropathy. Am. J. Physiol. Renal Physiol. 291, F456–F464 (2006).
20. Carson, R.T. & Vignali, D.A. Simultaneous quantitation of 15 cytokines using a multiplexed flow cytometric assay. J. Immunol. Methods 227, 41–52 (1999).
21. Mattes, W.B. & Walker, E.G. Translational toxicology and the work of the predictive safety testing consortium. Clin. Pharmacol. Ther. 85, 327–330 (2009).
22. Sistare, F.D. et al. Towards consensus practices to qualify safety biomarkers for use in early drug development. Nat. Biotechnol. 28, 446–454 (2010).
23. Barker, E.A. & Smuckler, E.A. Nonhepatic thioacetamide injury. II. The morphologic features of proximal renal tubular injury. Am. J. Pathol. 74, 575–590 (1974).
24. Harrell, F.E. Regression Modeling Strategies, edn. 1. (Springer, New York; 2001).
25. Nagelkerke, N.J. A note on a general definition of the coefficient of determination. Biometrika 78, 691–692 (1991).
26. Pencina, M.J., D'Agostino, R.B. Sr., D'Agostino, R.B. Jr. & Vasan, R.S. Evaluating the added predictive ability of a new marker: from area under the ROC curve to reclassification and beyond. Stat. Med. 27, 157–172 (2008).
27. Zhang, P.L. et al. Kidney injury molecule-1 expression in transplant biopsies is a sensitive measure of cell injury. Kidney Int. 73, 608–614 (2008).
28. Vaidya, V.S. et al. A rapid urine test for early detection of kidney injury. Kidney Int. 76, 108–114 (2009).


© 2010 Nature America, Inc. All rights reserved.ONLINE METHODSAnimals. Male HAN Wistar rats (275–300 g) <strong>and</strong> male Sprague Dawley (SD)rats (275–325 g) were purchased from Harlan <strong>and</strong> maintained in centralanimal facility over wood chips free of any known chemical contaminantsunder conditions of 21 ± 1 °C <strong>and</strong> 50–80% relative humidity at all times in analternating 12 h light-dark cycle. Rats were fed with commercial rodent chow(Teklad rodent diet no. 7012), given water ad libitum <strong>and</strong> were acclimated for1 week before use. All animal maintenance <strong>and</strong> treatment protocols were incompliance with <strong>the</strong> Guide for Care <strong>and</strong> Use of Laboratory Animals as adopted<strong>and</strong> promulgated by <strong>the</strong> National Institutes of Health <strong>and</strong> were approved byrespective Institutional Animal Care <strong>and</strong> Use Committees (IACUC).Experimental design. Novartis Studies. Ten studies using HAN Wistar ratsdosed with eight nephrotoxicants <strong>and</strong> two hepatotoxicants were conductedto generate dose- <strong>and</strong> time-dependent nephrotoxicity <strong>and</strong> hepatotoxicity. Allstudies followed a specific generic study design. For specific differences, suchas differences in termination time points, see Supplementary Table 6. Threegroups of 24 Wistar Han males (11–12 weeks old) per study group receiveddaily <strong>the</strong> test item, at <strong>the</strong> three different dose levels listed in SupplementaryTable 6. An additional group of 24 males received <strong>the</strong> vehicle under <strong>the</strong> sameexperimental conditions <strong>and</strong> acted as a control group. The animals werechecked daily for mortality <strong>and</strong> clinical signs. Body weight <strong>and</strong> food consumptionwere recorded regularly during <strong>the</strong> study. 
Urine was collected forurinalysis <strong>and</strong> biomarker measurement as described below. For <strong>the</strong> terminationtime point day 1 no urine was collected, <strong>and</strong> for some animals not enoughurine for all investigations was obtained. In total, 739 samples were availablefor urinary biomarker analysis. On completion of <strong>the</strong> treatment periods,six non-fasted animals per group at each time point (four termination timepoints) were sampled 3 h after dosing for laboratory investigations (bloodbiochemistry, urinalysis <strong>and</strong> histopathology). The animals were euthanized<strong>and</strong> examined macroscopically post-mortem. Designated organs (kidneys,liver <strong>and</strong> brain) were weighed <strong>and</strong> specified tissues preserved. The kidneys<strong>and</strong> liver were processed using conventional methods for histological assessment.The right kidney of all dose groups <strong>and</strong> <strong>the</strong> liver of all animals of <strong>the</strong>control <strong>and</strong> high-dose groups <strong>and</strong> additionally of <strong>the</strong> low-dose <strong>and</strong> mid-dosegroups for <strong>the</strong> two hepatotoxicant studies were examined microscopically afterH&E staining. Histopathology of <strong>the</strong> kidneys was evaluated according to <strong>the</strong>Predictive Safety Testing Consortium Nephrotoxicity Working Group (PSTCNWG) histopathology lexicon <strong>and</strong> scoring system. All studies were performedat Centre International de Toxicologie (CIT) (BP 563, 27005 Evreux, France)in compliance with animal health regulations, in particular with <strong>the</strong> CouncilDirective No. 86/609/EEC of 24th November 1986.Merck Studies. 
Male Sprague Dawley rats received one of four nephrotoxicants(gentamicin, cisplatin, thioacetamide or cyclosporine), or one well-establishedhepatotoxicant (carbon tetrachloride), or one well-established cardiotoxicant(isoproterenol) for sensitivity <strong>and</strong> specificity studies. Gentamicin sulfatewas administered at 0, 20, 80 or 240 mg/kg/d (n = 5 rats/dose group/timepoint) <strong>and</strong> <strong>the</strong> animals were necropsied on days 3, 9 or 15 for toxicity evaluation,which included serum clinical chemistry (BUN, SCr), analysis of urineKim-1 <strong>and</strong> NAG <strong>and</strong> histomorphologic evaluation of kidneys (H&E staining),as described below. The 240 mg/kg/d gentamicin sulfate day 15 group wasterminated early at treatment day 12 due to physical signs of animal distress.In <strong>the</strong> cisplatin groups a single dose of cisplatin was administered intraperitoneally(i.p.) to male Sprague Dawley rats (n = 5 rats/dose group/time point)at doses of 0, 0.5, 3.5 or 7 mg/kg/d <strong>and</strong> necropsy was performed on day 3 or 8post-treatment. Cyclosporine A was administered subcutaneously (s.c.) at 0, 6,30 or 60 mg/kg/d to rats (n = 5/dose/time point) <strong>and</strong> necropsy was performedon day 3, 9 or 15. A single dose of TAA was administered by oral gavage at 0,50, 100 or 200 mg/kg (n = 5 rats/dose group/time point) <strong>and</strong> necropsy wasperformed on day 2 (24 h post-dose) or day 3 (48 h post-dose). CBrCl3 wasadministered orally (p.o.) at 0, 0.03, 0.1 ml/kg to rats (n = 5/dose/time point)<strong>and</strong> necropsy was performed on day 2 or 4. Isoproterenol was administeredintravenously (i.v.) at 0, 0.064, 0.25, or 1 mg/kg/day to rats (n = 5/dose/timepoint) <strong>and</strong> necropsy was performed on day 3 or 8.Renal ischemia-reperfusion studies. 
Eighty male Wistar (W) rats weighing~270–300 g were anes<strong>the</strong>tized with an intraperitoneal injection of pentobarbitalsodium (30 mg/kg) <strong>and</strong> placed on a homeo<strong>the</strong>rmic table to maintaincore body temperature at 37 °C, by means of a rectal probe attached to atemperature regulator, which was in turn attached to a homeo<strong>the</strong>rmic blanket.A midline laparotomy was made, renal pedicles were isolated <strong>and</strong> bilateralrenal ischemia was induced by clamping <strong>the</strong> renal pedicles for 0–20 min asdescribed previously 3 . Occlusion was verified visually by change in <strong>the</strong> colorof <strong>the</strong> kidneys to a paler shade <strong>and</strong> reperfusion by a blush. Reperfusion commencedwhen <strong>the</strong> clips were removed. The rats were divided into groups of sixrats each after 3, 6, 9, 12, 18, 24, 48, 72, 96 <strong>and</strong> 120 h of reperfusion. Rats (n = 4)per group were im<strong>media</strong>tely placed in metabolic cages at 22 °C. Individualurine samples were collected at 3, 6, 9, 12, 18, 24, 48, 72, 96 <strong>and</strong> 120 h afterreperfusion. Urinary NAG (Roche Diagnostics) was measured spectrophotometrically3 <strong>and</strong> urinary Kim-1 was measured using <strong>the</strong> xMAP Luminex technologydescribed below. Ano<strong>the</strong>r set of rats (n = 4) was euthanized by overdoseof pentobarbital (200 mg/kg, i.p.) at 3, 6, 9, 12, 18, 24, 48, 72, 96 <strong>and</strong> 120 hafter reperfusion. Blood was collected from <strong>the</strong> dorsal aorta in heparinizedtubes for measurement of BUN <strong>and</strong> SCr. One kidney was perfused through<strong>the</strong> left ventricle with PBS <strong>and</strong> <strong>the</strong>n with paraformaldehyde lysine periodatefor 10 min for histology.Urine collection. Novartis studies. Urine was collected, from 2:00–8:00 p.m.<strong>and</strong> from 8:00 p.m. to 6:00 a.m. 
on <strong>the</strong> days listed in Supplementary Table 6from fasted animals, into tubes <strong>and</strong> kept at ~4 °C during <strong>the</strong> collection period.The sampled urine fractions were split in 2-ml aliquots <strong>and</strong> centrifuged at4 °C for 30 min at 10,000g. Urinalysis <strong>and</strong> urinary biomarker analysis (separatealiquot) were subsequently performed on <strong>the</strong> urine samples collected overnight.Urine analyses were performed with an Advia 1650 analyzer. For <strong>the</strong>termination time-point day 1, no urine was collected <strong>and</strong> for some animalsnot enough urine for all investigations was obtained. In total, 739 samples forurinary biomarker analysis were available.Merck studies. Urine was collected before necropsy (18 h ± 2 h collectionperiod) <strong>and</strong> <strong>the</strong> rats were placed in st<strong>and</strong>ard metabolic cages <strong>and</strong> fasted beforecollection. Urine samples were collected from individual animals into containerssurrounded by dry ice <strong>and</strong> were stored at −80 °C until being thawed forurinalysis. After <strong>the</strong> initial thawing at 22 °C, samples were placed on wet ice<strong>and</strong> volume measurement was performed (precipitates were allowed to settleby gravity <strong>and</strong> were discarded). Typically, 2.5-ml urine samples was usedfor routine clinical chemistry urinalysis (Roche Modular Analyzer): manualspecific gravity, pH, protein, glucose, SCr, occult blood, SCr <strong>and</strong> ketones weremeasured (only Scr shown). For <strong>the</strong> remaining urine volumes, small aliquotswere made <strong>and</strong> stored at –80 °C until biomarker analysis to avoid repeatedfreeze-thaw cycles.Blood collection <strong>and</strong> clinical chemistry. Novartis studies. 
On completion of<strong>the</strong> treatment periods, six non-fasted animals per group at each time point(four termination time points) were sampled for laboratory investigations(blood biochemistry, urinalysis <strong>and</strong> histopathology) 3 h after dosing. Themaximum blood volume (at least 5 ml) was taken im<strong>media</strong>tely before schedulednecropsy, from <strong>the</strong> retro-orbital sinus of <strong>the</strong> animals, under light isofluraneanes<strong>the</strong>sia, <strong>and</strong> collected into tubes. The tubes for determination ofplasma levels of <strong>the</strong> test item were placed before <strong>and</strong> after blood samplingin wet ice. The blood sampling was split for (i) RNA extraction: 1 ml intoFastubes; (ii) Blood biochemistry: 0.7 ml into a lithium heparin tube <strong>and</strong>(iii) Biomarker assays: <strong>the</strong> remaining blood was collected into sodium EDTAtubes. Clinical chemistry analyses of urine <strong>and</strong> blood were performed withan Advia 1650 device for measuring BUN (using Urease UV from Bayer) <strong>and</strong>creatinine (using Jaffe from Bayer).Merck studies. Rats were fasted overnight before necropsy <strong>and</strong> bled from<strong>the</strong> vena cava with 2 ml collected into a serum separator tube <strong>and</strong> centrifuged1,500g for 10 min at 4 °C. An additional 2 ml of collected bloodwas placed into an EDTA collection tube <strong>and</strong> centrifuged 900g for 15 minat 4 °C to isolate plasma. Isolated plasma <strong>and</strong> serum samples were storedat −80 °C until use. BUN (mg/dl) <strong>and</strong> creatinine were measured usinga st<strong>and</strong>ard clinical chemistry analyzer (Roche-Modular). 
AST (aspartate aminotransferase) (IU/L), ALT (alanine aminotransferase) (IU/L), alkaline phosphatase (IU/L) and creatine kinase (IU/L) were measured using the same clinical chemistry analyzer for the isoproterenol and CBrCl3 toxicity studies.

nature biotechnology doi:10.1038/nbt.1623


Histopathology. A compendium of kidney histology images taken at low to high magnification from rat kidneys with low to severe histological damage is compiled in Supplementary Data.

Novartis studies. A microscopic examination was performed on the right upper part of the kidney of all animals and on the caudate lobes of the liver of animals of the control and high-dose groups, and additionally of the low-dose and mid-dose groups for the two hepatotoxicant studies. Histopathology of the kidneys was evaluated according to the PSTC NWG histopathology lexicon and scoring system (localized lesions with a five-grade system). The histopathology assessment, including a peer review, was first performed at CIT. Subsequently, ~33% of the samples were reviewed at Novartis. In case of major discrepancies, all involved Novartis and CIT pathologists discussed the findings to reach a resolution. For the assessment of proximal tubular injury, the highest grade of necrosis, apoptosis, tubular degeneration and cell sloughing in the nephron segments S1 to S3 and non-localizable was assigned to the histopathology composite score “proximal tubular injury.” A severity grading scale of 0–5 was used to grade pathological lesions: 0 (no observable pathology), 1 (minimal), 2 (slight), 3 (moderate), 4 (marked) or 5 (severe).

Merck studies. At necropsy, tissues were collected for histology soon after the last blood collection and exsanguination.
The left quadriceps (3-mm section including all four muscle groups), left kidney (5-mm section including the papilla, cortex and medulla), right lateral lobe of the liver, and heart were isolated from each animal and placed in 10% neutral buffered formalin. Tissues were fixed for a minimum of 24 h, processed and embedded in paraffin. Embedded tissues were cut into 4- to 6-µm sections and stained with H&E. Tissues from control and high-dose animals, and organs with test article-related changes from lower dose groups, were examined microscopically by a Merck pathologist unaware of any of the biomarker data, and studies were reviewed by a supervising pathologist as part of a final report. Histopathology of the kidneys was evaluated according to the PSTC NWG histopathology lexicon and scoring system. Diagnoses for individual animals were grouped into composite categories for statistical analysis: (i) tubular degeneration and necrosis composite, (ii) tubular basophilia and regeneration composite, or (iii) other composite (glomerulopathy, fibrosis and tubular dilatation). Because glomerulopathy and fibrosis were not observed in the renal studies, the composite score is considered a tubular composite. The composite score for an individual animal was derived from the highest pathology score of the diagnoses comprising a given composite.

Development and evaluation of rat Kim-1 micro-bead assay. Coupling of beads to Kim-1 capture antibodies. The polystyrene 5.6-µm microspheres contain spectrally distinct fluorochromes. Microsphere (Bio-Rad bead no.
27) was coupled with monoclonal anti-rat Kim-1 ectodomain antibody using a Bioplex amine coupling kit from Bio-Rad. A mouse monoclonal antibody against rat Kim-1 ectodomain, raised and characterized in our laboratory 3, was used as the primary (capture) antibody to couple to beads.

Evaluation of the assay. The performance characteristics of the microbead-based assay were evaluated similarly to the Kim-1 ELISA 3 by measuring the sensitivity, assay range, specificity, reproducibility, recovery and interference.

Transfer of the assay for Novartis studies. The assay for measuring urinary Kim-1 in the Merck study and in the renal ischemia reperfusion model was performed in the Vaidya and Bonventre laboratories using the microbead technology described above. The Kim-1 assay for Novartis studies was set up at Rules Based Medicine using the reagents obtained from the Vaidya/Bonventre laboratory, as described above. The validation of the assay followed accepted procedures recommended in The Bioanalytical Method Validation Guidance for Industry (http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM070107.pdf), with the exception that, for the accuracy at the lower limit of quantification (LLOQ) and upper limit of quantification (ULOQ), a mean deviation of 30% instead of the recommended 20% was accepted, and between LLOQ and ULOQ a mean deviation of 20% instead of 15% was accepted for the quality controls.
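The relaxed accuracy limits described above can be illustrated with a short sketch. It is not the validation software used in the study: the function name `accuracy_ok` and the replicate values are hypothetical, "mean deviation" is assumed to mean the relative deviation of the mean measured concentration from the nominal one, and the LLOQ and ULOQ are the values reported for urine in this section.

```python
def accuracy_ok(nominal, measured, lloq, uloq, edge_tol=0.30, mid_tol=0.20):
    """Return True if the mean deviation from nominal is within the accepted limit.

    edge_tol applies at the LLOQ and ULOQ, mid_tol to levels in between
    (30% and 20%, the limits accepted for this assay validation).
    """
    if not lloq <= nominal <= uloq:
        raise ValueError("nominal concentration lies outside the assay range")
    mean_measured = sum(measured) / len(measured)
    # Assumption: deviation of the mean measurement, relative to nominal.
    deviation = abs(mean_measured - nominal) / nominal
    tolerance = edge_tol if nominal in (lloq, uloq) else mid_tol
    return deviation <= tolerance


# LLOQ and ULOQ for urine as stated in the text (ng/ml).
LLOQ, ULOQ = 0.058, 140.0

# Hypothetical quality-control replicates; not data from the study.
print(accuracy_ok(0.058, [0.070, 0.072], LLOQ, ULOQ))  # judged against 30%
print(accuracy_ok(10.0, [12.5, 12.7], LLOQ, ULOQ))     # judged against 20%
```

A quality control at the edge of the range is thus judged against the wider limit, whereas the same relative deviation in the middle of the range may fail.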
The assay validation covered inter-day, inter-operator and inter-instrument reproducibility, linearity, parallelism, spike recovery, freeze-thaw stability, short-term stability, long-term stability, matrix interferences and cross-reactivity for both urine samples and protein extracts. The LLOQ was determined as 0.058 ng/ml for urine samples and 0.1 ng/ml for protein extracts from kidney, and the ULOQ was determined as 140 ng/ml for urine and 280 ng/ml for protein extracts. The Kim-1 measurements for the Merck studies were performed in the Vaidya/Bonventre laboratory at Brigham and Women’s Hospital, Harvard Medical School. The assays in both laboratories were standardized with extensive performance evaluation in terms of lowest limit of detection, dilutional linearity, recovery, precision profile and variability between the two sites. Along with the primary and secondary antibodies for Kim-1, the Vaidya/Bonventre laboratory sent to Rules Based Medicine recombinant protein and 30 rat urine samples each with low, medium and high values of Kim-1. The results were compared and the coefficient of variability […] 0 was considered ‘positive’ for all samples. Thus, all positive grades of histopathology (grades 1 to 5) were treated with equal weight for the initial ROC analysis. Different severity grades of histopathologic change were grouped for a subsequent ROC analysis as indicated below. Only analyte samples taken within 2 d of necropsy were considered for ROC analyses.
Only samples that had non-missing values for all of the candidate markers, clinical chemistry values and histopathologic changes were used for the analyses.

Our sample exclusion model includes the union of the following sample sets: (i) control animals with kidney histopathology = 0; (ii) kidney toxicant–treated animals with histopathology >0; (iii) all non-kidney toxicant (isoproterenol and CBrCl3)-treated animals with kidney histopathology = 0. Samples from animals treated with a nephrotoxicant that did not have a positive composite kidney histopathology score were excluded in this model. They were excluded so as not to penalize markers that may be prodromal. For comparison purposes, inclusion models were also fit to the data. These models included all of the data, treating samples from animals treated with a nephrotoxicant that had a negative composite kidney histopathology score as true negatives. For the most part, the effect of including these data on each marker was an increase in the 95% specificity threshold and a decrease in AUC performance. The relative performance of the markers to each other was not greatly affected.

The ROC methods described were also applied to specific subsets of samples based on the severity grade of the histopathologic alteration score. The subsets used in the analyses are the following:
1. All the samples (as defined by exclusion criterion).
2. Only samples with maximum composite histopathology scores of 0, 1, 2 or 3.
3. Only samples with maximum composite histopathology scores of 0, 1 or 2.
4. Only samples with maximum composite histopathology scores of 0 or 1.

We state the AUC from each ROC curve or the sensitivity (at 95% specificity) range over the histomorphologic severity grade subsets. We then report how the difference in the AUC (AUC for biomarker – AUC for BUN or SCr) or sensitivity (SENS for biomarker – SENS for BUN or SCr) changes between the subset that includes all the nephrotoxicity samples and the subset that is restricted to samples with histopathology severity grades 0 and 1.

Nested logistic regression models were used to assess whether Kim-1 or NAG complements or “adds value” to the standard SCr and BUN measures. Improvement gained by the addition of each marker to a model containing SCr and BUN was assessed using a P-value from a likelihood ratio test, the concordance probability, C 24, an R² 25 statistic and the integrated discrimination improvement index, IDI 26. Models treating the histopathology score as binary and as ordered categories (ordinal logistic regression) were assessed. Note that for binary logistic regression C is equivalent to the AUC from an ROC curve. In both the binary and ordinal logistic models, the IDI was calculated as the mean of the predictions for positive samples minus the mean of predictions for the non-positive samples.
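The sample selection and summary statistics described in this section can be sketched in a few lines. This is an illustration under stated assumptions, not the analysis code used in the study: all function names, the treatment labels and the example animals are hypothetical. The AUC is computed as the concordance probability, and `discrimination_slope` follows the description of the IDI calculation given above (the gain from adding a marker would then be the difference of this quantity between the nested models, which is an assumption based on the usual definition of the IDI).

```python
from statistics import mean


def composite_score(grades):
    """Composite histopathology score: the highest grade among its diagnoses."""
    return max(grades, default=0)


def in_exclusion_model(treatment, score):
    """Keep nephrotoxicant-treated animals with lesions and other animals without."""
    if treatment == "nephrotoxicant":
        return score > 0
    return score == 0  # vehicle controls and non-kidney toxicants


def auc(positives, negatives):
    """Probability that a positive sample ranks above a negative one (ties count 0.5)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def discrimination_slope(predictions, labels):
    """Mean prediction for positive samples minus mean for non-positive samples."""
    pos = [p for p, y in zip(predictions, labels) if y]
    neg = [p for p, y in zip(predictions, labels) if not y]
    return mean(pos) - mean(neg)


# Hypothetical animals: (treatment, diagnosis grades, marker fold-change).
animals = [
    ("control", [0, 0], 1.0),
    ("control", [0], 1.1),
    ("nephrotoxicant", [1, 3], 6.0),
    ("nephrotoxicant", [0, 0], 2.5),  # no lesion: excluded, possibly prodromal
    ("nephrotoxicant", [2], 4.0),
    ("non_nephrotoxicant", [0], 0.9),
]

kept = [(composite_score(g), m) for t, g, m in animals
        if in_exclusion_model(t, composite_score(g))]
positives = [m for s, m in kept if s > 0]
negatives = [m for s, m in kept if s == 0]
print(len(kept), auc(positives, negatives))
```

Restricting `kept` to a maximum composite score (for example `s <= 1`) before splitting into positives and negatives gives the severity-grade subsets listed above.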


Details of all data (including treatment regimen, histopathology and biomarker levels) for every animal in the study are shown in Supplementary Table 7.

A detailed protocol for protein extraction from kidneys and gene expression (mRNA extraction and RT-PCR measurements) is available in Supplementary Methods.

29. Sing, T., Sander, O., Beerenwinkel, N. & Lengauer, T. ROCR: visualizing classifier performance in R. Bioinformatics 21, 3940–3941 (2005).
30. Hanley, J.A. & McNeil, B.J. A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology 148, 839–843 (1983).


Articles

A panel of urinary biomarkers to monitor reversibility of renal injury and a serum marker with improved potential to assess renal function

Josef S Ozer 1,6,7, Frank Dieterle 2,7, Sean Troth 3, Elias Perentes 2, André Cordier 2, Pablo Verdes 2, Frank Staedtler 2, Andreas Mahl 2, Olivier Grenet 2, Daniel R Roth 2, Daniel Wahl 2, François Legay 2, Daniel Holder 4, Zoltan Erdos 1, Katerina Vlasakova 1, Hong Jin 1, Yan Yu 1, Nagaraja Muniappa 3, Tom Forest 1, Holly K Clouse 5, Spencer Reynolds 1, Wendy J Bailey 1, Douglas T Thudium 1, Michael J Topper 1, Thomas R Skopek 1, Joseph F Sina 1, Warren E Glaab 1, Jacky Vonderscher 2,6, Gérard Maurer 2, Salah-Dine Chibout 2, Frank D Sistare 1 & David L Gerhold 1

The Predictive Safety Testing Consortium’s first regulatory submission to qualify kidney safety biomarkers revealed two deficiencies. To address the need for biomarkers that monitor recovery from agent-induced renal damage, we scored changes in the levels of urinary biomarkers in rats during recovery from renal injury induced by exposure to carbapenem A or gentamicin. All biomarkers responded to histologic tubular toxicities to varied degrees and with different kinetics. After a recovery period, all biomarkers returned to levels approaching those observed in uninjured animals. We next addressed the need for a serum biomarker that reflects general kidney function regardless of the exact site of renal injury. Our assay for serum cystatin C is more sensitive and specific than serum creatinine (SCr) or blood urea nitrogen (BUN) in monitoring generalized renal function after exposure of rats to eight nephrotoxicants and two hepatotoxicants.
This sensitive serum biomarker will enable testing of renal function in animal studies that do not involve urine collection.

Acute kidney injury caused by a variety of chemicals, including contrast agents, amine antibiotics and chemotherapeutics, poses an important problem in clinical settings. It is also of particular relevance during drug development and the optimization of candidates in preclinical and clinical trials. Acute kidney injury is typically diagnosed by monitoring SCr and BUN, the levels of which elevate only after nearly half of functional human kidney capacity has been compromised 1. More sensitive renal functional biomarkers would enable more reliable diagnosis of drug-induced acute kidney injury and intervention by providing earlier and more reliable signs of injury 2. Renal diagnostic biomarkers would also enable safer and easier-to-monitor therapeutic treatments with narrower margins between safety and efficacy than for markers currently used for clinical and preclinical applications. Additionally, these biomarkers might be used to diagnose particular forms of kidney injury 3,4.

The first biomarker qualification submission (Voluntary eXploratory Data Submission, VXDS) brought forward by the Predictive Safety Testing Consortium (PSTC) set out to address the need for improved markers of nephrotoxicity with the initial goal of qualifying markers for preclinical applications during drug development and the eventual goal of translating these markers for use in clinical settings 5–8. The initial submission process revealed two limitations in the studies that we address here.
The first gap concerns the need for recovery studies to demonstrate that reversibility of histopathologic renal lesions could be similarly monitored by biomarker changes. To address this need, we conducted two treatment-recovery studies in rats. Both involved measuring levels of a panel of seven renal tubular safety biomarkers, many of which were submitted in this VXDS application. The second gap was to identify a more sensitive serum biomarker of renal function, which allows general monitoring of impaired renal function. As renal injury most often is manifested by damage to the proximal tubule, injury to other parts of the organ is difficult to track in the absence of an improved biomarker that detects impaired functional capacity. As preclinical pharmaceutical studies routinely include blood collection, a sensitive serum biomarker would enable retrospective testing in animal studies that do not involve urine collection. Furthermore, interpretation of serum biomarker data is less complex in controlled animal studies than in clinical patients with prevalent comorbidities.

1 Department of Investigative Laboratory Sciences, Safety Assessment, Merck Research Laboratories, West Point, Pennsylvania, USA. 2 Translational Sciences, Novartis Institutes for BioMedical Research, Novartis, Basel, Switzerland. 3 Department of Pathology, Safety Assessment, Merck Research Laboratories, West Point, Pennsylvania, USA. 4 Department of Biometrics, Merck Research Laboratories, West Point, Pennsylvania, USA. 5 Department of Exploratory Toxicology, Safety Assessment, Merck Research Laboratories, West Point, Pennsylvania, USA. 6 Present addresses: Pharmacokinetics, Dynamics, and Metabolism, PGRD, Pfizer, Andover Laboratories, Andover, Massachusetts, USA (J.S.O.) and Molecular Medicine Labs, Group Research, Hoffmann-La Roche, Basel, Switzerland (J.V.). 7 These authors contributed equally to this work. Correspondence should be addressed to D.L.G. (david_gerhold@merck.com).
Received 9 October 2009; accepted 22 March 2010; published online 10 May 2010; doi:10.1038/nbt.1627

Most drug-induced acute renal toxicity primarily affects the sensitive proximal tubule epithelium. Acute necrosis of moderate numbers of proximal tubule cells is a reversible process, where regeneration of a contiguous proximal tubule layer restores the integrity of the tubule 9. Such regeneration comprises reversible kidney injury and is accompanied by limited inflammatory response. However, it is not known whether biomarkers respond acutely and return to baseline during recovery, or whether certain biomarkers remain elevated beyond baseline levels during regenerative processes. Urinary biomarkers for which assays are available in the rat include markers of functional deficits and proximal tubule dysfunction (such as albumin), biomarkers lost from dead or injured cells (such as glutathione-S-transferase α (GSTα)), and actively secreted proteins that are either induced or repressed as a result of injury. The last class includes kidney injury molecule 1 (Kim-1), osteopontin (OPN), neutrophil gelatinase–associated lipocalin/lipocalin 2 (NGAL/LCN2), clusterin (CLU) and trefoil factor 3 (TFF3). Albumin is a well-established biomarker of glomerular and proximal tubule cell dysfunction. GSTα is a detoxification enzyme, associated with the apical membrane of proximal tubule cells, that is lost into the urine acutely upon injury 10. Kim-1 is an extracellular protein anchored in the membrane of proximal tubule cells that is cleaved by a metalloprotease and excreted into urine. Evidence obtained using rats and humans indicates that Kim-1 responds both sensitively and dynamically to proximal tubule injury from a variety of sources 11,12. OPN is secreted by a variety of cells and organs upon injury as part of an inflammatory response. LCN2 is secreted by a variety of epithelial cells and binds siderophores capable of chelating iron. Mice lacking LCN2 are susceptible to Escherichia coli infections but are not susceptible to renal damage resulting from reperfusion ischemia 13.
CLU is a glycoprotein secreted by a variety of cell types and organs, notably dedifferentiated tubular cells in the kidney. Secreted CLU is thought to play a cellular pro-survival function 14–16. TFF3 is a small secreted mucin and hormone that shows reduced urine excretion in response to acute kidney injury and promotes survival and differentiation of epithelial cells in several tissues 5,17. We show that levels of Kim-1, CLU, OPN, LCN2, albumin, GSTα and TFF3 change dynamically after treatment-related renal injury and return to baseline levels upon recovery.

Whereas current urinary biomarkers for nephrotoxicity respond primarily to damage of either the proximal tubule or glomerulus, a functional serum biomarker would enable tracking of renal injury from those and other locations of injury such as the distal tubule. An improved functional renal marker will add value for monitoring injury, relative to markers that leak from injured cells or markers that reflect a response to injury, even if other renal injury markers, such as Kim-1, LCN2 and albumin, are more sensitive for their specialized applications. Serum cystatin C (S-cystatin C) is a renal function marker that is rapidly gaining increased use in clinical applications, but has not been tested and qualified in preclinical studies. Cystatin C is a nonglycosylated low-molecular-weight protein (13 kDa). It is continuously produced by all nucleated cells and functions as a housekeeping factor 18.
S-cystatin C is directly and freely filtered from blood into the glomerulus, and is therefore an ideal estimator of the glomerular filtration rate due to (i) greatly reduced impact of age, sex, muscle mass, dehydration state and circadian rhythm on S-cystatin C levels in contrast to SCr; (ii) an unhindered straightforward filtration of cystatin C by glomeruli; and (iii) an absence of tubular secretion or extra-renal clearance in contrast to SCr 19. It has been shown in clinical studies that S-cystatin C either outperforms or performs similarly to SCr for the estimation of the glomerular filtration rate in broad contexts of kidney injury (e.g., acute kidney injury, chronic kidney disease and glomerular function impairment) 20–23. The US Food and Drug Administration (FDA) approval of an assay to measure S-cystatin C shows the assay’s increasing importance and value in clinical practice 24. As an extension of drug-induced renal injury as reported with urinary biomarkers 6,25–27, this study also involved a systematic preclinical qualification assessment of the merits of using S-cystatin C as a marker for kidney function.

RESULTS

Reversible tubular injury with carbapenem A treatment

We treated rats for 3 d with carbapenem A, a potent renal tubular toxicant in rats and a discontinued candidate antibiotic 28, and then followed this with a 15-d recovery period to measure biomarker responses during treatment and recovery from drug-induced kidney damage. Modest treatment-related increases in kidney weights were observed in rats dosed with carbapenem A (150 mg/kg/d).
Renal cortical pallor was observed with tubular degeneration, necrosis and regeneration observed at multiple time points.

Histomorphologic renal changes consisted of tubular epithelial degeneration and necrosis of the deep cortex, beginning on day 1 with cumulative injury and peaking in severity on day 4 (Supplementary Fig. 1). Renal tubular epithelial degeneration was most severe on days 1 and 2, progressing predominantly to necrosis by day 4 (Supplementary Fig. 1). Regeneration of tubular epithelium was first observed on day 4 with the peak response on day 8 (Supplementary Fig. 1). Regeneration, very slight interstitial fibrosis and minimal inflammation were present in males between days 3 and 18 in response to tubular damage (Supplementary Fig. 1). Tubular dilatation was sporadically identified. Very slight to moderate tubular proteinaceous cast accumulation was observed on days 1, 2 and 4 in males and females within the cortex and/or medulla, which was attributable to tubule damage.

Necrosis and degeneration, a designated tubular histomorphologic change, peaked on days 2 and 4, and was more severe in males than females (Supplementary Fig. 2). A necrosis and degeneration tubular histomorphologic scatter plot indicates the severity grade at necropsy for individual animals (Figs. 1 and 2, color scale).

Urinary biomarkers monitor carbapenem A–induced nephrotoxicity and recovery

For the carbapenem A reversibility studies, serum samples were collected on days 2, 4, 8 and 18, whereas urine samples were collected at days 2, 4, 8 and 15.
Traditional serum clinical chemistry markersBUN <strong>and</strong> SCr were plotted for individual animals by study day<strong>and</strong> were correlated to overall tubular histomorphologic change ona severity scale of 1 to 5 (Fig. 1 <strong>and</strong> Supplementary Figs. 1 <strong>and</strong> 2).Previous reports 5–7 indicate that a >1.2-fold increase in SCr relativeto <strong>the</strong> mean from concurrent controls is considered positive for injury(95% specificity, receiver operating characteristic (ROC) curve exclusionmodel). All carbapenem A control animal SCr values are below<strong>the</strong> threshold cut-off (Fig. 1, dotted red line). At day 2, all animals withhistomorphologic change (grades 2 <strong>and</strong> 3) show SCr values elevatedbetween 1.7- <strong>and</strong> fivefold relative to controls (Fig. 1). At day 4, fiveof seven animals with histomorphologic change (grades 3 <strong>and</strong> 4) areelevated between 1.7- <strong>and</strong> sevenfold. At day 8, three of seven ratswith histomorphologic change (grade 1) are between 1.3- <strong>and</strong> twofoldelevated, whereas no animals showed SCr elevations at day 18 (Fig. 1).BUN values of carbapenem A–treated rats showed high similarity to<strong>the</strong> SCr data, except that <strong>the</strong> positive value cutoff was at 1.7-fold elevation(95% specificity, exclusion model) (Supplementary Fig. 2).Urinary biomarker values were determined by enzyme-linkedimmunosorbent assay (ELISA) or MesoScale Discovery methods,plotted individually by study day <strong>and</strong> correlated to histomorphologicchange on a severity scale of 1 to 5. Previous experience 7 indicates thatchanges in Kim-1 abundance >1.9-fold relative to concurrent controlsis positive for injury at 95% specificity (exclusion model). Mostcarbapenem A individual control-animal Kim-1 values were belownature biotechnology VOLUME 28 NUMBER 5 MAY 2010 487


the group mean 1.9-fold change threshold, except for two animals at day 15 (Fig. 2). All animals with histomorphologic changes at day 2 (grades 2 and 3) showed Kim-1 elevations above threshold. At day 4 (grades 3 and 4), levels of Kim-1 were elevated more than eightfold relative to concurrent vehicle control samples (Fig. 2). At day 8, all animals with histomorphologic change (grades 1 and 2) showed Kim-1 elevations above threshold, as did four of five animals with histomorphologic changes (grade 1) at day 15 (Fig. 2).

Changes in urinary CLU >1.85-fold relative to concurrent controls are considered to indicate injury (95% specificity, exclusion model) (ref. 6). With the exception of one animal at day 15, all carbapenem A individual control animal CLU values were at or below the 1.85-fold mean change threshold (Fig. 2). At days 2 and 4, CLU levels were elevated above the threshold in all animals with histomorphologic change (Fig. 2). At day 8, CLU levels in five of eight animals with histomorphologic change were between seven- and nearly 40-fold elevated, with one additional animal just above the threshold. In contrast, at day 15, only three of eight animals are above the threshold for CLU (Fig. 2).

Fold-changes in urinary OPN relative to concurrent controls that would be considered positive for injury have not yet been determined. Nonetheless, a twofold elevation was used based upon the data observed on day 15, where control animals were placed below the

Figure 1 For carbapenem A–treated rats, correlation of urinary ELISA and MesoScale Discovery biomarker levels with histomorphologic change. Male or female Sprague Dawley rats were administered carbapenem A at 150 mg/kg/d (groups of five rats/dose/time point for up to 3 d). The animals were euthanized on days 2, 4, 8 or 18 for toxicity evaluation, and urinary ELISA and immunoturbidimetric biomarker levels (TFF3 and albumin, respectively) and MesoScale Discovery GSTα levels were measured (ng/ml) and normalized to urinary creatinine. Treated male (T:M, square), treated female (T:F, circle), vehicle male (V:M, star), and vehicle female (V:F, triangle) are indicated. TFF3 (top left), albumin (top right), and GSTα (bottom left) abundances are shown as fold-change relative to the average of concurrent controls. SCr change (bottom right) is indicated as fold-change relative to the average of concurrent controls. Red dotted line indicates threshold from ROC analysis (ref. 5). The severity grades of histopathologic change after carbapenem A treatment are indicated on a scale of 0 (no observed pathology) to 5 with the indicated grades displayed as the following color: grade 0 (white), grade 1 (yellow), grade 2 (orange), grade 3 (red), grade 4 (blue), grade 5 (black). The histomorphologic change is shown at each necropsy day and vehicle-treated animals (control) are shown in white. Renal tubular necrosis and degeneration is shown in all biomarker panels except SCr, which is correlated to the renal composite, an overall score of tubular damage (ref. 29). TFF3 control levels are high and are reduced with toxicity. TFF3 levels are displayed as fold-change in the negative direction (minus F∆).
[Figure 1 plot: panels TFF3, albumin, GSTα and SCr; y axis, fold-change (F∆); x axis, study day; legend V:F, V:M, T:F, T:M; histopathology (Histo N&D) grades 0–5.]

Figure 2 Correlation of urinary MesoScale Discovery biomarker levels with histomorphologic change for carbapenem A-treated rats. Male or female Sprague Dawley rats were administered carbapenem A at 150 mg/kg/d (groups of five rats/dose/time point for up to 3 d) and the animals were euthanized on days 2, 4, 8 or 18 for toxicity evaluation and measurement and normalization of urinary biomarker levels (Kim-1, LCN2, OPN and CLU) (ng/ml) relative to urinary creatinine. Treated male (T:M, square), treated female (T:F, circle), vehicle male (V:M, star) and vehicle female (V:F, triangle) are indicated. Abundances of Kim-1 (top left), LCN2 (top right), OPN (bottom left) and CLU (bottom right) are shown as fold-change relative to the average of concurrent controls. Red dotted line indicates threshold from ROC analysis (refs. 5–7) (data not shown). The severity grades of histopathologic change after carbapenem A treatment are indicated on a scale of 0 (no observed pathology) to 5 with the indicated grades displayed as the following color: grade 0 (white), grade 1 (yellow), grade 2 (orange), grade 3 (red), grade 4 (blue), grade 5 (black). The histomorphologic change is shown at each necropsy day and vehicle-treated animals (control) are shown in white. Renal tubular necrosis and degeneration are shown in all panels.
[Figure 2 plot: panels Kim-1, LCN2, OPN and CLU; y axis, fold-change (F∆); x axis, study day (D2, D4, D8, D15); legend V:F, V:M, T:F, T:M; histopathology (Histo N&D) grades 0–5.]
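The figure legends describe three steps: normalizing each urinary biomarker level to urinary creatinine, expressing it as a fold-change relative to the average of concurrent controls, and comparing it with a threshold. A minimal Python sketch of those steps follows. The function names and all numeric inputs are hypothetical illustrations, not study data; only the 1.85-fold CLU cutoff is taken from the text, and the creatinine units are an assumption.

```python
from statistics import mean

CLU_THRESHOLD = 1.85  # fold-change cutoff quoted in the text for urinary CLU


def normalize_to_creatinine(biomarker_ng_ml, creatinine):
    """Divide each biomarker level (ng/ml) by the matching urinary creatinine value."""
    return [b / c for b, c in zip(biomarker_ng_ml, creatinine)]


def fold_changes(values, concurrent_controls):
    """Express each value relative to the average of the concurrent control group."""
    baseline = mean(concurrent_controls)
    return [v / baseline for v in values]


def above_threshold(fold_change_values, threshold):
    """Flag animals whose fold-change exceeds the threshold."""
    return [fc > threshold for fc in fold_change_values]


# Hypothetical values for one necropsy day (not taken from the study).
control_clu = normalize_to_creatinine([100.0, 240.0, 80.0, 150.0],
                                      [10.0, 20.0, 10.0, 15.0])
treated_clu = normalize_to_creatinine([300.0, 170.0, 400.0, 1800.0],
                                      [20.0, 10.0, 10.0, 20.0])

treated_fc = fold_changes(treated_clu, control_clu)
flags = above_threshold(treated_fc, CLU_THRESHOLD)
print(treated_fc)
print(flags)
```

For a marker such as TFF3, which the legend of Figure 1 describes as decreasing with toxicity, the comparison would have to be inverted; that case is not covered by this sketch.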


threshold (~92% specificity; Fig. 2). Thus, all carbapenem A control animal OPN values are below this arbitrary threshold. At day 2, three of eight animals with histomorphologic change are above this threshold, whereas at day 4, seven of eight animals with histomorphologic change are above this threshold (Fig. 2). At day 8, values of half of the animals with histomorphologic change are above this threshold, whereas at day 15, values from two animals are above this threshold (Fig. 2).

LCN2 fold changes >2.5 relative to concurrent controls are considered positive for injury at 95% specificity (F.D., unpublished exclusion model data). All carbapenem A individual control animal LCN2 values are at or below the 2.5-fold change threshold (Fig. 2). At day 2, seven of eight animals with histomorphologic change are between 18- and 40-fold (274 ng/ml upper limit of quantification, ULOQ) elevated for LCN2 (Fig. 2). At day 4, seven of eight animals with histomorphologic change are between 5- and 15-fold elevated for LCN2, whereas at day 8, half the animals with histomorphologic change are above the threshold and at day 15 no animals are above the threshold (Fig. 2).

Urinary TFF3 reductions [...] 100-fold) and 4 showed albumin changes that were above the 1.9-fold cutoff, whereas several day-8 animals were moderately above the threshold, and day-15 animals also showed subtle elevations (Fig. 1).

GSTα similarly has a determined threshold value of 1.8-fold, determined from 20 exploratory renal toxicity studies (J.S.O. and D.L.G., unpublished observations). All treated day-2 animals with histomorphologic change appear to have elevations of GSTα above threshold, whereas half of the treated animals showed modest elevations at day 4 and no changes of this biomarker were seen at later study times (Fig. 1).

Reversible tubular injury with gentamicin treatment
We observed treatment-related kidney weight increases (group average, 33%) on day 10 in high-dose (120 mg/kg/d for 9 d) gentamicin-treated rats. These correlated with bilateral renal enlargement and pallor. There were no significant gross or organ weight findings in the 40 mg/kg/d dose group or at any other time point in the study, including after the 29-d recovery period.

Moderate to severe renal tubular degeneration, necrosis and regeneration were observed by histomorphology on day 10 at the high dose (120 mg/kg/d × 9 d) (Supplementary Fig. 3). After the 29-d recovery period, tubular changes at the high dose (120 mg/kg/d × 9 d) were limited to very slight regeneration, indicating nearly complete recovery (Supplementary Fig. 3). One rat that received low-dose gentamicin (40 mg/kg/d) for 9 d had very slight tubular regeneration on day 10. Treatment-related focal areas of very slight to slight interstitial inflammation were noted at day 10.

Figure 3 Correlation of urinary ELISA- and MesoScale Discovery-derived biomarker levels with histomorphologic change in gentamicin-treated rats. Male Sprague Dawley rats were administered gentamicin at 120 mg/kg/d to groups of five rats/dose/time point for 9 d and the animals were euthanized either on day 10 (upper panel) or 39 (lower panel) for toxicity evaluation. Urinary biomarker levels of albumin (ALB), CLU, GSTα, Kim-1, LCN2 and OPN were measured (ng/ml) and serum chemistry parameters BUN and SCr were determined. Treated male (T:M, square), vehicle male (V:M, circle) and treated average (T:A, black triangle) are indicated. Urinary biomarker and serum chemistry values are shown as fold-change relative to the average of concurrent controls. The severity grades of histopathologic change after gentamicin treatment are indicated on a scale of 0 (no observed pathology) to 5 with the indicated grades displayed as the following color: grade 0 (white), grade 1 (yellow), grade 2 (orange), grade 3 (red), grade 4 (blue), grade 5 (black). Renal tubular necrosis and degeneration at day 10 (top panel) and regeneration at day 39 (bottom panel) are shown. Fold-change is relative to day 10 control group average.
[Figure 3 plot: fold-change (F∆) for BUN, SCr, ALB, CLU, GSTα, Kim-1, LCN2 and OPN at day 10 (upper panel) and day 39 (lower panel); legend T:M, V:M, T:A; histopathology (Histo) grades 0–5.]

Urinary biomarkers monitor gentamicin-induced nephrotoxicity and recovery
In the gentamicin time course study, renal tubular necrosis and degeneration were observed at day 10 in the 120 mg/kg/d dose group, whereas BUN and SCr elevations were more modest compared to those seen in the carbapenem A study. The urinary biomarker panel (albumin, CLU, GSTα, Kim-1 and LCN2) showed large elevations of more than tenfold, with Kim-1 increases of nearly 50-fold (Fig. 3). TFF3 fold-change reductions were very small in this dose group (not shown), whereas OPN showed modest elevations (about fivefold) (Fig. 3). Considering that similar-grade histomorphologic change at day 10 was observed for necrosis and degeneration and for regeneration, biomarker responses cannot be readily assigned to specific designations of histomorphologic change (Fig. 3 and Supplementary Fig. 4). After 29 days of recovery from gentamicin treatment, both the serum chemistry markers and the urinary biomarker panel values had returned to baseline, with no observable necrosis and degeneration (Fig. 3 and data not shown). Thus, none of the biomarker levels appeared to correlate with the grade 1 regeneration seen at study day 39 (Fig. 3).

Cystatin C as a serum marker of kidney dysfunction
We induced a variety of renal lesions in rats by treating them with one of eight nephrotoxicants, namely cisplatin, gentamicin, tacrolimus (Protopic, Prograf), vancomycin, furosemide (Lasix), lithium (Eskalith), doxorubicin


(Doxil, Adriamycin) or puromycin, that reflect different modes of toxicity (Table 1). In contrast, two hepatotoxicants (alpha-naphthylisothiocyanate (ANIT) and methapyrilene) did not induce kidney injury (Supplementary Table 1) (ref. 6). For 942 animals in the ten studies, we measured BUN and SCr levels by a clinical chemistry analyzer and scored S-cystatin C abundance as part of a multiplexed protein assay. Kidney injury was assessed by histopathology, applying a systematic grading system (grade 0–5) and a controlled lexicon to describe the types of lesions and the exact localization (ref. 29). Exemplary photos of the variety of the lesions observed are provided elsewhere in this issue (ref. 6). In the ideal case, a renal functional marker should capture functional changes resulting from all types of kidney injury. Therefore, all observed major drug-induced lesions were integrated into one composite histopathology bin using the highest grade of all lesions reported for that animal. In particular, we integrated lesions all along the nephron categorized as ‘tubular injury’ (degeneration, necrosis, apoptosis, cell sloughing), ‘tubular regeneration’ (basophilia, mitosis), ‘intratubular casts’ (granular, leukocytic, hyaline, mineral), ‘tubular dilatation’, ‘glomerular alterations’ (mesangial proliferation, glomerular vacuolization, glomerular fibrosis) and ‘interstitial fibrosis’ (cortex and medulla).

Table 1 Overview of the design of the studies
Test compound | Dose levels, route, regimen | Necropsy/histopath. (d) | Urine collection times (d) | Blood/plasma (d) | Animal strain | n per group | n total | Study data set
Nephrotoxicants
Carbapenem A | 0, 100 mg/kg; i.v.; 1× daily (3 d); 5 ml/kg | 2, 4 (3 d dosed), 8 (3 d dosed), 18 (3 d dosed) | 1–2, 3–4, 7–8, 14–15 | 2, 4, 8, 18 | Sprague Dawley (male and female) | 2 (V), 4 (T) | 48 | Merck
Gentamicin sulfate | 0, 120 mg/kg; i.p.; 1 daily (9 d); 5 ml/kg | 1, 10 (9 d dosed), 39 (9 d dosed) | 9–10, 38–39 | 10, 39 | Sprague Dawley | 5 | 30 | Merck
Gentamicin sulfate | 0, 35, 70, 140 mg/kg; i.p.; 1× daily; 5 ml/kg | 1, 3, 7, 14 | 2–3, 6–7, 13–14 | 1, 3, 7, 14 | Han Wistar | 6 | 96 | Novartis
Vancomycin hydrochloride | 0, 70, 140, 210 mg/kg; i.p.; 1 daily injection; 5 ml/kg | 1, 3, 7, 14 | 2–3, 6–7, 13–14 | 1, 3, 7, 14 | Han Wistar | 6 | 96 | Novartis
Doxorubicin chlorhydrate | 0, 2.5, 5.0, 7.5 mg/kg; i.v.; once at day 1; 5 ml/kg | 1, 3, 7, 14 | 2–3, 6–7, 13–14 | 1, 3, 7, 14 | Han Wistar | 6 | 96 | Novartis
Furosemide | 0, 45, 90, 180 mg/kg; oral gavage; 2× daily; 5 ml/kg | 1, 3, 7, 14 | 2–3, 6–7, 13–14 | 1, 3, 7, 14 | Han Wistar | 6 | 96 | Novartis
Lithium carbonate | 0, 1, 2, 3 mEQu/kg; oral gavage; 1× daily; 5 ml/kg | 1, 3, 7, 14 | 2–3, 6–7, 13–14 | 1, 3, 7, 14 | Han Wistar | 6 | 96 | Novartis
Cisplatin | 0, 0.5, 1, 3 mg/kg; i.p.; once at day 1; 5 ml/kg | 1, 3, 7, 14 | 2–3, 6–7, 13–14 | 1, 3, 7, 14 | Han Wistar | 6 | 96 | Novartis
Puromycin dihydrochloride | 0, 10, 20, 40 mg/kg; i.p.; 1× daily; 10 ml/kg | 3, 7, 14, 22 | 2–3, 6–7, 13–14, 21–22 | 3, 7, 14, 22 | Han Wistar | 6 | 96 | Novartis
Tacrolimus/FK506 | 0, 9, 12, 15; i.p.; 1× daily; 5 ml/kg | 3, 7, 14, 21 | 2–3, 6–7, 13–14, 20–21 | 3, 7, 14, 21 | Han Wistar | 6 | 96 | Novartis
Hepatotoxicants
ANIT | 0, 5, 15, 30 mg/kg; oral gavage; daily; 5 ml/kg | 1, 3, 7, 14 | 2–3, 6–7, 13–14 | 1, 3, 7, 14 | Han Wistar | 6 | 96 | Novartis
Methapyrilene hydrochloride | 0, 15, 30, 60 mg/kg; oral gavage; 1× daily; 5 ml/kg | 1, 3, 7, 14 | 2–3, 6–7, 13–14 | 1, 3, 7, 14 | Han Wistar | 6 | 96 | Novartis
i.v., intravenous; i.p., intraperitoneal; mEQu/kg, milli equivalent/kg.

We next did ROC inclusion and exclusion analyses similar to those outlined elsewhere in this issue (ref. 29). The results of the ROC exclusion analysis (nephrotoxicant-dosed animals with a renal injury reported versus animals not dosed with a nephrotoxicant and without a reported renal injury) are shown in Figure 4 and in Supplementary Tables 1 and 2.
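The composite histopathology score and the ROC exclusion analysis described above can be illustrated with a short Python sketch. It is not the authors' analysis code. The record layout, the function names and the example animals are hypothetical. The exclusion grouping follows the definition given in the text, whereas the inclusion grouping (every animal counted, positive whenever a lesion is reported) is an assumption based on the later remark about false positives. The AUC is computed as the fraction of positive/negative pairs that the marker ranks correctly, which is one standard way of obtaining the area under a ROC curve.

```python
def composite_grade(lesion_grades):
    """Highest severity grade (0-5) over all lesion categories reported for one animal."""
    return max(lesion_grades.values(), default=0)


def split_groups(animals, mode="exclusion"):
    """Return (positives, negatives) marker fold-changes for a ROC analysis.

    exclusion: positives are nephrotoxicant-dosed animals with a reported lesion,
               negatives are animals not dosed with a nephrotoxicant and without
               a lesion; all other animals are left out.
    inclusion: (assumed here) every animal is kept and is positive if a lesion
               was reported.
    """
    positives, negatives = [], []
    for animal in animals:
        injured = composite_grade(animal["lesions"]) > 0
        dosed = animal["nephrotoxicant"]
        if mode == "exclusion":
            if dosed and injured:
                positives.append(animal["fold_change"])
            elif not dosed and not injured:
                negatives.append(animal["fold_change"])
        else:
            (positives if injured else negatives).append(animal["fold_change"])
    return positives, negatives


def roc_auc(positives, negatives):
    """Area under the ROC curve as the share of correctly ranked pairs (ties count 0.5)."""
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical animals, for illustration only.
animals = [
    {"nephrotoxicant": True, "lesions": {"tubular injury": 3, "tubular dilatation": 1}, "fold_change": 1.8},
    {"nephrotoxicant": True, "lesions": {"tubular regeneration": 1}, "fold_change": 1.1},
    {"nephrotoxicant": True, "lesions": {}, "fold_change": 1.3},  # dosed, no lesion yet
    {"nephrotoxicant": False, "lesions": {}, "fold_change": 1.0},
    {"nephrotoxicant": False, "lesions": {}, "fold_change": 1.2},
]

auc_exclusion = roc_auc(*split_groups(animals, "exclusion"))
auc_inclusion = roc_auc(*split_groups(animals, "inclusion"))
print(auc_exclusion, auc_inclusion)
```

In this toy data set the dosed animal without a lesion but with a raised marker lowers the inclusion AUC relative to the exclusion AUC, which is the same kind of effect the text attributes to early S-cystatin C increases.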
These data highlight the superiority of S-cystatin C over SCr and BUN for diagnosing renal injury. For all observed histopathology grades, S-cystatin C has the highest AUC (exclusion, 0.79; inclusion, 0.67) compared to the current peripheral standards SCr (exclusion, 0.68; inclusion, 0.65) and BUN (exclusion, 0.70; inclusion, 0.62). Statistical comparison of the AUCs also showed that S-cystatin C clearly outperformed both clinical chemistry parameters in the exclusion analysis (differences of AUCs with P < 0.01). The fact that the differences of AUCs between S-cystatin C and SCr and BUN were smaller for the inclusion analysis (P = 0.40 for SCr and P = 0.02 for BUN) can be attributed to the fact that S-cystatin C was initially increased in


a number of nephrotoxicant-dosed animals, when histopathology was not observed at early necropsies, which was scored as false positives in the inclusion analysis. Figure 4 also shows the diagnostic performance in terms of AUCs and sensitivity for 95% specificity when only low-grade histopathology was compared between experimental and control animals (grade 1 to 3 versus grade 0, grade 1 to 2 versus grade 0, and in particular grade 1 versus grade 0). In all cases, S-cystatin C shows improved performance relative to BUN and SCr. When restricting pathology to low-severity grades, S-cystatin C also shows an improved diagnostic performance relative to both SCr and BUN (Supplementary Table 1).

Figure 4 ROC curves for the inclusion and exclusion analysis with eight different nephrotoxicant studies and two different hepatotoxicant studies from Novartis. (a–d) The sensitivity and specificity of BUN, SCr, and S-cystatin C with respect to a composite histopathology score, using data involving all histopathology grades (a), histopathology grade 0 to 3 (b), histopathology grade 0 to 2 (c) and histopathology grade 0 and 1 (d). (e,f) Area under the curve (e) and sensitivity (f) (at 95% specificity) of BUN, SCr, and S-cystatin C compared to the gold standard, histopathology. Animal numbers, n. Negative: n = 322. Positive: all, n = 253; 0 to 3, n = 251; 0 to 2, n = 204; 0 to 1, n = 127.
[Figure 4 plot: panels a–d, sensitivity versus 1 – specificity; panels e,f, values by histopathology grade subset (All, 0 to 3, 0 to 2, 0 and 1) for Cst3, SCr and BUN. AUC legends in order of appearance (Rand, SCr, BUN, Cst3): 0.500, 0.684, 0.697, 0.788; 0.500, 0.674, 0.694, 0.771; 0.500, 0.654, 0.654, 0.750; 0.500, 0.572, 0.645, 0.715.]

In Figures 5 and 6, the levels of S-cystatin C, BUN and SCr are shown as fold-changes for all animals, whereby each data point is coded by the highest grade of renal lesion reported and the horizontal line represents the thresholds for 95% specificity (exclusion: 1.21-fold for S-cystatin C, 1.21-fold for BUN and 1.12-fold for SCr).
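One way to picture how a threshold for 95% specificity relates to sensitivity and to the true/false positive calls read off such plots is sketched below in Python. This is an illustrative assumption about the procedure, not the method used to derive the 1.21- and 1.12-fold thresholds quoted above; all function names and fold-change values are hypothetical. The threshold is taken as the smallest observed negative-group value that leaves at most 5% of negatives above it.

```python
def threshold_at_specificity(negatives, specificity_percent=95):
    """Smallest observed cutoff with at least the requested share of negatives at or below it."""
    ordered = sorted(negatives)
    # Integer ceiling of specificity_percent * n / 100, avoiding float rounding.
    needed = -(-specificity_percent * len(ordered) // 100)
    return ordered[needed - 1]


def sensitivity(positives, threshold):
    """Share of injured animals whose fold-change exceeds the threshold."""
    return sum(v > threshold for v in positives) / len(positives)


def classify(fold_change, injured, threshold):
    """Label one animal as true/false positive/negative relative to histopathology."""
    flagged = fold_change > threshold
    if flagged:
        return "TP" if injured else "FP"
    return "FN" if injured else "TN"


# Hypothetical fold-changes: 20 animals without lesions, 5 with lesions.
negatives = [0.85, 0.88, 0.90, 0.92, 0.94, 0.95, 0.96, 0.98, 0.99, 1.00,
             1.01, 1.02, 1.04, 1.05, 1.06, 1.08, 1.10, 1.15, 1.20, 1.35]
positives = [1.10, 1.25, 1.50, 2.00, 1.18]

cutoff = threshold_at_specificity(negatives)
print(cutoff, sensitivity(positives, cutoff))
print([classify(v, True, cutoff) for v in positives])
```

With 20 negatives, one value above the cutoff corresponds to exactly 95% specificity; with other group sizes the achieved specificity can only be approximated, which is one reason published thresholds may differ from such a simple rank-based estimate.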
Looking at the color-coded dots with respect to the thresholds in the plots allows detection of true and false positives and true and false negatives on a study-by-study basis (the main histopathology findings behind the plotted grades are briefly listed after the compound name).

Figure 5 Levels of S-cystatin C, BUN and SCr observed in individual animals. (a–c) Correlation of S-cystatin C (a), BUN (b) and SCr (c) levels with severity grades of histopathology for 470 animals in five Novartis studies (cisplatin, gentamicin, vancomycin, tacrolimus and puromycin) involving Han Wistar rats. All values are represented as fold-changes versus the average values of study-matched and time-matched control animals on a logarithmic scale. The animals are ordered by study, within each study by dose group (with increasing doses) and within each dose group by termination time point (with increasing time). The symbols and the colors represent the histopathology readout (no histopathology finding observed (red), grade 1 (green), grade 2 (blue), grade 3 (orange) and grade 4 (black) on a 5 grade severity scale). The magenta lines represent the thresholds determined for 95% specificity in the ROC analysis for all histopathology grades (1.209 for S-cystatin C, 1.208 for BUN and 1.129 for SCr).

In the cisplatin study (tubular injury and regeneration in the tubular segments S1-S3, tubular dilatation in cortex and medulla, intratubular hyaline casts in thick ascending tubules), S-cystatin C performed best in identifying animals with lesions of grades 3 and 4 and a number of animals with grades 1 and 2 lesions. BUN changes detected injury in only a few high-grade animals, whereas SCr detected injury in a few low-grade and some high-grade lesions. Also, for gentamicin treatment (tubular injury and regeneration in S1-S2), S-cystatin C detected all grade 3 and 4 lesions and a number of grade 1 and 2 lesions, whereas BUN did not detect drug-induced lesions. In contrast, in the vancomycin study (tubular injury and regeneration in S3 and thick ascending tubules, tubular dilatation in cortex and medulla, intratubular hyaline casts), both SCr and BUN outperformed S-cystatin C, which shows some false-negative grade 3 animals at the last termination time point. However, some low-dosed animals without histopathology findings have increased S-cystatin C values, which might reflect a signal earlier than histopathology observations. In the tacrolimus study (tubular regeneration in thick ascending tubules and distal tubules, intratubular mineralization in S3 and thick ascending tubules, juxtaglomerular apparatus hypertrophy) and in the puromycin study (glomerular alterations/damage, tubular injury and


regeneration in S1-S3, intratubular hyaline casts all along the nephron, tubular dilatation in cortex and medulla), S-cystatin C detects most animals with positive histopathology and is increased in certain dosed animals without observed histopathology changes, perhaps revealing early symptoms of injury or prodromal processes.

Figure 6 Levels of S-cystatin C, BUN and SCr observed in individual animals. (a–c) Correlation of S-cystatin C (a), BUN (b) and SCr (c) levels with severity grades of histopathology for 474 animals in five Novartis studies (doxorubicin, lithium, furosemide, methapyrilene and ANIT) involving Han Wistar rats. All values are represented as fold-changes versus the average values of study-matched and time-matched control animals on a logarithmic scale. The animals are ordered by study, within each study by dose group (with increasing doses) and within each dose group by termination time point (with increasing time). The symbols and the colors represent the histopathology readout [no histopathology finding observed (red), grade 1 (green), grade 2 (blue), grade 3 (orange) on a 5 grade severity scale]. The magenta lines represent the thresholds determined for 95% specificity in the ROC analysis for all histopathology grades (1.209 for S-cystatin C, 1.208 for BUN and 1.129 for SCr).

For the doxorubicin study (tubular injury and regeneration in S1-S3 and thick ascending tubules, intratubular casts from S1 to thick ascending tubules, tubular dilation in cortex, medulla and papilla), S-cystatin C elevations were seen in most high-grade animals. However, SCr did not indicate kidney toxicity at all, showing significantly decreased values even below the control values, which might be ascribed to the high unspecific general cytotoxicity of doxorubicin. Yet, in the lithium study (tubular injury and regeneration in collecting duct, tubular dilatation in cortex and medulla), SCr identified slightly more animals with positive histopathology than S-cystatin C. In the furosemide study (tubular injury and regeneration in S3, regeneration in thick ascending tubules, intratubular casts and/or mineralization in S3), BUN outperformed SCr and S-cystatin C, which missed some animals, mainly with grade 1 lesions. In this study, all three markers provide sensitive (low-dose group) and earlier (mid-dose group, early time points) assessment of kidney injury compared to histopathology. For the hepatotoxicant methapyrilene (only spontaneous regeneration changes observed), none of the three markers showed false-positive measurements or increased levels for the animals with spontaneous regeneration lesions. Similarly for ANIT, S-cystatin C and BUN revealed instances of spontaneous regeneration, whereas SCr systematically shows a number of false positives in the high-dose group. The increased SCr levels might be explained by a drug-induced muscle breakdown, supported by reduced body weights of the animals.

In summary, S-cystatin C elevation detected more animals with renal injury than the current standards BUN and SCr in five of eight nephrotoxicant studies, whereas BUN and SCr each showed better performance than S-cystatin C in just two studies. Except for some animals in the vancomycin study, S-cystatin C levels showed a notable correlation with the severity grade of histopathologic lesions. In several studies, S-cystatin C was more sensitive and showed changes even earlier than observed with histopathology, visible as groups of nephrotoxicant animals with systematically increased S-cystatin C relative to control animals. Finally, no significant specificity issues were identified for S-cystatin C with these studies, in contrast to SCr, which shows systematic false positives for the animals dosed with the hepatotoxicant ANIT. The visual inspection on an animal-by-animal basis reconfirms the statistical ROC analysis, which shows that S-cystatin C outperformed SCr and BUN.

DISCUSSION
Urinary biomarkers hold considerable promise for monitoring potential adverse effects on kidney integrity and function in both clinical and nonclinical settings in the absence of biopsy. Ultimate clinical biomarker utility would monitor both the progression and recovery from onset of renal injury.
This necessitates preclinical demonstration of injury reversibility both for the biomarker signal(s) and histopathologic observation. Our studies to evaluate the utility of a panel of biomarkers to monitor the reversibility of kidney damage after cessation of drug treatment address a key limitation identified by regulatory authorities during evaluation of the first VXDS of safety biomarkers for kidney toxicity. This is the first report to use a broad panel of urinary biomarker values to demonstrate that renal injury can be monitored both at the point where toxicity begins and when it reverses after the withdrawal of treatment. S-cystatin C shows improved sensitivity and specificity over the historical standard markers SCr and BUN, allowing renal injuries other than proximal tubular and glomerular damage to be monitored for drug development and for pediatric and geriatric clinical populations, where the standards are less optimal for monitoring.

Tubular injury related to carbapenem A treatment (3 d) was tracked well by serum chemistry marker elevations up to the day 4 collection; these returned toward baseline at study days 8 and 18 (Fig. 1 and Supplementary Fig. 2). If a similar clinical trial were designed with serum sampling at 1 or 2 weeks, then little information identifying renal injury would be revealed. By contrast, in the carbapenem A study at day 8, urinary Kim-1, OPN, CLU and TFF3 values showed large fold-change alterations for most histopathologic-positive animals, and at day 15, Kim-1, CLU, OPN and TFF3 values still revealed elevations above threshold (Figs. 1 and 2). The panel of urinary markers adds information to the clinical chemistry markers in the carbapenem A time course where all urinary markers show a trend


toward baseline at study day 15. Urinary albumin and GSTα showed similar time-course elevations to clinical chemistry markers and a dynamic range of about 100-fold that far exceeded that of SCr and BUN (Fig. 1 and Supplementary Fig. 2). The urinary markers albumin and GSTα appear to be useful for the monitoring of rapid-onset renal injury that occurs soon after drug administration. BUN and SCr elevations were smaller in the gentamicin time course compared to the carbapenem A study or the majority of studies from the VXDS submission and might be considered borderline positive (refs. 5–7). With the exception of TFF3, the urinary biomarker panel (albumin, CLU, GSTα, Kim-1, LCN2 and to a lesser degree OPN) showed large elevations at day 10 with gentamicin treatment (Fig. 3). An analysis on day 29 after cessation of gentamicin treatment revealed that the serum chemistry markers and the urinary biomarker panel values approached baseline with no observable necrosis and degeneration injury and limited grade 1 regeneration and fibrosis. Thus, the urinary biomarker panel (albumin, CLU, GSTα, Kim-1 and LCN2) in the gentamicin time course added value to subtle SCr and BUN changes to monitor renal injury, repair and function.

In conclusion, the use of the urinary renal toxicity biomarker panel enables injury monitoring for a broad context of study designs and potential sampling time points. No single marker is likely to be applied universally across many possible renal injury contexts. For example, GSTα appears to be an excellent early toxicity biomarker for epithelial necrosis. In contrast, Kim-1 and clusterin levels persist during regeneration and appear to reflect the triggering and continuation of the repair process. Elevations in levels of albumin correlate strictly with early loss of function seen after tubular epithelial necrosis and degeneration. Measuring all the renal injury markers in parallel enables the investigator to capture critical information with regard to renal toxicity, repair and function, from study start to finish. There are often limitations regarding frequency of sample collection due to study design and dosing requirements. Thus, use of a biomarker panel ensures that data generated are not dependent upon preconceived viewpoints regarding the expected performance of any single marker for a given study. Measuring the panel of injury markers maximizes the level of interpretation from a study design compared to a single marker being deployed in a monitoring study. Multiple markers, however, require more expertise to interpret than just one or two. Clinical decisions are often made with a few critical markers rather than a large panel.
Integration of <strong>the</strong>se advantages <strong>and</strong> concernswill be resolved with experience, in addition to appropriate models<strong>and</strong> algorithms.The second gap identified during <strong>the</strong> qualification <strong>and</strong> submissionof renal safety biomarkers was a concern regarding whe<strong>the</strong>r particularrenal injuries respond sensitively toward only specific typesof lesions. For example, Kim-1 is specifically expressed in proximaltubules only in <strong>the</strong> case of proximal tubular injury but may havelimited sensitivity to lesions in o<strong>the</strong>r compartments of <strong>the</strong> kidney.A panel of biomarkers that respond collectively <strong>and</strong> complement oneano<strong>the</strong>r with respect to potential kidney injuries in various nephronsegments would be a tremendous advantage to localize renal lesions.An alternative view is that a comprehensive panel of specific markerswould be needed to cover every compartment <strong>and</strong> every possiblelesion <strong>and</strong> injury in <strong>the</strong> kidney to monitor overall renal safety. Thisreport presents an alternative to such localized markers. Novel renalfunction markers, such as S-cystatin C, monitor <strong>the</strong> general functionof <strong>the</strong> kidney. The current st<strong>and</strong>ards BUN <strong>and</strong> SCr are both kidneyfunction markers, but both markers are faced with several limitations,including extra-glomerular (SCr can be cleared trans-proximaltubules) clearance, variability in production, limited sensitivity,<strong>and</strong> poor specificity 1 . The panel of biomarkers evaluated here<strong>and</strong> assessed in accompanying manuscripts extend <strong>the</strong> diagnosticcapabilities to a variety of acute toxicant injuries at increased sensitivity<strong>and</strong> reliability. 
S-cystatin C has gained increasing use for clinical purposes, including treatment of chronic kidney diseases such as diabetic nephropathy, kidney transplantation, treatment of elderly and pediatric populations (muscle mass changes), detection of acute renal failure and prediction of cardiovascular-associated risks. Several reviews and meta-analyses have demonstrated that S-cystatin C is more sensitive, specific and reliable than SCr with respect to dosing regimens and clinical outcome 18–23. Yet no preclinical qualification has been performed, because no assays were available for the rat. In addition, it has been shown here that a biomarker qualified in preclinical studies can be crucial for safely translating potentially nephrotoxic drugs from preclinical studies into humans 5–7. We have described a substantial analysis to close gaps in the preclinical qualification of this promising renal function biomarker, using the statistical analyses and assessments established in the first VXDS of renal injury markers. We have demonstrated that the marker outperformed the current standards SCr and BUN in terms of both sensitivity and robustness. This reconfirms in the rat what has been described in numerous publications about the clinical use of S-cystatin C in humans 18–23.
In addition, the preclinical qualification is anchored in a histopathologic readout of the target organ, which is not available in most clinical situations. It was demonstrated that S-cystatin C, as a renal function marker, can detect various types of drug-induced lesions in various compartments of the kidney independent of the mode of nephrotoxicity, even in the case of minimal injury (grade 1). From a preclinical perspective, another advantage of S-cystatin C is that it is measured in blood. Urine collection in preclinical studies can be tedious, and urine is not collected as routinely in drug development as serum is for clinical chemistry analyses. The results of the biological qualification of S-cystatin C as a preclinical functional biomarker presented in this manuscript may have several effects on drug development. First, a sensitive and reproducible assay is available for measuring S-cystatin C in rats. Second, the sensitivity and specificity of the biomarker were demonstrated, and rules, such as thresholds with associated specificity and sensitivity, were established for its use in routine preclinical drug development.
Finally, this work establishes the necessary bridge to its clinical use.

We will submit these data to regulatory bodies in an attempt to close both gaps of the first VXDS intended to qualify biomarkers to monitor nephrotoxicity as part of the rolling qualification process. The recovery data will allow an extension of the context of the preclinically qualified biomarkers to monitor reversibility of lesions. S-cystatin C can monitor renal function in rat good laboratory practice studies and be used as a translational biomarker for early clinical trials in a narrow context. Taken together, the results presented should help to promote the use of these additional renal safety tests in both drug development and routine clinical practice.

Methods
Methods and any associated references are available in the online version of the paper at http://www.nature.com/naturebiotechnology/.

Note: Supplementary information is available on the Nature Biotechnology website.

Acknowledgments
S. Leuillet and B. Palate (Centre International de Toxicologie (CIT)) kindly performed Novartis studies and the histopathology assessment and J. Mapes (RBM) developed the S-cystatin C assay. We thank G. Miller and


P. Srinivasa for helpful comments on the manuscript. Z.E., K.V. and W.E.G. kindly shared unpublished observations for GST alpha.

AUTHOR CONTRIBUTIONS
J.S.O., F.D., W.J.B., M.J.T., T.R.S., J.F.S., W.E.G., E.P., A.C., F.S., A.M., O.G., D.R.R., F.L., S.-D.C., G.M., J.V., D.L.G., F.D.S. and D.W. designed research; Z.E., T.F., N.M., E.P., D.R.R., S.T., H.K.C., S.R., D.T.T., K.V. and H.J. performed research; Z.E. and K.V. contributed new reagents/analytic tools; J.S.O., D.H., N.M., W.E.G., F.D., Y.Y., G.M., P.V., A.C., D.L.G. and F.D.S. analyzed data; and J.S.O., S.T., Z.E., K.V., F.D., D.L.G. and F.D.S. wrote the paper.

COMPETING FINANCIAL INTERESTS
The authors declare competing financial interests: details accompany the full-text HTML version of the paper at http://www.nature.com/naturebiotechnology/.

Published online at http://www.nature.com/naturebiotechnology/.
Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/.

1. Bonventre, J.V. et al. Next-generation biomarkers for detecting kidney toxicity. Nat. Biotechnol. 28, 436–440 (2010).
2. Ferguson, M.A., Vaidya, V.S. & Bonventre, J.V. Biomarkers of nephrotoxic acute kidney injury. Toxicology 245, 182–193 (2008).
3. Vaidya, V.S., Ramirez, V., Ichimura, T., Bobadilla, N.A. & Bonventre, J.V. Urinary kidney injury molecule-1: a sensitive quantitative biomarker for early detection of kidney tubular injury. Am. J. Physiol. Renal Physiol. 290, F517–F529 (2006).
4. Han, W.K. et al. Urinary biomarkers in the early diagnosis of acute kidney injury. Kidney Int. 73, 863–869 (2008).
5. Yu, Y., Jin, H., Holder, D., Ozer, J.S. & Villarreal, S. Urinary biomarkers trefoil factor 3 and albumin enable early detection of kidney tubular injury. Nat. Biotechnol. 28, 470–477 (2010).
6. Dieterle, F. et al. Urinary clusterin, cystatin C, β2-microglobulin and total protein as markers to detect drug-induced kidney injury. Nat. Biotechnol. 28, 463–469 (2010).
7. Vaidya, V.S. et al. Kidney injury molecule-1 outperforms traditional biomarkers of kidney injury in preclinical biomarker qualification studies. Nat. Biotechnol. 28, 478–485 (2010).
8. Mattes, W.B. & Walker, E.G. Translational toxicology and the work of the predictive safety testing consortium. Clin. Pharmacol. Ther. 85, 327–330 (2009).
9. Razzaque, M.S. & Taguchi, T. Cellular and molecular events leading to renal tubulointerstitial fibrosis. Med. Electron Microsc. 35, 68–80 (2002).
10. Westhuyzen, J. et al. Measurement of tubular enzymuria facilitates early detection of acute renal impairment in the intensive care unit. Nephrol. Dial. Transplant. 18, 543–551 (2003).
11. Bailly, V. et al. Shedding of kidney injury molecule-1, a putative adhesion protein involved in renal regeneration. J. Biol. Chem. 277, 39739–39748 (2002).
12. Ichimura, T., Hung, C.C., Yang, S.A., Stevens, J.L. & Bonventre, J.V. Kidney injury molecule-1: a tissue and urinary biomarker for nephrotoxicant-induced renal injury. Am. J. Physiol. Renal Physiol. 286, F552–F563 (2004).
13. Berger, T. et al. Lipocalin 2-deficient mice exhibit increased sensitivity to Escherichia coli infection but not to ischemia-reperfusion injury. Proc. Natl. Acad. Sci. USA 103, 1834–1839 (2006).
14. Silkensen, J.R., Agarwal, A., Nath, K.A., Manivel, J.C. & Rosenberg, M.E. Temporal induction of clusterin in cisplatin nephrotoxicity. J. Am. Soc. Nephrol. 8, 302–305 (1997).
15. Orlandi, A. et al. Modulation of clusterin isoforms is associated with all-trans retinoic acid-induced proliferative arrest and apoptosis of intimal smooth muscle cells. Arterioscler. Thromb. Vasc. Biol. 25, 348–353 (2005).
16. Vaidya, V.S. & Bonventre, J.V. Mechanistic biomarkers for cytotoxic acute kidney injury. Expert Opin. Drug Metab. Toxicol. 2, 697–713 (2006).
17. Hoffmann, W. Trefoil factors TFF (trefoil factor family) peptide-triggered signals promoting mucosal restitution. Cell. Mol. Life Sci. 62, 2932–2938 (2005).
18. Mussap, M. & Plebani, M. Biochemistry and clinical role of human cystatin C. Crit. Rev. Clin. Lab. Sci. 41, 467–550 (2004).
19. Takuwa, S., Ito, Y., Ushijima, K. & Uchida, K. Serum cystatin-C values in children by age and their fluctuation during dehydration. Pediatr. Int. 44, 28–31 (2002).
20. Madero, M., Sarnak, M.J. & Stevens, L.A. Serum cystatin C as a marker of glomerular filtration rate. Curr. Opin. Nephrol. Hypertens. 15, 610–616 (2006).
21. Dharnidharka, V.R., Kwond, C. & Stevens, G. Serum cystatin C is superior to serum creatinine as a marker of kidney function: a meta-analysis. Am. J. Kidney Dis. 40, 221–226 (2002).
22. Shlipak, M.G., Praught, M.L. & Sarnak, M.J. Update on cystatin C: new insights into the importance of mild kidney dysfunction. Curr. Opin. Nephrol. Hypertens. 15, 270–275 (2006).
23. Herget-Rosenthal, S. et al. Early detection of acute renal failure by serum cystatin C. Kidney Int. 66, 1115–1122 (2004).
24. Anonymous. US Food and Drug Administration Agency 510(k) Substantial equivalence determination decision summary device only (FDA, Rockville, Maryland, USA, 2007).
25. Sing, T., Sander, O., Beerenwinkel, N. & Lengauer, T. ROCR: visualizing classifier performance in R. Bioinformatics 21, 3940–3941 (2005).
26. Hanley, J.A. & McNeil, B.J. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143, 29–36 (1982).
27. DeLong, E.R., DeLong, D.M. & Clarke-Pearson, D.L. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 44, 837–845 (1988).
28. Rosen, H. et al. Reduced immunotoxicity and preservation of antibacterial activity in a releasable side-chain carbapenem antibiotic. Science 283, 703–706 (1999).
29. Sistare, F.D. et al. Towards consensus practices to qualify safety biomarkers for use in early drug development. Nat. Biotechnol. 28, 446–454 (2010).


ONLINE METHODS
Data availability. For the complete data set, see Supplementary Data Set.

Reversibility studies. Animals. Male and female Sprague Dawley Crl:CD(SD) rats (66–72 d old; 180–320 g) were purchased from Charles River Laboratories and maintained in a central animal facility free of known chemical contaminants under conditions of 21 ± 1 °C and 50–80% relative humidity in an alternating 12 h light-dark cycle. Rats were fed commercial rodent chow (PMI-certified rodent diet) (22 g/d, males; 16 g/d, females), given water ad libitum, and acclimated for 1 week before use. All animal maintenance and treatment protocols were in compliance with the Guide for Care and Use of Laboratory Animals adopted and promulgated by the National Institutes of Health and were approved by the Institutional Animal Care and Use Committee.

Toxicity study dosing. Sprague Dawley rats received either gentamicin or carbapenem A treatment. Gentamicin sulfate was administered to male rats by intraperitoneal (i.p.) injection at 0, 40 or 120 mg/kg/d (n = 5 rats/dose group/time point) for up to 9 d. In the gentamicin study, the animals were necropsied on study day 10 or 39 (29-d recovery) for toxicity evaluation. Carbapenem A was administered intravenously (2 ml/min) to male and female rats at 150 mg/kg once daily for 3 d followed by a recovery period of up to 15 d (n = 4 rats/dose group/time point for treated groups; n = 2 for control groups). For the carbapenem A study, necropsy was performed on study day 2, 4, 8 or 18.
For both studies, the vehicle and control article was 0.9% NaCl and the dosing volume was 5 ml/kg.

Urine collection. Urine was collected 18 ± 2 h before necropsy from fasted rats placed in standard metabolic cages. Individual urine samples were collected into containers (placed around dry ice) and were stored at −80 °C until urinalysis. After thawing, samples were placed on wet ice and volumes were evaluated (precipitates settled by gravity and were discarded). Urine samples (2.5 ml) were tested for routine clinical chemistry urinalysis (Roche Modular Analyzer): manual specific gravity, pH, protein, glucose, creatinine, occult blood and ketones. For the remaining urine, small aliquots were stored at −80 °C for biomarker analysis and repeated freeze-thaw cycles were avoided.

Blood collection and clinical chemistry. Fasted rats were bled from the vena cava with 2 ml collected into a serum separator tube (centrifuged at 1,500g for 10 min at 4 °C). To isolate plasma, 2 ml of blood was placed into an EDTA collection tube (centrifuged at 900g for 15 min at 4 °C). Isolated plasma and serum samples were stored at −80 °C. BUN (mg/dl) and SCr were evaluated using a standard clinical chemistry analyzer (Roche-Modular).

Histopathology. Necropsies were limited to examination and collection of liver and kidney. The terminal body weights and liver and kidney weights were recorded from all the rats at scheduled necropsies (data not shown). Kidney tissue was collected for histology at necropsy.
A left kidney section (5 mm section including the papilla, cortex and medulla) was fixed in 10% neutral buffered formalin (NBF) for 24 h, processed and embedded in paraffin. Embedded tissues were cut into 4–6 µm sections and stained with H&E. Kidneys from control and high-dose animals, and organs with test article–related renal changes from lower-dose groups, were examined microscopically by a Merck pathologist (blinded to biomarker data), and results were subsequently reviewed by another supervising pathologist. Pathological lesions were graded on a severity scale of 0 to 5: 0 (no observable pathology), 1 (very slight), 2 (slight), 3 (moderate), 4 (marked) or 5 (severe). Diagnoses for individual animals were grouped into composite categories for statistical analysis: (1) tubular degeneration and necrosis composite, (2) tubular basophilia and regeneration composite, (3) tubular dilatation and (4) an overall composite score for all renal injury, considered because the primary histomorphologic changes in these studies were confined to the tubular epithelium of the renal cortex. The composite score for an individual animal was derived from the highest pathology score of the diagnoses comprising a given composite. Fibrosis was occasionally observed in the recovery phase and described, yet is not part of the composite tubular score.

MesoScale discovery assays.
Urinary ELISA assays were performed as indicated. The performance characteristics of the MesoScale Discovery immunogenic assays were evaluated by measuring the sensitivity, assay range, specificity, reproducibility, recovery and interference (Erdos, Vlasakova and Glaab, unpublished data to be reported elsewhere). MesoScale Discovery assays used antibody pairs specific to each analyte. Acceptance criteria used for biomarker ULOQ and lowest limit of quantification (LLOQ) was CV
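The composite histopathology score defined in the Histopathology paragraph above (the highest grade among the diagnoses that make up a composite) can be illustrated with a minimal sketch. The diagnosis names, the grouping dictionary and the rule used for the overall composite are assumptions made for illustration, not the authors' actual data dictionary.

```python
from typing import Dict, Tuple

# Hypothetical diagnosis names grouped into the three tubular composites.
COMPOSITES: Dict[str, Tuple[str, ...]] = {
    "degeneration_necrosis": ("tubular degeneration", "tubular necrosis"),
    "basophilia_regeneration": ("tubular basophilia", "tubular regeneration"),
    "dilatation": ("tubular dilatation",),
}


def composite_scores(grades: Dict[str, int]) -> Dict[str, int]:
    """Score each composite as the highest grade (0-5) among its diagnoses."""
    scores = {
        name: max(grades.get(diagnosis, 0) for diagnosis in diagnoses)
        for name, diagnoses in COMPOSITES.items()
    }
    # Assumption: the overall composite is the highest of the tubular composites.
    scores["overall"] = max(scores.values())
    return scores


# Illustrative animal: slight necrosis (grade 2), very slight basophilia (grade 1).
example = composite_scores({"tubular necrosis": 2, "tubular basophilia": 1})
print(example)
```

Diagnoses that are not listed in a composite, such as fibrosis, are ignored by this sketch, which mirrors the statement that fibrosis is not part of the composite tubular score.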


Rules Based Medicine developed the S-cystatin C assay as part of a three-analyte multiplex panel together with clusterin and osteopontin and validated this multiplex for urine and serum measurements. Capture antibodies (Upstate for the S-cystatin C assay) were covalently coupled through their free amino groups by standard EDC/NHS ((1-ethyl-3-[3-dimethylaminopropyl]carbodiimide hydrochloride)/N-hydroxysulfosuccinimide) chemistry onto one set of the fluorescently encoded, carboxylated microspheres as per standard operating procedure. Detection antibodies specific for each of the analytes in the multiplexes (R&D Systems, 6 µg/ml for the cystatin C assay) were biotinylated using an NHS-biotin procedure as per standard operating procedure.

The validation of the assay followed accepted procedures recommended in The Bioanalytical Method Validation Guidance for Industry (http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM070107.pdf), with the exceptions that (i) for the accuracy at the LLOQ and ULOQ, a mean deviation of 30% was accepted instead of the recommended 20% and (ii) for the accuracy between the LLOQ and ULOQ, a mean deviation of 20% instead of 15% was accepted for the quality controls.
The assay validation covered inter-day, inter-operator and inter-instrument reproducibility, linearity, parallelism, spike recovery, freeze-thaw stability (three cycles), short-term stability (1–24 h at 22 °C), long-term stability (5 weeks, 3 months), matrix interferences and cross-reactivity (against each of the 66 antigens measured in the Rules Based Medicine rodent MAP version 1.6). The LLOQ was determined to be 25 ng/ml and the ULOQ was determined to be 839 ng/ml. In the working range between the LLOQ and ULOQ, at least 66% of all determinations for each validation criterion cited above were within the mentioned acceptance criteria.

For the biomarker measurements, samples were thawed, centrifuged at 6,000g for 10 min and divided into aliquots. One aliquot was used for testing and the others were frozen at −80 °C within 3 h. Using automated pipetting techniques, each sample was introduced into one of the capture microsphere multiplexes. These mixtures were thoroughly mixed and incubated at 22 °C for 1 h. A multiplexed cocktail of biotinylated reporter antibodies was then added, thoroughly mixed and incubated for 1 h at 22 °C. Assays were then developed using an excess of streptavidin-phycoerythrin, which was evenly mixed into each multiplex and incubated for 1 additional hour at 25 °C. The volume of each multiplexed reaction was reduced by vacuum filtration and then increased by dilution into matrix buffer for analysis (2,000-fold dilution for the cystatin C assay).
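The acceptance rule stated for the assay validation (a deviation of up to 30% at the LLOQ and ULOQ, up to 20% in between, and at least 66% of determinations within these limits) can be sketched as follows. This is a simplified reading that applies the limits to individual determinations; the function names and the example values are hypothetical, and only the two limits and the percentages come from the text.

```python
LLOQ_NG_ML = 25.0   # lower limit of quantification reported in the text
ULOQ_NG_ML = 839.0  # upper limit of quantification reported in the text


def max_deviation(nominal: float) -> float:
    """Allowed relative deviation: 30% at the limits, 20% in between."""
    at_limit = nominal in (LLOQ_NG_ML, ULOQ_NG_ML)
    return 0.30 if at_limit else 0.20


def within_criteria(nominal: float, measured: float) -> bool:
    """Check one determination against the accuracy criterion."""
    return abs(measured - nominal) / nominal <= max_deviation(nominal)


def criterion_passes(pairs, min_fraction: float = 0.66) -> bool:
    """True if at least 66% of (nominal, measured) pairs are acceptable."""
    accepted = sum(within_criteria(n, m) for n, m in pairs)
    return accepted / len(pairs) >= min_fraction


# Hypothetical quality-control determinations (nominal, measured) in ng/ml.
qc_pairs = [(25.0, 31.0), (100.0, 118.0), (400.0, 500.0)]
print(criterion_passes(qc_pairs))
```

In this made-up example two of the three determinations are within their limits, so the criterion is met.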
The analysis was performed in a Luminex 100 instrument and the resulting data stream was interpreted using proprietary data analysis software developed at Rules Based Medicine. For each multiplex, calibrators and quality controls were included on each microtiter plate (R&D Systems 1238 pi as antigen for the cystatin C assay). Eight-point calibrators were assayed in the first and last column of each plate and three-level quality controls were included in duplicate. A value for each of the analytes localized in a specific multiplex was determined using an RBM custom-written data analysis package available commercially from Qiagen.

ROC. Measurements that were either below the LLOQ or above the ULOQ were imputed to their respective limits. For statistical analyses, variables were normalized to the mean of controls measured in the same study and on the same day (changes expressed as fold-changes). Analyses were limited to animals for which S-cystatin C, BUN, SCr and histopathology were all available. Curves were created using ROC methods and computed and summarized using reported methods 25. Standard errors of the AUC for the ROC curve, differences of the AUC, standard errors of the differences of AUC and P values for the significance of the differences of AUC were calculated using reported formulas 26,27.

For the purpose of ROC curve analysis, a histopathology score of ‘0’ (no observable histopathology) was considered ‘negative’ and a histopathology score >0 was considered ‘positive’ for all samples.
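The preprocessing and ROC steps described above can be illustrated with a small sketch: imputation to the quantification limits, normalization to the mean of concurrent controls, and an AUC computed as the probability that a histopathology-positive animal has a higher marker value than a negative one. The paper cites the ROCR package in R for the actual computation; this standalone version, its function names and its example values are assumptions for illustration only.

```python
from statistics import mean


def impute(value: float, lloq: float, uloq: float) -> float:
    """Set values below the LLOQ or above the ULOQ to the respective limit."""
    return min(max(value, lloq), uloq)


def fold_changes(values, control_values):
    """Express values relative to the mean of concurrent controls."""
    control_mean = mean(control_values)
    return [v / control_mean for v in values]


def roc_auc(scores, grades) -> float:
    """AUC: chance that a positive (grade > 0) outranks a negative (grade 0)."""
    pos = [s for s, g in zip(scores, grades) if g > 0]
    neg = [s for s, g in zip(scores, grades) if g == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical marker values (ng/ml) and histopathology grades.
raw = [20.0, 30.0, 60.0, 90.0, 900.0]
grades = [0, 1, 0, 2, 3]
clamped = [impute(v, 25.0, 839.0) for v in raw]
normalized = fold_changes(clamped, control_values=[25.0, 30.0, 35.0])
print(roc_auc(normalized, grades))
```

Because the AUC depends only on the ranking of values, normalization changes it only when animals from different studies or days, each with its own control mean, are pooled.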
Thus, all positive grades of histopathology (grades 1–4) were treated with equal weight for the initial ROC analysis. The ROC methods described were also applied to specific subsets of samples based on the severity grade of the histopathologic alteration score. The subsets used in the analyses were the following:
1. all samples
2. only samples with maximum composite histopathology scores of 0, 1, 2 or 3
3. only samples with maximum composite histopathology scores of 0, 1 or 2
4. only samples with maximum composite histopathology scores of 0 or 1

Two types of ROC analyses were done in this work: the inclusion and exclusion analyses.

In the inclusion analysis, data from all animals were used. Animals with a histopathology score of 0 were treated as negative cases and animals with a histopathology score >0 were considered positive cases.

In the exclusion analysis, animals treated with vehicles or non-kidney toxicants (ANIT and methapyrilene) with a reported kidney histopathology grade of 0 were considered negative cases and animals treated with kidney toxicants with a histopathology grade >0 were considered positive cases. Samples from animals treated with a nephrotoxicant that did not have a positive composite kidney histopathology score were excluded in this model.
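The case definitions for the inclusion and exclusion analyses, together with the severity-restricted subsets, can be written out as a short sketch. The function names are hypothetical, and the handling of vehicle-treated animals with a positive grade (dropped here) is an assumption, since the text does not address that case.

```python
from typing import List, Optional


def inclusion_label(grade: int) -> bool:
    """Inclusion analysis: every animal is used; grade > 0 is positive."""
    return grade > 0


def exclusion_label(grade: int, nephrotoxicant: bool) -> Optional[bool]:
    """Exclusion analysis label; None means the animal is left out."""
    if nephrotoxicant:
        # Toxicant-treated animals without a lesion are excluded.
        return True if grade > 0 else None
    # Vehicle or non-kidney toxicant: grade 0 is negative.
    # Assumption: a positive grade in this group is also left out.
    return False if grade == 0 else None


def severity_subset(grades: List[int], max_grade: int) -> List[int]:
    """Keep samples whose maximum composite grade does not exceed max_grade."""
    return [g for g in grades if g <= max_grade]


print(exclusion_label(0, nephrotoxicant=True))   # excluded animal
print(severity_subset([0, 1, 2, 3, 4], 2))       # subset 3 in the list above
```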
The exclusion avoids ambiguity, in cases of discrepancy between markers and histopathology for those animals, about whether biomarker changes are prodromal (marker changes that might appear earlier than histopathologic change), whether histopathology is falsely negative or whether the markers are falsely positive.

The AUC from each ROC curve, the sensitivity at a predefined specificity and the specificity at a predefined sensitivity, as well as the comparisons to BUN and SCr and the results of significance tests for these comparisons to support a claim that the new biomarkers outperform BUN/SCr, were calculated and stated for subset 1. In addition, the AUC, sensitivity and specificity for the other subsets restricted to lower histopathology grades were determined and plotted.

doi:10.1038/nbt.1627
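Two of the summary statistics named above can be sketched briefly: the sensitivity reached at a predefined specificity, and a standard error for the AUC. The standard error uses the Hanley and McNeil approximation from ref. 26; whether the authors used exactly this form is an assumption, and the example values are hypothetical.

```python
from math import sqrt


def sensitivity_at_specificity(scores, positives, target_specificity):
    """Best sensitivity over thresholds whose specificity meets the target."""
    neg = [s for s, p in zip(scores, positives) if not p]
    pos = [s for s, p in zip(scores, positives) if p]
    best = 0.0
    for threshold in sorted(set(scores)):
        # A sample is called positive when its score exceeds the threshold.
        specificity = sum(s <= threshold for s in neg) / len(neg)
        sensitivity = sum(s > threshold for s in pos) / len(pos)
        if specificity >= target_specificity:
            best = max(best, sensitivity)
    return best


def hanley_mcneil_se(auc: float, n_pos: int, n_neg: int) -> float:
    """Approximate standard error of an AUC (Hanley & McNeil, 1982)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc**2 / (1 + auc)
    variance = (
        auc * (1 - auc)
        + (n_pos - 1) * (q1 - auc**2)
        + (n_neg - 1) * (q2 - auc**2)
    ) / (n_pos * n_neg)
    return sqrt(variance)


# Hypothetical fold-changes: four negative and four positive animals.
scores = [0.9, 1.0, 1.1, 1.3, 1.2, 1.8, 2.5, 3.0]
positives = [False, False, False, False, True, True, True, True]
print(sensitivity_at_specificity(scores, positives, 1.0))
print(hanley_mcneil_se(0.9, n_pos=4, n_neg=4))
```

Swapping the roles of the two groups in the first function gives the specificity at a predefined sensitivity.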


Analysis

GREAT improves functional interpretation of cis-regulatory regions

Cory Y McLean 1, Dave Bristor 1,2, Michael Hiller 2, Shoa L Clarke 3, Bruce T Schaar 2, Craig B Lowe 4, Aaron M Wenger 1 & Gill Bejerano 1,2

1 Department of Computer Science, 2 Department of Developmental Biology and 3 Department of Genetics, Stanford University, Stanford, California, USA. 4 Center for Biomolecular Science and Engineering, University of California Santa Cruz, Santa Cruz, California, USA. Correspondence should be addressed to G.B. (bejerano@stanford.edu).

Published online 2 May 2010; doi:10.1038/nbt.1630

We developed the Genomic Regions Enrichment of Annotations Tool (GREAT) to analyze the functional significance of cis-regulatory regions identified by localized measurements of DNA binding events across an entire genome. Whereas previous methods took into account only binding proximal to genes, GREAT is able to properly incorporate distal binding sites and control for false positives using a binomial test over the input genomic regions. GREAT incorporates annotations from 20 ontologies and is available as a web application. Applying GREAT to data sets from chromatin immunoprecipitation coupled with massively parallel sequencing (ChIP-seq) of multiple transcription-associated factors, including SRF, NRSF, GABP, Stat3 and p300 in different developmental contexts, we recover many functions of these factors that are missed by existing gene-based tools, and we generate testable hypotheses. The utility of GREAT is not limited to ChIP-seq, as it could also be applied to open chromatin, localized epigenomic markers and similar functional data sets, as well as comparative genomics sets.

The coupling of chromatin immunoprecipitation with massively parallel sequencing, ChIP-seq, is ushering in a new era of genome-wide functional analysis 1–3. Thus far, computational efforts have focused on pinpointing the genomic locations of binding events from the deluge of reads produced by deep sequencing 4–8. Functional interpretation is then performed using gene-based tools developed in the wake of the preceding microarray revolution 9–11. In a typical analysis, one compares the total fraction of genes annotated for a given ontology term with the fraction of annotated genes picked by proximal binding events to obtain a gene-based P value for enrichment (Fig. 1 and Online Methods).

This procedure has a fundamental drawback: associating only proximal binding events (for example, under 2–5 kb from the transcription start site) typically discards over half of the observed binding events (Fig. 2a). However, the standard approach to capturing distal events, associating each binding site with the one or two nearest genes, introduces a strong bias toward genes that are flanked by large intergenic regions 12,13. For example, though the Gene Ontology 14 (GO) term ‘multicellular organismal development’ is associated with 14% of human genes, the ‘nearest genes’ approach associates over 33% of the genome with these genes. This biological bias results in numerous false positive enrichments, particularly for the input set sizes typical of a ChIP-seq experiment (Fig. 2b and Supplementary Fig. 1).
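The gene-based P value described above is a hypergeometric tail probability. A minimal sketch, using the toy counts from the paper's Figure 1 schematic (N = 8 genes in the genome, K = 3 carrying the annotation, n = 2 genes selected by proximal regions, k = 1 selected gene carrying the annotation), is given below. The function name is hypothetical, and this is not the code of any specific gene-based tool.

```python
from math import comb


def hypergeom_enrichment_p(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    favourable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)
    )
    return favourable / comb(N, n)


# Toy counts from the Figure 1 schematic.
p_value = hypergeom_enrichment_p(N=8, K=3, n=2, k=1)
print(round(p_value, 3))
```

With these counts the tail probability is 18/28, about 0.643. The test sees only genes, so it cannot account for how much of the genome each gene's regulatory domain covers.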
Building on our experience in addressing these pitfalls 12,15,16, we have developed a tool that robustly integrates distal binding events while eliminating the bias that leads to false positive enrichments.

RESULTS
Here we describe GREAT, which analyzes the functional significance of sets of cis-regulatory regions by explicitly modeling the vertebrate genome regulatory landscape and using many rich information sources.

A binomial test for long-range gene regulatory domains
GREAT associates genomic regions with genes by defining a ‘regulatory domain’ for each gene in the genome. Each genomic region is associated with all genes in whose regulatory domains it lies (Fig. 1b). High-throughput chromosomal conformation capture (3C) approaches such as 5C (ref. 17), Hi-C (ref. 18) or enhanced ChIP-4C (ref. 19) are providing first glimpses of actual gene regulatory domains. Because we still lack precise empirical maps, however, GREAT assigns each gene a regulatory domain consisting of a basal domain that extends 5 kb upstream and 1 kb downstream from its transcription start site (denoted below as 5+1 kb), and an extension up to the basal regulatory domain of the nearest upstream and downstream genes within 1 Mb (GREAT allows the user to modify the rule and distances). GREAT further refines the regulatory domains of a handful of genes, including several global control regions 20, by using their experimentally determined regulatory domains. Our tool can also incorporate additional locus-based and genome-wide data as they become available (Supplementary Fig.
2 and Online Methods).

Given a set of input genomic regions and an ontology of gene annotations, GREAT computes ontology term enrichments using a binomial test that explicitly accounts for variability in gene regulatory domain size by measuring the total fraction of the genome annotated for any given ontology term and counting how many input genomic regions fall into those areas (Fig. 1b and Online Methods). In the example above, GREAT expects 33% of all input elements to be associated with ‘multicellular organismal development’ by chance, rather than the 14% of input elements that a gene-based test assumes. The


binomial test integrates distal binding events in a way that remains robust regardless of erroneous assignments of genomic regions to genes. Namely, the longer the regulatory domain of any gene (and, by extension, of any ontology term), the greater the expected number of regions associated with this term by chance. Indeed, the binomial statistic markedly reduces the number of false positive enriched terms even when very large regulatory domains are used (Fig. 2b and Supplementary Fig. 1). The binomial test treats each input genomic region as a point binding event, making it most suitable for testing targets with localized binding peaks. The binomial test also highlights cases in which a single gene attracts an unlikely number of input genomic regions. To separate these biologically interesting gene-specific events from term-derived enrichments that are distributed across multiple genes, we perform both the binomial test and the traditional hypergeometric gene-based test. In doing so, we highlight ontology terms enriched by both tests (term-derived enrichment) separately from those enriched by only the binomial test (gene-specific enrichment) or the hypergeometric test (regulatory domain bias) (Fig. 2c and Supplementary Fig. 3).

GREAT supports direct enrichment analysis of both the human and mouse genomes.
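The binomial test over genomic regions can be sketched in a few lines. The example uses the toy counts from the paper's Figure 1 schematic (n = 6 input regions, a fraction p = 0.6 of the genome annotated with the term, k = 5 regions landing in annotated regulatory domains). The function name is hypothetical, and this sketch is not GREAT's own implementation.

```python
from math import comb


def binomial_enrichment_p(n: int, p: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(n, p).

    n: number of input genomic regions
    p: fraction of the genome covered by regulatory domains of annotated genes
    k: number of input regions falling inside those domains
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Toy counts from the Figure 1 schematic.
p_value = binomial_enrichment_p(n=6, p=0.6, k=5)
print(round(p_value, 5))
```

Here the tail probability is about 0.233. Because p is a fraction of the genome rather than a fraction of genes, a term whose genes have long regulatory domains receives a larger p, and therefore more hits are expected by chance before the term is called enriched.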
GREAT integrates 20 separate ontologies containing biological knowledge about gene functions, phenotype and disease associations, regulatory and metabolic pathways, gene expression data, presence of regulatory motifs to capture cofactor dependencies, and gene families (Supplementary Tables 1–3 and Online Methods). Core computations are performed by the GREAT server while subsequent browsing is executed on the user’s machine. An overview of the tool’s functionality and options when analyzing data is given in Table 1, and its current web interface is shown in Supplementary Figure 4.

Comparison of enrichment tests and regulatory domain ranges

To demonstrate the utility of our approach, we compared GREAT results to previously published gene-based analyses as well as to enrichments from the Database for Annotation, Visualization, and Integrated Discovery (DAVID) (ref. 21). Most gene-based tools assess enrichments in a very similar manner; we chose DAVID as a representative gene-based tool owing to its popularity and its ability to test a breadth of data sources similar to that of GREAT (Supplementary Table 4). We analyzed eight ChIP-seq data sets from a range of human and mouse cells and tissues (Supplementary Table 5), each with a different distribution of proximal and distal binding events (Fig. 2a).
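The proximal versus distal distribution summarized in Fig. 2a can be tabulated by binning each peak's distance to the nearest transcription start site. The sketch below is only an illustration of that bookkeeping: the bin edges (0–2, 2–5, 5–50, 50–500 and >500 kb) follow the axis of Fig. 2a, whereas the function names, the half-open treatment of bin boundaries, the single-chromosome simplification and the sample coordinates are assumptions.

```python
from bisect import bisect_left, bisect_right

# Upper edges (kb) of the first four bins; the last bin is open-ended.
BIN_EDGES_KB = [2, 5, 50, 500]
BIN_LABELS = ["0-2", "2-5", "5-50", "50-500", ">500"]


def distance_to_nearest_tss(position: int, sorted_tss: list[int]) -> int:
    """Distance in bp from a peak position to the closest TSS (one chromosome)."""
    i = bisect_left(sorted_tss, position)
    # Only the TSS just left and just right of the peak can be the nearest.
    candidates = sorted_tss[max(i - 1, 0): i + 1]
    return min(abs(position - tss) for tss in candidates)


def bin_fractions(peaks: list[int], tss_positions: list[int]) -> dict[str, float]:
    """Fraction of peaks in each distance bin; a boundary value joins the upper bin."""
    sorted_tss = sorted(tss_positions)
    counts = [0] * len(BIN_LABELS)
    for peak in peaks:
        dist_kb = distance_to_nearest_tss(peak, sorted_tss) / 1000
        counts[bisect_right(BIN_EDGES_KB, dist_kb)] += 1
    return {label: count / len(peaks) for label, count in zip(BIN_LABELS, counts)}


# Hypothetical coordinates, chosen so that one peak lands in each bin.
example_tss = [100_000, 1_000_000]
example_peaks = [100_500, 103_000, 130_000, 400_000, 2_000_000]
fractions = bin_fractions(example_peaks, example_tss)
print(fractions)
```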
We tested each data set in six different ways: (i) by reproducing the original study’s list of enrichments, or if the original study did not report enrichments, by using DAVID on the set of genes with binding events within 2 kb of the transcription start site; (ii) by using GREAT with the default regulatory domain definition (basal promoter 5+1 kb and extension up to 1 Mb); (iii) by using GREAT’s hypergeometric test on the set of genes with binding events within 2 kb of the transcription start site, to control for the different gene mappings and ontologies in DAVID and GREAT; (iv) by using GREAT with a 5+1 kb basal promoter and a more limited 50 kb extension; and (v, vi) by using GREAT with either one (v) or two (vi) nearest genes up to 1 Mb (Tables 2 and 3, and Supplementary Tables 6–44, indexed in Supplementary Table 45).

Figure 1 Enrichment analysis of a set of cis-regulatory regions. (a) The current prevailing methodology associates only proximal binding events with genes and performs a gene-list test of functional enrichments using tools originally designed for microarray analysis. (b) GREAT’s binomial approach over genomic regions uses the total fraction of the genome associated with a given ontology term (green bar) as the expected fraction of input regions associated with the term by chance.
Panel a, hypergeometric test over genes. Step 1: infer proximal gene regulatory domains. Step 2: associate genomic regions with genes via regulatory domains. Step 3: count genes selected by proximal genomic regions. Step 4: perform hypergeometric test over genes. In the illustrated example, N = 8 genes in genome, K = 3 genes in genome carry annotation π, n = 2 genes selected by proximal genomic regions and k = 1 gene selected carries annotation π, giving $P = \Pr_{\mathrm{hyper}}(k \ge 1 \mid N = 8, K = 3, n = 2)$.
Panel b, binomial test over genomic regions. Step 1: infer distal gene regulatory domains. Step 2: calculate annotated fraction of genome. Step 3: count genomic regions associated with the annotation. Step 4: perform binomial test over genomic regions. In the illustrated example, n = 6 total genomic regions, p = 0.6 fraction of genome annotated with π and k = 5 genomic regions hit annotation π, giving $P = \Pr_{\mathrm{binom}}(k \ge 5 \mid n = 6, p = 0.6)$.

GREAT invariably revealed strong enrichments for experimentally validated functions of the specific factors, as well as for testable (and, to our knowledge, novel) functions. It also implicated subsets of regulatory regions in driving the assayed developmental processes and in activating key signaling pathways. In a majority of data sets, distal binding events were essential to recover known functions, strongly suggesting that many of the distal associations are biologically meaningful (see below). Furthermore, in most sets, restricting regulatory domain extension to 50 kb retains many enriched terms but omits roughly half of both the binding events and the genes implicated using the full 1-Mb extension.
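The default association rule referred to above (a 5+1 kb basal domain extended by up to 1 Mb) and its more limited 50 kb variant can be sketched as follows. This is one plausible reading rather than GREAT's own code: it assumes that the basal domain spans 5 kb upstream and 1 kb downstream of the transcription start site, and that the extension stops at the basal domain of the neighboring gene, in line with the remark in the Figure 3 legend that regulatory domains are not extended through neighboring genes. The `Gene` type, the function names, the single-chromosome simplification and all coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    tss: int      # transcription start site position (bp)
    strand: str   # "+" or "-"


def basal_domain(gene: Gene, upstream: int = 5_000, downstream: int = 1_000) -> tuple[int, int]:
    """Basal domain around the TSS; upstream/downstream depend on strand."""
    if gene.strand == "+":
        return gene.tss - upstream, gene.tss + downstream
    return gene.tss - downstream, gene.tss + upstream


def regulatory_domains(genes: list[Gene], chrom_length: int,
                       max_extension: int = 1_000_000) -> dict[str, tuple[int, int]]:
    """Extend each basal domain toward its neighbors' basal domains, capped at max_extension."""
    ordered = sorted(genes, key=lambda g: g.tss)
    basal = [basal_domain(g) for g in ordered]
    domains = {}
    for i, gene in enumerate(ordered):
        start, end = basal[i]
        # The extension may not pass the basal domain of the adjacent gene.
        left_limit = basal[i - 1][1] if i > 0 else 0
        right_limit = basal[i + 1][0] if i + 1 < len(ordered) else chrom_length
        start = min(start, max(gene.tss - max_extension, left_limit))
        end = max(end, min(gene.tss + max_extension, right_limit))
        domains[gene.name] = (max(start, 0), min(end, chrom_length))
    return domains


def genes_for_region(position: int, domains: dict[str, tuple[int, int]]) -> list[str]:
    """All genes whose regulatory domain contains the point-binding event."""
    return sorted(name for name, (start, end) in domains.items() if start <= position <= end)


genes = [Gene("A", 100_000, "+"), Gene("B", 200_000, "-"), Gene("C", 3_000_000, "+")]
wide = regulatory_domains(genes, chrom_length=5_000_000)                          # up to 1 Mb
narrow = regulatory_domains(genes, chrom_length=5_000_000, max_extension=50_000)  # up to 50 kb
print(wide)
print(genes_for_region(150_000, wide), genes_for_region(2_500_000, narrow))
```

Under this reading a region lying between two genes is associated with both, and a region further than the maximum extension from every gene is associated with none, which is why shortening the extension from 1 Mb to 50 kb discards binding events.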
Although including distal associations is crucial, the exact distal association rule is not: the default rule, the nearest-gene rule, and the two-nearest-genes rule (tests ii, v and vi, respectively) behaved very similarly. Additionally, inclusion of the small set of experimentally determined gene regulatory domains we curated from the literature made very little difference in the rankings of any of the sets (data not shown). We present the analysis of four ChIP-seq data sets below and discuss the remainder in the Supplementary Note.

Figure 2 Binding profiles and their effects on statistical tests. (a) ChIP-seq data sets of several regulatory proteins show that the majority of binding events lie well outside the proximal promoter, both for sequence-specific transcription factors (SRF and NRSF, ref. 8; Stat3, ref. 43) and a general enhancer-associated protein (p300, refs. 33,43). Cell type is given in parentheses: H, human; M, mouse. (b) When not restricted to proximal promoters, the gene-based hypergeometric test (red) generates false positive enriched terms, especially at the size range of 1,000–50,000 input regions typical of a ChIP-seq set. Negligible false positive enrichment was observed for the region-based binomial test (blue). For each set size, we generated 1,000 random input sets in which each base pair in the human genome was equally likely to be included in each set, avoiding assembly gaps. We calculated all GO term enrichments for both hypergeometric and binomial tests using GREAT’s 5+1 kb basal promoter and up to 1 Mb extension association rule (see Results). Plotted is the average number of terms artificially significant at a threshold of 0.05 after application of the conservative Bonferroni correction. (c) GO enrichment P values using the genomic region-based binomial (x axis) and gene-based hypergeometric (y axis) tests on the SRF data (ref. 8) with GREAT’s 5+1 kb basal promoter and up to 1 Mb extension association rule (see Results). b1 through b10 denote the top ten most enriched terms when we used the binomial test. h1 through h10 denote the top ten most enriched terms when we used the hypergeometric test. Terms significant by both tests (B ∩ H) provide specific and accurate annotations supported by multiple genes and binding events (Table 3). Terms significant by only the hypergeometric test (H\B) are general and often associated with genes of large regulatory domains, whereas terms significant by only the binomial test (B\H) cluster four to six genomic regions near only one or two genes annotated with the term (Supplementary Table 46).
Panel a plots the fraction of all elements against distance to nearest transcription start site (kb; bins 0–2, 2–5, 5–50, 50–500 and >500) for SRF (H: Jurkat), NRSF (H: Jurkat), GABP (H: Jurkat), Stat3 (M: ESC), p300 (M: ESC), p300 (M: limb), p300 (M: forebrain) and p300 (M: midbrain). Panel b plots false positive enriched terms against input set size (thousands) for the binomial test over genomic regions and the hypergeometric test over genes. Panel c plots –log(hypergeometric P value) against –log(binomial P value).

Serum response factor binding in human Jurkat cells

First, we analyzed a set of genomic regions bound by the serum response factor (SRF) in the human Jurkat cell line, identified via ChIP-seq and mapped to the genome using the quantitative enrichment of sequence tags (QuEST) ChIP-seq peak-calling tool (ref. 8). This data set’s authors applied existing gene-based enrichment tools, which did not discern specific functions of SRF from the set of regions it binds (ref. 8), and concluded that SRF is a regulator of basic cellular processes with no specific physiological roles (results reproduced in Table 2). Although SRF is indeed a regulator of basic cellular functions, numerous studies have implicated SRF in more specific biological contexts.
SRF is a key regulator of the Fos oncogene (ref. 22) and has also been described as a “master regulator of actin cytoskeleton” (ref. 23). Neither FOS nor actin appeared in the top ten hypotheses generated by the previous study (Table 2). The same was true when we used GREAT with only proximal (2 kb) associations (Supplementary Table 6).

However, GREAT analysis of the most significant SRF ChIP-seq peaks (ref. 8; QuEST score > 1; n = 556) using the default settings (5+1 kb basal, up to 1 Mb extension) prominently highlights the key observation that gene-based analyses were unable to reveal: SRF regulates genes associated with the actin cytoskeleton (ref. 23) (Table 3). As postulated above, using both binomial and hypergeometric enrichment tests does highlight informative GO terms more effectively than using either test alone (Fig. 2c and Supplementary Table 46). Moreover, when extension of regulatory domains is limited to 50 kb, one-third of the supporting regions and associated genes are lost, and actin-related terms drop in rank (Supplementary Table 7).

Coupling distal (up to 1 Mb) associations with the many additional ontologies available within GREAT provides a wealth of enrichments for specific known functions of SRF. An enrichment analysis of TreeFam gene families (ref. 24) shows that SRF binds in proximity to five of six members of the FOS family. Two genes within the Fos family, Fos and Fosb, are previously known targets of SRF (ref. 22). The Transcription Factor Targets ontology (ref. 25) has compiled data from ChIP experiments that link transcription factor regulators to downstream target genes.
GREAT shows that many genes proximal to SRF binding events (in Jurkat cells) are also proximal to YY1 binding events (in HeLa cells), consistent with experiments showing that SRF acts in conjunction with YY1 to regulate Fos (ref. 26). The top six hits in the Predicted Promoter Motifs ontology (ref. 27) are all variants of the SRF motif generated from different experiments and thus serve as strong positive controls of our method. Using the Pathway Commons ontology (ref. 28), GREAT predicts that SRF regulates components of the TRAIL signaling pathway and the class I PI3K signaling pathway. Previous experimental work has demonstrated that there is an association between SRF and TRAIL signaling (ref. 29) and that SRF is needed for PI3K-dependent cell proliferation (ref. 30).

In addition to rediscovering and expanding specific known functions of SRF, GREAT produces testable hypotheses even for this well-studied transcription factor. The Transcription Factor Targets ontology indicates that SRF binds near genes regulated by E2F4 (in T98G, U2OS and WI-38 cells; Table 3). SRF and E2F4 have not been shown to co-regulate target genes; however, both SRF and E2F4 are known to interact with Smad3 (refs. 31,32), and they may thus be co-regulators of a common set of genes. The Predicted Promoter Motifs ontology reveals additional potential cofactors and co-regulators. It is particularly useful given that many more genes have characterized binding motifs than have genome-wide ChIP data available.
In this case, it shows enrichment for SRF binding near genes containing GABP motifs in their promoters. Notably, an independent experiment measuring GABP-bound regions of the genome in Jurkat cells has found that 29% of SRF peaks occur within 100 bp of a GABP peak, suggesting that SRF and GABP may indeed work together (ref. 8). We were able to generate this same hypothesis using GREAT, without observing the GABP ChIP-seq data.

Table 1 GREAT parameters, filters and options, and their effects
Parameter | Effect
Region-gene association rule | Determines how gene regulatory domains are calculated. When we allowed for distal associations, the sets we examined remained robust regardless of the exact choice of association rule. Our default rule (basal and extension; see Results) models a current hypothesis of gene regulatory domains.
Region-gene association rule parameters | Determine the length of each inferred gene regulatory domain. As we show, when the right statistical model is used, including distal associations of up to 1 Mb can strongly increase biological signals.
Statistical significance visual filter | Highlights statistically significant results in bold font. Multiple test correction options and thresholds for significance can be modified.
Binomial fold enrichment filter | Complements P value by requiring that statistically significant terms have strong biological effects. Often filters general ontology terms that apply to thousands of genes.
Observed gene hits filter | Shows only enriched terms for which input regions select at least this many genes. Helps avoid enrichments owing to numerous regions selecting a small number of genes.
Minimum annotation count threshold | Increases statistical power by reducing the number of tests performed, by testing only ontology terms associated a priori with at least this many genes.
Display type | Summary display shows only terms statistically significant by both binomial and hypergeometric tests. Full display ignores the statistical significance filter and shows terms that meet all other criteria.
Export | Export tables individually or in batches into a file of tab-separated values or publication-ready HTML.
UCSC custom tracks | Clicking a specific region from within a term details page opens the University of California Santa Cruz Genome Browser (ref. 44) focused on that region, with two custom tracks automatically loaded: one for the total set of input regions and another for the subset of regions associated with the chosen term.

P300 binding in the developing mouse limbs

Second, we analyzed a recent ChIP-seq data set comprising 2,105 regions of the mouse genome bound by the general enhancer-associated protein p300 in embryonic limb tissue (ref. 33). Of 25 such regions tested in transgenic mouse assays, 20 showed reproducible enhancer activity in the developing limbs (ref. 33). Our analysis shows that GREAT identifies functions of enhancers active during embryonic development that gene-based tools do not detect. DAVID analysis of the genes with proximal p300 limb binding events produces only enrichments associated with transcription and involvement in organ morphogenesis, with the closest enrichments being the much broader terms ‘organ development’ and ‘anatomical structure morphogenesis’ (Supplementary Table 10a). In contrast, GREAT analysis of the 2,105 p300 limb peaks using the default settings (5+1 kb basal, up to 1 Mb extension) produces overwhelming support for their putative functional role in limb development (Supplementary Table 10b).

GO enrichments highlight the regulation of transcription factors involved specifically in embryonic limb morphogenesis.
The Mouse Phenotype ontology (ref. 34) points to the developing limbs and skull, hinting at the remarkable overlap of signaling processes involved in head and limb development (ref. 35). The p300 limb peaks are enriched near genes in the TGF-β signaling pathway, which is known to be involved in limb development (ref. 36), and the InterPro ontology highlights genes in the Smad family containing the Dwarfin-type MAD homology-1 protein domain (Supplementary Table 10b), which is known to mediate and regulate TGF-β signaling (ref. 37).

Table 2 Gene-based ontology enrichments for regions bound by SRF in human Jurkat cells
Term | P value
Nucleus | 5.18 × 10^−70
Protein binding | 2.16 × 10^−50
Cytoplasm | 6.67 × 10^−27
Transcription | 4.13 × 10^−26
Nucleotide binding | 1.04 × 10^−23
Metal ion binding | 1.92 × 10^−22
Zinc ion binding | 5.76 × 10^−20
RNA binding | 3.38 × 10^−18
Regulation of transcription, DNA-dependent | 1.15 × 10^−15
ATP binding | 4.84 × 10^−15
Listed are the top ten enriched GO terms found using a gene-based enrichment analysis of the 1,936 genes that possess an SRF binding peak within 2 kb (adapted from ref. 8). Though the large number of selected genes produces strong P values, the most significant terms are general and yield only a very broad view of SRF functions. The first actin-related term, ‘actin binding’, is ranked 28th (data not shown).

Perhaps the strongest validation for the GREAT methodology comes from the MGI Expression: Detected ontology (ref. 38). Notably, the enrichments highlighted most prominently by GREAT pinpoint the exact tissue and time point at which the experiment in ref. 33 was performed, providing unique large-scale evidence for the relevance of p300-bound regions to limb gene regulation. The top two ontology terms suggest limb-specific expression during Theiler stage 19 (TS19), which corresponds precisely with embryonic day 11.5, the time point at which the p300 limb peaks were assayed in ref. 33 (Supplementary Table 10b). In contrast, GREAT run with proximal (2 kb) associations retrieves only weak enrichments for limb-associated genes and limb TS19, implicating 7-fold fewer genes and 16-fold fewer p300 limb peaks as being involved in TS19 limb expression than GREAT run with the default association rule (Supplementary Table 11). Moreover, GREAT run with proximal associations completely misses genes with crucial roles in limb development such as Gli3, Grem1 and Wnt7a (ref. 39).

When GREAT’s regulatory domains are extended up to 50 kb, it correctly recovers limb terms, but still implicates only half the genes found with the default association rule and yields P values many orders of magnitude weaker (Fig. 3 and Supplementary Table 12). By extending regulatory domains, we increase both the number of limb-related genes containing one or more p300 limb peaks within their regulatory domains and the number of p300 limb peaks associated with limb-related genes (Fig. 3). When regulatory domains are further extended from 50 kb to 1 Mb, they include even more p300 limb peaks than expected by chance (Fig. 3c), providing strong evidence that many of these distal associations are biologically meaningful.

P300 binding in the developing mouse forebrain and midbrain

Finally, we analyzed two ChIP-seq data sets comprising regions bound by p300 in mouse embryonic forebrain and midbrain tissue (ref. 33). Using the 2,453 forebrain peaks, DAVID correctly highlights forebrain and general brain development (0.004 < P < 0.05), but with terms implicating fewer than ten genes (Supplementary Table 15a). GREAT run with proximal regulatory regions (2 kb) ranks forebrain development higher and is able to implicate additional genes and regions using its unique phenotype and expression ontologies (Supplementary Table 16). Using up to 50 kb extension adds additional related terms and raises the number of genes associated with each term (Supplementary Table 17). This trend continues when the extension is increased to up to 1 Mb, and only this inclusion of distal binding allows detection of significant associations (P = 0.001) with Wnt signaling genes that have known roles in forebrain development (ref. 40) (Supplementary Table 15b).

Table 3 GREAT ontology enrichments for regions bound by SRF in human Jurkat cells
Ontology | Term | Binomial P value | Binomial fold enrichment | Hypergeometric P value | Distal binding (a) | Experimental support
GO: cellular component | Actin cytoskeleton | 6.91 × 10^−9 | 3.05 | 2.22 × 10^−7 | 38.9% | Ref. 23
GO: cellular component | Cortical cytoskeleton | 4.03 × 10^−6 | 5.90 | 5.41 × 10^−4 | 54.5% | Ref. 23
GO: molecular function | Actin binding | 5.21 × 10^−5 | 2.03 | 2.74 × 10^−5 | 51.4% | Ref. 23
Transcription factor targets | SRF targets (Jurkat, T/G HA-VSMC, Be(2)-C) | 4.97 × 10^−76 | 13.22 | 9.79 × 10^−68 | 14.3% | Positive control
Transcription factor targets | YY1 targets (HeLa) | 1.45 × 10^−6 | 2.09 | 0.0084 | 20.4% | Ref. 26 (b)
Transcription factor targets | E2F4 and p130 (T98G, U2OS) | 0.0047 | 2.01 | 0.0027 | 44.4% | Novel (c)
Transcription factor targets | E2F4 (WI-38) | 0.0194 | 2.08 | 0.0031 | 36.4% | Novel (c)
Predicted promoter motifs | SRF variants | 4.54 × 10^−28 to 4.19 × 10^−12 | 3.69 to 15.46 | 1.71 × 10^−25 to 2.04 × 10^−9 | 17.4% to 28.6% | Positive controls
Predicted promoter motifs | GABPA or GABPB | 4.20 × 10^−9 | 3.67 | 6.68 × 10^−6 | 27.6% | Novel (c)
Predicted promoter motifs | Motif NGGGACTTTCCA | 1.02 × 10^−4 | 2.12 | 8.30 × 10^−5 | 20.0% | Novel (c)
Predicted promoter motifs | EGR1 | 1.71 × 10^−4 | 2.03 | 0.0013 | 46.9% | Novel (c)
Pathway commons | TRAIL signaling pathway | 2.37 × 10^−7 | 2.45 | 1.71 × 10^−5 | 46.3% | Ref. 29
Pathway commons | Class I PI3K signaling events | 9.92 × 10^−7 | 2.56 | 4.45 × 10^−5 | 44.1% | Ref. 30
TreeFam | FOS family | 9.66 × 10^−9 | 27.89 | 1.21 × 10^−6 | 28.6% | Ref. 22 (d)
Enriched terms for a variety of ontologies obtained using GREAT analysis (5+1 kb basal, up to 1 Mb extension) of proximal and distal binding events. The enriched terms highlight experimentally validated functions and cofactors of SRF that lend immediate insight into its biological roles as well as propose testable hypotheses of SRF functions that are, to our knowledge, novel (see Results). Shown are all binomial enriched terms at a false discovery rate of 0.05 with a fold enrichment of at least two that are also significant at a false discovery rate of 0.05 by the hypergeometric test, using the highest-scoring SRF peaks anywhere in the genome (QuEST score > 1; n = 556).
(a) The fraction of binding peaks contributing to the enrichment located >10 kb from the transcription start site of the nearest gene. (b) Known interactions often also give rise to novel hypotheses; for example, SRF is known to co-regulate some genes with YY1, and GREAT identifies many additional genes potentially bound by both SRF and YY1. (c) Hypothesis: SRF acts with E2F4, GABP, EGR1 and a previously uncharacterized binding motif to co-regulate target genes (see Results for supporting evidence). (d) SRF is known to regulate Fos and Fosb (ref. 22); GREAT highlights three other members of the FOS family that may also be regulated by SRF.

When run on the 561 midbrain p300 peaks, DAVID does not yield significant results (P > 0.05; Supplementary Table 20a) and proximal (2 kb) GREAT performs only slightly better, offering three relevant terms associated with very few genes from our unique ontologies (Supplementary Table 21). In contrast, GREAT with up to 1 Mb extension highlights twelve brain-specific enriched terms (Supplementary Table 20b). Many GREAT enriched terms are shared between the forebrain and midbrain peaks (as discussed in Supplementary Note), but GREAT correctly identifies midbrain-specific enrichments such as the GO term ‘compartment specification’. Compartment specification is of interest, as within this tissue at this developmental age, Fgf8 induces Wnt (also enriched within this set) to set up a gene network that establishes the boundary between the midbrain and hindbrain compartments (ref. 41). GREAT with up to 50 kb extension is able to highlight many of the same terms, but loses roughly half the associated genes and regions and the Wnt enrichment (Supplementary Table 22).

Figure 3 Distal binding events contribute substantially to accurate functional enrichments of p300 limb peaks. We examined properties of the 2,105 p300 mouse embryonic limb peaks (ref. 33) in the context of three known limb-related terms and a negative control term (GO cortical cytoskeleton). Three different association rules were used (see Results): a gene-based GREAT analysis using only peaks within 2 kb of the nearest transcription start site (labeled 2 kb), an analysis with 5+1 kb basal and up to 50 kb extension (50 kb), and an analysis with 5+1 kb basal and up to 1 Mb extension (1 Mb). For each term, we examined the relevance of distal binding peaks by comparing the experimental results (black bars) to the average values of 1,000 simulated data sets (gray bars) in which the 192 proximal ChIP-seq peaks within 2 kb of the nearest transcription start site were fixed and the 1,913 distal peaks were shuffled uniformly within the mouse genome, avoiding assembly gaps and proximal promoters. By design, simulation results for proximal, 2-kb GREAT are identical to the actual data and are thus omitted. (a) Lengthening a 2-kb proximal promoter to a 50-kb extension, expected to increase genome coverage per term ($p_\pi$ in Fig. 1b) by 25-fold, causes an actual increase of 19- to 24-fold; in contrast, lengthening a 50-kb extension rule to a 1-Mb extension rule, expected to raise genome coverage 20-fold, leads to an actual increase of only 2.5- to 6-fold because regulatory domains are not extended through neighboring genes. (b) As regulatory domains increase in length from only the proximal 2 kb up to 50 kb and 1 Mb, the number of relevant genes with a p300 limb peak in their regulatory domain increases. The added genes selected only by distal associations are typically enriched for limb functionality compared to simulated data. (c) As regulatory domains increase in length, the number of p300 limb peaks associated with a relevant gene in excess of the number expected by chance increases for all limb-related terms. (d) As in c, the inclusion of distal peaks markedly increases the statistical significance of the correct terms alone. *Statistical significance is measured using the hypergeometric test over genes for 2 kb to mimic current gene-based approaches, and using the binomial test over genomic regions for 50 kb and 1 Mb. Error bars indicate s.d.; NS, not significant at a threshold of 0.05 after false discovery rate multiple test correction; obs, observed; exp, expected. Note scale changes on x axes.
Rows (ontology: term) are Gene Ontology: embryonic limb morphogenesis; PANTHER: TGF-β signaling pathway; MGI Expression: Theiler stage 19 limb expression; and Gene Ontology: cortical cytoskeleton (negative control), each shown for regulatory domain extents of 2 kb, 50 kb and 1 Mb. Columns are (a) genome fraction, the fraction of the genome overlapped by the regulatory domain of a gene annotated with the term; (b) genes, the number of genes annotated with the term containing a genomic region in their regulatory domain; (c) regions (obs – exp), the number of genomic regions in the regulatory domain of a gene annotated with the term in excess of the number expected by chance; and (d) statistical significance*, –log10(FDR q). Bars distinguish the p300 limb set from simulations.

DISCUSSION

GREAT is a new-generation tool aimed at the interpretation of genome-wide cis-regulatory data sets. It explicitly models the vertebrate cis-regulatory landscape through the use of long-range regulatory domains and a genomic region–based enrichment test, allowing analyses that take into consideration the large number of binding events that occur far beyond proximal promoters. By accounting for the length of gene regulatory domains, GREAT is able to highlight biologically meaningful terms and their associated cis-regulatory regions and genes, in a manner that remains robust if there are false associations between input regions and genes. Moreover, these regulatory-domain definitions can naturally incorporate future results from three-dimensional conformation capture studies (refs. 17–19), radiation hybrid maps (ref. 42) and other emerging approaches for measuring the regulatory genome in action.
By coupling this methodology with many ontologies that span a wealth of biological information types, GREAT produces specific, accurate enrichments that provide insight into the biological roles of cis-regulatory data sets of interest.
We comprehensively tested GREAT on multiple ChIP-seq data sets and found that it is able to reproduce many known biological facts that existing methods do not detect, as well as suggest novel hypotheses for further experimental characterization. In particular, our analysis shows that ignoring distal binding events often leads to missing target gene associations, to obtaining weaker P values or even to completely omitting relevant enrichment terms. Besides ChIP-seq data, GREAT can also be applied to the analysis of any data set thought to be enriched for localized cis-regulatory regions. This includes functional genomic data sets of open chromatin, localized epigenomic markers, and comparative genomic sets. GREAT may thus prove invaluable in elucidating the cis-regulatory functions encoded in genomes.
GREAT is available online (http://great.stanford.edu/); also provided is a means for direct submission from other applications such as genome portals and peak calling tools.

Methods
Methods and any associated references are available in the online version of the paper at http://www.nature.com/naturebiotechnology/.
Note: Supplementary information is available on the Nature Biotechnology website.

Acknowledgments
We thank M. Sirota for an early survey of ontologies, F. Sathira for developing an intermediary core calculation engine, T. Capellini for critical reading of the manuscript, M. Davis and S. Gutierrez for system administration and the communities of ontology developers and curators for providing invaluable data sources. C.Y.M. is supported by a Bio-X graduate fellowship. M.H. is supported by a German Research Foundation Fellowship (Hi 1423/2-1) and the Human Frontier Science Program (fellowship LT000896/2009-l). S.L.C. is a Howard Hughes Medical Institute Gilliam Fellow. A.M.W. is supported by a Stanford Graduate Fellowship. G.B. is a Packard Fellow, Searle Scholar, Microsoft Research Faculty Fellow and an Alfred P. Sloan Fellow. Research was also supported by an Edward Mallinckrodt, Jr. Foundation junior faculty grant and US National Institutes of Health grant 1R01HD059862 to G.B.

AUTHOR CONTRIBUTIONS
C.Y.M. developed the core calculation engine, processed ontologies, analyzed data sets and co-wrote the manuscript. D.B. designed and developed the web application. M.H. added key ontologies and calculated ontology statistics. S.L.C. performed and wrote the SRF analysis. B.T.S. contributed to data set analysis and manuscript writing. A.M.W. guided website design and wrote user documentation. G.B. and C.B.L. devised the different enrichment tests and developed early core calculation engines. G.B. supervised the project and co-wrote the manuscript. All authors edited the manuscript.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

Published online at http://www.nature.com/naturebiotechnology/.
Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/.

1. Johnson, D.S., Mortazavi, A., Myers, R.M. & Wold, B. Genome-wide mapping of in vivo protein-DNA interactions. Science 316, 1497–1502 (2007).
2. Mardis, E.R. ChIP-seq: welcome to the new frontier. Nat. Methods 4, 613–614 (2007).
3. Park, P.J. ChIP-seq: advantages and challenges of a maturing technology. Nat. Rev. Genet. 10, 669–680 (2009).
4. Ji, H. et al. An integrated software system for analyzing ChIP-chip and ChIP-seq data. Nat. Biotechnol. 26, 1293–1300 (2008).
5. Kharchenko, P.V., Tolstorukov, M.Y. & Park, P.J. Design and analysis of ChIP-seq experiments for DNA-binding proteins. Nat. Biotechnol. 26, 1351–1359 (2008).
6. Rozowsky, J. et al. PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls. Nat. Biotechnol. 27, 66–75 (2009).
7. Tuteja, G., White, P., Schug, J. & Kaestner, K.H. Extracting transcription factor targets from ChIP-Seq data. Nucleic Acids Res. 37, e113 (2009).
8. Valouev, A. et al. Genome-wide analysis of transcription factor binding sites based on ChIP-Seq data. Nat. Methods 5, 829–834 (2008).
9. Khatri, P. & Draghici, S. Ontological analysis of gene expression data: current tools, limitations, and open problems. Bioinformatics 21, 3587–3595 (2005).
10. Allison, D.B., Cui, X., Page, G.P. & Sabripour, M. Microarray data analysis: from disarray to consolidation and consensus. Nat. Rev. Genet. 7, 55–65 (2006).
11. Dopazo, J. Functional interpretation of microarray experiments. OMICS 10, 398–410 (2006).
12. Lowe, C.B., Bejerano, G. & Haussler, D. Thousands of human mobile element fragments undergo strong purifying selection near developmental genes. Proc. Natl. Acad. Sci. USA 104, 8005–8010 (2007).
13. Taher, L. & Ovcharenko, I. Variable locus length in the human genome leads to ascertainment bias in functional inference for non-coding elements. Bioinformatics 25, 578–584 (2009).
14. Ashburner, M. et al. Gene Ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25, 25–29 (2000).
15. Bejerano, G. et al. Ultraconserved elements in the human genome. Science 304, 1321–1325 (2004).
16. Bejerano, G. et al. A distal enhancer and an ultraconserved exon are derived from a novel retroposon. Nature 441, 87–90 (2006).
17. Dostie, J. et al. Chromosome Conformation Capture Carbon Copy (5C): a massively parallel solution for mapping interactions between genomic elements. Genome Res. 16, 1299–1309 (2006).
18. Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293 (2009).
19. Schoenfelder, S. et al. Preferential associations between co-regulated genes reveal a transcriptional interactome in erythroid cells. Nat. Genet. 42, 53–61 (2010).
20. Spitz, F. & Duboule, D. Global control regions and regulatory landscapes in vertebrate development and evolution. Adv. Genet. 61, 175–205 (2008).
21. Huang, da W. et al. DAVID Bioinformatics Resources: expanded annotation database and novel algorithms to better extract biology from large gene lists. Nucleic Acids Res. 35, W169–W175 (2007).
22. Chai, J. & Tarnawski, A.S. Serum response factor: discovery, biochemistry, biological roles and implications for tissue injury healing. J. Physiol. Pharmacol. 53, 147–157 (2002).
23. Miano, J.M., Long, X. & Fujiwara, K. Serum response factor: master regulator of the actin cytoskeleton and contractile apparatus. Am. J. Physiol. Cell Physiol. 292, 70–81 (2007).
24. Ruan, J. et al. TreeFam: 2008 update. Nucleic Acids Res. 36, D735–D740 (2008).
25. Linhart, C., Halperin, Y. & Shamir, R. Transcription factor and microRNA motif discovery: the Amadeus platform and a compendium of metazoan target sets. Genome Res. 18, 1180–1189 (2008).
26. Natesan, S. & Gilman, M. YY1 facilitates the association of serum response factor with the c-fos serum response element. Mol. Cell. Biol. 15, 5975–5982 (1995).
27. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA 102, 15545–15550 (2005).
28. Cerami, E.G., Bader, G.D., Gross, B.E. & Sander, C. cPath: open source software for collecting, storing, and querying biological pathways. BMC Bioinformatics 7, 497 (2006).


29. Bertolotto, C. et al. Cleavage of the serum response factor during death receptor-induced apoptosis results in an inhibition of the c-FOS promoter transcriptional activity. J. Biol. Chem. 275, 12941–12947 (2000).
30. Poser, S., Impey, S., Trinh, K., Xia, Z. & Storm, D.R. SRF-dependent gene expression is required for PI3-kinase-regulated cell proliferation. EMBO J. 19, 4955–4966 (2000).
31. Lee, H.J. et al. SRF is a nuclear repressor of Smad3-mediated TGF-beta signaling. Oncogene 26, 173–185 (2007).
32. Chen, C.R., Kang, Y., Siegel, P.M. & Massagué, J. E2F4/5 and p107 as Smad cofactors linking the TGFbeta receptor to c-myc repression. Cell 110, 19–32 (2002).
33. Visel, A. et al. ChIP-seq accurately predicts tissue-specific activity of enhancers. Nature 457, 854–858 (2009).
34. Blake, J.A. et al. The Mouse Genome Database genotypes:phenotypes. Nucleic Acids Res. 37, D712–D719 (2009).
35. Wilkie, A.O. & Morriss-Kay, G.M. Genetics of craniofacial development and malformation. Nat. Rev. Genet. 2, 458–468 (2001).
36. Capdevila, J. & Izpisúa Belmonte, J.C. Patterning mechanisms controlling vertebrate limb development. Annu. Rev. Cell Dev. Biol. 17, 87–132 (2001).
37. Kretzschmar, M. & Massagué, J. SMADs: mediators and regulators of TGF-beta signaling. Curr. Opin. Genet. Dev. 8, 103–111 (1998).
38. Bult, C.J., Eppig, J.T., Kadin, J.A., Richardson, J.E. & Blake, J.A. The Mouse Genome Database (MGD): mouse biology and model systems. Nucleic Acids Res. 36, D724–D728 (2008).
39. Niswander, L. Pattern formation: old models out on a limb. Nat. Rev. Genet. 4, 133–143 (2003).
40. Zhou, C.J., Borello, U., Rubenstein, J.L. & Pleasure, S.J. Neuronal production and precursor proliferation defects in the neocortex of mice with loss of function in the canonical Wnt signaling pathway. Neuroscience 142, 1119–1131 (2006).
41. Wurst, W. & Bally-Cuif, L. Neural plate patterning: upstream and downstream of the isthmic organizer. Nat. Rev. Neurosci. 2, 99–108 (2001).
42. Park, C.C. et al. Fine mapping of regulatory loci for mammalian gene expression using radiation hybrids. Nat. Genet. 40, 421–429 (2008).
43. Chen, X. et al. Integration of external signaling pathways with the core transcriptional network in embryonic stem cells. Cell 133, 1106–1117 (2008).
44. Kent, W.J. et al. The human genome browser at UCSC. Genome Res. 12, 996–1006 (2002).


ONLINE METHODS
Gene set definition. Statistical enrichment of ontology terms is dependent upon the genome-wide gene set used in the analysis. GREAT currently supports testing of human (Homo sapiens NCBI Build 36.1, or UCSC hg18) and mouse (Mus musculus NCBI Build 37, or UCSC mm9). To limit the gene sets to only high-confidence genes and gene predictions, we use only the subset of the UCSC Known Genes 45 that are protein coding, are on assembled chromosomes and possess at least one meaningful GO annotation 14. GO is an ontological representation of information related to the biological processes, cellular components and molecular functions of genes. We rely on the idea that if a gene has been annotated for function it should be included in the gene set, and if no function has been ascribed to a gene its status may be unclear and thus it is best omitted from the gene set. In GREAT version 1.1.3, we use GO data downloaded on 5 March 2009 for human and 23 March 2009 for mouse, leading to gene sets of 17,217 and 17,506 genes for human and mouse, respectively.
A single gene may have multiple splice variants. As annotations are generally given at the gene level, GREAT uses a single transcription start site (TSS) to specify the location of each gene. The TSS used is that of the ‘canonical isoform’ of the gene as defined by the UCSC Known Genes track 45.

Association rules from genomic regions to genes.
For each gene, we define a ‘regulatory domain’ such that all noncoding sequences that lie within the regulatory domain are assumed to regulate that gene. GREAT currently supports three different parametrized association rules to define gene regulatory domains (Supplementary Fig. 2). The default ‘basal plus extension’ association rule assigns a ‘basal regulatory region’ irrespective of the presence of neighboring genes that extends (using default parameters) 5 kb upstream and 1 kb downstream of the TSS (Supplementary Fig. 2a). Each gene’s regulatory domain is then extended up to the basal regulatory region of the nearest upstream and downstream genes, but no longer than 1 Mb in each direction. The choice of basal regulatory region size and placement was motivated by the location of histone modifications and measures of chromatin accessibility near the TSS of genes 46, and the maximum extension distance is based upon work showing that long-range distal enhancers can regulate expression of target genes up to 1 Mb away 47,48. All three parameters (basal upstream, basal downstream and maximum extension distance) can be set by the user.
The ‘two nearest genes’ association rule extends each gene’s regulatory domain from the TSS of the canonical isoform to the nearest upstream and downstream TSS (Supplementary Fig. 2b), up to 1 Mb in each direction.
This association rule stipulates that each base pair cannot be assigned to more than two genes.
The ‘single nearest gene’ association rule extends each gene’s regulatory domain from the TSS of the canonical isoform in each direction to the midpoint between the TSS and the nearest adjacent TSS (Supplementary Fig. 2c), up to 1 Mb in each direction. This association rule stipulates that each base pair cannot be assigned to more than one gene.
For well-studied genes with experimentally detected distal regulatory elements (reviewed in ref. 20), we manually override the computationally defined regulatory domains. GREAT version 1.1.3 uses experimentally validated regulatory domains for SHH 47, genes in the β-globin locus 49, and KIAA1715, EVX2, HOXD10, HOXD11, HOXD12 and HOXD13 (ref. 50). Future releases of the tool will continue to refine regulatory domains as technological advances, including three-dimensional conformation capture studies 17–19 and radiation hybrid maps 42, further elucidate interactions between regulatory DNA and its target genes.

Hypergeometric test over genes. The hypergeometric test over genes identifies all genes whose regulatory domains possess one or more genomic regions from the input set and calculates enrichments over the genes with respect to the defined gene set using a hypergeometric distribution. More formally, the hypergeometric test is executed separately for each ontology term π and is defined by four parameters:
1. N is the total number of genes in the genome.
2. K_π is the number of genes in the genome that possess ontology annotation π.
3. n is the number of genes selected because one or more input genomic regions resides in their regulatory domains.
4. k_π is the number of selected genes that possess ontology annotation π.
The test calculates the P value of the observed enrichment for term π as the fraction of ways to choose n genes without replacement from the entire group of N genes such that at least k_π of the n possess ontology annotation π, using the formula below.

\[
\sum_{i = k_\pi}^{\min(n,\, K_\pi)} \frac{\binom{K_\pi}{i} \binom{N - K_\pi}{n - i}}{\binom{N}{n}} \tag{1}
\]

In particular, the hypergeometric test counts every gene only once even if it was picked by multiple genomic regions. Terms enriched by the hypergeometric test thus indicate a high ‘term coverage’, where a larger fraction of all genes annotated with the term are selected by the input genomic regions than expected by chance.

Binomial test over genomic regions. To account for the length variability within gene regulatory domains, we implemented a binomial test over genomic regions that uses the fraction of the genome associated with each ontology term as the probability of selecting the term. The binomial test is executed separately for each ontology term π and is defined by three parameters:
1. n is the total number of genomic regions in the input set.
2. p_π is the a priori probability of selecting a base pair annotated with π when selecting a single base pair uniformly from all non–assembly gap base pairs in the genome.
3. k_π is the number of genomic regions in the input set that cause annotation π to be selected.
The test calculates the P value of the observed enrichment for term π as the probability of selecting annotation π at least k_π times in n attempts using the formula below.

\[
\sum_{i = k_\pi}^{n} \binom{n}{i}\, p_\pi^{\,i}\, (1 - p_\pi)^{\,n - i} \tag{2}
\]

The binomial test first maps each input genomic region to the left median base pair in its span, making it most appropriate for assessing enrichment of factors with narrow, precise peaks. The value of p_π is calculated for each ontology annotation π as the fraction of non–assembly gap base pairs in the genome associated with annotation π. Each input genomic region can then be thought of as a ‘dart’ thrown at the genome, counting as a hit if the left median base pair is annotated with ontology term π. In this test, the length of each gene’s regulatory domain is explicitly accounted for in the calculation of p_π. This explicit use of regulatory domain size in the significance calculation provides a proper assessment of the enrichment for ontology terms by noncoding sequences.
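As an illustration of the binomial test over genomic regions, the following Python sketch computes p_π, k_π and the P value of equation (2) for a single ontology term. It is a minimal sketch under stated assumptions, not the GREAT implementation (whose core engine is written in C): the function names, the half-open (start, end) coordinates, and the toy single-chromosome genome without assembly gaps are all hypothetical choices made for the example.

```python
import math
from bisect import bisect_right


def merge_intervals(intervals):
    """Merge half-open (start, end) intervals into a sorted, disjoint list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def left_median(start, end):
    """Left median base of a half-open region [start, end)."""
    return start + (end - start - 1) // 2


def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p), with each term built in log space."""
    if k <= 0 or p >= 1.0:
        return 1.0
    if p <= 0.0:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_choose = math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        total += math.exp(log_choose + i * log_p + (n - i) * log_q)
    return min(total, 1.0)


def binomial_region_test(regions, term_domains, genome_size):
    """Return (p_term, k, P value) for one ontology term.

    regions      -- input genomic regions, half-open (start, end)
    term_domains -- regulatory domains of the genes annotated with the term
    genome_size  -- number of non-gap bases (assumption: one toy chromosome)
    """
    merged = merge_intervals(term_domains)
    p_term = sum(end - start for start, end in merged) / genome_size
    starts = [start for start, _ in merged]
    k = 0
    for start, end in regions:
        pos = left_median(start, end)
        idx = bisect_right(starts, pos) - 1
        if idx >= 0 and pos < merged[idx][1]:
            k += 1  # the 'dart' landed inside an annotated regulatory domain
    return p_term, k, binomial_tail(len(regions), k, p_term)


if __name__ == "__main__":
    p_term, k, p_value = binomial_region_test(
        regions=[(10, 20), (150, 160), (500, 510)],
        term_domains=[(0, 100), (50, 200)],
        genome_size=1000,
    )
    print(p_term, k, p_value)
```

Overlapping regulatory domains are merged before p_π is computed so that a base annotated through several genes is counted once, which matches the definition of p_π as a fraction of base pairs rather than a sum over genes.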
Notably, as the binomial test incorporates the fraction of the genome assigned to each gene in the calculation of statistical significance, it is robust regardless of variation in association rules and occasional incorrect assignments of genomic regions to distal target genes. Ontology terms assigned to genes that have large regulatory domains are inherently weighted such that each binding event associated with the term contributes less to the resulting enrichment than binding events associated with terms assigned to genes with small regulatory domains. However, enrichments under the binomial test may arise from clusters of noncoding regions all near one or a few genes with a particular ontology annotation, as well as from noncoding regions associating with many genes that possess a particular ontology annotation. The hypergeometric test over genes (described above) provides a measure of ‘term coverage’ that can be used to identify terms significant by the binomial test that have many annotated genes selected as well.

Foreground/background hypergeometric test over genomic regions. When a set of input genomic regions is selected from a superset of ‘background genomic regions’ (for example, the repetitive elements that have been exapted into functional roles selected from all repetitive elements in the genome 12), one should consider whether the input genomic regions differ in functional composition from the entire set of background genomic regions as a whole. The foreground/background hypergeometric test over genomic regions poses this statistical question by mapping all ontology annotations of each gene to all background genomic regions that lie within its regulatory domain; it then calculates enrichments over the input genomic regions with respect to the superset of background genomic regions using a hypergeometric distribution. Formally, the foreground/background hypergeometric test over genomic regions is executed separately for each ontology term π and is defined by four parameters:
1. N is the number of genomic regions in the background set.
2. K_π is the number of genomic regions in the background set that lie within the regulatory domain of some gene annotated with term π.
3. n is the number of genomic regions in the foreground set.
4. k_π is the number of genomic regions in the foreground set that lie within the regulatory domain of some gene annotated with term π.
The test calculates the P value of the observed enrichment for term π using the hypergeometric equation shown above, equation (1).

GREAT software. The GREAT core calculation engine is implemented in C and the source code is publicly available for download (http://great.stanford.edu/).

45. Hsu, F. et al. The UCSC Known Genes. Bioinformatics 22, 1036–1046 (2006).
46. The ENCODE Project Consortium. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447, 799–816 (2007).
47. Lettice, L.A. et al. A long-range Shh enhancer regulates expression in the developing limb and fin and is associated with preaxial polydactyly. Hum. Mol. Genet. 12, 1725–1735 (2003).
48. Maston, G.A., Evans, S.K. & Green, M.R. Transcriptional regulatory elements in the human genome. Annu. Rev. Genomics Hum. Genet. 7, 29–59 (2006).
49. Levings, P.P. & Bungert, J. The human beta-globin locus control region. Eur. J. Biochem. 269, 1589–1599 (2002).
50. Spitz, F., Gonzalez, F. & Duboule, D. A global control region defines a chromosomal regulatory landscape containing the HoxD cluster. Cell 113, 405–417 (2003).

doi:10.1038/nbt.1630
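The hypergeometric P value of equation (1), which the Online Methods use both for the test over genes and for the foreground/background test over genomic regions, can be sketched in Python as follows. This is an illustrative sketch only, not the GREAT source code: the function names and the set-based interface are assumptions introduced for the example, and exact integer arithmetic is used here simply because it is the most direct way to mirror the formula.

```python
from math import comb


def hypergeom_tail(N, K, n, k):
    """P(X >= k) when n items are drawn without replacement from N items,
    K of which carry the annotation (equation (1) of the Online Methods)."""
    upper = min(n, K)
    favourable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1)
    )
    return favourable / comb(N, n)


def gene_term_enrichment(all_genes, term_genes, selected_genes):
    """Hypergeometric test over genes for a single ontology term.

    all_genes      -- the genome-wide gene set (size N)
    term_genes     -- genes annotated with the term (size K after filtering)
    selected_genes -- genes whose regulatory domain holds >= 1 input region (size n)
    """
    universe = set(all_genes)
    term = set(term_genes) & universe
    selected = set(selected_genes) & universe  # each gene is counted only once
    return hypergeom_tail(
        len(universe), len(term), len(selected), len(term & selected)
    )


if __name__ == "__main__":
    genes = [f"g{i}" for i in range(10)]
    print(gene_term_enrichment(genes, genes[:4], ["g0", "g1", "g9"]))
```

For the foreground/background variant, the same `hypergeom_tail` call would be made with N and n counting background and foreground regions, and K and k counting the regions that fall inside the regulatory domain of some gene annotated with the term.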


Articles

Ab initio reconstruction of cell type–specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs

Mitchell Guttman 1,2,6, Manuel Garber 1,6, Joshua Z Levin 1, Julie Donaghey 1, James Robinson 1, Xian Adiconis 1, Lin Fan 1, Magdalena J Koziol 1,3, Andreas Gnirke 1, Chad Nusbaum 1, John L Rinn 1,3, Eric S Lander 1,2,4 & Aviv Regev 1,2,5

Massively parallel cDNA sequencing (RNA-Seq) provides an unbiased way to study a transcriptome, including both coding and noncoding genes. Until now, most RNA-Seq studies have depended crucially on existing annotations and thus focused on expression levels and variation in known transcripts. Here, we present Scripture, a method to reconstruct the transcriptome of a mammalian cell using only RNA-Seq reads and the genome sequence. We applied it to mouse embryonic stem cells, neuronal precursor cells and lung fibroblasts to accurately reconstruct the full-length gene structures for most known expressed genes. We identified substantial variation in protein coding genes, including thousands of novel 5′ start sites, 3′ ends and internal coding exons. We then determined the gene structures of more than a thousand large intergenic noncoding RNA (lincRNA) and antisense loci. Our results open the way to direct experimental manipulation of thousands of noncoding RNAs and demonstrate the power of ab initio reconstruction to render a comprehensive picture of mammalian transcriptomes.

A critical task in understanding mammalian biology is defining a precise map of all the transcripts encoded in a genome. Although much is known about protein coding genes in mammals, recent studies have suggested that the mammalian genome also encodes many thousands of large noncoding RNA (ncRNA) genes 1–4. Recently, we used a chromatin signature, combining histone-3 Lys4 trimethylation modifications (H3K4me3), known to mark promoter regions, and histone-3 Lys36 trimethylation modifications (H3K36me3), known to mark the entire transcribed regions (K4-K36 region; see Supplementary Fig. 1), to discover the genomic regions encoding ~1,600 lincRNAs in four mouse cell types 4 and ~3,300 lincRNAs across six human cell types 5.
Defining the complete gene structure of these lincRNAs is a prerequisite for experimental and computational studies of their function. We previously gained initial insights by hybridizing total RNA to tiling microarrays defined across the K4-K36 region 4. This provided a coarse list of putative exonic locations but could not define the precise gene structures and exon connectivity.
Advances in RNA-Seq have opened the way to unbiased and efficient assays of the transcriptome of any mammalian cell 6–10. Recent studies in mouse and human cells have mostly focused on using RNA-Seq to study known genes 6–8,10,11 and have depended on existing annotations. They were thus of limited utility for discovering the complete gene structure of lincRNAs or other noncoding transcripts.
An alternative strategy is to use an ab initio reconstruction approach 9,12–14 to learn the complete transcriptome of an individual sample from solely the unannotated genome sequence and millions of relatively short sequence reads. A complete ab initio transcriptome reconstruction of a sample will (i) identify all expressed exons; (ii) enumerate all the splicing events that connect them; (iii) combine them into transcriptional units; (iv) determine all isoforms, including alternative ends and (v) discover novel transcripts. A successful ab initio method should be applicable to large and complex mammalian genomes and should be able to reconstruct transcripts of variable sizes, expression levels and protein coding capacity.
Despite early successes in yeast 9, ab initio reconstruction of a mammalian transcriptome has remained an elusive and substantial computational challenge. There has been important recent progress, including (i) efficient gapped aligners (for example, TopHat 13) that can map short reads that span splice junctions (‘spliced reads’); (ii) use of such gapped alignments to discover splicing events 9,13; (iii) exon identification methods 14; and (iv) genome-independent assembly of unmapped reads to sequence contigs (for example, Abyss 12).
Each of these methods provides an important component toward reconstruction, but none can reconstruct the complete transcriptome of a mammalian cell, due to scaling issues 9, limitations in handling splicing 14 or inability to identify transcripts with moderate coverage 12.
Here we present Scripture, a comprehensive method for ab initio reconstruction of the transcriptome of a mammalian cell that uses gapped alignments of reads across splice junctions (exploiting recent increases in read length) and reconstructs reads into statistically significant transcript structures. We applied Scripture to RNA-Seq data from mouse embryonic stem cells (ESC), mouse neural progenitor cells (NPC) and mouse lung fibroblasts (MLF) and correctly identified the complete annotated full-length gene structures for most expressed, known, protein coding genes. The reconstruction of the three transcriptomes revealed substantial variation in protein coding genes between cell types, including thousands of novel 5′ start sites, 3′ ends or additional coding exons. Many of these variant structures are supported by independent data. We also discovered the gene structure and expression level of over 2,000 noncoding transcripts, including hundreds of transcripts from previously identified lincRNA loci, over a thousand more lincRNAs with similar properties and hundreds of multi-exonic antisense ncRNAs. We show that lincRNAs have no significant coding potential and that they are evolutionarily conserved. Our results open the way to direct experimental manipulation of this new class of genes and highlight the power of RNA-Seq along with an ab initio reconstruction to provide a comprehensive picture of cell-specific transcriptomes.

1 Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA. 2 Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA. 3 Department of Pathology, Beth Israel Deaconess Medical Center, Boston, Massachusetts, USA. 4 Department of Systems Biology, Harvard Medical School, Boston, Massachusetts, USA. 5 Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA. 6 These authors contributed equally to this work. Correspondence should be addressed to M. Guttman (mguttman@mit.edu), M. Garber (mgarber@broadinstitute.org) or A.R. (aregev@broad.mit.edu).
Received 10 March; accepted 6 April; published online 2 May 2010; doi:10.1038/nbt.1633

Figure 1 Scripture: a method for ab initio transcriptome reconstruction from RNA-Seq data. (a) Spliced and unspliced reads. A typical expressed four-exon gene (1500032D16Rik, top; exons, gray boxes) with coverage from different types of reads. Unspliced reads (black bars) fall within a single exon, whereas spliced reads (bars broken into ‘dumbbells’) span exon–exon junctions (thin horizontal lines connect the alignment of a read to the exons it spans). The coverage track (bottom) shows the aggregate coverage of both spliced and unspliced reads. (b–g) A schematic description of Scripture. (b) A cartoon example. Reads (black bars) originate from sequencing a contiguous RNA molecule. Shown are transcripts from two different genes (blue and red boxes), one with seven exons (blue boxes) and one with three exons (red boxes), which are adjacent in the genome (black line). The grayscale vertical shading in subsequent panels is shown for visual tracking. (c) Spliced reads. Scripture is initiated with a reference genome sequence and spliced aligned reads (dumbbells) with gaps in their alignment (thin horizontal lines). Scripture uses splice site information to orient spliced reads (arrowheads). (d) Connectivity graph construction. Scripture builds a connectivity graph by drawing an edge (curved arrow) between any two bases that are connected by a spliced read gap. Edges are color coded to relate to the original RNA and eventual transcript. (e) Path scoring. Scripture scans the graph with fixed-sized windows and uses coverage from all reads (spliced and unspliced; bottom track) to score each path for significance (P-values shown as edge labels). (f) Transcript graph construction. Scripture merges all significant windows and uses the connectivity graph to give significant segments a graph structure (three graphs, in this example). (g) Refinement with paired-end data. Scripture uses paired-end (dashed curved lines) to join previously disconnected graphs (gene 1, bold
[Figure 1 panel labels: Sample data; Transcript; Unspliced reads; Spliced reads; Coverage; Sequence RNA; Short reads; Align reads; Construct connectivity graph of individual bases; Connectivity graph; Coverage; Transcript graph]

RESULTS
RNA-Seq libraries
We used massively parallel (Illumina) sequencing to sequence cDNA libraries from poly(A)+ mRNA from ESC, NPC and MLF cells, with 76-base paired-end reads. For the ESC library, we generated a total of 152 million paired-end reads. Using a gapped aligner 13, 93 million of these were alignable (497 Mb aligned bases, 262-fold average coverage of known protein coding genes expressed in ESC). We obtained similar numbers for the NPC and MLF libraries (Online Methods). In ESC, 76% of these reads mapped within the exonic regions of known protein coding genes, 9% were in introns of known protein coding genes, and 15% mapped in intergenic regions. We found a strong correlation between expression levels of protein coding genes as measured by RNA-Seq and Affymetrix expression arrays (r = 0.88 for all genes; Supplementary Fig. 2).

Scripture: a method for transcriptome reconstruction
We next developed Scripture, a genome-guided method to reconstruct the transcriptome using only an RNA-Seq data set and an (unannotated) reference genome sequence. Scripture consists of five steps (Fig. 1, Supplementary Note 1 and Online Methods). (i) We use reads aligned to the genome, including those with gapped alignments 13 spanning exon-exon junctions (‘aligned spliced reads’, Fig. 
Scriptureuses paired-end (dashed curved lines) to joinpreviously disconnected graphs (gene 1, boldabcdShort readsConstruct connectivity graph of individual basesConnectivitygraphefCoverageTranscriptgraphgSample dataTranscriptUnsplicedreadsSplicedreadsCoverageSequence RNAAlign reads0.00776-base paired-end reads. For <strong>the</strong> ESC library, we generated a total of152 million paired-end reads. Using a gapped aligner 13 , 93 million of<strong>the</strong>se were alignable (497 Mb aligned bases, 262-fold average coverageof known protein coding genes expressed in ESC). We obtainedsimilar numbers for <strong>the</strong> NPC <strong>and</strong> MLF libraries (Online Methods).In ESC, 76% of <strong>the</strong>se reads mapped within <strong>the</strong> exonic regions ofknown protein coding genes, 9% were in introns of known proteincoding genes, <strong>and</strong> 15% mapped in intergenic regions. We found astrong correlation between expression levels of protein coding genesas measured by RNA-Seq <strong>and</strong> Affymetrix expression arrays (r = 0.88for all genes; Supplementary Fig. 2).Scripture: a method for transcriptome reconstructionWe next developed Scripture, a genome-guided method to reconstruct<strong>the</strong> transcriptome using only an RNA-Seq data set <strong>and</strong> an(unannotated) reference genome sequence. Scripture consists of fivesteps (Fig. 1, Supplementary Note 1 <strong>and</strong> Online Methods). (i) Weuse reads aligned to <strong>the</strong> genome, including those with gapped alignments13 spanning exon-exon junctions (‘aligned spliced reads’, Fig. 
1a,c).‘Spliced’ reads provide direct information on <strong>the</strong> location of spliceScan fixed-size windows across graphs <strong>and</strong> score pathsP-values0.01 0.50.8Identify significant paths <strong>and</strong> build transcript graphAdd paired-end read informationIsoform 1Isoform 2RNAmolecule 1RNAmolecule 2Gene 1 Gene 2dashed line), find breakpoint regions within contiguous segments (detectable in this example by <strong>the</strong> lack of dashed lines between genes 1 <strong>and</strong> 2) <strong>and</strong>eliminate isoforms that result in paired-end reads mapping at a distance with low likelihood.Genome504 VOLUME 28 NUMBER 5 MAY 2010 nature biotechnology


junctions within the transcript, and ~30% of 76-base reads are expected on average to span an exon–exon junction.

Figure 2 Scripture correctly reconstructs full-length transcripts for most annotated protein coding genes. (a) A typical Scripture reconstruction on mouse chromosome 9. Top, RNA-Seq read coverage (from both unspliced and spliced reads); middle, three transcripts reconstructed by Scripture, including exons (black boxes) and orientation (arrow heads); bottom, RefSeq annotations for this region. All three transcripts are fully reconstructed from 5′ to 3′ ends, capturing all internal exons; Scripture correctly reconstructed the overlapping transcripts Pus3 and Hyls1. (b) Fraction of genes fully reconstructed in different expression quantiles (in 5% increments) in ESC. Each bar represents a 5% quantile of read coverage for genes expressed; mean read coverage is noted in blue. The height of each bar is the fraction of genes in that quantile that were fully reconstructed. For example, ~20% of the transcripts at the bottom 5% of expression levels were fully reconstructed; ~94% of the genes at the top 95% of expression are fully reconstructed. (c) Portion of gene length reconstructed in different expression quantiles in ESC. Shown is a box plot of the portion of each transcript’s length that was covered by a Scripture reconstruction in each 5% coverage quantile. Black line in each box, median; rectangle, 25%–75% coverage quantiles; whiskers, extreme coverage values within expression quantile. For example, at the bottom 5% of expression, Scripture reconstructed a median length of 60% of the full length transcript.
From the aligned spliced reads, we construct a ‘connectivity graph’ (Fig. 1d), where two bases in the genome are connected if they are immediate neighbors either in the genomic sequence itself or within a spliced read. We use agreement with splicing motifs at each putative junction to orient the connection (edge) in the connectivity graph⁹,¹³ (Fig. 1d). (ii) To infer transcripts, we use a statistical segmentation approach⁴ and both spliced and unspliced reads to identify paths in the connectivity graph with mapped read enrichment compared to the genomic background (Fig. 1e). This is done by scoring a sliding window using a test statistic for each region, computing a threshold for genome-wide significance, and using the significant windows to define intervals. (iii) From the paths, we construct a ‘transcript graph’ connecting each exon in the transcript (Fig. 1f). Each path through the graph is directed and represents one oriented (strand-specific) isoform of the gene. Alternatively spliced isoforms are identified by considering all possible paths in the transcript graph. (iv) We augment the transcript graph with connections based on paired-end reads and their distance constraints, allowing us to join transcripts or remove unlikely isoforms (Fig. 1g, below).
(v) We generate a catalog of transcripts defined by the paths through the transcript graph.

Paired-end reads aid in transcriptome reconstruction

Paired-end information, consisting of reads that came from the two ends of the sequenced RNA fragment, provides valuable additional information in the reconstruction.

First, the presence of paired ends linking two regions shows that they appear in the same transcript; such a connection might not otherwise be apparent because low expression levels or unalignable sequence might prevent a continuous chain of overlapping sequence reads (spliced or unspliced) across the transcript. We thus augment the transcript graphs with paired-end information, where available, to (indirectly) link nodes in the graph. We use these indirect links (Fig. 1g) to add edges between disconnected graphs, add internal nodes (exons) that might have been missed within a path (transcript) and add extra support for existing edges.
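As a rough illustration of the connectivity graph from step (i), the following Python sketch records one directed edge per splice junction and treats genomic adjacency as implicit. It is not Scripture's implementation: the function names, the representation of a spliced read as a list of 0-based half-open aligned blocks, and the toy genome are all assumptions made for this example, and only the canonical GT–AG motif (CT–AC when the intron lies on the reverse strand) is checked.

```python
from collections import defaultdict


def orient_junction(genome, intron_start, intron_end):
    """Infer the strand of an intron genome[intron_start:intron_end]
    from the canonical splice motif (hypothetical helper)."""
    donor = genome[intron_start:intron_start + 2]
    acceptor = genome[intron_end - 2:intron_end]
    if (donor, acceptor) == ("GT", "AG"):
        return "+"
    if (donor, acceptor) == ("CT", "AC"):  # reverse complement of GT...AG
        return "-"
    return None  # no canonical motif: leave the junction unoriented


def build_splice_edges(genome, spliced_reads):
    """Map the last base of each upstream block to the first base of the
    downstream block, for every read gap supported by a splice motif.

    spliced_reads: iterable of lists of (start, end) aligned blocks,
    0-based and half-open (an assumed input format).
    """
    edges = defaultdict(set)
    for blocks in spliced_reads:
        for (_, left_end), (right_start, _) in zip(blocks, blocks[1:]):
            strand = orient_junction(genome, left_end, right_start)
            if strand is not None:
                edges[left_end - 1].add((right_start, strand))
    return edges


def neighbors(edges, pos):
    """Bases connected to pos: the next genomic base plus any base
    reached through a spliced-read gap."""
    result = {pos + 1}
    result.update(target for target, _ in edges.get(pos, ()))
    return result


# Toy example: exon 1 = bases 0-4, intron = bases 5-12 (GT...AG), exon 2 = bases 13-17.
toy_genome = "AAAAA" + "GT" + "CCCC" + "AG" + "TTTTT"
toy_reads = [[(2, 5), (13, 16)]]  # one read spanning the junction
toy_edges = build_splice_edges(toy_genome, toy_reads)
print(neighbors(toy_edges, 4))
```

Storing only the junction edges keeps the structure small, since adjacency between consecutive genomic bases never needs to be materialized; a real implementation would also have to handle non-canonical motifs and alignment errors, which this sketch ignores.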
This refines the structure of our transcripts and increases our confidence in them, especially in weakly expressed transcripts, which are more likely to have coverage gaps.

Second, the distribution of library insert sizes constrains the distance between the paired-end reads; these distance constraints can be used to infer the relative likelihood of some potential transcripts (for example, those in which the paired ends would be much closer or much further than expected). We infer the distribution of insert sizes for a given library from the position of read pairs on transcripts from those genes for which there is only a single transcript model (that is, no detectable alternative splicing; Online Methods). For example, in the ESC library, this distribution matches well with the experimentally determined sizes. Using this distribution, we assign likelihoods to each connection, filtering unlikely ones (Online Methods).

Reconstruction of full-length gene structures

We applied Scripture to our mouse ESC RNA-Seq data set and compared our reconstructions to protein coding gene annotations¹⁵. Scripture identified 16,389 nonoverlapping, multi-exonic transcript graphs that correspond to 15,352 known multi-exonic genes (Online Methods). Of reconstructed genes, 88.4% are covered by a single graph (no fragmentation of the reconstructed transcript) and 8.0% are covered by two transcript graphs (fragmentation of the transcript to two separate pieces in the reconstruction). Focusing on the 13,362 genes with significant expression (P < 0.05 compared with


background coverage; see Online Methods), Scripture reconstructed the full-length structure of the longest known splice isoform (from 5′ to 3′ end, including all exons and splice junctions; Fig. 2a) for 10,355 of them (~78%). All of our reconstructed transcripts for known multi-exonic transcripts also had the correct orientation (strand), allowing us to reconstruct genes that overlap one another on opposite strands (Fig. 2a).

Figure 3 Alternative 5′ ends, 3′ ends and novel coding exons in transcripts reconstructed by Scripture. Representative examples (tracks, left) and summary counts (Venn diagrams, right; numbers represent those unique to each cell type compared to the other two) of five categories of variation discovered in Scripture transcripts compared to the known annotations. In each representative example, shown is the coverage by RNA-Seq reads (top track), the reconstructed annotation (middle track) and the known annotation (bottom track). The novel regions in the reconstruction are marked by gray shading. In each proportional Venn diagram we show the number of transcripts in this class in each cell type (ESC, green; NPC, blue; MLF, red) and their overlap. (a) Internal alternative 5′ start sites. (b) External alternative 5′ start sites. (c) Alternative downstream 3′ end (extended termination). (d) Alternative upstream 3′ end (early termination). (e) Novel coding exons.

Complete transcript structures were recovered across a very broad range of expression levels (Fig.
2b,c) for both single and multi-exonic genes. For example, Scripture accurately reconstructed the full-length transcript of ~73% of the known protein coding genes at the second quintile of expression, and ~94% of the genes from the top quintile. Furthermore, the average proportion of bases reconstructed for each transcript was high (Fig. 2c). Even for the bottom 5% of expressed genes, we recovered on average 62% of each of these transcripts’ bases (Fig. 2c). For single-exon genes, we recovered on average 80% of the transcribed bases. We obtained similar results in the other two cell types (19,835 and 20,407 transcript graphs for 14,212 and 13,351 known genes in NPC and MLF, respectively). Most of the genes that were not fully reconstructed are those with low expression; it should be possible to reconstruct most of these by generating more RNA-Seq data. The few highly expressed genes that were not fully reconstructed are either the result of alignment artifacts caused by recent processed pseudogenes or stem from novel transcriptome variations, missing from the current annotation (explored in detail below).

Novel transcriptome variations in annotated protein coding genes

Given that most of the Scripture reconstructions of protein coding genes were accurate, we next investigated the differences between the reconstructed transcriptome and the known gene annotations (Supplementary Table 1).
We focused on transcripts with (i) novel 5′ start sites; (ii) novel 3′ ends; and (iii) previously unidentified exons within the transcriptional units of known protein coding genes. In each category, we first discuss below the reconstructed transcripts in ESC and then consider the results for the NPC and MLF.

1. Alternative 5′ start sites are supported by H3K4me3 marks. We found 1,804 transcripts in ESC that match the annotated 3′ end but have an alternative 5′ start site, derived from an extra exon not overlapping the annotated first exon. We distinguish between internal alternative 5′ start sites (1,397 cases; Fig. 3a), which occur downstream of the annotated start, and external alternative 5′ start sites (407 cases; Fig. 3b), which occur upstream of the annotated start. Ninety percent of the internal 5′ start sites and 75% of the external 5′ start sites contained an H3K4me3 modification, a mark of the promoter region of genes¹⁶ (Supplementary Fig. 3). These alternative start sites are on average 21 kb upstream of the annotated site, substantially revising the annotated promoters. Notably, ~60% of the transcripts with an alternative start site (internal or external) had no reconstructed isoform starting at the annotated 5′ start site.

We obtained similar results from NPC and MLF (Fig. 3a,b, right; Supplementary Table 1).
Altogether, we identified 2,813 internal


5′ start sites (2,302 supported by H3K4me3 in their respective tissues), and 807 external 5′ start sites in at least one cell type. In particular, 33% of these novel 5′ ends are likely active in ESCs but not in NPCs or MLFs.

Figure 4 Noncoding transcripts reconstructed by Scripture. (a) A representative example of a lincRNA expressed in ESC. Top: mouse genomic locus containing the lincRNA and its neighboring protein coding genes. Bottom: magnified view of the lincRNA locus showing the coverage of H3K4me3 (green track), H3K36me3 (blue track) and RNA-Seq reads (red track) overlapping the transcribed lincRNA locus, as well as its Scripture reconstructed transcript isoforms (black). (b) A representative example of a multi-exonic antisense ncRNA expressed in ESC. Top: mouse genomic locus containing the antisense transcript. Bottom: magnified view of the antisense locus showing the coverage of H3K4me3 (green track), H3K36me3 (blue track) and RNA-Seq reads (red track) overlapping the transcribed antisense locus, as well as its Scripture reconstructed gene structure (black) and the annotated overlapping transcript (blue).

2. Alternative 3′ untranslated regions are supported by polyadenylation motifs. There are 551 (~4%) ESC-reconstructed transcripts with an alternative 3′ end downstream of the annotated 3′ end (mean distance 30 kb downstream, Fig. 3c).
Of these, 275 (~50%) showed evidence of a polyadenylation motif within the novel 3′ exon, which is only slightly lower than for annotated 3′ ends (60%) and much higher than for randomly chosen size-matched exons (6%). The frequency of the polyadenylation motif supports the accuracy of the reconstruction.

To conservatively distinguish between upstream (early) termination and incomplete reconstruction, we designated novel 3′ ends only in those cases that did not overlap any of the known exons in the annotated transcript and that contained complete 5′ start sites. We identified 759 transcripts with upstream 3′ ends in ESC (Fig. 3d); 44% of them contained a polyadenylation motif, supporting their biological relevance. For most (90%) of these transcripts, Scripture also reconstructed an isoform that contained the annotated 3′ end.

We obtained similar results for NPC and MLF (Fig. 3c,d, right; Supplementary Table 1). Altogether, we identified 940 downstream 3′ ends and 1,850 upstream 3′ ends in at least one cell type.

3. Additional coding exons are highly conserved and preserve ORFs. We found 534 transcripts in ESC with at least one extra, previously unannotated internal coding exon spliced into annotated protein coding transcripts (Fig. 3e). These transcripts contained 588 novel internal exons, ranging in length from 6 bp to 3.5 kb (median, 111 bp; 20–80% quantiles, 60–224 bp). Of these extra exons, 322 (54.5%) were present in all versions of the reconstructed transcript in ESC.
Most (83%) of these novel exons maintain the reading frame of the transcript and are as highly conserved as known coding exons (Supplementary Fig. 4), consistent with their coding capacity. We validated the presence of the novel exons within five of five tested transcripts, using reverse transcription followed by PCR (RT-PCR) followed by Sanger sequencing (Online Methods).

We obtained similar results in MLF (124 transcripts, 144 exons) and NPC (325 transcripts, 363 exons) (Fig. 3e, right). A majority of exons (~70%) were present in all versions of the reconstructed transcript within a cell type. Altogether, we identified 960 novel internal exons in at least one cell type (Fig. 3e, right).

Gene structures of previously identified lincRNA loci

We next turned to identifying the gene structures of transcripts expressed from known lincRNA loci. We had previously identified 317 lincRNA loci on the basis of K4-K36 domains in ESC cells⁴. When applied to ESC RNA-Seq data, Scripture reconstructed multi-exonic gene structures for 250 (78.8%) of them (Fig. 4). This is comparable to the proportion (78.5%) reconstructed for protein coding genes with K4-K36 domains in ESC. Scripture reconstructed 87% (160 of 183) of ESC lincRNAs for which we previously identified an RNA hybridization signal from tiling microarrays. We discuss possible reasons for the few remaining discrepancies in Supplementary Note 2.

The reconstructed lincRNA transcripts in ESC have on average 3.7 exons, an average exon size of 350 bp and an average mature spliced size of 3.2 kb (in comparison, protein coding genes have on average 9.7 exons, exon length of 291 bp and length of 2.9 kb).
The Scripture-identified strand information for each lincRNA is consistent with that inferred from the location of H3K4me3 modification and with the orientation determined from a strand-specific RNA-Seq library, which we generated independently (Online Methods). Most lincRNAs likely represent 5′ complete transcripts based on overlap with H3K4me3 (82%) and 3′ complete transcripts based on presence of a polyadenylation motif (~50%, comparable to 60% for protein coding genes and far above background of 6%).

Similarly, Scripture successfully reconstructed lincRNA gene structures for K4-K36 lincRNA loci in MLF and NPC (232 of 289 in MLF and 224 of 270 in NPC). Most are likely 5′ complete (69% in MLF and 81% in NPC based on overlap with H3K4me3) and many may be 3′ complete based on detectable 3′ polyadenylation sites (18% in MLF and 37% in NPC). In addition, we successfully reconstructed another 116 lincRNAs previously identified only in mouse embryonic fibroblasts but which were now reconstructed in at least one of the other three cell types. Altogether, we identified gene structures for 609 previously defined lincRNA loci in at least one of the three cell types.

Discovery of novel lincRNAs

In addition to the previously identified lincRNAs, we found another 1,140 multi-exonic transcripts that map to intergenic regions (591 in ESC, 318 in MLF, and 528 in NPC; Fig. 5).
Most of these transcripts do not seem to encode proteins, and are designated as noncoding, on the basis of their codon substitution frequency (CSF) scores¹⁷,¹⁸ (Online Methods) across the mature (spliced) RNA transcript (88%; Fig. 5a) and on the lack of an open reading frame (ORF) larger than 100 amino acids (80%; Fig. 5b). Careful review of the


remaining ~12% revealed 66 loci that are likely to be novel protein coding genes (high CSF score, ORF >200 amino acids and very high evolutionary conservation; Supplementary Fig. 5).

Most of the novel lincRNA loci were not identified in our previous study owing to the stringent criteria we imposed when using chromatin maps to identify lincRNAs. Specifically, we required that a K4-K36 domain extend over at least 5 kb and be well separated from the nearest known gene locus⁴. Indeed, most novel intergenic transcripts (76%) were enriched for a K4-K36 domain (a proportion comparable to that for expressed protein coding genes) but failed to meet the size and distance criteria or could not be identified at a genome-wide significance level (without knowing their locus a priori). On average, the genomic loci of the novel lincRNAs are closer to neighboring genes and have smaller sizes (~3.5 kb average), and the transcripts are shorter (859 bp). Of the lincRNAs that did not have a chromatin signature that reached genome-wide significance, ~40% showed chromatin modifications enriched at a nominal significance level (compared to 57% for protein coding genes).

On average, the lincRNAs are expressed at levels that are readily detectable, albeit somewhat lower than those of protein coding genes. The median expression level of the reconstructed lincRNAs, as estimated by reads per kilobase of exonic sequence per million aligned reads (RPKM; see Online Methods), was approximately one-third of the expression of protein coding genes (Fig.
5d), with ~25% of lincRNAs having expression levels higher than the median level for protein coding genes (Fig. 5d). The novel lincRNAs identified in this study are expressed at somewhat lower levels than those from chromatin-identified loci, consistent with the fact that chromatin enrichment is positively correlated with expression levels (Fig. 5d).

We compared the novel lincRNA genes to a collection of ~35,000 mouse cDNAs and found evidence that ~43% of our lincRNAs were present in this collection¹. This is comparable to the reported fraction (40%) of known transcripts covered by the same cDNA catalog¹. The remaining lincRNAs are found in our study but not in the comparison catalog. These were likely previously missed owing to the different cell types and limited coverage of the previous study¹.

Most lincRNAs are evolutionarily conserved

The reconstructed full-length gene structures of lincRNAs allow us to accurately assess their evolutionary sequence conservation in each exon and in small windows. To this end, we identified the orthologous sequences for each lincRNA across 29 mammals and estimated conservation by a metric (ω; Online Methods) reflecting the total contraction of the branch length of the evolutionary tree connecting them¹⁹.
We calculated ω over the entire lincRNA transcript, as well as over individual exons.

Figure 5 Protein coding capacity, conservation levels and expression of lincRNAs and multi-exonic antisense transcripts. (a,b) Coding capacity of protein coding, lincRNAs and multi-exonic antisense transcripts. Shown is the cumulative distribution of CSF scores (a) and maximal ORF length (b) for protein coding transcripts, lincRNAs and multi-exonic antisense transcripts. (c) Conservation levels for exons from protein coding transcripts, lincRNAs, multi-exonic antisense transcripts and introns. Shown is the cumulative distribution of sequence conservation across 29 mammals for protein coding exons, introns, exons from previously annotated lincRNA loci, exons from newly annotated lincRNA transcripts and exons from multi-exonic antisense transcripts. (d) Expression levels of protein coding, lincRNAs and multi-exonic antisense transcripts. Shown is the cumulative distribution of expression levels, in reads per kilobase of exonic sequence per million aligned reads (RPKM), in ESC for protein coding transcripts, transcripts from previously annotated lincRNA loci, transcripts from newly annotated lincRNA loci and multi-exonic antisense transcripts.

On the basis of our high-resolution gene structures, the lincRNA sequences show greater conservation than random genomic regions or introns (Fig. 5c), comparable to eight known functional lincRNAs²⁰⁻²², and lower than protein coding exons. The results are consistent with our previous estimates of conservation⁴. Interestingly, conservation levels are indistinguishable between the chromatin-defined lincRNAs⁴ and the novel ones identified only in this study (Fig. 5c), consistent with membership in the same class of functional large ncRNA genes. These conservation levels are considerably higher than those reported for a previous catalog of large noncoding RNAs¹.

We also determined the specific regions within each lincRNA that are under purifying selection and thus likely to be functional, by computing ω within short windows (Online Methods). On average, 22% of the bases within the lincRNAs lie within conserved patches (comparable to the value of 25% for the eight known functional lincRNAs, much higher than the 7% for intronic bases and lower than the 77% for protein coding bases, Supplementary Fig. 6).
These conserved patches provide a critical starting point for functional studies 23.

Variations in lincRNA expression and isoforms

A substantial fraction (~41%) of the novel lincRNAs reconstructed in at least one cell type show evidence for expression in at least two of the three cell types. This is comparable to the 45% of the previously identified lincRNAs present in at least two out of the three cell types. In contrast, 80% of expressed protein coding genes are expressed across two of the three cell types. This is not merely a result of the lower overall expression of lincRNAs, as the fraction of cell type–specific lincRNAs is higher than that of tissue specific protein coding genes in every expression quantile (Supplementary Fig. 7). Thus, lincRNAs are likely to be more tissue specific than protein coding genes.

A substantial portion of lincRNA loci also produce alternative spliced isoforms. For example, within ESC we identified two or more alternative spliced isoforms for 25% of lincRNA genes, comparable to the 30% for protein coding genes (15% of lincRNAs in MLF have alternative spliced isoforms, and 14.7% in NPC). Altogether, 28.8% of the 1,749 lincRNA loci had evidence for alternative isoforms in any of the three cell types.

Identification of hundreds of large antisense transcripts

Scripture reconstructed hundreds of transcripts that overlap known protein coding gene loci but are transcribed in the opposite orientation and likely represent antisense transcripts. To determine orientation, we required that any identified antisense transcript be multi-exonic (Online Methods).

Using these criteria, we identified 201 antisense multi-exonic transcripts in ESC (Fig. 4b); these transcripts had an average of five exons per transcript and an average transcript size of 1.7 kb.
On average, the antisense transcripts overlapped the genomic locus of the sense protein coding gene by 1,023 bp (83% of the locus), and most (64%) overlapped at least one sense exon, but this overlap was substantially lower (766 bp, 48% of the transcript exons). Some of these antisense transcripts (79, ~40%) were identified by a previous cDNA sequencing study 1,24, but most (122, ~60%) were previously unidentified. Most (~85%) antisense transcripts were non–protein coding by both ORF analysis (Fig. 5b) and CSF scores (Fig. 5a). Four of the newly identified antisense transcripts had a large, conserved open reading frame and are likely novel, previously unannotated protein coding genes.

We validated the reconstructed ESC antisense transcripts by three independent sets of experimental data. (i) Most of the antisense loci carried an H3K4me3 mark at their 5′ end (Fig. 4b), consistent with their independent and antisense transcription (for example, 64% of the 164 transcripts where it was possible to detect an independent H3K4me3 mark because the 5′ end of the antisense transcript did not overlap the 5′ ends of the sense gene). (ii) We generated and sequenced a strand-specific library in ESC (17.5 million Illumina reads; Online Methods), and found a significant (P < 0.05) number of reads on the antisense strand in >90% of cases (the remaining are likely missed in this limited sequencing owing to lower expression). (iii) We confirmed five of five tested antisense transcripts using RT-PCR to unique exons of the antisense transcript (Online Methods) followed by Sanger sequencing.

We obtained similar results for antisense transcripts in MLF and NPC (112 and 202 multi-exonic antisense transcripts, respectively). Altogether, we identified 469 antisense transcripts expressed in at least one cell type, only 125 of which (27%) were previously identified in large-scale sequencing of mouse cDNAs 24. The remaining 344 (73%) were unidentified by the previous study, likely reflecting the distinct cell types used in that study and the limited coverage of previous catalogs.

The 469 antisense transcripts are expressed at levels comparable to those of the novel lincRNAs (Fig. 5d) but show substantially lower sequence conservation. Indeed, the antisense ncRNAs showed very little evolutionary conservation as estimated by the ω metric for the portions that do not overlap protein coding exons on the sense strand, suggesting that the antisense ncRNAs are a distinct class from the lincRNAs (Fig. 5c).

DISCUSSION

Despite the availability of the genome sequence of many mammals, a comprehensive understanding of the mammalian transcriptome has been an elusive goal. In particular, the computational tools needed to reconstruct all full-length transcripts from the wealth of short read data were largely missing. A recent study proposed to overcome this limitation experimentally by using very long reads (for example, 454 sequencing) as a scaffold for short read reconstruction 25. This is applicable, albeit at a substantial cost, for highly expressed genes but would require extraordinary depth to cover more weakly expressed ones.

Here we present Scripture, a new computational method to reconstruct a mammalian transcriptome with no prior knowledge of gene annotations. Scripture relies on longer reads that span splice junctions to connect discontiguous (spliced) segments and resolve multiple splice isoforms, and uses paired-end information to refine these transcripts. Scripture can identify short but strongly expressed transcripts as well as transcripts with much lower expression for which there is aggregate evidence along the entire transcript length. Although Scripture does rely on a reference genome sequence, many of its components can also be used in the development of methods for assembly of transcripts from read data only.

We applied Scripture to RNA-Seq data from pluripotent ESCs and differentiated lineages and showed that we can accurately reconstruct most expressed, annotated protein coding genes, at a broad range of expression levels, as well as uncover many new isoforms in the protein coding transcriptome. This variation may have key regulatory roles, defining new cell type–specific promoters, untranslated regions and protein coding exons. We used Scripture's sensitivity and resolution to reconstruct the gene structures and strand information of hundreds of lincRNAs and multi-exonic antisense transcripts, many of which are only moderately expressed.

Scripture identified over a thousand lincRNAs across the three cell types studied. Most of the lincRNAs identified were not previously found by classical large-scale cDNA sequencing 1. Many of these lincRNAs could not be reliably identified solely on the basis of chromatin structure owing to their proximity to protein coding genes or their short genomic lengths. Overall, we found that the ratio of expressed protein coding to noncoding genes in these cell types was ~10:1 but that the total number of RNA molecules was more heavily biased toward the protein coding fraction (~30:1), results similar to previous observations 26.

Scripture identifies precise gene structures for most previously found lincRNA loci (as well as for the newly discovered ones), a prerequisite for further studies. For example, we used these to identify the specific regions within each lincRNA that are under purifying selection (conservation), a starting point for experimental and computational investigation.

Taken together, our results highlight the power of ab initio reconstructions to annotate a genome, to discover transcriptional variation within known protein coding genes and to provide a rich catalog of precise gene structures for noncoding RNAs. The next step is clearly to apply this approach to a wide range of mammalian cell types, to obtain a comprehensive picture of the mammalian transcriptome.

Methods

Methods and any associated references are available in the online version of the paper at http://www.nature.com/naturebiotechnology/.


Accession codes. NCBI Gene Expression Omnibus (GEO), GSE20851.

Note: Supplementary information is available on the Nature Biotechnology website.

Acknowledgments

We thank M. Wernig (MIT) for providing NPC; M. Lin and M. Kellis (MIT) for CSF code; the Broad Sequencing Platform for sample sequencing; L. Gaffney for assistance with graphics; and C. Burge, J. Merkin, R. Bradley and members of Lander and Regev laboratories (in particular, M. Yassour, T. Mikkelsen and I. Amit) for discussions. A.R. and J.L.R. were supported by the Merkin Family Foundation for Stem Cell Research at the Broad Institute. M. Guttman was supported by a Vertex scholarship. Work was supported by a Burroughs Wellcome Fund Career Award at the Scientific Interface, a US National Institutes of Health PIONEER award, a US National Human Genome Research Institute (NHGRI) R01 grant and the Howard Hughes Medical Institute (A.R.), and NHGRI and the Broad Institute of MIT and Harvard (E.S.L.).

AUTHOR CONTRIBUTIONS

M. Guttman and M. Garber conceived the project, designed research, implemented Scripture, performed computational analysis and wrote the paper. A.G., C.N. and J.Z.L. oversaw cDNA sequencing, provided molecular biology advice and helped to edit the manuscript. J.D. constructed cDNA libraries, performed validation experiments and helped to edit the manuscript. J.R. implemented components of Scripture and provided computational support and technical advice. X.A., L.F. and M.J.K. constructed cDNA libraries. J.L.R. provided reagents and helped edit the manuscript. E.S.L. designed research direction and wrote the paper. A.R. provided cDNA sequencing guidance, conceived the project, designed research direction and wrote the paper.

COMPETING FINANCIAL INTERESTS

The authors declare no competing financial interests.

Published online at http://www.nature.com/naturebiotechnology/.
Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/.

1. Carninci, P. et al. The transcriptional landscape of the mammalian genome. Science 309, 1559–1563 (2005).
2. Kapranov, P. et al. RNA maps reveal new RNA classes and a possible function for pervasive transcription. Science 316, 1484–1488 (2007).
3. Bertone, P. et al. Global identification of human transcribed sequences with genome tiling arrays. Science 306, 2242–2246 (2004).
4. Guttman, M. et al. Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature 458, 223–227 (2009).
5. Khalil, A.M. et al. Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Proc. Natl. Acad. Sci. USA 106, 11667–11672 (2009).
6. Cloonan, N. et al. Stem cell transcriptome profiling via massive-scale mRNA sequencing. Nat. Methods 5, 613–619 (2008).
7. Wang, E.T. et al. Alternative isoform regulation in human tissue transcriptomes. Nature 456, 470–476 (2008).
8. Mortazavi, A., Williams, B.A., McCue, K., Schaeffer, L. & Wold, B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat. Methods 5, 621–628 (2008).
9. Yassour, M. et al. Ab initio construction of a eukaryotic transcriptome by massively parallel mRNA sequencing. Proc. Natl. Acad. Sci. USA 106, 3264–3269 (2009).
10. Pan, Q., Shai, O., Lee, L.J., Frey, B.J. & Blencowe, B.J. Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat. Genet. 40, 1413–1415 (2008).
11. Maher, C.A. et al. Transcriptome sequencing to detect gene fusions in cancer. Nature 458, 97–101 (2009).
12. Birol, I. et al. De novo transcriptome assembly with ABySS. Bioinformatics 25, 2872–2877 (2009).
13. Trapnell, C., Pachter, L. & Salzberg, S.L. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25, 1105–1111 (2009).
14. Denoeud, F. et al. Annotating genomes with massive-scale RNA sequencing. Genome Biol. 9, R175 (2008).
15. Pruitt, K.D., Tatusova, T. & Maglott, D.R. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 35, D61–D65 (2007).
16. Mikkelsen, T.S. et al. Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature 448, 553–560 (2007).
17. Lin, M.F., Deoras, A.N., Rasmussen, M.D. & Kellis, M. Performance and scalability of discriminative metrics for comparative gene identification in 12 Drosophila genomes. PLOS Comput. Biol. 4, e1000067 (2008).
18. Lin, M.F. et al. Revisiting the protein-coding gene catalog of Drosophila melanogaster using 12 fly genomes. Genome Res. 17, 1823–1836 (2007).
19. Garber, M. et al. Identifying novel constrained elements by exploiting biased substitution patterns. Bioinformatics 25, i54–i62 (2009).
20. Brown, C.J. et al. A gene from the region of the human X inactivation centre is expressed exclusively from the inactive X chromosome. Nature 349, 38–44 (1991).
21. Rinn, J.L. et al. Functional demarcation of active and silent chromatin domains in human HOX loci by noncoding RNAs. Cell 129, 1311–1323 (2007).
22. Willingham, A.T. et al. A strategy for probing the function of noncoding RNAs finds a repressor of NFAT. Science 309, 1570–1573 (2005).
23. Zhao, J., Sun, B.K., Erwin, J.A., Song, J.J. & Lee, J.T. Polycomb proteins targeted by a short repeat RNA to the mouse X chromosome. Science 322, 750–756 (2008).
24. Katayama, S. et al. Antisense transcription in the mammalian transcriptome. Science 309, 1564–1566 (2005).
25. Wu, J.Q. et al. Dynamic transcriptomes during neural differentiation of human embryonic stem cells revealed by short, long, and paired-end sequencing. Proc. Natl. Acad. Sci. USA 107, 5254–5259 (2010).
26. Ramsköld, D., Wang, E.T., Burge, C.B. & Sandberg, R. An abundance of ubiquitously expressed genes revealed by tissue transcriptome sequence data. PLOS Comput. Biol. 5, e1000598 (2009).


ONLINE METHODS

Cell culture. Mouse ESCs (V6.5) were cultured with irradiated mouse embryonic fibroblasts (GlobalStem; GSC-6002C) on 0.2% gelatin-coated plates in a culture medium consisting of Knockout DMEM (Invitrogen; 10829018) containing 10% FBS (GlobalStem; GSM-6002), 1% penicillin-streptomycin (Invitrogen 15140-163), 1% non-essential amino acids (Invitrogen 11140-076), 1% L-glutamine, 4 µl β-mercaptoethanol and 0.01% leukemia inhibitory factor (LIF; Millipore; ESG1106). ESCs were passaged once on gelatin without mouse embryonic fibroblasts before RNA extraction. V6.5 ESCs were differentiated into NPCs through embryoid body formation for 4 d and selection in ITSFn medium 27 for 5–7 and maintained in fibroblast growth factor-2 (FGF-2) and epidermal growth factor-2 (EGF-2) (R&D Systems) as described 27. The cells uniformly express Nestin and Sox2 and can differentiate into neurons, astrocytes and oligodendrocytes. Mouse lung fibroblasts (ATCC) were grown in DMEM with 10% FBS and penicillin/streptomycin at 37 °C, 5% CO2.

RNA extraction and library preparation. RNA was extracted using the protocol outlined in the RNeasy kit (Qiagen). Extracts were treated with DNase (Ambion 2238). Polyadenylated RNAs were selected using Ambion's MicroPoly(A)Purist kit (AM1919M) and RNA integrity confirmed using Bioanalyzer (Agilent). We used a cDNA preparation procedure that combines a random priming step with a shearing step 8,9,28 and results in fragments of ~700 bp in size. We previously found 9,28 that this protocol provides relatively uniform coverage of the whole transcript, thus assisting in ab initio reconstruction.

Specifically, a 'regular' RNA sequencing library (non–strand specific) was created as previously described 28, with the following modifications. Poly(A)+ RNA (250 ng) was fragmented by heating at 98 °C for 33 min in 0.2 mM sodium citrate, pH 6.4 (Ambion). Fragmented RNA was mixed with 3 µg random hexamers (Invitrogen), incubated at 70 °C for 10 min, and placed on ice briefly before starting cDNA synthesis. First-strand cDNA synthesis was performed using Superscript III (Invitrogen) for 1 h at 55 °C, and second-strand using E. coli DNA polymerase and E. coli DNA ligase at 16 °C for 2 h. cDNA was eluted using the Qiagen MiniElute kit with 30 µl of the manufacturer's EB buffer. DNA ends were repaired using dNTPs and T4 polymerase (NEB), followed by purification using the MiniElute kit. Adenine was added to the 3′ end of the DNA fragments using dATP and Klenow exonuclease (NEB; M0212S) to allow adaptor ligation, and fragments were purified using MiniElute. Adaptors were ligated and incubated for 15 min at room temperature (25 °C). Phenol/chloroform/isoamyl alcohol (Invitrogen 15593-031) extraction followed to remove the DNA ligase. The pellet was then resuspended in 10 µl EB buffer. The sample was run on a 3% agarose gel (Nusieve 3:1 agarose, Lonza) and a 160–380 base pair fragment was cut out and extracted. PCR was performed with Phusion High-Fidelity DNA Polymerase with the manufacturer's GC buffer (New England Biolabs) and 2 M betaine (Sigma). PCR conditions were 30 s at 98 °C; 16 cycles of 10 s at 98 °C, 30 s at 65 °C, 30 s at 72 °C; 5 min at 72 °C; forever at 4 °C. Products were run on a polyacrylamide gel for 60 min at 120 V. The PCR products were cleaned up with Agencourt AMPure XP magnetic beads (A63880) to completely remove primers and product was submitted for Illumina sequencing.

The strand-specific library was created from 100 ng of poly(A)+ RNA using the previously published RNA ligation method 29 with modifications from the manufacturer (Illumina; data not shown). The insert size was 110 to 170 bp.

RNA-Seq library sequencing. All libraries were sequenced using the Illumina Genome Analyzer (GAII). We sequenced three lanes for ESC, corresponding to 152 million reads; two lanes for MLF, corresponding to 161 million reads; and two lanes for NPC, corresponding to 180 million reads.

Alignments of reads to the genome. All reads were aligned to the mouse reference genome (NCBI 37, MM9) using the TopHat aligner 13. Briefly, TopHat uses a two-step mapping process, first using Bowtie 30 to align all reads that map directly to the genome (with no gaps), and then mapping all reads that were not aligned in the first step using gapped alignment. TopHat uses canonical and noncanonical splice sites to determine possible locations for gaps in the alignment.

Generation of connectivity graph. Given a set of reads aligned to the genome, we first identified all spliced reads as those whose alignment to the reference genome contained a gap.
These reads and the reference genome were used to construct connectivity graphs. Each connectivity graph contains all bases from a single chromosome. The nodes in the graph are bases and the edges connect each base to the next base in the genome as well as to all bases to which it is connected through a spliced read (Fig. 1). In the analysis presented, we identified as an edge any two bases in the chromosome that were connected by two or more spliced reads. The connectivity graph thus represents the contiguity that exists in the RNA but that is interrupted by intron sequences in the reference genome.

Identification of splice site motifs and directionality. We restricted our analysis to spliced reads that mapped connecting donor/acceptor splice sites, either canonical (GT/AG) or noncanonical (GC/AG and AT/AC). We oriented each mapped spliced read using the orientation of the donor/acceptor sites it connected.

Construction of transcript graphs. The spliced edges in the connectivity graph reflect bases that were connected in the original RNA but are not contiguous in the genome. To construct a transcript graph, we use a statistical segmentation strategy to traverse the graph topology directly and determine 'paths' through the connectivity graph that represent a contiguous path of significant enrichment over the background distribution (see below). In this segmentation process, we scan variably sized windows across the graph and assign significance to each window. We then merge significant paths into a 'transcript graph'.
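The connectivity graph and the window paths through it can be illustrated with a small sketch. This is not Scripture's implementation (which the paper describes as a Java application); it is a minimal Python illustration under stated assumptions: the chromosome is an upper-case string, each spliced read is reduced to a hypothetical (intron_start, intron_end) pair with a half-open intron interval, and the function names are invented for this example. The splice motifs and the two-read threshold are the ones given in the text.

```python
from collections import Counter, defaultdict

# Donor/acceptor dinucleotides named in the text, as read on the + strand.
SPLICE_MOTIFS = {("GT", "AG"), ("GC", "AG"), ("AT", "AC")}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def junction_strand(chrom, intron_start, intron_end):
    """Orient the intron chrom[intron_start:intron_end] by its splice motif."""
    left = chrom[intron_start:intron_start + 2]
    right = chrom[intron_end - 2:intron_end]
    if (left, right) in SPLICE_MOTIFS:
        return "+"
    # A minus-strand intron shows the reverse-complemented motif, ends swapped.
    if (revcomp(right), revcomp(left)) in SPLICE_MOTIFS:
        return "-"
    return None


def build_connectivity_graph(chrom, spliced_reads, min_reads=2):
    """Map each base index to the set of base indices it connects to.

    spliced_reads holds one (intron_start, intron_end) pair per spliced read
    (a simplification assumed for this sketch).
    """
    edges = defaultdict(set)
    for i in range(len(chrom) - 1):
        edges[i].add(i + 1)  # each base connects to the next base in the genome
    strands = {}
    for (start, end), n in Counter(spliced_reads).items():
        strand = junction_strand(chrom, start, end)
        if n >= min_reads and strand is not None:
            edges[start - 1].add(end)  # last base before the intron -> first base after it
            strands[(start - 1, end)] = strand
    return edges, strands


def windows_from(edges, start, size):
    """Yield every path of `size` bases that begins at base `start`."""
    if size == 1:
        yield (start,)
        return
    for nxt in sorted(edges.get(start, ())):
        for tail in windows_from(edges, nxt, size - 1):
            yield (start,) + tail
```

Storing the graph as an adjacency map keeps the "next base" edges and the spliced edges in one structure, so a window that reaches a splice junction simply branches into the contiguous path and the spliced path.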
Specifically, for a window of fixed size, we slide the window across each base in the connectivity graph (after augmenting it with the unspliced reads). If a window contains only contiguous unspliced reads, then it represents an unspliced part of the transcript. However, if the window hits an edge in the connectivity graph connecting two separate parts of the genome (based on two or more spliced reads), then the path follows this edge to a noncontiguous part of the genome, denoting a splicing event. Similarly, when alternative splice isoforms are present, if a base connects to multiple possible places, then we compute all windows across these alternative paths. Using a simple recursive procedure, we can compute all paths of a fixed size across the graph.

Identification of significant segments. To assess the significance of each path, we first define a background distribution. We estimate a genomic defined background distribution by permuting the read alignments in the genome and counting the number of reads that overlap each region and the frequency by which they each occur. Specifically, if we are interested in computing the probability of observing alignment a (of length r) at position i (out of a total genome size of L) we can permute the alignments and ask how often read a overlaps position i. Under this uniform permutation model, the probability that read a overlaps position i is simply r/L.
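As a worked illustration of the uniform model, if each of n reads overlaps position i independently with probability p = r/L, the number of reads covering that position is binomial, and for large n and small p it is close to a Poisson count with mean n·r/L. The sketch below compares the two tail probabilities. The read count, read length and genome size are hypothetical values chosen only for illustration, and the function names are not from the paper.

```python
import math


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return 1.0 - sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k))


def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    return 1.0 - sum(math.exp(-lam) * lam**j / math.factorial(j) for j in range(k))


# Hypothetical values, for illustration only (not taken from the paper).
n_reads = 100_000        # number of aligned reads, n
read_len = 76            # read length, r
genome_len = 10_000_000  # genome size, L

p = read_len / genome_len  # probability that one read overlaps a given position
lam = n_reads * p          # expected number of reads over that position

binom = binomial_tail(5, n_reads, p)
pois = poisson_tail(5, lam)
```

With these numbers the two tails agree to well within one part in a million, which is why the cheaper Poisson form is a reasonable stand-in for the binomial at genome scale.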
Extending this reasoning, we can compute the probability of observing k reads (of average length r) at position i as the binomial probability. Given the many reads and the large genome size, the binomial formula can be well approximated by a Poisson distribution where λ = np (that is, the number of reads n times the probability p = r/L that a read overlaps a given position).

Given a distribution for the real number of counts over each position, we scan the genome for regions that deviate from the expected background distribution. First, consider a fixed window size w. We slide this window across each position (allowing for overlapping windows), and compute the probability of each observed window based on a Poisson distribution with λ = wnp. Since we are sliding this window across a genome of size L, we correct our nominal significance for multiple testing by computing the maximum value observed for a window size (w) across a number of permutations of the data. This distribution controls the family wise error rate, defined as the probability of observing at least one such value in the null distribution 31. Notably, we can estimate this maximum permutation distribution well by a distribution known as the scan statistic distribution 32, which depends on the size of the genome that we scan, the window size used and our estimate of the Poisson λ parameter. This method provides us with a general strategy to determine a multiple testing–corrected P-value for a specified region of the genome in any given sample. We use this method to compute a corrected significance cutoff for any given region.

Finally, to identify significant intervals, we scan the genome using variably sized windows, computing significance values for each and filtering by a 0.05 significance threshold. For each window size, we merge the significant regions that pass this cutoff into consecutive intervals. We trim the ends of the intervals as needed, because we are computing significant windows (rather than regions) and it is possible that an interval need not be fully contained within a significant region. Trimming is performed by computing a normalized read count for each base in the interval compared to the average number of reads in the genome. We then trim the interval to the maximum contiguous subsequence of this value. We test this trimmed interval using the scan procedure and retain it only if it passes our defined significance level.

We work with a range of different window sizes in order to detect paths (intervals) with variable support. Small windows have the power to identify short regions of strong enrichment (for example, a short exon that is highly expressed), whereas long windows capture long contiguous regions with often lower and more 'diffuse' enrichment (for example, a longer, lower-expression transcript, whose 'moderate evidence' aggregates along its entire length).

Estimation of library insert size.
We estimated the insert size distribution by taking all reconstructed transcripts for which we only reconstructed a single isoform and computing the distribution of distances between the paired-end reads that aligned to them.

Weighting of isoforms using paired end edges. Using the size constraints imposed by the length of the paired ends, we assigned weights to each path in the transcript graph. We classified all paired ends overlapping a given path and assigned them to all possible paths that they overlapped. We then assigned a probability to each paired end of the likelihood that it was observed from this transcript given the inferred insert size for the pair in that path. We used an empirically determined distribution of insert sizes, estimated from single isoform graphs. We then scaled each value by the average insert size. We refer to this scaled value as our insert distribution. For each paired end in a path, we computed I, the inferred insert size (the distance between nodes following along the full path) minus the average insert size. We then determined the probability of I as the area in our insert distribution between –I and I. This value is the probability of obtaining the observed paired-end insert distance given this distribution of paired-end reads.
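The area computation can be sketched as follows. The text does not spell out how the empirical distribution is "scaled" by the average insert size; this sketch assumes it means centering each observed insert size on the mean, so that the deviation I and the distribution share the same origin. That reading, the function names and the use of a sorted list as the empirical distribution are all assumptions made for illustration, not details from the paper.

```python
from bisect import bisect_left, bisect_right


def centered_insert_distribution(observed_sizes):
    """Return (mean, sorted deviations from the mean) for observed insert sizes.

    Centering on the mean is an assumed reading of 'scaled by the average'.
    """
    mean = sum(observed_sizes) / len(observed_sizes)
    return mean, sorted(size - mean for size in observed_sizes)


def area_between(deviations, i):
    """Fraction of the empirical deviations that fall in [-|i|, |i|]."""
    bound = abs(i)
    lo = bisect_left(deviations, -bound)   # first deviation >= -bound
    hi = bisect_right(deviations, bound)   # one past the last deviation <= bound
    return (hi - lo) / len(deviations)
```

For a paired end whose inferred insert size along a path is s, the quantity described in the text would be area_between(deviations, s - mean) under this reading. Keeping the deviations sorted lets each lookup run in logarithmic time, which matters when every paired end is evaluated against every path it overlaps.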
We use this probability to compute a weighted score for each path by summing all paired ends that fall within the path, each weighted by the probability of observing the insert size it spans in the path. Paired ends that support multiple isoforms equally will count equally for all, but paired ends with biases toward some isoforms and against others will provide weighted evidence for each isoform. We assign this weight to each isoform path. This score is normalized by the number of paired ends overlapping the path. We filter out paths with little support (normalized score


3% agarose gel, and all bands were cut out and gel extracted using the QIAquick Gel Extraction kit (Qiagen 28706). DNA (30 ng) was mixed with 3.2 pmol M13 forward or M13 reverse primer and sequenced in both directions.

Data availability. The sequencing data from this study are available at the NCBI Gene Expression Omnibus (GEO) under accession code GSE20851 and as Supplementary Data. The Scripture method is implemented as a standalone Java application and is available as Supplementary Software and at http://www.broadinstitute.org/software/Scripture/, along with all assembled transcripts in both GFF and BED file formats. All transcript graphs are also available in the dot graph language.

27. Conti, L. et al. Niche-independent symmetrical self-renewal of a mammalian tissue stem cell. PLoS Biol. 3, e283 (2005).
28. Berger, M.F. et al. Integrative analysis of the melanoma transcriptome. Genome Res. 20, 413–427 (2010).
29. Lister, R. et al. Highly integrated single-base resolution maps of the epigenome in Arabidopsis. Cell 133, 523–536 (2008).
30. Langmead, B., Trapnell, C., Pop, M. & Salzberg, S.L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).
31. Ewens, W.J. & Grant, G.R. Statistical Methods in Bioinformatics: An Introduction 2nd edn. (Springer, 2005).
32. Glaz, J., Naus, J.I. & Wallenstein, S. Scan Statistics (Springer, 2001).

© 2010 Nature America, Inc. All rights reserved.
Nature Biotechnology doi:10.1038/nbt.1633


Letters

Single base–resolution methylome of the silkworm reveals a sparse epigenomic map

Hui Xiang1,2,10, Jingde Zhu3,4,10, Quan Chen2,10, Fangyin Dai5,10, Xin Li1,10, Muwang Li6, Hongyu Zhang3, Guojie Zhang2, Dong Li5, Yang Dong1, Li Zhao1, Ying Lin5, Daojun Cheng5, Jian Yu3, Jinfeng Sun3, Xiaoyu Zhou3, Kelong Ma3, Yinghua He3, Yangxing Zhao3, Shicheng Guo3, Mingzhi Ye2, Guangwu Guo2, Yingrui Li2, Ruiqiang Li2, Xiuqing Zhang2, Lijia Ma2, Karsten Kristiansen7, Qiuhong Guo8, Jianhao Jiang8, Stephan Beck9, Qingyou Xia5, Wen Wang1 & Jun Wang2,7

Epigenetic regulation in insects may have effects on diverse biological processes. Here we survey the methylome of a model insect, the silkworm Bombyx mori, at single-base resolution using Illumina high-throughput bisulfite sequencing (MethylC-Seq). We conservatively estimate that 0.11% of genomic cytosines are methylcytosines, all of which probably occur in CG dinucleotides. CG methylation is substantially enriched in gene bodies and is positively correlated with gene expression levels, suggesting it has a positive role in gene transcription. We find that transposable elements, promoters and ribosomal DNAs are hypomethylated, but in contrast, genomic loci matching small RNAs in gene bodies are densely methylated. This work contributes to our understanding of epigenetics in insects, and in contrast to previous studies of the highly methylated genomes of Arabidopsis 1 and human 2, demonstrates a strategy for sequencing the epigenomes of organisms such as insects that have low levels of methylation.

The recently developed MethylC-Seq 1 technology couples bisulfite-based detection of methylated cytosines to high-throughput whole-genome sequencing.
Application of this technology to Arabidopsis 1 and humans 2 has revealed that these species are highly methylated (about 5% of genomic cytosines), and the high resolution of these studies identified new elaborate patterns and functional effects of DNA methylation.

Insects, however, seem to have lower levels of methylation 3, with ~0.15–0.19% of DNA being methylated in the silk gland of the silkworm (Bombyx mori) 4, as assayed by high-performance liquid chromatography, and even lower levels observed in flies, mosquitoes and honeybees 3,4. The feasibility of performing MethylC-Seq on organisms with such low methylation levels has not yet been evaluated. Recent interest in DNA methylation in insects has been sparked by evidence for the existence both of active methyltransferase enzymes, which attach methyl groups to DNA, and of methylated genes in Drosophila, the aphid Myzus persicae and particularly the honeybee Apis mellifica 5–7. The absence of comprehensive genome-wide profiling and functional analysis of DNA methylation in insects, however, has hindered our understanding of epigenetic regulation in these organisms.

The silkworm, which has been subjected to domestication for 5,000 years 8, is an economically important model insect of Lepidoptera, an order that includes many crop pests, such as the cotton bollworm. As an alternative mechanism to mutations in germline DNA, epigenetic changes via DNA methylation, called epimutations, have been reported to influence ecologically favorable traits, and thus species evolution, in both plants and mammals 9,10.
Therefore, the silkworm could be a valuable model, not only for studying functional effects of DNA methylation in insects but also for exploring the effects of epigenetics during domestication.

The number of DNA methyltransferase enzymes encoded in the genomes of different insect species varies greatly 11. In the silkworm (B. mori), previous studies 11,12 identified two DNA methyltransferase genes (dnmt1 and dnmt2) and experimentally characterized the methyl DNA–binding protein MBD2/3, providing intriguing evidence for the presence of DNA methylation in this insect species. We conducted extensive searches in the silkworm genome and confirmed that there are only dnmt1 and dnmt2 DNA methyltransferase genes (Bmdnmt1 and Bmdnmt2). Our PCR experiments with reverse transcription (RT-PCR) show that the two silkworm methyltransferase genes are expressed in a development- and tissue-regulated pattern (Supplementary Fig. 1a). Nuclear protein extracts from early embryos (8-h eggs) and silk glands further demonstrate the presence of catalytic activity of DNA methylation in silkworms (Supplementary Fig. 1b).

1 CAS-Max Planck Junior Research Group, State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, The Chinese Academy of Sciences, Kunming, China.
2 BGI-Shenzhen, Shenzhen, China.
3 Cancer Epigenetics and Gene Therapy Program, The State-key Laboratory for Oncogenes and Related Genes, Shanghai Cancer Institute, Shanghai Jiaotong University, Shanghai, China.
4 Cancer Epigenetics Laboratory, Obstetrics and Gynecology Hospital, Fudan University, Shanghai, China.
5 The Key Sericultural Laboratory of Agricultural Ministry, College of Biotechnology, Institute of Sericulture and Systems Biology, Southwest University, Chongqing, China.
6 Sericultural Research Institute, Chinese Academy of Agricultural Sciences, Zhenjiang, China.
7 Department of Biology, University of Copenhagen, Denmark.
8 Shanghai Institute of Plant Physiology and Ecology, Shanghai Institutes for Biological Sciences, The Chinese Academy of Sciences, Shanghai, China.
9 UCL Cancer Institute, University College London, London, UK.
10 These authors contributed equally to this work.
Correspondence should be addressed to J.W. (wangj@genomics.org.cn) or W.W. (wwang@mail.kiz.ac.cn) or Q.Y.X. (xiaqy@swu.edu.cn).

Received 8 February; accepted 23 March; published online 2 May 2010; doi:10.1038/nbt.1626

Nature Biotechnology, volume 28, number 5, May 2010


The above results led us to apply MethylC-Seq to reveal the genome-wide DNA methylation pattern in the silkworm. First, we sequenced bisulfite-treated total DNA, extracted from the silk gland of an individual of the Dazao strain, whose genome has already been sequenced 13. In total, 272,312,422 raw reads were produced (Supplementary Table 1). After removing low-quality and clonal reads, we obtained 133,765,113 effective reads, and the sequence yield for final analysis was 5.9 gigabase pairs (Gb), covering 92% of all cytosines in the genome with an average depth of 7.4× per strand (Supplementary Table 1). Initially we observed overall genome-wide methylation levels of 0.67% at CG, 0.21% at CHG and 0.24% at CHH sites (H = A, C or T), indicating higher CG methylation than non-CG methylation.

Because non-CG methylation is reported to be either very rare or nonexistent in honeybees 6, we selected a series of genomic regions to validate our initial results. Based on the MethylC-Seq results, we picked five genomic regions that contain 26 mCGs, as well as three regions that contain 98 clustered mCHHs and one mCHG. In these regions, we performed traditional bisulfite-PCR and sequencing validation (BS-PCR). Notably, although 92.3% of the methylated cytosines (mCs) at CG sites were validated, none of the non-CG mCs were validated by the BS-PCR (Supplementary Table 2).

To further confirm this result, we validated a larger batch of regions with methylation sites (692 CGs, 29 CHGs and 63 CHHs, respectively) using BS-PCR followed by 454 sequencing (454 Life Sciences). Similarly, a high percentage of CG methylations were validated (82.9%) but none of the non-CG mCs (Supplementary Table 2). These results suggest that non-CG mCs are either nonexistent or very rare in the silkworm, as was found in the honeybee 6. To account for this fact, we used the non-CG mC rate as the background control 2 to calculate the false-positive rate (non-conversion and thymidine-to-cytosine sequencing errors), the value of which is estimated to be 0.23%. After corrections based on this value, we identified 600,422 mCs, accounting for 0.40% of all genomic cytosines. Unfortunately, about 45% of these mCs were at non-CG sites, indicating that false mCs were still prevalent even after this correction.

Figure 1 DNA methylation patterns and chromosomal distribution in Bombyx mori. (a) Fraction of mCs identified in each sequence context for the strain Dazao, indicating a very low fraction of non-CG methylation, which is likely to consist of false positives. (b) Distribution of mCs (y axis) across methylation levels (x axis). Methylation level was determined by dividing the number of reads covering each mC by the total reads covering that cytosine. (c) Density of mCs identified on the two DNA strands (Watson and Crick) throughout chromosome 1 (out of 28). Density was calculated in 25-kb bins. The value refers to the number of mCs per base pair, as shown on the y axis.

To remove as many of the remaining false positives as possible, we decided to adopt a biological replicate strategy and thus conducted MethylC-Seq on silk gland DNA from a second individual from the same Dazao strain. The sequence yield for final analysis was 9.9 Gb, covering 92% of all cytosines in the genome with an average depth of 9.0× per strand (Supplementary Table 1). After the same process of mC identification as used for the first individual, we observed 983,395 mCs, 58.3% of which were at non-CG sites. Comparison of the mCs identified independently in the two individuals revealed a high concordance for mCG sites but overall discordance for mCs at non-CG sites (Supplementary Fig. 2). This again indicates that in the silkworm, non-CG mCs are all, or nearly all, false positives, whereas mCGs are frequently genuine.

Although different individuals probably have variable levels of methylation owing to subtle physiological differences, the overlap of mCs between the two individuals gives a very conservative estimate of real mCs in the Dazao silkworm genome. More specifically, 11.3% (65 of 574) of the real mCGs validated by BS-PCR for the first individual were excluded in the final mC map, whereas 99.5% (190 of 191) of false-positive non-CG mCs were excluded (Supplementary Table 2).

By combining these two individuals' mC data, we were able to obtain a high-quality, high-resolution silkworm methylome, with an average read depth of 15× per strand.
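The replicate strategy described above amounts to keeping only those mC calls that appear in both individuals. The following is a minimal sketch of that intersection, not the authors' pipeline: it assumes each call is keyed by (chromosome, position, strand) and labeled with its sequence context, and the function names, data layout and toy coordinates are all hypothetical.

```python
from collections import Counter


def intersect_mc_calls(calls_a, calls_b):
    """Keep only methylcytosine calls present in both replicates.

    Each argument maps a site key (chrom, pos, strand) to its sequence
    context ("CG", "CHG" or "CHH"). This layout is an assumption made
    for illustration only.
    """
    shared_sites = calls_a.keys() & calls_b.keys()
    return {site: calls_a[site] for site in shared_sites}


def context_fractions(calls):
    """Fraction of retained calls that fall in each sequence context."""
    counts = Counter(calls.values())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {context: n / total for context, n in counts.items()}


# Toy data: the CG calls recur in both replicates, the non-CG calls do not.
rep1 = {
    ("chr1", 100, "+"): "CG",
    ("chr1", 250, "-"): "CG",
    ("chr1", 300, "+"): "CHH",
    ("chr2", 40, "+"): "CHG",
}
rep2 = {
    ("chr1", 100, "+"): "CG",
    ("chr1", 250, "-"): "CG",
    ("chr1", 777, "+"): "CHH",
    ("chr2", 90, "-"): "CHG",
}

final_map = intersect_mc_calls(rep1, rep2)
print(sorted(final_map))
print(context_fractions(final_map))
```

Because false-positive calls are unlikely to recur at the same coordinate in an independent library, the intersection removes most of them at the cost of also discarding some genuine sites, which matches the trade-off reported for the BS-PCR-validated regions.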
In this final DNA methylation map, there are 173,505 mCs, 99.2% of which are at CG sites (Fig. 1a), and non-CG mCs, which are still likely to be false positives as suggested by BS-PCR validation (Supplementary Table 2), occupy only 0.8% (Fig. 1a). BS-PCR validation results indicated that 85.2% (489 of 574) of the total real mCGs in the tested regions were detected by the final map. The conservatively retained mCGs account for 0.11% of all genomic cytosines, which is consistent with previous high-performance liquid chromatography results 7.

We define the methylation level of a specific cytosine as the number of reads in which that cytosine is read as methylated divided by the total number of reads covering the site. The majority of mCs have moderate levels of methylation (Fig. 1b). CG methylation levels fluctuate drastically across the genome (Fig. 1c and Supplementary Fig. 3), indicating a mosaic methylation pattern 14, where relatively dense methylated domains are interspersed with regions that are not methylated. This pattern is most frequent in invertebrate animals. Detailed information on strand-specific identification of mCs throughout the whole genome is available at our ftp site (ftp://ftp.genomics.org.cn/silkworm_methylation).

To understand the functional significance of this rather low level of DNA methylation in silkworms, we analyzed the methylation profiles of genes (coding sequences + introns), genomic loci of small RNAs, transposable elements (TEs) and ribosomal DNAs (rDNAs). Both absolute methylation levels (total methylation level of mCs divided by sequence length) and relative methylation levels (total methylation level of mCs divided by total number of CG sites) were used as predictor variables.
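The three quantities defined above (per-site methylation level, absolute level and relative level of a region) can be illustrated with a short sketch. It assumes that each called mC comes with a count of methylated reads and a count of total covering reads; the function names and the toy numbers are hypothetical and are not taken from the paper's software.

```python
def site_methylation_level(methylated_reads, total_reads):
    """Methylation level of one cytosine: methylated reads / covering reads."""
    if total_reads <= 0:
        raise ValueError("site has no read coverage")
    return methylated_reads / total_reads


def region_methylation(mc_sites, region_length, cg_count):
    """Return (absolute, relative) methylation level of a region.

    mc_sites      -- list of (methylated_reads, total_reads), one per called mC
    region_length -- length of the region in base pairs
    cg_count      -- number of CG sites in the region
    """
    total_level = sum(site_methylation_level(m, t) for m, t in mc_sites)
    absolute = total_level / region_length
    # A region without CG sites has no defined relative level; 0.0 is an
    # arbitrary choice made for this sketch.
    relative = total_level / cg_count if cg_count else 0.0
    return absolute, relative


# Toy region: 1,000 bp, 20 CG sites, three of them called as mC.
mcs = [(2, 8), (4, 8), (2, 8)]
absolute, relative = region_methylation(mcs, region_length=1000, cg_count=20)
print(absolute, relative)
```

Dividing by region length makes long, CG-poor regions look less methylated, whereas dividing by the number of CG sites corrects for CG content, which is presumably why both measures are reported side by side.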
Notably, methylation within genes, especially coding sequences, is higher than the genome average (Fig. 2a). We further calculated methylation levels in the context of gene regions and their 2-kilobase (kb) upstream and downstream regions (Fig. 2b). Consistently, both absolute and relative methylation levels are obviously higher within genes. Boundaries between gene bodies and flanking DNA show a sharp drop in methylation (Fig. 2b), with 3′ downstream regions showing a little more methylation than 5′ upstream regions. We excluded the contribution of TEs to the enrichment of gene body methylation, as there is a similar abundance of TEs within and outside gene regions, and methylation is more prominent in coding sequences than it is in introns (Fig. 2a,c). In other insects, such as the aphid and the honeybee, body methylation has also been observed in some genes 5,6,15, and therefore this pattern may be a common feature in insects.

Figure 2 Methylation of different functional regions of Bombyx mori (Dazao). Absolute methylation level was calculated as total methylation level of mCs divided by length of the corresponding region. Relative methylation level was calculated as total methylation level of mCs divided by total number of CGs in the corresponding regions. (a) Methylation level at different functional regions. (b,c) Analysis of coding genes. Two-kilobase regions upstream and downstream of each gene were divided into 100–base pair (bp) intervals. Each gene was divided into 20 intervals (5% per interval). Plots show methylation level (b) or percentage of TEs and smRNAs (c) in each interval. A schematic representation of a gene is shown as a thick horizontal bar (scaled to 10 kb, the average length of Bombyx mori gene coding regions). (d) Abundance of genomic loci of smRNAs was plotted for TEs, as in c, except that the upstream and downstream lengths are 0.5 kb, considering that the average length of TEs is 200 bp. (e) Relationship between methylation level and GC content, CpG dinucleotide density and CpG O/E 22. Genes were ranked based on absolute or relative methylation level and divided into deciles, from the bottom 10% to the top 10%. 0, unmethylated genes. 1–10, the lowest to the highest deciles of methylated genes.

Methylation at genomic sequences that are complementary to small RNAs (smRNAs) is also higher than the genome average for the silkworm (Fig. 2a). Notably, these genomic loci matching smRNAs tend to be found in gene bodies but not in TEs (Fig. 2c,d). Our analysis showed a significant excess of methylated genomic loci matching smRNA within genes (86.9% of all methylated smRNAs within genes) compared with the genomic background (57.3% of all CG-containing smRNAs within genes) (P < 0.001, χ² test). In contrast, methylated genomic loci matching smRNA were significantly depleted within TEs (0.5% of all methylated smRNAs in TEs versus 2.9% of all CG-containing smRNAs in TEs, P < 0.01, χ² test). This pattern contrasts with observations in plants, where highly methylated genomic loci matching smRNAs were barely found in gene bodies but are prevalent in TEs and other repeats (ref. 16 and our unpublished data on rice). In plants, smRNA-directed methylation that targets homologous DNA plays an important role in TE silencing 17, which explains why smRNAs in TEs are highly methylated. smRNAs were also observed to target methylated genes in Arabidopsis 18, although this was relatively rare. In silkworms, the prevalence of genomic loci of smRNA in gene bodies and their dense CG methylation imply that smRNAs could be involved in gene body CG methylation.

Methylation in TEs seems to be low compared with the genome average (Fig. 2a). Only about 1.2% (5,521 of 431,743) of TEs have at least one mCG, and of these, the majority have low levels of methylation (Supplementary Fig. 4), indicating that TEs are usually unmethylated in the Bombyx silk gland. In contrast to a recent study 4 on Drosophila early embryos that suggested methylation plays a role in transposon silencing, our genome-wide pattern of TE methylation in the silkworm silk gland does not support a general role for methylation on TEs.
We did not observe any mCs in rDNAs. Methylation of rDNA has been proposed to act as a switch controlling ribosomal gene transcription in plants and mammals 19, so its absence implies that, in insects, as suggested by other case studies 20, the regulation of rDNA transcription via methylation has probably not developed.

We found that CG methylation level is correlated not with GC content but with CpG dinucleotide density and CpG observed/expected (O/E) ratio (Fig. 2e). CpG O/E ratio is a widely used parameter to predict DNA methylation level based on C→T transition mechanisms resulting from deamination of mCs over the course of evolution 21,22. Consistent with previous predictions and observations 22,23, genes with higher methylation level tend to have lower CpG dinucleotide density and CpG O/E ratios (Fig. 2e).

To reveal the functional consequences of gene body methylation, we generated expression profiles for the two individuals' silk glands using digital gene expression (DGE) tag profiling technology, which uses Illumina high-throughput sequencing as a readout for a classical SAGE (Serial Analysis of Gene Expression) assay. For the two biological replicates, 7,991,117 and 4,620,989 raw reads were generated, and 4,811,597 (60.2%) and 2,435,608 (52.7%), respectively, uniquely mapped to annotated genes. We detected 7,445 and 6,780 annotated genes by at least one unique read (Supplementary Table 3). We were also able to detect expression of Bmdnmt1 and Bmdnmt2 genes in the DGE data, which is consistent with the results shown by RT-PCR (Supplementary Fig. 1a).

We divided genes into five groups based on expression levels, from the bottom 20% to the top 20%.
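The grouping step can be sketched as follows. This is an illustrative example only: it assumes one expression value (for instance a DGE tag count) and one methylation level per gene, and the handling of ties and of gene counts not divisible by five is an arbitrary choice, since the text does not specify it. All names and toy values are hypothetical.

```python
def quintile_labels(values):
    """Assign each value to a quintile 1..5 by rank (1 = lowest 20%)."""
    n = len(values)
    # Indices sorted by value; sorted() is stable, so ties keep input order.
    order = sorted(range(n), key=lambda i: values[i])
    labels = [0] * n
    for rank, index in enumerate(order):
        labels[index] = rank * 5 // n + 1
    return labels


def mean_methylation_by_expression_quintile(expression, methylation):
    """Average gene methylation level within each expression quintile."""
    totals = {q: [0.0, 0] for q in range(1, 6)}
    for q, level in zip(quintile_labels(expression), methylation):
        totals[q][0] += level
        totals[q][1] += 1
    return {q: s / count for q, (s, count) in totals.items() if count}


# Toy data for ten genes: methylation rises with expression.
expression = [5, 1, 9, 3, 7, 2, 8, 4, 10, 6]
methylation = [0.5, 0.1, 0.9, 0.3, 0.7, 0.2, 0.8, 0.4, 1.0, 0.6]

print(quintile_labels(expression))
print(mean_methylation_by_expression_quintile(expression, methylation))
```

Comparing the per-quintile means is a coarse view of the relationship; a rank correlation over all genes would complement it when the trend is not monotonic across groups.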
Notably, we observed that methylation level is positively correlated with expression level in both individuals (Fig. 3a and Supplementary Fig. 5a). A similar pattern was observed when grouping genes by their methylation levels (Fig. 3b and Supplementary Fig. 5b). The observed correlations are supported by Spearman correlation analyses (Fig. 3c and Supplementary Fig. 5c). This result suggests that gene body methylation may be an ancient system, because the same pattern has been reported in plants and chordates 18,23. However, no correlation between expression level and methylation level in the promoter regions was detected (Fig. 3d and Supplementary Fig. 5d), which suggests that the well-known gene regulatory function of promoter methylation in plants and mammals 17,24,25 may not operate in insects.

Figure 3 Relationship between DNA methylation and expression levels of genes in Bombyx mori (Dazao). (a) Methylation level within gene bodies divided by expression level. Genes were classified into quintiles based on expression: 1st quintile is lowest and 5th is highest. Two-kilobase regions upstream and downstream of each gene were divided into 100-bp intervals. Each gene was divided into 20 intervals (5% per interval). Plots show the methylation level of each interval. (b) Expression of methylated compared with unmethylated genes. Genes were rank-ordered based on gene body methylation level and divided into quintiles. For the methylated genes, 1st quintile is the lowest and 5th is the highest. (c) Spearman correlation index between methylation level and gene expression level. Two-kilobase regions upstream and downstream of each gene were divided into 100-bp intervals. Each gene was divided into 20 intervals (5% each interval). Plots show the Spearman correlation index of each interval. (d) The same as b, except for promoter methylation. Absolute and relative methylation levels were calculated as described for Figure 2.

We further used the BGI WEGO (Web Gene Ontology Annotation Plotting) 26 to functionally categorize the methylated and unmethylated genes and observed significant differences (Fig. 4a). Methylated genes tend to be enriched in binding activities, including translation regulators. As for biological processes, they are enriched in functions associated with cellular metabolic and biosynthetic processes as well as cellular response to stimulus. In contrast, unmethylated genes are enriched in transcription regulators, such as transcription factors, and transducers and transporters. Unmethylated genes are also enriched in functions associated with regulation and adhesion processes. We confirmed that methylated genes tend to be more highly expressed than unmethylated genes in the silk gland (Fig. 4b,c) by analyzing the relationship between gene body methylation and tissue expression specificity using the available microarray data from B. mori tissues on day three of the fifth-instar larvae (BmMDB: http://silkworm.swu.edu.cn/microarray/). We suspect that methylation may contribute to maintaining the relatively high expression of genes that are essential for biosynthetic processes in the silk gland. Furthermore, methylated genes showed lower tissue specificity (Fig. 4d), which was also observed in Arabidopsis 24.

Figure 4 Annotation and microarray analysis of methylated and unmethylated genes. (a,b) Annotation of methylated (a) and unmethylated genes (b) with WEGO 26. Of the 5,971 genes that have GO annotations, 2,333 methylated and 3,314 unmethylated genes showed significant enrichment difference (P < 0.05, χ² test) compared with total analyzed genes. Annotations are grouped by molecular function or biological process based on the silkworm Bombyx mori GO annotation information (ftp://silkdb.org/pub/current/otherdata/Gene_ontology/silkworm_glean_gene.go). Gene numbers and percentages (on log scale) are listed for each category. (c,d) Expression in the anterior-mid silk gland (c) and posterior silk gland (d) of methylated and unmethylated genes examined by microarray analysis. (e) Tissue expression specificity of methylated and unmethylated genes measured by τ value 27.

In conclusion, we have generated the first, to our knowledge, single base–resolution methylome for an insect species. We found that MethylC-Seq has a considerable false-positive rate in detecting mCs in species with low methylation level. Thus, effective removal of these false positives is very important before any functional analysis. In this study, we used non-CG mCs as the background control in conjunction with a biological replicate strategy. Together, these controls identified methylated CG sites that could be validated by low-throughput assays.
This high-quality single-base DNA methylome map supports the functional significance of the rather low methylation in the silkworm and indicates that the well-established functions of methylation on TEs, rDNAs and promoters in plants and mammals may not be well developed in insects. This DNA methylome map will be useful for further studies on epigenetic gene regulation in silkworm and other insects. Moreover, the active epigenetic system existing in the silkworm lays a foundation for exploring the contributions of epigenetics to silkworm domestication.

Methods
Methods and any associated references are available in the online version of the paper at http://www.nature.com/naturebiotechnology/.

Accession codes. Sequence data are available under the GEO accession GSE18315 and the SRA accession SRP001159.

Note: Supplementary information is available on the Nature Biotechnology website.

Acknowledgments
We thank J. Ridley for English editing on the manuscript. This work was supported by a 973 Program grant (no. 2007CB815700), a key project of the National Natural Science Foundation of China (no. 90919056), the 100 Talents Program of Chinese Academy of Sciences, two Provincial Key Grants of the Department of Sciences and Technology of Yunnan Province (no. 2008CC017 and no. 2008GA002) and a Chinese Academy of Sciences–Max Planck Society Fellowship to W.W.; a National Natural Science Foundation of China grant (no. 30870296) and a China Postdoctoral Science Foundation grant to H.X.; the National Natural Science Foundation of China (no. 30725008), a Chinese 863 Program grant (no. 2006AA10A121), the Danish Platform for Integrative Biology, the Ole Rømer grant from the Danish Natural Science Research Council, and a Solexa Project grant (no. 272-07-0196) to J.W.; a 973 Program grant (no. 2005CB121000) to Q.X.; a Shanghai Science Foundation grant (no. 07DJ14074), two National Science Foundation grants (no. 90919024 and no. 30872963), two 973 Program grants (no. 2009CB825606 and no. 2009CB825607) and a European 6th program grant (no. LSHB-CT-2005-019067) to J.Z.

AUTHOR CONTRIBUTIONS
J.W., W.W., J.Z. and Q.X. designed the study. H.X., W.W. and X.L. wrote the manuscript. X.L., G.Z., Q.C., Y.L. and R.L. developed the method for mapping and processing BS reads. D.L. and D.C. performed microarray analysis. F.D. and M.L. provided the domestic silkworm samples and detailed background information on silkworm domestication and breeding. H.X. and X.L. analyzed the 454 data. H.X. did RT-PCR. Y.D. performed the methyltransferase assay. H.X., Y.L., Q.G. and J.J. extracted DNAs and RNAs. J.Z., H.Z., J.Y., J.S., X.Z., K.M., L.Z., Y.H., S.G. and Y.Z. constructed the BS-seq libraries and conducted the BS validation. G.G., X.Z., L.M., M.Y. and K.K. performed the Solexa sequencing. S.B. contributed to the interpretation of the results.
All authors have read and contributed to the manuscript.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

Published online at http://www.nature.com/naturebiotechnology/.
Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/.

1. Lister, R. et al. Highly integrated single-base resolution maps of the epigenome in Arabidopsis. Cell 133, 523–536 (2008).
2. Lister, R. et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature 462, 315–322 (2009).
3. Regev, A., Lamb, J.M. & Jablonka, E. The role of DNA methylation in invertebrates: developmental regulation or genome defense? Mol. Biol. Evol. 15, 880–891 (1998).
4. Phalke, S. et al. Retrotransposon silencing and telomere integrity in somatic cells of Drosophila depends on the cytosine-5 methyltransferase DNMT2. Nat. Genet. 41, 696–702 (2009).
5. Field, L.M. Methylation and expression of amplified esterase genes in the aphid Myzus persicae (Sulzer). Biochem. J. 349, 863–868 (2000).
6. Wang, Y. et al. Functional CpG methylation system in a social insect. Science 314, 645–647 (2006).
7. Patel, C.V. & Gopinathan, K.P. Determination of trace amounts of 5-methylcytosine in DNA by reverse-phase high-performance liquid chromatography. Anal. Biochem. 164, 164–169 (1987).
8. Xiang, Z. Genetics and Breeding of the Silkworm (Chinese Agriculture Press, Beijing, P.R. China, 1995).
9. Kalisz, S. & Purugganan, M.D. Epialleles via DNA methylation: consequences for plant evolution. Trends Ecol. Evol. 19, 309–314 (2004).
10. Farcas, R. et al. Differences in DNA methylation patterns and expression of the CCRK gene in human and nonhuman primate cortices. Mol. Biol. Evol. 26, 1379–1389 (2009).
11. Schaefer, M. & Lyko, F. DNA methylation with a sting: an active DNA methylation system in the honeybee. Bioessays 29, 208–211 (2007).
12. Uno, T. et al. Expression, purification and characterization of methyl DNA binding protein from Bombyx mori. J. Insect Sci. 5, 8 (2005).
13. Xia, Q. et al. A draft sequence for the genome of the domesticated silkworm (Bombyx mori). Science 306, 1937–1940 (2004).
14. Suzuki, M.M. & Bird, A. DNA methylation landscapes: provocative insights from epigenomics. Nat. Rev. Genet. 9, 465–476 (2008).
15. Mandrioli, M. & Borsatti, F. DNA methylation of fly genes and transposons. Cell. Mol. Life Sci. 63, 1933–1936 (2006).
16. Cokus, S.J. et al. Shotgun bisulphite sequencing of the Arabidopsis genome reveals DNA methylation patterning. Nature 452, 215–219 (2008).
17. Zhang, X. The epigenetic landscape of plants. Science 320, 489–492 (2008).
18. Zilberman, D., Gehring, M., Tran, R.K., Ballinger, T. & Henikoff, S. Genome-wide analysis of Arabidopsis thaliana DNA methylation uncovers an interdependence between methylation and transcription. Nat. Genet. 39, 61–69 (2007).
19. Lawrence, R.J. & Pikaard, C.S. Chromatin turn ons and turn offs of ribosomal RNA genes. Cell Cycle 3, 880–883 (2004).
20. Mandrioli, M. & Borsatti, F. Analysis of heterochromatic epigenetic markers in the holocentric chromosomes of the aphid Acyrthosiphon pisum. Chromosome Res. 15, 1015–1022 (2007).
21. Elango, N., Kim, S.H., Vigoda, E. & Yi, S.V. Mutations of different molecular origins exhibit contrasting patterns of regional substitution rate variation. PLOS Comput. Biol. 4, e1000015 (2008).
22. Elango, N., Hunt, B.G., Goodisman, M.A. & Yi, S.V. DNA methylation is widespread and associated with differential gene expression in castes of the honeybee, Apis mellifera. Proc. Natl. Acad. Sci. USA 106, 11206–11211 (2009).
23. Suzuki, M.M., Kerr, A.R., De Sousa, D. & Bird, A. CpG methylation is targeted to transcription units in an invertebrate genome. Genome Res. 17, 625–631 (2007).
24. Zhang, X. et al. Genome-wide high-resolution mapping and functional analysis of DNA methylation in Arabidopsis. Cell 126, 1189–1201 (2006).
25. Weber, M. et al. Distribution, silencing potential and evolutionary impact of promoter DNA methylation in the human genome. Nat. Genet. 39, 457–466 (2007).
26. Ye, J. et al. WEGO: a web tool for plotting GO annotations. Nucleic Acids Res. 34, W293–297 (2006).
27. Liao, B.Y. & Zhang, J. Low rates of expression profile divergence in highly expressed genes and tissue-specific genes during mammalian evolution. Mol. Biol. Evol. 23, 1119–1128 (2006).


ONLINE METHODS
Expression of Dnmt1 and Dnmt2 genes evaluated by RT-PCR. Total RNAs were extracted from different developmental stages (8-h-old, 3-day-old, 7-day-old and 10-day-old eggs; 1st- to 4th-instar larvae; young and old pupae; adults of the silkworms), as well as from different tissues including heads, cuticle, silk glands, guts, ovaries, and testis from the 5th-instar larvae of silkworms, using Trizol (Invitrogen). Total RNA was digested with DNase I (Takara) to remove remaining DNA. Complementary DNA (cDNA) was synthesized using the RevertAid First Strand cDNA Synthesis Kits (Fermentas). Expression of Dnmt1 and Dnmt2 genes was evaluated by RT-PCR using primers listed in Supplementary Table 4 with 30 cycles (30 min at 94 °C, 30 min at 54 °C and 30 min at 72 °C) for cDNA templates derived from materials of different developmental stages, and 34 cycles (30 min at 94 °C, 30 min at 54 °C and 30 min at 72 °C) for cDNA templates derived from different tissues, respectively.

Nuclear protein extraction and assay of DNA methyltransferase activity. About 150 mg of silkworm eggs or one silk gland from one silkworm individual were ground into powder in liquid nitrogen and homogenized in 150 µl tissue homogenization buffer (10 mmol HEPES-KOH (pH 7.6), 25 mmol KCl, 0.15 mmol spermine, 0.5 mmol spermidine, 2 mol sucrose, 10% (v/v) glycerol, 1 mmol EDTA).
Homogenate was held on ice for 30 min and then centrifuged at 3,000g for 15 min at 4 °C to obtain the protein precipitate. The protein precipitate was resuspended in 650 µl resuspension buffer (5 mmol HEPES-KOH (pH 7.9), 0.5 mmol phenylmethylsulfonyl fluoride, 26% (v/v) glycerol, 0.5 mmol dithiothreitol, 1.5 mmol MgCl2) and then centrifuged at 14,000g for 45 min at 4 °C to obtain soluble proteins. Protein concentration was determined by the Bio-Rad Protein Assay kit (Bio-Rad). Three independent replicate protein samples were prepared for each material.

About 15 µg nuclear protein extracts from either eggs or silk gland and an equal amount of the negative control (bovine serum albumin) were respectively analyzed for DNA methyltransferase activity using the EpiQuik DNA Methyltransferase Activity/Inhibition Assay Kit (Epigentek) following the manufacturer’s instructions. Pure mouse DNMT1 in the kit was used as the positive control. Methyltransferase activity is indicated by the average absorbance at 450 nm (OD450).

Sample preparation for MethylC-Seq and digital gene expression analyses. Each silk gland of 5th-instar larvae of two individuals (called biological replicate 1 and 2, respectively) of the silkworm (B. mori) strain Dazao was ground into powder in liquid nitrogen. Half of the powder from each silk gland was used to extract total DNAs using DNeasy Blood and Tissue Kit (Qiagen), and the other half was used to extract total RNAs using RNeasy Mini Kit (Qiagen).

MethylC-Seq library construction and sequencing.
DNA was fragmented by sonication with a Sonicator (Sonics & Materials) to a mean size of approximately 250 bp, followed by blunt ending, 3′-end addition of dA, and adapter ligation, in which Illumina methylated adapters were used according to the manufacturer’s instructions (Illumina). The bisulfite conversion of silkworm DNA was carried out using a modified NH4SO4-based protocol 28, and the converted DNA was amplified by 12 cycles of PCR. Ultra-high-throughput pair-end sequencing was carried out using the Illumina Genome Analyzer (GA2) according to the manufacturer’s instructions. Raw GA sequencing data were processed by the Illumina base-calling pipeline (SolexaPipeline-1.0).

Digital gene expression (DGE) tag libraries and sequencing. DGE tag libraries were constructed using the silk gland RNAs and the DGE-Tag Profiling NlaIII Sample Prep Kit (Illumina). Libraries were sequenced using the Illumina Genome Analyzer (GA2) according to the manufacturer’s instructions. Raw GA sequencing data were processed by the Illumina base-calling pipeline (SolexaPipeline-1.0).

Mapping and initial processing of MethylC-Seq reads. Paired-end reads of 44 nucleotides (nt) or 75 nt from each end, generated by Illumina sequencing, were aligned to the Dazao reference genome. B. mori (Dazao) reference genome sequences were downloaded from the SilkDB (ftp://silkdb.org/pub/current/Genome/silkworm_genome_v2.0.fa.tar.gz). Because DNA methylation has strand specificity, the plus strand and the minus strand of the Dazao genome were handled separately to form the alignment target sequences.
That is, each cytosine in genome sequences was converted to thymine, termed T-genome, which represents the plus strand. Meanwhile, each guanine in genome sequences was converted to adenine, termed A-genome, which represents the minus strand. In addition, the original reads were also computationally transformed to the alignment forms with the following steps: (i) observed cytosines on the forward read of each read pair were in silico replaced by thymines; (ii) observed guanines on the reverse read of each read pair were in silico replaced by adenines.

We used SOAPaligner 29 to map the computationally transformed reads to the alignment target sequences, allowing up to two mismatches for 44-nt pair-end reads (biological replicate 1) and up to four mismatches for 75-nt pair-end reads (biological replicate 2). Multiple reads mapped to the same start position were regarded as clonal duplicates, which might be generated during the PCR process, and only one of them was kept. For mC detection, we transformed each aligned read and the two strands of the Dazao genome back to their original forms to build an alignment between the original forms.
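The in silico conversion and duplicate removal described above can be sketched in a few lines of Python. This is only an illustration, not the authors' pipeline: the function names are hypothetical, and keying duplicates on chromosome, strand and start position is an assumption (the text only says "the same start position").

```python
def to_t_genome(seq: str) -> str:
    """Plus-strand alignment form: every cytosine replaced by thymine."""
    return seq.upper().replace("C", "T")


def to_a_genome(seq: str) -> str:
    """Minus-strand alignment form: every guanine replaced by adenine."""
    return seq.upper().replace("G", "A")


def convert_read_pair(forward: str, reverse: str) -> tuple[str, str]:
    """Apply steps (i) and (ii): C->T on the forward read, G->A on the reverse read."""
    return to_t_genome(forward), to_a_genome(reverse)


def drop_clonal_duplicates(alignments):
    """Keep the first alignment seen for each start position.

    `alignments` is an iterable of (chrom, strand, start, read_id) tuples.
    Using (chrom, strand, start) as the key is an assumption of this sketch.
    """
    seen = set()
    kept = []
    for chrom, strand, start, read_id in alignments:
        key = (chrom, strand, start)
        if key not in seen:
            seen.add(key)
            kept.append((chrom, strand, start, read_id))
    return kept
```

Because the conversion discards the C/T (or G/A) distinction, the original read and genome sequences have to be kept alongside the converted ones so that methylation can be called after alignment.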
Cytosines in the MethylC-Seq reads that also matched the corresponding cytosines in the plus (Watson) strand, or otherwise guanines in the MethylC-Seq reads that also matched the corresponding guanines in the minus (Crick) strand, were regarded as potential mCs. The Q score, which is used in the base-calling pipeline (SolexaPipeline-1.0) (Illumina) to detect sequences from the raw fluorescent images, is calculated as:

$$Q = 10 \log_{10}\left[\frac{p(X)}{1 - p(X)}\right]$$

where p(X) is the probability that a base is correctly called. We then carried out a filtering process to remove all potential mCs with Q scores smaller than 20, guaranteeing that a base is correctly called at more than 99% probability, which is highly conservative for calling reliable bases.

Bisulfite-PCR validation for target regions using either Sanger sequencing or 454 sequencing. One microgram of genomic DNA from the silk gland of biological replicate 1 was bisulfite-converted following the same protocol used for constructing the MethylC-Seq library. Primers were designed to amplify a batch of target regions of the bisulfite-converted DNA for validation of the MethylC-Seq results. Initially, we validated five target regions containing 26 mCGs detected by MethylC-Seq and three target regions containing one mCHG and 98 clustered mCHHs detected by MethylC-Seq by Sanger sequencing multiple independent TA clones for each PCR product. Then we further used the 454 sequencing technique (454 Life Sciences) to confirm 107 PCR products in total (692 mCGs, 29 mCHGs and 63 mCHHs).
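As a quick check of the Q-score filter described earlier in this section, the formula can be inverted to recover the probability of a correct call at a given threshold. The sketch below assumes only the odds-based definition given in the text; the function names are illustrative.

```python
import math


def q_score(p_correct: float) -> float:
    """Q = 10 * log10(p / (1 - p)), with p the probability of a correct call."""
    return 10.0 * math.log10(p_correct / (1.0 - p_correct))


def p_correct_from_q(q: float) -> float:
    """Invert the formula: odds = 10**(Q/10), so p = odds / (1 + odds)."""
    odds = 10.0 ** (q / 10.0)
    return odds / (1.0 + odds)
```

At Q = 20 the odds of a correct call are 100 to 1, so p = 100/101, which is slightly above 0.99 and agrees with the "more than 99%" statement.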
We pooled PCR products of these fragments, and the 454 sequencing library was constructed according to the manufacturer’s instruction (454 Life Sciences). Eventually we obtained sequencing data on 6,698,205 bp. BLAST searches (e-value


Where n_mCHG and n_mCHH refer to the total number of sequenced Cs in the CHG and CHH contexts in the reference genome, respectively, and n_depth refers to the total sequenced depth at cytosine positions in CHG and CHH contexts in the reference genome. Using this value as a measure of the false mC discovery rate, and following the correction algorithm of Lister et al. 2, we set a significance threshold (99% confidence) for calling an mC at each base position on the basis of the binomial probability distribution, the read depth and the calculated false-positive rate. mCs below the minimum threshold at a site were rejected.

Despite this filtering, non-CG methylation noise still made up a considerable proportion of calls, because some non-CG mCs appeared at high methylation levels in the original MethylC-Seq data. Because our bisulfite-PCR validation showed that even high-methylation-level mCs are false positives, we used a strategy of biological replicates to remove this noise effectively. We compared the mCs independently identified in both replicates and found that a large proportion of the mCGs are consistently detected in both replicates, whereas mCs in non-CG contexts are nearly replicate-specific (Supplementary Fig. 1). This further confirms that non-CG mCs are either all false positive or very rare, whereas mCGs are largely real in the silkworm.
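The two filters described above, a binomial threshold per site and agreement between biological replicates, might be sketched as follows. This is a hedged illustration rather than the published implementation: the false-positive rate is passed in as a parameter, `alpha = 0.01` stands in for the 99% confidence level, and all names are hypothetical.

```python
from math import comb


def binom_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


def min_methylated_reads(depth: int, fp_rate: float, alpha: float = 0.01):
    """Smallest number of unconverted reads at a site of the given depth that
    is unlikely (probability < alpha) to arise from false positives alone.
    Returns None if no count at this depth reaches significance."""
    for k in range(1, depth + 1):
        if binom_tail(k, depth, fp_rate) < alpha:
            return k
    return None


def consistent_sites(replicate1: set, replicate2: set) -> set:
    """Sites called as mC independently in both biological replicates."""
    return replicate1 & replicate2
```

For example, with a depth of 15 reads and an assumed false-positive rate of 0.5%, a single unconverted read is not significant, but two are.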
In this way we effectively removed background noise and finally generated a methylome map with high reliability and high resolution (on average each cytosine in the genome is covered by 15 reads).

Mapping and processing DGE tags. Sequence information of the Bombyx mori genes was downloaded from the SilkDB (ftp://silkdb.org/pub/current/Gene/Glean_genes/silkworm_glean_cds.fa.tar.gz). Gene annotation information was downloaded from the SilkDB (ftp://silkdb.org/pub/current/Gff/silkworm_glean.gff.tar.gz). Because annotated genes were mainly predicted using prediction software, only open reading frame positions were available. We created putative full-length cDNA sequences for each gene by adding the 1-kb sequence downstream of the open reading frame to the coding sequence. Then all possible CATG + 17 nt tag sequences were created from the putative full-length cDNAs and used as a reference tag database. Unique tag sequences and their counts were extracted from our raw DGE tags, and these tags were aligned against the reference tag database using SOAP 30. Only perfect matches were kept for further analysis; no mismatches were allowed. The expression level of a gene was represented by the total number of tags that uniquely aligned to this gene.

Analyses on abundance of TEs and genomic loci of smRNAs. Annotation of known TEs was downloaded from the SilkDB (ftp://silkdb.org/pub/current/Gff/Public_ReAS_TEs/silkworm_Publicknow_TE.gff.tar.gz). The smRNA sequences were downloaded from GEO (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE17965).
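The reference tag database described under "Mapping and processing DGE tags" can be illustrated with a short sketch. It assumes exact string matching in place of the SOAP alignment step, and the function and variable names are hypothetical.

```python
from collections import defaultdict

ANCHOR = "CATG"  # NlaIII recognition site that anchors each DGE tag
TAG_TAIL = 17    # nucleotides kept downstream of the anchor


def catg_tags(cdna: str) -> list[str]:
    """All CATG + 17 nt tags in a putative full-length cDNA sequence."""
    seq = cdna.upper()
    tag_len = len(ANCHOR) + TAG_TAIL
    tags = []
    pos = seq.find(ANCHOR)
    while pos != -1:
        tag = seq[pos:pos + tag_len]
        if len(tag) == tag_len:  # skip anchors too close to the 3' end
            tags.append(tag)
        pos = seq.find(ANCHOR, pos + 1)
    return tags


def expression_from_tags(cdnas: dict, observed_counts: dict) -> dict:
    """Sum observed tag counts per gene, using only tags that match one gene."""
    tag_to_genes = defaultdict(set)
    for gene, seq in cdnas.items():
        for tag in catg_tags(seq):
            tag_to_genes[tag].add(gene)
    expression = {gene: 0 for gene in cdnas}
    for tag, count in observed_counts.items():
        genes = tag_to_genes.get(tag)
        if genes is not None and len(genes) == 1:
            expression[next(iter(genes))] += count
    return expression
```

Tags that occur in more than one gene, or that match no reference tag, contribute nothing, which mirrors the "uniquely aligned" and "perfect matches" conditions in the text.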
Sequences of smRNAs were mapped to the reference genome using SOAP 30 without mismatch, and uniquely mapped smRNAs were used for further analysis. TE and smRNA densities were defined as the number of bases that belong to TEs or smRNAs divided by the total length of the calculated regions.

Gene ontology (GO) annotation. GO annotations of silkworm genes were downloaded from the SilkDB (ftp://silkdb.org/pub/current/otherdata/Gene_ontology/silkworm_glean_gene.go). GO comparative analyses between gene groups of interest were performed using BGI WEGO (http://wego.genomics.org.cn/cgi-bin/wego/index.pl) 26.

Microarray analysis. The microarray data of the analyzed genes were obtained from the B. mori microarray database (BmMDB: http://silkworm.swu.edu.cn/microarray/). The tissue specificity index τ 27 is used to measure the tissue specificity of a silkworm gene, and is defined as:

$$\tau_H = \frac{\sum_{j=1}^{n_H}\left(1 - \left[\frac{\log_2 S_H(i,j)}{\log_2 S_H(i,\max)}\right]\right)}{n_H - 1}$$

where n_H is the number of female silkworm tissues examined, S_H(i, j) is the expression signal of gene i in tissue j, and S_H(i, max) is the highest expression signal of gene i across the n_H tissues. To minimize the influence of noise from low intensity, we arbitrarily let S_H(i, j) be 100 if it is lower than 100. The τ value ranges from 0 to 1, with higher values indicating higher tissue specificity. Genes with their highest expression signal in a given tissue were considered to be upregulated in that tissue.

28. Hayatsu, H., Tsuji, K. & Negishi, K. Does urea promote the bisulfite-mediated deamination of cytosine in DNA? Investigation aiming at speeding-up the procedure for DNA methylation analysis. Nucleic Acids Symp. Ser. 50, 69–70 (2006).
29. Li, R. et al. SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics 25, 1966–1967 (2009).
30. Li, R., Li, Y., Kristiansen, K. & Wang, J. SOAP: short oligonucleotide alignment program. Bioinformatics 24, 713–714 (2008).

Nature Biotechnology doi:10.1038/nbt.1626
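For readers who want to compute the tissue specificity index τ defined under "Microarray analysis" above, a minimal sketch follows. It assumes a plain list of expression signals for one gene, one value per tissue, and applies the floor of 100 mentioned in the text; the function name is illustrative.

```python
import math


def tissue_specificity_tau(signals, floor: float = 100.0) -> float:
    """Tissue specificity index for one gene.

    `signals` holds the expression signal of the gene in each tissue.
    Values below `floor` are raised to `floor`, as in the Methods text.
    """
    clipped = [max(float(s), floor) for s in signals]
    n = len(clipped)
    if n < 2:
        raise ValueError("tau needs expression signals from at least two tissues")
    log_max = math.log2(max(clipped))
    return sum(1.0 - math.log2(s) / log_max for s in clipped) / (n - 1)
```

A gene expressed equally in every tissue gives τ = 0, whereas a gene with signals of 100, 100 and 10,000 gives τ = 0.5, because each of the two low tissues contributes 1 − log2(100)/log2(10,000) = 0.5 and the sum is divided by n_H − 1 = 2.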


Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation

Cole Trapnell 1–3, Brian A Williams 4, Geo Pertea 2, Ali Mortazavi 4, Gordon Kwan 4, Marijke J van Baren 5, Steven L Salzberg 1,2, Barbara J Wold 4 & Lior Pachter 3,6,7

High-throughput mRNA sequencing (RNA-Seq) promises simultaneous transcript discovery and abundance estimation 1–3. However, this would require algorithms that are not restricted by prior gene annotations and that account for alternative transcription and splicing. Here we introduce such algorithms in an open-source software program called Cufflinks. To test Cufflinks, we sequenced and analyzed >430 million paired 75-bp RNA-Seq reads from a mouse myoblast cell line over a differentiation time series. We detected 13,692 known transcripts and 3,724 previously unannotated ones, 62% of which are supported by independent expression data or by homologous genes in other species. Over the time series, 330 genes showed complete switches in the dominant transcription start site (TSS) or splice isoform, and we observed more subtle shifts in 1,304 other genes.
These results suggest that Cufflinks can illuminate the substantial regulatory flexibility and complexity in even this well-studied model of muscle development and that it can improve transcriptome-based genome annotation.

Recently, RNA-Seq has revealed tissue-specific alternative splicing 4, novel genes and transcripts 5 and genomic structural variations 6. Deeply sampled RNA-Seq permits measurement of differential gene expression with greater sensitivity than expression 7 and tiling 8 microarrays. However, the analysis of RNA-Seq data presents major challenges in transcript assembly and abundance estimation, arising from the ambiguous assignment of reads to isoforms 8–10.

In earlier RNA-Seq experiments conducted by some of us, we estimated the relative expression for each gene as the fraction of reads mapping to its exons after normalizing for gene length 11. We did not attempt to allocate reads to specific alternate isoforms, although we found ample evidence that multiple splice and promoter isoforms are often coexpressed in a given tissue 2. This raised biological questions about how the different forms are distributed across cell types and physiological states. In addition, our prior methods relied on annotated gene models that, even in mouse, are incomplete. Longer reads (75 bp in this work versus 25 bp in our previous work) and pairs of reads from both ends of each RNA fragment can reduce uncertainty in assigning reads to alternative splice variants 12.
To produce useful transcript-level abundance estimates from paired-end RNA-Seq data, we developed a new algorithm that can identify complete novel transcripts and probabilistically assign reads to isoforms.

For our initial demonstration of Cufflinks, we performed a time course of paired-end 75-bp RNA-Seq on a well-studied model of skeletal muscle development, the C2C12 mouse myoblast cell line 13 (see Online Methods). Regulated RNA expression of key transcription factors drives myogenesis, and the execution of the differentiation process involves changes in expression of hundreds of genes 14,15. Previous studies have not measured global transcript isoform expression; however, there are well-documented expression changes at the whole-gene level for a set of marker genes in this system. We aimed to establish the prevalence of differential promoter use and differential splicing, because such data could reveal much about the model system’s regulatory behavior. A gene with isoforms that code for the same protein may be subject to complex regulation to maintain a certain level of output in the face of changes in expression of its transcription factors. Alternatively, genes with isoforms that encode different proteins could be functionally specialized for different cell types or states. By analyzing changes in the relative abundances of transcripts produced by the alternative splicing of a single primary transcript, we hoped to infer the effects of post-transcriptional processing (for example, splicing) on RNA output separately from rates of primary transcription.
Such analysis could identify key genes in the system and suggest experiments to establish how they are regulated.

We first mapped sequenced fragments to the mouse genome using an improved version of TopHat 16, which can align reads across splice junctions without relying on gene annotation (Supplementary Methods, section 2). A fragment corresponds to a single cDNA molecule, which can be represented by a pair of reads from each end. Out of 215 million fragments, 171 million (79%) mapped to the genome, and 46 million spanned at least one putative splice junction (Supplementary Table 1). Of the splice junctions spanned by fragment alignments, 70% were present in transcripts annotated by the UCSC, Ensembl or VEGA groups (known genes).

1 Department of Computer Science and 2 Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, USA. 3 Department of Mathematics, University of California, Berkeley, California, USA. 4 Division of Biology and Beckman Institute, California Institute of Technology, Pasadena, California, USA. 5 Genome Sciences Center, Washington University in St. Louis, St. Louis, Missouri, USA. 6 Department of Molecular and Cell Biology and 7 Department of Computer Science, University of California, Berkeley, California, USA. Correspondence should be addressed to L.P. (lpachter@math.berkeley.edu).

Received 2 February; accepted 22 March; published online 2 May 2010; doi:10.1038/nbt.1621

To recover the minimal set of transcripts supported by our fragment alignments, we designed a comparative transcriptome assembly algorithm. Expressed sequence tag (EST) assemblers such as PASA introduced the idea of collapsing alignments to transcripts on the basis of splicing compatibility 17, and Dilworth’s theorem 18 has been used to assemble a parsimonious set of haplotypes from virus population sequencing reads 19. Cufflinks extends these ideas, reducing the transcript assembly problem to finding a maximum matching in a weighted 4 bipartite graph that represents compatibilities 17 among fragments (Fig. 1a–c and Supplementary Methods, section 4). Noncoding RNAs 20 and microRNAs 21 have been reported to regulate cell differentiation and development, and coding genes are known to produce noncoding isoforms as a means of regulating protein levels through nonsense-mediated decay 22. For these biologically motivated reasons, the assembler does not require that assembled transcripts contain an open reading frame (ORF). As Cufflinks does not make use of existing gene annotations

Figure 1 Overview of Cufflinks. (a) The algorithm takes as input cDNA fragment sequences that have been aligned to the genome by software capable of producing spliced alignments, such as TopHat. (b–e) With paired-end RNA-Seq, Cufflinks treats each pair of fragment reads as a single alignment.
The algorithm assembles overlapping ‘bundles’ offragment alignments (b,c) separately, which reduces running time <strong>and</strong>memory use, because each bundle typically contains <strong>the</strong> fragments fromno more than a few genes. Cufflinks <strong>the</strong>n estimates <strong>the</strong> abundances of<strong>the</strong> assembled transcripts (d,e). The first step in fragment assembly isto identify pairs of ‘incompatible’ fragments that must have originatedfrom distinct spliced mRNA isoforms (b). Fragments are connected in an‘overlap graph’ when <strong>the</strong>y are compatible <strong>and</strong> <strong>the</strong>ir alignments overlapin <strong>the</strong> genome. Each fragment has one node in <strong>the</strong> graph, <strong>and</strong> an edge,directed from left to right along <strong>the</strong> genome, is placed between eachpair of compatible fragments. In this example, <strong>the</strong> yellow, blue <strong>and</strong> redfragments must have originated from separate isoforms, but any o<strong>the</strong>rfragment could have come from <strong>the</strong> same transcript as one of <strong>the</strong>sethree. Isoforms are <strong>the</strong>n assembled from <strong>the</strong> overlap graph (c). Pathsthrough <strong>the</strong> graph correspond to sets of mutually compatible fragmentsthat could be merged into complete isoforms. The overlap graph here canbe minimally ‘covered’ by three paths (shaded in yellow, blue <strong>and</strong> red),each representing a different isoform. Dilworth’s Theorem states that<strong>the</strong> number of mutually incompatible reads is <strong>the</strong> same as <strong>the</strong> minimumnumber of transcripts needed to ‘explain’ all <strong>the</strong> fragments. 
Cufflinksimplements a proof of Dilworth’s Theorem that produces a minimal setof paths that cover all <strong>the</strong> fragments in <strong>the</strong> overlap graph by finding <strong>the</strong>largest set of reads with <strong>the</strong> property that no two could have originatedfrom <strong>the</strong> same isoform. Next, transcript abundance is estimated(d). Fragments are matched (denoted here using color) to <strong>the</strong> transcriptsfrom which <strong>the</strong>y could have originated. The violet fragment could haveoriginated from <strong>the</strong> blue or red isoform. Gray fragments could have comefrom any of <strong>the</strong> three shown. Cufflinks estimates transcript abundancesusing a statistical model in which <strong>the</strong> probability of observing eachfragment is a linear function of <strong>the</strong> abundances of <strong>the</strong> transcripts fromwhich it could have originated. Because only <strong>the</strong> ends of each fragmentare sequenced, <strong>the</strong> length of each may be unknown. Assigning a fragmentto different isoforms often implies a different length for it. Cufflinksincorporates <strong>the</strong> distribution of fragment lengths to help assign fragmentsto isoforms. For example, <strong>the</strong> violet fragment would be much longer, <strong>and</strong>very improbable according to <strong>the</strong> Cufflinks model, if it were to come from<strong>the</strong> red isoform instead of <strong>the</strong> blue isoform. 
Last, <strong>the</strong> program numericallymaximizes a function that assigns a likelihood to all possible sets ofrelative abundances of <strong>the</strong> yellow, red <strong>and</strong> blue isoforms (γ 1 ,γ 2 ,γ 3 )(e), producing <strong>the</strong> abundances that best explain <strong>the</strong> observed fragments,shown as a pie chart.during assembly, we validated <strong>the</strong> transcripts by first comparingindividual time point assemblies to existing annotations.We recovered a total of 13,692 known isoforms <strong>and</strong> 12,712 <strong>new</strong> isoformsof known genes. We estimate that 77% of <strong>the</strong> reads originatedfrom previously known transcripts (Supplementary Table 2). Of <strong>the</strong><strong>new</strong> isoforms, 7,395 (58%) contain novel splice junctions, with <strong>the</strong>remainder being novel combinations of known splicing outcomes;11,712 (92%) have an ORF, 8,752 of which end at an annotated stopcodon. Although we sequenced deeply by current st<strong>and</strong>ards, 73% of<strong>the</strong> moderately abundant transcripts (15–30 expected fragments perkilobase of transcript per million fragments mapped, abbreviatedFPKM; see below for fur<strong>the</strong>r explanation) detected at <strong>the</strong> 60-h timepoint with three lanes of GAII transcriptome sequencing were fullyrecovered with just a single lane. Because distinguishing a full-lengthtranscript from a partially assembled fragment is difficult, we conservativelyexcluded from fur<strong>the</strong>r analyses <strong>the</strong> novel isoforms thatwere unique to a single time point. 
Out of the new isoforms, 3,724 were present in multiple time points, and 581 were present at all time points; 6,518 (51%) of the new isoforms and 2,316 (62%) of the multiple time point novel isoforms were tiled by high-identity EST alignments or matched RefSeq isoforms from other organisms, and end point RT-PCR experiments confirmed new isoforms in genes of interest (Supplementary Table 3). We concluded that most of the unannotated transcripts we found are in the myogenic transcriptome and that the mouse annotation remains incomplete.

To estimate transcript abundances, we first selected a set of 11,079 genes containing 17,416 high-confidence isoforms (Supplementary Data 1). Of these, 13,692 (79%) were known, and the remaining 3,724 (21%) were novel isoforms of known genes present in multiple time points. We then developed a statistical model specifying the probability of observing an RNA-Seq fragment. The model is parameterized by the abundances of these transcripts (Fig. 1d–f and Supplementary Methods, section 3). Cufflinks’ model allows for the probabilistic deconvolution of RNA-Seq fragment densities to account for cases in which genome alignments of fragments do not uniquely correspond to source transcripts. The model incorporates minimal assumptions [23] about the sequencing experiment, and it extends the unpaired read model of Jiang and Wong [8] to the paired-end case.

Abundances were reported in FPKM.
In these units, the relative abundances of transcripts are described in terms of the expected biological objects (fragments) observed from an RNA-Seq experiment, which in the future may not be represented by single or paired reads. Confidence intervals for estimates were obtained using a Bayesian inference method based on importance sampling from the posterior distribution. Abundances of spiked control sequences (R² = 0.99) and benchmarks with simulated data (R² = 0.96) revealed that Cufflinks’ abundance estimates are highly accurate. The inclusion of novel isoforms of known genes during abundance estimation had a strong impact on the estimates of known isoforms in many genes (R² = 0.90), highlighting the importance of coupling transcript discovery together with abundance estimation.

Figure 2 Distinction of transcriptional and post-transcriptional regulatory effects on overall transcript output. (a) When abundances of isoforms A, B and C of Myc are grouped by TSS, changes in the relative abundances of the TSS groups indicate transcriptional regulation. Post-transcriptional effects are seen in changes in levels of isoforms of a single TSS group. (b) Isoforms of Myc have distinct expression dynamics. (c) Myc isoforms are downregulated as the time course proceeds. The width of the colored band is the measure of change in relative transcript abundance and the color is the log ratio of transcriptional and post-transcriptional contributions to change in relative abundances (plot construction detailed in Supplementary Methods, section 5.3). Changes in relative abundances of Myc isoforms suggest that transcriptional effects immediately following differentiation at 0 h give way to post-transcriptional effects later in the time course, as isoform A is eliminated.

We identified 7,770 genes and 10,480 isoforms undergoing significant abundance changes between some successive pair of time points (expected false discovery rate, abbreviated FDR, of


Figure 4 Robustness of assembly and abundance estimation as a function of expression level and depth of sequencing. Subsets of the full 60-h read set were mapped and assembled with TopHat and Cufflinks, and the resulting assemblies were compared for structural and abundance agreement with the full 60-h assembly. Colored lines show the results obtained at different depths of sequencing in the full assembly; for example, the light blue line tracks the performance for transcripts with FPKM >60. (a) The fraction of transcript fragments fully recovered increases with additional sequencing data, although nearly 75% of moderately expressed transcripts (≥15 FPKM) are recovered with fewer than 40 million 75-bp paired-end reads (20 million fragments), a fraction of the data generated by a single run of the sequencer used in this experiment. (b) Abundance estimates are similarly robust. At 40 million reads, transcripts determined to be moderately expressed using all 60-h reads were estimated at within 15% of their final FPKM values.

[Figure 4 axes: (a) percent of final transcripts and (b) percent of transcripts within 15% of final FPKM, each plotted against reads (millions, 0–140); lines are grouped by FPKM range: 0.01–3.74, 3.75–7.49, 7.5–14.99, 15–29.99, 30–59.99 and 60+.]

sharing a TSS produces the trajectory for their primary transcript, and we identified 401 (48%) genes with multiple distinct primary transcript trajectories.
However, trajectory classification was not precise enough to prioritize further investigation into individual genes and could not form the basis for statistical significance testing.

We therefore formalized and quantified divergent expression patterns of isoforms within and between TSS groups with an information-theoretic metric derived from the Jensen-Shannon divergence. With this metric, relative transcript abundances are represented as points along a logarithmic spiral in a real Hilbert space [25], and as a result the distance between points measures the extent of change in relative expression. Quantification of expression change in this way revealed significant (FDR < 5%) differential transcriptional regulation and splicing in 882 of 3,486 (25%) and 273 of 843 (32%) candidate genes, respectively, with 70 genes showing both types of differential regulation (Supplementary Table 4). Myc (Fig. 2a,b) undergoes a shift in transcriptional regulation of transcript abundances to post-transcriptional control of abundances (Fig. 2c) between 60 h and 90 h, as myocytes are beginning to fuse into myotubes.

Focusing on the genes with significant promoter and isoform changes (FDR < 5%), we noted that in many cases changes in relative abundance reflected switch-like events in which there was an inversion of the dominant primary transcript. For example, in the gene encoding FHL3, a transcriptional regulator recently reported to inhibit myogenesis [26], Cufflinks assembled the known isoform and another with a novel start site.
We validated the 5′ exon of this isoform along with other novel start sites and splicing events by form-specific RT-PCR (Fig. 3a and Supplementary Methods, section 4). Limiting analysis to known isoforms would have produced an incorrect abundance estimate for the known isoform of FHL3. Moreover, the novel isoform is dominant before differentiation, so this potentially important differentiation-associated promoter switch would have been missed (Fig. 3b). In total, we tested and validated 153 of 185 putative novel TSSs by comparison against TAF1 and RNA polymerase II chromatin immunoprecipitation (ChIP)-Seq peaks.

We also observed switches in the major isoform of alternatively spliced genes. In total, 10% of multi-promoter genes featured a switch in major primary transcript, and 7% of alternatively spliced primary transcripts switched major isoforms. We concluded that not only does promoter switching have a substantial impact on mRNA output, but also many genes show evidence of post-transcriptionally induced expression changes, supporting a role for dynamic splicing regulation in myogenesis. A key question is whether genes that show divergent expression patterns of isoforms are differentially regulated in a particular system because they have isoforms that are functionally specialized for that system. Of the genes undergoing transcriptional or post-transcriptional isoform switches, 26% and 24%, respectively, encode multiple distinct proteins according to annotation. We excluded genes with novel isoforms from the coding sequence analysis, so this fraction probably underestimates the impact of differential regulation on coding potential.
We thus speculate that differential RNA level isoform regulation, whether transcriptional, post-transcriptional or mixed in underlying mechanism, suggests functional specialization of the isoforms in many genes.

Although Cufflinks was designed to investigate transcriptional and splicing regulation in this experiment, it is applicable to a broad range of RNA-Seq studies (Fig. 4). The open-source software runs on commonly available and inexpensive hardware, making it accessible to any researcher using RNA-Seq data. We are currently exploring the use of the Cufflinks assembler to annotate genomes of newly sequenced organisms and to quantify the effect of various mechanisms of gene regulation on expression. When coupled with assays of upstream regulatory activity, such as chromatin-state mapping or promoter occupancy, Cufflinks should help unveil the range of mechanisms governing RNA manufacture and processing.

Methods
Methods and any associated references are available in the online version of the paper at http://www.nature.com/naturebiotechnology/.

Accession code. NCBI Gene Expression Omnibus: The data discussed in this publication have been deposited with accession number GSE20846.

Note: Supplementary information is available on the Nature Biotechnology website.

Acknowledgments
This work was supported in part by the US National Institutes of Health (NIH) grants R01-LM006845 and ENCODE U54-HG004576, as well as the Beckman Foundation, the Bren Foundation, the Moore Foundation (Cell Center Program) and the Miller Research Institute. We thank I. Antosechken and L. Schaeffer of the Caltech Jacobs Genome Center for DNA sequencing, and D. Trout, B. King and H. Amrhein for data pipeline and database design, operation and display. We are grateful to R. K. Bradley, K. Datchev, I. Hallgrímsdóttir, J. Landolin, B. Langmead, A. Roberts, M. Schatz and D. Sturgill for helpful discussions.

AUTHOR CONTRIBUTIONS
C.T. and L.P. developed the mathematics and statistics and designed the algorithms; B.A.W. and G.K. performed the RNA-Seq and B.A.W. designed and executed experimental validations; C.T. implemented Cufflinks and Cuffdiff; G.P. implemented Cuffcompare; M.J.v.B. and A.M. tested the software; C.T., G.P. and A.M. performed the analysis; L.P., A.M. and B.J.W. conceived the project; C.T., L.P., A.M., B.J.W. and S.L.S. wrote the manuscript.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

Published online at http://www.nature.com/naturebiotechnology/.
Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/.


1. Cloonan, N. et al. Stem cell transcriptome profiling via massive-scale mRNA sequencing. Nat. Methods 5, 613–619 (2008).
2. Mortazavi, A., Williams, B.A., McCue, K., Schaeffer, L. & Wold, B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat. Methods 5, 621–628 (2008).
3. Nagalakshmi, U., Wang, Z., Waern, K., Shou, C. & Raha, D. The transcriptional landscape of the yeast genome defined by RNA sequencing. Science 320, 1344–1349 (2008).
4. Wang, E. et al. Alternative isoform regulation in human tissue transcriptomes. Nature 456, 470–476 (2008).
5. Denoeud, F. et al. Annotating genomes with massive-scale RNA sequencing. Genome Biol. 9, R175 (2008).
6. Maher, C. et al. Transcriptome sequencing to detect gene fusions in cancer. Nature 458, 97–101 (2009).
7. Marioni, J., Mason, C., Mane, S., Stephens, M. & Gilad, Y. RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res. 18, 1509–1517 (2008).
8. Hiller, D., Jiang, H., Xu, W. & Wong, W. Identifiability of isoform deconvolution from junction arrays and RNA-Seq. Bioinformatics 25, 3056–3059 (2009).
9. Jiang, H. & Wong, W.H. Statistical inferences for isoform expression in RNA-Seq. Bioinformatics 25, 1026–1032 (2009).
10. Li, B., Ruotti, V., Stewart, R.M., Thomson, J.A. & Dewey, C.N. RNA-Seq gene expression estimation with read mapping uncertainty. Bioinformatics 26, 493–500 (2010).
11. Mortazavi, A., Williams, B., McCue, K., Schaeffer, L. & Wold, B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat. Methods 5, 621–628 (2008).
12. Pepke, S., Wold, B. & Mortazavi, A. Computation for ChIP-Seq and RNA-Seq studies. Nat. Methods 6, S22–S32 (2009).
13. Yaffe, D. & Saxel, O. A myogenic cell line with altered serum requirements for differentiation. Differentiation 7, 159–166 (1977).
14. Yun, K. & Wold, B. Skeletal muscle determination and differentiation: story of a core regulatory network and its context. Curr. Opin. Cell Biol. 8, 877–889 (1996).
15. Tapscott, S.J. The circuitry of a master switch: Myod and the regulation of skeletal muscle gene transcription. Development 132, 2685–2695 (2005).
16. Trapnell, C., Pachter, L. & Salzberg, S. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25, 1105–1111 (2009).
17. Haas, B.J. et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 31, 5654–5666 (2003).
18. Dilworth, R. A decomposition theorem for partially ordered sets. Ann. Math. 51, 161–166 (1950).
19. Eriksson, N. et al. Viral population estimation using pyrosequencing. PLOS Comput. Biol. 4, e1000074 (2008).
20. Guttman, M. et al. Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature 458, 223–227 (2009).
21. Cordes, K.R. et al. miR-145 and miR-143 regulate smooth muscle cell fate and plasticity. Nature 460, 705–710 (2009).
22. Lareau, L.F., Inada, M., Green, R.E., Wengrod, J.C. & Brenner, S.E. Unproductive splicing of SR genes associated with highly conserved and ultraconserved DNA elements. Nature 446, 926–929 (2007).
23. Bullard, J., Purdom, E., Hansen, K., Durinck, S. & Dudoit, S. Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformatics 11, 94 (2010).
24. Endo, T. & Nadal-Ginard, B. Transcriptional and posttranscriptional control of c-myc during myogenesis: its mRNA remains inducible in differentiated cells and does not suppress the differentiated phenotype. Mol. Cell. Biol. 6, 1412–1421 (1986).
25. Fuglede, B. & Topsøe, F. Jensen-Shannon divergence and Hilbert space embedding. in Proceedings of the IEEE International Symposium on Information Theory, 31 (2004).
26. Cottle, D.L., McGrath, M.J., Cowling, B.S. & Coghill, I.D. FHL3 binds MyoD and negatively regulates myotube formation. J. Cell Sci. 120, 1423–1435 (2007).


ONLINE METHODS

RNA isolation. Mouse skeletal muscle C2C12 cells were initially plated on 15-cm plates in DMEM with 20% FBS. At confluence, the cells were switched to low-serum medium to initiate myogenic differentiation. For extraction of total RNA, cells were first rinsed in PBS and then lysed in Trizol reagent (Invitrogen, catalog no. 15596-026), either during exponential growth in high-serum medium or at 60 h, 5 d and 7 d after medium shift. Residual contaminating genomic DNA was removed from the total RNA fraction using Turbo DNA-free (Ambion, catalog no. AM1907M). mRNA was isolated from DNA-free total RNA using the Dynabeads mRNA Purification Kit (Invitrogen, catalog no. 610-06).

Fragmentation and reverse transcription. Preparation of cDNA followed the procedure described previously [2], with minor modifications as described below. Before fragmentation, a 7-µl aliquot (total mass ~500 pg) containing known concentrations of seven ‘spiked in’ control transcripts from Arabidopsis thaliana and the lambda phage genome was added to a 100-ng aliquot of mRNA from each time point. This mixture was then fragmented to an average length of 200 nucleotides by metal-ion and heat-catalyzed hydrolysis. The hydrolysis was performed in a 25-µl volume at 94 °C for 90 s. The 5× hydrolysis buffer components were 200 mM Tris acetate, pH 8.2, 500 mM potassium acetate and 150 mM magnesium acetate. After removal of hydrolysis ions by G50 Sephadex filtration (USA Scientific, catalog no. 1415-1602), the fragmented mRNA was randomly primed with hexamers and reverse-transcribed using the Super Script II cDNA synthesis kit (Invitrogen, catalog no. 11917010). After second-strand synthesis, the cDNA went through end-repair and ligation reactions according to the Illumina ChIP-Seq genomic DNA preparation kit protocol (Illumina, catalog no. IP102-1001), using the paired-end adapters and amplification primers (Illumina, catalog no. PE102-1004). Ligation of the adapters adds 94 bases to the length of the cDNA molecules.

Size selection. The cDNA library was size-fractionated on a 2% TAE low-melt agarose gel (Lonza, catalog no. 50080), with a 100-bp ladder (Roche, catalog no. 14703220) run in adjacent lanes. Before loading of the gel, the ligated cDNA library was passed over a G50 Sephadex column to remove excess salts that interfere with loading the sample in the wells. After staining of the gel in ethidium bromide, a narrow slice (~2 mm) of the cDNA lane centered at the 300-bp marker was cut. The slice was extracted using the QiaEx II kit (Qiagen, catalog no. 20021), and the extract was filtered over a Microcon YM-100 microconcentrator (Millipore, catalog no. 42409) to remove DNA fragments shorter than 100 bp. Filtration was performed by pipetting the extract into the upper chamber of a microconcentrator and adding ultra-pure water (Gibco, catalog no. 10977) to a final volume of 500 µl. The filter was spun at 500g until only 50 µl remained in the upper chamber (about 20 min per spin) and then the upper chamber volume was replenished to 500 µl. This procedure was repeated six times.
The filtered sample was then recovered from the filter chamber according to the manufacturer’s protocol. Fragment-length distributions obtained after size selection were estimated from the spike-in sequences and are shown in Supplementary Figure 1.

Amplification. One-sixth of the filtered sample volume was used as template for 15 cycles of amplification using the paired-end primers and amplification reagents supplied with the Illumina ChIP-Seq genomic DNA prep kit. The amplified product was cleaned up over a Qiaquick PCR column (Qiagen, catalog no. 28104), and then the filtration procedure using the Microcon YM-100 microconcentrators described above was repeated, to remove both amplification primers and amplification products shorter than 100 bp. A final pass over a G50 Sephadex column was performed, and the library was quantified using the Qubit fluorometer and PicoGreen quantification reagents (Invitrogen, catalog no. Q32853). The library was then used to build clusters on the Illumina flow cell according to protocol.

Mapping cDNA fragments to the genome. Fragments were mapped to build 37.1 of the mouse genome using TopHat version 1.0.13. We extended our previous algorithms to exploit the longer paired reads used in the study. TopHat version 1.0.7 and later splits a read 75 bp or longer into three or more segments of approximately equal size (25 bp) and maps them independently. Reads with segments that can be mapped to the genome only noncontiguously are marked as possible intron-spanning reads.
These ‘contiguously unmappable’ reads are used to build a set of possible introns in the transcriptome. TopHat accumulates an index of potential splice junctions by examining segment mapping for all contiguously unmappable reads. For each junction, the program then concatenates 22 bp upstream of the donor to 22 bp downstream of the acceptor to form a synthetic spliced sequence around the junction. The segments of the contiguously unmappable reads are then aligned against these synthetic sequences with Bowtie. The resulting contiguous and spliced segment alignments for these reads are merged to form complete alignments to the genome, each spanning one or more splice junctions. Further details of how version 1.0.13 of TopHat differs from the published algorithm are provided in section 2 of the Supplementary Methods.

Transcript abundance estimation. We estimated transcript abundances using a generative statistical model of RNA-Seq experiments. The model was parameterized by the relative abundances of the set of all transcripts in a sample. For computational convenience, abundances of non-overlapping transcripts in disjoint genomic loci were calculated independently. The parameters of the model were the non-negative abundances ρ_t. Denoting the fragment length distribution by F, we defined the effective length of a transcript to be:

$$\tilde{l}(t) = \sum_{i=1}^{l(t)} F(i)\,\bigl(l(t) - i + 1\bigr)$$

where l(t) is the length of a transcript.
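As a minimal sketch of the effective-length definition above, the following Python function evaluates the sum directly. Representing F as a dictionary, and the numbers in the toy distribution, are assumptions made for illustration only; they are not taken from the paper or from the Cufflinks source code.

```python
def effective_length(transcript_length, frag_dist):
    """Effective length: sum over i of F(i) * (l(t) - i + 1), for 1 <= i <= l(t).

    frag_dist maps a fragment length i to its probability F(i).
    Fragment lengths longer than the transcript contribute nothing.
    """
    return sum(
        prob * (transcript_length - frag_len + 1)
        for frag_len, prob in frag_dist.items()
        if 1 <= frag_len <= transcript_length
    )


# Hypothetical fragment-length distribution (illustrative values only).
F = {150: 0.25, 200: 0.5, 250: 0.25}

print(effective_length(1000, F))  # a long transcript
print(effective_length(200, F))   # a transcript shorter than some fragments
```

Each term counts the start positions at which a fragment of length i fits inside the transcript, weighted by how likely that fragment length is, so short transcripts are penalized more strongly than their raw length alone would suggest.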
The likelihood function for our model was then given by:

$$L(\rho \mid R) = \prod_{r \in R} \sum_{t \in T} \frac{\rho_t\,\tilde{l}(t)}{\sum_{u \in T} \rho_u\,\tilde{l}(u)} \left( \frac{F(I_t(r))}{l(t) - I_t(r) + 1} \right)$$

where the product was over all fragment alignments R and the sums were over all transcripts T in the transcriptome, $\tilde{l}(t)$ was the effective length of transcript t, and I_t(r) was the implied length of a fragment determined by a pair of reads assuming it originated from transcript t (Supplementary Fig. 2). This is the likelihood function for a non-negative linear model, and therefore, the likelihood function had a unique maximum, which our implementation calculated via a numerical optimization procedure. Rather than reporting this estimate, we instead found the maximum a posteriori (MAP) estimate using a Bayesian inference procedure based on importance sampling from the posterior distribution. The proposal distribution we used was multivariate normal, with a mean given by the maximum likelihood estimate discussed above, and the variance-covariance matrix given by the inverse of the observed Fisher information matrix. The samples were also used to compute 95% confidence intervals for the MAP estimates. The MAP estimates (and associated confidence intervals) were used for differential expression testing.

Abundances were reported in FPKM (expected fragments per kilobase of transcript per million fragments sequenced). This unit is a scalar multiple of the parameters ρ_t.
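To make the likelihood function given earlier in this section concrete, the sketch below evaluates its logarithm for a toy locus. The data layout (dictionaries keyed by transcript name, one dictionary of implied lengths per fragment) and all numbers are hypothetical choices for illustration; this is not the Cufflinks implementation, which additionally maximizes the function numerically and assumes at least one positive abundance, as this sketch does.

```python
import math


def effective_length(length, frag_dist):
    """Sum over i of F(i) * (l(t) - i + 1), for fragment lengths 1 <= i <= l(t)."""
    return sum(p * (length - i + 1) for i, p in frag_dist.items() if 1 <= i <= length)


def log_likelihood(rho, lengths, fragments, frag_dist):
    """Log of L(rho | R) for one locus.

    rho       -- dict: transcript -> non-negative abundance (not all zero)
    lengths   -- dict: transcript -> transcript length l(t)
    fragments -- list of dicts: transcript -> implied fragment length I_t(r),
                 listing only the transcripts the fragment is compatible with
    frag_dist -- dict: fragment length -> probability F(i)
    """
    eff = {t: effective_length(lengths[t], frag_dist) for t in rho}
    norm = sum(rho[u] * eff[u] for u in rho)
    total = 0.0
    for implied in fragments:
        p = 0.0
        for t, frag_len in implied.items():
            positions = lengths[t] - frag_len + 1  # start sites where the fragment fits
            if positions > 0:
                p += (rho[t] * eff[t] / norm) * frag_dist.get(frag_len, 0.0) / positions
        if p == 0.0:
            return float("-inf")  # some fragment cannot be explained at all
        total += math.log(p)
    return total


# Hypothetical two-isoform locus (illustrative values only).
lengths = {"A": 1000, "B": 600}
F = {150: 0.25, 200: 0.5, 250: 0.25}
fragments = [{"A": 200, "B": 200}, {"A": 250}, {"B": 150}]

print(log_likelihood({"A": 0.5, "B": 0.5}, lengths, fragments, F))
print(log_likelihood({"A": 0.9, "B": 0.1}, lengths, fragments, F))
```

Because the abundances enter only through the normalized weights, multiplying every ρ_t by the same constant leaves the value unchanged, which is why only relative abundances can be estimated from this function.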
FPKM is conceptually analogous to the reads per kilobase per million reads sequenced (RPKM) measure, but it explicitly accommodates sequencing data with one, two or (if needed for future sequencing platforms) higher numbers of reads from single source molecules.

Abundance estimates were validated using spike-in sequences (Supplementary Fig. 3) and simulations (Supplementary Fig. 4). To confirm that all transcripts of a gene are necessary for accurate abundance estimation, novel transcripts were removed from the analysis (Supplementary Fig. 5), showing that resulting estimates may be biased.

Transcript assembly. Transcripts were assembled from the mapped fragments sorted by reference position. Fragments were first divided into non-overlapping loci, and each locus was assembled independently of the others using the Cufflinks assembler. The assembler was designed to find the minimal number of transcripts that ‘explain’ the reads (that is, every read should be contained in some transcript). First, erroneous spliced alignments or reads from incompletely spliced RNAs were filtered out. The algorithm for assembly was based on a constructive proof of Dilworth’s Theorem (Supplementary Methods, appendix A, theorem 17). Each fragment alignment was assigned a node in an ‘overlap


graph’ G. A directed edge (x,y) was placed between nodes x and y when the alignment for x started at a lower coordinate than y, the alignments overlapped in the genome and the fragments were ‘compatible’ (Supplementary Fig. 6). Two overlapping fragments were defined as compatible when every implied intron in one fragment matched an identical implied intron in the other fragment. The resulting directed, acyclic graph was transitively reduced to produce G, to avoid including redundant path information. Cufflinks then found a minimum path cover of G, meaning that every fragment node was contained in some path in the cover, and the cover contained as few paths as possible. Each path in the cover corresponded to a set of mutually compatible fragments overlapping each other on the left and right (except initial and terminal fragments on the path). Dilworth’s theorem implied that this path cover could be constructed by first finding the largest set of fragments with the property that no two are compatible. This set was determined by finding a maximum matching in a bipartite graph constructed from the transitive closure of G. The bipartite ‘reachability graph’ had a node in each partition for all fragments in G, and nodes were connected if there was a path between them in G. Given a maximum cardinality matching M, any fragment without an incident edge in M was a member of an ‘antichain’.
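The reduction from a minimum path cover to bipartite matching can be sketched in a few lines of Python. The example below builds the reachability (transitive closure) relation of a small directed acyclic graph, finds a maximum cardinality matching with a simple augmenting-path search, and follows matched edges to recover chains; the number of chains equals the number of nodes minus the matching size. This is an unweighted illustration under assumed names and toy input. It uses a textbook augmenting-path method, not necessarily the matching algorithm implemented in Cufflinks, and it does not include the cost weighting used to choose among equally parsimonious assemblies.

```python
def reachability(n, edges):
    """reach[u] = set of nodes reachable from u by a directed path."""
    adj = {}
    for x, y in edges:
        adj.setdefault(x, []).append(y)
    reach = [set() for _ in range(n)]
    for s in range(n):
        stack = list(adj.get(s, ()))
        while stack:
            v = stack.pop()
            if v not in reach[s]:
                reach[s].add(v)
                stack.extend(adj.get(v, ()))
    return reach


def min_chain_decomposition(n, edges):
    """Cover nodes 0..n-1 of a DAG with as few chains as possible."""
    reach = reachability(n, edges)
    match_right = [-1] * n  # match_right[v] = node matched as predecessor of v

    def try_augment(u, seen):
        # Try to give u a successor, re-routing earlier matches if needed.
        for v in sorted(reach[u]):
            if v in seen:
                continue
            seen.add(v)
            if match_right[v] == -1 or try_augment(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    for u in range(n):
        try_augment(u, set())

    successor = {u: v for v, u in enumerate(match_right) if u != -1}
    chains = []
    for start in range(n):
        if match_right[start] == -1:  # no predecessor: start of a chain
            chain = [start]
            while chain[-1] in successor:
                chain.append(successor[chain[-1]])
            chains.append(chain)
    return chains


# Toy overlap graph: fragment 0 is compatible with 1 and 3, and 1 with 2,
# but 3 is compatible with neither 1 nor 2.
print(min_chain_decomposition(4, [(0, 1), (1, 2), (0, 3)]))  # [[0, 1, 2], [3]]
```

In this toy case fragments 1 and 3 (or 2 and 3) are mutually incompatible, so at least two chains, that is, two transcripts, are needed.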
Each member of this antichain could be extended to a path, and this extension was a minimum path cover of G.

The minimum cardinality chain decomposition computed using the approach described above was not guaranteed to be unique. To ‘phase’ distant exons, we leveraged the fact that abundance inhomogeneities could link distant exons by their coverage. We therefore weighted the edges of the bipartite reachability graph on the basis of the percent-spliced-in metric introduced previously 4. Cufflinks arbitrated between multiple parsimonious assemblies by choosing the minimum-cost maximum matching in the reachability graph. In our setting, the percent-spliced-in ψ_x for an alignment x was computed by counting the alignments overlapping x in the genome that were compatible with x, dividing by the total number of alignments that overlap x, and then normalizing for the length of x. The cost C(y, z) assigned to an edge between alignments y and z reflected the belief that they originated from different transcripts:

$$C(y, z) = -\log\left(1 - \lvert \psi_y - \psi_z \rvert\right).$$

A useful feature of the Cufflinks assemblies is that they resulted in provably identifiable models. Complete details of the Cufflinks assembler are provided in the Supplementary Methods (section 4), along with proofs of several key theorems.

Structural comparison of time point assemblies.
To validate Cufflinks transfrags (assembled transcript fragments) against annotated transcriptomes, and also to find transfrags common to multiple assemblies, we developed a tool called ‘Cuffcompare’ that builds structural equivalence classes of transcripts. We ran Cuffcompare on the assembly from each time point against the combined annotated transcriptomes of UCSC, Ensembl and VEGA (Supplementary Fig. 7). Because of the stochastic nature of sequencing, assembly of the same transcript in two different samples may result in transfrags of slightly different lengths. A Cufflinks transfrag was considered a complete match when there was a transcript with an identical chain of introns in the combined annotation. When no complete match was found between a Cufflinks transfrag and the transcripts in the combined annotation, Cuffcompare determined and reported whether there was another potentially significant relationship with any of the annotation transcripts that could be found in or around the same genomic locus.

Assembly and abundance robustness analysis. A total of 61,787,833 cDNA fragments were sequenced at 60 h. We mapped and assembled subsets of these fragments (at fractions 1/64, 1/32, 1/16, 1/8, 1/4 and 1/2 of the total) using TopHat and Cufflinks. Each assembly of parts of the data was compared to the assembly obtained with the full fragment set using Cuffcompare. We counted transcripts recovered in assemblies from partial data that structurally matched some transcripts in the assembly using all the reads.
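The complete-match criterion (identical intron chain, regardless of where the outermost exons start and end) and the counting of recovered transcripts can be illustrated with a short Python sketch. The exon-list representation and function names are assumptions for this example and do not reflect the Cuffcompare implementation; in particular, treating single-exon transfrags as never matching is a simplification chosen here.

```python
def intron_chain(exons):
    """Introns implied by a list of (start, end) exon intervals."""
    exons = sorted(exons)
    return tuple((exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1))


def is_complete_match(transfrag, reference_transcripts):
    """True if some reference transcript has an identical intron chain."""
    chain = intron_chain(transfrag)
    if not chain:
        # Assumption for this sketch: single-exon transfrags have no
        # intron chain and are not scored as complete matches.
        return False
    return any(intron_chain(t) == chain for t in reference_transcripts)


def count_recovered(partial_assembly, full_assembly):
    """Number of transfrags from partial data matched in the full assembly."""
    return sum(is_complete_match(t, full_assembly) for t in partial_assembly)


full = [
    [(100, 200), (300, 400)],
    [(100, 200), (500, 600)],
]
partial = [
    [(150, 200), (300, 350)],  # shorter ends, same intron (200, 300)
    [(100, 200), (700, 800)],  # intron not present in the full assembly
]
print(count_recovered(partial, full))  # 1
```

Because only the introns are compared, the first partial transfrag matches even though its terminal exons are shorter, which is the tolerance to length differences described above.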
We assessed robustness of abundance estimation by counting the fraction of assembled transcripts that were assigned abundances within 15% of the FPKM value reported for the corresponding transcript in the full fragment set.

Simulation-based validation. To assess the accuracy of the Cufflinks estimates, we simulated an RNA-Seq experiment using the FluxSimulator 27, a freely available software package that models whole-transcriptome sequencing experiments with the Illumina Genome Analyzer. The software works by first randomly assigning expression values to the transcripts provided by the user, constructing an amplified, size-selected library, and then sequencing it. Mouse UCSC transcripts were supplied to the software, along with build 37.1 of the genome. FluxSimulator then randomly assigned expression levels to 18,935 UCSC transcripts. From these relative expression levels, the software constructed an in silico RNA-Seq sample, with each transcript assigned a number of library molecules according to its abundance. FluxSimulator produced 13,203,516 75-bp paired-end RNA-Seq reads from 6,601,805 library fragments, which were mapped with TopHat to the mouse genome using identical parameters to those used to map the C2C12 reads. A total of 6,176,961 fragments were mapped (93% of the library). These alignments were supplied along with the exact set of expressed transcripts to Cufflinks, to measure Cufflinks’ abundance estimation accuracy when working with a ‘perfect’ assembly.

Validation of novel transcription start sites. Transcripts with 5′ exons not in the UCSC, Ensembl or VEGA annotations were selected for validation.
We excluded transcripts with estimated abundances of


Figures 10 and 11 show examples of genes with significant changes in relative transcript abundances during the time course.

Software availability. TopHat (http://tophat.cbcb.umd.edu) is freely available as source code. It takes a reference genome (as a Bowtie 29 index) and RNA-Seq reads as FASTA or FASTQ and produces alignments in SAM 30 format. TopHat is distributed under the Artistic License and runs on Linux and Mac OS X. The Cufflinks assembler and abundance estimation algorithms (http://cufflinks.cbcb.umd.edu/) are open-source C++ programs and are freely available in both source and binary form. The package includes the assembler along with utilities to structurally compare Cufflinks output between samples (Cuffcompare) and to perform differential expression testing (Cuffdiff). Cufflinks is distributed under the Boost License and runs on Linux and Mac OS X. The source code for Cufflinks version 0.8.0 is provided in Supplementary Data 3.

27. Sammeth, M., Lacroix, V., Ribeca, P. & Guigó, R. The FLUX Simulator.
28. Johnson, D., Mortazavi, A., Myers, R. & Wold, B. Genome-wide mapping of in vivo protein-DNA interactions. Science 316, 1497–1502 (2007).
29. Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).
30. Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).

nature biotechnology doi:10.1038/nbt.1621


LETTERS

Dynamic single-cell imaging of direct reprogramming reveals an early specifying event

Zachary D Smith 1,2,5, Iftach Nachman 1,3,5, Aviv Regev 1,4 & Alexander Meissner 1,2

The study of induced pluripotency often relies on experimental approaches that average measurements across a large population of cells, the majority of which do not become pluripotent. Here we used high-resolution, time-lapse imaging to trace the reprogramming process over 2 weeks from single mouse embryonic fibroblasts (MEFs) to pluripotency factor–positive colonies. This enabled us to calculate a normalized cell-of-origin reprogramming efficiency that takes into account only the initial MEFs that respond to form reprogrammed colonies rather than the larger number of final colonies. Furthermore, this retrospective analysis revealed that successfully reprogramming cells undergo a rapid shift in their proliferative rate that coincides with a reduction in cellular area. This event occurs as early as the first cell division and with similar kinetics in all cells that form induced pluripotent stem (iPS) cell colonies. These data contribute to the theoretical modeling of reprogramming and suggest that certain parts of the reprogramming process follow defined rather than stochastic steps.

Ectopic expression of Oct4 and Sox2 in combination with both Klf4 and c-Myc 1, Klf4 alone 2,3, Lin28 and Nanog 4 or Esrrb 5 is sufficient to reprogram somatic cells to a pluripotent state. Such iPS cells exhibit many of the molecular and functional characteristics of embryonic stem (ES) cells 6–12.
Although iPS cell technology has progressed rapidly within the past 3 years 13, the extended latency and low efficiency of reprogramming events have hindered efforts to characterize the underlying mechanisms 14. One model suggests that continued proliferation allows for the accumulation of factor-mediated stochastic events that lead select cells along a path toward pluripotency. In alternative models, the likelihood of iPS cell colony formation is specified at earlier time points either by the innate heterogeneity within the induced pool of somatic cells or by certain factor-mediated events that ‘prime’ some cells and lead to a more defined path toward pluripotency 14–16. Population-level measurements typically done in reprogramming studies cannot distinguish between these models.

To study reprogramming at the single-cell level, we developed a live-cell, high-throughput imaging system to monitor previously characterized, clonally inducible murine embryonic fibroblasts (MEFs) 12 (Supplementary Figs. 1 and 2). High-resolution transmitted light images (Fig. 1a, upper panels) taken at 0.25-d intervals along a 12-d time course from the initial fibroblasts to the final iPS cell colonies showed that even at low starting cell densities it was virtually impossible to accurately follow the progeny of a single cell over the course of days. To facilitate tracking of individual cells, we transduced MEFs with one of several lentiviral vectors encoding different fluorescent proteins and seeded them at variable densities into unlabeled populations (Fig.
1a, lower panels, and Supplementary Movie 1).

Our system allowed us to trace multiple discrete reprogramming ‘lineages’ from parental fibroblast to terminal iPS cell colony. We acquired images over complete 12- or 14-d experiments at 0.25- to 0.5-d intervals across a connected spatial range to provide a representative global field at a sufficient resolution for tracing lineage identity from any starting cell (Fig. 1b). To provide information for multiple distinctly labeled lineages at every site over time, we acquired information at each position in phase contrast and for up to four fluorescent wavelengths (Fig. 1). We generated >500,000 images covering a total of >80 imaged plates for the subsequent analysis. We scored positive reprogramming events from terminal acquisitions (at day 12 or 14) through stringent Nanog and E-cadherin (Cdh1) immunostaining (Fig. 1c), and traced them retroactively to their source MEFs at t = 0 d. Using multi-wavelength overlays (Fig. 1d, lower panels and Supplementary Movie 1), we could readily distinguish initial MEFs and track the resulting iPS cell colonies in the global field (Fig. 1b, upper right corner; Fig. 1d, lower panels; and Supplementary Movie 1). We measured the reprogramming efficiencies as the fraction of Nanog and Cdh1 double-positive colonies relative to starting cell numbers for each distinct wavelength (e.g., a representative Cdh1 stain on day 12.5; Fig. 1c). Overall reprogramming efficiency fell within 0–33%, an expected variability given the low starting numbers of labeled cells (50–200). The mean efficiency of 3.7% across all examined experiments (n = 40) and the downstream characterization of isolated lines (Supplementary Figs.
2 and 3) show that our system is consistent with other studies 12,17,18.

However, upon retroactive tracing, we found that only a subset of iPS cell colonies (termed ‘primary’) could be traced to a source MEF at t = 0 d; these colonies displayed characteristic iPS cell colony behaviors after ~6 d (Fig. 2a, yellow arrowheads). Another

1 Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA. 2 Harvard Stem Cell Institute and Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, Massachusetts, USA. 3 Department of Biochemistry and Molecular Biology, Tel Aviv University, Tel Aviv, Israel. 4 Howard Hughes Medical Institute and Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA. 5 These authors contributed equally to this work. Correspondence should be addressed to A.M. (alexander_meissner@harvard.edu).

Received 16 March; accepted 5 April; published online 2 May 2010; doi:10.1038/nbt.1632


Figure 1 Continuous single-cell imaging allows tracking of reprogramming cells. (a) Tracking of uniquely labeled inducible fibroblast populations over a reprogramming time series. Selected images are displayed as a global 4 × 4 field in phase contrast (upper panel) and with respective wavelengths highlighted (lower panel). All images are at 10× magnification. (b) 4 × 4 multi-wavelength overlay at t = 0 d. These images were used to accurately count the number of seeded (and attached) starting MEFs for direct assessment of reprogramming efficiency in equivalently induced populations. Cells of a given wavelength (here yellow fluorescent protein (YFP), n = 78) within the tracked field were enumerated for downstream analysis. (c) Terminal (day 12.5) Cdh1 immunostaining demarcates successfully reprogrammed colonies and demonstrates the equitable distribution of colony-forming events across analyzed wavelengths and for the population as a whole. Yellow arrowheads mark colonies that originated from unique YFP-labeled MEFs. Red arrowheads mark colonies that originated from red fluorescent protein (RFP)-labeled MEFs. Magenta numbers indicate colonies (circled with dashed line) that were counted. Efficiencies provided are based on the number of marker-positive colonies divided by the number of MEFs counted in b (YFP and RFP) or the total number (including unlabeled) seeded. (d) Progression of a single fibroblast to an iPS cell colony over 12.5 d in phase contrast (upper panel) and with respective wavelengths highlighted (lower panel). Colonies were identified at the terminal time point and retrospectively traced to their founding fibroblast.
Tracking of a single cell through the complete time series allows for comparative morphological characterization of cells that do reprogram against those that do not. Here, a reprogramming lineage beginning with a single YFP-labeled fibroblast (no. 16 shown in Fig. 1b, magenta square) is traced to the resulting iPS colony (Supplementary Movie 1).

[Figure 1 image panels a–d; time labels run from 0 d to 12.5 d. Efficiencies shown in panel c: YFP = 2.56%, RFP = 3.45%, Total = 2.61%.]

subset of smaller and more symmetrical colonies consistently appeared later, between days 6 and 12, and upon close inspection could not be traced to an original fibroblast (Fig. 2a,b and Supplementary Fig. 4 (red arrowheads) and Supplementary Movie 2). These late colonies appeared to emerge in the inspected position with all the characteristics of iPS cells, including small size, round shape, rapid self-renewal and compacted colony growth. We concluded that they likely arise from single cells or small compacted clusters that had reprogrammed within an ectopic lineage outside of the space in which the colony itself had emerged; these events are likely secondary and lead to a progressive enrichment of ‘satellites’ that do not uniquely correspond to a single responding lineage (Fig. 2a,b and Supplementary Figs. 4 and 5).
It should be noted that the imaged areas were sufficiently large to be representative of the entire plates and therefore to capture all behaviors visible with our imaging technique (and our experimental n = 40 provides high confidence). As such, secondary satellites confound true calculations of reprogramming efficiency. We therefore defined a normalized efficiency in which reprogramming events were counted only if they could be traced to their originating fibroblast, thereby excluding satellites. This normalized efficiency ranged between 0 and 8% primary colonies per representative wavelength, with a mean of 1.15% across all experiments (Fig. 2c and Supplementary Fig. 3). Our normalization procedure also took into account the many instances in which single originating MEFs separated into distinct sub-populations that independently gave rise to iPS cell colonies with the same latency (Fig. 2d and Supplementary Movie 3). In the absence of cell tracing, such colony multiplication would contribute to an overestimation of reprogramming efficiency, which we avoided by counting no more than one colony per responding MEF.

The distinction between primary and satellite colonies allowed us to reappraise calculated reprogramming efficiencies over time. Notably, primary colonies arose rapidly and reached a stable number after the first 8 d of reprogramming. In contrast, satellites appeared later and continued to increase in number, likely an effect of the progressive growth of iPS colonies, each cell of


which has an enhanced capacity to form its own colony upon dissociation (Fig. 2e), increasing the likelihood that cells will detach from a primary colony and form a satellite elsewhere.

Figure 2 Progressive accumulation of secondary, non-unique “satellite” colonies skews interpretation of reprogramming data. (a) GFP-labeled satellite colonies without unique origins over a global 5 × 5 field in 10× magnification. Satellite colonies (a subset highlighted with red arrowheads; see Supplementary Figure 4 for more images) typically without a traceable origin become macroscopically visible after day 6 and after the formation of primary colonies (yellow arrowheads) (Supplementary Movie 2). A grid (light gray) and squares (red) were added to the image to help orientation and facilitate comparison (as apparent in b). (b) Zoom-in view of two satellite colonies (nos. 4 and 5). In colony no. 4, it is clearly visible that between days 9 and 10 all cells are accounted for, but that a new cluster of cells (arrowhead) has appeared within 24 h. Note the small green dot that has not moved. Similarly, below it is apparent that neither of the two colonies present in the day-14 image originated from any cell in this field. The entire imaged area and additional colonies can be inspected in Supplementary Figure 4. (c) Corrected efficiencies accounting for colonies in which a unique cell of origin status can be assigned, and
The ability to distinguish primary from satellite colonies allows us to refine models of reprogramming that may have previously included artifacts scored as de novo reprogramming events.

After induction we observed several distinct cell types based on broad morphological and proliferative characteristics. As expected, most cells failed to initiate reprogramming and generally resembled the initial somatic fibroblast population (Fig. 3a; t = 0 d), responding with either arrested/apoptotic (A) or slow-dividing (SD) behaviors according to time series data and Annexin V staining (Fig. 3a, A and SD panels, and Supplementary Movie 4a,b and Supplementary Fig. 1). In addition, we observed a fast-dividing fibroblast (FD) population at a much lower frequency (~1% of the starting fibroblasts). These cells exhibited a higher proliferative rate than normal fibroblasts and initially showed a decrease in size, but retained an elongated cellular morphology characteristic of mesenchymal cells (Fig. 3a, FD panel, and Supplementary Movie 4c). Moreover, these cells continued to grow in monolayer clusters that spread over large areas.

When we traced primary iPS cell colonies back to their original source cells, we found that they arose from a distinct class of small, fast-dividing cells that emerged soon after induction (Fig. 3a, iPS panel, and Supplementary Movie 4d). These cells proliferated faster than the starting fibroblast population and, within a few cell divisions, showed markedly reduced cell size.
[Figure 2 legend, continued] removing all apparent secondary events (means of all analyzed, n = 40, corrected and uncorrected counts are shown and significant to P = 0.00034, paired t-test). (d) A single YFP-labeled inducible MEF (yellow arrow) exhibits the potential to contribute multiple (at least six) colony-forming events (highlighted and enumerated by asterisks) before cells demonstrate an iPS cell morphology, suggesting that the ability to reprogram is specified in early precursors and can be distributed to multiple progeny (Supplementary Movie 3). (e) Cumulative primary and satellite colonies per well analyzed (n = 16). Primary colonies arise during the first 4–8 d, after which the number stabilizes. Satellites were scored at day 14 and traced to the earliest time (typically between days 6 and 12) in which a founding cell could be identified. Thin lines represent individual experiments. Bold line indicates the mean over all experiments.

To quantify these observations, we examined 19 representative primary colonies and traced them back to a starting MEF at t = 0 d (Supplementary Fig. 6). The cells that led to iPS cell colonies had an increased proliferative rate (generation time 12.2 ± 2.8 h) after the first division and grew exponentially over the next several days at a rate similar to that observed for murine ES cells (11–16 h) 19 and much faster than that of somatic murine cells such as MEFs (18–22 h) 20 or the induced population as a whole (Fig. 3b). The fast proliferative trait was conferred equally to both daughter cells as early as the first division (Supplementary Fig.
7a and Supplementary Movie 1).

[Figure 2 image residue. Panel c: Nanog+ colony efficiency (%), regular versus corrected counts. Panel d: numbered colonies (asterisks) at 7, 10 and 14 d, with stain labels AP, Cdh1 and Nanog. Panel e: cumulative colony emergence (count versus days) for primary and satellite colonies.]

iPS cell–forming populations were also distinct in cell area and shape. Lineages that formed iPS cell colonies exhibited a sequential reduction in cellular area over time (when normalized by the number of divisions) and acquired a new, stably maintained size within three to four divisions, concurrent with their increased proliferative rate (Fig. 3c and Supplementary Fig. 7b). The narrow size range of these smaller cells stood out compared to the variability in initial fibroblasts or within the FD cells. iPS cells also exhibited changes in eccentricity, or cell shape, and their intercellular characteristics suggested an enhanced clustering compared to the original MEFs (Supplementary Fig. 7c,d). Moreover, as the number of cells descending from an individual MEF increased, multiple progenitors conferred these morphological and proliferative characteristics to their progeny cells. The apparent symmetry by which these traits are inherited indicates a fundamental change in the homeostatic principles governing somatic MEFs that can occur as early as the first division (Figs. 1d and 3c, Supplementary Movies 1,3 and 4d, and Supplementary Fig. 7). These results suggest


that establishing rapid divisions in which cell size decreases is a necessary and early step in the establishment of iPS cells.

[Figure 3 image residue. Panel a: rows A, SD, FD and iPS; time labels 0, 2, 3, 3.5, 6 and 12 d. Panel b: cell number versus time point for iPS (n = 19), FD (n = 5) and SD (n = 5).]

Recent reports have suggested that inhibition of p53 and its downstream pathways can significantly enhance murine and human reprogramming efficiency 21–26. Given our ability to monitor the reprogramming process, we directly investigated the effect of p53 knockdown at the single-cell level. We substituted one of our labeled fluorescent populations with one infected by a lentiviral vector co-expressing constitutive green fluorescent protein (GFP) and a short hairpin RNA (shRNA) targeting p53 (Fig. 4a and Supplementary Fig. 8) 27. We found a notable (4.1-fold) increase in the number of cells that initiated and maintained a higher proliferative rate, smaller size and increased cluster formation compared to the internal fluorescently labeled controls lacking shRNA (Fig. 4a,b). On day 14, our terminal image acquisition for this experiment, we stained for Nanog, Cdh1 and alkaline phosphatase in all imaged wells (Fig. 4a, right panels).

Although p53 knockdown appeared to expand the global population of responding cells, many of these cells led to aberrant (nonreprogrammed) colonies, resulting in a reduction in the overall reprogramming efficiency (normalized, as above, against the number of responding MEFs).
In particular, when we characterized the ratio of aberrant to reprogrammed colonies, we found a higher fraction of aberrant colonies in p53-depleted cells compared to control populations (Fig. 4a–c and Supplementary Movie 5). Notably, the early response within p53-knockdown cells (time of first division 1.30 ± 0.19 d (n = 11)) that led to pluripotency marker–negative colonies was similar to that of control responding populations that form marker-positive iPS cells (Fig. 4d,e, Supplementary Movie 5 and Supplementary Fig. 8).

Figure 3 Unique fates of induced fibroblasts reveal a conserved trajectory for reprogramming cells. (a) Representation of distinct cell fates in response to factor induction. From top to bottom: apoptotic/arrested (A), slow-dividing (SD), fast-dividing fibroblast (FD) and (iPS) cell morphologies at t = 0 d and across representative time points during the reprogramming process (Supplementary Movies 4a–d). The left and right images are transmitted, multi- or single-wavelength overlays. Center images show only the different wavelength images. Time is indicated in days. Images are 10×. (b) Cell number over the first 4 d of the reprogramming timeline (time point = 0.25 d); lines represent data for lineages of nonreprogramming cell types (FD, magenta, n = 5; SD, red, n = 5) and cells that will form iPS cell colonies (iPS, blue, n = 19). (c) Cellular area (in arbitrary units/pixels) as mapped over division number, plotted as log2(cell count), within iPS cell–forming lineages (n = 19, median values per time point). A stable ES/iPS-like cell size is reached within two to four divisions.
However, at later time points we observed a greater variability in proliferation rate and terminal size (Supplementary Movie 5). Because we account exactly for the number of MEFs at the time of induction and score reprogramming by the number of those MEFs that form iPS cells, we can conclude that p53 knockdown expands the pool of cells that exhibit the morphological and proliferative phenotypes of reprogramming lineages, but may not improve the fraction that can form molecularly defined iPS cell colonies within the described temporal window.

Previous reports in which p53 depletion was shown to improve reprogramming efficiency relied on counting iPS cell colonies at a single, terminal time point [21,24,25] and therefore could not discern the subset of somatic cells that responded positively to factor induction or detect the accumulation of aberrant colony morphologies. In certain experimental settings, such as in direct infection or three-factor (Oct4, Sox2, Klf4) reprogramming protocols in previous studies [21], p53 inhibition may enhance reprogramming efficiency or simply maintain the proliferation of cells that would otherwise arrest and/or senesce.
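The distinction drawn above, between expanding the pool of responding cells and improving the fraction that become iPS cell colonies, comes down to which denominator is used. The following Python sketch is only an illustration of that arithmetic: the class and function names are hypothetical, and the counts are invented for the example and are not data from this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineageCounts:
    starting_cells: int  # labeled MEFs counted at the time of induction
    responders: int      # lineages showing the fast-dividing, size-reduced response
    ips_colonies: int    # responder lineages that end as marker-positive colonies


def efficiencies(counts: LineageCounts) -> dict:
    """Return efficiencies normalized to starting cells and to responders."""
    if counts.starting_cells <= 0:
        raise ValueError("starting_cells must be positive")
    if not 0 <= counts.ips_colonies <= counts.responders <= counts.starting_cells:
        raise ValueError("expected 0 <= ips_colonies <= responders <= starting_cells")
    per_responder = (
        counts.ips_colonies / counts.responders if counts.responders else 0.0
    )
    return {
        "responders_per_start": counts.responders / counts.starting_cells,
        "ips_per_start": counts.ips_colonies / counts.starting_cells,
        "ips_per_responder": per_responder,
    }


# Invented counts, for illustration only (not measurements from the paper).
control = LineageCounts(starting_cells=1000, responders=10, ips_colonies=8)
knockdown = LineageCounts(starting_cells=1000, responders=40, ips_colonies=8)

if __name__ == "__main__":
    print("control  ", efficiencies(control))
    print("knockdown", efficiencies(knockdown))
```

With these invented numbers the knockdown population has four times as many responders per starting cell, the same yield of colonies per starting cell, and a lower yield per responder, which is the kind of pattern a single terminal colony count could not resolve.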
However, our results suggest that the constitutive loss of p53 may derail cells with an otherwise permissive stoichiometry of the Oct4, Sox2, Klf4 and c-Myc factors from the normal reprogramming trajectory or stabilize an intermediate state that could otherwise lead to the formation of iPS cells [21,24,25].

As reprogramming lineages continued into intermediate points within our 2-week timeline, it became increasingly difficult to identify or segment all cells in a responding population. Nevertheless, distinct events within the timeline could still be identified and attributed to unique lineages. We scored the analyzed colonies for compaction events by which cells exhibited enhanced intercellular binding and through which final iPS cell colonies emerged. These consistently arose between days 4 and 8 from the rapidly dividing, size-reduced cells with similar latency (Supplementary Fig. 9 and Supplementary Movie 6) [12,18,28].

Previous studies have proposed several models for reprogramming [15,16], including a ‘stochastic one-step model’ whereby reprogramming of a given cell occurs stochastically in one step throughout the timeline of the experiment at a uniform intrinsic probability per cell that depends only on the derivation conditions [16]. We tested the fit of a ‘stochastic one-step model’ when limited to the iPS cell lineages alone, using colony compaction times as determinants for reprogramming. The observed rate was on the order of 0.001 per cell per day (Supplementary Fig. 9e, purple curve), a rate markedly higher than the kinetics found when tracing reprogramming events that occur after the 14-d time period when limited to populations that had not reprogrammed earlier but continued to proliferate [16]. Colony compaction times show a similar or better fit to a normal


distribution that is more consistent with a sequential model, where a progressive series of steps in a lineage leads to successful reprogramming (Supplementary Fig. 9e). The immediate induction of these responses and the consistent subsequent events are in line with both an ‘elite’ deterministic model (where the subset of reprogramming lineages is determined early) and a stochastic model that assumes a stepwise acquisition of traits in which early choices play a dominant role (Supplementary Fig. 9). The highly synchronized and reproducible nature of these events argues against a model with multiple stochastically timed steps as it poorly explains the defined emergence of colonies with a similar latency within a 2-week timeline.

Figure 4 Effects of p53 knockdown on single cells during the reprogramming process. (a) A revised imaging experiment in which control cells were tagged as before with YFP, cyan fluorescent protein (CFP) or RFP. The control GFP vector was replaced with a p53-shRNA–containing GFP vector [27]. Induction and acquisition were done as before over an 11 × 11 image field. Left: multi-wavelength overlay shows the notable increase in GFP colonies. Right: p53-depleted cells (tagged with GFP) exhibit an increased number of colony-like morphologies that display only minimal or incomplete activation of endogenous pluripotency markers. Most of the GFP colonies cannot be matched to an alkaline phosphatase (AP)-, Cdh1- or Nanog-positive colony. Note: the transmitted light and the marker stains show all colonies (including unlabeled controls, which represent the majority; white arrows: factor-negative colonies; colored arrows: factor-positive colonies). Colonies are circled with dashed lines to facilitate mapping across images. (b) Selected images of the progression for a single p53-depleted cell (upper panel) and a control cell (tagged with RFP, bottom panel). Both exhibit similar enhanced proliferation and morphological characteristics at early time points but result in disparate fates (Supplementary Movie 5). Last panels on the right show alkaline phosphatase and Cdh1 staining. (c) Formation of primary colonies, alkaline phosphatase positivity and Nanog/cadherin signal for p53-depleted cells compared to alternatively labeled controls; P = 0.00004, 0.4 and 0.01, respectively, paired Kolmogorov-Smirnov test, as calculated by events over starting population. Means over eight wells are shown. (d) The proliferative characteristics of reprogramming p53 knockdown cells are comparable to reprogramming controls over the first 4 d. (e) p53 knockdown cells exhibit size reduction dynamics that are also similar to those for normally reprogramming cells within the first four divisions.

In conclusion, our high-resolution dynamic imaging of the reprogramming process enabled the identification of proliferation and morphological characteristics that precede the activation of molecular markers for pluripotency.
This approach also provided an accurate measure of reprogramming efficiency that is normalized according to a colony’s cell of origin. The observed decoupling of cell size and proliferation in reprogramming cells is a radical departure from the fibroblast cell cycle and suggests that overcoming these cell-size and proliferation checkpoints is an important early step in reprogramming. Normal fibroblasts maintain tight control over their cell size, which is retained after mitosis during the prolonged G1 phase of the somatic cell cycle [29]. The fact that all tracked cells that successfully reprogram immediately increased their proliferation rate and reduced their size suggests that ectopic factor expression allows these cells to overcome those checkpoints early (Fig. 3b,c). However, the fact that alternative fates, such as the observed FD cells and p53 knockdown cells, showed a similar initial response (Figs. 3 and 4) suggests that increased proliferation and size reduction are not sufficient and may describe an intermediate step that can itself be stabilized toward aberrant, nonreprogrammed states. Furthermore, although successful reprogramming may be initiated early, it nonetheless requires continued expression of the reprogramming factors, as demonstrated by previous doxycycline-withdrawal experiments [18,28]. More nuanced strategies for isolating and studying this small responding population will be needed to understand the molecular mechanisms underlying


the gradual reacquisition of pluripotency. We propose as one possible mechanism that the preliminary response may rely on a unique coupling of somatic silencing mediated by Oct4/Sox2 and the acquisition of ES cell–like biosynthetic and cell cycle properties that are mediated by c-Myc, a predominant transcription factor with abundant somatic targets. A complete understanding of the changes that occur within the cells that transition to pluripotency will be necessary for safer and more efficient generation of iPS cells that will eventually unlock their tremendous potential for regenerative medicine.

Methods
Methods and any associated references are available in the online version of the paper at http://www.nature.com/naturebiotechnology/.

Accession code. GEO, GSE21361, for mRNA expression profiling.

Note: Supplementary information is available on the Nature Biotechnology website.

Acknowledgments
We thank A. Carpenter and M. Bray from the Broad Imaging Platform for help with the initial CellProfiler image analysis pipeline. We thank E.S. Lander, C. Bock and R.P. Koche for critical reading of the manuscript as well as M. Thomson, M. Staller, A. De Los Angeles and J. Dennett for technical assistance and intellectual input. I.N. was supported by a Merck postdoctoral fellowship and an Alon fellowship. A.R. was supported by a Career Award at the Scientific Interface from the Burroughs Wellcome Fund, a National Institutes of Health Pioneer Award and the Sloan Foundation. A.R. is an Early Career Scientist of the Howard Hughes Medical Institute and an Investigator of the Merkin Foundation for Stem Cell Research at the Broad Institute. A.M. was supported by the Pew Charitable Trust and a New Investigator grant by the Massachusetts Life Science Center (MLSC). This work was funded by the Pew and MLSC.

Author contributions
Z.D.S., I.N., A.R. and A.M. conceived the experiments and wrote the manuscript. Z.D.S. generated all reagents and performed the experiments. Z.D.S. and I.N. performed the analysis.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

Published online at http://www.nature.com/naturebiotechnology/.
Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/.

1. Takahashi, K. & Yamanaka, S. Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell 126, 663–676 (2006).
2. Nakagawa, M. et al. Generation of induced pluripotent stem cells without Myc from mouse and human fibroblasts. Nat. Biotechnol. 26, 101–106 (2008).
3. Wernig, M., Meissner, A., Cassady, J.P. & Jaenisch, R. c-Myc is dispensable for direct reprogramming of mouse fibroblasts. Cell Stem Cell 2, 10–12 (2008).
4. Yu, J. et al. Induced pluripotent stem cell lines derived from human somatic cells. Science 318, 1917–1920 (2007).
5. Feng, B. et al. Reprogramming of fibroblasts into induced pluripotent stem cells with orphan nuclear receptor Esrrb. Nat. Cell Biol. 11, 197–203 (2009).
6. Wernig, M. et al. In vitro reprogramming of fibroblasts into a pluripotent ES-cell-like state. Nature 448, 318–324 (2007).
7. Okita, K., Ichisaka, T. & Yamanaka, S. Generation of germline-competent induced pluripotent stem cells. Nature 448, 313–317 (2007).
8. Maherali, N. et al. Global epigenetic remodeling in directly reprogrammed fibroblasts. Cell Stem Cell 1, 55–70 (2007).
9. Boland, M.J. et al. Adult mice generated from induced pluripotent stem cells. Nature 461, 91–94 (2009).
10. Kang, L., Wang, J., Zhang, Y., Kou, Z. & Gao, S. iPS cells can support full-term development of tetraploid blastocyst-complemented embryos. Cell Stem Cell 5, 135–138 (2009).
11. Zhao, X.Y. et al. iPS cells produce viable mice through tetraploid complementation. Nature 461, 86–90 (2009).
12. Mikkelsen, T.S. et al. Dissecting direct reprogramming through integrative genomic analysis. Nature 454, 49–55 (2008).
13. Amabile, G. & Meissner, A. Induced pluripotent stem cells: current progress and potential for regenerative medicine. Trends Mol. Med. 15, 59–68 (2009).
14. Jaenisch, R. & Young, R. Stem cells, the molecular circuitry of pluripotency and nuclear reprogramming. Cell 132, 567–582 (2008).
15. Yamanaka, S. Elite and stochastic models for induced pluripotent stem cell generation. Nature 460, 49–52 (2009).
16. Hanna, J. et al. Direct cell reprogramming is a stochastic process amenable to acceleration. Nature 462, 595–601 (2009).
17. Wernig, M. et al. A drug-inducible transgenic system for direct reprogramming of multiple somatic cell types. Nat. Biotechnol. 26, 916–924 (2008).
18. Stadtfeld, M., Maherali, N., Breault, D. & Hochedlinger, K. Defining molecular cornerstones during fibroblast to iPS cell reprogramming in mouse. Cell Stem Cell 2, 230–240 (2008).
19. Orford, K.W. & Scadden, D.T. Deconstructing stem cell self-renewal: genetic insights into cell-cycle regulation. Nat. Rev. Genet. 9, 115–128 (2008).
20. Kamijo, T. et al. Tumor suppression at the mouse INK4a locus mediated by the alternative reading frame product p19ARF. Cell 91, 649–659 (1997).
21. Hong, H. et al. Suppression of induced pluripotent stem cell generation by the p53-p21 pathway. Nature 460, 1132–1135 (2009).
22. Kawamura, T. et al. Linking the p53 tumour suppressor pathway to somatic cell reprogramming. Nature 460, 1140–1144 (2009).
23. Li, H. et al. The Ink4/Arf locus is a barrier for iPS cell reprogramming. Nature 460, 1136–1139 (2009).
24. Marion, R.M. et al. A p53-mediated DNA damage response limits reprogramming to ensure iPS cell genomic integrity. Nature 460, 1149–1153 (2009).
25. Utikal, J. et al. Immortalization eliminates a roadblock during cellular reprogramming into iPS cells. Nature 460, 1145–1148 (2009).
26. Banito, A. et al. Senescence impairs successful reprogramming to pluripotent stem cells. Genes Dev. 23, 2134–2139 (2009).
27. Ventura, A. et al. Cre-lox-regulated conditional RNA interference from transgenes. Proc. Natl. Acad. Sci. USA 101, 10380–10385 (2004).
28. Brambrink, T. et al. Sequential expression of pluripotency markers during direct reprogramming of mouse somatic cells. Cell Stem Cell 2, 151–159 (2008).
29. Singh, A.M. & Dalton, S. The cell cycle and Myc intersect with mechanisms that regulate pluripotency and reprogramming. Cell Stem Cell 5, 141–149 (2009).


ONLINE METHODS

Generation of fluorescently labeled inducible fibroblast lines. E13.5 doxycycline-inducible fibroblasts were generated as described previously and passaged twice before infection with a FUW lentivirus [17] in which GFP, YFP, RFP or a CFP–β-actin fusion protein (Evrogen) was cloned into EcoRI sites. Fibroblast cultures infected with one respective fluorescent protein were expanded for at least one additional passage before serum starvation and seeding at unique representations within control, uninfected inducible MEFs that were passaged in parallel. MEFs were cultured under serum starvation conditions until the onset of imaging, at which point they were switched into standard mouse ES medium supplemented with 2 µg/ml doxycycline (Sigma). This protocol ensured a uniform initial response to ectopic factor induction from a globally arrested somatic population and facilitated the tracking of single cells. Cells were kept on doxycycline for the duration of all imaging experiments. Isolated iPS cell lines were expanded without doxycycline and characterized by immunostaining and by blastocyst injection. Primers for real-time PCR were as described [17,26], and reactions were conducted using SuperScript II Reverse Transcriptase (Invitrogen), Power SYBR Green PCR Master Mix (Applied Biosystems) and a 384-well 7900 RT-PCR Machine (Applied Biosystems).

Image acquisition, immunohistochemistry and iPS cell colony scoring. Inducible MEFs were plated in 12-well plates at low densities and imaged using an IX-71 microscope (Olympus) and motorized Prior XY stage (Supplementary Fig. 10 and Supplementary Movie 7).
Images were taken within a connected 4 × 4 or 5 × 5 spatial range at 10× magnification and in up to four fluorescent wavelengths using Metamorph Advanced High Throughput Screening software (Metamorph). Acquisitions were taken with manual oversight every 6 to 12 h for 10–14 d to minimize the exposure of induction plates to atmospheric conditions and temperature. At the end of a given imaging experiment, plates were fixed in 4% paraformaldehyde and immunostained for Nanog (Abcam or Covance) and/or E-cadherin (Abcam) at 1:500 dilution and detected using Alexa488- or Alexa594-conjugated secondary antibodies (Jackson Immunoresearch). Additional immunostaining for line characterization used Oct4 (Abcam), Stella (Millipore) and SSEA1 (Santa Cruz) primary antibodies. Alkaline phosphatase was detected using a standard alkaline phosphatase staining kit (Stemgent) with 8 and 40 min (Supplementary Fig. 8) sequential incubations that provided a precise gauge of stain sensitivity. Without analyzing time-lapse information, colonies were scored in efficiency calculations if they demonstrated uniform signal positivity and appeared distinct from other colonies (Fig. 1c) as a standard metric.

Image analysis. A semi-automated cell segmentation pipeline using the CellProfiler package [30] was used on images from the fluorescent channels for the period in which cells were discernible by eye (around 4 d for proliferative cells). The package then calculated morphological attributes (such as area and eccentricity) for each cell.
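To make the two morphological attributes named above concrete, the following Python sketch computes area and eccentricity for one segmented cell given as a binary mask. It is an illustration under stated assumptions and not the CellProfiler implementation: eccentricity is taken as that of the ellipse with the same second central moments as the mask, the function name is hypothetical, and any sub-pixel correction a production tool might apply is omitted.

```python
import math


def area_and_eccentricity(mask):
    """Return (area in pixels, eccentricity) for a binary mask.

    mask is a list of rows of 0/1 values. Eccentricity is computed from the
    eigenvalues of the pixel-coordinate covariance matrix: 0 for a circle-like
    shape, approaching 1 for an elongated one.
    """
    pixels = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    area = len(pixels)
    if area == 0:
        raise ValueError("mask contains no foreground pixels")

    mean_r = sum(r for r, _ in pixels) / area
    mean_c = sum(c for _, c in pixels) / area

    # Second central moments of the foreground pixel coordinates.
    m_rr = sum((r - mean_r) ** 2 for r, _ in pixels) / area
    m_cc = sum((c - mean_c) ** 2 for _, c in pixels) / area
    m_rc = sum((r - mean_r) * (c - mean_c) for r, c in pixels) / area

    # Eigenvalues of the symmetric 2 x 2 matrix [[m_rr, m_rc], [m_rc, m_cc]].
    half_trace = (m_rr + m_cc) / 2
    root = math.sqrt(((m_rr - m_cc) / 2) ** 2 + m_rc ** 2)
    major, minor = half_trace + root, half_trace - root

    if major == 0:  # a single pixel has no defined elongation
        return area, 0.0
    return area, math.sqrt(max(0.0, 1 - minor / major))


if __name__ == "__main__":
    square = [[1] * 4 for _ in range(4)]
    elongated = [[1] * 8 for _ in range(2)]
    print(area_and_eccentricity(square))
    print(area_and_eccentricity(elongated))
```

Tracking the area value per lineage against division number gives the kind of size trajectory shown in Fig. 3c, while the eccentricity value summarizes the change in cell shape.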
Further analysis was done in Matlab. Manual analysis such as time of compaction, or assigning of morphological attributes (SD, FD, iPS, A), used time-lapse images of entire 4 × 4 or 5 × 5 global fluorescent overlays across the entire experimental timeline. Sites of interest (predominantly those containing iPS colonies) were scored and tracked retrospectively to the earliest point in which a parent cell could be observed. Primary colonies were scored as those with initial fibroblast origins whereas secondary events were scored if no discernible origin could be found. Primary colonies were catalogued according to their initial response, the time of compaction as measured by the earliest instance in which cells demonstrated compact ES-like colony growth and pluripotency marker staining. Satellites were scored for marker positivity and for the earliest time in which they were observed. Colonies and other morphologies for CellProfiler analysis were annotated during this manual analysis and stacks of phase contrast and respective wavelengths of interest were generated (Supplementary Fig. 10). Movies were constructed using basic ImageJ software with StackCombiner and MtrackerJ plug-ins (ImageJ). For the characterization of the satellite colony appearance, a bounding rectangle was manually determined for each of the analyzed satellite and primary iPS cell lineages. Total fluorescent intensity in the rectangle was summed for each time point.

Modeling and statistical analysis. We tested a one-step stochastic model [16] where the probability of a given cell to reprogram at time t is proportional to e^{−kt}. Assuming average proliferation time of τ, and neglecting cell death events, the model implies the probability of a lineage to have any reprogramming cell by time t is:

$$P(t_R \le t) = 1 - \exp\left(-k \sum_{i=0}^{t/\tau} i\,\tau\,2^{\,t/\tau - i}\right)$$

Colony compaction times were fit to this model to find the optimal k using τ = 12 h, as well as to a Gaussian distribution model. A maximum likelihood estimator was used to fit parameters, and a likelihood ratio test was used to compare the fit of the models.

30. Carpenter, A.E. et al. CellProfiler: image analysis software for identifying and quantifying cell phenotypes. Genome Biol. 7, R100 (2006).

doi:10.1038/nbt.1632
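The lineage-level probability of the one-step stochastic model in the Modeling and statistical analysis section can be evaluated numerically. The Python sketch below is a hedged illustration and not the authors' Matlab code. It assumes that the exponent is k multiplied by the sum over i = 0 … n of i·τ·2^(n − i), where n is t/τ truncated to whole divisions; that reading of the expression, the truncation, and the function names are all assumptions made for this example. Times are in days and k is per cell per day.

```python
import math


def lineage_exposure(t_days, tau_days=0.5):
    """Summed term of the one-step model: sum_{i=0}^{n} i * tau * 2**(n - i).

    n is the number of whole divisions completed by time t (an assumption;
    the text does not say how fractional divisions are handled).
    """
    if t_days < 0 or tau_days <= 0:
        raise ValueError("t_days must be >= 0 and tau_days must be > 0")
    n = int(t_days // tau_days)
    return sum(i * tau_days * 2 ** (n - i) for i in range(n + 1))


def p_lineage_reprogrammed(t_days, k_per_cell_day, tau_days=0.5):
    """P(t_R <= t) = 1 - exp(-k * exposure), using expm1 for small exponents."""
    return -math.expm1(-k_per_cell_day * lineage_exposure(t_days, tau_days))


if __name__ == "__main__":
    # k = 0.001 per cell per day is the order of magnitude quoted in the text;
    # tau = 0.5 d corresponds to the 12-h proliferation time used for fitting.
    for day in (2, 4, 6, 8):
        print(day, p_lineage_reprogrammed(day, k_per_cell_day=0.001))
```

Because the number of cells in a lineage doubles every τ, the summed term grows roughly exponentially, so under this model the predicted compaction times for a given k cluster within a narrow window once the lineage is large enough. Fitting k by maximum likelihood would require the corresponding probability of compaction in each interval, which is left out of this sketch.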


careers and recruitment

First quarter resurgence in biotech job postings

Michael Francisco

The first quarter of 2010 saw a resurgence of biotech and pharma postings on the three representative job databases tracked by Nature Biotechnology (Tables 1 and 2). Most of the 25 largest biotech and 10 largest pharma companies saw an increase in listings compared with those in the fourth quarter of 2009 (Nat. Biotechnol. 28, 179, 2010), with three times as many biotechs listing more positions than those listing fewer. For pharmas, this ratio was 4:1.

However, there was still some downsizing in the life science industry (Table 3). Nature Biotechnology will continue to follow hiring and firing trends throughout 2010.

Table 1 Who’s hiring? Advertised openings at the 25 largest biotech companies

Company (a) | Number of employees | Advertised openings (b): Monster | Biospace | Naturejobs
Monsanto | 21,700 | 0 | 0 | 59
Amgen | 16,800 | 40 | 1 | 4
Genentech | 11,186 | 9 | 21 | 98
Genzyme | 11,000 | 93 | 3 | 152
Life Technologies | 9,700 | 36 | 41 | 0
PerkinElmer | 7,900 | 27 | 0 | 0
Bio-Rad Laboratories | 6,600 | 10 | 13 | 0
Biomerieux | 6,140 | 11 | 0 | 0
Millipore | 5,900 | 28 | 32 | 0
IDEXX Laboratories | 4,700 | 20 | 0 | 0
Biogen Idec | 4,700 | 47 | 50 | 0
Gilead Sciences | 3,441 | 0 | 24 | 0
WuXi PharmaTech | 3,172 | 0 | 0 | 0
Qiagen | 3,041 | 0 | 0 | 0
Cephalon | 2,780 | 0 | 2 | 1
Biocon | 2,772 | 0 | 0 | 0
Celgene | 2,441 | 1 | 12 | 0
Biotest | 2,108 | 8 | 4 | 0
Actelion | 2,054 | 3 | 3 | 0
Amylin Pharmaceuticals | 1,800 | 8 | 9 | 0
Elan | 1,687 | 7 | 5 | 0
Illumina | 1,536 | 20 | 16 | 4
Albany Molecular Research | 1,357 | 0 | 0 | 5
Vertex Pharmaceuticals | 1,322 | 47 | 62 | 1
CK Life Sciences | 1,315 | 0 | 0 | 0

(a) As defined in Nature Biotechnology’s survey of public companies (27, 710–721, 2009). (b) As searched on Monster.com, Biospace.com and Naturejobs.com, April 14, 2010. Jobs may overlap.

Table 2 Advertised job openings at the ten largest pharma companies

Company (a) | Number of employees | Advertised openings (b): Monster | Biospace | Naturejobs
Johnson & Johnson | 119,200 | 688 | 1 | 0
Bayer | 106,200 | 96 | 16 | 4
GlaxoSmithKline | 103,483 | 7 | 0 | 3
Sanofi-Aventis | 99,495 | 29 | 3 | 0
Novartis | 98,200 | 52 | 69 | 16
Pfizer | 86,600 | 94 | 88 | 90
Roche | 78,604 | 41 | 37 | 19
Abbott Laboratories | 68,697 | 85 | 0 | 0
AstraZeneca | 67,400 | 65 | 4 | 3
Merck & Co. | 59,800 | 1 | 2 | 0

(a) Data obtained from MedAdNews. (b) As searched on Monster.com, Biospace.com and Naturejobs.com, April 14, 2010. Jobs may overlap.

Table 3 Selected biotech and pharma downsizings

Company | Number of employees cut | Details
AstraZeneca | 8,000 | Disclosed in 4Q09 earnings plans to further reduce head count by 2014 as part of its 2007 restructuring program, bringing total head count reductions for the restructuring to 23,000.
Cell Therapeutics | 36 | Laying off 34% of its workforce immediately in an effort to save $16 million this year, after an FDA advisory panel unanimously recommended against approval of its lymphoma drug pixantrone in March.
Exelixis | 270 | Restructuring and reducing head count to 403 to focus on its mid- and late-stage pipeline. The cuts will come primarily from its early discovery program.
LifeCycle Pharma | 30 | Restructuring and reducing head count to 35 to focus on late-stage development while adding 10 employees this half in late-stage and business development.
Lonza Group | 175 | Restructuring and reducing head count through the closure of its manufacturing plants in Conshohocken, Pennsylvania, and Shawinigan, Quebec, and a warehouse and office facility in Wokingham, UK.
The Medicines Co. | 43 | Reducing US sales head count by 26% for an annual cost savings of $8–$9 million starting this quarter.
Merck & Co. | 2,500 | Reducing head count by 15% by the end of 2012 as part of the first phase of its restructuring after its 2009 acquisition of Schering-Plough. Cuts include duplicate vacant positions in sales, administration, manufacturing and R&D.
Pfizer | 50 | Reductions are in the company’s Durham, North Carolina, facility; part of 170 job cuts announced by Pfizer in November, less than 3 weeks after the close of Pfizer’s acquisition of Wyeth.
Poniard Pharmaceuticals | 37 | Restructuring and reducing head count to 12 full-time employees to reduce operating costs and focus its resources on the ongoing development of picoplatin to treat solid tumors.
XenoPort | 109 | Reducing head count by 50%, with a majority of the cuts coming from research, after March’s complete response letter from the FDA for Horizant gabapentin enacarbil (XP13512; GSK1838262) to treat moderate to severe primary restless legs syndrome.

Source: BioCentury.

Michael Francisco is Senior Editor, Nature Biotechnology


people

The board of directors of Karo Bio (Stockholm) has appointed Fredrik Lindgren (left) as president and CEO. He succeeds Per Olof Wallström, who announced his resignation in February. Lindgren has taken on positions of increasing responsibility at five Swedish companies in the corporate finance, consumer healthcare, medical device and biotech sectors, previously serving as CEO of Active Biotech and Biolin Scientific. Leon Rosenberg, chairman of the Karo Bio board, says Lindgren “has the vision, experience and energy to lead Karo Bio as it addresses its many opportunities and challenges.”

The Administrative Council of the European Patent Office (EPO; Munich) has elected Benoît Battistelli as president of the EPO, breaking a deadlock that had lasted 6 months and 20 rounds of voting. Battistelli also currently serves as head of the French National Intellectual Property Institute and chairman of the Administrative Council of the European Patent Organisation. He succeeds outgoing EPO president Alison Brimelow.

RNA-based drug developer AVI BioPharma (Bothell, WA, USA) has appointed its CFO J. David Boyle II to the additional role of interim president and CEO, after the resignation of Leslie Hudson as president, CEO and a director of the company. AVI’s board plans to initiate a search for CEO candidates, which will include both external and internal candidates. AVI also announced that Anthony R. Chase has joined the board and its nominating and corporate governance committee, and K. Michael Forrest has stepped down from the board.

Xcellerex (Marlborough, MA, USA) has named Guy Broadbent president, CEO and a member of the company’s board of directors. He succeeds Joseph Zakrzewski, who served as chairman, president and CEO, as part of an existing management succession process. Zakrzewski will remain as chairman. Broadbent was most recently senior vice president, corporate development at Thermo Fisher Scientific.

Stephen R. Davis has been appointed executive vice president and COO of Ardea Biosciences (San Diego). Before joining Ardea, Davis was president, CEO and a director of Neurogen, which was acquired by Ligand Pharmaceuticals in December 2009.

Carel du Marchie Sarvaas has joined EuropaBio (Brussels) as director for agricultural biotech, taking over from Morten Nielsen, who has led the division team since September 2009. He brings to EuropaBio his experience as a senior public affairs and communications advisor in Brussels, The Hague and Washington, DC.

Jan Groen (left) has been appointed CEO of OncoMethylome Sciences (Liege, Belgium). He has more than 25 years of experience in the clinical diagnostic industry, previously serving as president of Agendia and vice president of R&D at Focus Diagnostics.

Alan Hulme has been elected chairman of the board of Karolinska Institute spin-off Oncopeptides (Stockholm). He has held senior positions at Idexx Laboratories, Affymetrix, Flow Laboratories, Molecular Devices and Endotronics.

BIO Ventures for Global Health (Washington) has named Donald R. Joseph COO and a member of the board of directors. He previously served in senior executive positions in both legal and business roles at Renovis and Abgenix, where he played a key role in its acquisition by Amgen, and also served as COO of the Institute for OneWorld Health, a nonprofit pharmaceutical company.

Robert Lammens has been named chief technology officer at Atacama Labs (Helsinki) after a 23-year career in the field of solid dosage forms at Bayer. In addition, Lammens will continue as senior lecturer at the department of pharmaceutical technology of the University of Bonn.

Amyris Biotechnologies (Emeryville, CA, USA) has announced the election of Arthur Levinson to its board of directors. Levinson serves as chairman of Genentech and is a director on the boards of Apple and NGM Biopharmaceuticals.

Varun Nanda has been named senior vice president of global commercial operations at Dendreon (Seattle). He most recently served as senior vice president and global head of oncology at Roche/Genentech.

John A. Orwin has joined Affymax (Palo Alto, CA, USA) as president and COO, a newly created position. Orwin has over 20 years of experience in the biotech and pharma industries, most recently as senior vice president of Genentech’s bio-oncology business unit.

QLT (Vancouver) has announced that Dipak Panigrahi has joined the company as senior vice president, R&D and chief medical officer. Most recently, he was vice president, glaucoma development at Alcon Laboratories.

Sangamo BioSciences (Richmond, CA, USA) has named William R.
Ringo (left) chairmanof <strong>the</strong> company’sboard of directors.Ringo recently retiredfrom Pfizer, where heserved as senior vicepresident of business development, strategy <strong>and</strong>innovation. Before joining Pfizer in 2008, he waspresident <strong>and</strong> CEO of Abgenix.Privately held Liquidia Technologies (ResearchTriangle Park, NC, USA) has appointed JonathanF. Smith as CSO. Smith is a co-founder <strong>and</strong> previouslyserved as CSO of AlphaVax.Robert M. Whelan has been appointed to <strong>the</strong>board of directors of ARIAD Pharmaceuticals(Cambridge, MA, USA). He has more than 35years of corporate finance <strong>and</strong> investment bankingexperience, including leadership positions atVolpe Brown Whelan, Prudential Securities <strong>and</strong>Hambrecht & Quist.528 volume 28 number 5 MAY 2010 nature biotechnology
