Nature Biotechnology
volume 28 number 8 august 2010<br />
editorials<br />
761 Wrong numbers?<br />
761 MAQC-II: analyze that!<br />
© 2010 <strong>Nature</strong> America, Inc. All rights reserved.<br />
A computer-generated representation<br />
of HIV on the surface of a<br />
T lymphocyte. Holt et al. block the<br />
entry of HIV into blood cells by using<br />
zinc finger nucleases to knock out<br />
CCR5 in hematopoietic stem cells<br />
(p 839). Credit: ANIMATE4.com/<br />
SciencePhotoLibrary<br />
Jackson Lab’s legal woes, p 768<br />
news<br />
763 Industry makes strides in melanoma<br />
765 Firms combine experimental cancer drugs to speed development<br />
767 FDA transparency rules could hit small companies hardest<br />
767 Supremes rule on Bilski<br />
768 Lawsuits rock Jackson<br />
769 Food firms test fry Pioneer’s trans fat–free soybean oil<br />
769 Anti-CD20 patent battle ends<br />
769 EU states free to ban GM crops<br />
770 GM alfalfa—who wins?<br />
770 Biofuel ‘Made in China’<br />
771 data page: 2Q10—spreading the wealth<br />
772 News feature: Drugmakers dance with autism<br />
Bioentrepreneur<br />
Building a business<br />
775 At ground level<br />
Julian Bertschinger<br />
opinion and comment<br />
CORRESPONDENCE<br />
778 Waking up and smelling the coffee<br />
779 Genetic stability in two commercialized transgenic lines (MON810)<br />
780 Distances needed to limit cross-fertilization between GM and conventional<br />
maize in Europe<br />
<strong>Nature</strong> Biotechnology (ISSN 1087-0156) is published monthly by <strong>Nature</strong> Publishing Group, a trading name of <strong>Nature</strong> America Inc. located at 75 Varick Street,<br />
Fl 9, New York, NY 10013-1917. Periodicals postage paid at New York, NY and additional mailing post offices. Editorial Office: 75 Varick Street, Fl 9, New York,<br />
NY 10013-1917. Tel: (212) 726 9335, Fax: (212) 696 9753. Annual subscription rates: USA/Canada: US$250 (personal), US$3,520 (institution), US$4,050<br />
(corporate institution). Canada add 5% GST #104911595RT001; Euro-zone: €202 (personal), €2,795 (institution), €3,488 (corporate institution); Rest of world<br />
(excluding China, Japan, Korea): £130 (personal), £1,806 (institution), £2,250 (corporate institution); Japan: Contact NPG <strong>Nature</strong> Asia-Pacific, Chiyoda Building,<br />
2-37 Ichigayatamachi, Shinjuku-ku, Tokyo 162-0843. Tel: 81 (03) 3267 8751, Fax: 81 (03) 3267 8746. POSTMASTER: Send address changes to <strong>Nature</strong><br />
Biotechnology, Subscriptions Department, 342 Broadway, PMB 301, New York, NY 10013-3910. Authorization to photocopy material for internal or personal<br />
use, or internal or personal use of specific clients, is granted by <strong>Nature</strong> Publishing Group to libraries and others registered with the Copyright Clearance Center<br />
(CCC) Transactional Reporting Service, provided the relevant copyright fee is paid direct to CCC, 222 Rosewood Drive, Danvers, MA 01923, USA. Identification<br />
code for <strong>Nature</strong> Biotechnology: 1087-0156/04. Back issues: US$45, Canada add 7% for GST. CPC PUB AGREEMENT #40032744. Printed by Publishers<br />
Press, Inc., Lebanon Junction, KY, USA. Copyright © 2010 <strong>Nature</strong> America, Inc. All rights reserved. Printed in USA.<br />
COMMENTARY<br />
783 case study: India’s billion dollar biotech<br />
Justin Chakma, Hassan Masum, Kumar Perampaladas, Jennifer Heys &<br />
Peter A Singer<br />
784 DNA patents and diagnostics: not a pretty picture<br />
Julia Carbone, E Richard Gold, Bhaven Sampat, Subhashini Chandrasekharan,<br />
Lori Knowles, Misha Angrist & Robert Cook-Deegan<br />
Rapid bacterial engineering, p 812<br />
feature<br />
793 Public biotech 2009—the numbers<br />
Brady Huggett, John Hodgson & Riku Lähteenmäki<br />
Epigenetic marks define chromatin states, p 817<br />
patents<br />
801 Bilski v. Kappos: the US Supreme Court broadens patent subject-matter eligibility<br />
William J Simmons<br />
806 Recent patent applications in proteomics<br />
NEWS AND VIEWS<br />
807 Can HIV be cured with stem cell therapy?<br />
Steven G Deeks & Joseph M McCune see also p 839<br />
810 Microarrays in the clinic<br />
Guy W Tillinghast see also p 827<br />
812 Shaking up genome engineering<br />
Kim A Tipton & John Dueber see also p 856<br />
813 The expanding family of dendritic cell subsets<br />
Hideki Ueno, A Karolina Palucka & Jacques Banchereau<br />
816 Research highlights<br />
computational biology<br />
analysis<br />
817 Discovery and characterization of chromatin states for systematic annotation of<br />
the human genome<br />
Jason Ernst & Manolis Kellis<br />
Evaluating microarray classifiers, p 827<br />
research<br />
ARTICLES<br />
827 The MicroArray Quality Control (MAQC)-II study of common practices for the<br />
development and validation of microarray-based predictive models<br />
MAQC Consortium see also p 810<br />
839 Human hematopoietic stem/progenitor cells modified by zinc-finger nucleases<br />
targeted to CCR5 control HIV-1 in vivo<br />
N Holt, J Wang, K Kim, G Friedman, X Wang, V Taupin, G M Crooks, D B Kohn,<br />
P D Gregory, M C Holmes & P M Cannon see also p 807<br />
848 Cell type of origin influences the molecular and functional properties of mouse<br />
induced pluripotent stem cells<br />
J M Polo, S Liu, M E Figueroa, W Kulalert, S Eminli, K Yong Tan, E Apostolou,<br />
M Stadtfeld, Y Li, T Shioda, S Natesan, A J Wagers, A Melnick, T Evans &<br />
K Hochedlinger<br />
856 Rapid profiling of a microbial genome using mixtures of barcoded oligonucleotides<br />
J R Warner, P J Reeder, A Karimpour-Fard, L B A Woodruff & R T Gill<br />
see also p 812<br />
letters<br />
Epigenetics of iPS cells, p 848<br />
863 Implications of the presence of N-glycolylneuraminic acid in recombinant<br />
therapeutic glycoproteins<br />
D Ghaderi, R E Taylor, V Padler-Karavani, S Diaz & A Varki<br />
868 Global analysis of lysine ubiquitination by ubiquitin remnant immunoaffinity<br />
profiling<br />
G Xu, J S Paige & S R Jaffrey<br />
careers and recruitment<br />
875 Second quarter biotech job picture<br />
Michael Francisco<br />
876 people<br />
in this issue<br />
MAQC-II: evaluating microarray<br />
classifiers<br />
Building on its original work<br />
assessing the technical performance<br />
of DNA microarray technology<br />
(http://www.nature.com/nbt/focus/maqc/index.html),<br />
the Microarray Quality<br />
Control (MAQC) consortium, a<br />
partnership of research groups from<br />
the US Food and Drug Administration<br />
(FDA), academia, industry and other government agencies, has<br />
set out to investigate the capabilities and limitations of microarray<br />
data analysis with respect to disease diagnosis or choice of<br />
therapies. Although numerous methods for analyzing microarray<br />
data have been developed, there remains a lack of consensus<br />
regarding best practices in terms of their use in identifying gene<br />
signatures that are representative of a pathological condition.<br />
Such practices are becoming increasingly important, especially<br />
as the FDA receives many proposals to use microarrays to support<br />
medical product development and testing. In the present paper, 36<br />
data analysis teams applied a variety of analytic methods to build<br />
classifiers to predict the toxicity of chemicals in rodent models and<br />
to predict clinical outcomes in human patients with breast cancer,<br />
multiple myeloma or neuroblastoma. The experience gained during<br />
this large project may be useful for developing classifiers for data<br />
from other high-throughput assays. This is important in light of<br />
the study’s finding that microarrays perform poorly at making<br />
certain clinical predictions, suggesting that technologies that<br />
assay additional aspects of human physiology may be needed to<br />
formulate better clinical treatment plans. [Articles, p. 827;<br />
News and Views p. 810]<br />
CM<br />
Engineered stem cells control HIV<br />
Cannon and colleagues present an anti-HIV strategy in which human<br />
hematopoietic stem/progenitor cells are modified with zinc-finger<br />
nucleases to knock out C-C chemokine receptor 5 (CCR5), the principal<br />
co-receptor for HIV. CCR5 has been a target of exceptional interest<br />
ever since the 1996 discovery that a homozygous 32-bp deletion in<br />
the gene confers resistance to HIV infection without any apparent<br />
ill effects on health. Most previous work has used small molecules,<br />
ribozymes or siRNA to inhibit CCR5 protein or mRNA. In contrast,<br />
Cannon and colleagues nucleofect plasmids expressing two zinc-finger<br />
nucleases into human CD34+ stem/progenitor cells to permanently<br />
knock out the CCR5 gene. The modified cells are transplanted<br />
into irradiated, immunodeficient mice and allowed to engraft for<br />
8–12 weeks before the mice are challenged with CCR5-tropic HIV.<br />
Although human T cell counts initially decline, by week 8 they have<br />
recovered to their original levels. By weeks 10 and 12, HIV RNA in<br />
the intestine is undetectable. Because hematopoietic stem cells can<br />
reconstitute the entire hematopoietic system, the authors propose that<br />
modified CD34+ cells could provide long-term HIV resistance in all<br />
the lymphoid and myeloid cell types that the virus infects. In support<br />
of this hypothesis, a transplant of allogeneic CCR5Δ32 hematopoietic<br />
stem cells in an HIV+ individual with acute myeloid leukemia may<br />
have cured the HIV infection (N. Engl. J. Med. 360, 724–725, 2009).<br />
[Articles, p. 839; News and Views, p. 807]<br />
KA<br />
Written by Kathy Aschheim, Markus Elsner, Michael Francisco,<br />
Peter Hare, Craig Mak & Lisa Melton<br />
Epigenetic marks stand together<br />
With over 100 known<br />
histone modifications<br />
that can occur in<br />
thousands of possible<br />
combinations, it is challenging<br />
to identify specific combinations that have distinct biological<br />
functions. Ernst and Kellis describe an algorithm that deduces<br />
chromatin states (recurring, spatially coherent combinations of<br />
epigenetic marks) from experimental data on the distribution of different<br />
modifications. Using a multivariate hidden Markov model to<br />
analyze data on the position of 41 different marks in human T cells,<br />
they define 51 distinct chromatin states. The authors correlate these<br />
states with prior genome annotation and find that individual states<br />
are associated with specific functional regions such as gene promoters,<br />
transcriptionally active genes, large-scale repressed regions or<br />
intergenic active regions. The identification of chromatin states will<br />
facilitate genome annotation, the discovery of functional elements,<br />
and mechanistic studies of gene regulation by epigenetic marks.<br />
[Analysis, p. 817]<br />
ME<br />
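To make the idea of a chromatin state concrete, the following is a minimal sketch of how a multivariate hidden Markov model can assign a state to each genomic bin. It is not the authors' implementation. It assumes, for illustration only, that each mark is recorded as present or absent in each bin and that marks are emitted independently given the state. The two states, three marks and all probabilities are hypothetical toy values, and the parameter-learning step that a real analysis would need is omitted.

```python
import math

def log_emission(obs, probs):
    """Log-probability of a binary mark vector, assuming independent marks."""
    return sum(math.log(p if o else 1.0 - p) for o, p in zip(obs, probs))

def viterbi(observations, start, trans, emit):
    """Return the most likely state path for a sequence of binary mark vectors."""
    n_states = len(start)
    score = [math.log(start[s]) + log_emission(observations[0], emit[s])
             for s in range(n_states)]
    back = []  # back[t][s] = best predecessor of state s at bin t + 1
    for obs in observations[1:]:
        prev = score
        pointers, score = [], []
        for s in range(n_states):
            best = max(range(n_states),
                       key=lambda r: prev[r] + math.log(trans[r][s]))
            pointers.append(best)
            score.append(prev[best] + math.log(trans[best][s])
                         + log_emission(obs, emit[s]))
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Hypothetical toy model: 2 states, 3 marks (all values are assumptions).
start = [0.5, 0.5]
trans = [[0.9, 0.1],
         [0.1, 0.9]]
emit = [[0.9, 0.8, 0.1],   # state 0: marks 1 and 2 usually present
        [0.1, 0.1, 0.8]]   # state 1: mark 3 usually present
bins = [(1, 1, 0), (1, 1, 0), (1, 0, 0), (0, 0, 1), (0, 0, 1), (0, 0, 1)]

path = viterbi(bins, start, trans, emit)
print(path)  # [0, 0, 0, 1, 1, 1]
```

The third bin lacks one of the marks typical of state 0, yet it is still assigned to state 0 because the transition probabilities favor staying in the same state. This is one way to read the "spatially coherent" part of the definition.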
Faster trait-to-gene mapping<br />
Gill and colleagues describe an approach for creating rationally modified<br />
collections of Escherichia coli in which every strain contains the<br />
same defined mutation but in a different gene. Such collections are<br />
valuable tools for mapping the genetic basis of traits, but until now<br />
have been labor intensive to construct. The method creates thousands<br />
of modified strains in parallel by transforming bacteria with<br />
pools of oligonucleotides that each recombine with a single gene<br />
to introduce a mutation. Barcode sequence tags uniquely identify<br />
each oligo and thus each strain. The collection of strains is grown<br />
in a condition of interest that selects for genetic modifications that<br />
confer fitness advantages. Fitter strains are recovered and identified<br />
by sequencing or by microarray detection of their barcodes. To demonstrate<br />
the method, Gill and colleagues created collections of E. coli<br />
with strains in which single genes were either up- or downregulated.<br />
Growing these strains in cellulosic hydrolysates (a toxic intermediate<br />
of biofuel processing) or in the presence of valine, D-fucose or<br />
methylglyoxal revealed unexpected genes that influenced growth in<br />
these industrially relevant conditions. The identified genes could<br />
form the basis for subsequent combinatorial genetic engineering.<br />
[Articles, p. 856; News and Views, p. 812]<br />
CM<br />
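The barcode readout described above can be illustrated with a short sketch. It assumes that each sequencing read has already been mapped to a barcode, and it scores each barcode by the log2 ratio of its relative frequency after selection to its frequency before selection. The function name, the pseudocount and the gene labels are hypothetical; the paper's own scoring scheme may differ.

```python
import math
from collections import Counter

def barcode_enrichment(before, after, pseudocount=1.0):
    """log2 ratio of each barcode's relative frequency after vs. before selection."""
    before_counts, after_counts = Counter(before), Counter(after)
    barcodes = set(before_counts) | set(after_counts)
    # The pseudocount keeps barcodes that vanish after selection finite.
    n_before = len(before) + pseudocount * len(barcodes)
    n_after = len(after) + pseudocount * len(barcodes)
    return {
        bc: math.log2(((after_counts[bc] + pseudocount) / n_after)
                      / ((before_counts[bc] + pseudocount) / n_before))
        for bc in barcodes
    }

# Hypothetical barcode reads: equal representation before selection,
# skewed toward geneA after growth in the condition of interest.
before = ["geneA"] * 10 + ["geneB"] * 10 + ["geneC"] * 10
after = ["geneA"] * 25 + ["geneB"] * 4 + ["geneC"] * 1

scores = barcode_enrichment(before, after)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # ['geneA', 'geneB', 'geneC']
```

A positive score marks a strain that became more common under selection, which is the signal used to nominate the modified gene as a candidate for conferring a fitness advantage.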
Ubiquitination sites in the crosshairs<br />
Immunoaffinity-based approaches have been<br />
key to enabling proteome-wide analysis of<br />
post-translational modifications such as phosphorylation.<br />
However, attempts to selectively<br />
purify ubiquitinated peptides on a large scale<br />
have been frustrated by the difficulty of isolating and identifying peptides<br />
tagged with the 76-amino-acid ubiquitin protein. Jaffrey and colleagues<br />
simplify such analyses by generating a monoclonal antibody that selectively<br />
recognizes sites of protein ubiquitination. When protein lysates are digested<br />
with trypsin, ubiquitin adducts are trimmed to a diglycine stub. The ability<br />
of the antibody to recognize these ubiquitin remnants conjugated to the<br />
side chains of ubiquitinated lysines in a range of sequence contexts enables<br />
the authors to enrich for peptides carrying sites of ubiquitination and then<br />
identify them using tandem mass spectrometry. Working with cells expressing<br />
hexahistidine-tagged ubiquitin, the authors use this strategy to extend<br />
the catalog of mammalian ubiquitinated proteins and further illustrate the<br />
strength of the approach by demonstrating differential regulation of ubiquitination<br />
at distinct sites within the same protein. [Letters, p. 868] PH<br />
Neu5Gc content and biologics<br />
Much effort has been devoted to reducing the immunogenicity of protein<br />
biologics caused by peptide epitopes. However, far less attention has been<br />
paid to the possibility of untoward effects caused by immune reactions to<br />
glycans on glycoprotein therapeutics. Varki and colleagues present evidence<br />
suggesting that it may be necessary to revisit whether the presence<br />
of the sialic acid N-glycolylneuraminic acid (Neu5Gc) on certain glycoprotein<br />
drugs may influence their immunogenicity and half-lives in vivo.<br />
Unlike other mammals studied to date, humans lack the ability to make<br />
Neu5Gc. Nonetheless, recent studies have revealed that most of us have<br />
variable (and sometimes relatively high) levels of circulating antibodies<br />
against Neu5Gc. The authors demonstrate the presence of Neu5Gc<br />
on only one of two clinically approved monoclonal antibodies directed<br />
against the same target. In vitro, antibodies or antisera against Neu5Gc<br />
from healthy humans generate immune complexes only in the presence<br />
of the Neu5Gc-containing drug. Moreover, antibodies to Neu5Gc in mice<br />
with a human-like defect in Neu5Gc synthesis promote the clearance of<br />
only the Neu5Gc-containing drug. Injection of this drug also promotes<br />
the production of preexisting antibodies against Neu5Gc. If further studies<br />
support the possibility that antibodies against Neu5Gc might influence<br />
the immunogenicity and efficacy of therapeutic glycoproteins in<br />
humans, production using cultured human cells may not resolve the issue,<br />
as Neu5Gc could still be incorporated from animal-derived products in<br />
culture media. Varki and colleagues show that a better solution would be<br />
to displace Neu5Gc from being incorporated into recombinant proteins<br />
by inclusion of an excess of the human sialic acid N-acetylneuraminic acid<br />
in culture media. [Letters, p. 863]<br />
PH<br />
Patent Roundup<br />
The US Food and Drug Administration is proposing new<br />
transparency rules to increase the information it discloses<br />
about product applications. The rules could compromise trade<br />
secret protection and put small companies at a competitive<br />
disadvantage. [News Analysis, p. 767]<br />
LM<br />
The US Supreme Court’s long-awaited decision on Bilski v.<br />
Kappos rules against limiting patents to inventions that are tied<br />
to a machine or that transform an article. But the ruling leaves<br />
several questions unanswered, especially with regard to the<br />
eligibility of patents for diagnostic methods. [News in brief, p. 767]<br />
LM<br />
The not-for-profit Jackson Laboratory has been caught up in<br />
patent disputes for the first time in its 80-year history. If the<br />
expense of such litigation escalates, the lab may have to cover its<br />
costs by charging researchers higher prices for access to mouse<br />
strains in its repository. [News in brief, p. 768]<br />
LM<br />
A four-year dispute over a European patent for an anti-CD20<br />
monoclonal antibody to treat rheumatoid arthritis has ended in<br />
favor of Trubion, based in Seattle, and against Genentech and<br />
Biogen Idec. The decision frees up the patent space for anyone<br />
contemplating a CD20 program, according to Trubion. [News in<br />
brief, p. 769]<br />
LM<br />
Both sides are claiming victory following the US Supreme Court’s<br />
verdict in Monsanto v. Geertson Seed Farms over future sales of<br />
Roundup Ready alfalfa seeds. Monsanto (St. Louis, MO) cheered<br />
the court’s decision to reverse a previous injunction banning the<br />
transgenic alfalfa, but the seeds’ commercialization is still subject<br />
to an environmental impact statement by the US Department of<br />
Agriculture. [News in brief, p. 770]<br />
LM<br />
The US Supreme Court recently broadened the definition of<br />
patent-eligible subject matter. In this issue, Simmons parses Bilski<br />
v. Kappos and what the far-reaching decision means for biotech<br />
and pharmaceutical patent seekers. [Patent Article, p. 801] MF<br />
Recent patent applications in proteomics. [New Patents, p. 806] MF<br />
Epigenetic memory in iPS cells<br />
All induced pluripotent stem (iPS)<br />
cells from different tissues are not<br />
created equal. That is the conclusion<br />
of a study comparing mouse<br />
iPS cells derived from four tissues—<br />
tail-tip fibroblasts, splenic B cells,<br />
bone marrow–derived granulocytes and skeletal muscle precursors.<br />
Hochedlinger and colleagues use a ‘secondary’ system for reprogramming<br />
(Nat. Biotechnol. 26, 916–924, 2008) so that all iPS cells have identical<br />
integrations of the four transgenes, eliminating this confounding variable.<br />
They find that early-passage iPS cells retain an epigenetic memory of their<br />
cell type of origin and that this memory alters the cells’ gene expression<br />
and differentiation potential. Notably, these epigenetic, transcriptional and<br />
functional differences can be attenuated by extended passaging. Several<br />
lines of evidence suggest that this erasure of epigenetic memory occurs<br />
not through the selection of rare, fully reprogrammed cells but through<br />
gradual epigenetic changes in the majority of cells. Epigenetic memory<br />
in iPS cells can be considered desirable or not depending on one’s experimental<br />
goals. In studies aimed at producing a specific cell type, it could be<br />
beneficial—suggesting, for example, that a project to generate blood cells<br />
should begin by reprogramming blood cells rather than an unrelated cell<br />
type. [Articles, p. 848]<br />
KA<br />
Next month in Nature Biotechnology
• Castor bean genome<br />
• Benchmarking dynamic mass redistribution<br />
• Measuring protein-DNA interactions at equilibrium<br />
• Metabolic modeling made easier<br />
www.nature.com/naturebiotechnology<br />
EDITORIAL OFFICE<br />
biotech@us.nature.com<br />
75 Varick Street, Fl 9, New York, NY 10013-1917<br />
Tel: (212) 726 9200, Fax: (212) 696 9635<br />
Chief Editor: Andrew Marshall<br />
Senior Editors: Laura DeFrancesco (News & Features), Kathy Aschheim (Research),<br />
Peter Hare (Research), Michael Francisco (Resources and Special Projects)<br />
Business Editor: Brady Huggett<br />
Associate Business Editor: Victor Bethencourt<br />
News Editor: Lisa Melton<br />
Associate Editors: Markus Elsner (Research), Craig Mak (Research)<br />
Editor-at-Large: John Hodgson<br />
Contributing Editors: Mark Ratner, Chris Scott<br />
Contributing Writer: Jeffrey L. Fox<br />
Senior Copy Editor: Teresa Moogan<br />
Managing Production Editor: Ingrid McNamara<br />
Senior Production Editor: Brandy Cafarella<br />
Production Editor: Amanda Crawford<br />
Senior Illustrator: Katie Vicari<br />
Illustrator/Cover Design: Kimberly Caesar<br />
Senior Editorial Assistant: Ania Levinson<br />
MANAGEMENT OFFICES<br />
NPG New York<br />
75 Varick Street, Fl 9, New York, NY 10013-1917<br />
Tel: (212) 726 9200, Fax: (212) 696 9006<br />
Publisher: Melanie Brazil<br />
Executive Editor: Linda Miller<br />
Chief Technology Officer: Howard Ratner<br />
Head of <strong>Nature</strong> Research & Reviews Marketing: Sara Girard<br />
Circulation Manager: Stacey Nelson<br />
Production Coordinator: Diane Temprano<br />
Head of Web Services: Anthony Barrera<br />
Senior Web Production Editor: Laura Goggin<br />
NPG London<br />
The Macmillan Building, 4 Crinan Street, London N1 9XW<br />
Tel: 44 207 833 4000, Fax: 44 207 843 4996<br />
Managing Director: Steven Inchcoombe<br />
Publishing Director: Peter Collins<br />
Editor-in-Chief, <strong>Nature</strong> Publications: Philip Campbell<br />
Marketing Director: Della Sar<br />
Director of Web Publishing: Timo Hannay<br />
NPG <strong>Nature</strong> Asia-Pacific<br />
Chiyoda Building, 2-37 Ichigayatamachi, Shinjuku-ku, Tokyo 162-0843<br />
Tel: 81 3 3267 8751, Fax: 81 3 3267 8746<br />
Publishing Director — Asia-Pacific: David Swinbanks<br />
Associate Director: Antoine E. Bocquet<br />
Manager: Koichi Nakamura<br />
Operations Director: Hiroshi Minemura<br />
Marketing Manager: Masahiro Yamashita<br />
Asia-Pacific Sales Director: Kate Yoneyama<br />
Asia-Pacific Sales Manager: Ken Mikami<br />
DISPLAY ADVERTISING<br />
display@us.nature.com (US/Canada)<br />
display@nature.com (Europe)<br />
nature@natureasia.com (Asia)<br />
Global Head of Advertising and Sponsorship: Dean Sanderson, Tel: (212) 726 9350,<br />
Fax: (212) 696 9482<br />
Global Head of Display Advertising and Sponsorship: Andrew Douglas, Tel: 44 207 843 4975,<br />
Fax: 44 207 843 4996<br />
Asia-Pacific Sales Director: Kate Yoneyama, Tel: 81 3 3267 8765, Fax: 81 3 3267 8746<br />
Display Account Managers:<br />
New England: Sheila Reardon, Tel: (617) 399 4098, Fax: (617) 426 3717<br />
New York/Mid-Atlantic/Southeast: Jim Breault, Tel: (212) 726 9334, Fax: (212) 696 9481<br />
Midwest: Mike Rossi, Tel: (212) 726 9255, Fax: (212) 696 9481<br />
West Coast: George Lui, Tel: (415) 781 3804, Fax: (415) 781 3805<br />
Germany/Switzerland/Austria: Sabine Hugi-Fürst, Tel: 41 52761 3386, Fax: 41 52761 3419<br />
UK/Ireland/Scandinavia/Spain/Portugal: Evelina Rubio-Hakansson, Tel: 44 207 014 4079,<br />
Fax: 44 207 843 4749<br />
UK/Germany/Switzerland/Austria: Nancy Luksch, Tel: 44 207 843 4968, Fax: 44 207 843 4749<br />
France/Belgium/The Netherlands/Luxembourg/Italy/Israel/Other Europe: Nicola Wright,<br />
Tel: 44 207 843 4959, Fax: 44 207 843 4749<br />
Asia-Pacific Sales Manager: Ken Mikami, Tel: 81 3 3267 8765, Fax: 81 3 3267 8746<br />
Greater China/Singapore: Gloria To, Tel: 852 2811 7191, Fax: 852 2811 0743<br />
NATUREJOBS<br />
naturejobs@us.nature.com (US/Canada)<br />
naturejobs@nature.com (Europe)<br />
nature@natureasia.com (Asia)<br />
US Sales Manager: Ken Finnegan, Tel: (212) 726 9248, Fax: (212) 696 9482<br />
European Sales Manager: Dan Churchward, Tel: 44 207 843 4966, Fax: 44 207 843 4596<br />
Asia-Pacific Sales & Business Development Manager: Yuki Fujiwara, Tel: 81 3 3267 8765,<br />
Fax: 81 3 3267 8752<br />
SPONSORSHIP<br />
g.preston@nature.com<br />
Global Head of Sponsorship: Gerard Preston, Tel: 44 207 843 4965, Fax: 44 207 843 4749<br />
Business Development Executive: David Bagshaw, Tel: (212) 726 9215, Fax: (212) 696 9591<br />
Business Development Executive: Graham Combe, Tel: 44 207 843 4914, Fax: 44 207 843 4749<br />
Business Development Executive: Reya Silao, Tel: 44 207 843 4977, Fax: 44 207 843 4996<br />
SITE LICENSE BUSINESS UNIT<br />
Americas: Tel: (888) 331 6288<br />
institutions@us.nature.com<br />
Asia/Pacific: Tel: 81 3 3267 8751<br />
institutions@natureasia.com<br />
Australia/New Zealand: Tel: 61 3 9825 1160<br />
nature@macmillan.com.au<br />
India: Tel: 91 124 2881054/55<br />
npgindia@nature.com<br />
ROW: Tel: 44 207 843 4759<br />
institutions@nature.com<br />
CUSTOMER SERVICE<br />
www.nature.com/help<br />
Senior Global Customer Service Manager: Gerald Coppin<br />
For all print and online assistance, please visit www.nature.com/help<br />
Purchase subscriptions:<br />
Americas: <strong>Nature</strong> Biotechnology, Subscription Dept., 342 Broadway, PMB 301, New York, NY 10013-3910,<br />
USA. Tel: (866) 363 7860, Fax: (212) 334 0879<br />
Europe/ROW: <strong>Nature</strong> Biotechnology, Subscription Dept., Macmillan Magazines Ltd., Brunel Road,<br />
Houndmills, Basingstoke RG21 6XS, United Kingdom. Tel: 44 1256 329 242, Fax: 44 1256 812 358<br />
Asia-Pacific: <strong>Nature</strong> Biotechnology, NPG <strong>Nature</strong> Asia-Pacific, Chiyoda Building,<br />
2-37 Ichigayatamachi, Shinjuku-ku, Tokyo 162-0843. Tel: 81 3 3267 8751, Fax: 81 3 3267 8746<br />
India: <strong>Nature</strong> Biotechnology, NPG India, 3A, 4th Floor, DLF Corporate Park, Gurgaon 122002, India.<br />
Tel: 91 124 2881054/55, Tel/Fax: 91 124 2881052<br />
REPRINTS<br />
reprints@us.nature.com<br />
<strong>Nature</strong> Biotechnology, Reprint Department, <strong>Nature</strong> Publishing Group, 75 Varick Street, Fl 9,<br />
New York, NY 10013-1917, USA.<br />
For commercial reprint orders of 600 or more, please contact:<br />
UK Reprints: Tel: 44 1256 302 923, Fax: 44 1256 321 531<br />
US Reprints: Tel: (617) 494 4900, Fax: (617) 494 4960
Editorial<br />
Wrong numbers?<br />
With biotech infiltrating multiple industries and fewer<br />
life science ventures listing on stock exchanges, what<br />
do we really learn from surveying the set of public<br />
biotech companies?<br />
Each year, <strong>Nature</strong> Biotechnology trawls through the accounts of publicly<br />
quoted biotech companies and pulls out some numbers that characterize<br />
this part of the commercial life science landscape. Perhaps the most<br />
surprising statistic this year was that most of the companies that appeared<br />
in last year’s survey are still there. The current straitened circumstances<br />
took their toll, of course, but total revenues were up 10%, R&D was only<br />
down 4% and the group collectively was profitable for another year. But<br />
what, if anything, does the survey tell us about the general health of the<br />
innovative life science sector?<br />
Back in the 1990s, the answer seemed clear. Thanks to much freer flows<br />
of capital then, the annual audit measured the progress of a specialized,<br />
self-reliant and relatively independent industrial endeavor. It assessed the<br />
rapid churn of companies listing newly on exchanges. Companies could<br />
float much earlier; some were even able to go public without products in<br />
human trials. Buoyant stock markets took valuations to ecstatic heights<br />
and poured money into the sector. Product for product and dollar for dollar,<br />
biotech companies were valued much more highly than ‘traditional’<br />
pharma companies.<br />
That differential was unsustainable. As Amgen and Genentech and<br />
Biogen Idec and others climbed up the pharmaceutical league standings,<br />
reality dawned. Innovators metamorphosed into drugmakers. And as the<br />
pharma sponge absorbed more biotech, the boundaries between the two<br />
spheres faded.<br />
The consequence of this merging is that many, if not most, of the<br />
biological products and biological techniques now reside outside the<br />
group of independent public companies that we survey. Pharma spends<br />
$65 billion a year on R&D, 25–40% of it either devoted to biological<br />
products or using the techniques of biotech. Thus, pharma outspends<br />
‘biotech,’ even on biotech R&D. Furthermore, biotech processes extend<br />
far beyond the pharmaceutical segment: political imperatives and<br />
technological capability have expanded industrial biotech for biofuels<br />
production, waste management and green chemistry. Geographically,<br />
biotech is no longer a Western province: China, India, South Korea and<br />
elsewhere are prominent actors in follow-on biologic drugs, diagnostics<br />
and clinical testing.<br />
Our public company survey reflects none of these changes: pharma<br />
companies, biogenerics firms, diagnostic and device providers all fall outside<br />
the definitions of our survey. In Asia, successful biotech companies<br />
(see p. 783) have only restricted access to mature public capital markets.<br />
Overall, the survey is now less a gauge for innovative life science and more<br />
a pointer to the shape of the Western healthcare market. To measure life<br />
sciences’ impact more broadly, other indicators are needed.<br />
To quantify innovation, we need to look, too, at activities within small<br />
private companies and, increasingly, at the early translational work in the<br />
public sector. These data are far more difficult to gather than<br />
data from publicly quoted firms. Accordingly, policymakers, governments<br />
and industry associations need to devote much more effort and resources<br />
to collecting them.<br />
MAQC-II: analyze that!<br />
The MAQC consortium’s latest study suggests that human<br />
error in handling DNA microarray data analysis software<br />
could delay the technology’s wider adoption in the clinic.<br />
Following up on its publications in <strong>Nature</strong> Biotechnology four years ago<br />
(http://www.nature.com/nbt/focus/maqc/index.html), the Microarray<br />
Quality Control (MAQC) consortium publishes the results of its second<br />
phase of assessment (MAQC-II) on p. 827, in conjunction with ten accompanying<br />
papers in The Pharmacogenomics Journal<br />
(http://www.nature.com/tpj/journal/v10/n4/index.html). The new work assesses the capabilities<br />
and limitations of microarray data analysis methods—so-called<br />
genomic classifiers—in identifying gene signatures representative of a<br />
specific pathological condition.<br />
All in all, >30,000 genomic classifier models were built by combining<br />
one of 17 different data preprocessing and normalization methods,<br />
with one of 9 methods for filtering out problematic data, with one of >33<br />
techniques for picking ‘signature’ genes, with one of >24 algorithms for<br />
discerning patterns from those genes, and with one of 6 methods for testing<br />
the robustness of the results. Thirty-six research teams sought gene<br />
signatures within 6 massive microarray datasets derived from toxicological<br />
studies of chemicals on rodents and expression profiles of human cancer<br />
patients that predict 13 ‘endpoints’ potentially relevant to preclinical or<br />
clinical applications.<br />
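To give a feel for the size of that design space, the counts quoted above can simply be multiplied out. The sketch below is illustrative only: it uses the lower bounds given in the text (17, 9, 33, 24 and 6), and the stage names are labels chosen here for readability, not MAQC-II terminology.

```python
from math import prod

# Lower-bound counts of options per analysis stage, taken from the text.
# The stage names are illustrative labels, not MAQC-II terminology.
STAGE_OPTIONS = {
    "preprocessing_normalization": 17,
    "data_filtering": 9,
    "signature_gene_selection": 33,   # ">33" in the text, so a lower bound
    "pattern_algorithm": 24,          # ">24" in the text, so a lower bound
    "robustness_test": 6,
}


def full_grid_size(options):
    """Number of pipelines if every option were paired with every other."""
    return prod(options.values())


def fraction_explored(models_built, options):
    """Share of the full grid covered by a given number of models."""
    return models_built / full_grid_size(options)


if __name__ == "__main__":
    print(full_grid_size(STAGE_OPTIONS))
    print(round(fraction_explored(30_000, STAGE_OPTIONS), 3))
```

Under these assumptions the full grid holds at least 727,056 combinations, so a figure of 30,000 models would correspond to roughly 4% of it. In other words, the teams explored a sample of the possible pipelines rather than every one.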
As discussed on p. 810, one key finding of MAQC-II is that the classifier<br />
models are remarkably similar in predicting outcome, irrespective of the<br />
approach used. On the other hand, the overall success of the classifiers in<br />
predicting endpoints depends on the endpoints themselves. For example,<br />
predictions were in general much worse for breast cancer and multiple<br />
myeloma, which have highly heterogeneous genetic backgrounds, than for<br />
liver toxicology or neuroblastoma.<br />
Perhaps most striking of all, some data analysis teams were consistently<br />
better at predictions than others. This may relate to simple errors<br />
associated with manipulating such large datasets. But insufficient tuning<br />
of the parameters used in a classifier model is also a likely contributor.<br />
In this sense, MAQC-II was as much an exercise in sociology as in<br />
technology. The human element in classifier implementation is key.<br />
Thus a key take-home message is that classifier protocols need to be<br />
more tightly described and more tightly executed. In this respect, regulatory<br />
agencies and scientific journals can promote good practice. A clear<br />
need exists for greater meticulousness both in documenting the parameters<br />
of a particular classifier model used and in detailing the procedures<br />
for normalization, batch effect correction, quality control and reduction<br />
of quality control flaws. Greater attention to detail will not only enhance<br />
reproducibility of research—it will also facilitate the progression of this<br />
technology toward the clinic.<br />
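One way to picture what tighter documentation of a classifier could look like is a structured record that travels with the model. The following is a minimal sketch, not a format proposed by the MAQC consortium: the class name, field names and example values are all hypothetical, chosen only to mirror the items listed above (normalization, batch effect correction, quality control and model parameters).

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ClassifierProtocol:
    """Hypothetical record of the choices that define one classifier model."""
    normalization: str
    batch_effect_correction: str
    quality_control: str
    signature_gene_selection: str
    algorithm: str
    parameters: dict = field(default_factory=dict)

    def to_json(self):
        # sort_keys gives a stable serialization that can be archived or diffed
        return json.dumps(asdict(self), sort_keys=True, indent=2)

    def missing_fields(self):
        """Names of any descriptive fields that were left blank."""
        return [name for name, value in asdict(self).items()
                if value in ("", None)]


# Example values are invented for illustration only.
protocol = ClassifierProtocol(
    normalization="quantile",
    batch_effect_correction="",          # deliberately left undocumented
    quality_control="arrays failing vendor QC removed",
    signature_gene_selection="t-test, top 50 genes",
    algorithm="k-nearest neighbours",
    parameters={"k": 5},
)

if __name__ == "__main__":
    print(protocol.missing_fields())
    print(protocol.to_json())
```

A record like this makes omissions visible: here the check flags that the batch effect correction step was never described.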
news<br />
Industry makes strides in melanoma<br />
After decades of continuous failures, the treatment<br />
of metastatic melanoma is finally advancing.<br />
This year’s American Society of Clinical<br />
Oncology (ASCO) annual meeting heralded<br />
a breakthrough antibody therapy for the disease.<br />
Top-line, phase 3 results for Bristol-Myers<br />
Squibb’s fully human monoclonal antibody<br />
(mAb) ipilimumab showed a survival benefit<br />
in patients with advanced cancer—the first<br />
ever phase 3 trial to do so. These results contrast<br />
with a litany of letdowns from cancer vaccines,<br />
cytokine therapies, adoptive T-cell therapies as<br />
well as several targeted therapies that all have<br />
failed to improve on standard chemotherapy,<br />
which itself achieves a meager 15% response rate<br />
with negligible survival benefit. “Those of us in<br />
the melanoma business have felt like we’ve been<br />
in a long, dark tunnel,” said oncologist Vernon<br />
Sondak, of the H. Lee Moffitt Cancer Center in<br />
Tampa, Florida, at the ASCO meeting.<br />
The ipilimumab data, released by New<br />
York–based Bristol-Myers Squibb in June, have<br />
changed all that. The 676 individuals included in<br />
the study had unresectable, metastatic melanoma<br />
and had previously undergone chemotherapy for<br />
the disease. Those receiving ipilimumab, with<br />
or without the synthetic peptide vaccine glycoprotein<br />
100 (gp100), had a median survival<br />
of about 10 months, against 6.4 months for the<br />
vaccine alone. Ipilimumab, which targets cytotoxic<br />
T-lymphocyte antigen 4 (CTLA4), nearly<br />
doubled the rates of survival at 12 months (46%<br />
versus 25%) and 24 months (24% versus 14%)<br />
after treatment compared with the peptide.<br />
“This is really a benchmark for the field,” says<br />
John Kirkwood, a melanoma researcher at the<br />
University of Pittsburgh. “We finally have a randomized<br />
controlled trial that is positive.”<br />
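The "nearly doubled" claim can be checked directly from the landmark survival rates quoted above. The short sketch below does only that arithmetic. The function and variable names are illustrative, and the calculation uses the published percentages, not patient-level data.

```python
def survival_ratio(treated_pct, control_pct):
    """Ratio of landmark survival rates (treated arm over control arm)."""
    return treated_pct / control_pct


# Landmark survival rates quoted in the text: (ipilimumab, gp100 peptide).
LANDMARKS = {12: (46, 25), 24: (24, 14)}

if __name__ == "__main__":
    for months, (ipilimumab, peptide) in LANDMARKS.items():
        ratio = survival_ratio(ipilimumab, peptide)
        difference = ipilimumab - peptide
        print(f"{months} months: ratio {ratio:.2f}, "
              f"absolute difference {difference} percentage points")
```

The ratios come out at about 1.84 at 12 months and 1.71 at 24 months, with absolute differences of 21 and 10 percentage points.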
Finalized phase 1 results of a BRAF inhibitor,<br />
developed by the Berkeley, California–based<br />
Plexxikon, are at least as dramatic. The small<br />
molecule PLX4032 (also RG7204), which<br />
Plexxikon is co-developing with Roche of Basel,<br />
specifically inhibits the V600E mutant BRAF, a<br />
constitutively active kinase present in more than<br />
half of metastatic melanomas. The drug produced<br />
an 81% response rate among 32 patients<br />
receiving the therapeutic dose. “The early effects<br />
are [as] profound, reliable and gratifying as<br />
one could ever want out of a cancer therapy,”<br />
says trial principal investigator Keith Flaherty<br />
of Massachusetts General Hospital in Boston.<br />
PLX4032 is now in phase 3.<br />
Although both compounds will almost certainly<br />
become approved drugs, they have limitations.<br />
Ipilimumab extends median survival but,<br />
strangely, has only an 11% overall response rate.<br />
And almost all patients on PLX4032 relapse,<br />
most within a year. Nevertheless, the two drugs<br />
have revitalized melanoma research. By using<br />
ipilimumab and PLX4032 in combination with<br />
a variety of standard and investigational agents<br />
—or with each other—researchers hope to push<br />
long-term survival of metastatic melanoma<br />
patients up from the roughly 10% combined<br />
cure rate now achievable with ipilimumab<br />
monotherapy and interleukin-2 (IL-2) monotherapy.<br />
“We’re going to move the cure rate of<br />
melanoma progressively up,” predicts melanoma<br />
researcher Mario Sznol, of Yale University in<br />
New Haven, “to what could be a very respectable<br />
30, 35, 40% of patients, over the course of<br />
the next several years.”<br />
Figure 1 Ipilimumab stimulates antitumor immunity by blocking CTLA4, a natural brake on T cells, and<br />
allowing their unimpeded ‘costimulation’. Ipilimumab is the first agent to extend survival in metastatic<br />
melanoma patients in phase 3.<br />
Anti-CTLA4 therapy has succeeded where<br />
other immunotherapies failed because, instead<br />
of trying to indirectly stimulate T cells by presenting<br />
tumor antigen to overcome immune<br />
tolerance, it activates T cells directly, by disabling<br />
a brake on T-cell activity. Normally,<br />
when a T cell is activated after CD28 binding<br />
of the B7 receptor on antigen-presenting cells,<br />
CTLA4 acts as a brake, trafficking from the<br />
T-lymphocyte cytosol to the surface to bind the<br />
B7 molecule with high affinity. Thus CTLA4<br />
turns the T cell off. When the ipilimumab<br />
mAb is present it blocks CTLA4, keeping the<br />
T lymphocyte activated. The mAb also promotes<br />
unfettered binding of the T-cell CD28<br />
receptor to the antigen-presenting cell receptor<br />
B7, together with antigen presentation to the<br />
T-cell receptor (Fig. 1). Such ‘co-stimulation’<br />
is necessary for T-cell activation and antitumor<br />
immunity. Unfortunately, ipilimumab also triggers<br />
autoimmune side effects, some severe. A<br />
few patients have died from colitis-related bowel<br />
perforations, for example. But Kirkwood points<br />
out, “[for] the vast majority of patients, we can<br />
manage the side effects fairly easily, once you<br />
know how to look for them.”<br />
Table 1 Selected phase 3 trials in metastatic melanoma<br />
Company (location) | Product | Description<br />
Bristol-Myers Squibb | Ipilimumab (MDX-010) | Fully human antibody targeting the CTLA-4 receptor on T cells<br />
Plexxikon/Roche | PLX4032 | Small-molecule inhibitor of V600E mutant BRAF kinase<br />
Abraxis Bioscience (Los Angeles) | Abraxane (nab-paclitaxel, ABI-007) | Nanoparticle albumin-bound paclitaxel (Taxol)<br />
Eli Lilly (Indianapolis) | Tasisulam (LY573636) | Acyl sulfonamide, generates reactive oxygen species and induces apoptosis<br />
Biovex (Woburn, Massachusetts) | OncoVEX | Oncolytic herpes simplex virus type-1 encoding granulocyte macrophage colony stimulating factor; selectively replicates in tumor cells, recruits dendritic cells<br />
Novartis (Basel) | Tasigna (nilotinib, AMNN-107) | Small molecule oral c-kit kinase inhibitor for c-kit mutant melanoma<br />
GlaxoSmithKline | Astuprotimut-r (MAGE-A3 ASCI) | Protein subunit vaccine based on melanoma-associated antigen A3 (MAGE-A3), specific for tumor cells<br />
Vical (San Diego) | Allovectin-7 | DNA plasmid/lipid complex containing human leukocyte antigen B7 and beta-2 microglobulin DNA sequences that together form major histocompatibility class I; improves antigen presentation<br />
Source: BioMedTracker & <strong>Nature</strong> Biotechnology<br />
The one controversy in the phase 3 trial<br />
was the choice of the gp100 peptide vaccine,<br />
developed by the Bethesda, Maryland–based<br />
National Cancer Institute, as the active control<br />
arm for the study. The combination of this HLA-A0201–restricted<br />
peptide vaccine with high-dose<br />
IL-2 resulted in higher response rates and<br />
improved progression-free survival in an earlier<br />
randomized trial. Thus the choice of gp100 for<br />
the control arm. Some researchers speculate<br />
that the vaccine may have hurt patients, thus<br />
giving ipilimumab an artificial statistical boost.<br />
(Certain vaccines have reduced survival in<br />
melanoma trials). Kirkwood disagrees, because<br />
gp100 did not appear to cause harm in its other<br />
trials. “The issues regarding the control are, in<br />
my book, non-issues,” he says.<br />
The question remains, why did ipilimumab<br />
succeed whereas tremelimumab, a similar anti-CTLA4<br />
antibody from Pfizer, failed? It is possible<br />
that tremelimumab didn’t really fail. “[Pfizer]<br />
analyzed the trial early,” says Sznol. “You need<br />
to wait for the events to develop.” Sznol points<br />
out that some patients treated with anti-CTLA4<br />
mAbs experience progression of their cancers<br />
initially, followed by regression, and that other<br />
patients have most of their lesions disappear<br />
while a few continue growing. All are classified<br />
as nonresponders, but some may live for a long<br />
time. It’s also possible, Sznol says, that the company<br />
used the wrong drug dose and schedule.<br />
Kirkwood agrees that Pfizer was probably too<br />
quick to analyze the data.<br />
Pfizer defended the tremelimumab phase<br />
3 trial dose and schedule in an e-mail, noting<br />
that phase 2 results (using the same dose and<br />
schedule as in phase 3) were very similar to<br />
ipilimumab’s despite the different dose regimens.<br />
Long-term phase 3 follow up did show a<br />
survival advantage for the tremelimumab arm,<br />
but not enough to justify US Food and Drug<br />
Administration registration. Many patients in<br />
the tremelimumab trial control arm went on<br />
to receive ipilimumab in a compassionate use<br />
program, which could have decreased tremelimumab’s<br />
apparent effect. So circumstances, not<br />
biology, may have defeated tremelimumab.<br />
Any lingering ipilimumab doubts may<br />
disappear with a second completed phase 3<br />
trial, comparing ipilimumab plus dacarbazine<br />
chemotherapy to dacarbazine alone. Patient<br />
accrual ended more than two years ago, and<br />
results have not yet been reported. The delay<br />
suggests to many a successful trial, but no one<br />
knows for sure.<br />
No efficacy doubts exist for PLX4032. All<br />
agree the drug works, and works quickly, in<br />
the vast majority of patients with mutant BRAF<br />
tumors. Because PLX4032 targets the mutant<br />
form of the protein encoded by the BRAF oncogene,<br />
very high doses can be given<br />
without adverse effects on normal cells. Data<br />
from several groups show, in fact, that PLX4032<br />
paradoxically activates BRAF signaling in normal<br />
cells. This pathway activation enhances the<br />
therapeutic window, but also probably leads to<br />
the appearance of skin lesions known as keratoacanthomas<br />
in many patients. They are benign,<br />
but raise the theoretical possibility that long-term<br />
treatment could promote the growth of<br />
other cancers.<br />
But the main downside of PLX4032 is<br />
relapses. Median duration of response in<br />
phase 1 was about nine months. By historical<br />
standards, this is excellent, and a few patients<br />
have had complete responses lasting two years<br />
or more (they remain on the drug). But the<br />
relapses indicate a still-unknown form of drug<br />
resistance. Some residual BRAF signaling in<br />
tumor cells persists, despite treatment, and<br />
there are new data that the mitogen-activated<br />
protein (MAP) kinase signaling pathway is<br />
reactivated downstream of BRAF. In either<br />
case, combining a BRAF inhibitor with an<br />
inhibitor of MAP kinase kinase (MEK), which<br />
is immediately downstream of BRAF, could<br />
overcome resistance and prolong survival. Such<br />
a trial is now underway with GSK2118436—a<br />
small-molecule inhibitor of the V600E mutant<br />
BRAF—and MEK inhibitor GSK1120212, both<br />
from GlaxoSmithKline in London and soon to<br />
be in phase 2/3 studies.<br />
Meanwhile, PLX4032 is moving forward<br />
quickly. An already completed phase 2 trial<br />
will “we all believe … likely be enough for FDA<br />
approval next year,” says Flaherty. Phase 3 will<br />
definitively show whether PLX4032 changes the<br />
natural history of the disease and extends survival.<br />
The list of agents in phase 3 trials is growing<br />
(Table 1), although none of them displayed<br />
the efficacy of ipilimumab and PLX4032 in<br />
phase 2. One comparable compound, however,<br />
is Bristol-Myers Squibb’s humanized anti-PD-1<br />
mAb, MDX-1106. PD-1, or programmed cell<br />
death-1, is a T-cell molecule that, like CTLA4,<br />
downregulates T-cell activity. It appears to be at<br />
least as powerful as CTLA4, and may function at<br />
the later stages of the immune response to shut<br />
down T cells.<br />
In phase 1, MDX-1106 treatment led to 15<br />
confirmed responses among 46 metastatic<br />
melanoma patients. As of June, none of the<br />
responders had relapsed, with more than a<br />
year passing in several cases. “This is one of<br />
the most promising starts I’ve seen for any<br />
drug,” said Sznol, the trial’s principal investigator.<br />
“It’s the kind of thing where we can’t<br />
sleep because we want to offer this to our next<br />
patient.” Autoimmune side effects occur, but<br />
fewer than with ipilimumab. A combination<br />
trial with ipilimumab has begun (see p. 765).<br />
The most anticipated combination is ipilimumab<br />
and PLX4032. This would bring<br />
together the quick responses of PLX4032 with<br />
ipilimumab’s ability to deliver cures. “The two<br />
are made for one another,” says Kirkwood.<br />
Tumor cells killed by PLX4032 should release<br />
antigen, enhancing ipilimumab’s ability to activate<br />
antitumor T cells. Flaherty says that the two<br />
sponsoring companies have agreed to collaborate<br />
on a large randomized combination trial,<br />
which should begin next year.<br />
Individually, ipilimumab and PLX4032 have<br />
ended the futility and nihilism that have long<br />
dominated melanoma treatment. It will take<br />
time to sort out the best combinations and the<br />
best way to apply them. “But at least the cupboard<br />
is not bare any more,” said Sondak.<br />
Ken Garber Ann Arbor, Michigan<br />
Firms combine experimental cancer drugs<br />
to speed development<br />
Tackling breast cancer. Drug<br />
developers are starting to combine<br />
novel, unapproved agents in search of<br />
synergistic activity.<br />
The next generation of cancer treatments<br />
could be approved in pairs, at least judging<br />
by a growing trend among drug makers to<br />
combine drugs early in development and the<br />
US Food and Drug Administration’s (FDA)<br />
willingness to regulate<br />
them. On 2 June, the<br />
FDA opened its public<br />
consultation into the<br />
formulation of guidance<br />
for combinations of<br />
investigational therapies.<br />
In the same week, Merck,<br />
of Whitehouse Station,<br />
New Jersey, reported at<br />
the annual American<br />
Society of Clinical<br />
Oncology meeting in<br />
Chicago that a combination<br />
of ridaforolimus, an<br />
oral inhibitor of mammalian<br />
target of rapamycin<br />
(mTOR) developed<br />
with Ariad of Cambridge,<br />
Massachusetts, and dalotuzumab,<br />
an antibody<br />
targeting the insulin-like<br />
growth factor 1 receptor (IGFR1), led<br />
to responses in a cluster of patients with<br />
highly proliferative, estrogen-receptor-positive<br />
breast cancers in a phase 1b trial.<br />
Collaborations between different sponsors to<br />
combine drugs very early in development are<br />
unusual and pose new issues for regulators<br />
compared with oversight of combinations of<br />
agents already on the market.<br />
The FDA initiative is not limited to cancer—it<br />
also covers infection, seizure disorders<br />
and cardiovascular disease. But cancer<br />
drug makers, in particular, are grappling with<br />
some thorny questions as they attempt to<br />
translate their rapidly expanding knowledge<br />
of tumor biology into therapies that offer significant<br />
improvements on what is now available.<br />
Foremost among their concerns is how<br />
to accelerate clinical development to deliver<br />
solid efficacy data without compromising<br />
patient safety. “We’ve talked to the FDA about<br />
specific combinations and have received guidance<br />
on an ad hoc basis,” says Pearl Huang, vice<br />
president and oncology franchise integrator at<br />
Merck. “For us, the burning issue is if we demonstrate<br />
great activity for the combination, are<br />
we obligated to demonstrate lack of activity for<br />
the single agent alone?”<br />
Some claim combinations of investigational<br />
drugs could accelerate clinical development.<br />
Merck’s ridaforolimus-dalotuzumab program,<br />
which is due to enter phase 2 trials later this<br />
year, is a key initiative and is being closely<br />
scrutinized. It exemplifies a science-based<br />
approach to combining investigational drugs<br />
that may offer limited<br />
potential as single agents,<br />
but which may offer synergistic<br />
effects when administered<br />
together, as well as<br />
reducing the risk of drug<br />
resistance. Trials of several<br />
other combinations of<br />
new types of agents are also<br />
underway (Table 1).<br />
Although combination<br />
therapy in cancer—and<br />
other indications—is not<br />
a new theme, it has developed<br />
historically through<br />
trial and error. “Our knowledge<br />
of biological pathways<br />
and networks is so superficial<br />
it really is hard to come<br />
up with a strong rationale,”<br />
says Alan Ashworth, professor<br />
of molecular biology<br />
at the Institute of Cancer Research in London.<br />
The ridaforolimus-dalotuzumab combination<br />
emerged from an unbiased screen of a colon<br />
cancer cell line in which individual genes were<br />
systematically switched off using short hairpin<br />
RNAs, while each of the two drugs was<br />
tested in turn in a cell proliferation assay. This<br />
kind of synthetically lethal screen can unveil<br />
dependencies between related pathways and<br />
overcome compensatory mechanisms that<br />
cancer cells switch to when only one target is<br />
hit. “Those types of approaches couldn’t be<br />
done before,” says Eric Rubin, vice president<br />
of clinical oncology at Merck. The upcoming<br />
phase 2 trial will recruit around 200 breast<br />
cancer patients, who will be assigned to one<br />
of four treatment arms, comprising either<br />
ridaforolimus as monotherapy, dalotuzumab<br />
as monotherapy, the two drugs in combination<br />
or exemestane, the active comparator.<br />
The key question is whether that kind of design<br />
would need to be replicated in a large-scale registration<br />
trial of a new combination comprising<br />
two investigational compounds. “What we have<br />
proposed—and others have as well—is to do this<br />
in a more limited setting,” Rubin says. Balancing<br />
regulators’ requirements for statistical power<br />
with patients’ needs for effective therapy is not<br />
a straightforward task, particularly if some trial<br />
participants are to receive single agents that are<br />
unlikely to confer any benefit, while at the same<br />
time, the duration of combination trials is significantly<br />
extended. Ashworth says that more innovative<br />
trial designs and early use of biomarkers<br />
can help—but only if there is already a solid case<br />
for moving a particular therapy into the clinic<br />
in the first place. “You need a very strong biological<br />
basis for your combination treatment,” he<br />
says. “If you need 4,000 patients to prove your<br />
hypothesis, I’m sorry mate, you’ve got the wrong<br />
hypothesis.”<br />
Table 1 Selected targeted experimental combination cancer therapies in development<br />
Company | Combination | Mechanism | Indication | Status<br />
AstraZeneca (AZ) | Cediranib maleate (AZD2171) + olaparib | Vascular endothelial growth factor (VEGF) receptor inhibitor + poly(ADP-ribose) polymerase inhibitor | Recurrent ovarian cancer | Phase 1/2<br />
AZ & Merck (Darmstadt, Germany) | Cediranib maleate + cilengitide | VEGF receptor inhibitor + integrin inhibitor | Recurrent glioblastoma | Phase 1b<br />
GlaxoSmithKline (London) | GSK1120212 + GSK2141795 | MEK inhibitor + Akt kinase inhibitor | Solid tumors | Phase 1b<br />
Novartis & GlaxoSmithKline | BKM120 + GSK1120212 | Phosphoinositide-3-OH kinase inhibitor + MEK inhibitor | Solid tumors | Phase 1b<br />
AZ & Roche (Basel) | Cediranib maleate + RO4929097 | VEGF receptor inhibitor + γ-secretase inhibitor | Solid tumors | Phase 1<br />
Bristol-Myers Squibb (New York) & Ono Pharma (London) | Ipilimumab + MDX-1106 | Cytotoxic T-Lymphocyte antigen 4 (CTLA-4) inhibitor + Programmed death-1 receptor (PD-1) inhibitor | Melanoma | Phase 1<br />
Merck & Ariad | Dalotuzumab + ridaforolimus | Insulin-like growth factor receptor 1 (IGFR1) inhibitor + mTOR inhibitor | Neoplasms | Phase 1<br />
Merck & AZ | MK-2206 + selumetinib | Akt inhibitor + MEK1/2 inhibitor | Solid tumors | Phase 1<br />
Pfizer (New York) | Figitumumab + PF-00299804 | IGFR1 inhibitor + HER tyrosine kinase inhibitor | Solid tumors | Phase 1<br />
Pfizer | Crizotinib + PF-00299804 | Met tyrosine kinase inhibitor + HER tyrosine kinase inhibitor | Non-small cell lung carcinoma | Phase 1<br />
Roche | GDC-0449 + RO4929097 | Hedgehog antagonist + γ-secretase inhibitor | Breast cancer; sarcoma | Phase 1; phase 1/2<br />
Source: http://www.ClinicalTrials.gov<br />
There is some precedent for rapid approval<br />
of investigational therapies based on a strong<br />
phase 2 efficacy signal, particularly when<br />
it is backed by a solid understanding of<br />
the underlying biological mechanism. For<br />
example, Novartis, of Basel, gained FDA<br />
approval for Gleevec (imatinib mesylate) in<br />
chronic myeloid leukemia on the basis of a<br />
phase 1b dose-escalating trial (N. Engl. J.<br />
Med. 344, 1031–1037, 2001). “If in a phase<br />
2 trial, you’ve figured out the right dose and<br />
the correct schedule for a combination, and<br />
you get a dramatic change in efficacy, for<br />
example in a directed patient population,<br />
a path for that combination could be very<br />
straightforward,” says Bill Sellers, global head<br />
of oncology research at the Novartis Institutes<br />
for Biomedical Research, in Cambridge,<br />
Massachusetts. Head-to-head studies<br />
against the existing standard of care would<br />
also smooth the path toward approval—and<br />
combination therapies, he says, should aim<br />
for curative levels of efficacy rather than<br />
small, incremental improvements. “A major<br />
change in the rate of complete response or<br />
partial response to a therapeutic says you’ve<br />
killed a lot of the cancer.”<br />
Many of the combinations being tested target<br />
different kinase enzymes. Merck’s Huang<br />
says the combination of their investigational<br />
anti-cancer agent MK-2206, which inhibits<br />
Akt (a component of the phosphatidylinositol-3<br />
kinase pathway), with London-based<br />
AstraZeneca’s selumetinib (AZD6244), an<br />
inhibitor of the enzyme MEK, was chosen<br />
because each target is part of a canonical signal<br />
transduction pathway, downstream from a<br />
receptor tyrosine kinase. “They’re in parallel,<br />
but they also cross-talk,” she says. “They are<br />
not the cancer’s mutational drivers, they’re<br />
more the downstream effectors.”<br />
Even so, insights into tumor biology do<br />
not always yield significant clinical benefits.<br />
“In oncology, what we think works and what<br />
[actually] works are two different things, and<br />
that’s why we need to do big studies,” says<br />
Justin Stebbing, a physician scientist based at<br />
Imperial College London. “The initial promise<br />
of biomarkers doesn’t hold up to scrutiny,<br />
ultimately.”<br />
Matthew Ellis, professor of medicine at<br />
Washington University in St. Louis, Missouri,<br />
who recently published genomic analyses of<br />
cancer and normal tissues taken from an<br />
individual with breast cancer (<strong>Nature</strong>, 464,<br />
999–1005, 2010), has a different take: “My<br />
guess is we can solve the companion diagnostic<br />
problem by making full-genome sequencing<br />
of cancer the primary screen.” “We’re<br />
beginning to understand cancer genomes at<br />
a much more fundamental level than we ever<br />
have before,” he adds. “What we’re seeing, I<br />
think, is a great deal of complexity, much<br />
more complexity than was ever appreciated<br />
before.” This complexity is accompanied by<br />
an appreciable degree of heterogeneity—no<br />
two cancers appear the same. “We’re [starting]<br />
to classify them and put them into different<br />
buckets,” says James Zwiebel, chief of the<br />
investigational drug branch at the National<br />
Cancer Institute, in Rockville, Maryland.<br />
“That’s really only scratching the surface.<br />
When you get down to it, every patient is<br />
going to have some unique characteristics.”<br />
That could make life more difficult for drug<br />
developers, he notes.<br />
This genome-level view of cancer, rather<br />
than the classic assumption of cancer as a<br />
disease affecting a particular organ, is turning<br />
our understanding of cancer on its head.<br />
Breast cancer perfectly illustrates the point.<br />
“When you do the genetics, what you see is<br />
a constellation of rare diseases,” Ellis says.<br />
In contrast, gastrointestinal stromal tumors,<br />
for example, seem to have a more uniform<br />
genetic profile. “You’ve got rare diseases<br />
defined by a common mutation, and we’re<br />
making progress,” he says. “We haven’t<br />
worked out how to handle the reverse situation,<br />
a common disease defined by multiple<br />
rare mutations.”<br />
Although the cost of individual genome<br />
sequencing is falling, Sellers says that full<br />
cancer genome sequencing may not be necessary<br />
to identify the dominant mutations that<br />
drive a particular cancer: partial approaches,<br />
based on techniques such as hybrid capture,<br />
targeted resequencing and high-throughput<br />
genotyping, may be sufficient. But even with<br />
the correct genomic information at hand,<br />
clinical progress will remain difficult, as<br />
combining two investigational agents correctly<br />
is not a straightforward task. “This is<br />
probably the biggest challenge: finding the<br />
effective and tolerated dose and, importantly,<br />
the schedule,” Sellers says. “I think this is<br />
probably a bigger challenge than the FDA<br />
regulatory challenge.”<br />
Cormac Sheridan Dublin<br />
FDA transparency rules could hit small<br />
companies hardest<br />
The US Food and Drug Administration (FDA)<br />
is considering changing how much information<br />
it discloses about product applications—news<br />
that biotechs have greeted with a mixture of<br />
trepidation and hope. The agency is proposing<br />
to make publicly available ‘complete response’<br />
and ‘refuse-to-file’ letters for drugs and ‘not<br />
approvable’ letters for devices. From opinions<br />
gathered in advance of the final decision, it<br />
seems the smallest biotechs stand to lose the<br />
most.<br />
The proposed changes are wide-reaching<br />
and include some things most experts agree are<br />
good. On the upside, they say, this is an opportunity<br />
to make more information about what FDA<br />
does available to the public and ensure that data<br />
sources are more user-friendly. The downside,<br />
however, is the proposal to disclose information<br />
early in the approval process, including<br />
Investigational New Drug (IND) applications,<br />
holds and IND withdrawals. Few can see how<br />
revealing more information at the product<br />
application stage can be reconciled with trade<br />
secrets protection.<br />
The Biotechnology Industry Organization<br />
(BIO) wants more details about how these<br />
proposed regulations would be implemented.<br />
“They [FDA] define trade secrets [in the<br />
document], but oddly there is no definition<br />
of what constitutes competitive information,”<br />
explains Andrew Emmett, director for<br />
science and regulatory affairs at BIO, based<br />
in Washington, DC. The organization also<br />
wants clarification around who will decide<br />
what remains secret. Under current Freedom<br />
of Information Act regulations, Emmett<br />
says, companies have five days to determine<br />
whether documents that are going to be<br />
made public contain trade secrets that should<br />
be redacted. “We need to know exactly what<br />
the role of the sponsor will be in deciding<br />
what information is going to be shared,” he<br />
says. Otherwise, companies could be put at<br />
competitive disadvantage or become victims<br />
of wild speculation.<br />
The confidentiality issue is particularly critical<br />
for small biotechs. “When a small public<br />
company has a clinical trial pending, hedge<br />
fund managers do everything they can to get a<br />
sense of what the outcome might be,” says Alan<br />
Mendelson, senior partner at Los Angeles–<br />
headquartered law firm Latham and Watkins.<br />
If every pause in the clinical trial process gets<br />
announced to the public, it could lead to stock<br />
trading based on misleading or inadequate<br />
information. “It’s bad enough today,” he says,<br />
“but at least now people are commenting on<br />
definitive data, not just a signal that might prove<br />
to be nothing.”<br />
Wayne Kubick, a vice president in safety at<br />
Waltham, Massachusetts–based PhaseForward,<br />
says companies with “limited products” are also<br />
going to be at greatest risk of competitive disadvantage.<br />
Competitors will be able to use some<br />
types of information better than others. Says<br />
Gregory Conko, senior fellow at the Competitive<br />
Enterprise Institute in Washington, DC, “It’s<br />
less important with complete response or rejection<br />
letters, but with a new drug application, a<br />
hold, or a withdrawal, that is where tipping off<br />
competitors is a much bigger concern.” Smaller<br />
companies are already at a disadvantage in the<br />
review process. In comments it filed in April,<br />
BIO pointed out that a recent study from the<br />
consulting firm Booz Allen Hamilton found that small<br />
firms had only a 48% first-cycle approval rate<br />
for products in the priority review category,<br />
compared with a 78% rate for larger companies.<br />
In a survey of 168 of its members (http://www.<br />
bio.org/letters/20100412b.pdf), BIO also found<br />
that “early, frequent and explicit communication<br />
with the FDA” was felt to be the most helpful<br />
means for first-time filers to improve their success<br />
rates.<br />
The transparency initiative could help shore<br />
up this communication weakness. “A variety<br />
of leaders have been pushing for more open<br />
and straightforward dialog with the agency for<br />
years,” says J. Donald deBethizy, president and<br />
CEO of Winston-Salem, North Carolina–based<br />
Targacept. “This initiative could provide a means<br />
for that.” Greater transparency could also put<br />
pressure on FDA to provide rationales for rejections,<br />
which critics charge are sometimes based<br />
on “petty” issues, according to Conko.<br />
Overall, such changes may not necessarily<br />
translate to better decision making, Conko<br />
warns. “FDA’s political incentives are still poorly<br />
aligned. Even when their rationale is weak, they<br />
still don’t have to pay a price for it,” he says.<br />
On the other hand, transparency is not necessarily<br />
a bad thing. “The world is very different<br />
already in 2010,” says Kubick. “We have<br />
clinicaltrials.gov and a lot of other information<br />
already available.” But it means companies will<br />
face more instances where study data is used out<br />
of context. “You have to protect yourself against<br />
people who data mine and then hold up a little<br />
data nugget as the truth,” deBethizy says.<br />
Many are watching closely as the next phase<br />
of the initiative rolls out. “This is by no means a<br />
done deal,” says Kubick. “Some [of the proposed]<br />
things are going to happen, but not everything<br />
will.” Others are very skeptical, like Jack McLane,<br />
in brief<br />
Supremes rule on Bilski<br />
The US Supreme<br />
Court has ruled<br />
on a long-awaited<br />
and controversial<br />
patent litigation<br />
case, a decision<br />
greeted with relief<br />
by the biotech<br />
industry but<br />
vague enough that<br />
both sides can<br />
claim victory. The<br />
Bilski v. Kappos<br />
case was closely<br />
watched by the biotech community after<br />
the US Court of Appeals for the Federal<br />
Circuit ruled in 2008 that only methods<br />
tied to a machine or that transform an article into a<br />
different state are patentable, a standard<br />
which appeared to exclude crucial aspects of<br />
medical diagnostics. Commentators feared a<br />
restrictive ruling could have severely limited<br />
the ability to obtain patents on methods<br />
that use genes, proteins and metabolites<br />
to diagnose disease. Instead, the Supreme<br />
Court struck down patent claims on narrow<br />
grounds. “The Court was clearly conscious<br />
of the potential negative and unforeseeable<br />
consequences of a broad and sweeping<br />
decision,” stated Washington, DC–based<br />
Biotechnology Industry Organization<br />
president and CEO Jim Greenwood. The court<br />
ruled on two issues. First, it rejected the view that<br />
the only patentable inventions are those that are<br />
“tied to a particular machine” or those that<br />
transform “a particular article into a different<br />
state or thing.” Second, the court held that<br />
the word “process” as used in the US Patent<br />
Act should be read broadly to include modern<br />
day inventions. The ruling does not address<br />
the eligibility of patents for diagnostic<br />
methods, however, which leaves a number<br />
of questions unanswered with regard to a<br />
string of pending cases, including the closely<br />
watched dispute against Myriad Genetics<br />
and its breast cancer gene patents. Dan<br />
Ravicher of the Public Patent Foundation, a<br />
co-plaintiff with the American Civil Liberties<br />
Union in the suit against Myriad Genetics,<br />
believes “the opinion reinforces the line of<br />
case law that Judge Sweet relied upon in his<br />
decision striking down gene patents [in the<br />
Myriad case]. It rejects the argument that<br />
‘anything’ is patentable.” Justices Stevens,<br />
Breyer, Ginsburg and Sotomayor would have<br />
struck down not only the specific Bilski<br />
business method claims, but all business<br />
method patents on historical grounds that<br />
this class of patents was never contemplated<br />
by the framers of the US Constitution.<br />
The same argument would be difficult to<br />
support in biotech-specific cases as there<br />
is ample evidence that Thomas Jefferson,<br />
who reformed the Patent Act of 1793,<br />
considered medicine a “useful art,” the term<br />
originally used in the statute and later changed to<br />
“process.” Kenneth Chahine & Javier Mixco<br />
Lee Pettet/istockphoto<br />
in brief<br />
Lawsuits rock Jackson<br />
Litigation over models<br />
may inflate prices.<br />
The Jackson<br />
Laboratory has<br />
unwittingly found<br />
itself ensnared in<br />
patent disputes.<br />
In June, the<br />
nonprofit laboratory<br />
mouse developer<br />
located in Bar<br />
Harbor, Maine, was<br />
cleared of a patent<br />
infringement<br />
allegation—the<br />
first in the<br />
laboratory’s 80-year history—and now<br />
faces a second allegation by another party.<br />
Jackson’s mission of making its repository<br />
of more than 5,000 mouse strains available<br />
to researchers at affordable prices could<br />
be challenged if it is forced to continue<br />
defending itself in expensive lawsuits, says<br />
David Einhorn, the laboratory’s in-house<br />
attorney. In Jackson’s first scuffle, the<br />
Central Institute for Experimental Animals<br />
(CIEA), a Kawasaki, Japan–based nonprofit,<br />
in 2008 sued Jackson for distributing a<br />
mouse model particularly useful for grafting<br />
human tissue. Both groups in the 1990s<br />
separately developed these immunodeficient<br />
mice by starting with a strain of nonobese<br />
diabetic mouse (NOD), crossing those<br />
with mice carrying the scid mutation for<br />
immunodeficiency, and crossing them again<br />
with mice whose gene for a key immune<br />
signaling molecule, interleukin-2 receptor γ,<br />
was knocked out. Jackson has distributed the<br />
mouse to more than 1,000 research groups<br />
worldwide, says Einhorn. But the laboratory<br />
didn’t patent its mouse, whereas CIEA did.<br />
On June 1, a US District Court judge ruled<br />
that the Jackson Laboratory had not infringed<br />
CIEA’s patent. What ultimately swayed the<br />
judge to side with Jackson was that the<br />
CIEA, in its patent application, described the<br />
mouse but didn’t claim it. In his decision the<br />
judge cited the Guidelines for Nomenclature<br />
of Mouse and Rat Strains, which state mice<br />
inbred for more than 20 generations can be<br />
considered a different strain, and Jackson’s<br />
mouse line had been separately inbred<br />
many times. Michael Rader, attorney with<br />
Wolf, Greenfield & Sacks in Boston, who<br />
represented Jackson, says this was likely<br />
the first time nomenclature rules have been<br />
used to help decide a lawsuit. Now Jackson<br />
faces another lawsuit involving transgenic<br />
mice with mutations useful in Alzheimer’s<br />
disease research. The Alzheimer’s Institute<br />
of America in February sued Jackson and six<br />
biotech and pharma companies for patent<br />
infringement. Despite the high costs of the<br />
two lawsuits, Einhorn says Jackson won’t<br />
alter its mission of making laboratory mice<br />
accessible. But he notes that if the suing<br />
trend continues, “the most obvious way<br />
to recoup the costs is to charge more for<br />
mice.” He adds: “That falls on the backs of<br />
scientists who do the research.” Emily Waltz<br />
The FDA’s Transparency Task Force is proposing to increase access to the agency’s decision letters<br />
about products or drugs. Such a move would challenge small biotech.<br />
vice president of clinical and regulatory affairs<br />
at Hudson, Massachusetts–based Clinquest.<br />
McLane points out that releasing more data<br />
earlier will also stretch the agency’s resources<br />
because there will be pressure to analyze many<br />
more signals quickly and thoroughly. “It’s a tremendous<br />
overreach,” he says. “A lot of people<br />
do not think this will go through.” McLane says<br />
he’d rather see the agency bring their transparency<br />
rules in line with the Sarbanes-Oxley Act<br />
of 2002, which set new standards for US boards,<br />
management and accounting firms. “A lot of<br />
what the FDA is asking for here is competitive<br />
information,” he says.<br />
The agency was accepting comments<br />
through July 20. In the autumn, the task<br />
force will consider the public comments as<br />
well as the “priority, operational feasibility,<br />
and resource requirements” of each proposal,<br />
according to Afia K. Asamoah, director of the<br />
FDA’s transparency initiative. BIO submitted<br />
one set of comments in April, and Emmett<br />
says the group will submit more before the<br />
deadline. Even if the agency decided to go<br />
through with all the proposals, though, some<br />
of the changes could not be implemented<br />
without new legislation.<br />
Malorye Allison Acton, Massachusetts<br />
in their words<br />
“They have grown so<br />
fast and so suddenly<br />
that people are still<br />
skeptical. But we should<br />
get used to it.” Rasmus<br />
Nielsen, a geneticist<br />
at the University of<br />
California at Berkeley,<br />
who collaborates with<br />
Chinese colleagues, on<br />
China’s sudden boom in<br />
sequencing output. (The Washington Post,<br />
28 June 2010)<br />
“Until the capacity issues can be addressed, this<br />
will not be an effective agent.” Chris Logothetis,<br />
head of prostate cancer research at the University<br />
of Texas MD Anderson Cancer Center in Houston,<br />
on the year-long wait patients currently face for<br />
Dendreon’s prostate cancer vaccine Provenge.<br />
(Pharmalot, 28 June 2010)<br />
“Everyone can claim victory, except of course<br />
Mr. Bilski himself.” Dan Ravicher, of the Public<br />
Patent Foundation, the organization leading the<br />
attack on Myriad, on the Supreme Court’s decision<br />
in Bilski v. Kappos. (GoozNews, 28 June 2010)<br />
“Now that the full integration has taken place,<br />
it’s the Genentech guys who are being promoted<br />
and getting the key positions.” Allianz Global<br />
Investors’ Joerg de Vries-Hippen on how<br />
Genentech is the strongest in the marriage with<br />
Roche. (Bloomberg Businessweek, 1 July 2010)<br />
JASON REED/Reuters/Corbis<br />
Food firms test fry Pioneer’s trans fat–free<br />
soybean oil<br />
The US Department of Agriculture (USDA)<br />
has approved for environmental release one of<br />
the first biotech crops aimed at the food industry.<br />
The new crop, a genetically modified soybean<br />
with an altered fatty acid profile, yields oil<br />
that is more stable at high frying temperatures<br />
and has a longer shelf life than commodity soybean<br />
oil. It was developed by Pioneer Hi-Bred<br />
in Johnston, Iowa, a DuPont company. The<br />
company received marketing approval for the<br />
biotech soybean in June and aims to commercialize<br />
it by 2012. St. Louis–based Monsanto<br />
is following close behind, with two soybean<br />
products with modified oil profiles in its pipeline.<br />
The new soybean<br />
traits may help the<br />
biotech industry<br />
deliver on a two-decade-long<br />
promise:<br />
to develop crops with<br />
improved nutritional<br />
value. Until now,<br />
most commercialized<br />
biotech crops have<br />
been engineered with<br />
such traits as pest<br />
resistance and herbicide<br />
tolerance—traits<br />
that mostly benefit<br />
farmers rather than the food industry or consumers.<br />
“Heat stability and longer shelf life:<br />
these are the things that can light up the food<br />
industry, not reduced pesticides,” says Tom<br />
Hoban, a professor of food science at North<br />
Carolina State University in Raleigh.<br />
Pioneer is marketing its new soybean oil as<br />
an alternative to partially hydrogenated vegetable<br />
oils. For decades, food producers have<br />
relied on partially hydrogenated soybean oil<br />
because it retains its flavor at high cooking<br />
temperatures and for extended periods on the<br />
grocery store shelf. But the process of partial<br />
hydrogenation produces trans fatty acids, or<br />
trans fats, which are known to increase ‘bad’<br />
low-density lipoprotein (LDL) cholesterol and<br />
increase risk of coronary heart disease.<br />
In 2006, the US Food and Drug<br />
Administration began requiring food manufacturers<br />
to label food with trans fats, and<br />
measures to alert the public of the health risks<br />
of trans fats ensued. Food producers turned<br />
to alternatives, such as palm oil and certain<br />
kinds of canola oil, that have more stable frying<br />
and shelf life characteristics than those<br />
of unhydrogenated soybean oil. As a result,<br />
soybean oil’s share of the edible fats and oils<br />
market has gone from 76% in 2005 to 64%<br />
today, according to the US Census Bureau.<br />
“We hope to recapture that space [for soybeans],”<br />
says Pioneer’s Russ Sanders, director<br />
of enhanced oils.<br />
The success of Pioneer’s recently approved soybean,<br />
which has been engineered to cut down on<br />
trans fats, will depend on how well it is received<br />
by the food industry.<br />
Pioneer’s new soybean oil has an oleic fatty<br />
acid content of >75%, a property that gives it<br />
frying and shelf stability comparable to that<br />
of palm, high oleic acid canola and hydrogenated<br />
soybean oils. It also contains 20% less<br />
saturated fat than commodity soybean oil.<br />
Pioneer dubbed the crop “Plenish high-oleic<br />
soybeans.” Overproduction of oleic acid and<br />
decreased levels of linoleic and linolenic acids<br />
in Plenish arise from<br />
transgenic expression<br />
of a fragment of the<br />
soybean microsomal<br />
omega-6 desaturase<br />
gene (FAD2-1)<br />
under the control<br />
of soybean Kunitz<br />
trypsin inhibitor<br />
gene promoter, which<br />
silences endogenous<br />
omega-6 desaturase.<br />
The transgenic<br />
soybean also carries<br />
the S-adenosyl-L-methionine<br />
synthetase<br />
as a marker to enable initial selection<br />
in the laboratory by acetolactate synthase<br />
(ALS)-inhibiting herbicide.<br />
The success of the Plenish soybean will<br />
depend on how well it is received by the food<br />
industry. Pioneer has already set up testing<br />
agreements with a dozen undisclosed food<br />
companies, says Sanders. The companies will<br />
run consumer taste tests, frying tests and shelf<br />
life tests—just about anything a food company<br />
would normally do with a new ingredient.<br />
Food companies can already choose from an<br />
array of oils with modified fatty acid contents<br />
developed with conventional breeding. “The<br />
hard reality will be how producers of liquid<br />
vegetable oils compete,” says Terry Etherton,<br />
professor of animal nutrition at Penn State in<br />
University Park, Pennsylvania.<br />
Food industry representatives say they welcome<br />
the new oil option, but see it as a “trial<br />
situation,” says Jeffrey Barach, vice president<br />
of science policy at Grocery Manufacturers<br />
Association in Washington, DC. “Each company<br />
has to try it out and do some experimental<br />
work,” he says.<br />
Although Pioneer received the full go-ahead<br />
from regulators, the company doesn’t plan to<br />
in brief<br />
Anti-CD20 patent battle ends<br />
On June 1, a four-year dispute over a European<br />
patent for anti-CD20 drugs to treat rheumatoid<br />
arthritis came to an end, with Seattle-based<br />
Trubion winning the dispute. This result frees up<br />
the space for anyone with a CD20 program, says<br />
Jeff Pepe, associate general counsel at Trubion.<br />
Multiple oppositions had been filed against the<br />
patent (European Patent 1176981) held jointly<br />
by Genentech of S. San Francisco, California,<br />
and Biogen Idec of Cambridge, Massachusetts.<br />
Trubion was joined by MedImmune, GenMab,<br />
Centocor, the Glaxo Group and Merck Serono, all<br />
pursuing anti-CD20 programs at one time. In<br />
2008, the Opposition Division of the European<br />
Patent Office ruled that, as filed, the patent did<br />
not meet the necessary requirements, favoring<br />
Trubion. Genentech and Biogen appealed in<br />
2009. Finally, at an oral hearing this June,<br />
the original ruling was upheld, and no further<br />
appeals will be allowed. Ironically, around the<br />
time of the hearing, New York–based Pfizer,<br />
which acquired Trubion’s CD20 programs when<br />
it bought Wyeth in 2009, announced they would<br />
drop Trubion’s lead anti-CD20 compound (TRU-<br />
015) though retaining the biotech’s second<br />
generation anti-CD20 monoclonal antibody also<br />
in rheumatoid arthritis. For Genentech/Roche<br />
“the decision does not impact our expectations<br />
with respect to protection against Rituxan<br />
[rituximab, anti-CD20 chimeric monoclonal<br />
antibody],” says company spokesperson<br />
Rubin Snyder. <br />
Laura DeFrancesco<br />
EU states free to ban GM crops<br />
In July, the European Commission (EC)<br />
officially proposed to give member states<br />
the freedom to veto cultivation of genetically<br />
modified (GM) crops without having to<br />
back their decision with scientific evidence<br />
on new risks. The reform’s goal is to hand<br />
back responsibility to individual states and<br />
speed up pending authorizations. Anti-GM<br />
countries can now choose to opt out whereas<br />
biotech-friendly countries can cultivate new<br />
GM varieties. However, there is no guarantee<br />
it will work. “We are not against freedom<br />
for member states, the problem is how<br />
the principle is articulated,” says Carel du<br />
Marchie Sarvaas, director for agricultural<br />
biotech at EuropaBio. The proposal stands on<br />
two legs: an amendment to directive 2001/18<br />
that must gain the approval of the council<br />
of ministers and the European Parliament,<br />
and an EC recommendation on coexistence,<br />
already effective. The first legalizes national<br />
or local bans on growing, the second one<br />
achieves the same result by conceding that<br />
countries wanting to keep ‘contamination’<br />
levels well below the labeling threshold can<br />
enforce wide isolation distances between<br />
GM and conventional or organic fields. “It’s<br />
a Pandora’s box. We are concerned it will<br />
create legal uncertainty and unpredictability<br />
for farmers and operators,” says du Marchie<br />
Sarvaas. The reform doesn’t target imports of<br />
GM material for food or feed, whose approvals<br />
are also stalled. <br />
Anna Meldolesi<br />
in brief<br />
GM alfalfa—who wins?<br />
Both sides are claiming victory following the<br />
Supreme Court’s ruling issued June 21 in<br />
Monsanto v. Geertson Seed Farms over the<br />
future sale of Roundup Ready (RR) alfalfa<br />
seeds. The Supreme Court overturned a lower<br />
court injunction issued in 2007 banning the<br />
biotech seeds nationwide (Nat. Biotechnol. 28,<br />
184, 2010). Monsanto’s business lead for the<br />
crop, Steve Welker, says the St. Louis–based<br />
company has plenty of RR alfalfa seeds<br />
“ready to deliver,” although their release is<br />
subject to a pending environmental impact<br />
statement (EIS) by the US Department of<br />
Agriculture (USDA). “Our goal is to have<br />
everything in place for growers to plant in fall<br />
2010,” Welker adds. Not so fast, says lawsuit<br />
opponent Andrew Kimbrell of the Center for<br />
Food Safety in Washington. He points out<br />
that the Supreme Court “just took away the<br />
injunction, and USDA still has to comply with<br />
NEPA [the National Environmental Policy<br />
Act] and complete an EIS” before the crop<br />
can be deregulated. Although USDA appears<br />
poised to complete its EIS and fully deregulate<br />
RR alfalfa, the Center for Food Safety could<br />
renew its challenge of USDA’s decision.<br />
This lingering uncertainty has agitated many<br />
members of Congress. Seven senators and<br />
49 representatives have asked agriculture<br />
secretary Tom Vilsack to retain regulated status<br />
for RR alfalfa, whereas two other senators have<br />
urged Vilsack to “mount vigorous defenses<br />
against lawsuits that seek to upend science-based<br />
regulatory decisions.” Jeffrey L Fox<br />
Biofuel ‘Made in China’<br />
A collaboration between the Danish enzyme<br />
producer Novozymes of Bagsvaerd, Beijing-based<br />
China Petroleum and Chemical and<br />
Cofco, the state-run agriculture company, will<br />
produce three million gallons of ethanol a<br />
year for local consumption, using corn stalks<br />
and leaves from northeastern China’s corn<br />
belt. The demonstration plant will test novel<br />
technologies, including Novozymes’ new<br />
Cellic CTec2 enzymes, with a view to launch a<br />
commercial facility by 2013. Cofco has been<br />
running a small pilot plant in Heilongjiang<br />
province for four years, but as a precondition<br />
for commercialization “we need more capacity<br />
to optimize our design and operation,” says<br />
Guo Shunjie, general manager of Cofco’s<br />
bio-energy and biochemical department. One<br />
remaining hurdle is the inability to break down<br />
five-carbon sugars abundant in lignocellulose,<br />
which make up 20–40% of the plant biomass.<br />
The new process could cut costs considerably,<br />
as it requires half the dose of enzymes needed<br />
by other treatments to break down plant waste.<br />
The partners’ goal is to produce cellulosic<br />
ethanol at $2.25 a gallon, a price further<br />
pushed down by government tax credits to be<br />
competitive with corn-based ethanol, currently<br />
at $1.50–1.60 a gallon. “Since the trend to<br />
lower carbon emissions is here to stay, it<br />
won’t be long before we break even,”<br />
says Guo.<br />
Daniel Grushkin<br />
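The price gap implied by the figures in this item can be worked out directly. The short Python sketch below uses only the per-gallon prices quoted above; the variable names are illustrative, and because the text does not state the size of the government tax credits, the result is only the gap that such credits would have to close.

```python
# Per-gallon prices quoted in the item above (US dollars).
cellulosic_target = 2.25            # partners' target price for cellulosic ethanol
corn_low, corn_high = 1.50, 1.60    # current range for corn-based ethanol

# Gap between the cellulosic target and each end of the corn-based range.
gap_low = cellulosic_target - corn_high
gap_high = cellulosic_target - corn_low

print(f"${gap_low:.2f} to ${gap_high:.2f} per gallon")  # prints: $0.65 to $0.75 per gallon
```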
Table 1 USDA-approved soybeans modified for improved trans fat content<br />
Product | Company | Description<br />
DP-305423 | Pioneer Hi-Bred International | High oleic acid soybean produced by inserting extra copies of a portion of the gene encoding omega-6 desaturase, gm-fad2-1, resulting in silencing of the endogenous omega-6 desaturase gene (FAD2-1).<br />
DD-026005-3 | DuPont | High oleic acid soybean produced by inserting a second copy of a portion of the gene encoding omega-6 desaturase, gm-fad2-1, resulting in silencing of the endogenous omega-6 desaturase gene (FAD2-1).<br />
OT96-15 | Agriculture & Agri-Food Canada | Low linolenic acid soybean produced through traditional crossbreeding to incorporate the trait from a naturally occurring fan1 gene mutant that was selected for low linolenic acid.<br />
Source: AGBIOS<br />
commercialize Plenish soybeans until the first<br />
quarter of 2012, after food players have had<br />
time to determine what food applications, if<br />
any, they want to pursue with Plenish soybeans.<br />
“We’re being fairly conservative in our<br />
commercialization schedule,” Sanders says.<br />
The time to market also depends on<br />
Pioneer’s ability to secure regulatory approval<br />
in key global markets, such as Europe, Japan,<br />
China, Taiwan and South Korea, Sanders says.<br />
The soybean is already approved in Canada<br />
and Mexico.<br />
Global regulatory hurdles hampered<br />
DuPont’s earlier development of a different<br />
high oleic acid soybean (Table 1). In 1997, the<br />
USDA approved, or deregulated, DD-026005-3,<br />
a DuPont soybean with an oleic acid content<br />
of 85%. This variety was modified with<br />
an extra copy of soybean Δ12-fatty acid dehydrogenase<br />
under the control of the soybean<br />
β-conglycinin promoter, which triggered<br />
silencing of the transgene and its counterpart<br />
endogenous gene. But the product fizzled<br />
after the company encountered global regulatory<br />
complexities associated with the crop’s<br />
marker technology, says Sanders. Markers<br />
are used by crop developers to test whether<br />
genetic material is successfully transferred<br />
to the host crop. In this case, DD-026005-3<br />
contained the Escherichia coli uidA gene,<br />
encoding β-glucuronidase as a colorimetric<br />
marker, and the bla gene, encoding the<br />
enzyme β-lactamase as a selective marker<br />
that confers resistance to β-lactam antibiotics<br />
(such as penicillin and ampicillin).<br />
Pioneer’s new high oleic soybean targets the<br />
same oleic acid pathway as the 1997 version,<br />
but it is hoped that use of a different marker<br />
gene, one imparting tolerance to an ALS-inhibitor<br />
herbicide, will smooth the regulatory<br />
path. (The plant will not be tolerant to<br />
ALS-inhibitor herbicides at the levels used in<br />
the field.) Sanders says he is “optimistic” about<br />
the 2012 regulatory goals.<br />
On Pioneer’s regulatory heels are two<br />
Monsanto soybean products with modified<br />
oil profiles, one with omega-3 fatty acids for<br />
nutrition and the other with enhanced texture<br />
and functionality, called high stearic<br />
acid soybeans. Monsanto has submitted to<br />
the USDA petitions for deregulation of both<br />
products. Still in the discovery phase, Dow<br />
AgroSciences in Indianapolis, Indiana is<br />
developing omega-9 canola and sunflower<br />
oils. With one nutritionally altered crop<br />
approved and a handful in the pipeline,<br />
the public may finally get what it has been<br />
promised for two decades. But whether<br />
high oleic acid soybeans directly benefit<br />
consumers enough to boost public opinion<br />
of biotech crops is doubtful, say agriculture<br />
experts. “Companies already have methods<br />
of removing trans fats” from food, says Jane<br />
Rissler, a senior scientist with the Union of<br />
Concerned Scientists in Washington, DC.<br />
Pioneer is “offering an alternative to those<br />
existing methods” without much added benefit<br />
to consumers, she says. Alan McHughen,<br />
a plant biotechnologist at the University of<br />
California, Riverside, notes that: “Those<br />
who already despise [genetic modification]<br />
will continue to do so, those who accept GM<br />
will continue to do so, and most others won’t<br />
even notice it, as it’s not a high-profile whole<br />
food with immediate consumer-recognized<br />
benefit.”<br />
In the US, food companies aren’t required<br />
to label food derived from genetically engineered<br />
crops, and generally don’t voluntarily<br />
do so.<br />
An April 2010 survey of 750 US consumers<br />
asked this question: “All other things<br />
being equal, how likely would you be to<br />
buy a food product made with oils that had<br />
been modified by biotechnology to avoid<br />
trans fats?” Seventy-four percent said they<br />
were either very likely or somewhat likely to<br />
buy this kind of biotech food. However, in a<br />
separate question, only 32% of those respondents<br />
said they had a favorable impression of<br />
biotech food. The survey was conducted by<br />
the International Food Information Council<br />
Federation in Washington, DC.<br />
Emily Waltz, Nashville, Tennessee<br />
data page<br />
2Q10—spreading the wealth<br />
Walter Yang<br />
Although biotech stocks, along with the general markets, performed<br />
poorly last quarter, more companies were able to access capital than<br />
in any of the previous four quarters. Excluding US partnership monies,<br />
219 companies pulled in $8.1 billion (compared with 157 firms raising $5.3<br />
billion in 2Q09), 39% of which originated from debt deals by Genzyme<br />
(Cambridge, MA) and Teva Pharmaceuticals (Petah Tikva, Israel). Venture<br />
funding was up 36% from 2Q09; ten companies launched initial public<br />
offerings (IPOs), raising $342.9 million.<br />
Stock market performance<br />
The BioCentury 100 and the NASDAQ Biotechnology were down 11% and<br />
15%, respectively, similar to other major indices.<br />
[Figure: index value (500 to 1,700) by month, 12/2008 to 6/2010, for the BioCentury 100, Dow Jones, S&P 500, NASDAQ, NASDAQ Biotech and Swiss Market indices.]<br />
Global biotech industry financing<br />
Excluding partnership monies, 2Q10 funding was $8.1 billion, up 53%<br />
on 2Q09, largely through debt deals, which shot up 97%.<br />
Amount raised ($ billions)<br />
Quarter | Partnership | Debt and other financing | Venture | Follow-on | PIPE | IPO<br />
2Q10 | 8.5 | 5.0 | 1.7 | 0.6 | 0.4 | 0.3<br />
1Q10 | 6.1 | 2.1 | 1.3 | 1.3 | 0.5 | 0.4<br />
4Q09 | 14.8 | 3.1 | 1.6 | 2.3 | 0.7 | 0.3<br />
3Q09 | 9.4 | 2.3 | 1.2 | 2.4 | 0.6 | 0.7<br />
2Q09 | 8.0 | 2.6 | 1.2 | 0.8 | 0.7 | 0.0<br />
Partnership figures are for deals involving a US company. Source: BCIQ: BioCentury Online Intelligence,<br />
Burrill & Co.<br />
Global biotech initial public offerings<br />
Ten companies raised $342.9 million through IPOs last quarter versus<br />
none in 2Q09.<br />
Amount raised ($ millions) by financial quarter<br />
Region | 2Q09 | 3Q09 | 4Q09 | 1Q10 | 2Q10<br />
Americas | 0 | 635 | 70 | 364 | 208<br />
Europe | 0 | 7 | 151 | 0 | 85<br />
Asia-Pacific | 0 | 15 | 50 | 31 | 50<br />
Number of IPOs<br />
Region | 2Q09 | 3Q09 | 4Q09 | 1Q10 | 2Q10<br />
Americas | 0 | 2 | 2 | 4 | 4<br />
Europe | 0 | 1 | 2 | 0 | 5<br />
Asia-Pacific | 0 | 1 | 2 | 2 | 1<br />
Table indicates number of IPOs. Source: BCIQ: BioCentury Online Intelligence<br />
Global biotech venture capital investment<br />
Venture money raised was up 36% to $1.7 billion from $1.2 billion in<br />
2Q09.<br />
Amount raised ($ millions) by financial quarter<br />
Region | 2Q09 | 3Q09 | 4Q09 | 1Q10 | 2Q10<br />
Americas | $1,035 | $1,064 | $1,065 | $939 | $1,210<br />
Europe | $180 | $104 | $479 | $331 | $458<br />
Asia | $9 | $6 | $9 | $24 | $0<br />
Number of venture capital investments<br />
Region | 2Q09 | 3Q09 | 4Q09 | 1Q10 | 2Q10<br />
Americas | 43 | 49 | 60 | 60 | 76<br />
Europe | 14 | 14 | 32 | 30 | 28<br />
Asia-Pacific | 1 | 1 | 1 | 1 | 1<br />
Table indicates number of venture capital investments and includes rounds where the amount raised was<br />
not disclosed. Source: BCIQ: BioCentury Online Intelligence<br />
Notable Q2 deals<br />
Venture capital<br />
Company (lead investors) | Amount raised ($ millions) | Round number | Date closed<br />
AiCuris (Santo Holding) | 74.9 | 2 | 14-Apr<br />
Achaogen (Frazier Healthcare) | 56.0 | 3 | 7-Apr<br />
Pacific Biosciences (Gen-Probe) | 50.0 | 6 | 17-Jun<br />
OptiNose (Avista Capital) | 48.5 | NA | 8-Jun<br />
Agile Therapeutics (Investor Growth Capital, Care Capital) | 45.0 | 2 | 14-Jun<br />
Tetraphase (Excel Venture) | 45.0 | 3 | 1-Jun<br />
Anaphore 3 (5AM Ventures, Versant, Apposite Capital) | 38.0 | 1 | 14-May<br />
Mergers and acquisitions<br />
Target | Acquirer | Value ($ million) | Date announced<br />
OSI Pharma | Astellas | 4,000 | 17-May<br />
Valeant | Biovail | 3,200 | 21-Jun<br />
Abraxis | Celgene | 2,900 | 30-Jun<br />
Wuxi PharmTech | Charles River | 1,500 | 26-Apr<br />
IPOs<br />
Company (lead underwriters) | Amount raised ($ millions) | Change in stock price since offer | Date completed<br />
Codexis | 78.0 | –33% | 22-Apr<br />
Alimera | 72.1 | –32% | 22-Apr<br />
Lansen Pharma | 50.2 | 3% | 30-Apr<br />
Tengion | 30.0 | –26% | 9-Apr<br />
GenMark | 27.6 | –26% | 28-May<br />
Aposense | 24.8 | –11% | 7-Jun<br />
Licensing/collaboration<br />
Researcher | Investor | Value ($ millions) | Deal description<br />
TransTech | Forest | $1,100 | Exclusive, worldwide rights, excluding the Middle East and North Africa, to develop and commercialize small-molecule glucokinase activators<br />
Regulus | Sanofi-aventis | >$750 | Discover, develop and commercialize microRNA therapeutics for up to four targets<br />
Diamyd | Johnson & Johnson | $625 | Exclusive rights to Diamyd diabetes vaccine outside Nordic countries<br />
Neurocrine | Abbott | $595 | Exclusive, worldwide rights to develop and commercialize endometriosis compound elagolix<br />
OncoMed | Bayer | >$500 | Discover and develop antibodies, proteins and small molecules targeting the Wnt signaling pathway to treat cancer<br />
Source: BCIQ: BioCentury Online Intelligence<br />
Walter Yang is Research Director at BioCentury<br />
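The year-on-year percentages quoted on this data page can be rechecked with a few lines of arithmetic. The sketch below is illustrative only: the function name `pct_change` is hypothetical, and it assumes that the three stacked values printed for each quarter in the venture chart are regional amounts that sum to the quarterly venture total. Under that assumption, the 36% venture increase follows from the chart's figures in millions, whereas the rounded $1.7 billion and $1.2 billion would suggest a larger rise.

```python
def pct_change(new: float, old: float) -> float:
    """Percentage change from `old` to `new`."""
    return (new - old) / old * 100.0


# Total financing excluding partnerships, $ billions, as quoted in the text.
total_2q10, total_2q09 = 8.1, 5.3

# Venture totals in $ millions, summed from the three regional values
# shown per quarter in the venture chart (assumed to be regional amounts).
venture_2q10 = 0 + 458 + 1210
venture_2q09 = 9 + 180 + 1035

total_rise = round(pct_change(total_2q10, total_2q09))        # 53
venture_rise = round(pct_change(venture_2q10, venture_2q09))  # 36
rounded_rise = round(pct_change(1.7, 1.2))                    # 42, from rounded billions

print(total_rise, venture_rise, rounded_rise)
```

The gap between 36% and 42% is a rounding effect: percentage changes are best recomputed from the most precise figures available rather than from rounded headline totals.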
NEWS feature<br />
Drugmakers dance with autism<br />
With monogenic neurodevelopmental disorders similar to autism<br />
serving as starting points for several drug discovery programs,<br />
smaller biotechs are now joining big pharma in pursuing therapies<br />
to tackle this perplexing condition. Sarah Webb reports.<br />
In June, the Autism Genome Project published<br />
the largest genetic study of autism so<br />
far, identifying 226 gene mutations that are<br />
found in people with the syndrome¹. Children<br />
with autism are 20% more likely to carry one<br />
of these rare mutations, though these are generally<br />
not inherited; the mutations are present in less than<br />
6% of the parents of autistic children. This<br />
study adds to the growing list of genes that<br />
could serve as starting points for research on<br />
autism therapies.<br />
Whereas the pharmaceutical industry<br />
increasingly has been shying away from<br />
psychiatric disorders, such as schizophrenia<br />
and depression, interest in autism has<br />
intensified. The number of autism cases<br />
diagnosed each year is increasing, yet effective<br />
treatments remain scarce.<br />
As a result, “autism seems to be a relatively<br />
hot area,” says Manuel Lopez-Figueroa<br />
of Bay City Capital, a venture capital firm<br />
in San Francisco, and scientific liaison for<br />
the Pritzker Neuropsychiatric Disorders<br />
Research Consortium. Not only is the pharmaceutical<br />
sector ploughing R&D resources<br />
into the condition, but several smaller companies<br />
are pioneering therapies, one of which<br />
is an enzyme replacement therapy already in<br />
phase 3 human testing (Table 1 and Box 1).<br />
What’s more, progress in drug discovery programs<br />
aiming to target proteins associated<br />
with Mendelian neurodevelopmental disorders<br />
may pave the way for expansion into<br />
broader spectrum autism conditions.<br />
Repurposed drugs<br />
Current estimates indicate that 1 in 110 children<br />
in the United States has an autism<br />
spectrum disorder, defined by three core<br />
symptoms: deficits in social interactions,<br />
problems with communication and repetitive<br />
behaviors. Although twin and family studies<br />
have established a strong genetic basis for<br />
autism, no clear genetic cause has emerged.<br />
In addition to complex genetics, the disorder<br />
is phenotypically diverse: individuals with<br />
an autism spectrum diagnosis may be intelligent<br />
and high functioning (e.g., those with<br />
Asperger’s syndrome) or have severe mental<br />
deficits. The large variation in phenotypes and<br />
high concordance in monozygotic twins suggests<br />
many genetic and environmental biasing<br />
factors are involved.<br />
[Figure: Trouble at the synapse. The genetics of autism is pointing toward malfunctioning at the synapse.]<br />
A diagnosis of autism brings along a slew of<br />
unmet medical needs, including anxiety, sleep<br />
disturbances, and metabolic and gastrointestinal<br />
issues. Initial moves by industry into<br />
autism therapeutics have involved applying<br />
existing drugs to alleviate some of these symptoms,<br />
says Sophia Colamarino, vice president<br />
for research at Autism Speaks, a patient advocacy<br />
group based in New York. “In the short<br />
term, that’s where many of the pharmaceutical<br />
companies will be able to have an immediate<br />
impact,” she says. Two atypical antipsychotics<br />
have been approved by the US Food and Drug<br />
Administration (FDA) for treating irritability<br />
in autistic children. Johnson & Johnson’s<br />
Risperdal (risperidone) was approved in<br />
late 2006, followed by Abilify (aripiprazole)<br />
from Bristol-Myers Squibb in New York, and<br />
Otsuka in Princeton, New Jersey, in 2009.<br />
Selective serotonin reuptake inhibitors such<br />
as low-dose Prozac (fluoxetine) are approved<br />
for use in adults and children for obsessive<br />
compulsive disorder and have been tested in<br />
children with autism.<br />
[Photo credit: Mike Agliolo/Corbis]<br />
Anticonvulsants such as valproate (Stavzor, Depakene, Depacon)<br />
may serve the same sort of purpose for some<br />
patients, says Eric Hollander, director of the<br />
Compulsive, Impulsive and Autism Spectrum<br />
Disorders Program at Albert Einstein College<br />
of Medicine and Montefiore Medical Center<br />
in New York.<br />
Treating these related symptoms gives<br />
patients and their caregivers an improved<br />
quality of life, making it more likely that<br />
an individual with autism can live at home<br />
rather than in a care facility, Hollander adds.<br />
Improving those related symptoms can also<br />
make patients more responsive to behavioral<br />
therapies, says Robert Ring, who is heading<br />
up Pfizer’s autism research unit in Groton,<br />
Connecticut.<br />
At least one repurposed drug is targeting the<br />
imbalance between excitatory and inhibitory<br />
signaling suspected to be part of the basis of<br />
autism. New York-based Forest Laboratories is<br />
testing Namenda (memantine), an Alzheimer’s<br />
drug and N-methyl-D-aspartate (NMDA)<br />
receptor modulator, in a phase 2 trial<br />
in autism patients.<br />
Abnormal synaptic connectivity<br />
Because this spectrum of disorders has a<br />
clear genetic basis but no clear genetic cause,<br />
researchers are chewing on the question of how<br />
so many different mutations could lead to a<br />
similar phenotype, says Luca Santarelli, head<br />
of Roche’s central nervous system exploratory<br />
development in Basel.<br />
Genetic studies are important, but they don’t<br />
tell a complete story. “Identifying genes and<br />
coming up with gene candidates is really just<br />
a first step in gaining confidence in a potential<br />
genetic target that could be druggable,” says<br />
John Spiro, a research director at the Simons<br />
Foundation Autism Research Initiative in New<br />
York City. “There are not many genes that you<br />
can be really, really confident are accounting<br />
for any significant portion of autism.” Though<br />
researchers remain hopeful that the genes might<br />
converge into a single meaningful pathway, he<br />
adds, “for the most part in autism, it’s not clear<br />
yet that’s going to be the case.”<br />
Nonetheless, some patterns are emerging<br />
that may help researchers devise new therapeutic<br />
strategies. A genome-wide survey of a group<br />
of autistic and mentally retarded individuals<br />
revealed a set of mutations (point mutations<br />
and copy number variants) in a gene, SHANK2,<br />
that controls synaptic structure, defects in<br />
which could lead to problems in neuronal<br />
communication².<br />
Box 1 Enzyme replacement for autism?<br />
Unlike other emerging treatment strategies for autism that target genes or neurochemical<br />
pathways, Curemark, of Rye, New York, is working on an enzyme replacement therapy<br />
comprising a mixture of several digestive enzymes (Table 1). In clinical work with children<br />
who showed symptoms of autism, Curemark’s founder and CEO, Joan Fallon, noticed that<br />
several of these patients restricted their diets by their own choice, preferring carbohydrate-laden<br />
foods such as crackers and pasta. Searching for an explanation, she found that these<br />
patients had low fecal levels of the protease chymotrypsin (fecal chymotrypsin levels have<br />
also served as a diagnostic indicator of cystic fibrosis). Children with autism without a known<br />
genetic cause often had these low enzyme levels, Fallon says.<br />
After administering high-protease enzymes, the physicians observed behavioral changes in<br />
the children. Fallon filed patents in 1999 and formed a biotech company in 2005. The<br />
company’s protease-based treatment, CM-AT, is currently being tested in a phase 3 study<br />
with 170 children ages 3–8 in 12 locations around the United States.<br />
Mutations in another family of genes<br />
involved with synapse formation, the neuroligins,<br />
which code for adhesion molecules<br />
that cluster on the receiving side<br />
of the synapse, may account for up to 6%<br />
of autism cases, according to Nils Brose,<br />
director of the Department of Molecular<br />
Neurobiology at the Max Planck Institute<br />
of Experimental Medicine, in Göttingen,<br />
Germany. Neuroligins 3 and 4 localize to<br />
glutamatergic synapses, and loss-of-function<br />
mutations in these genes segregate in<br />
certain pedigrees with mental retardation,<br />
autism and Asperger’s syndrome. These<br />
molecules are likely operating as the organizational<br />
point for information coming into<br />
the postsynaptic space, recruiting signaling<br />
receptors. In mouse knockouts of two of<br />
these neuroligins, Brose says, “the synapses<br />
are intrinsically operational, but they lack<br />
normal receptors and as a consequence don’t<br />
function properly.”<br />
But just noting a connection between these<br />
genes and synaptic structures isn’t enough for<br />
developing drug candidates, Spiro adds. “You<br />
don’t know. Is it too much? Is it too little? Are<br />
[the structures] in the wrong place during<br />
development? There are just a million questions<br />
that need to be ironed out before you can<br />
think about a pharmaceutical intervention.”<br />
Santarelli’s group at Roche is trying to get at<br />
some of these questions, in collaboration with<br />
Peter Scheiffele, a professor of cell and developmental<br />
neurobiology at the University of<br />
Basel and a leader in the neuroligin research<br />
area. “We’d like to understand the common<br />
downstream effects of different genetic alterations<br />
that lead to autisms and whether there<br />
are common mechanisms that could lead to<br />
treatments,” Santarelli says.<br />
Clues from rare single-gene disorders<br />
The increasing understanding of some of the<br />
molecular mechanisms of autism is providing<br />
one avenue forward. The second breakthrough,<br />
according to Colamarino, is coming through<br />
animal studies of single-gene disorders such<br />
as fragile X³ and Rett’s syndromes⁴, which are<br />
found in a disproportionate number of individuals<br />
who meet the criteria for autism spectrum<br />
disorders. Since 2007, a handful of studies of<br />
animal models with inducible mutations have<br />
shown that animals can develop to adulthood<br />
with these disorders, and then recover after<br />
proper gene function is switched back on.<br />
That ability to reverse the symptoms in animals<br />
with advanced disease has been a major<br />
breakthrough, says Spiro. With clear genetic<br />
causes coupled with the opportunity to build<br />
animal models of these disorders, “it may be<br />
very reasonable to say that the pathway to drug<br />
discovery in autism may be paved by a careful<br />
focus on these rarer syndromes,” Ring says.<br />
Fragile X syndrome provides a case study<br />
in this approach that weds treatment strategies<br />
for a rare disorder with the possibility<br />
of understanding the underpinnings of<br />
autism. This genetic disorder, which affects 1<br />
in 4,000 males and 1 in 6,000 females (http://<br />
www.fraxa.org/), leads to learning disabilities<br />
and even mental retardation, anxiety and seizures.<br />
Up to 20% of individuals with fragile X<br />
also meet the criteria for an autism diagnosis.<br />
As a result of a single gene mutation, these<br />
individuals do not make the fragile X mental<br />
retardation protein (FMRP). Mark Bear of<br />
the Massachusetts Institute of Technology in<br />
Cambridge and his colleagues found that the<br />
lack of FMRP leads to dysregulation of signaling<br />
through the metabotropic glutamate<br />
receptors (mGluR). The mGluR5 receptor is<br />
highly expressed in regions of the brain critical<br />
for learning and memory.<br />
FMRP serves as a brake on this signaling<br />
pathway, says Randall Carpenter, CEO<br />
and president of Seaside Therapeutics, a<br />
Cambridge, Massachusetts, biotech company<br />
co-founded by Bear. “When it’s not<br />
there then there’s overactivation of the signaling<br />
pathway. The brain can’t discriminate<br />
between important information and noise<br />
and it doesn’t develop normally.” In mice<br />
with the fragile X mutation, Bear and his colleagues<br />
found that knocking down expression<br />
of mGluR5 to 50% rescued the learning<br />
deficits, stopped seizures and increased other<br />
measures of plasticity in the brain.<br />
Confident that they’re targeting the appropriate<br />
pathways, Seaside Therapeutics has licensed<br />
a series of small-molecule compounds from<br />
Merck to target glutamate signaling in general<br />
and mGluR5 signaling specifically, Carpenter<br />
says. They recently completed a phase 2 clinical<br />
trial of a general γ-aminobutyric acid (GABA)<br />
B agonist, STX209, in fragile X patients, and<br />
will soon complete a phase 2 trial of the same<br />
compound in individuals with autism spectrum<br />
disorders. A specific antagonist of the mGluR5<br />
receptor is currently in repeat-dose phase 1 trials,<br />
and Seaside expects to start phase 2 trials<br />
with fragile X patients by early 2011.<br />
Mutations in glutamate receptor genes<br />
GRIN2A and GRIK2 and multiple GABA<br />
receptor genes have been associated with<br />
autism. Two pharma companies also see<br />
promise in the mGluR5 receptor strategy<br />
for treating fragile X patients. Novartis in<br />
Basel recently completed a phase 2 clinical<br />
trial of their compound AFQ 056 at sites<br />
in Europe and is planning their next study,<br />
which is scheduled to open later in 2010, says<br />
spokesman Jeffrey Lockwood in an e-mail.<br />
Roche’s small-molecule mGluR5 antagonist<br />
is being tested in phase 2 clinical trials<br />
in five locations in the United States, says<br />
Santarelli. Their results are “encouraging so<br />
far,” he says. This growing understanding<br />
of these specific, related genetic disorders,<br />
Santarelli adds, provides a pathway to think<br />
about possible extrapolations to the more<br />
sporadic types of autism.<br />
Peptide hormone targets<br />
The peptide oxytocin and its related receptors<br />
are emerging as a pathway that could prove<br />
useful for treating a variety of neuropsychiatric<br />
disorders including autism. Animal studies<br />
have pointed to the importance of oxytocin in<br />
social behavior; in voles, for example, oxytocin<br />
and its counterpoint hormone vasopressin<br />
appear to have a role in pair bonding.<br />
Karen Parker and her colleagues at Stanford<br />
University in California observed seasonal<br />
differences in the way females and males that<br />
are raising young interact. In the laboratory,<br />
they traced these differences, caused by purely<br />
environmental cues, to the locations of oxytocin<br />
receptors in the animals’ brains. Changes based<br />
on environmental cues have led researchers to<br />
consider oxytocin therapies for treating social<br />
dysfunction in humans.<br />
Such tests are already being done in humans.<br />
Hollander has given intravenous oxytocin<br />
to higher functioning patients with autism<br />
and Asperger’s syndrome and has observed<br />
improved social cognition. Patients were better<br />
able to lay down social memories or recognize<br />
emotions in spoken language, he says.<br />
Such treatments also decreased the severity<br />
of repetitive behaviors and self-stimulatory<br />
behaviors such as hand clapping, rocking and<br />
head banging.<br />
Patients treated with intranasal oxytocin<br />
showed similar improvements. Earlier this<br />
year, researchers at the Center for Cognitive<br />
Neuroscience in Bron, France, found that adults<br />
diagnosed as high functioning on the autism<br />
spectrum who received doses of intranasal<br />
oxytocin were better able to recognize cooperative<br />
play than adults with a similar diagnosis<br />
who had not received oxytocin. Those who had<br />
received oxytocin also spent more time looking<br />
at the face of their virtual playmates⁵.<br />
But teasing out the importance of oxytocin<br />
isn’t easy. The French study shows variation in<br />
individual responses to oxytocin. “We don’t have<br />
good biomarkers of oxytocin levels,” Parker says.<br />
Funded by a grant from the Simons Foundation,<br />
she and her colleagues are trying to measure<br />
plasma oxytocin levels, various mutations and<br />
social phenotypes among individuals with<br />
autism and their siblings and compare them<br />
with controls matched for age and gender.<br />
Oxytocin and the related response pathways<br />
represent “one of the most exciting biologies in<br />
the autism space today,” says Pfizer’s Ring and<br />
could have implications for other psychiatric<br />
areas as well. In research Ring carried out at<br />
Wyeth, he developed the first nonpeptide oxytocin<br />
receptor agonist⁶. “The oxytocin receptor<br />
is a priority target for the field, but a very<br />
challenging target to develop traditional small-molecule<br />
chemistry for.”<br />
Cellceutix, a biotech company in Beverly,<br />
Massachusetts, is also testing a preclinical<br />
compound for autism, KM-391, in a rodent<br />
model of autism developed by researchers at<br />
the Kennedy Krieger Institute in Baltimore.<br />
The autism-like symptoms are induced by<br />
injecting the chemical 5,7-dihydroxytryptamine<br />
(5,7-DHT) into the forebrain of newborn<br />
rat pups, leading to neonatal serotonin depletion,<br />
reduced brain plasticity and abnormal<br />
behaviors. In an initial study, KM-391 given<br />
over 90 days restored normal behaviors and<br />
near-normal serotonin levels, and increased<br />
brain plasticity, relative to a nontreatment<br />
group and a group given Prozac. Another study<br />
measuring serotonin levels in three regions of<br />
the rat brain has confirmed the restoration of<br />
normal serotonin levels.<br />
Another small study added an oxytocin<br />
antagonist to the mix. The antagonist alone<br />
intensified the autism-related behaviors, such as<br />
repetitive behaviors and sensitivity to touch, but<br />
when given with KM-391, the frequency and<br />
intensity of these behaviors were reduced.<br />
Table 1 Selected companies with autism targets in clinical development<br />
Company | Target | Drug candidate | Stage of development<br />
Curemark | Protease deficiency | CM-AT (a mixture of amylase, protease, chymotrypsin, trypsin, papain and papaya in a 4–10:1 ratio with lipase, derived from animal, plant, microbial or synthetic sources) | Phase 3<br />
Novartis | mGluR5 | AFQ 056 (small molecule) | Phase 2<br />
Roche | mGluR5 | RO4917523 (small molecule) | Phase 2<br />
Seaside Therapeutics | GABA B | STX209 (R-isomer of baclofen) | Phase 2<br />
Seaside Therapeutics | mGluR5 | STX107 (2-methyl-1,3-thiazol-4-yl) ethynylpyridine) | Phase 1<br />
Forest Laboratories | NMDA receptor modulator | Namenda (memantine) | Phase 2<br />
Measuring outcomes<br />
Fueled by academic research and increased<br />
funding from the US National Institutes of<br />
Health, nonprofit and advocacy organizations,<br />
the field is moving forward. But even<br />
as some drug candidates are moving into the<br />
clinic, a number of challenges remain for the<br />
field as a whole. Above all is the problem of<br />
the heterogeneity of the disorder, according<br />
to Colamarino. “We’re calling it one thing<br />
when it’s really probably more than one.” That<br />
heterogeneity can pose a challenge in choosing<br />
appropriate study subjects. The field is<br />
also struggling with finding appropriate outcome<br />
measures, particularly those that can be<br />
measured within the time frame of a clinical<br />
study. Without sensitive measures of changes<br />
in the core symptoms, researchers need to<br />
identify what the focus should be within a<br />
particular trial. In many cases researchers<br />
have depended on parental reporting of<br />
behavioral changes, Colamarino says, leading<br />
to a large placebo effect. Although no<br />
biomarkers have been established for autism,<br />
some sort of biological measure of change<br />
in connection with autism’s core symptoms<br />
would be particularly attractive. Some clinical<br />
trials have failed because of methodological<br />
issues, she adds. “That’s why we need to<br />
address this sooner rather than later.”<br />
To bring researchers together to discuss<br />
these challenges, Autism Speaks and Pfizer<br />
are co-sponsoring a translational research<br />
meeting to improve clinical study methodology<br />
and design, tentatively scheduled for<br />
later this year. “There’s no better investment<br />
for us externally than to bring together all<br />
the key experts in this area and have a discussion<br />
with FDA present and try to iron<br />
out a framework to address this challenge<br />
together,” Ring says. The development of the<br />
Diagnostic and Statistical Manual of Mental<br />
Disorders (DSM-V), the bible for neurological<br />
diseases, scheduled for release in May<br />
2013, could complicate the development of<br />
trial endpoints, Bay City’s Lopez-Figueroa<br />
adds, depending on how autism disorders<br />
and symptoms are classified.<br />
A second meeting in early 2011 will look at<br />
clinical targets—both their identification and<br />
validation—in an attempt to reach a consensus<br />
on where therapeutics can bring the most<br />
initial benefit to patients. This is something<br />
the field is still struggling with, Ring says. “If<br />
we had one shot today to demonstrate that<br />
this would work, what would be the clinical<br />
target that we should take on?”<br />
Pfizer and Roche are also developing an<br />
autism proposal for the Innovative Medicines<br />
Initiative, which coordinates European<br />
Union–based public-private partnerships in<br />
drug discovery and development. The idea<br />
is for companies to join forces to work on<br />
research that is not generating intellectual<br />
property, Santarelli says, such as the development<br />
of animal models, understanding disease<br />
mechanisms and physiology, finding biomarkers<br />
and developing clinical methodology.<br />
Unquestionably, developing therapeutics<br />
for a developmental neuropsychiatric disorder<br />
with such an early onset presents several<br />
challenges. But Autism Speaks’ Colamarino is<br />
encouraged by the growth in the field. “Three<br />
to five years ago, we wouldn’t have been talking<br />
about clinical trials, certainly with respect to<br />
novel drug discovery,” she says. Pfizer’s Ring<br />
expects industry involvement to continue to<br />
grow: “It’s just too large an unmet medical<br />
need for companies not to see the opportunity<br />
to enter into this research space.”<br />
Sarah Webb, Brooklyn, NY<br />
1. Pinto, D. et al. Nature 466, 368–372 (2010).<br />
2. Berkel, S. et al. Nat. Genet. 42, 489–491 (2010).<br />
3. Guy, J. et al. Science 315, 1143–1147 (2007).<br />
4. Dölen, G. et al. Neuron 56, 955–962 (2007).<br />
5. Andari, E. et al. Proc. Natl. Acad. Sci. USA 107, 4389–4394 (2010).<br />
6. Ring, R.H. et al. Neuropharmacology 58, 69–77 (2010).<br />
Building a business<br />
At ground level<br />
Julian Bertschinger<br />
The hardest—and perhaps loneliest—period of being an entrepreneur might be just after your company is founded.<br />
I cofounded Covagen when I was 30 years<br />
old. Although my PhD and postdoc work<br />
had taught me to think in a focused manner<br />
and be product oriented, I was as green as<br />
they come concerning the nuts and bolts of<br />
launching a company. Picking it up as you go<br />
might not be the optimal way to learn, but<br />
I’m living proof that it can be done with the<br />
right team. Here’s how we did it.<br />
Two men and a plan<br />
The most important motivating factor, for me,<br />
was my education. I did my thesis in Dario<br />
Neri’s lab at the Institute of Pharmaceutical<br />
Sciences at ETH Zurich. The research group<br />
there had just isolated an antibody fragment<br />
that binds to a tumor-associated marker,<br />
and proof-of-concept data showed that the<br />
fragment selectively targeted solid tumors<br />
in mice. Neri went on to cofound Philogen,<br />
based in Siena, Italy, and develop the antibody<br />
in collaboration with Bayer Schering<br />
in Berlin. Today, several derivatives of this<br />
antibody are in phase 2 trials.<br />
Seeing this process firsthand showed me<br />
(and Dragan Grabulovski, my cofounder at<br />
Covagen, which is based in Zurich) that it was<br />
possible to move from the lab to the commercial<br />
side. This had our group thinking about<br />
products right away, which I believe is crucial<br />
when contemplating a biotech company. But<br />
the truth is that Covagen never would have<br />
been founded without the Venture business<br />
plan competition, organized every two years<br />
by McKinsey, in Zurich, and ETH Zurich.<br />
One of the winners of this competition was<br />
Glycart Biotechnology, also in Zurich, which<br />
took the prize in 1998 and eventually was<br />
acquired by Roche, in Basel, Switzerland, for<br />
CHF235 million (US$180 million) in 2005.<br />
Julian Bertschinger is CEO at Covagen,<br />
Zurich, Switzerland.<br />
e-mail: julian.bertschinger@covagen.com<br />
Grabulovski and I decided to take part in<br />
the Venture 2006 competition for two reasons:<br />
we were eager to learn how to write a business<br />
plan (we’d never written one) and we thought<br />
it would be interesting precisely because it was<br />
so different from the reports and scholarly<br />
articles we were used to writing.<br />
The competition is divided into two<br />
phases. During the first, entrants submit a<br />
business idea outlined on a few pages, and<br />
the best ten ideas are awarded a prize. In the<br />
second, all participants receive free coaching<br />
from industry experts and venture capitalists,<br />
who then give advice to participants<br />
writing their first business plan. The ten best<br />
business plans are chosen by a jury and all<br />
receive the same prize amount of CHF2,500<br />
(US$2,057).<br />
We submitted our business idea, but I<br />
didn’t actually expect us to be one of the<br />
winners; I was busy applying for postdoc<br />
positions abroad. Nevertheless, our idea<br />
was chosen out of about 100 applications<br />
to be awarded a CHF2,500 prize. This<br />
surprised me, not because we doubted our<br />
entry, which was based on the Fynomer technology<br />
(Fig. 1; Box 1 and D. Grabulovski et<br />
al. J. Biol. Chem. 282, 3196–3204 (2007)), but<br />
because we felt that it was too early to found<br />
a company on the available results: we had<br />
no in vivo data.<br />
Looking back, the biggest effect of participating<br />
in Venture 2006 was that it let us begin<br />
to establish a business network; previously,<br />
we’d known only people within academia. At<br />
workshops during the second phase of the<br />
competition, we met Rudolf Gygax, a managing<br />
director of Novartis Venture Fund, who<br />
would be a key contact for us later on. He<br />
and Neri helped us to draft our first business<br />
plan.<br />
The prize money was certainly useful,<br />
but the large amount of positive feedback<br />
we received was even more important. That<br />
boosted our confidence, and after winning, I<br />
thought for the first time that we really could<br />
found our own company.<br />
Box 1 The technology behind Covagen<br />
Covagen is built on Fynomer technology (Fig. 1),<br />
developed at the Institute of Pharmaceutical<br />
Sciences at ETH Zurich. Fynomers are a class of<br />
binding proteins derived from the Src homology<br />
3 (SH3) domain of the human Fyn kinase (D.<br />
Grabulovski et al. J. Biol. Chem. 282, 3196–3204<br />
(2007)). The Fyn SH3 domain structure<br />
is made up of two anti-parallel β-sheets and<br />
two loops (n-Src and RT), which are known to<br />
be involved in interactions with other ligand<br />
proteins.<br />
Fynomers can be produced in bacteria at<br />
high yields and are approximately 20 times<br />
smaller than antibodies. Additionally, they<br />
have the advantage of being easily assembled<br />
in a modular manner to yield bispecific and/or<br />
multivalent proteins, which might allow new treatment modalities that are challenging or<br />
impossible to explore with traditional antibody formats.<br />
Figure 1 Fyn Src homology 3 (SH3)<br />
domain structure. The RT-Src loop is<br />
shown in red, and the n-Src loop is shown<br />
in green. (Protein Data Bank entry 1M27)<br />
Box 2 Securing our funding<br />
I was able to found Covagen with an initial investment (in several tranches) from the<br />
Novartis Venture Fund. The first tranche came after signing investment documents, and<br />
the following tranches hinged on attaining research milestones.<br />
It was crucial that Novartis Venture Fund was prepared to invest in us at a very early<br />
stage. Corporate venture funds are beneficial in this way: they are usually more likely to do<br />
early-stage investments than most private venture capitalists because corporate funds can<br />
afford longer times to exit. If you’ve hit upon an interesting idea in academia, you might<br />
look to corporate venture funds first.<br />
In 2009, Covagen was able to attract three other investors: the corporate venture<br />
fund MP Healthcare Venture Management, of Boston; Ventech, of Paris; and Edmond de<br />
Rothschild Investment Partners, also of Paris. We also have received some funds via our<br />
research collaboration with Roche, which was secured in June 2009.<br />
To move our interleukin-17A inhibitor into preclinical and clinical development, we<br />
are planning to raise additional money this year, so we are seeking one or two venture<br />
capitalists to join our existing investors.<br />
Founding Covagen<br />
We stayed in contact with Gygax, and he<br />
invited us to present our project at the<br />
Novartis Venture Fund headquarters in Basel.<br />
The fund was interested in investing, and we<br />
sat down to negotiate our first term sheet. I<br />
had absolutely no idea what the difference<br />
was between a binding contract and term<br />
sheet, and this was my initiation. I learned<br />
what Series A shares are, how to calculate<br />
pre-money and post-money valuations, what<br />
drag-along and tag-along clauses are, why a<br />
high liquidation preference for investors is<br />
bad for holders of common shares and how<br />
anti-dilution protection for investors can<br />
hurt founders in a down round. I was moving<br />
into a whole new world.<br />
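The term-sheet vocabulary above lends itself to a small worked example. The Python sketch below shows the standard relationship post-money = pre-money + investment, and how a liquidation preference can shrink what holders of common shares receive at exit. All figures and function names are hypothetical illustrations, not Covagen's actual terms, and the preference is assumed to be non-participating (the investor takes either the preference or the pro-rata share, whichever is larger).<br />

```python
def post_money(pre_money, investment):
    """Post-money valuation is the pre-money valuation plus the new cash."""
    return pre_money + investment


def investor_ownership(pre_money, investment):
    """Fraction of the company the new investor holds after the round."""
    return investment / post_money(pre_money, investment)


def exit_proceeds(exit_value, investment, investor_fraction,
                  preference_multiple=1.0):
    """Split exit proceeds between investor and common shareholders.

    Assumes a simple non-participating liquidation preference: the investor
    receives the larger of (a) the preference, capped at the exit value, and
    (b) the pro-rata share as if converted to common shares.
    """
    preference = min(exit_value, preference_multiple * investment)
    as_converted = exit_value * investor_fraction
    investor = max(preference, as_converted)
    common = exit_value - investor
    return investor, common


if __name__ == "__main__":
    # Hypothetical round: CHF 1 million invested at CHF 4 million pre-money.
    pre, cash = 4_000_000, 1_000_000
    share = investor_ownership(pre, cash)
    print("post-money valuation:", post_money(pre, cash))
    print("investor ownership:", share)
    # Hypothetical low exit with a 2x preference: common holders get less
    # than their 80% stake would suggest.
    print("investor, common:", exit_proceeds(3_000_000, cash, share, 2.0))
```

With a high preference multiple and a modest exit, most of the proceeds go to the investor even though the founders hold the majority of the shares, which is why this clause deserves careful reading.<br />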
It is very important to understand every<br />
word in term sheets and agreements. You<br />
should always know what you are signing.<br />
To do this, first make sure you find a lawyer<br />
who intimately knows relationships between<br />
venture capitalists and biotech startup companies,<br />
and then be persistent enough to ask your<br />
lawyer about every single expression or phrase<br />
you do not understand. (You can familiarize<br />
yourself somewhat with the terminology by<br />
using the internet, in particular<br />
http://www.investopedia.com/terms/v/venturecapital.asp,<br />
but also ask your lawyer directly.)<br />
When we finally signed the term sheet,<br />
we found it just meant more paperwork. We<br />
still needed to establish a licensing agreement<br />
with ETH Zurich and negotiate the<br />
investment and shareholders’ agreements. I<br />
admit that when I first read the investment<br />
document drafts, I thought the beginning<br />
definitions weren’t very relevant. But after<br />
further reading and questioning our lawyer,<br />
I quickly realized that those definitions are<br />
actually one of the most important things in<br />
a contract.<br />
Once all the details were ironed out<br />
(Box 2), we founded Covagen in December<br />
2006 and signed the investment agreements<br />
with Novartis Venture Fund. The real work<br />
was about to start.<br />
The lonely lab<br />
Grabulovski still had to finish his PhD thesis.<br />
This made me Covagen’s only employee<br />
from December 2006 until May 2007, and<br />
Covagen was a startup in every sense of<br />
the word. My first task was to open a bank<br />
account so Novartis Venture Fund could<br />
transfer in its investment.<br />
When that was done, I set up Covagen’s<br />
homepage (be sure to check for domain<br />
name availability before you decide on a<br />
company name). A friend of a friend runs a<br />
company offering website design and e-mail<br />
hosting services, and he helped me create<br />
Covagen’s website. Here’s a tip: make sure<br />
that you can administer the website yourself<br />
so you will not have to pay a web designer for<br />
every small change or update. In addition, I<br />
opened a Covagen e-mail account, and here,<br />
too, I made sure I could independently set up<br />
additional e-mail accounts.<br />
But there remained a very big need—work<br />
space. We had no laboratory. Unfortunately,<br />
ETH Zurich does not offer incubator space<br />
for spin-outs. Startup companies usually try<br />
to find space within the department they<br />
originated from, but in our case there was no<br />
room available. After asking around within<br />
ETH Zurich, Grabulovski learned of an empty<br />
laboratory not attached to any department,<br />
and we were able to make an arrangement to<br />
allow us to rent this space. In addition, our former<br />
institute enabled us to access some rather<br />
expensive instruments for an affordable fee.<br />
The laboratory was empty, except for<br />
benches and desks, and somewhat dusty. On<br />
my second day, I brought rags from home<br />
and started cleaning. This wasn’t really what<br />
I envisioned a biotech CEO doing, but the<br />
truth is, I was excited—I was starting a company<br />
from the very bottom! There was no<br />
network connection for my computer, no<br />
printer, no phone, no fax. However, after<br />
making a few calls with my mobile phone,<br />
the university’s staff set up all the necessary<br />
connections within a few days. This is<br />
a benefit of staying within academia: when<br />
starting your company, all issues related to<br />
infrastructure need only minimal time and<br />
management resources.<br />
After all that work, I thoroughly appreciated<br />
making the first company phone call and<br />
sending the first message from my Covagen<br />
e-mail account!<br />
With communications behind me, I was<br />
left with the science. It’s only when you start<br />
from scratch that you realize how many different<br />
instruments and tools, disposable<br />
plastic tubes, glassware, kits, antibodies and<br />
chemicals are needed for research, and I had<br />
none of it. I also realized how comfortable<br />
my life in the academic lab had been, where<br />
many instruments were available and I didn’t<br />
have to think about budgeting. That was not<br />
the case at Covagen, where I became very<br />
cost sensitive. Comparison shopping takes<br />
time, and it was four months before the last<br />
instruments and reagents arrived. This neatly<br />
coincided with Grabulovski earning his PhD<br />
in May 2007, and he joined Covagen as CSO.<br />
I finally had company.<br />
Building a biotech<br />
Established as Covagen, we now had several<br />
target proteins in mind to validate the<br />
technology, but we did not have a clear plan<br />
for which targets to focus on<br />
in developing our first Fynomer-based<br />
clinical candidate. Choosing a good<br />
first target was the most important decision<br />
we needed to make because once we made<br />
the call, we’d invest most of our resources in<br />
that direction. We investigated many different<br />
targets to find one that was economically<br />
promising and in an area in which Covagen<br />
had freedom to operate. We decided to go for<br />
inhibition of the cytokine interleukin-17A,<br />
which is an attractive emerging target for<br />
diseases such as rheumatoid arthritis, psoriasis<br />
and uveitis.<br />
In early summer 2007, we hired another<br />
person to help speed up our research. We<br />
had spent less money than we expected in the<br />
first half of 2007, so we had sufficient financial<br />
resources to hire. We felt that our first<br />
employee should be someone we already knew<br />
and someone we could trust to be dependable<br />
and competent. As several investors had<br />
warned us, not getting along with co-workers<br />
is a big reason why many small companies fail.<br />
Personal frictions tend to increase even more<br />
if a company hits hard times.<br />
We asked Simon Brack, an antibody engineering<br />
specialist we knew from our time in Neri’s<br />
group, to join Covagen. Brack was returning to<br />
Switzerland from Oxford, where he’d worked as a<br />
postdoc. In October 2007, he became Covagen’s<br />
third employee and was a great hire.<br />
Even in a company as small as Covagen<br />
was then, there were a million administrative<br />
things to do, and they occupied a large amount<br />
of my time—I was finding it hard to do the<br />
necessary work on the bench to develop our<br />
technology, not to mention that creating documents<br />
and presentations for potential investors<br />
takes a lot of time. So at the very least, it<br />
felt good to know that if I had to leave the lab,<br />
I had four hands working while I was gone.<br />
Now, we are up to seven employees.<br />
Advancing our technology is the most<br />
important task we have at Covagen, just as<br />
it was when we started. For this reason, all<br />
employees at Covagen are PhD scientists. We<br />
are a young and enthusiastic team; none of<br />
us is older than 33. This can be a problem<br />
at times: when talking to investors, I realize<br />
that we sometimes lack credibility. Quite<br />
often, investors do not believe our claims,<br />
and mainly that’s because they do not believe<br />
I have enough experience. In some ways,<br />
they are right—I am a scientist still learning<br />
the business side of things. But we have<br />
been taught a lot about the varying aspects<br />
of drug development through working with<br />
Neri, and I believe a young group like us can<br />
learn fast if given the right advice.<br />
Currently, we’re getting that advice from<br />
Ray Hill, who was executive director for<br />
licensing in Europe at Merck & Co. and now<br />
is a visiting professor in neuroscience and<br />
mental health at Imperial College London.<br />
Hill sits on our board of directors. We’ve also<br />
established an excellent scientific advisory<br />
board, which will be of great help and value<br />
when bringing our first drug candidate to<br />
preclinical development and broadening our<br />
research activities.<br />
Conclusions<br />
Even as our company grows, things continue<br />
to change quickly and will for the foreseeable<br />
future. The larger we get, the more important<br />
(and time consuming) communicating<br />
with employees, investors, our board of<br />
directors and our scientific advisory board<br />
becomes. My tasks are always shifting as we<br />
adapt, improve and complement our skills.<br />
But this fluid environment is partially what<br />
makes startup companies attractive workplaces.<br />
Now, our company doesn’t feel so young<br />
anymore. This year, we plan to bring our<br />
first drug candidate to good manufacturing<br />
practice production and preclinical development.<br />
That, of course, will require additional<br />
money, and we plan to close a financing<br />
round this year. Raising a sizable round is<br />
another challenge for me, and it means I’m<br />
no longer on the bench. My job is raising<br />
money now. In that regard, I’ve graduated to<br />
the role of a typical biotech CEO.<br />
To discuss the contents of this article, join the Bioentrepreneur forum on <strong>Nature</strong> Network:<br />
http://network.nature.com/groups/bioentrepreneur/forum/topics<br />
correspondence<br />
Waking up and smelling the coffee<br />
To the Editor:<br />
As I pointed out recently on the Patent<br />
Docs weblog (http://www.patentdocs.<br />
org/), the editorial ‘Sitting up and taking<br />
notice’ in the May issue¹, announcing<br />
Judge Sweet’s 29 March decision in favor of<br />
the plaintiffs in Association for Molecular<br />
Pathology v. US Patent and Trademark<br />
Office, contains several misstatements and<br />
promotes the wrong-headed idea that gene<br />
patenting is a problem.<br />
In describing the case, you begin by<br />
making factual errors. Judge Sweet’s<br />
decision (summary judgment) does not<br />
indicate that “the judge felt that Myriad<br />
had no case to argue.” Rather, summary<br />
judgment is used when there are no<br />
disputed issues of material fact, and the<br />
case is decided as a matter of law. I would<br />
argue that the prudence of Judge Sweet’s<br />
judgment is questionable because he chose<br />
to make law by deciding that DNA is not<br />
patent eligible for being “the physical<br />
embodiment of genetic information.”<br />
You then state that “[t]he plaintiffs…won<br />
on virtually every count.” In fact, the court<br />
refused to consider the US Constitutional<br />
issues raised in the complaint, which<br />
formed the basis for the breast cancer<br />
victims to have standing in the lawsuit.<br />
This is not trivial because the court used<br />
these constitutional issues not only to<br />
deny defendants’ motions to dismiss, but<br />
also to provide the political<br />
frisson so attractive to the American Civil<br />
Liberties Union (New York) and the Public<br />
Patent Foundation (New York).<br />
The editorial goes on to mischaracterize<br />
the effects of BRCA patents on research,<br />
stating that “Myriad’s influence has been<br />
particularly pernicious. Its lawyers have<br />
issued cease-and-desist letters to genetics<br />
laboratories in universities, hospitals and<br />
clinics that offered diagnostic services<br />
based on the BRCA1 and BRCA2 genes.”<br />
Why is enforcing your patent rights<br />
pernicious? Use of these patented tests by<br />
these institutions constitutes infringement.<br />
It doesn’t matter whether the infringer<br />
is a university, hospital or clinic; they<br />
are still liable for infringement owing to<br />
their for-profit, commercial activities.<br />
There is no evidence that Myriad Genetics<br />
(Salt Lake City, UT, USA) or any other<br />
gene patent holder has inhibited basic<br />
biological research by threatening patent<br />
infringement litigation; indeed, there are<br />
several thousand basic research papers in<br />
scientific journals that have been published<br />
since the BRCA gene patents were granted.<br />
The piece also attempts<br />
to achieve ‘truth by<br />
association’ in citing<br />
several groups having<br />
“concerns” about gene<br />
patents that filed amicus<br />
briefs, including the<br />
International Center for<br />
Technology Assessment,<br />
Greenpeace, the<br />
Indigenous Peoples’<br />
Council on Biocolonialism<br />
and the Council for<br />
Responsible Genetics.<br />
Their contribution would<br />
be more worthwhile if<br />
it did not include incorrect statements<br />
regarding gene patenting’s consequences,<br />
including “the privatization of genetic<br />
heritage, the creation of private rights of<br />
unknown scope and consequences and the<br />
violation of patients’ rights.”<br />
The editorial was correct in noting<br />
that “[t]he alignment of physicians’<br />
and patients’ groups with what are, in<br />
effect, antibiotech lobbyists is a worrying<br />
development,” albeit ignoring the fact that<br />
not only the biotech sector, but also the<br />
public should be worried if these groups get<br />
their way.<br />
The editorial did supply potentially<br />
informative data, that Myriad reported<br />
“$326 million in revenue from diagnostic<br />
testing against $43 million in costs.”<br />
Assuming that these numbers are correct,<br />
and reflect only BRCA testing, this<br />
could be a measure of the profitability of<br />
BRCA testing results (perhaps providing<br />
motivation for the “universities, hospitals<br />
and clinics” to be so keen on getting into<br />
the business, infringing or no). But even<br />
here, the figures are completely out of<br />
context. No indication is provided whether<br />
these profits are out of the ordinary for a<br />
diagnostics company, traditional or genetic,<br />
or whether the ‘costs’ include ancillary<br />
costs like genetic counseling or physician<br />
education (both critical in genetic<br />
diagnostics due to the consequences for a<br />
patient of receiving a genetic diagnosis).<br />
If Myriad’s profits are<br />
significantly higher than<br />
those at other diagnostic<br />
companies, that fact would<br />
be relevant. The absence of<br />
any comparisons suggests<br />
that the absolute numbers<br />
were used because they<br />
better supported the<br />
editorial’s views.<br />
Finally, the editorial<br />
departs from reality when<br />
it decries the patent system<br />
for rewarding “only the last<br />
inventive step—the small<br />
breakthrough that enables<br />
a concept to be realized.” Such a statement<br />
indicates just how little the writers<br />
understand the ‘balance of rights’ that the<br />
patent bargain actually strikes. The patent<br />
system rewards inventors who disclose<br />
how to make and use an invention that<br />
is new, useful and nonobvious. Whether<br />
the improvement is groundbreaking or<br />
incremental, satisfaction of the statutory<br />
requirements governs patentability. Thus,<br />
if technology becomes obsolescent, new<br />
technology takes its place—because patents<br />
expire, as indeed Myriad’s patents will<br />
begin to expire in 2014. The consistent<br />
lack of understanding of innovation and<br />
the patent process is illustrated by the<br />
suggestion that rights to specific genes in<br />
multigene tests be assigned based on “the<br />
importance of any specific gene sequence<br />
to the utility of the test.” This is something<br />
the marketplace can be counted on to do<br />
without the government’s help.<br />
The last sentence of the piece<br />
even acknowledges the editorial idea<br />
is “implausible within the current<br />
petrified patent system and commercial<br />
infrastructure,” and then adds that this<br />
“doesn’t have to stop the dream” or “stop<br />
the discussion.” I would counter that the<br />
dream of better diagnostics and therapies<br />
is being, and has been, realized by 30 years<br />
of biotech and protection thereof by an<br />
invigorated patent system in the United<br />
States (and elsewhere). Changing that now,<br />
particularly if based on the wooly-headed<br />
arguments (really, sentiments) in the<br />
editorial, is the fastest and surest way that<br />
those hopes and dreams will be dashed.<br />
COMPETING FINANCIAL INTERESTS<br />
The author declares no competing financial<br />
interests.<br />
Kevin E Noonan<br />
McDonnell Boehnen Hulbert & Berghoff LLP,<br />
Chicago, Illinois, USA.<br />
e-mail: noonan@mbhb.com<br />
1. Anonymous. Nat. Biotechnol. 28, 381 (2010).<br />
<strong>Nature</strong> Biotechnology replies:<br />
We were not making the case that gene<br />
patenting itself was a problem, although it<br />
is clear that some DNA patents with overly<br />
broad claims are cause for concern. We<br />
disagree with the contention that “there<br />
is no evidence that Myriad Genetics…or<br />
any other gene patent holder has inhibited<br />
basic biological research by threatening<br />
patent infringement litigation.” There are<br />
cases where exclusive licensing practices<br />
(a particular problem for methods patents)<br />
or aggressive license enforcement has<br />
stymied research, as is detailed elsewhere<br />
in this issue¹. The problems also reach<br />
beyond basic research: a survey of 132<br />
clinical laboratory heads in the United<br />
States found that 53% had “decided not<br />
to develop or perform a test/service for<br />
clinical or research purposes because of a<br />
patent”². Indeed, one of the plaintiffs in<br />
the Association for Molecular Pathology<br />
v. US Patent and Trademark Office case<br />
is a patient who would like to have their<br />
BRCA1 test from Myriad independently<br />
verified by another laboratory, but cannot<br />
because of Myriad’s aggressive stance that<br />
prevents other laboratories performing the<br />
test. It might be good business for Myriad,<br />
but is it reasonable to enforce intellectual<br />
property in such a manner that it is so<br />
difficult for a patient to confirm a DNA<br />
test in an independent laboratory?<br />
The claim that new technology takes the<br />
place of ‘obsolescent’ technology because<br />
“patents expire” is also moot in relation to<br />
DNA patents. A point we were trying to<br />
make in the editorial is that the fields of<br />
molecular diagnostics and sequencing are<br />
moving so quickly that they are becoming<br />
obsolete along much shorter timelines<br />
than patent terms of 20 years. Although<br />
it was not trivial to sequence a human<br />
gene 20 years ago, it is certainly becoming<br />
routine today.<br />
1. Carbone, J. et al. Nat. Biotechnol. 28, 784–791<br />
(2010).<br />
2. Cho, M.K. et al. J. Mol. Diagnostics 5, 3–6 (2003).<br />
Genetic stability in two<br />
commercialized transgenic<br />
lines (MON810)<br />
To the Editor:<br />
A letter of correspondence by Dany Morisset<br />
and his colleagues¹ in the August 2009 issue<br />
cites two recent publications²,³ in which “two<br />
commercial seed varieties of the MON810<br />
maize genetically modified<br />
event (ARISTIS BT and<br />
CGS4540) present genetic<br />
variation thus hampering the<br />
detection by several methods<br />
for MON810 (Monsanto, St.<br />
Louis).” As representatives of<br />
Monsanto Europe (Brussels),<br />
Syngenta Crop Protection<br />
(Basel) and Limagrain<br />
Services Holding (Chappes,<br />
France), we would like to<br />
correct the scientific record<br />
concerning the claimed<br />
“variation” of the transgenic<br />
insertion in these transgenic<br />
hybrids.<br />
Upon request for further information,<br />
Margarita Aguilera and her colleagues at<br />
the European Commission, Directorate<br />
General Joint Research Center (JRC) in Ispra,<br />
Italy, informed us that the seeds tested were<br />
among 26 MON810 varieties provided by the<br />
Spanish Instituto Nacional de Investigación<br />
y Tecnología Agraria y Alimentaria (INIA;<br />
Madrid). The Spanish agency did not provide<br />
the JRC with details of the respective batch<br />
numbers for each variety.<br />
Our investigation has revealed that the<br />
two deviating results were not in fact related<br />
to variation of the transgenic insertion,<br />
as reported by Aguilera et al.²,³. Instead,<br />
our conclusions are that the two varieties<br />
(reported as entry 2 and entry 5) were not<br />
MON810 maize hybrids at all.<br />
Variety CGS4540 (entry 5) is a Bt176 maize<br />
hybrid and we do not understand why the<br />
seed was provided by INIA as MON810.<br />
Entry 2, which was designated as Aristis<br />
Bt, is most likely Aristis, the conventional<br />
counterpart of Aristis Bt (MON810). When<br />
we requested INIA to send a sample of<br />
Aristis Bt to its official Spanish laboratory<br />
CSIC (Consejo Superior de Investigaciones<br />
Científicas) for testing, the<br />
results were positive for<br />
MON810, as expected.<br />
Aguilera and her<br />
colleagues were not able<br />
to provide a correct chain<br />
of custody for the samples<br />
used in their analyses,<br />
which would have allowed<br />
resolution of the origin of<br />
these deviating results.<br />
The seed industry has<br />
invested significantly to<br />
provide quality products<br />
to the market place, which<br />
includes selling compliant<br />
and stable products. Traits are tested for<br />
presence and stability for many generations<br />
before release to the market place. We<br />
are therefore convinced that there is no<br />
scientific evidence of instability in MON810<br />
hybrids.<br />
COMPETING FINANCIAL INTERESTS<br />
The authors declare competing financial interests:<br />
details accompany the full-text HTML version of the<br />
paper at http://www.nature.com/naturebiotechnology/.<br />
Sofia Ben Tahar¹, Isabelle Salva² &<br />
Ivo O Brants³<br />
¹Limagrain Services Holding, Quality Assurance,<br />
Chappes, France. ²Syngenta Crop Protection AG,<br />
Regulatory Affairs, Basel, Switzerland. ³Monsanto<br />
Europe SA, Scientific Affairs, Brussels, Belgium.<br />
e-mail: ivo.o.brants@monsanto.com<br />
1. Morisset, D. et al. Nat. Biotechnol. 27, 700–701<br />
(2009).<br />
2. Aguilera, M. et al. Food Anal. Methods 1, 252–258<br />
(2008).<br />
3. Aguilera, M. et al. Food Anal. Methods 2, 73–79<br />
(2009).<br />
Distances needed to limit cross-fertilization<br />
between GM and conventional maize in Europe<br />
To the Editor:<br />
To avoid the economic consequences of<br />
admixtures of genetically modified (GM)<br />
and non-GM harvests, and to ensure that<br />
agricultural production complies with<br />
mandatory labeling provisions, the European<br />
Union (EU; Brussels) member states have<br />
adopted co-existence measures directed to<br />
farmers cultivating GM varieties. For GM<br />
maize cultivation, regulators have established<br />
mandatory isolation distances, which<br />
differ between countries and in some cases<br />
have been regarded as disproportionate¹,².<br />
Taking advantage of numerous field studies<br />
conducted by EU researchers in recent years,<br />
we report here a statistical analysis of cross-fertilization<br />
data in maize, showing that<br />
separating fields 40 m is sufficient to keep<br />
GM adventitious presence below the legal<br />
labeling threshold in the EU set at 0.9%.<br />
Currently, insect-resistant maize<br />
(engineered to express Bacillus thuringiensis<br />
toxin; Bt) and Amflora potato (engineered<br />
with antisense against granule-bound starch<br />
synthase), which was recently approved³,<br />
are the only two GM crops authorized for<br />
commercial cultivation in the EU. Bt maize<br />
was approved in 1998 and currently covers<br />
1.2% of the total maize area in the EU<br />
(Supplementary Notes 1 and 2).<br />
Given the legal standards for labeling and/<br />
or purity, the cultivation of GM maize in the<br />
EU is associated with mandatory technical<br />
coexistence measures designed to reduce<br />
the adventitious presence of GM maize<br />
in neighboring non-GM maize harvests.<br />
Such measures, to be applied by GM maize<br />
growers, should be stringent enough to<br />
keep adventitious presence below 0.9% so<br />
that conventional maize can comply with<br />
labeling provisions and avoid any potential<br />
price premium losses associated with GM<br />
admixtures⁴,⁵.<br />
Cross-fertilization between neighboring<br />
maize fields is the most important ‘biological’<br />
source of admixture between GM and<br />
conventional maize⁴,⁵. Factors influencing<br />
cross-fertilization rates in maize cultivation<br />
are well studied and include, among others,<br />
the distance between fields, flowering<br />
synchrony, weather conditions, the relative<br />
positions of donor and receptor fields (with<br />
respect to dominant winds in the area)<br />
and the size and shape of fields 4 . Because<br />
some of these parameters are difficult to<br />
control, regulatory bodies from most<br />
EU countries have decided to establish<br />
mandatory separation distances between GM<br />
and non-GM maize fields as the preferred<br />
single measure to limit cross-fertilization 6 .<br />
An overview of mandatory separation<br />
distances adopted by EU member states<br />
(Supplementary Table 1) shows a remarkable<br />
range of variation, 25–600 m, between the<br />
different countries. Although climatic and<br />
landscape parameters in maize cultivation<br />
(that affect cross-fertilization rates) are<br />
variable in the EU, often there is little science-based<br />
evidence that the distances adopted<br />
are proportionate to the desired purity<br />
standards.<br />
To test the proportionality of the<br />
separation distances established by EU<br />
member states, we performed a statistical<br />
analysis of data obtained from a number of<br />
recent studies on maize cross-fertilization<br />
performed in different European countries.<br />
Although the various studies recorded<br />
different variables, we analyzed only data<br />
on cross-fertilization rates (measured as<br />
percentage of seeds in the sample) in the<br />
receptor field as a function of distance<br />
from the edge of the pollen source. The aim<br />
of the analysis was to estimate distances<br />
necessary to keep cross-fertilization below<br />
different arbitrary tolerance thresholds and<br />
with different confidence levels. The results<br />
should inform debate on whether current<br />
distances between GM and non-GM maize<br />
fields stipulated by member states to meet<br />
legal EU labeling thresholds are supported by<br />
scientific data.<br />
[Figure 1 plot area; y-axis: Out-crossing (% seeds).]<br />
We first compiled a database of cross-fertilization<br />
rates and distance by collating<br />
different publications and unpublished<br />
studies on maize cross-fertilization, to obtain<br />
a total of 1,174 observations covering four<br />
European countries (Germany, Italy, Spain and<br />
Switzerland). Details on the sources of data<br />
used are given in Supplementary Table 2.<br />
The database covered studies with a variety<br />
of experimental designs (mostly receptor and<br />
donor fields side by side, but also donor and<br />
receptor fields dispersed in actual agricultural<br />
landscapes) and that had been performed<br />
in different growing seasons (2001–2006).<br />
Data originate from experimental designs<br />
representing worst-case scenarios (receptor<br />
fields situated downwind from donor fields<br />
and coincidence of flowering between donor<br />
and receptor fields) in Europe.<br />
Across the database, cross-fertilization<br />
rates decline as the distance from the pollen<br />
donor increases (Fig. 1). This negative<br />
relationship between cross-fertilization rates<br />
and distance was pointed out previously<br />
by several other authors 4,5,7–9 . For further<br />
analyses, cross-fertilization rates were<br />
analyzed for 10 m distance intervals<br />
(Supplementary Table 3). Because of the lack<br />
of sufficient observations from 50 m upwards,<br />
the size of intervals was increased to 20 m.<br />
Supplementary Table 3 shows that data on<br />
maize cross-fertilization are mostly available<br />
for short distances, close to the donor (84.1%<br />
of the data set, or 985 observations, are taken<br />
between 0 m and 20 m). In contrast, only<br />
4.2% of the measurements are available from<br />
distances above 50 m from the donor field.<br />
The mean cross-fertilization rate and the<br />
standard deviation for each distance interval<br />
were calculated using all data points in the<br />
interval, and the highest and the lowest<br />
cross-fertilization rates were also recorded<br />
(Supplementary Table 3).<br />
Figure 1 Cross-fertilization rates for Bt maize. The figure shows a meta-analysis of maize cross-fertilization<br />
data. Cross-fertilization rates (out-crossing, % of seeds) are represented in relation to the distance (m) from the pollen<br />
donor. The upper chart is a magnification of the original chart with a limited scale of the respective axis.<br />
Table 1 Probability of keeping cross-fertilization below a certain threshold level (%) using a gamma distribution<br />
Cross-fertilization threshold: 1.5%<br />
Distance (m): mean (low–high bounds)<br />
(0–10]: 49.44 (46.10–52.92)<br />
(10–20]: 91.19 (88.58–93.70)<br />
(20–30]: 99.86 (99.54–100)<br />
(30–40]: 99.99 (99.96–100)<br />
(40–50]: 99.88 (99.56–100)<br />
(50–70]: 99.88 (99.28–100)<br />
(70–90]: 99.98 (99.90–100)<br />
>90: 100 (100–100)<br />
The mean and variance of each<br />
distance interval were used to calculate<br />
the parameters that characterize different<br />
probability distributions at those intervals.<br />
Once the distribution was obtained, the<br />
probability of keeping maize cross-fertilization<br />
below different threshold levels was<br />
calculated for each distance interval.<br />
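The interval statistics described above can be sketched in a few lines of Python.<br />
This is a minimal illustration only: the observations are invented, the names<br />
`EDGES`, `interval_label` and `summarize` are hypothetical, and the bin edges are<br />
assumed from the intervals listed in Table 1 (10 m bins up to 50 m, 20 m bins up<br />
to 90 m, then one open-ended bin).<br />

```python
from statistics import mean, variance

# Upper edges (m) of the distance intervals, assumed from Table 1.
EDGES = [10, 20, 30, 40, 50, 70, 90]


def interval_label(distance_m):
    """Return the label of the (lower-upper] interval containing distance_m.

    A distance of exactly 0 m is placed in the first interval for simplicity.
    """
    lower = 0
    for upper in EDGES:
        if distance_m <= upper:
            return f"({lower}-{upper}]"
        lower = upper
    return ">90"


def summarize(observations):
    """Map each interval label to (mean, sample variance) of its rates.

    observations: iterable of (distance_m, rate_percent) pairs.
    The variance is reported as 0.0 when an interval holds one observation.
    """
    groups = {}
    for distance_m, rate in observations:
        groups.setdefault(interval_label(distance_m), []).append(rate)
    return {
        label: (mean(rates), variance(rates) if len(rates) > 1 else 0.0)
        for label, rates in groups.items()
    }


# Invented example data: (distance in m, cross-fertilization in % of seeds).
example = [(5, 2.0), (8, 4.0), (15, 1.0), (60, 0.1)]
print(summarize(example))
```

The per-interval mean and variance returned here are the two quantities the text<br />
says were used to parameterize the probability distributions.<br />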
To ensure robustness of the results<br />
obtained, different probability distributions<br />
were used following parametric and<br />
nonparametric approaches. Both approaches<br />
produced similar results. In the parametric<br />
approach, the probability distribution used<br />
to represent the cross-fertilization level for<br />
a given distance interval was the gamma<br />
distribution. The parameters of the gamma<br />
distribution were determined by the mean<br />
and the variance of the data in each interval.<br />
The probability distribution of cross-fertilization<br />
being above a certain threshold<br />
level was obtained by conducting bootstrap<br />
sampling per interval 1,000 times. Bootstrap<br />
sampling yields a range of values<br />
for the parameters of the gamma distribution,<br />
and we were therefore able to calculate the<br />
probability of being above a number of<br />
stated cross-fertilization thresholds (e.g.,<br />
0.9%; see ‘Gamma parameterization’ in<br />
Supplementary Note 3). We also estimated<br />
a beta distribution to analyze the data<br />
(Supplementary Note 4).<br />
Table 1 (continued) Cross-fertilization threshold (% of seeds) 1 : 0.9%<br />
Distance (m): mean (low–high bounds)<br />
(0–10]: 41.16 (37.80–44.62)<br />
(10–20]: 70.89 (67.56–74.38)<br />
(20–30]: 95.62 (92.12–98.44)<br />
(30–40]: 99.61 (98.76–100)<br />
(40–50]: 98.56 (96.10–100)<br />
(50–70]: 99.11 (96.26–100)<br />
(70–90]: 99.58 (98.68–100)<br />
>90: 99.96 (99.86–100)<br />
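The parametric step can be illustrated with a short Python sketch. It assumes a<br />
shape and scale parameterization fitted by the method of moments (shape = mean² /<br />
variance, scale = variance / mean) and a percentile interval taken over the<br />
bootstrap replicates. Supplementary Note 3 is not reproduced here, so the authors'<br />
exact procedure may differ. The function names and the data are hypothetical, and<br />
the gamma CDF is a hand-written series rather than a library call.<br />

```python
import math
import random
from statistics import mean, variance


def fit_gamma_moments(rates):
    """Method-of-moments gamma fit: returns (shape, scale)."""
    m = mean(rates)
    v = variance(rates)
    return m * m / v, v / m


def gamma_cdf(x, shape, scale):
    """P(X <= x) for a gamma variable, using the power series for the
    regularized lower incomplete gamma function."""
    if x <= 0:
        return 0.0
    z = x / scale
    term = 1.0
    total = 1.0
    n = 0
    while term > 1e-15 * total and n < 10_000:
        n += 1
        term *= z / (shape + n)
        total += term
    log_prefactor = shape * math.log(z) - z - math.lgamma(shape + 1.0)
    return min(1.0, total * math.exp(log_prefactor))


def bootstrap_probability_below(rates, threshold, n_boot=1000, seed=0):
    """Mean and 95% percentile bounds of P(rate <= threshold).

    Each replicate resamples the interval's rates with replacement,
    refits the gamma distribution and evaluates its CDF at the threshold.
    """
    rng = random.Random(seed)
    probs = []
    for _ in range(n_boot):
        resample = rng.choices(rates, k=len(rates))
        if variance(resample) == 0:
            # Degenerate resample (all values equal): no gamma fit possible.
            probs.append(1.0 if resample[0] <= threshold else 0.0)
            continue
        shape, scale = fit_gamma_moments(resample)
        probs.append(gamma_cdf(threshold, shape, scale))
    probs.sort()
    lower = probs[int(0.025 * n_boot)]
    upper = probs[max(int(0.975 * n_boot) - 1, 0)]
    return mean(probs), lower, upper


# Invented rates (% of seeds) for one distance interval.
rates = [0.1, 0.2, 0.3, 0.5, 0.8, 1.2, 2.0, 3.5]
print(bootstrap_probability_below(rates, threshold=0.9))
```

The three returned values correspond in spirit to the mean and the low and high<br />
bounds reported per interval in Table 1, expressed as fractions rather than percentages.<br />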
The nonparametric approach, where<br />
no distributional parameters are assigned,<br />
was based on a bootstrap simulation that<br />
consisted of drawing the observed data<br />
on cross-fertilization 1,000 times with<br />
replacement per interval. We thereby<br />
obtained 1,000 subsamples per interval.<br />
From each of these subsamples, the<br />
probability distribution of being above<br />
any cross-fertilization threshold can be<br />
calculated and mean and confidence<br />
intervals for the probability of being above a<br />
cross-fertilization threshold can be obtained.<br />
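A nonparametric bootstrap of this kind might look like the following Python<br />
sketch. It is an illustration under assumptions, not the authors' code: the<br />
function name and data are hypothetical, the probability is expressed as the<br />
share of resampled observations strictly below the threshold, and the confidence<br />
bounds are taken as the 2.5th and 97.5th percentiles of the replicates.<br />

```python
import random
from statistics import mean


def empirical_probability_below(rates, threshold, n_boot=1000, seed=0):
    """Bootstrap the share of observations below a threshold.

    Returns (mean share, lower bound, upper bound) over n_boot resamples
    drawn with replacement from the observed rates of one interval.
    """
    rng = random.Random(seed)
    shares = []
    for _ in range(n_boot):
        resample = rng.choices(rates, k=len(rates))
        below = sum(1 for r in resample if r < threshold)
        shares.append(below / len(resample))
    shares.sort()
    lower = shares[int(0.025 * n_boot)]
    upper = shares[max(int(0.975 * n_boot) - 1, 0)]
    return mean(shares), lower, upper


# Invented rates (% of seeds) for one distance interval.
rates = [0.05, 0.1, 0.2, 0.4, 0.7, 1.1, 1.6, 2.4]
print(empirical_probability_below(rates, threshold=0.9))
```

No distribution is fitted here, so the result depends only on the observed values<br />
in the interval, which is why sparsely sampled intervals give wide bounds.<br />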
Table 1 shows the mean probability of<br />
keeping cross-fertilization between maize<br />
fields below different arbitrary threshold<br />
levels (1.5%, 0.9%, 0.5% and 0.3%) for each<br />
separation distance interval, using the<br />
gamma distribution. A 95% confidence<br />
interval of the mean probability of keeping<br />
cross-fertilization below a certain threshold<br />
is calculated (see low and high bounds for<br />
each distance interval).<br />
The results provided in Table 1 are<br />
relevant for policy decision-making. For<br />
example, implementing a 30 m separation<br />
distance would result in a probability higher<br />
than 95% (95.62%, see mean probability<br />
values in bold in Table 1) of keeping cross-fertilization<br />
values below the 0.9% EU<br />
labeling threshold. The probability increases<br />
to 99% if a 40 m distance is implemented.<br />
However, it is known that cross-fertilization<br />
is not the only source of GM adventitious<br />
presence in maize harvests. Traces of GM<br />
seeds in conventional seed lots and in machinery<br />
are considered to be additional contributors<br />
to final adventitious presence 4,10 . Greater<br />
distances from the pollen source would be<br />
required if lower cross-fertilization thresholds<br />
were adopted to allow for these additional sources<br />
of adventitious presence. For example, a<br />
distance of 40 m is needed to keep cross-fertilization<br />
below 0.5% with a probability<br />
higher than 90% (94.1%).<br />
Table 1 (continued) Cross-fertilization thresholds: 0.5% and 0.3%<br />
Distance (m): 0.5% mean (low–high bounds); 0.3% mean (low–high bounds)<br />
(0–10]: 33.11 (29.76–36.66); 27.30 (24.06–30.64)<br />
(10–20]: 41.41 (37.78–45.06); 21.80 (18.20–25.68)<br />
(20–30]: 66.94 (58.30–75.14); 31.19 (21.52–41.00)<br />
(30–40]: 94.14 (87.70–99.44); 77.26 (63.70–91.08)<br />
(40–50]: 92.07 (84.12–99.80); 79.38 (66.48–95.34)<br />
(50–70]: 95.89 (87.30–99.90); 88.05 (74.54–96.86)<br />
(70–90]: 96.08 (91.56–99.94); 86.81 (77.48–97.66)<br />
>90: 98.58 (97.30–99.54); 90.76 (86.22–94.76)<br />
1 Numbers in italics indicate a scenario where separation distance is sufficient to reduce admixture in maize cultivation below different threshold levels (1.5%, 0.9%, 0.5% and 0.3%).<br />
Square brackets denote that the upper limit is included in the interval.<br />
An analysis of the data in Table 1 also<br />
allows the effects of a hypothetical increase<br />
in the EU mandatory labeling threshold on<br />
segregation practices in maize cultivation to<br />
be estimated (countries such as Japan allow<br />
as much as 5% tolerance). For example, a<br />
20 m separation distance would be sufficient<br />
to achieve a desired threshold level of 1.5%<br />
(with a probability of 91.19%). When using<br />
a nonparametric approach (bootstrapping<br />
simulation) results were quite similar to<br />
those obtained for the gamma distributions<br />
(Supplementary Table 4).<br />
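The way the text reads distances off Table 1 can be mirrored with a small lookup,<br />
sketched below in Python. The mean probabilities are copied from Table 1; the<br />
names `INTERVALS`, `TABLE_1_MEAN` and `first_sufficient_interval` are<br />
hypothetical. The sketch returns the first interval whose mean probability<br />
reaches the target, which is a simplification because the tabulated means are not<br />
strictly increasing with distance (for example, 99.61% at (30–40] but 98.56% at<br />
(40–50] for the 0.9% threshold).<br />

```python
INTERVALS = ["(0-10]", "(10-20]", "(20-30]", "(30-40]",
             "(40-50]", "(50-70]", "(70-90]", ">90"]

# Mean probabilities (%) of staying below each threshold, from Table 1.
TABLE_1_MEAN = {
    "1.5%": [49.44, 91.19, 99.86, 99.99, 99.88, 99.88, 99.98, 100.0],
    "0.9%": [41.16, 70.89, 95.62, 99.61, 98.56, 99.11, 99.58, 99.96],
    "0.5%": [33.11, 41.41, 66.94, 94.14, 92.07, 95.89, 96.08, 98.58],
    "0.3%": [27.30, 21.80, 31.19, 77.26, 79.38, 88.05, 86.81, 90.76],
}


def first_sufficient_interval(threshold, target_probability):
    """Return the nearest interval whose mean probability meets the target.

    threshold: one of the keys of TABLE_1_MEAN, e.g. "0.9%".
    target_probability: required probability in percent, e.g. 95.
    Returns None when no tabulated interval reaches the target.
    """
    for label, probability in zip(INTERVALS, TABLE_1_MEAN[threshold]):
        if probability >= target_probability:
            return label
    return None


print(first_sufficient_interval("0.9%", 95))  # interval ending at 30 m
print(first_sufficient_interval("0.5%", 90))  # interval ending at 40 m
```

A stricter rule would also require every more distant interval to meet the target;<br />
that choice is left open here because the text does not state one.<br />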
The results presented here (Table 1)<br />
clearly show that some of the current<br />
mandatory separation distances proposed<br />
by several EU countries for maize<br />
segregation (Supplementary Table 1) are<br />
disproportionate. They are set too high relative to the<br />
objective of keeping cross-fertilization below<br />
the legal threshold level in real agricultural<br />
landscapes. Our results are robust because<br />
the experimental data set considered<br />
represents several climatic conditions,<br />
field sizes and locations in Europe. A<br />
previous study by Sanvido et al. 5 looking at<br />
separation distances in Switzerland came<br />
to similar conclusions. Also, the levels of<br />
cross-fertilization recorded in our database<br />
correspond to individual data points in<br />
receptor fields at several distances. Because<br />
most of the field points sampled were located<br />
at short distances from the donor field, cross-fertilization<br />
rates at these distances were<br />
likely to be higher than cross-fertilization<br />
rates computed for an entire field harvested.<br />
In an agricultural context, harvest always<br />
represents a mixture of different harvested<br />
areas. The actual GM content in the harvest<br />
is thereby often substantially reduced<br />
because zones with higher cross-fertilization<br />
rates at the field margin are mixed with<br />
zones with lower GM content further within<br />
the receptor field. Studies performed in real<br />
agricultural landscapes with commercial<br />
cultivation of GM and non-GM maize point<br />
to distances over 20 m as being sufficient to<br />
keep cross-fertilization below a threshold<br />
level of 0.9% 11,12 .<br />
In practice, large mandatory distances<br />
restrict farmers’ freedom of choice to grow<br />
GM maize in certain agricultural landscapes<br />
(especially in those with substantial presence<br />
of maize cultivation in small and scattered<br />
fields). This imposes important opportunity<br />
costs on farmers, reducing the potential net<br />
gains in farmers’ gross margins derived from<br />
Bt maize cultivation 13 .<br />
In conclusion, we have shown that a<br />
separation distance of 40 m is sufficient to<br />
reduce admixture in maize cultivation below<br />
the legal threshold of 0.9%. However, this<br />
is not an endorsement of using separation<br />
distances as the single tool to regulate co-existence<br />
in maize production. Numerous<br />
recent studies have pointed to the need for<br />
flexibility in co-existence measures 4,14,15 .<br />
Pollen barriers consisting of non-GM<br />
maize, for example, have proven to reduce<br />
cross-fertilization rates more effectively<br />
than an isolation of the same distance with<br />
open ground or low-growing crops. With<br />
a maize barrier of 10–20 m, the remaining<br />
maize harvest in the field rarely exceeds the<br />
threshold of 0.9% GM material 11 . Buffer<br />
zones, discard zones and other measures<br />
could therefore be combined with, or substituted for,<br />
large, fixed separation distances in search of<br />
a system that increases the real options for<br />
farmers to cultivate their crop of choice 1 .<br />
Note: Supplementary information is available on the<br />
<strong>Nature</strong> Biotechnology website.<br />
Disclaimer<br />
The views expressed are purely those of the authors<br />
and may not in any circumstances be regarded<br />
as stating an official position of the European<br />
Commission.<br />
ACKNOWLEDGMENTS<br />
The authors thank M. Czarnak-Klos for help in<br />
the interpretation of the data sets of maize cross-fertilization<br />
trials that constitute the database of this<br />
analysis and J. Delincé for his useful comments on<br />
statistical simulation. The authors wish to express<br />
thanks to G. Squire, as coordinator of the gene flow<br />
and ecological field studies of the SIGMEA project,<br />
for providing SIGMEA data sets on maize cross-fertilization<br />
trials. Within the SIGMEA partners, many<br />
thanks are extended to R. Wilhelm for providing data<br />
under German agricultural conditions, A. Vogler for<br />
Swiss data and J. Messeguer for data from Spain.<br />
COMPETING FINANCIAL INTERESTS<br />
The authors declare no competing financial interests.<br />
Laura Riesgo 1 , Francisco J Areal 1 ,<br />
Olivier Sanvido 2 & Emilio Rodríguez-Cerezo 1<br />
1 European Commission, Joint Research Centre<br />
(JRC), Institute for Prospective Technological<br />
Studies (IPTS), Edificio Expo, Avda. Inca<br />
Garcilaso, Seville, Spain. 2 Agroscope Reckenholz<br />
Tänikon Research Station ART., Zurich,<br />
Switzerland.<br />
e-mail: laura.riesgo@ec.europa.eu<br />
1. Devos, Y., Demont, M. & Sanvido, O. Nat. Biotechnol.<br />
26, 1223–1225 (2008).<br />
2. Moschini, G. Eur. Rev. Agric. Econ. 35, 331–355<br />
(2008).<br />
3. Ryffel, G.U. Nat. Biotechnol. 28, 318 (2010).<br />
4. Devos, Y. et al. Agron. Sustain. Dev. 29, 11–30<br />
(2009).<br />
5. Sanvido, O. et al. Transgenic Res. 17, 317–335<br />
(2008).<br />
6. European Commission. Commission Staff Working<br />
Document: Report from the Commission to the Council<br />
and the European Parliament on the Coexistence of<br />
Genetically Modified Crops with Conventional and<br />
Organic Farming. Implementation of National Measures<br />
on the Coexistence of GM crops with Conventional<br />
and Organic Farming. (Commission of the European<br />
Communities, Brussels, 2009). <br />
7. Pla, M. et al. Transgenic Res. 15, 219–228 (2006).<br />
8. Goggi, A.S. et al. Field Crops Res. 99, 147–157<br />
(2006).<br />
9. Vogler, A., Eisenbeiss, H., Aulinger-Leipner, I. &<br />
Stamp, P. Eur. J. Agron. 31, 99–102 (2009).<br />
10. Demeke, T., Perry, D.J. & Scowcroft, W.R. Can. J. Plant<br />
Sci. 86, 1–23 (2006).<br />
11. Messeguer, J. et al. Plant Biotechnol. J. 4, 633–645<br />
(2006).<br />
12. Gustafson, D.I. et al. Crop Sci. 46, 2133–2140<br />
(2006).<br />
13. Gómez-Barbero, M., Berbel, J. & Rodríguez-Cerezo, E.<br />
Nat. Biotechnol. 26, 384–386 (2008).<br />
14. Demont, M. & Devos, Y. Trends Biotechnol. 26, 353–<br />
358 (2008).<br />
15. Messéan, A. et al. Oleagineux 16, 37–51 (2009).<br />
case study<br />
commentary<br />
India’s billion dollar biotech<br />
Justin Chakma, Hassan Masum, Kumar Perampaladas, Jennifer Heys & Peter A Singer<br />
By focusing on an unmet medical need, providing a cost-efficient solution and reinvesting the resulting revenues into<br />
R&D and state-of-the-art manufacturing, Shantha Biotechnics was able to build one of India’s first biotech successes.<br />
Shantha Biotechnics, an Indian biotech firm started by K. I. Varaprasad<br />
Reddy with $1.2 million of angel funds, was acquired last year by<br />
Sanofi-Aventis of Paris for €571 million. Since developing a copy of the<br />
hepatitis B surface antigen subunit vaccine—one of the first recombinant<br />
products to be ‘home grown’ in India—Shantha has been on a tear,<br />
bringing 11 products to market. Much of the company’s success can be<br />
attributed to the vision of its management, which brought its first product<br />
to market in only four years, reinvested revenues into internal R&D and<br />
built a state-of-the-art manufacturing capability. This not only enhanced<br />
the company’s ability to address local health needs, but also built its global<br />
reputation—all of which has subsequently proved good business. 1<br />
After attending a conference in 1992, Varaprasad, an electrical engineer<br />
by training, recognized the urgent need for an inexpensive Indian<br />
hepatitis B vaccine; over 100,000 Indians die every year from the viral<br />
infection, with 4% of the population carriers. Prices were as high as $23<br />
a dose with primary suppliers being Merck and SmithKlineBeecham<br />
(now part of GlaxoSmithKline). With most Indian families living on<br />
$1 a day, with multiple children and three doses required per child,<br />
vaccination was simply unaffordable. Varaprasad saw the possibility of<br />
a local venture that could supply an affordable version.<br />
After recruiting local talent and two expatriate scientists in 1993 (see<br />
Supplementary Tables), the company took only four years to develop<br />
and register Shanvac-B, a version of the vaccine produced in Pichia pastoris.<br />
Shanvac-B was launched at $1 a dose and was an immediate success.<br />
Indian consumption of hepatitis B vaccine rose from a few hundred<br />
thousand doses in the early 1990s to tens of millions today with prices<br />
dropping as low as $0.25.<br />
Rapid uptake of the vaccine was partly helped by a confidential partnership<br />
with a large pharmaceutical multinational, which provided<br />
manufacturing/regulatory acumen and also resold the vaccine. Shantha<br />
followed Shanvac-B with Shanferon (interferon alpha 2b), which it also<br />
produced in P. pastoris. The company’s development of a purification<br />
process compliant with International Conference on Harmonization<br />
regulations led it to become the first Indian company to have a hepatitis<br />
B vaccine prequalified by the World Health Organization (WHO;<br />
Geneva). The initial investment in quality control helped accelerate<br />
approval for its other products.<br />
The company’s growing reputation for manufacturing excellence and<br />
regulatory expertise in recombinant vaccines also helped to secure business<br />
from entities in other developing countries, such as the International<br />
Vaccine Institute (IVI; South Korea) for low-cost oral cholera vaccine,<br />
and the Pediatric Dengue Vaccine Initiative (South Korea).<br />
This success led to international attention in 2006 when Mérieux<br />
Alliance (Paris, France) acquired a 60% stake in Shantha after its Omani<br />
investors sought an exit. The acquisition further bolstered Shantha’s<br />
reputation internationally as well as opening new markets. In 2009, the<br />
firm was awarded a $340 million United Nations International Children’s<br />
Emergency Fund (UNICEF) contract for pentavalent vaccines for<br />
2010–2012. Soon after, rumors emerged that multinationals were interested<br />
in bidding on Shantha, ultimately culminating in the takeover by<br />
Sanofi-Aventis the same year.<br />
The authors are at the McLaughlin-Rotman Centre for Global Health,<br />
University Health Network and University of Toronto, Toronto,<br />
ON, Canada.<br />
e-mail: peter.singer@mrcglobal.org<br />
The case of Shantha shows developing world biotech innovators can<br />
maintain a balance between local health impact and financial returns by<br />
keeping four principles in mind. First, identify therapeutic areas where<br />
cost efficiencies can be achieved locally and combine this with strong<br />
leadership skills. Varaprasad leveraged India’s homegrown scientists,<br />
lower labor costs, process innovation and a low-margins business strategy<br />
to exploit this opportunity.<br />
Second, seek investments/partnerships from non-traditional and<br />
international sources. Shantha embraced collaborations with research<br />
institutes such as the US National Institutes of Health (Bethesda, MD),<br />
and with competing multinationals for regulatory guidance.<br />
Third, focus on innovation and reinvestment. By plowing back significant<br />
profits toward R&D, Shantha has recently released new products<br />
every year or two. This initial focus on process and quality innovation<br />
may have delayed Shanvac-B’s launch, but it allowed Shantha to become<br />
the first WHO-prequalified Indian firm for hepatitis B vaccine, and<br />
opened the door to large international contracts, including contract<br />
research. However, experience with Shanferon suggested that India’s<br />
regulatory environment had challenges in conducting complex clinical<br />
trials. Other innovators in developing countries should not insist upon<br />
home-grown manufacturing or clinical trials if it entails compromise on<br />
quality for the sake of patriotism.<br />
Finally, Shantha shows integrated business models are viable in<br />
developing countries. Pre-acquisition, Shantha would not invest in<br />
any products for which it did not have internal capacity to execute<br />
on a significant part of the project. This contrasts with the developed<br />
world, where it is becoming increasingly popular to develop a<br />
‘virtual’ business model, whereby clinical trials and even early stage<br />
work are outsourced to contract research organizations. Shantha shows<br />
the virtual model may not make sense for an innovative biotech in<br />
a developing country because the risks of low quality and delays<br />
in outsourcing are too great. By maintaining internal development<br />
capabilities, Shantha and other developing country firms can also<br />
capitalize on earnings generated by contract research work for other<br />
companies.<br />
By combining cost-efficiency with focused R&D, biotech firms like<br />
Shantha are creating a new source of innovation for global health.<br />
Funding<br />
This work was funded by a grant from the Bill & Melinda Gates<br />
Foundation through the Grand Challenges in Global Health Initiative.<br />
Note: Supplementary information is available on the <strong>Nature</strong> Biotechnology website.<br />
Competing Financial Interests<br />
The authors declare competing financial interests: details accompany the full-text<br />
HTML version of the paper at http://www.nature.com/naturebiotechnology/.<br />
1. Prahalad, CK. The Fortune at the Bottom of the Pyramid: Eradicating Poverty through<br />
Profits. (Wharton School Publishing, Philadelphia; 2004).<br />
commentary<br />
DNA patents and diagnostics: not a<br />
pretty picture<br />
Julia Carbone, E Richard Gold, Bhaven Sampat, Subhashini Chandrasekharan, Lori Knowles, Misha Angrist &<br />
Robert Cook-Deegan<br />
Restrictive licensing practices on DNA patents are stymieing clinical access and research on genetic diagnostic testing.<br />
Diagnostic companies, university tech transfer offices and their respective associations need to pay more attention.<br />
Three decades after the US Supreme Court first<br />
held that an artificially created bacterium<br />
had the potential to be patented in the United<br />
States 1 , biotech patents continue to generate<br />
controversy—particularly human gene patents<br />
used in diagnostic testing. The persistence of the<br />
debate can be attributed to particular business<br />
models for genetic testing and university licensing<br />
that, despite public pronouncements to the<br />
contrary, have failed to acknowledge and appropriately<br />
address the real social and economic<br />
concerns raised by clinical geneticists, health<br />
care professionals, patient groups, politicians<br />
and academics. Their failure has led both policymakers<br />
and the courts to express increasing<br />
concern about broad patent rights over human<br />
genes that affect diagnostic testing.<br />
Julia Carbone is at Duke University’s School of<br />
Law, Durham, North Carolina, USA; E. Richard<br />
Gold is at McGill University’s Faculty of Law<br />
and Faculty of Medicine, Montreal, Québec,<br />
Canada; Bhaven Sampat is at Columbia<br />
University’s Department of Health Policy and<br />
Management, New York, NY, USA; Subhashini<br />
Chandrasekharan is at Duke University’s Center<br />
for Genome Ethics, Law & Policy, Institute for<br />
Genome Sciences and Policy, Durham, NC, USA;<br />
Lori Knowles is at the University of Alberta’s<br />
Health Law Institute, Edmonton, Alberta,<br />
Canada; Misha Angrist is at Duke University’s<br />
Institute for Genome Sciences & Policy, Durham,<br />
NC, USA; and Robert Cook-Deegan is at Duke<br />
University’s Center for Genome Ethics, Law &<br />
Policy, Institute for Genome Sciences and Policy,<br />
Durham, NC, USA.<br />
e-mail (Robert Cook-Deegan): bob.cd@duke.edu<br />
Myriad Genetics has been the poster child for controversial DNA patent licensing.<br />
The most recent flare-up in the ongoing<br />
DNA patent and genetic testing debate is<br />
the decision of the US District Court for the<br />
Southern District of New York in Association<br />
for Molecular Pathology et al. v. United States<br />
Patent and Trademark Office et al. 2 . On 29<br />
March, US Federal District Court Judge<br />
Robert Sweet ruled that isolated DNA is not<br />
patentable in the United States, and also that<br />
Myriad Genetics’ (Salt Lake City, UT, USA)<br />
method claims relevant to testing for BRCA1<br />
and BRCA2 genes are invalid. Essentially,<br />
the District Court held that neither isolated<br />
DNA nor cDNA is sufficiently different from<br />
DNA as it occurs within host cells to be considered<br />
an invention. As for the diagnostic<br />
tests, the court held that they simply involved<br />
drawing a mental correlation between facts,<br />
something that does not fall within the scope<br />
of what is patentable.<br />
A week earlier, the US Court of Appeals<br />
for the Federal Circuit held in Ariad<br />
Pharmaceuticals, Inc. et al. v. Eli Lilly and<br />
Company 3 that a researcher must do more than<br />
identify that a class of compounds has a certain<br />
effect: he or she must actually describe what<br />
those compounds are. This effectively eliminated<br />
the award of patents over basic research,<br />
requiring, instead, that the inventor “actually<br />
perform the difficult work of ‘invention’—that<br />
is, conceive of the complete and final invention<br />
with all its claimed limitations—and disclose<br />
the fruits of that effort to the public.”<br />
One month before that, on 10 February, the<br />
Secretary’s Advisory Committee on Genetics,<br />
Health and Society (SACGHS; Bethesda,<br />
MD, USA) at the US Department of Health<br />
and Human Services 4 , after a careful study<br />
of current knowledge on the effects of patenting<br />
genes on research and accessibility to<br />
genetic tests, found that there is no convincing<br />
evidence that patents either facilitate or<br />
accelerate the development and accessibility<br />
of such tests. What’s more, the committee<br />
found that there was some, albeit limited,<br />
evidence that patents had a negative effect<br />
on clinical research and on the accessibility<br />
of genetic tests to patients. In addition,<br />
most gene patents relevant to diagnostics are<br />
held by universities on the basis of research<br />
funded by public money. In this context, the<br />
committee recommended that universities<br />
be more cautious in patenting and licensing<br />
human genes, that there be more transparency<br />
and accountability for university licensing<br />
practices and that an existing exception<br />
protecting medical practitioners from patent<br />
infringement when they undertake surgery or<br />
treat a patient’s body be extended to include<br />
the provision of genetic diagnostic testing.<br />
What all three developments have in common<br />
is that they reflect growing disenchantment<br />
with the patenting and licensing practices<br />
of universities and industry. These concerns<br />
have existed for over a decade without resolution<br />
5,6 . The maturity of microarray technology<br />
that allows for multi-allele genotyping and<br />
now the prospect of full-genome sequencing<br />
deepen these concerns 7 . A legacy of exclusively<br />
licensed gene patents casts a shadow of patent<br />
infringement liability over the future of multi-allele<br />
testing and full-genome analysis.<br />
In an attempt to better understand why concerns<br />
about DNA patenting persist and what role<br />
universities play as patentees and often exclusive<br />
licensors, this article outlines university technology<br />
transfer practices and business models that<br />
have given rise to the concerns. After outlining<br />
these practices, we review past efforts<br />
to address the concerns. We then lay out<br />
the obstacles to addressing them going<br />
forward, including a lack of recognition that<br />
diagnostics is a highly unusual market and<br />
that the problem is not so much a legal question,<br />
or necessarily about what gets patented, as<br />
about how patents are licensed and enforced by both<br />
universities and industry. The ability to change<br />
these restrictive licensing practices will, in turn,<br />
depend on several factors: first, a sharper definition<br />
of what constitutes research that needs<br />
to be protected in licensing provisions; second,<br />
more coherent university policies that promote<br />
broad dissemination, along with incentives for<br />
industry compliance with best practices; third,<br />
greater recognition of problems and the proposal<br />
of constructive solutions by key players;<br />
fourth, transparent reporting of DNA patents<br />
and diagnostic testing license agreements; and<br />
fifth, secure funding for technology transfer<br />
offices. Although legislative change may ultimately<br />
be necessary to facilitate these changes<br />
in practice, many problems can be addressed<br />
without statutory change.<br />
A legacy of short-sighted tech transfer<br />
and business practices<br />
Currently, universities frequently file patents<br />
on early-stage inventions 9 , and license patents<br />
exclusively half the time 10–13 . A study by<br />
Mowery et al. 10 notes the following: “A relatively<br />
high fraction of all inventions that are<br />
licensed—as high as 90% for UC [University<br />
of California] licenses and no less than 58.8%<br />
for Stanford licenses of ‘all technologies’ during<br />
this period—is licensed on a relatively exclusive<br />
basis, and these shares are similar for biomedical<br />
inventions.” Many of those licenses will endure<br />
for many years, including licenses on university<br />
patents relevant to DNA diagnostics.<br />
Universities and academic medical centers<br />
that provide diagnostic testing services face<br />
private genetic testing companies that enforce<br />
patents against university genetic testing services<br />
and national reference laboratories 5 —in<br />
contrast to the situation for therapeutics, where<br />
universities are often the plaintiffs. The story<br />
often begins with publicly funded academic<br />
or nonprofit research that is either patented<br />
and licensed exclusively to a private company<br />
or forms the basis for a spin-off company that<br />
attracts further investment and develops an<br />
invention that is patented. Whether exclusive<br />
licensees or spin-offs, these companies then<br />
develop genetic testing services based on a<br />
business model that relies not only on patenting<br />
sequences and mutations—not objectionable<br />
in itself—but also on preventing other<br />
institutions, including universities, from offering<br />
those genetic tests.<br />
Myriad’s patents over BRCA1,<br />
BRCA2 and methods for diagnostic testing<br />
14 , and Athena Diagnostics’ exclusive<br />
licenses for clinical testing from Duke<br />
University (Durham, NC, USA) over three<br />
method patents related to diagnostic testing<br />
for Alzheimer’s disease 15,16 , exemplify these<br />
practices and business models.<br />
Furthermore, testing for other neurological and<br />
metabolic conditions, as well as other entities’<br />
screening for Canavan disease, hemochromatosis and<br />
other single-gene conditions, has also generated<br />
fierce debate. In the case of Canavan testing,<br />
litigation resulted from licensing restrictions<br />
that inhibited freedom of action among those<br />
seeking to get genetic tests.<br />
In the case of Myriad, initial research took<br />
place at the University of Utah—with public<br />
funding from the US National Institutes<br />
of Health (NIH; Bethesda, MD, USA).<br />
The researchers then spun off Myriad,<br />
which attracted investment from Eli Lilly<br />
(Indianapolis, IN, USA) and succeeded in patenting<br />
BRCA1 and a diagnostic test for breast<br />
cancer (patents that were ultimately jointly<br />
assigned to the University of Utah, Myriad and<br />
the NIH). Rather than licensing out the test to<br />
clinical geneticists and laboratories around the<br />
world, Myriad required initial testing in each<br />
family to be performed at its laboratories in Salt<br />
Lake City. In the United States, the company<br />
sent out cease-and-desist letters to laboratories—both<br />
academic and commercial—already<br />
performing tests when the patent was issued.<br />
Threatened patent enforcement resulted in<br />
a backlash around the world from public laboratories,<br />
clinicians, molecular geneticists and<br />
some patient groups—against both the patenting<br />
of human genes and what they viewed<br />
as Myriad’s strong-arm tactics. These groups<br />
feared that by closing down public laboratories,<br />
Myriad would thwart research identifying<br />
weaknesses in Myriad’s test or distinguishing<br />
the effects of different mutations in the genes<br />
on disease severity or progression, and prevent<br />
the integration of breast and ovarian cancer<br />
genetic tests into genetic health services.<br />
Although some of these fears were clearly<br />
exaggerated, Myriad’s aggressive initial patent<br />
enforcement affected practice in the clinical<br />
genetics community and stirred long-standing<br />
resentment. Furthermore, in countries with<br />
public health care systems, health administrators<br />
objected to Myriad’s business model<br />
because it removed their ability to deploy<br />
genetic tests to their citizens in the manner<br />
that they viewed as most efficient 14 .<br />
Myriad always permitted what it considered<br />
to be basic research on BRCA1 and BRCA2, and<br />
also engaged in research collaborations. In fact,<br />
until 2004—after which Myriad ceased to do so<br />
for unknown reasons—the company contributed<br />
data to public databases. To illustrate Myriad’s<br />
openness to others performing basic research<br />
using BRCA1 and BRCA2, the company’s president,<br />
Greg Critchfield, has identified 7,000<br />
papers published by independent authors that<br />
mention BRCA1 or BRCA2 (http://docs.justia.<br />
com/cases/federal/district-courts/new-york/<br />
nysdce/1:2009cv04515/345544/158/0.pdf).<br />
This indicates that, with the exception of clinical<br />
testing at the University of Pennsylvania<br />
in 1998, Myriad did not pursue those who<br />
conducted research. Myriad also defined the<br />
University of Pennsylvania’s testing as ‘commercial’,<br />
as later defined under the terms of a 1999<br />
Memorandum of Understanding with the US<br />
National Cancer Institute (NCI; Bethesda, MD,<br />
USA). Myriad has been successful in arranging<br />
for payment agreements with insurers and<br />
other payers. However, as a result of Myriad’s<br />
enforcement actions coupled with broad patent<br />
claims, its fairly narrow conception of what<br />
constituted acceptable research and its failure<br />
to clearly state that it would not pursue those<br />
conducting such research, university and private<br />
laboratories ceased to offer the test publicly<br />
in the United States. Outside the United States,<br />
resistance to Myriad’s model—particularly<br />
from health care administrators and government<br />
departments—caused the company to<br />
lose most of its market. Furthermore, Myriad’s<br />
relationship with scientists and policymakers<br />
around the world was seriously damaged 14 .<br />
Although the biotech industry tried to portray<br />
Myriad as an outlier, a series of detailed<br />
case studies conducted by some of us (J.C.,<br />
S.C., M.A. and R.C.-D.) and others 15,18–24 at<br />
Duke University’s Center for Genome Ethics<br />
Law and Policy reveal that, in fact, Myriad’s<br />
business model is not unique. As these studies<br />
show, diagnostic companies such as Athena<br />
Diagnostics (Worcester, MA) and PGxHealth<br />
(New Haven, CT) have adopted similar or even<br />
more aggressive business models and have<br />
shut out university laboratories from offering<br />
genetic testing for diseases such as long-QT<br />
syndrome and Alzheimer’s disease. In the case<br />
of Alzheimer’s disease, genes and method patents<br />
for diagnostic testing were initially patented<br />
by Duke University (and other academic<br />
institutions) and licensed exclusively to Athena<br />
Diagnostics. Athena Diagnostics then used its<br />
patents aggressively to prevent others from carrying<br />
out the test.<br />
These case studies strongly suggest both that<br />
universities are often not managing research<br />
and patents in a way that promotes dissemination<br />
and that companies deploy their patents or<br />
exclusive licenses to remove genetic testing laboratories<br />
at academic health centers and low-margin<br />
national reference laboratories from<br />
the market. This is demonstrably a viable business<br />
model, or at least it has proven to be until<br />
recently—but is it good national policy, and<br />
does it add value to the national health system?<br />
As clinicians and laboratory directors react to<br />
cease-and-desist letters by withdrawing from<br />
those activities, clinical research and genetic<br />
testing are impeded. GeneDx (Gaithersburg,<br />
MD) and university laboratories ceased testing<br />
for the life-threatening long-QT syndrome<br />
after patent enforcement in 2002, for example,<br />
but no commercial test entered the market<br />
until 2004 (ref. 9); neither the University of<br />
Utah (which held the patents) nor the NIH<br />
(which could have been petitioned to march<br />
in, given that ‘health and safety’ needs were<br />
not being met) took action. Certain tests may<br />
not be offered if the patent holder or exclusive<br />
licensee does not provide them; second-opinion<br />
and verification testing may be unavailable;<br />
and tests are costly to public and private payers,<br />
sometimes prohibitively so for those lacking<br />
insurance 25,26 . Although negative effects on<br />
price and access to genetic testing are not uniform,<br />
consistent or pervasive, one cannot read<br />
the case studies as a whole without realizing<br />
there are real problems—and also that there are<br />
relatively easy solutions modeled on nonexclusive<br />
licensing, as used for Huntington’s disease<br />
and cystic fibrosis testing. Gene patents over<br />
diagnostics are not just like all other patents,<br />
and the diagnostic market is not just like markets<br />
for therapeutics and instruments. Holders<br />
of gene patents need to take care in licensing<br />
them for diagnostic use.<br />
Hurdles to resolution of concerns<br />
The past decade saw a plethora of policy<br />
reports about DNA patents, such as those from<br />
the Nuffield Council on Bioethics 17 , the US<br />
National Academy of Sciences 27 , the Ontario<br />
Ministry of Health 28 and the Australian Law<br />
Reform Commission 29 . Academic articles<br />
examined the concerns, the extent to which<br />
concerns were founded and the roles of<br />
industry, universities and legislative reform<br />
in addressing these concerns 5,6,26,30–38 . Some<br />
countries also made statutory changes to their<br />
patent and health laws. France expanded compulsory<br />
licensing laws 39 , and Belgium did the<br />
same, also carving out a diagnostic-use exemption<br />
from patent-infringement liability 40 . The<br />
US Patent and Trademark Office (USPTO;<br />
Washington, DC) developed guidelines on<br />
‘utility’ and ‘written description’ specifically<br />
for examining gene patent applications 41 .<br />
Recognizing that many of the concerns<br />
could be addressed through better licensing<br />
practices, many institutions also developed<br />
licensing guidelines, some aimed at universities<br />
and others at industry. These include<br />
the NIH’s Best Practices for the Licensing of<br />
Genomic Inventions 42 , the Organisation for<br />
Economic Cooperation and Development’s<br />
(OECD; Paris) Guidelines for Licensing of<br />
Genetic Inventions 43 and In the Public Interest:<br />
Nine Points to Consider in Licensing University<br />
Technology 44 , a document crafted by 12 institutions<br />
and subsequently endorsed by the Board<br />
of Trustees of the Association of University<br />
Technology Managers (AUTM; Deerfield, IL,<br />
USA). Since then, ~50 other institutions and<br />
organizations have also endorsed the guidelines.<br />
In November 2009, as part of AUTM’s<br />
Global Health Initiative to promote licensing<br />
practices that facilitate access to essential<br />
medicines in developing countries, AUTM<br />
also endorsed a document entitled University<br />
Principles on Global Access to Medicines 45 .<br />
Most recently, the SACGHS recommended<br />
the implementation of an exception to patent-infringement<br />
liability for research use and<br />
diagnostic testing 4 . All of these reports and recommendations<br />
focus on broad dissemination<br />
through nonexclusive licensing of gene-based<br />
inventions, particularly for publicly funded<br />
research. They reserve exclusive licensing<br />
for situations in which it is needed to induce<br />
investment in private-sector development to<br />
bring a product or service to fruition—which,<br />
as will later be discussed, is rarely the case for<br />
genetic diagnostics.<br />
Despite the plethora of policy reports,<br />
academic articles, guidelines and legislative<br />
changes, concerns about DNA patents persist.<br />
We must therefore turn our attention to factors<br />
that impede changing the system.<br />
A question of law or of practice. The first<br />
response to concerns is often a call to change<br />
patent law 39,46,47 . As recent research indicates,<br />
however, the central problem does not lie with<br />
patents over human genes themselves so long as<br />
the law incorporates the appropriate checks and<br />
balances. The recent suit challenging Myriad’s<br />
patents on BRCA genes notwithstanding 2 , the<br />
following discussion indicates that there is little<br />
evidence on which to conclude that limiting<br />
the ability to patent genes is the only way to<br />
solve the problems in the system.<br />
A recent study by Huys et al. 48 from Belgium<br />
suggests that relatively few claims in gene patents<br />
block competing laboratories from providing<br />
genetic tests. The study examined 145 active patent<br />
documents (267 independent claims) related to<br />
genetic diagnostic testing of 22 inherited diseases<br />
(including method claims, gene claims,<br />
oligo claims and kit claims) issued by the European<br />
Patent Office (Munich, Germany) and the<br />
USPTO. It concluded that clinicians<br />
could easily get around 36% of claims and<br />
could, with work, circumvent another 49% of<br />
claims. Only 15% of claims would be difficult<br />
or impossible to circumvent. Of the gene claims<br />
studied, only 3% were found to be blocking.<br />
However, as discussed below, blocking claims<br />
were more prevalent among method claims.<br />
In addition to evidence that gene patents<br />
covering diagnostics do not necessarily impede<br />
research, there is very little evidence of patent<br />
litigation in the field. A recent study 8 on<br />
trends in human gene patent litigation notes<br />
that there is rarely any litigation over diagnostic<br />
tests arising from gene patents. This study<br />
identified only 31 examples of litigation over<br />
human genes in the United States from 1987 to<br />
2008. Although the low frequency of litigation<br />
could hypothetically support the conclusion<br />
that patents successfully exclude others (that<br />
is, threatened patent enforcement stops potentially<br />
infringing activities), an examination of<br />
patent claims suggests that most patents over<br />
human genes and related diagnostic tests find<br />
themselves in a relatively weak legal position.<br />
This weak legal position is further reinforced<br />
by the dissent in Laboratory Corp. of America<br />
Holdings v. Metabolite Laboratories, Inc. 49 ,<br />
which concluded that a natural correlation<br />
between two substances in the body was an<br />
unpatentable product of nature (the majority<br />
decided not to address the issue); by the United<br />
States District Court decision in Association for<br />
Molecular Pathology et al. v. the United States<br />
Patent and Trademark Office et al.; and by the<br />
general trajectory of recent decisions on assessing<br />
damages, the lack of automatic injunctive<br />
relief (eBay Inc v. MercExchange, L.L.C. 50 ), as<br />
well as by the increasing ambit for finding an<br />
invention to be obvious under patent law. The<br />
recent US Supreme Court decision In re Bilski 51<br />
only exacerbates the uncertainty over method<br />
claims on DNA diagnostics. In fact, an eventual<br />
appeal from the District Court decision<br />
in Association for Molecular Pathology et al. v.<br />
the United States Patent and Trademark Office<br />
et al. may be required to determine whether<br />
these types of claims are valid.<br />
Adding to the trend in legal thinking is the<br />
Federal Circuit’s decision in Ariad, relating to<br />
claims based on DNA patents, where the court<br />
writes: “Much university research relates to<br />
basic research, including research into scientific<br />
principles and mechanisms of action…,<br />
and universities may not have the resources<br />
or inclination to work out the practical implications<br />
of all such research [i.e., finding and<br />
identifying compounds able to affect the mechanism<br />
discovered]. That is no failure of the law’s<br />
interpretation, but its intention. Patents are not<br />
awarded for academic theories, no matter how<br />
groundbreaking or necessary to the later patentable<br />
inventions of others.”<br />
“That research hypotheses do not qualify for<br />
patent protection possibly results in some loss<br />
of incentive, although Ariad presents no evidence<br />
of any discernable impact on the pace of<br />
innovation or the number of patents obtained<br />
by universities. But claims to research plans<br />
also impose costs on downstream research,<br />
discouraging later invention.” Taken together,<br />
these studies and cases indicate that gene patents<br />
per se have closed off far less of the research<br />
landscape than is often supposed, and where<br />
expansive claims have been granted, many are<br />
vulnerable to challenge.<br />
Method claims in patents related to diagnostic<br />
testing, however, bear special mention.<br />
Although many pharmaceutical patents claim<br />
products as chemical entities, universities and<br />
biotech firms also tend to patent ways of using<br />
knowledge, including method patents that<br />
affect genetic tests. In fact, Huys et al. 48 conclude<br />
that 30% of method claims relating to<br />
genetic testing are difficult, if not impossible,<br />
to circumvent. Such claims tend to be broad,<br />
often to the point of vagueness, and many cover<br />
all conceivable ways to conduct genetic tests on<br />
a gene or for a clinical condition. In the 15 of<br />
22 conditions that Huys et al. 48 found had at<br />
least one blocking claim, most such claims were<br />
to methods. In the diagnostic realm, blocking<br />
patents thus appear to be common, present in<br />
68% of the clinical conditions studied. Changes<br />
in jurisprudence could reduce the number of<br />
truly blocking patents in genetic diagnostics.<br />
Recent and pending court decisions suggest<br />
that some fraction of broad claims in US<br />
patents on DNA sequences and methods pertinent<br />
to genetic diagnostics would be judged<br />
invalid if challenged. Although dealing with<br />
a patent claim in the information technology<br />
field, the recent US Court of Appeals for the<br />
Federal Circuit decision in In re Bilski narrowed<br />
criteria for patents on methods to inventions<br />
that entail a transformative step or involvement<br />
of a particular machine. Depending<br />
on how the Federal Circuit deals with the US<br />
Supreme Court decision in Bilski, perhaps in an appeal<br />
in the Myriad case, it could signal that broad<br />
method claims in DNA diagnostics might be<br />
held invalid because the link between a mutation<br />
and a probability of contracting a disease<br />
may be considered unpatentable. As it stands,<br />
many broad method claims pertinent to DNA<br />
diagnostics suffer under a cloud of uncertainty<br />
and may turn out to be invalid, thus dramatically<br />
increasing freedom to operate without fear of<br />
patent-infringement liability. Other recent US<br />
court decisions have moved in the same direction,<br />
increasing the stringency of criteria for<br />
nonobviousness 52,53 and written description 3 .<br />
Taken as a group, these decisions suggest<br />
that some of the potential obstacles to innovation<br />
that patents cause in diagnostics may not<br />
be as high, nor the amount of intellectual territory<br />
enclosed and enforced as expansive, as<br />
some had feared. A clear research exemption,<br />
a simplified method for challenging patents<br />
(for example, opposition proceedings or inter<br />
partes re-examination requests) and improved<br />
examination procedures to avoid overly broad<br />
patent claims could help quell concerns over<br />
blocked research and overly broad patents 54 .<br />
Overall, the problem does not lie wholly in<br />
patent law but rather concerns how decisions<br />
are made about what is patented (methods<br />
versus products) and how patents are managed<br />
and used. With one or a few successful<br />
challenges to broad patents enforced for<br />
diagnostic purposes, the business models of<br />
enforcing monopolies on genetic testing for<br />
specific conditions would probably give way<br />
to more cross-licensing, more competition and<br />
faster innovation in testing methods.<br />
A need for changes in patent licensing practices<br />
at universities. As patent law evolves,<br />
it is increasingly apparent that the exclusive<br />
licensing strategies of universities and the<br />
business models of a few companies doing<br />
DNA diagnostics are as much, or even more,<br />
of an impediment to DNA diagnostics as any<br />
problems with the law. Meanwhile, no evidence<br />
suggests that exclusive licensing is as<br />
important in the field of diagnostic testing as<br />
in therapeutics in creating products that would<br />
not otherwise exist. The exclusive licenses over<br />
erythropoietin, growth hormone, interferon<br />
and other therapeutic proteins are of commercial<br />
significance, as illustrated by the fact<br />
that eleven legal cases that presume the validity<br />
of gene patents have been decided by the<br />
US Court of Appeals for the Federal Circuit 8 .<br />
The same cannot be said for diagnostic testing:<br />
no exclusive license in this field has been<br />
deemed important enough for anyone<br />
to take to court. In fact, most cases involving<br />
diagnostic testing are settled after initial notification<br />
letters or cease-and-desist letters are sent<br />
out. A handful have led to litigation, but settled<br />
early. The Federal District Court’s ruling of 29<br />
March in Association for Molecular Pathology<br />
et al. v. the United States Patent and Trademark<br />
Office is the first diagnostic case to go before a<br />
judge for a decision. Furthermore, barriers to<br />
entering the market with a new genetic test, at<br />
least for the first-generation genetic tests that<br />
search for mutations in one or a few genes, are<br />
far lower than for therapeutics. This is because<br />
for universities and national reference laboratories<br />
that already offer other genetic tests, the<br />
cost of ‘setting up’ a new genetic test based on<br />
data in scientific publications is comparable to<br />
the cost of patenting the underlying inventions<br />
since they are already laboratories approved by<br />
US regulators.<br />
Supporting this proposition is the fact that<br />
exclusive licensing does not appear to have<br />
been necessary to get a test to market in any of<br />
the cases 15,18–24 studied for SACGHS. In the<br />
study of 10 clinical conditions considered by<br />
SACGHS, three cases either did not involve patent<br />
rights (i.e., there were no patents or patents<br />
were not licensed or enforced) or involved patents<br />
that were nonexclusively licensed to multiple providers.<br />
These were cystic fibrosis, hereditary colorectal<br />
cancer and Tay-Sachs disease. Such<br />
patenting and licensing practices comply with<br />
current guidelines. In six cases, however, exclusive<br />
licensing led to patent enforcement that<br />
reduced availability of genetic tests already<br />
being offered: HFE (hemochromatosis), APOE,<br />
Alzheimer’s disease and genes associated with<br />
Canavan disease, long-QT syndrome, hearing<br />
loss and spinocerebellar ataxias. Because tests<br />
were already available, exclusive licensing in<br />
these cases deviates from the norms that technology<br />
licensing offices generally claim to<br />
be following. In some cases, but not all, this<br />
led, at least transiently, to genetic testing by<br />
a single provider, and that exclusive license<br />
holder then eliminated other testing services<br />
that had beaten it to market. In all cases except<br />
hemochromatosis, exclusive licenses from universities<br />
were involved. Although the exclusive<br />
licensee may ultimately have developed a better<br />
test, in no case was the exclusive licensee the<br />
first to market. The tenth clinical condition<br />
studied by the SACGHS, hearing impairment,<br />
is subject to a hybrid of exclusive and nonexclusive<br />
licensing, and entails many genes and<br />
different means of testing. This case does have<br />
some examples of controversial patent enforcement<br />
action, but tests are generally widely<br />
available from several vendors.<br />
Patent incentives may induce investment<br />
in genetic diagnostics, but in none of the case<br />
studies did this lead to new availability of a test<br />
that was not already available, at least in part.<br />
This is in stark contrast with the role of patents<br />
in therapeutics and scientific-instrument<br />
development, where the benefits attributable<br />
to private R&D and new products are much<br />
clearer. The SACGHS case studies thus reinforce<br />
the benefits of licensing nonexclusively<br />
for genetic diagnostics, unless an unusual<br />
situation arises in which exclusivity is needed<br />
to get a product to market for the first time.<br />
The cases also highlight deviations from the<br />
NIH Best Practices 42 , OECD Guidelines 43 and<br />
the AUTM-endorsed Nine Points 44 . Exclusive<br />
licensing practices consistently reduce availability,<br />
at least as measured by the number of<br />
available laboratories offering a test, and thus<br />
reduce competition in genetic diagnostics, but<br />
with little evidence of a public benefit from services<br />
not otherwise available.<br />
Instead of recognizing this reality, some<br />
universities continue to seek broad patents<br />
regardless of subject matter and then<br />
license exclusively, enabling business models<br />
that impede competition in genetic testing.<br />
Although the real risk of being successfully<br />
sued for patent infringement in DNA diagnostics<br />
may be low, a 2003 survey 33 and recent<br />
case studies 14,15,18–24 indicate that laboratory<br />
directors change their testing practices and<br />
clinicians avoid research areas in reaction to<br />
cease-and-desist letters. Diagnostics are generally<br />
low-margin sources of revenue, and<br />
when faced with a threat of patent enforcement,<br />
most laboratories simply stop offering<br />
a genetic test, or at least no longer advertise<br />
a test’s availability publicly (in all the case<br />
studies, we learned of ‘research’ testing as an<br />
‘escape valve’ for patients who could not get<br />
or could not afford commercial genetic tests).<br />
Although part of the problem is that licenses<br />
executed over the past decade do not embody<br />
the principles of the NIH, OECD or AUTM<br />
guidelines and yet remain in force, the reality<br />
is that only a minority of universities have<br />
endorsed the consensus Nine Points 44 , with<br />
no repercussions for those who do not or those<br />
who sign and then violate the norms. Short-sighted<br />
licensing practices persist.<br />
Potential solutions<br />
Changes that could remedy problems with<br />
the current strategy of the licensing system<br />
include the following: first, a clear definition<br />
of research that should be exempt from patent-infringement<br />
liability; second, universities’<br />
leadership in promoting the alignment of tech<br />
transfer licensing practices with the universities’<br />
broader goal of dissemination; third, coupling<br />
of the latter with incentives to promote<br />
industry compliance and leadership by AUTM<br />
and the Biotechnology Industry Organization<br />
(BIO; Washington, DC) in recognizing problems<br />
and proposing constructive solutions;<br />
fourth, adequate funding for tech transfer<br />
offices to learn about and implement changing<br />
practices; and finally, greater transparency in<br />
reporting patent holdings and licensing agreement<br />
terms. A more detailed discussion of each<br />
of these follows.<br />
Defining what qualifies as research. Although<br />
most industries tolerate a broad range of<br />
research activities and most researchers<br />
ignore patents when deciding whether to do<br />
research 55 , such blithe ignorance is not an obvious<br />
option in human genetic diagnostics, where<br />
threatened enforcement is common, laboratory<br />
directors and clinicians tend to respond to<br />
threatened enforcement by ceasing the activities<br />
under threat, and workarounds in the case<br />
of method patents are not always available 48 .<br />
Norms over what research is to be tolerated<br />
are unsettled, despite the existence of research<br />
exceptions 56 in many national laws (including<br />
an exemption in the United States for research<br />
into products that may eventually lead to the<br />
filing of an application with the US Food and<br />
Drug Administration (Rockville, MD) 57 ).<br />
One prominent example of disputed norms<br />
is the controversy between Myriad and the<br />
University of Pennsylvania Genetic Diagnostic<br />
Laboratory (GDL; Philadelphia, PA). Although<br />
Myriad states that it is generally supportive of<br />
research, it nevertheless sent GDL a cease-and-desist<br />
letter because it did not consider GDL’s<br />
activities to be research. To Myriad, GDL’s<br />
provision of testing services to researchers was<br />
commercial, not a research service 14 . GDL took<br />
the position, however, that its activities, which<br />
supported others’ research, fell within the norm<br />
of tolerated research use, and much of the contested<br />
testing was part of clinical trials funded<br />
by the NCI, which is clearly clinical research.<br />
Much debate ensued, leaving many researchers<br />
with the (wrong) impression that Myriad<br />
would not tolerate any form of research.<br />
In an attempt to establish a clear norm<br />
over the question of which activities should<br />
be considered ‘research’, Myriad entered into<br />
a Memorandum of Understanding with the<br />
NCI to provide at-cost or below-cost testing<br />
to the NCI and any researcher working under<br />
an NCI-funded project. Myriad also similarly<br />
offered to provide NIH researchers with at-cost<br />
testing, given that the NIH was a co-owner<br />
of some of the relevant patents. Importantly,<br />
the agreement with the NCI defined the type<br />
of research Myriad would tolerate as being<br />
“part of the grant supported research of an<br />
Investigator, and not in performance of a technical<br />
service for the grant supported research<br />
of another (as a core facility, for example).”<br />
Furthermore, testing services had to be paid<br />
for out of grant funds and not by a patient or<br />
by insurance. Under this definition, GDL was<br />
not conducting research. This agreement was<br />
acceptable to both parties (Myriad and the<br />
NCI), and given the ‘at-cost’ provisions and the<br />
known efficiency of Myriad in testing, perhaps<br />
it is a salutary precedent. It is worth noting,<br />
however, that the NCI did not seek to delegate<br />
its government use rights under the Bayh-Dole<br />
Act (35 U.S.C. §§ 200–212) or the<br />
Stevenson-Wydler Act (15 U.S.C. § 3701), which<br />
pertain because Myriad’s patents include inventors<br />
covered by both laws.<br />
The restricted nature of the Myriad-NCI<br />
Memorandum of Understanding limits its value<br />
as a precedent. It covered only the provision<br />
of services by Myriad; it did not address the<br />
general question of which research practices a<br />
patent holder should tolerate in the diagnostics<br />
field. Some of the conflict surrounding patents<br />
and genetics laboratories could be avoided by<br />
adopting a clearer definition of ‘research’ for<br />
the purposes of incorporating licensing terms<br />
that lower the threat of patent-infringement<br />
liability. The scope of government use rights<br />
under the Bayh-Dole and Stevenson-Wydler<br />
Acts is another legal gray zone. In any case, the<br />
definition of research should not be left to the<br />
individual negotiation between one company<br />
and one NIH institute. The NIH could take on<br />
a key role in developing this norm by convening<br />
a meeting of interested parties to develop<br />
the principles by which individual actors can<br />
determine how to apply the norm.<br />
University leadership. Implementation of<br />
licensing guidelines and best practices is<br />
difficult when interests and goals are not<br />
aligned. Participants at a workshop held at<br />
Duke University in April 2009 addressed<br />
the role of universities in DNA patents and<br />
diagnostic testing and noted that those at the<br />
front line of implementing these guidelines,<br />
tech transfer offices, face many hurdles to<br />
implementation. Many university administrators<br />
view patents as a means to secure revenues<br />
(to subsequently reinvest in research)<br />
and believe that exclusive licenses generate<br />
the most revenues. Although the evidence 58<br />
is quite clear that most tech transfer offices<br />
either break even or lose money and that<br />
many of the most lucrative university patents<br />
have entailed nonexclusive licensing, this view<br />
persists. Compounding this problem, universities<br />
expect tech transfer offices to generate<br />
sufficient revenues to be sustainable. Despite<br />
usually being unrealistic, such expectations<br />
can lead these offices toward licensing strategies<br />
that promote short-term income over<br />
dissemination and broad availability.<br />
If there is to be a change of behavior, it<br />
must come from two sources: first, university<br />
administrators must align tech transfer strategy<br />
with the university mission of broad knowledge<br />
dissemination; and second, universities<br />
should provide more push-back when threatened<br />
patent enforcement gets in the way of<br />
research and impedes the university’s central<br />
mission. Regarding the first point, university<br />
presidents and senior management must take<br />
seriously the university mission to disseminate<br />
knowledge and technology. They must consider<br />
technology transfer as one component<br />
of their strategy to enable the wider world to<br />
access, enjoy and use university-generated<br />
knowledge. To achieve change, they need to<br />
change the way they fund tech transfer offices<br />
so that the latter have the freedom to explore<br />
alternatives to the way they currently license<br />
out technology. They also need to develop clear<br />
goals for dissemination and ensure that they<br />
impose measures of success for their technology<br />
licensing offices that correspond to those<br />
goals. Expecting technology licensing officers<br />
to forgo exclusive licenses when companies<br />
seek them is unrealistic unless the officers<br />
are rewarded for decisions that acknowledge<br />
the broad social benefit of avoiding patent<br />
thickets in genetic diagnostics. Recognition<br />
must also be given to the fact that these offices<br />
do not negotiate licenses in a vacuum: they<br />
negotiate largely with industry partners. If<br />
diagnostic companies are unwilling to accept<br />
nonexclusive licenses, broad research exemptions<br />
or other terms that universities propose<br />
to support research, tech transfer offices have<br />
little room to maneuver. Currently, there is no<br />
incentive—whether external or through the<br />
threatened use of government march-in rights<br />
under the Bayh-Dole Act—to curb industry<br />
behavior even when it is problematic. Tech<br />
transfer departments with limited funding,<br />
limited staff and unreasonable expectations<br />
to be sustainable cannot be expected to resist<br />
intransigence by licensees.<br />
Universities need also to take a lead in<br />
encouraging their researchers, clinicians and<br />
laboratory directors to push back when threatened<br />
with patent enforcement. University<br />
administrators need to educate themselves<br />
and their staff about the freedom to operate for<br />
purposes of research and improving diagnostic<br />
testing—that is, the scope of activities allowed<br />
that do not infringe on a valid patent. University<br />
administrators, researchers, clinicians and<br />
laboratory directors can act together by sharing<br />
cease and desist letters or other patent<br />
enforcement actions to determine whether the<br />
activities are, in fact, infringing. They can share<br />
expertise about the validity of patent claims that<br />
threaten research or clinical testing. Although<br />
individual laboratories may lack the resources<br />
to conduct these analyses, other institutions<br />
may have the requisite resources (for example,<br />
the American Society of Human Genetics, the<br />
American College of Medical Genetics, the<br />
College of American Pathologists and academic<br />
units such as the science policy research units at<br />
the University of Sussex in Brighton, UK, and<br />
the University of Leuven, Belgium).<br />
Leadership from AUTM and BIO. The<br />
development of a ‘gene patent supermarket’<br />
by Denver firm MPEG-LA is a promising<br />
step toward enabling nonexclusive licensing,<br />
increasing simplicity and consistency in licensing<br />
terms, and reducing transaction costs 59 .<br />
Unfortunately, instead of proposing such<br />
constructive solutions, BIO and AUTM have<br />
chosen not to acknowledge the real problems<br />
that exist in the unusual market for genetic<br />
diagnostics and have been quick and vociferous<br />
in their opposition to the recommendations<br />
of the SACGHS 60,61 . It is impossible to<br />
judge the full extent of the problems, but it is<br />
certainly poor policy to deny that they exist at<br />
all. Moreover, BIO and AUTM have expended<br />
time and resources opposing SACGHS recommendations<br />
while failing to enforce the<br />
established norms laid out by the NIH and the<br />
OECD, as well as the AUTM-endorsed Nine<br />
Points, among their respective constituencies.<br />
Companies and universities that violate those<br />
norms have faced no action, or even recognition<br />
that they have deviated. Indeed, there has<br />
been no public statement from either BIO or<br />
AUTM that members have been responsible for<br />
some of the problems uncovered in licensing<br />
practices for genetic diagnostics. It is reasonable<br />
to disagree with the SACGHS recommendations,<br />
but it is not reasonable to read the<br />
SACGHS report and the case studies prepared<br />
for it and conclude that the system is working<br />
well across the board. BIO and AUTM should<br />
recognize the very real problems that have been<br />
uncovered, exhort compliance with established<br />
norms and—even more importantly if such<br />
norms are to be meaningful—criticize deviations<br />
from them, rather than following the<br />
politically expedient tactic of focusing their<br />
fire on SACGHS recommendations intended<br />
to prevent these problems.<br />
The two most controversial SACGHS recommendations<br />
are, first, a proposed exemption<br />
from infringement liability for research use,<br />
and second, a similar exemption for diagnostic<br />
use. As previously noted, opposition to a research<br />
exemption puts university licensing offices<br />
at odds with their own stated principles,<br />
as licensing to ensure freedom to do research<br />
appears in every document proposing norms<br />
for licensing. Opposition to a diagnostic-use<br />
exemption is more understandable because<br />
it may be that there are unusual situations in<br />
which exclusivity is needed to get a product or<br />
service to market, and such situations simply<br />
have not been captured in the cases studied to<br />
date. Nevertheless, it is quite clear that in many<br />
if not most cases of genetic diagnostics, the<br />
main use of exclusive licenses from universities<br />
has been to reduce competition and reduce<br />
the number of laboratories offering tests, without<br />
apparent benefits of introducing tests that<br />
were not already available. Rather, tests would<br />
demonstrably have been available even without<br />
the participation of the companies involved.<br />
The SACGHS may have judged that tech<br />
transfer offices are failing to respect existing<br />
norms, and in the absence of any credible compliance<br />
measures, the simplest legal solution<br />
is to address the problem through exemption<br />
from infringement liability. If AUTM and<br />
BIO want to preserve the option of exclusive<br />
licensing when needed to get genetic tests to<br />
market, then compliance with guidelines needs<br />
to be credible. Criticizing deviations when<br />
they come to light, with the long-term goal<br />
of increasing compliance with stated norms,<br />
would go a long way toward reducing the need<br />
for a diagnostic-use exemption. Moreover,<br />
enforcing nonexclusive licensing norms can<br />
preserve revenue streams, as seen in the cystic<br />
fibrosis and Huntington’s models, whereas<br />
a diagnostic-use exemption would eliminate<br />
those revenues because the patents would be<br />
unenforceable for diagnostic uses.<br />
One could object that it is neither the function<br />
nor the responsibility of either BIO or<br />
AUTM to criticize their members. BIO is an<br />
industry lobby group that sees itself as “the<br />
champion of biotechnology and the advocate<br />
for its member organizations,” whereas AUTM<br />
is an association of individuals working in tech<br />
transfer that seeks “to support and advance academic<br />
technology transfer globally.” Developing<br />
and enforcing patenting and licensing policies<br />
fall within neither mandate. This argument is,<br />
however, disingenuous, given that both AUTM<br />
and BIO claim to be working to ensure that<br />
tech transfer serves the public good. It is just as<br />
important to reduce practices that fall short as<br />
to promote practices that achieve the goals of<br />
their respective constituencies. Both organizations<br />
have endorsed the Nine Points guidelines<br />
and actively promote technology transfer “in<br />
a manner that is beneficial to the public interest”<br />
(http://bio.org/ip/techtransfer/) while<br />
“improving quality of life, building social and<br />
economic well-being, and enhancing research<br />
programs” (http://betterworldproject.org/<br />
tech_transfer.cfm). Having voluntarily taken<br />
these positions, both organizations should be<br />
held accountable for them.<br />
Increasing transparency to permit ‘system<br />
learning’. To promote change, university-industry<br />
relationships need to be more transparent;<br />
indeed, the current opaqueness over<br />
existing university-industry interactions is<br />
a major hurdle to improving the intellectual<br />
property system for DNA diagnostics 11 . For<br />
example, license agreements between universities<br />
and start-up and private companies are<br />
unavailable, even in general terms. The only<br />
exceptions are universities or companies that<br />
voluntarily make such information public.<br />
Participants at the workshop on the role of<br />
universities in DNA patents and diagnostic testing<br />
held at Duke in April 2009 noted that most<br />
licensing information is not publicly available,<br />
even for inventions arising from public funding.<br />
In some cases, but only some, it is possible<br />
to reconstruct licensing terms from company<br />
annual reports or from press announcements.<br />
There is often no way for researchers and institutions<br />
to know what practices a license covers,<br />
whether there remains scope for others to<br />
practice an invention, which regions it covers<br />
and whether it applies to any specific fields of<br />
use or contains special restrictions. The lack of<br />
information makes it difficult to substantiate<br />
claims that licensing practices are changing or<br />
comply with best practices. As a study 11 on university<br />
licensing practices notes, simply stating<br />
whether a license is exclusive or nonexclusive<br />
misses important nuances. Not only would<br />
more transparency help researchers better<br />
understand the scope and ownership of intellectual<br />
property rights, it would also allow policymakers,<br />
academics and tech transfer offices<br />
to determine in what cases exclusive licensing<br />
is justified, as opposed to enforcing a blanket<br />
norm of nonexclusive licensing.<br />
Although under provisions 62 of the Bayh-<br />
Dole Act, all recipients of federal grants must<br />
report on activities involving the disposition of<br />
certain intellectual property rights that result<br />
from federally funded research, the information<br />
is incomplete and cannot be obtained<br />
because of strictures on access to the data. A<br />
clause of the legislation was intended to protect<br />
proprietary data from public access through the<br />
Freedom of Information Act (35 U.S.C. § 202(c)(5)).<br />
The way the implementing regulations<br />
were written, however, went well beyond this,<br />
and gave licensees veto power over nongovernment<br />
disclosure of information. Tech transfer<br />
offices file reports with the interagency Edison<br />
(iEdison) database when they license inventions<br />
supported by most government funders.<br />
The reporting requirements do not require the<br />
disclosure of the licensing terms, and what is<br />
reported to iEdison is not publicly available.<br />
Indeed, access to iEdison is highly restricted;<br />
the database is unavailable for study or use outside<br />
government, and even government officials<br />
wanting to study technology transfer have<br />
been denied access unless they get permission<br />
from all licensees, a nearly impossible hurdle<br />
to overcome.<br />
Making licensing terms of publicly funded<br />
inventions more transparent would require<br />
a rewrite of the implementing regulations to<br />
change interpretation of the Bayh-Dole Act’s<br />
confidentiality clause. The confidentiality provision<br />
in the Bayh-Dole Act was intended to<br />
protect agencies from being forced to disclose<br />
proprietary data, but its implementing regulation<br />
is so broad that, in effect, it restricts the<br />
government’s ability to use data without permission<br />
of the relevant licensee. Current nondisclosure<br />
practices lead to data being unavailable for<br />
research aimed at improving knowledge about<br />
patenting and licensing practices. Many studies<br />
could be undertaken on aggregated reported<br />
data, and there are many precedents for using<br />
census data, health statistics and other very<br />
private information in government databases.<br />
The original rationale for the Bayh-Dole Act<br />
was that government-owned inventions were<br />
languishing for want of effective patent incentives<br />
to grantees and contractors; the current<br />
problem is that data on patenting and licensing<br />
practices are languishing in a government database<br />
that is not mined for valuable insights.<br />
On the industry side, there is a somewhat<br />
higher standard for disclosure by public companies<br />
to protect shareholders. As of 2003, the<br />
Securities and Exchange Commission (SEC)<br />
requires disclosure of material agreements,<br />
including license agreements, as part of SEC<br />
filings. Section 401(a) of the Sarbanes-Oxley<br />
Act of 2002 (Public Company Accounting<br />
Reform and Investor Protection Act of 2002,<br />
Pub. L. No. 107-204, 116 Stat. 745) requires the<br />
SEC to adopt rules to require each annual and<br />
quarterly financial report filed with the commission<br />
to disclose “all material off-balance<br />
sheet transactions, arrangements, obligations<br />
(including contingent obligations), and other<br />
relationships of the issuer with unconsolidated<br />
entities or other persons, that may have a material<br />
current or future effect on financial condition,<br />
changes in financial condition, results<br />
of operations, liquidity, capital expenditures,<br />
capital resources, or significant components<br />
of revenues or expenses.” In many cases, however,<br />
these disclosures are of little assistance in<br />
understanding the licensing landscape. The<br />
reporting pertains only when a license underpins<br />
a genetic test that is a large enough portion<br />
of a publicly traded company’s business that it<br />
needs to be disclosed to investors. Even then,<br />
which patents have been licensed under what<br />
terms may be disclosed vaguely. Many biotech<br />
start-up companies are not publicly traded and<br />
are not subject to SEC disclosure requirements.<br />
By the time a biotech company goes public, its<br />
prospectus may contain some, but only limited,<br />
information about licensing agreements. In the<br />
usual case of a public company acquiring technology<br />
by buying another company, disclosure<br />
of the original license may not be required.<br />
Universities argue that if they are forced to<br />
disclose the terms of prior licensing agreements,<br />
it will undermine their negotiating position<br />
with new potential licensees. If, however,<br />
public companies must disclose the contents of<br />
their license agreements to protect the interests<br />
of those funding them (namely, shareholders) as<br />
a matter of public policy, then it is not clear why<br />
a university should not be required to disclose<br />
the contents of its license agreements to protect<br />
those who fund it (namely, the public). The<br />
question of human resources needed to ensure<br />
transparency is very real and needs to be taken<br />
into account, but the principle of public disclosure<br />
should be entrenched within public institutions,<br />
particularly when the licensed inventions<br />
arise from publicly funded research and when<br />
data are being collected and reported already.<br />
Government and nonprofit research dollars<br />
should come with public accountability.<br />
Secure funding of tech transfer offices. As<br />
noted above, some tech transfer offices are<br />
expected to be self-sustaining and suffer from<br />
a serious lack of resources. This situation has<br />
several consequences. First, the agreements<br />
that these offices pursue will not necessarily<br />
aim to promote dissemination but instead will<br />
focus first on securing revenues. Second, tech<br />
transfer offices lack resources to train managers<br />
on implementing guidelines and the particular<br />
challenges that different technologies raise. The<br />
DNA diagnostic market is complex and rapidly<br />
evolving. For example, technology licensing<br />
officers need to know that the development of<br />
genetic testing after the discovery of the gene<br />
requires far less investment than the development<br />
of therapeutics, suggesting that exclusive<br />
licenses are usually not as necessary 11 . Without<br />
a more nuanced and informed understanding<br />
of how optimal patenting, dissemination and<br />
licensing decisions vary across different types<br />
of technologies and uses, these offices cannot<br />
fulfill their mandate: transferring technology.<br />
Conclusions<br />
To address the ongoing failure to achieve the<br />
goals of the multiple guidelines, policies and<br />
even legislation aimed at ensuring continued<br />
research on and access to clinical genetic tests,<br />
practices within universities and their industry<br />
partners must conform to existing guidelines.<br />
Although some changes to patent law—such as<br />
clearer research exemptions and an opposition<br />
proceeding—could be of use, fundamentally the<br />
problem is one of strategy about what to patent<br />
(products versus methods), how broadly to<br />
make claims to early-stage gene-based inventions<br />
and how to deploy those patents (broadly<br />
versus exclusively). Patents will be properly<br />
deployed only when university constituencies<br />
unite in promoting broad dissemination, when<br />
technology transfer offices are given the necessary<br />
financial support and incentives and when<br />
universities and industry have transparent and<br />
publicly accountable practices for licensing of<br />
DNA diagnostic technologies. Industry groups<br />
such as BIO and university technology transfer<br />
organizations such as AUTM have a crucial and<br />
constructive role to play in resolving this predicament.<br />
Progress toward addressing the problems<br />
in genetic diagnostics can begin with less caustic<br />
and unhelpful rhetoric and more focus on<br />
engagement with their constituencies on seriously<br />
implementing guidelines, as well as with<br />
federal advisory bodies such as the SACGHS.<br />
By acknowledging and engaging with the distinctive<br />
problems that patenting and licensing<br />
practices raise for DNA diagnostics, both the<br />
universities licensing out technology and the<br />
companies licensing it in can bring about real<br />
improvement without the need for legislation.<br />
COMPETING FINANCIAL INTERESTS<br />
The authors declare no competing financial interests.<br />
1. Diamond v. Chakrabarty, 447 U.S. 303 (1980).<br />
2. Association for Molecular Pathology et al. v. United<br />
States Patent and Trademark Office et al. (USDC SDNY<br />
09 Civ. 4515, 2010).<br />
3. Ariad Pharmaceuticals, Inc. v. Eli Lilly and Co., 560<br />
F.3d 1366 (Fed. Cir. 2009).<br />
4. Secretary’s Advisory Committee on Genetics Health<br />
and Society, National Institutes of Health. Report<br />
on Gene Patents and Licensing Practices and Their<br />
Impact on Patient Access to Genetic Tests (SACGHS,<br />
Washington, DC, 2010).<br />
5. Merz, J.F. Clin. Chem. 45, 324–330 (1999).<br />
6. Heller, M.A. & Eisenberg, R.A. Science 280, 698–701<br />
(1998).<br />
7. Chandrasekharan, S. & Cook-Deegan, R. Genome Med.<br />
1, 92 (2009).<br />
8. Holman, C.M. Science 322, 198–199 (2008).<br />
9. Nelson, R. J. Technol. Transf. 26, 13–19 (2001).<br />
10. Mowery, D.C. et al. Res. Policy 30, 99–119 (2001).<br />
11. Pressman, L. et al. Nat. Biotechnol. 24, 31–39<br />
(2006).<br />
12. Schissel, A., Merz, J.F. & Cho, M.K. <strong>Nature</strong> 402, 118<br />
(1999).<br />
13. Henry, M.R., Cho, M.K., Weaver, M.A. & Merz, J.F.<br />
Science 297, 1279 (2002).<br />
14. Gold, E.R. & Carbone, J. Genet. Med. 12 Suppl, S39–<br />
S70 (2010).<br />
15. Skeehan, K., Heaney, C. & Cook-Deegan, R. Genet.<br />
Med. 12 Suppl, S71–S82 (2010).<br />
16. Merz, J.F. in The Penn Center Guide to Bioethics (eds.<br />
Ravitsky, F., Feister, A. & Caplan, A.L.) 383–385<br />
(Springer, New York, 2009).<br />
17. Nuffield Council on Bioethics. The Ethics of Patenting<br />
DNA (Nuffield Council on Bioethics, London, 2002).<br />
18. Cook-Deegan, R. et al. Genet. Med. 12 Suppl, S15–<br />
S38 (2010).<br />
19. Angrist, M., Chandrasekharan, S., Heaney, C. &<br />
Cook-Deegan, R. Genet. Med. 12 Suppl, S111–S154<br />
(2010).<br />
20. Chandrasekharan, S. & Fiffer, M. Genet. Med. 12<br />
Suppl, S171–S193 (2010).<br />
21. Chandrasekharan, S., Heaney, C., James, T., Conover,<br />
C. & Cook-Deegan, R. Genet. Med. 12 Suppl,<br />
S194–S211 (2010).<br />
22. Chandrasekharan, S., Pitlick, E., Heaney, C. & Cook-<br />
Deegan, R. Genet. Med. 12 Suppl, S155–S170<br />
(2010).<br />
23. Colaianni, A., Chandrasekharan, S. & Cook-Deegan, R.<br />
Genet. Med. 12 Suppl, S5–S14 (2010).<br />
24. Powell, A., Chandrasekharan, S. & Cook-Deegan, R.<br />
Genet. Med. 12 Suppl, S83–S110 (2010).<br />
25. Cook-Deegan, R., Chandrasekharan, S. & Angrist, M.<br />
<strong>Nature</strong> 458, 405–406 (2009).<br />
26. Caulfield, T., Cook-Deegan, R.M., Kieff, F.S. & Walsh,<br />
J.P. Nat. Biotechnol. 24, 1091–1094 (2006).<br />
27. National Research Council. Reaping the Benefits<br />
of Genomic and Proteomic Research: Intellectual<br />
Property Rights, Innovation and Public Health<br />
(National Research Council, Washington, DC,<br />
2006).<br />
28. Ontario Report to the Provinces and Territories.<br />
Genetics, Testing and Gene Patenting: Charting<br />
New Territory in Healthcare (Government of Ontario,<br />
Toronto, Ontario, Canada, 2002).<br />
29. Australian Law Reform Commission. Essentially<br />
Yours: The Protection of Human Genetic Information<br />
in Australia (ALRC 96) (ALRC, Sydney, New South<br />
Wales, Australia, 2003).<br />
30. Gold, E.R., Bubela, T., Miller, F.A., Nicol, D. & Piper,<br />
T. Nat. Biotechnol. 25, 388–389 (2007).<br />
31. Gold, E.R. Nat. Biotechnol. 18, 1319–1320 (2000).<br />
32. Nicol, D. & Nielsen, J. Patents and Medical<br />
Biotechnology: An Empirical Analysis of Issues<br />
Facing the Australian Industry (Occasional Paper no.<br />
6) (Centre for Law & Genetics, Sandy Bay, Tasmania,<br />
Australia, 2003).<br />
33. Cho, M.K., Illangasekare, S., Weaver, M.A., Leonard,<br />
D.G.B. & Merz, J.F. J. Mol. Diagn. 5, 3–8 (2003).<br />
34. Rai, A. Northwest. Univ. Law Rev. 94, 77–152<br />
(1999).<br />
35. Merz, J.F., Kriss, A.G., Leonard, D.G. & Cho, M.K.<br />
<strong>Nature</strong> 415, 577–579 (2002).<br />
36. Merz, J.F., Cho, M.K., Robertson, M.J. & Leonard, D.G.<br />
Mol. Diagn. 2, 299–304 (1997).<br />
37. Merz, J.F. & Cho, M.K. Camb. Q. Healthc. Ethics 7,<br />
425–428 (1998).<br />
38. Andrews, L.B. Nat. Rev. Genet. 3, 803–808 (2002).<br />
39. LOI no 613–16 as amended in 2004.<br />
40. Overwalle, G.V. Int. Rev. Intellect. Property Competition<br />
Law 889, 908–918 (2006).<br />
41. Fed. Reg. 66, 1092–1099 (2001).<br />
42. Fed. Reg. 70, 18413–18415 (2005).<br />
43. Organisation for Economic Co-operation and<br />
Development. Guidelines for the Licensing of Genetic<br />
Inventions (OECD, Paris, 2006).<br />
44. In the Public Interest: Nine Points to Consider in<br />
Licensing University Technology (AUTM, Deerfield,<br />
Illinois, USA, 2007).<br />
45. Association of University Technology Managers.<br />
University Principles on Global Access to Medicines<br />
(AUTM, Deerfield, Illinois, USA, 2009).<br />
46. Rimmer, M. Eur. Intellectual Prop. Rev. 25, 20–33<br />
(2003).<br />
47. American Medical Association. Report 9 of the Council<br />
on Scientific Affairs (AMA, Chicago, 2000).<br />
48. Huys, I., Berthels, N., Matthijs, G. & Van Overwalle, G.<br />
Nat. Biotechnol. 27, 903–909 (2009).<br />
49. Laboratory Corporation of America Holdings, dba<br />
Labcorp v. Metabolite Laboratories, Inc. et al., 548<br />
U.S. 124 (2006).<br />
50. eBay Inc. v. MercExchange, LLC, 547 U.S. 388<br />
(2006).<br />
51. Bilski v. Kappos, 561 U.S. ____ (2010) (No. 08–964),<br />
affirming 545 F.3d 943 (Fed. Cir. 2008).<br />
52. In re Kubin (Fed Cir. 2009).<br />
53. KSR International Co. v. Teleflex, Inc., 550 U.S. 398<br />
(2007).<br />
54. Van Overwalle, G., van Zimmeren, E., Verbeure, B. &<br />
Matthijs, G. Nat. Rev. Genet. 7, 143–148 (2006).<br />
55. Walsh, J.P., Ashish, A. & Cohen, W. in Effects Of<br />
Research Tool Patents And Licensing On Biomedical<br />
Innovation (eds. Cohen, W. & Merrill, S.) 285–336<br />
(National Academies Press, Washington, DC, 2003).<br />
56. Gold, E.R. et al. The Research or Experimental<br />
Use Exception: A Comparative Analysis (Centre for<br />
Intellectual Property Policy/Health Law Institute,<br />
Montreal, Quebec, Canada, 2005).<br />
57. Merck KGaA v. Integra Lifesciences I, Ltd., 545 U.S.<br />
193 (2005).<br />
58. Siegel, D.S. & Wright, M. Oxford Rev. Econ. Policy 23,<br />
529–540 (2007).<br />
59. http://www.mpegla.com/Lists/MPEG%20LA%20<br />
News%20List/Attachments/230/n-10–04–08.pdf,<br />
Last Accessed May 4, 2010.<br />
60. (5 February 2010).<br />
61. http://bio.org/ip/genepat/documents/SACGHSsignonletter2–4-2010final_000.pdf<br />
62. Bayh-Dole Act, 37 C.F.R. Part 401.<br />
FEATURE<br />
Public biotech 2009—the numbers<br />
Brady Huggett, John Hodgson & Riku Lähteenmäki<br />
The public biotech sector sustained more losses in 2009, but the year ended on a positive note, and the industry has<br />
regained its footing.<br />
That whooshing sound at the end of 2009<br />
was the biotech sector letting out its collective<br />
breath. The year began as a hard slog,<br />
so when it came to a close on an upward<br />
swing, the industry rightfully felt a measure<br />
of relief. That’s not to say there weren’t casualties:<br />
a distressingly large number of companies<br />
departed the scene last year. But it was not as<br />
bad as some pundits had estimated, and the<br />
industry proved itself to be strong and creative.<br />
It was helped by a recovering economy in the<br />
second half of the year. Overall, counting the<br />
vast financial potential of collaborations, the<br />
industry recorded one of its best years for<br />
fundraising. That has left the sector brightly<br />
looking ahead again—a far cry from how<br />
things appeared at the end of 2008.<br />
Economic woes<br />
The 2009 data from <strong>Nature</strong> Biotechnology’s<br />
annual survey of public biotech firms, which<br />
now number 461 (owing to a change in<br />
our data-gathering process; see Box 1 and<br />
Supplementary Table 1), show little trace of<br />
how terribly the year began or how tightly<br />
the public markets had been hammered shut<br />
at the end of 2008. The reality is that 2009<br />
started bleakly for biotech, and it continued<br />
that way for most of the first quarter.<br />
Box 1 The numbers<br />
<strong>Nature</strong> Biotechnology has published an annual report on public biotech companies since<br />
1996. As the industry has grown and changed, so have our definition of what constitutes<br />
a biotech company and our methods for gathering the information that serves as the<br />
backbone to this piece. We generally include companies built upon applying biological<br />
organisms, systems or processes, or the provision of specialist services to facilitate the<br />
understanding thereof. We exclude pharmaceutical companies, medical-device firms and<br />
contract research organizations to better focus on the unique attributes and situations<br />
that make up the biotech sector.<br />
This year’s data were provided by Ernst & Young, which has broadened the report’s reach<br />
into international exchanges and increased our total number of companies. Additional<br />
reporting was done via individual financial reports. The top-ten lists and other aggregate<br />
lists are sourced appropriately, with most data supplied by BioCentury. As investors do not<br />
stratify the biotech sector as stringently as <strong>Nature</strong> Biotechnology, we used money figures<br />
from across the biotech and biopharmaceutical arena to best highlight trends. In some<br />
cases, full-year data were not available and fourth-quarter numbers were extrapolated;<br />
this is noted in the company-by-company data table (Supplementary Table 1). Companies<br />
delisted in 2009 from major exchanges were excluded.<br />
Data retrieval for this article was by Ernst &<br />
Young (Boston) with additional reporting by<br />
Riku Lähteenmäki. Brady Huggett is business<br />
editor at <strong>Nature</strong> Biotechnology, John Hodgson<br />
is editor-at-large at <strong>Nature</strong> Biotechnology,<br />
and Riku Lähteenmäki is a freelance writer in<br />
Turku, Finland.<br />
Of course, not just biotech suffered; the<br />
recession affected all countries and sectors.<br />
Along with the other indices, shares on the<br />
Nasdaq Biotechnology Index bottomed out<br />
on 9 March, resting at 59.05, a low it had not<br />
seen since May 2003. The global economy continued<br />
to shed jobs last year: the US Central<br />
Intelligence Agency estimates unemployment<br />
numbers increased around the world, sometimes<br />
drastically. Ireland’s unemployment<br />
nearly doubled to 12%, whereas the US went<br />
from 5.8% in 2008 to 9.3%.<br />
So although biotech wasn’t alone in the<br />
dark, as an industry made up mainly of small<br />
companies devoid of revenue—and thus<br />
more dependent on raising public funds—<br />
the sector was hit particularly hard. The fear,<br />
expressed by pundits, the Biotechnology<br />
Industry Organization (Washington, DC)<br />
and even biotech executives themselves, was<br />
that the industry would lose up to 25% of its<br />
companies to bankruptcy.<br />
But the Nasdaq Biotech Index steadily<br />
recovered from that March low and closed<br />
2009 at 81.83. Overall funding for the sector<br />
jumped in the second half, and although the<br />
National Bureau of Economic Research has<br />
yet to officially declare the end of the recession<br />
in the United States, consensus pegs it<br />
around the second quarter of 2009.<br />
Catastrophic shrinkage in the sector has<br />
not happened. There were losses (Table 1),<br />
but they were not as far-reaching as feared.<br />
And among all this detritus, a surprise: the<br />
biotech sector was again profitable in 2009.<br />
The money trail<br />
Financing levels for biotech are a useful<br />
gauge of the sector’s overall health, because<br />
without repeated investment, the industry<br />
shrivels. In this regard, 2009 turned out better<br />
than expected. The third quarter saw<br />
the first month of positive growth in the<br />
US economy since the recession started in<br />
December 2007, and as the economy recovered,<br />
money again began moving. By year’s<br />
end, overall biotech financing was up 84%<br />
from the depressed figures seen in 2008.<br />
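The 84% increase can be checked against the category totals plotted in Figure 1. The snippet below is a minimal sketch of that arithmetic; the dictionary layout and the helper names `total` and `percent_change` are illustrative choices, not part of the survey's methodology.

```python
# Figure 1 category totals in $ billions, transcribed from the article.
# Order: IPO, follow-on, PIPEs, venture capital, debt and other, partnerships.
FINANCING = {
    2008: (0.134, 1.867, 3.143, 5.177, 3.232, 20.023),
    2009: (0.928, 6.041, 2.277, 5.198, 10.335, 36.923),
}


def total(year: int) -> float:
    """Sum all six financing categories for one year."""
    return sum(FINANCING[year])


def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, expressed in percent."""
    return (new - old) / old * 100.0


total_2008 = total(2008)
total_2009 = total(2009)
growth = percent_change(total_2008, total_2009)

print(f"2008 total: ${total_2008:.1f} billion")
print(f"2009 total: ${total_2009:.1f} billion")
print(f"Change 2008 to 2009: {growth:.0f}%")
```

With these inputs the totals come to roughly $33.6 billion and $61.7 billion, which is consistent with the rounded 84% and $62 billion quoted for 2009.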
In 2008, as first the United States and then<br />
the world slid into recession, overall funding<br />
was at its lowest since at least 2002 (Fig. 1),<br />
with debt financings, private investments in<br />
public equity (PIPEs), follow-on offerings<br />
and initial public offerings (IPOs) all declining<br />
substantially from previous years. Only<br />
venture capital remained aloft, although<br />
venture capitalists were more inclined to put<br />
money into companies previously invested<br />
in, rather than new ventures.<br />
This pattern reversed last year. Debt<br />
financings, venture capital and money raised<br />
in follow-ons and IPOs all increased, almost<br />
achieving the level seen in 2007, before the<br />
markets tanked. Only one category went<br />
backward, PIPEs, which was to be expected,<br />
as once the general markets (and individual<br />
stock prices) improved, the need for private<br />
investment faded.<br />
Table 1 Casualties in 2009<br />
Company | Reason for status change<br />
Alpha Innotech | Acquired by Cell Biosciences<br />
Altus Pharmaceuticals | Bankruptcy<br />
Arthrokinetics | Delisted<br />
Autoimmune | Inactive<br />
Avalon Pharmaceuticals | Acquired by Clinical Data<br />
Avigen | Acquired by Medicinova<br />
Biopure Corporation | Bankruptcy<br />
BioXell | Acquired by Cosmo<br />
CelSis | Acquired by JM Hambro<br />
Cellegy | Merged with Adamis Pharmaceuticals<br />
Cell Genesys | Acquired by BioSante<br />
Cobra | Merged with Recipharm<br />
Curagen | Acquired by CellDex<br />
Curalogic | Bankruptcy<br />
CV Therapeutics | Acquired by Gilead<br />
EPIX Pharmaceuticals | Liquidated<br />
Evolutec | Transformed into investment company<br />
Genaera | Dissolved<br />
Genentech | Acquired by Roche<br />
Hemacare | Inactive<br />
Hemagen Diagnostics | Inactive<br />
IDM Pharma | Acquired by Takeda<br />
Introgen | Bankruptcy<br />
Isologen | Bankruptcy<br />
Intercytex | Delisted<br />
Liponex | Merged with ImaSight<br />
Medarex | Acquired by BMS<br />
Metabasis Therapeutics | Acquired by Ligand Pharmaceuticals<br />
Monogram | Acquired by LabCorp<br />
Napo Pharma | Inactive<br />
Nastech | Changed name to MDRNA<br />
Neos | Inactive<br />
Neurogen | Inactive<br />
Northfield Laboratories | Inactive<br />
Nucryst | Inactive<br />
Nuvelo | Merged with Arca<br />
Nventa Biopharmaceuticals | Inactive<br />
Phynova | Delisted<br />
Replidyne | Merged with Cardiovascular<br />
Targanta | Acquired by The Medicines Company<br />
ViRexx Medical | Acquired by Paladin<br />
XLT Biopharmaceuticals | Delisted<br />
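The entries in Table 1 can be tallied by reason for status change. The sketch below does this with a `Counter`. Two assumptions are worth flagging: the grouping of "Bankruptcy", "Delisted", "Inactive", "Liquidated" and "Dissolved" under financial difficulty is a guess at how the article counted, not a definition the article gives, and the table as printed lists 42 companies, whereas the text cites 44 departures.

```python
from collections import Counter

# (company, reason) pairs transcribed from Table 1.
TABLE_1 = [
    ("Alpha Innotech", "Acquired by Cell Biosciences"),
    ("Altus Pharmaceuticals", "Bankruptcy"),
    ("Arthrokinetics", "Delisted"),
    ("Autoimmune", "Inactive"),
    ("Avalon Pharmaceuticals", "Acquired by Clinical Data"),
    ("Avigen", "Acquired by Medicinova"),
    ("Biopure Corporation", "Bankruptcy"),
    ("BioXell", "Acquired by Cosmo"),
    ("CelSis", "Acquired by JM Hambro"),
    ("Cellegy", "Merged with Adamis Pharmaceuticals"),
    ("Cell Genesys", "Acquired by BioSante"),
    ("Cobra", "Merged with Recipharm"),
    ("Curagen", "Acquired by CellDex"),
    ("Curalogic", "Bankruptcy"),
    ("CV Therapeutics", "Acquired by Gilead"),
    ("EPIX Pharmaceuticals", "Liquidated"),
    ("Evolutec", "Transformed into investment company"),
    ("Genaera", "Dissolved"),
    ("Genentech", "Acquired by Roche"),
    ("Hemacare", "Inactive"),
    ("Hemagen Diagnostics", "Inactive"),
    ("IDM Pharma", "Acquired by Takeda"),
    ("Introgen", "Bankruptcy"),
    ("Isologen", "Bankruptcy"),
    ("Intercytex", "Delisted"),
    ("Liponex", "Merged with ImaSight"),
    ("Medarex", "Acquired by BMS"),
    ("Metabasis Therapeutics", "Acquired by Ligand Pharmaceuticals"),
    ("Monogram", "Acquired by LabCorp"),
    ("Napo Pharma", "Inactive"),
    ("Nastech", "Changed name to MDRNA"),
    ("Neos", "Inactive"),
    ("Neurogen", "Inactive"),
    ("Northfield Laboratories", "Inactive"),
    ("Nucryst", "Inactive"),
    ("Nuvelo", "Merged with Arca"),
    ("Nventa Biopharmaceuticals", "Inactive"),
    ("Phynova", "Delisted"),
    ("Replidyne", "Merged with Cardiovascular"),
    ("Targanta", "Acquired by The Medicines Company"),
    ("ViRexx Medical", "Acquired by Paladin"),
    ("XLT Biopharmaceuticals", "Delisted"),
]

# Assumption for this sketch: these reasons indicate financial difficulty.
FINANCIAL_DIFFICULTY = {"Bankruptcy", "Delisted", "Inactive", "Liquidated", "Dissolved"}


def category(reason: str) -> str:
    """Collapse 'Acquired by X' and 'Merged with Y' to their first word."""
    first_word = reason.split()[0]
    return first_word if first_word in {"Acquired", "Merged"} else reason


counts = Counter(category(reason) for _, reason in TABLE_1)
financial = sum(n for cat, n in counts.items() if cat in FINANCIAL_DIFFICULTY)

for cat, n in counts.most_common():
    print(f"{cat}: {n}")
print(f"Removed owing to financial difficulty (assumed grouping): {financial}")
```

Under that grouping the tally gives 20 companies in financial difficulty, alongside 15 acquisitions and 5 mergers.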
The largest follow-on offering of the year<br />
($640 million) was conducted by Qiagen<br />
(Venlo, The Netherlands), a profitable provider<br />
of sample and assay technologies<br />
(Table 2). It had the best year of its existence<br />
in 2009, with overall revenues above $1 billion,<br />
and is the type of stable company that<br />
can easily reach into the secondary-offering<br />
market. The sexier story is Human Genome<br />
Sciences (HGS, Rockville, Maryland, USA),<br />
which raised about $850 million in two follow-on<br />
offerings. As its stock price rocketed<br />
after positive pivotal trial results for the lupus<br />
drug Benlysta (belimumab), it tapped the<br />
public markets in late July for more than $373<br />
million and again in December for about $477<br />
million. The company’s stock, which opened<br />
the year at $2.12, ended it at $30.58.<br />
This is a similar story to Dendreon’s<br />
(Seattle), which in April reported positive<br />
phase 3 results for its prostate cancer vaccine<br />
Provenge (sipuleucel-T), sending its<br />
stock up more than 100% on the day the<br />
results were announced. This set the stage for<br />
a $230-million public offering in May, followed<br />
by a $427-million offering in December (Table 2). Provenge has<br />
now been approved, the company has priced<br />
the drug aggressively, and Dendreon’s stock,<br />
at the time of publication, sat just above $34;<br />
it began 2009 at $4.59.<br />
Whereas many of biotech’s established<br />
companies completed debt deals last year,<br />
returning that funding category to levels<br />
seen before a well-below-average 2008, it<br />
was hardly a year worth mentioning for IPOs<br />
(Table 3). Just ten occurred in 2009, none<br />
before August, and none could be considered<br />
a typical biotech IPO, either in the type of<br />
company or the amount of money raised.<br />
For instance, the JSC Human Stem Cell<br />
Institute (with sites in Russia, Germany and the<br />
Ukraine) raised a mere $4.8 million. The institute<br />
doesn’t look much like the usual biotech enterprise<br />
preparing to go public: it has a research<br />
laboratory and a center for storage of cellular<br />
materials, and it publishes the journal Cellular<br />
Transplantation and Tissue Engineering.<br />
What’s more, an IPO is no longer the cash<br />
windfall and viable exit for investors it once<br />
was. Consider D-Pharm (Rehovot, Israel),<br />
which raised about $7.4 million on the Tel<br />
Aviv Stock Exchange to fund clinical testing<br />
of its small-molecule stroke drug, DP-b99, a<br />
membrane-active derivative of the calcium<br />
chelator 1,2-bis-(2-aminophenoxy)ethane-<br />
N,N,N′,N′-tetraacetic acid (BAPTA).<br />
Alongside the IPO, the company also completed<br />
a rights offering (which gives existing<br />
shareholders the right to buy shares during a<br />
defined period, usually at a discount), raising<br />
NIS 57 million ($14.8 million). The existing<br />
investors didn’t exit—they instead had the<br />
choice to increase their stake.<br />
In truth, the average amount raised per<br />
IPO is hardly enough to alleviate financial<br />
concerns for long. In 2008, our survey<br />
showed IPOs raised on average $22.3 million.<br />
In the previous two years, it was considerably<br />
more, $58 million in 2007 and $41 million in<br />
2006. Figure 2 shows an IPO in 2009 raised,<br />
on average, $92.8 million. On the surface<br />
that seems a marked increase, but further<br />
inspection shows that the figure is distorted<br />
by the unique case of Talecris Biotherapeutics<br />
(Research Triangle Park, NC, USA). The<br />
company develops nonrecombinant protein<br />
therapeutics from plasma and is profitable. It<br />
was pegged as an acquisition target by rival<br />
CSL (Victoria, Australia) in 2008 for $3.1 billion,<br />
but the US government challenged the<br />
purchase as anticompetitive, and the deal fell<br />
apart. Talecris instead conducted an IPO in<br />
2009 for a whopping $550 million. Toss aside<br />
Talecris, and the figure falls more in line with<br />
recent years: $42 million. Talecris is again in<br />
line for an acquisition, by Grifols (Barcelona,<br />
Spain) for $3.4 billion.<br />
Overall, the public markets in Europe<br />
remain relatively parsimonious. They provided<br />
only 15% of all European financing,<br />
whereas US public markets provided 33%<br />
of the total US fundraising (Table 4). The<br />
main shortfall, as in previous years, was<br />
in follow-on offerings. Where follow-on<br />
financings occurred in Europe, they raised<br />
amounts comparable to those raised by US<br />
firms: $112 million on average, compared<br />
with $107 million for US companies. But in<br />
2009, 48 US biotech companies got follow-on<br />
offerings away, compared with only seven<br />
in Europe. For European public companies,<br />
secondary offerings are still the exception,<br />
leaving them open to acquisition bids and<br />
investors open to disillusionment.<br />
Two European firms dominated debt<br />
financing in 2009 (Table 5), with giant UCB<br />
(formed around Celltech, Brussels) taking in<br />
more than $2.6 billion in a series of three<br />
notes. Elan (Dublin) also raised $625 million<br />
in a bond issue. These two massive chunks<br />
of debt financing distort the European<br />
fundraising picture, giving it an undue rosy<br />
glow. The $3.2 billion raised represents over<br />
three-quarters of the ‘Other’ categories<br />
of finance in Europe and nearly half of all<br />
finance in Europe during 2009 (Table 4).<br />
Without this money, the amount raised in<br />
Europe during 2009 would have been only<br />
15% of the global total finance in this survey,<br />
rather than 26%.<br />
Figure 1 Global biotech industry financing. Biotech funding was up 84% to $62 billion in 2009 from<br />
$33 billion in 2008. Partnership figures from Burrill & Co. are for deals involving a US company.<br />
BioCentury makes updates to its financing data on an ongoing basis. Sources: BCIQ: BioCentury Online<br />
Intelligence; Burrill & Co.<br />
Financing raised ($ billions)<br />
Year | IPO | Follow-on | PIPEs | Venture capital | Debt and other | Partnerships<br />
2003 | 0.544 | 3.883 | 2.231 | 4.018 | 9.075 | 8.933<br />
2004 | 2.556 | 3.335 | 2.93 | 5.318 | 8.833 | 10.933<br />
2005 | 1.859 | 4.838 | 2.661 | 5.398 | 6.112 | 17.268<br />
2006 | 2.03 | 5.578 | 4.695 | 5.682 | 11.853 | 19.796<br />
2007 | 2.95 | 4.377 | 4.748 | 6.809 | 11.68 | 22.365<br />
2008 | 0.134 | 1.867 | 3.143 | 5.177 | 3.232 | 20.023<br />
2009 | 0.928 | 6.041 | 2.277 | 5.198 | 10.335 | 36.923<br />
Those IPOs had a small role in the sizable<br />
increase in overall funding from 2008, but<br />
the biggest factor was headline-grabbing<br />
partnering deals: $36.9 billion in 2009, up<br />
from $20 billion the previous year. This<br />
heightened partnering activity was propelled<br />
both by pharma’s need to bolster fading pipelines<br />
and biotech’s need for help of any kind<br />
during the recession.<br />
But here again, that high figure is misleading,<br />
because a large portion of it represents<br />
milestone payments that may never be<br />
paid. The leading deal among our companies<br />
(Table 6) was formed between Nektar<br />
and AstraZeneca for two programs that use<br />
Nektar’s advanced polymer conjugate technology<br />
platform: the program NKTR-118,<br />
which had completed phase 2 for opioid-induced<br />
constipation, and NKTR-119, an<br />
early-stage program intended to deliver<br />
products for pain without a constipation side<br />
effect. Nektar did receive an up-front payment<br />
of $125 million in the deal, but it’s the<br />
potential milestones that give the partnership<br />
its $1.5 billion high-end value.<br />
That was one of six deals in 2009 that had<br />
a potential payout of more than $1 billion,<br />
making the average potential of our top-ten<br />
group worth more than a billion dollars.<br />
But the average amount of funds received<br />
up front (including equity investments or<br />
money for milestones hit at the time of deal<br />
signing) was much lower, at about $109 million,<br />
meaning nearly 90% of the value in these<br />
deals remained unrealized at year’s end.<br />
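The gap between headline value and cash in hand can be made concrete with the deal values listed in Table 6. The following is a rough sketch only: it takes the first ten rows of that table, treats the Exelixis entry (printed as ">1,161") as its lower bound, and uses the article's rounded up-front average of about $109 million, so the result is approximate.

```python
# Potential deal values ($ millions) for the first ten rows of Table 6.
# The Exelixis value is printed as ">1,161"; the lower bound is used here.
POTENTIAL_VALUES = [1505, 1310, 1240, 1161, 1105, 1075, 847, 800, 765, 650]

# Average up-front payment quoted in the article ($ millions, rounded).
AVERAGE_UPFRONT = 109.0

deals_over_one_billion = sum(1 for value in POTENTIAL_VALUES if value > 1000)
average_potential = sum(POTENTIAL_VALUES) / len(POTENTIAL_VALUES)

# Share of the average headline value not yet paid at signing.
unrealized_share = 1.0 - AVERAGE_UPFRONT / average_potential

print(f"Deals above $1 billion: {deals_over_one_billion}")
print(f"Average potential value: ${average_potential:,.1f} million")
print(f"Unrealized share of value: {unrealized_share:.1%}")
```

On these inputs the average potential value is just over $1 billion and the unrealized share is a little under 90%, in line with the figures quoted above.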
When considering all partnerships<br />
between pharma and biotech (public and<br />
private), using data from Elsevier’s Strategic<br />
Transactions database, we found the average<br />
total amount paid up front in 2009 was about<br />
$58.9 million. That’s the highest average over<br />
the past 10 years (only 2006 came close, at<br />
$55.7 million), and a long way from the up-front<br />
money paid out in 2000, which was just<br />
$12.4 million. Still, it also drives home the<br />
reality that a deal with a potential value of<br />
$1 billion is just that: potential.<br />
2009 also provided an interesting wrinkle<br />
for equity investments around partnerships.<br />
Over the past 10 years, the average equity<br />
bought as part of a deal in each year was<br />
well below $10 million, with the exception<br />
of 2001, when it leaped to $32.3 million. Last<br />
year, it leaped again, to $20.6 million. In both<br />
2001 and 2009, the public markets had come<br />
down from peaks, and thus selling equity as<br />
part of partnering deals rose in favor.<br />
Table 2 Top ten follow-on offerings of 2009<br />
Company name | Date completed | Amount raised ($ millions) | Underwriters<br />
Qiagen | 9/24 | 640.4 | Deutsche Bank, Goldman Sachs, J.P. Morgan, Barclays Capital, Commerzbank, DZ Bank<br />
Vertex Pharmaceuticals | 12/2 | 500.5 | Goldman Sachs, Merrill Lynch, J.P. Morgan, Morgan Stanley<br />
Human Genome Sciences | 12/2 | 476.8 | Goldman Sachs, Citigroup, J.P. Morgan, Morgan Stanley, UBS<br />
Dendreon | 12/10 | 426.9 | J.P. Morgan, Deutsche Bank, Citigroup, Morgan Stanley, Lazard, Leerink<br />
Human Genome Sciences | 7/28 | 373.8 | Goldman Sachs, Citigroup<br />
Vertex Pharmaceuticals | 2/18 | 320 | Merrill Lynch, Cowen<br />
Cephalon | 5/21 | 300 | Deutsche Bank, J.P. Morgan, Barclays Capital Inc., Credit Suisse, Morgan Stanley<br />
Dendreon | 5/13 | 229.9 | Deutsche Bank<br />
Incyte | 9/25 | 139.7 | Goldman Sachs, Morgan Stanley, J.P. Morgan<br />
Seattle Genetics | 8/11 | 135.9 | J.P. Morgan, Goldman Sachs, Needham, Oppenheimer, RBC Capital Markets<br />
Data are matched to the definition of biotech in Box 1. Source: BCIQ: BioCentury Online Intelligence<br />
Table 3 Initial public offerings of 2009<br />
Company name | Location | Date completed | Amount raised ($ millions) | Underwriters<br />
CanBas | Shizuoka, Japan | 9/17 | 14.8 | Mitsubishi UFJ Securities International plc, Mizuho, Ichiyoshi, JPMorgan, Mizuho Investors, Takagi<br />
China Nuokang Bio-Pharmaceutical | Beijing | 12/9 | 40.7 | Jefferies, Oppenheimer<br />
Cumberland Pharmaceuticals | Nashville, TN, USA | 8/10 | 85 | UBS, Jefferies, Wells Fargo, Morgan Joseph and Co.<br />
D. Western Therapeutics Institute | Aichi, Japan | 10/13 | 9.7 | Nomura, Mitsubishi UFJ Securities International plc, Takagi, SBI Securities Co. Ltd., Tokai Tokyo, Mizuho<br />
D-Pharm | Rehovot, Israel | 8/17 | 7.3 | Clal Finance, Rosario, Meitav<br />
Human Stem Cell Institute | Moscow | 12/10 | 4.8 | CJSC Alor Invest<br />
Movetis N.V. | Turnhout, Belgium | 12/3 | 146 | Credit Suisse, KBC, Piper Jaffray<br />
Omeros Corp. | Seattle | 10/7 | 68.2 | Deutsche Bank, Wedbush, Canaccord, Needham, Chicago Investment Group, National Securities<br />
Talecris Biotherapeutics | Research Triangle Park, NC, USA | 9/30 | 549.9 | Morgan Stanley, Goldman Sachs, JPMorgan, Citigroup, Wells Fargo, Barclays Capital<br />
T-Ray Science Inc. | Vancouver | 12/9 | 1.4 | Research Capital Corp.<br />
Source: BCIQ: BioCentury Online Intelligence<br />
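The effect of a single large offering on the 2009 IPO average can be reproduced from the amounts in Table 3. This is a minimal sketch; the dictionary and variable names are illustrative, and the amounts are the rounded values printed in the table.

```python
from statistics import mean

# Amounts raised in 2009 IPOs ($ millions), transcribed from Table 3.
IPO_AMOUNTS = {
    "CanBas": 14.8,
    "China Nuokang Bio-Pharmaceutical": 40.7,
    "Cumberland Pharmaceuticals": 85.0,
    "D. Western Therapeutics Institute": 9.7,
    "D-Pharm": 7.3,
    "Human Stem Cell Institute": 4.8,
    "Movetis N.V.": 146.0,
    "Omeros Corp.": 68.2,
    "Talecris Biotherapeutics": 549.9,
    "T-Ray Science Inc.": 1.4,
}

# Average across all ten offerings.
average_all = mean(IPO_AMOUNTS.values())

# Average with the outlier removed.
average_without_talecris = mean(
    amount
    for company, amount in IPO_AMOUNTS.items()
    if company != "Talecris Biotherapeutics"
)

print(f"Average, all IPOs: ${average_all:.1f} million")
print(f"Average, excluding Talecris: ${average_without_talecris:.1f} million")
```

The two averages come out at about $92.8 million and $42 million, matching the figures cited in the text and showing how strongly one offering moves a ten-item mean.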
Figure 2 Global biotech initial public offerings. IPOs in 2009 seemingly made a recovery in amount<br />
raised, if not number of offerings. But the data are skewed by one large offering.<br />
Year | Number of IPOs | Average amount raised ($ millions)<br />
2002 | 10 | 28<br />
2003 | 14 | 39<br />
2004 | 53 | 48<br />
2005 | 45 | 41<br />
2006 | 49 | 41<br />
2007 | 51 | 58<br />
2008 | 6 | 22<br />
2009 | 10 | 92.8<br />
Buyouts and climbing sales<br />
Mergers and acquisitions fell in 2009, both<br />
in total number and in the values assigned to<br />
the companies acquired (Table 7). Leading<br />
our list is Roche’s buyout of Genentech, but<br />
that deal was actually announced in 2008.<br />
Although it closed in the spring of last year,<br />
the acquisition is old news.<br />
But also high on the list is the purchase of<br />
Medarex by Bristol-Myers Squibb (BMS, New<br />
York), an acquisition that gained a validation<br />
of sorts in 2010. The purchase gave BMS<br />
access to Medarex’s antibody-drug conjugate<br />
technology and UltiMAb human antibody<br />
development system, but the main draw was<br />
ipilimumab. BMS was already partnered with<br />
Medarex on ipilimumab in phase 3 for metastatic<br />
melanoma, in phase 2 for lung cancer<br />
and in phase 3 for adjuvant melanoma and<br />
hormone-refractory prostate cancer, so it had<br />
seen the product up close. Perhaps that’s the<br />
reason it offered a greater than 90% premium<br />
to the trading price of Medarex shares; the deal<br />
went through at $16 apiece, or $2.4 billion.<br />
Ipilimumab, a monoclonal antibody<br />
designed to block the inhibitory signal of<br />
cytotoxic T lymphocyte-associated antigen-4<br />
(CTLA-4), had failed in a phase 3 trial<br />
in 2007, and there was uncertainty around<br />
the new pivotal program for melanoma.<br />
But BMS announced in June 2010 at the<br />
American Society of Clinical Oncology’s<br />
annual meeting in Chicago that ipilimumab<br />
met the primary endpoint of survival in<br />
advanced melanoma in a phase 3 double-blind<br />
randomized trial, and BMS said it<br />
expects to submit for regulatory approval<br />
of ipilimumab this year. Should the drug<br />
win approval, the $2.4 billion price tag for<br />
Medarex will seem a steal.<br />
Also of interest last year was Gilead’s (Foster<br />
City, CA, USA) buyout of CV Therapeutics,<br />
giving a company typically known for its<br />
HIV franchise a presence in the cardiovascular<br />
space. The move brought aboard<br />
Ranexa (ranolazine extended-release tablets),<br />
approved for chronic angina, and Lexiscan<br />
(regadenoson) injection for use as a pharmacologic<br />
stress agent in radionuclide myocardial<br />
perfusion imaging. Gilead remains a leader in<br />
HIV drugs—its highest-selling product was<br />
Truvada at about $2.5 billion last year, and<br />
90% of Gilead’s product sales came from its<br />
antiviral franchise—but through this acquisition<br />
it is seeking growth in other areas.<br />
Big sellers like Truvada are the beacons in the<br />
biotech fog, promising a move into the black<br />
after years spent dumping money into R&D and<br />
the clinic. Achieving that level of revenue usually<br />
follows this path: drug approval, then a marketing<br />
push and physician acceptance, followed<br />
by subsequent approvals in other indications<br />
to further increase sales. Most of the biologics<br />
in our list of the top ten drugs (Table 8) went<br />
that route. Enbrel (etanercept), from Amgen<br />
(Thousand Oaks, CA, USA), exemplifies this<br />
tactic. Originally approved in 1998 for rheumatoid<br />
arthritis, the drug has since been approved in<br />
four other indications (ankylosing spondylitis,<br />
psoriasis, psoriatic arthritis and juvenile<br />
rheumatoid arthritis), and its worldwide revenue<br />
has jumped from $2.6 billion in 2005 to<br />
an estimated $6.4 billion in 2009, according to<br />
BioMedTracker. The drug, which inhibits the<br />
tumor necrosis factor (TNF) pathway, is the<br />
top-selling biologic in the world.<br />
In fact, three of the top five revenue-producing<br />
drugs target TNF. Remicade (infliximab,<br />
Johnson & Johnson, New Brunswick,<br />
NJ, USA) and Humira (adalimumab, Abbott,<br />
Abbott Park, IL, USA) are the other two,<br />
selling $5.9 billion and $5.5 billion worldwide,<br />
respectively. Those numbers, like the<br />
revenues for all the drugs in this table, are an<br />
improvement over the previous year.<br />
Table 4 Comparison of US and EU financing in 2009<br />
Category | Amount raised in US ($ millions) | Number of US deals | Amount raised in EU ($ millions) | Number of EU deals | UCB and Elan ($ millions) | EU financing minus UCB ($ millions) | EU financing minus UCB (% of US + EU total) | EU as a percentage of US + EU total | EU as a percentage of EU total | US as a percentage of US total<br />
Venture capital | 3,939 | 197 | 1,114 | 87 | – | 1,114 | 22% | 22% | 18% | 22%<br />
IPO | 703 | 3 | 158 | 3 | – | 158 | 18% | 18% | 3% | 4%<br />
Follow-on offering | 5,166 | 48 | 785 | 7 | – | 785 | 13% | 13% | 12% | 29%<br />
Other | 7,756 | 236 | 4,253 | 108 | 3,200 | 1,053 | 12% | 35% | 67% | 44%<br />
Total | 17,564 | 484 | 6,310 | 205 | 3,200 | 3,110 | 15% | 26% | 100% | 100%<br />
Source: BCIQ: BioCentury Online Intelligence<br />
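The regional shares quoted in the discussion of European financing and in Table 4 can be re-derived from the table's totals. The sketch below is illustrative only. One assumption is inferred rather than stated in the article: to reproduce the 15% figure, the UCB and Elan debt is removed from both the European amount and the combined US + EU denominator.

```python
# Totals from Table 4, in $ millions.
US_TOTAL = 17_564
EU_TOTAL = 6_310
UCB_AND_ELAN = 3_200           # the two large European debt financings

# Public-market money: IPOs plus follow-on offerings.
US_PUBLIC = 703 + 5_166
EU_PUBLIC = 158 + 785


def share(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Europe's share of combined US + EU financing.
eu_share = share(EU_TOTAL, US_TOTAL + EU_TOTAL)

# Same share with UCB and Elan removed (assumed: removed from the denominator too).
eu_adjusted = EU_TOTAL - UCB_AND_ELAN
eu_share_adjusted = share(eu_adjusted, US_TOTAL + eu_adjusted)

# Share of each region's financing that came from public markets.
eu_public_share = share(EU_PUBLIC, EU_TOTAL)
us_public_share = share(US_PUBLIC, US_TOTAL)

print(f"EU share of US + EU total: {eu_share:.0f}%")
print(f"EU share without UCB and Elan: {eu_share_adjusted:.0f}%")
print(f"Public markets, share of EU financing: {eu_public_share:.0f}%")
print(f"Public markets, share of US financing: {us_public_share:.0f}%")
```

These work out to about 26%, 15%, 15% and 33%, respectively, which agrees with the percentages given in the text.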
Given the lack of generic competition for<br />
biologics, it’s almost an anomaly when a<br />
drug does not increase sales year on year; it<br />
suggests something must have gone wrong.<br />
That’s been the case with Amgen’s Aranesp.<br />
Peaking at $4.1 billion in worldwide sales in<br />
2006, the drug has lost ground yearly since<br />
then, and in 2009 declined 15% to about<br />
$2.7 billion, falling off our list of the top ten<br />
biotech drugs. Amgen attributes the decline<br />
to the negative impact, mostly in supportive<br />
cancer care, of a “product label change”<br />
that came in August 2008. In fact, Aranesp<br />
serves as an example of the downside of<br />
product growth: the drug was being used off-label<br />
in various indications until reports of<br />
adverse effects caused the US Food and Drug<br />
Administration (FDA) to tighten its label.<br />
The decline of Aranesp revenue meant<br />
Amgen reported lower overall revenues for<br />
2009, although the company’s adjusted net<br />
income for the year was more than $5 billion,<br />
compared with $4.9 billion in 2008, a<br />
3% increase.<br />
Affymetrix (Santa Clara, CA, USA) also<br />
saw its revenue decrease in 2009, though the<br />
reason has more to do with accounting: the<br />
figures had been buoyed in 2008 by a one-time<br />
intellectual property payment of $90<br />
million. So while a comparison year-by-year<br />
shows the company lost 20% of revenue in<br />
2009, in truth the business ground along<br />
smoothly. It had product revenue of $279.2<br />
million and service revenue of $39.6 million<br />
last year, both up from the previous year<br />
(2008 product revenue was $270.4 million<br />
and service revenue was $32.1 million).<br />
Like Amgen and Affymetrix, other established<br />
firms fared well. Gilead experienced the<br />
largest increase in revenues, posting product<br />
sales that increased 27% over 2008 to nearly<br />
$6.5 billion, driven mostly by its HIV franchise<br />
of Truvada (emtricitabine and tenofovir disoproxil<br />
fumarate) and Atripla (efavirenz 600<br />
mg, emtricitabine 200 mg, tenofovir disoproxil<br />
fumarate 300 mg). Truvada sales increased<br />
18% to about $2.5 billion, and Atripla brought<br />
in $2.4 billion, up 51% over 2008.<br />
HGS also reported impressive revenues<br />
of $275.7 million for 2009, compared with<br />
Table 5 Top ten debt financings of 2009<br />
Company name Financing type Date completed<br />
revenues of only $48.4 million the previous<br />
year. The company logged its first product<br />
sales—$180.2 million for delivering to the<br />
US Strategic National Stockpile raxibacumab<br />
(human monoclonal antibody drug for treatment<br />
of inhalation anthrax) under a government<br />
contract. That helped HGS earn<br />
a net income of $5.7 million for the year,<br />
compared with a net loss of $268.9 million<br />
in 2008. The company also reported positive<br />
results for Benlysta (belimumab) phase<br />
3 trials announced in July and November<br />
2009. The good news drove up HGS’s stock<br />
price considerably, and as we noted earlier, it<br />
raised public funds twice during the year.<br />
End of the line<br />
Whereas 2008 saw 34 companies depart from<br />
the public biotech landscape—11 because of<br />
delisting or bankruptcy—those numbers<br />
increased in 2009. The total number of<br />
companies departing for any reason (buyout<br />
or merger included) climbed to 44, and<br />
the number removed owing to financial difficulty<br />
also went up, reaching 20. But a 9.5%<br />
drop in the number of companies is fewer<br />
casualties than was feared. Of those that teetered<br />
but survived, some were helped partially<br />
by the markets opening back up in the<br />
spring; by the ability to conduct debt deals,<br />
which returned to a more normal level after<br />
suffering through the battered credit markets<br />
in 2008; and by partners supplying up-front<br />
money and other funding.<br />
Company | Type | Date | Amount raised ($ millions)<br />
Amgen | Sr notes (other) | 1/14 | 2,000<br />
UCB Group | Bond (other) | 10/27 | 1,128<br />
UCB Group | Bond (other) | 12/3 | 751.9<br />
UCB Group | Sr convert notes (other) | 9/30 | 730.3<br />
Elan | Sr notes (other) | 9/29 | 625<br />
Cephalon | Sr subord convert notes (other) | 5/22 | 500<br />
Gilead Sciences | Debt (other) | 4/20 | 400<br />
Incyte | Convert notes (other) | 9/25 | 400<br />
Bio-Rad Laboratories | Sr notes (other) | 5/19 | 300<br />
PDL BioPharma | Sr notes (other) | 10/28 | 300<br />
Source: BCIQ: BioCentury Online Intelligence<br />
Table 6 Top ten research partnership and licensing deals of 2009<br />
Researcher | Investor | Date announced | Deal value ($ millions) | Details<br />
Nektar | AstraZeneca | 9/21 | 1,505 | Worldwide rights to NKTR-118 for opioid-induced constipation and NKTR-119 for pain<br />
Incyte | Novartis | 11/25 | 1,310 | Ex-US rights to oral INCB18424, which is in phase 3 for myelofibrosis, and worldwide rights to preclinical cancer compound INCB28060<br />
Targacept | AstraZeneca | 12/3 | 1,240 | Worldwide rights to develop and commercialize major depressive disorder compound TC-5214<br />
Exelixis | Sanofi-aventis | 5/29 | >1,161 | Exclusive, worldwide rights to XL147 and XL765, oral phosphoinositide 3-kinase inhibitors in phase 1b/2 and phase 2 to treat cancer<br />
ZymoGenetics | Bristol-Myers Squibb | 1/12 | 1,105 | Codevelop and commercialize phase 1 HCV compound PEG-Interferon lambda (IL-29)<br />
Amylin | Takeda | 11/1 | 1,075 | Codevelop and commercialize therapeutics for obesity and related indications<br />
Santaris Pharma | Wyeth | 1/12 | 847 | Worldwide rights to ALD518 for all indications except cancer<br />
Algeta | Bayer | 9/3 | 800 | Codevelop Alpharadin for bone metastases<br />
Medivation | Astellas Pharma | 10/27 | 765 | Codevelop MDV3100 for the treatment of prostate cancer<br />
Cytokinetics | Amgen | 5/26 | 650 | Exclusive world-wide (except Japan) license for cardiac contractility program<br />
Acorda | Bayer | 7/1 | 510 | Exclusive collaboration and license agreement to develop Fampridine-SR for multiple sclerosis<br />
Data are matched to the definition of biotech in Box 1. Source: BCIQ: BioCentury Online Intelligence<br />
Also, considering that Genentech (and<br />
its $3.4 billion of net income in 2008) is no<br />
longer in our survey (now part of Roche), it<br />
seemed unlikely the sector would be able to<br />
repeat its performance from 2008, when it<br />
posted a net profit of $3.8 billion. But it did,<br />
drawing a collective net income in 2009 of<br />
$8 billion, with the heavy lifting, unsurprisingly,<br />
done by the large-cap firms (Fig. 3).<br />
Three main drivers contributed to the<br />
unexpected profitability in 2009. The first<br />
is an accounting change by the US Financial<br />
Accounting Standards Board, issued in late<br />
2007 but applicable in fiscal year 2009. Called<br />
SFAS 141R, the new guidance allows the costs<br />
associated with mergers and acquisitions to<br />
be expensed over time, rather than all at once<br />
as part of the purchase price. It’s a small factor,<br />
and biotech-biotech mergers are less common<br />
and of lesser value than those between<br />
biotech and pharma, but still noteworthy.<br />
The second is that some companies simply<br />
had good years, and their revenue growth<br />
helped make up for the loss of Genentech.<br />
We’ve seen this with companies such as<br />
Gilead, which pushed its revenue up 31% and<br />
net income up 33% from 2008, and Biogen<br />
Idec (Weston, MA, USA), which posted a<br />
net income of $970 million, up 24% over the<br />
previous year.<br />
But the major reason for the collective<br />
profit is the same one that kept the number<br />
of bankruptcies lower than feared: a cutback<br />
on expenses. When the money isn’t<br />
there, spending has to decrease, and biotech<br />
tightened its belt in 2009. Companies spent<br />
less in two notable ways. First, they carried<br />
smaller payrolls than previously. In 2008, the<br />
companies surveyed had an average of 489<br />
employees per company. In 2009, although<br />
our pool of biotech firms surveyed grew to<br />
461 and with it the total number of employees<br />
increased, the average number of employees<br />
per company actually dropped to 442.<br />
Second, the biotech sector collectively<br />
reduced its R&D spending. In 2008, even as<br />
it faced financial turmoil, biotech increased<br />
its spending on R&D, as it had for years, from<br />
$22.8 billion in 2007 to $25.5 billion. This<br />
pattern came to a halt last year, when the<br />
sector’s overall R&D spending fell to $22.3<br />
billion, with the greatest decrease seen in the<br />
microcaps, which went from $5.4 billion in<br />
2008 to $4.0 billion (a fall of nearly 30%) in<br />
2009. (Large caps reduced their R&D spending<br />
by just under 10%.) This considerable<br />
drop helped keep biotech profitable, but it is<br />
likely that it penalized the sector’s ability to<br />
carry out innovative science.<br />
Table 7 Top ten announced mergers and acquisitions of 2009<br />
Target | Acquirer | Month completed | Deal value ($ millions)<br />
Genentech | Roche | March | 46,800<br />
Medarex | Bristol-Myers Squibb | September | 2,400<br />
CV Therapeutics | Gilead Sciences | April | 1,400<br />
ESBATech | Alcon | September | 589<br />
BiPar Sciences | Sanofi-aventis | April | 500<br />
Noven | Hisamitsu Pharmaceuticals | August | 428<br />
ViroChem | Vertex | March | 413<br />
Peplin | Leo Pharma | November | 288<br />
Dow Pharmaceutical Sciences | Valeant Pharmaceuticals | January | 285<br />
Arana Therapeutics | Cephalon | August | 276<br />
Data are matched to the definition of biotech in Box 1. Source: BCIQ: BioCentury Online Intelligence<br />
The horizon<br />
Compared with other business sectors, biotech<br />
will continue to face the challenges of<br />
long timelines for product development.<br />
The heavy costs of R&D have shaped this<br />
industry since its inception, and that’s not<br />
about to change. But precisely because biotech<br />
remains centered on the provision of<br />
medical products, it has had the advantage of<br />
being considered ‘recession proof’: people<br />
need drugs no matter how the economy is<br />
performing. The bottom lines of biotech’s<br />
big producers—Amgen, Gilead, Biogen—in<br />
2008 and 2009 reflect this.<br />
Table 8 Top ten biologic drugs in terms of sales in 2009<br />
Name | Lead company | Approved indication(s) | 2009 revenue ($ million)<br />
Enbrel | Amgen | Rheumatoid arthritis (RA), ankylosing spondylitis, psoriasis, psoriatic arthritis (PA), juvenile rheumatoid arthritis | ~6,400<br />
Remicade | Johnson & Johnson | Psoriasis, ulcerative colitis (UC), ankylosing spondylitis, Crohn’s disease, PA, RA | 5,892<br />
Avastin | Roche | Colorectal cancer, breast cancer, brain cancer, renal cell cancer, non–small cell lung cancer | 5,747<br />
Rituxan | Biogen IDEC | Non-Hodgkin’s lymphoma, RA, chronic lymphocytic leukemia | 5,617<br />
Humira | Abbott Laboratories | RA, ankylosing spondylitis, juvenile rheumatoid arthritis, Crohn’s disease, PA, psoriasis | 5,488<br />
Herceptin | Roche | Breast cancer | 4,833<br />
Lantus | Sanofi-aventis | Diabetes mellitus type II, diabetes mellitus type I | 4,295<br />
Gleevec | Novartis | Chronic myelogenous leukemia, hypereosinophilic syndrome, dermatofibrosarcoma protuberans, myeloproliferative disorders, gastrointestinal stromal tumor, acute lymphocytic leukemia, myelodysplastic syndrome, mastocytosis | 3,944<br />
Neulasta | Amgen | Neutropenia, leucopenia | 3,355<br />
Prevnar | Pfizer | Prevention of otitis media, Streptococcus pneumoniae pneumonia | ~3,100<br />
Source: BioMedTracker<br />
Yet the sector’s ability to fund itself ebbs<br />
and flows with the global economy, and this<br />
is especially true for the smaller-cap firms.<br />
These companies require investors, they<br />
require the support of the public markets,<br />
and they require lending, and when the<br />
world’s money locks up the way it did over<br />
2008 and the beginning of 2009, they suffer.<br />
At times like these, some will break, and R&D<br />
expertise and know-how will be dispersed—<br />
or worse, will be gone for good.<br />
But what biotech showed us in 2008 and<br />
2009 is its ability to hibernate until money<br />
flows again. The industry has long had to<br />
make do with less—a valuable trait when the<br />
tap runs dry. It forces the sector’s executives to<br />
look constantly for new ways to trim expenses<br />
and to partner. This can be seen through collaborations<br />
by Symphony Capital (New York),<br />
which invests in clinical programs rather than<br />
a company itself, or the low-infrastructure<br />
model espoused by groups such as Talaris<br />
Advisors (Hopkinton, MA, USA), or the use of<br />
contract research organizations to outsource<br />
portions of drug development.<br />
The economic upswing seen in the second<br />
half of 2009 has continued. Overall funding<br />
in the first six months of 2010 is on pace<br />
to easily surpass 2009 for both private and<br />
public biotechs. The FDA approved 16 biologics<br />
last year, an increase over both 2008<br />
(11 biologic approvals) and 2007 (9 biologic<br />
approvals). The Nasdaq biotech index has<br />
held ground for the first six months of 2010.<br />
J. Craig Venter and colleagues caught the<br />
world’s attention by creating a bacterium with<br />
an artificial genome. Biotech made its way<br />
to the Supreme Court, winning a decision<br />
favorable to Monsanto (St. Louis, MO, USA)<br />
and others developing genetically modified<br />
seeds. And so far this year, there have been<br />
approvals of Amgen’s Prolia (denosumab) for<br />
post-menopausal osteoporosis and Provenge<br />
(sipuleucel-T) for prostate cancer, both of<br />
which are expected to be huge sellers.<br />
Biotech, with its small firms and entrepreneurial<br />
spirit, has long thought of itself as the<br />
underdog, made up of fast, nimble companies<br />
built to innovate, overachieve, withstand hardship<br />
and adapt. This attitude has always been<br />
part of the industry’s culture, and these days<br />
it’s also a carefully cultivated personality used<br />
to distance biotech from the more troubled<br />
pharmaceutical industry. In short, it has often<br />
seemed like biotech was built to deal with<br />
adversity. After surviving the past two years,<br />
it now knows it can.<br />
ACKNOWLEDGMENTS<br />
The authors would like to acknowledge the insight of G.<br />
Giovannetti and G. Jaggi in crafting this article.<br />
Note: Supplementary information is available on the<br />
<strong>Nature</strong> Biotechnology website.<br />
[Figure 3, panels a and b: revenue, R and D, and net profit/loss in $ billions (a), and number of companies and number of employees (b), for large-cap, mid-cap, small-cap and micro-cap firms.]<br />
Figure 3 Public biotech company revenue, R&D spending, profits and number of employees by<br />
market cap. Large cap, ≥$5 billion; mid-cap, $1 billion to
patents<br />
Bilski v. Kappos: the US Supreme Court broadens<br />
patent subject-matter eligibility<br />
William J Simmons<br />
The court narrowly ruled that business methods may be patent eligible, while striking down the primacy of its main test.<br />
With over 60 biopharmaceutical products<br />
applied for or expected to be filed at the<br />
US Food and Drug Administration this year,<br />
joining over 335 currently approved biopharmaceuticals,<br />
determining what can or cannot<br />
be patented is a threshold question for protecting<br />
inventions in the biotech and pharmaceutical<br />
industries 1. Up until 2008, the answer to<br />
this important question was relatively clear.<br />
However, a 2008 decision that set a new single<br />
standard for patent eligibility made addressing<br />
this inquiry fundamentally uncertain.<br />
In a landmark decision on 28 June,<br />
the US Supreme Court issued its holding<br />
regarding patent-eligible subject matter in<br />
Bilski v. Kappos. The court unanimously<br />
agreed that Bilski’s claims recited no more<br />
than “abstract ideas” and were therefore<br />
not patentable under US law. Importantly, a<br />
majority of the court held that the language<br />
of the relevant law (35 USC §§100–101)<br />
encompasses a broad range of subject<br />
matter as patent eligible. The court also unanimously<br />
rejected the ‘machine-or-transformation’<br />
test 1, implemented by the US<br />
Court of Appeals for the Federal Circuit in<br />
2008 and criticized as “unnecessary,” as<br />
the sole test for determining whether a process<br />
is directed to patentable subject matter.<br />
It held instead that the machine-or-transformation<br />
test is one test among many that can be used<br />
to determine patent eligibility. Justice Kennedy<br />
delivered the court’s opinion, with Chief Justice<br />
Roberts and Justices Thomas and Alito joining in full and<br />
Justice Scalia joining in part. Justice Stevens<br />
filed a concurring opinion in which Justices<br />
Ginsburg, Breyer and Sotomayor joined.<br />
Justice Breyer filed a concurring opinion in<br />
which Justice Scalia joined in part.<br />
William J. Simmons is at Sughrue Mion, PLLC,<br />
Washington, DC, USA.<br />
e-mail: wsimmons@sughrue.com<br />
The facts of Bilski v. Kappos did not involve<br />
biotech or pharmaceutical subject matter but<br />
rather a process for hedging risk in commodity<br />
markets (that is, a method of instructing<br />
buyers and sellers of commodities<br />
in the energy market how to protect against the risk<br />
of price fluctuations) 2. For example, the application<br />
recited a series of steps instructing how to<br />
hedge risk and, in another instance, described<br />
risk hedging in the form<br />
of a mathematical formula. The US Patent and<br />
Trademark Office (USPTO) denied Bilski a<br />
patent because, according to the USPTO, the<br />
patent application was directed to business<br />
methods that were patent-ineligible subject<br />
matter. The USPTO reasoned that the invention<br />
was too abstract, that it merely manipulated an<br />
idea and that it failed to practically apply concepts<br />
enough to render them patentable. The<br />
administrative appeal board affirmed, concluding<br />
that the application involved only mental<br />
steps and did not result in the transformation<br />
of physical matter.<br />
On appeal, the Federal Circuit, sitting<br />
en banc, did not rely on any of the several tests<br />
used by prior courts, including the Supreme<br />
Court, but instead created and applied a new<br />
legal standard for patentability: processes are<br />
patentable only if they are tied to a particular<br />
machine or apparatus, or transform a particular<br />
article into a different state or thing—<br />
namely, the machine-or-transformation test 3 .<br />
The Federal Circuit reasoned that because<br />
Bilski’s claims did not satisfy the new governing<br />
test, which the court made clear should be<br />
broadly applied to all areas of technology, the<br />
USPTO’s decision was correct and Bilski was<br />
not entitled to a patent.<br />
Judge Rader, now the Chief Judge of the<br />
Federal Circuit, in dissent, indicated that the<br />
language of 35 USC §101 “contains no hint<br />
of an exclusion for certain types of methods”<br />
and stated that “ironically the Patent Act itself<br />
specifically defines ‘process’ without any of<br />
these judicial innovations.” Rader argued that<br />
the only limits on eligibility are inventions<br />
that embrace natural laws, natural phenomena<br />
and abstract ideas. He wrote, “this court today<br />
invents several circuitous and unnecessary<br />
tests.” Even so, Rader suggested that the hedging<br />
claim on appeal was abstract, and he stated,<br />
“Bilski’s method for hedging risk in commodities<br />
trading is either a vague economic concept<br />
or obvious on its face.” Rader pointed out that<br />
US patent law was designed to encourage ingenuity<br />
and that the law is focused not on particular<br />
subject categories but on the patentability of<br />
the specific claimed invention. He maintained<br />
that the law distinguishes eligibility from conditions<br />
of patentability and generously provides<br />
for patent eligibility. His dissent was clear: the<br />
court should not create any categorical exclusion.<br />
Rader also pointed out that in Diehr 4 , the<br />
Supreme Court indicated that only natural laws,<br />
natural phenomena and abstract ideas are patent<br />
ineligible. He clarified, however, that if an<br />
abstract idea is applied to a practical use, it may<br />
be patent eligible. Notably, Rader commented<br />
that the earlier Supreme Court opinion of<br />
three dissenting justices in Lab. Corp. 5 misapprehended<br />
the distinction between a natural<br />
phenomenon and a patentable process, and in<br />
so doing, this opinion did not ask the fundamental<br />
question of whether the subject matter<br />
at issue is deserving of patent protection. Rader<br />
was clear that courts should not avoid this fundamental<br />
inquiry nor categorically preclude any<br />
form of invention.<br />
In response to the Federal Circuit’s decision,<br />
Bilski petitioned for and obtained Supreme<br />
Court review. The Bilski decision garnered the<br />
attention of many, prompting an unprecedented<br />
number of submissions of unsolicited briefs<br />
expressing the views of nonparties. Among the<br />
66 briefs, 13 were submitted by or on behalf of<br />
life science organizations, including biotech and<br />
pharmaceutical interests (Fig. 1 and Table 1).<br />
Interestingly, opinions within the industry differed on the<br />
desired outcome of the case, and some briefs<br />
supported affirmance of the decision.<br />
However, among the 13 briefs submitted,<br />
only one brief appeared to support the machine-or-transformation<br />
test (with the caveat that the<br />
test be applied correctly; Figs. 1 and 2, and<br />
Table 1).<br />
During the oral arguments heard at the<br />
Supreme Court in November 2009, several<br />
justices expressed their concerns that in the<br />
absence of unambiguous limitations regarding<br />
patent eligibility, the public could be harmed<br />
by the grant of patents to inventions directed<br />
to unworthy subject matter or commercially<br />
useful subject matter that might stifle business<br />
or innovation if granted a monopoly. The chief<br />
justice and several other justices appeared dissatisfied<br />
with the Federal Circuit’s machine-or-transformation<br />
test as the sole test for patent<br />
eligibility but seemed anxious to avoid<br />
expanding the scope of patent-eligible subject<br />
matter beyond the limits set by the court’s precedent<br />
5. In defense of its decision, the USPTO<br />
argued that the Bilski process did not comply<br />
with the machine-or-transformation test, that<br />
the claimed process was a method of conducting<br />
business that was per se unpatentable and<br />
that the claimed process was no more than<br />
an abstract idea and therefore unworthy of a<br />
patent. The USPTO was clear about the devastating<br />
effects of banning entire categories of<br />
inventions from patenting and further asserted,<br />
“to say that business methods are categorically<br />
ineligible for patent protection would eliminate<br />
new machines, including programmed computers,<br />
that are useful because of their contributions<br />
to the operation of business.”<br />
[Figure 1 chart categories: Bilski, Affirmance, Neither party.]<br />
Figure 1 Number of amicus briefs from biotech and pharma sector vis-à-vis Bilski v. Kappos. Chart<br />
compares numbers of briefs arguing for a decision in favor of Bilski, for affirmance of the court’s<br />
decision against Bilski or for neither party.<br />
The Supreme Court’s judgment was supported<br />
by all justices, but the Court divided 5–4 in holding<br />
that under some undefined circumstances,<br />
at least some business methods may be patented.<br />
The Court did not clarify how<br />
to distinguish a patent-eligible<br />
business method from an unpatentable<br />
“abstract idea,” leaving this issue for the Federal<br />
Circuit to decide.<br />
In reaching its decision, the court looked to<br />
the language of the law that describes four categories<br />
of patentable subject matter: processes,<br />
manufactures, machines and compositions<br />
of matter. A problem, however, arises in that<br />
the law sets forth a circular definition of ‘process’,<br />
making it difficult, at times, to determine<br />
whether a process meets the requirements of the<br />
statute. According to the court, the machine-or-transformation<br />
test, when applied as the sole<br />
test for determining a statutory process, violates<br />
proper statutory interpretation because “[t]he<br />
term ‘process’ means process, art or method,<br />
and includes a new use of a known process,<br />
machine, manufacture, composition of matter,<br />
or material” and the ordinary definition of process<br />
does not require that it be tied to a machine<br />
or transform an article 5. Joined by three<br />
other justices, Justice Kennedy explained that<br />
“[s]ection 101 is a dynamic provision designed<br />
to encompass new and unforeseen inventions”<br />
and that as new technologies evolve, the statute<br />
allows for the development and application of<br />
additional tests to assist in determining which<br />
processes are patent eligible.<br />
The court rejected the contention that business methods<br />
are per se unpatentable. It reasoned<br />
that, in view of specific on-point legislation<br />
(namely, 35 USC §273, which creates a defense<br />
to alleged infringement of a business method<br />
claim), the legislature intended that claims<br />
directed to business methods can be patentable<br />
subject matter. Justice Kennedy reiterated that<br />
abstract ideas (which he did not define) are not<br />
patentable and that the court’s decisions regarding<br />
the unpatentability of abstract ideas were<br />
useful in determining which business methods<br />
may be protected under the patent law. The<br />
court held that Bilski’s claims were unpatentable<br />
because they were directed to “abstract ideas.”<br />
According to the court, Bilski sought a patent<br />
on “the use of the abstract idea of hedging risk<br />
in the energy market,” which was too abstract<br />
to be patent eligible. Even though the court<br />
rejected application of an exclusive machine-or-transformation<br />
test, the court was careful to<br />
point out that inventions should be considered<br />
as a whole, not analyzed by dissecting the claims<br />
into old and new elements. Although it rejected<br />
the machine-or-transformation sole standard,<br />
the court provided the Federal Circuit with<br />
great flexibility in developing and applying<br />
“other limiting criteria” useful for determining<br />
patent eligibility. This guidance by the court is<br />
important and, when properly implemented,<br />
will fundamentally affect method patenting<br />
in every art.<br />
The court’s reasoning was grounded on precedent,<br />
such as that articulated in Benson 7 , Flook 8<br />
and Diehr 5 . The court held that the claims at<br />
issue were unpatentable because allowing Bilski<br />
“to patent risk hedging would pre-empt use of<br />
this approach in all fields, and would effectively<br />
grant a monopoly over an abstract idea.” The<br />
court did not formulate a new test but instead<br />
held that “precedents establish that the machine-or-transformation<br />
test is a useful and important<br />
clue, an investigative tool” and nothing more. It<br />
is therefore clear that the machine-or-transformation<br />
test is a nonexclusive option for lower<br />
courts, in addition to the tests set forth in the<br />
court’s earlier decisions.<br />
The guidance set forth in Benson, Flook and<br />
Diehr should therefore be carefully considered<br />
and revisited. Briefly, in Benson, the patent<br />
sought related to an algorithm that converts<br />
numbers from binary-coded decimal form<br />
into pure binary form, which arguably could<br />
be applied to specific computer applications.<br />
The Supreme Court held that the recited<br />
algorithms were not patentable because they<br />
were drawn to abstract ideas, were not tied<br />
to a particular machine or apparatus and did<br />
not change articles or materials to a “different<br />
state or thing.” The court found it important to<br />
determine whether, assuming the algorithm to<br />
be patentable, patenting of the invention would<br />
pre-empt use of the mathematical formula. In<br />
Flook, the Supreme Court held that a method<br />
for updating alarm limits in catalytic conversion<br />
processes, which recited a mathematical<br />
algorithm for computing an updated alarm<br />
limit from measured present values of variables,<br />
was not patent eligible. The court held that the<br />
identification of a limited category of useful<br />
post-solution applications of a formula does<br />
not make an otherwise unpatentable formula<br />
patentable because a process itself, not merely<br />
the mathematical algorithm, must be new and<br />
useful in order to meet the requirements for<br />
patentability. In Diehr, the court addressed a<br />
process for molding uncured synthetic rubber<br />
into a cured product. The claims were directed<br />
to a method that constantly measured the actual<br />
temperature inside a mold. The court held that<br />
the process constituted patentable subject matter<br />
under 35 USC §101 because the transformation<br />
and reduction of an article “to a different<br />
state or thing” is one clue to the patentability of<br />
a process claim that does not include a specific<br />
machine. In this instance, the court determined<br />
that the invention manifested the transformation<br />
of an article, uncured synthetic rubber, into<br />
a different state or thing. Although the invention<br />
used a well-known mathematical equation,<br />
the court remarked that the applicants did not<br />
seek to pre-empt the use of the equation.<br />
Table 1 Amicus summary (selected) in Bilski v. Kappos<br />
Amicus | Industry or group represented | Summary<br />
Novartis Corp. 15 | Health care solutions; pharmaceutical | Machine-or-transformation test unduly narrows the scope of diagnostic process claims. If upheld, the court should clarify that the test is not the dispositive standard.<br />
Caris Diagnostics, Inc. 16 | Personalized medicine; tailoring therapeutics for individual patients using biomarkers | Machine-or-transformation test is not the exclusive test for patent eligibility of processes. Many diagnostic tests do not involve a machine or transformation.<br />
Georgia Biomedical Partnership, Inc. 17 | Life sciences | Machine-or-transformation test is too rigid. Precedent is flexible and permissive.<br />
University of South Florida 18 | University; research facility | Only presents arguments for the first question presented. Machine-or-transformation test excludes from patent eligibility certain processes that Congress intended to be patent eligible.<br />
Ananda Chakrabarty 19 | University medical research | Machine-or-transformation test finds no support in the statute and is bad policy.<br />
Prometheus Laboratories 20 | Manufacturer of pharmaceutical, medical treatment and diagnostic processes | Court’s interpretation of section 101 may have significant ramifications beyond business methods and may adversely affect the field of medical diagnostic and treatment processes.<br />
Monogram Biosciences, Inc. et al. 21 | Emerging field of personalized medicine, using molecular diagnostic tests to correlate genetic and molecular biomarkers with clinically useful disease characteristics | Federal Circuit erred in holding that a process must be tied to a particular machine or transformation. This should not be the sole test. Nonphysical processes should not be excluded.<br />
Medtronic, Inc. 22 | R&D of medical technology | Machine-or-transformation test would adversely affect medical technology innovation. Such a test would render significant medical advances patent ineligible.<br />
Pharmaceutical Research and Manufacturers of America 23 | Pharmaceutical and biotechnology industry | Court should not adopt a new test for the boundaries of section 101. Medical processes have long been protected.<br />
Biotechnology Industry Organization et al. 11 | Biotech and medical technology industries | Bilski test is not appropriate for determining patent eligibility of biotechnology and medical technology under section 101.<br />
Knowledge Ecology International 24 | Advocate of new incentive and financing models for biomedical information | It is not necessary to fashion an overly broad definition of patentable subject matter merely to save medical innovations from an imagined and speculative danger.<br />
Adamas Pharmaceuticals et al. 25 | Biomarkers and pharmaceuticals | Problematic business method patents should be eliminated. Machine-or-transformation test violates NAFTA and the 1994 TRIPS Agreement. This test directly over-rules Congress’s choice (35 USC section 287(c)) to maintain broad subject-matter coverage for health care–related technology.<br />
American Medical Association et al. 26 | Medical profession; physicians and geneticists | Bilski’s claims are not directed to technology. Machine-or-transformation test must remain secondary and cannot supplant this court’s requirement that claims address a technology or the court’s pre-emption standard. Machine-or-transformation test must be allowed to vary with each particular case.<br />
Impact on life science technologies<br />
In Bilski, four Supreme Court justices unequivocally<br />
indicated that nascent technologies, such as<br />
biotech and pharmaceutical processes, are patent<br />
eligible. This plurality expressed appreciation for<br />
technological progress and acknowledged that<br />
“unforeseen innovations such as computer programs”<br />
are patent eligible 9 . Justice Kennedy reasoned<br />
that the machine-or-transformation test<br />
may be an appropriate test for evaluating the patent<br />
eligibility of processes of the Industrial Age<br />
but should not be the sole test for newer types<br />
of inventions, such as medical diagnostic techniques.<br />
Interestingly, Justice Kennedy was careful<br />
to point out that he was “not commenting<br />
on the patentability of any particular invention,<br />
let alone holding that any of the above-mentioned<br />
technologies from the Information Age<br />
should or should not receive patent protection.”<br />
Regarding limiting interference with the development<br />
of nascent technologies, such as biotechnology<br />
and biopharmaceuticals, the court<br />
indicated that some types of inventions “raise<br />
special problems in terms of vagueness and suspect<br />
validity” and could “put a chill on creative<br />
endeavor and dynamic change.”<br />
In dramatic contrast, however, Justice<br />
Stevens’ concurrence (joined by Justices<br />
Breyer, Ginsburg and Sotomayor), in a separate<br />
47-page opinion, “strongly disagree[d]<br />
with the court’s disposition of this case.”<br />
Justice Stevens expressed great concern that<br />
the court “never provides a satisfying account<br />
of what constitutes an unpatentable abstract<br />
idea” and indicated that business methods<br />
are per se unpatentable even though<br />
Bilski’s claims and application materials presented<br />
concrete parameters that may have<br />
amounted to more than an abstract idea or<br />
generalized concept. Justice Stevens cited<br />
English and early American patent jurisprudence<br />
and legislation as supportive of the<br />
opinion, concluding that the scope of patent-eligible<br />
subject matter is “broad” but not limitless<br />
because, according to history, neither the<br />
patent statute nor patent law was intended to<br />
include business methods. Interestingly, biotech<br />
or pharmaceutical processes were not<br />
differentiated from business methods in the<br />
opinion, and it remains unclear to what extent<br />
such inventions could be distinguished, sufficient<br />
to survive Stevens’ per se ban.<br />
The dissenting justices agreed with the majority<br />
that the machine-or-transformation test was<br />
not the exclusive test for method claim patentability,<br />
but they went further, indicating that<br />
business methods are categorically excluded<br />
from patentable subject matter. Justice Stevens<br />
indicated that the court should have held “that<br />
Petitioners’ claim is not a ‘process’ within the<br />
meaning of Section 101 because methods of<br />
doing business are not, in themselves, covered<br />
by the statute.” Regarding the majority opinion’s<br />
holding that the patentability of business<br />
methods was clear from a reading of the statute,<br />
Stevens asserted that Congress did not explicitly<br />
state that it was amending and expanding<br />
the patent statute to include business methods;<br />
thus, he wrote, it was improper for the court to<br />
make such a presumption. Justice Stevens did<br />
not indicate how business method patents are<br />
categorically distinct from other forms of patent<br />
protection (for example, life science processes<br />
or therapeutic processes) but rather expressed<br />
“serious doubts” about whether business<br />
method patents are needed to encourage business<br />
innovation. It is unclear to what extent a<br />
safe harbor defense for those accused of infringing<br />
a business method claim applies to biotech<br />
or pharmaceutical businesses. The dissent<br />
therefore encompasses life science methods, and<br />
Stevens’ logic applies equally well to biotech and<br />
pharmaceutical method patents vis-à-vis therapeutic<br />
innovations, making it critical for the<br />
industry to consider how each of its process<br />
inventions encourages medical innovation.<br />
Justice Breyer filed a separate concurring<br />
opinion, joined by Justice Scalia, indicating<br />
that agreement was reached by all of the<br />
justices on at least four points: (i) the statute<br />
is broad but has some narrow limits; (ii) the<br />
machine-or-transformation test is useful;<br />
(iii) the machine-or-transformation test is not<br />
to be misunderstood as the governing test; and<br />
(iv) by no means is everything that produces a<br />
“useful, concrete and tangible result” a patent-eligible<br />
process.<br />
Figure 2 Amicus briefs from the biotech and pharma sector supporting or opposing the Bilski machine-or-transformation<br />
test (chart labels: 12 and 1; categories: against MoT test, support MoT test). MoT, machine-or-transformation.<br />
Regarding the breadth of patent-eligible subject<br />
matter, Justice Breyer considered the issue<br />
at oral argument wherein he indicated, “…every<br />
successful businessman typically has something.<br />
His firm wouldn’t be successful if he didn’t have<br />
anything others didn’t have…—and it’s new, too,<br />
and it’s useful, made him a fortune—anything<br />
that helps any businessman succeed is patentable<br />
because we reduce it to a number of steps,<br />
explain it in general terms, file our application,<br />
granted…” to which the attorney answered yes,<br />
what was described by Justice Breyer is potentially<br />
patentable. The Justice was also concerned that<br />
by simply assigning a set of instructions to a<br />
computer, and including the computer in the<br />
patent, an otherwise unpatentable process<br />
would be rendered patentable, asking, “how<br />
you are going to later, down the road, deal with<br />
the situation of all you do is get somebody who<br />
knows computers, and you turn every business<br />
patent into a setting of switches on the machine<br />
because there are no businesses that don’t use<br />
those machines.” This concern was directly<br />
addressed by Judge Rader, in the Bilski dissent<br />
at the Federal Circuit, wherein he focused the<br />
court not on patent ineligibility but rather on<br />
the fundamental inquiry of determining if an<br />
invention was worthy of patent protection (e.g.,<br />
if an invention is novel and not obvious).<br />
Important pending life sciences cases<br />
Following the Federal Circuit’s decision in<br />
Bilski, several cases were decided based solely<br />
on the machine-or-transformation test. Parties<br />
whose patent claims were held to be invalid<br />
under Bilski will take advantage of the change<br />
in law and seek reversal of these decisions.<br />
One such case is Association for Molecular<br />
Pathology v. USPTO (hereinafter, AMP),<br />
wherein the patent claims at issue are related to<br />
isolated DNA containing all or portions of the<br />
BRCA1 and BRCA2 gene sequence and methods<br />
for comparing or analyzing BRCA1 and<br />
BRCA2 gene sequences to identify the presence<br />
of mutations correlating with a predisposition<br />
to breast or ovarian cancer 10 . In a decision that<br />
radically changed the law, the court held that<br />
the step of isolating or purifying DNA does not<br />
sufficiently change the genetic sequence found<br />
in nature to make a claim to the gene per se<br />
patent eligible and that comparisons of DNA<br />
sequences are abstract mental processes, and<br />
thus not patent eligible. The court discussed<br />
abstract ideas, referring to the Federal Circuit’s<br />
opinion in Bilski, and applied the machine-or-transformation<br />
test to invalidate the process<br />
claims. In deciding AMP, the court discussed<br />
and distinguished another critical case,<br />
Prometheus Laboratories v. Mayo 11 .<br />
Prometheus Laboratories owns patents<br />
covering a method to optimize dosage of two<br />
drugs useful for autoimmune diseases, which<br />
involves administering a drug at certain dosage,<br />
detecting the concentration of certain metabolites<br />
and then comparing the value to a preset<br />
threshold value and subsequently increasing<br />
or decreasing the drug dosage accordingly.<br />
The Federal Circuit considered that this diagnostic<br />
process based on a correlation between<br />
drug metabolite levels and drug efficacy and<br />
toxicity was patent eligible because, consistent<br />
with In re Bilski, a claimed process is patent-eligible<br />
if the claimed process is transformative<br />
(e.g., citing the administering step and various<br />
chemical and physical changes of the drug’s<br />
metabolites that enable their concentrations<br />
to be determined). The court reasoned that<br />
determining the levels of drug metabolites was<br />
per se transformative because drug metabolite<br />
levels cannot be determined by mere inspection.<br />
And because these transformations were<br />
central to the invention, according to the court,<br />
the process was found to be patent eligible and<br />
the patent was held valid. The court provided<br />
no guidance as to when the interaction of a<br />
drug metabolite with the human body is a<br />
natural phenomenon.<br />
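The claimed steps described above (administer, measure, compare, adjust) can be read as a simple decision rule. The following is only a minimal sketch of that logic, not the patented method itself: the function name, the two-sided threshold window, the fixed dose step and all numeric values are hypothetical assumptions introduced for illustration and do not come from the Prometheus patents or the court's opinion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DoseDecision:
    action: str         # "increase", "decrease" or "maintain"
    new_dose_mg: float  # dose to administer in the next cycle


def adjust_dose(current_dose_mg, metabolite_level, low, high, step_mg):
    """Compare a measured metabolite level with a preset window and adjust.

    `low`, `high` and `step_mg` are hypothetical parameters; real values
    would come from the clinical correlation underlying a given claim.
    """
    if low >= high:
        raise ValueError("low threshold must be below high threshold")
    if metabolite_level < low:
        # Level below the window: the dose is assumed to be too low.
        return DoseDecision("increase", current_dose_mg + step_mg)
    if metabolite_level > high:
        # Level above the window: reduce the dose, never below zero.
        return DoseDecision("decrease", max(0.0, current_dose_mg - step_mg))
    return DoseDecision("maintain", current_dose_mg)


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to exercise each branch.
    print(adjust_dose(50.0, 150.0, low=200.0, high=400.0, step_mg=10.0))
    print(adjust_dose(50.0, 450.0, low=200.0, high=400.0, step_mg=10.0))
    print(adjust_dose(50.0, 300.0, low=200.0, high=400.0, step_mg=10.0))
```

Written this way, the comparison and adjustment are pure arithmetic; the only physical steps are administering the drug and measuring the metabolite, which is where the Federal Circuit located the transformation.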
In Bilski, Justice Kennedy discusses the technological<br />
aspects of the Industrial Age and the<br />
Information Age, suggesting that the differences<br />
between the two periods provide insight<br />
into how inventions are reduced to “physical<br />
or tangible form 12 .” Justice Kennedy seemed<br />
to be concerned that adoption of a single test<br />
—the machine-or-transformation test—could<br />
retard innovation by “creating uncertainty as<br />
to the patentability of … advanced diagnostic<br />
medicine techniques…”. This issue is addressed<br />
again later, where Justice Kennedy refers to “the<br />
tension, ever present in patent law, between<br />
stimulating innovation by protecting investors<br />
and impeding progress by granting patents<br />
when not justified by the statutory design 13 .”<br />
This tension is most evident in the field of biotechnology<br />
and biopharmaceuticals.<br />
In deciding AMP, the district court looked to<br />
Prometheus Labs 10 in determining what constitutes<br />
a ‘transformation’ in the biotechnological<br />
arts; for example, if an alleged transformation<br />
is mere preparatory “data gathering,” it falls outside<br />
the “central” focus of the recited method.<br />
Myriad’s patents were directed to methods of<br />
“analyzing” or “comparing” isolated or purified<br />
DNA, not host DNA. Although the district<br />
court recognized the great difficulty in isolating<br />
the subject DNAs, the court characterized<br />
this technical accomplishment as a mere “data-gathering<br />
step,” thus invalidating the claimed<br />
methods as being directed to patent-ineligible<br />
subject matter. The district court’s new patent<br />
eligibility test is that to be patent eligible,<br />
isolated material must be “markedly different”<br />
from its naturally occurring counterpart. The<br />
court referred to the Supreme Court landmark<br />
decision in Diamond v. Chakrabarty 14 as precedent<br />
but did not define a “markedly different”<br />
invention. However, the district court went further<br />
and applied a “fundamental qualities” test<br />
to invalidate Myriad’s isolated DNA composition<br />
claims, indicating that a naturally occurring<br />
DNA’s “fundamental quality” is to contain “the<br />
physical embodiment of biological information,”<br />
which is the same “fundamental quality”<br />
as isolated DNA. The court appeared to reason<br />
that because both forms of DNA shared this<br />
quality, the isolated DNA was not sufficiently<br />
different from the naturally occurring DNA to<br />
render it patent eligible—a sweeping conclusion<br />
that draws into question the validity of thousands<br />
of patents susceptible to the application<br />
of similar logic.<br />
It is also important to remember the questions<br />
raised by the court in Lab Corp. v. Metabolite 6<br />
in attempting to differentiate patent eligible<br />
subject matter from ineligible biotech inventions.<br />
In this case, the Supreme Court declined<br />
to explicitly consider the issue of the patent eligibility<br />
of claims to a method for detecting the<br />
deficiency of cobalamin or folate by measuring<br />
the level of homocysteine in body fluids. The<br />
Federal Circuit held that the claims were valid<br />
but did not address the issue of patent eligibility<br />
under 35 USC §101. The Supreme Court then<br />
declined to review the decision, with Justices<br />
Stevens, Breyer and Souter dissenting. The<br />
dissenting opinion maintained that the claims<br />
were invalid because they recited only natural<br />
phenomena, which are not patent eligible. The<br />
dissent was compelled by public policy considerations<br />
and indicated that if the correlations<br />
between metabolite levels and disease were<br />
patent eligible, physicians may not be able to<br />
exercise their best judgment or might waste<br />
time, and the cost of healthcare would increase,<br />
a result that would outweigh the value of protecting<br />
the invention at issue.<br />
Conclusion<br />
In Bilski, the Supreme Court expanded the<br />
forms of biotech and pharmaceutical inventions<br />
that are patent eligible in the US, holding<br />
that the machine-or-transformation test is<br />
not the sole test for patent eligibility in the US<br />
and the types of patent-eligible subject matter<br />
are vast. But the Court narrowly avoided a<br />
catastrophe for the biotech and pharmaceutical<br />
industry. A majority of the court declined<br />
to adopt the view that “new technologies may<br />
call for new inquiries” directed to patent eligibility,<br />
which would adapt patent law to inventions<br />
of the Information Age. While the court<br />
unanimously held that Bilski’s process claims<br />
were not patent eligible, it indicated that the<br />
machine-or-transformation test may be useful<br />
for determining whether a method claim<br />
meets the threshold requirements of eligibility.<br />
Thus, universities and companies should consider<br />
providing sufficient evidence to satisfy the<br />
machine-or-transformation test when seeking<br />
to obtain patents.<br />
Although a 5–4 majority held that business<br />
methods are not categorically unpatentable, the<br />
court was a single vote away from denying business<br />
methods patent protection. This is chilling<br />
in view of the implications of such a ruling for<br />
other areas of technology, such as biotech and<br />
pharmaceutical method patenting. The court<br />
refrained from articulating a generic test that<br />
would distinguish a patentable method from<br />
an abstract idea. It remains to be seen how the<br />
USPTO, district courts and the Federal Circuit<br />
will proceed to define a new standard of patent<br />
eligibility designed to accommodate future<br />
innovations such as those emerging in the life<br />
sciences. The courts must provide guidance<br />
to the biotech industry as to what is patentable.<br />
What is clear, however, is that based on<br />
the court’s determination that Bilski’s claims<br />
were unpatentable because they were directed<br />
to abstract ideas, it is essential for the pharmaceutical<br />
and biotech industry to pursue and<br />
obtain method claims of varying scope and<br />
pre-emptively evaluate any available evidence<br />
to address future attacks on their intellectual<br />
property based on Bilski, at least until a medically<br />
important “abstract idea,” which could<br />
include an otherwise patentable invention<br />
under US law, is distinguished by the courts or<br />
the legislature.<br />
acknowledgments<br />
The views expressed are solely the author’s and do not<br />
represent those of Sughrue Mion and its clients, and are<br />
subject to changes in the art and law. The author thanks<br />
An Kang Li and Stuart Levy for their contributions.<br />
COMPETING FINANCIAL INTERESTS<br />
The author declares no competing financial interests.<br />
1. Langer, E.S. Realistic expectations likely to prevail in<br />
2010. Gen. Eng. News (March 1, 2010).<br />
2. In re Bilski, 545 F.3d 943 (Fed. Cir. 2008) (en banc).<br />
3. Simmons, W.J. Nat. Biotechnol. 27, 245–248<br />
(2009).<br />
4. In re Bilski, 545 F.3d at 954.<br />
5. Diamond v. Diehr, 450 US 175, 185 (1981).<br />
6. Lab Corp. v. Metabolite (Fed. Cir. 2004).<br />
7. Gottschalk v. Benson, 409 US 63 (1972).<br />
8. Parker v. Flook, 437 US 584 (1978).<br />
9. Bilski v. Kappos, 561 US (2010) at 8.<br />
10. Association for Molecular Pathology et al. v. USPTO<br />
et al. 1:09 cv-04515 (SDNY).<br />
11. Prometheus Laboratories v. Mayo Collaborative Servs.,<br />
581 F.3d 1336 (Fed. Cir. 2009).<br />
12. 561 US (2010) - Kennedy, p. 9.<br />
13. 561 US (2010) - Kennedy, pp. 12–13.<br />
14. Diamond v. Chakrabarty 447 US 303 (1980).<br />
15. <br />
16. <br />
17. <br />
18. <br />
19. <br />
20. <br />
21. <br />
22. <br />
23. <br />
24. <br />
25. <br />
26. <br />
Recent patent applications in proteomics<br />
Patent number | Description | Assignee | Inventor | Priority application date | Publication date<br />
US 20100155243 | A method for separating a sample, involving introducing the sample into a microchannel formed in a module and separating the sample into sub-samples according to isoelectric point and into protein components based on electrophoresis; useful in, e.g., proteomics. | Baraniuk JN, Schneider TW | Baraniuk JN, Schneider TW | 2/26/2003 | 6/24/2010<br />
US 20100129842 | Proteomic analysis of polypeptides for biomarker analysis, involving reacting two polypeptide samples, each having reactive analytes, with different labeling reagents of a set of labeling reagents, mixing, digesting with enzyme and performing mass analysis. | Life Technologies (Carlsbad, CA, USA) | Coull JM, Pappin DJC, Purkayastha S | 1/5/2004 | 5/27/2010<br />
WO 2010052510 | A method of diagnosing S-adenyl-l-homocysteine hydrolase deficiency involving determining qualitative-quantitative blood plasma proteomic profile and diagnosing S-adenyl-l-homocysteine hydrolase deficiency based on data obtained by the subject method. | Rudjer Boskovic Institute (Zagreb, Croatia) | Cindric M, Hock K, Kraljevic Pavelic S, Sedic M | 11/5/2008 | 5/14/2010<br />
CN 101696238 | The total protein extract of a plant, and a method for its preparation, comprising phenol and the reducing agents mercaptoethanol or dithiothreitol; used for proteomics research on plant tissue samples. | Guangdong Academy of Agricultural Sciences Crop Research Institute (Guangdong, China) | Chen X, Liang X, Zhang E | 10/27/2009 | 4/21/2010<br />
JP 2010078455 | A method for isolating a peptide, e.g., disease marker protein, from blood, involving performing multidimensional column chromatography using an amphoteric ion column to isolate the peptide and performing protein mass spectrometry. | Japan Science and Technology Agency (Saitama, Japan) | Asajima M, Fukuda H, Into A, Kurisaki A | 9/26/2008 | 4/8/2010<br />
WO 2010035129 | An apparatus for separating constituents of a complex protein mixture for proteomic analysis, comprising the separation of elements having chemical-physical features such that they can capture proteins belonging to the determined homogeneous group by adsorption. | National Research Council (Rome) | Boccardi C, Citti L, Mercatanti A, Parodi O, Rocchiccioli S | 9/29/2008 | 4/1/2010<br />
WO 2010026742 | A liquid chromatograph for proteomic analysis that injects a sample solution into an injection valve through an injection port that is arranged in the flow path of the injection valve. | GL Sciences (Tokyo) | Uzu H, Zhou X | 9/2/2008 | 3/11/2010<br />
WO 2010011860 | A method for determining if a subject of interest has pre-diabetes or diabetes or is at risk for developing pre-diabetes or diabetes, or for monitoring the efficacy of a therapy, comprising comparing a proteomic profile of a test sample with a reference sample. | Diabetomics (Beaverton, OR, USA) | Nagalla SR, Paturi VR, Roberts CT | 7/23/2008 | 1/28/2010<br />
WO 2010010108 | A new cell with no or low endogenous dihydrofolate reductase (DHFR) levels comprising at least two heterologous vector constructs; useful as a model cell for production cell proteomics and for manufacturing proteins. | Boehringer Ingelheim Pharma (Ingelheim, Germany) | Becker E, Florin L, Kaufmann H, Studts JM | 7/23/2008 | 1/28/2010<br />
US 7653493 | A system for automatic mass spectroscopy analysis of a group of proteomic samples, e.g., peptides, comprising a unit for detecting ions, ion data processing units to receive the ion data and a material characterization processor. | Stanford University (Palo Alto, CA, USA) | Brown M, Chungfat N, Dutta S, Mathewson S, Wang EW | 2/24/2006 | 1/26/2010<br />
JP 2010014689 | A method for the determination of melanoma, involving detecting or quantifying a melanoma marker gene, e.g., serum amyloid A2 gene, or melanoma marker protein, e.g., serum amyloid A2 protein, in a biological sample extracted from a human. | Shizuoka Ken | Akiyama Y, Takigawa M | 6/6/2008 | 1/21/2010<br />
Source: Thomson Scientific Search Service. The status of each application is slightly different from country to country. For further details, contact Thomson Scientific, 1800<br />
Diagonal Road, Suite 250, Alexandria, Virginia 22314, USA. Tel: 1 (800) 337-9368 (http://www.thomson.com/scientific).<br />
news and views<br />
Can HIV be cured with stem cell therapy?<br />
Steven G Deeks & Joseph M McCune<br />
Transplantation of human hematopoietic stem cells engineered to lack the viral coreceptor CCR5 confers resistance<br />
to HIV infection in mice.<br />
Antiretroviral therapy has transformed the<br />
treatment of HIV infection, but, despite its profound<br />
successes, it will not halt the relentless<br />
advance of the epidemic. Against this sobering<br />
reality, several promising, recent developments<br />
in the basic-science arena have led<br />
HIV researchers to envision new therapeutic<br />
approaches that would completely eradicate<br />
the virus, effectively ‘curing’ HIV disease. In<br />
an exciting and impressive display of data<br />
published in this issue, Holt et al. 1 provide a<br />
scientific bellwether for the practical implementation<br />
of one such strategy. They show<br />
that CCR5, a human gene often required for<br />
HIV to enter target cells, can be effectively and<br />
permanently disrupted in long-lived, multilineage,<br />
human hematopoietic stem cells (HSCs).<br />
When introduced into mice, these cells generate<br />
an apparently intact human immune<br />
system that is resistant to subsequent infection<br />
with HIV (Fig. 1a). This result raises the<br />
intriguing possibility that HIV-infected individuals<br />
might be cured with a one-time infusion<br />
of autologous, gene-modified HSCs.<br />
Steven G. Deeks and Joseph M. McCune are<br />
in the Department of Medicine, University of<br />
California, San Francisco, California, USA.<br />
e-mail: sdeeks@php.ucsf.edu and<br />
mike.mccune@ucsf.edu<br />
The introduction of combination antiretroviral<br />
regimens against HIV in the mid-1990s<br />
was undoubtedly one of the great triumphs<br />
of modern medicine. Almost overnight, those<br />
who could receive and adhere to the therapies<br />
gained a new lease on life. But the passage<br />
of time has revealed the limitations of<br />
these regimens. Because HIV DNA persists<br />
as an integrated genome in long-lived cellular<br />
reservoirs, current antiretroviral drugs are<br />
unlikely to prove curative 2 . In addition, the<br />
therapies require life-long adherence, which<br />
many find challenging, and are often associated<br />
with some short-term and long-term<br />
toxicity. Moreover, although they suppress<br />
HIV replication in a potent, durable manner,<br />
they do not restore health; for reasons<br />
that remain unknown, treated HIV disease<br />
is attended by chronic inflammation, persistent<br />
T-cell dysfunction and a shortened<br />
life expectancy 3,4 . Finally, and perhaps most<br />
importantly, antiretroviral therapies and<br />
their management are expensive and hard to<br />
deliver on a worldwide basis. It is now apparent<br />
that the number of HIV-infected people<br />
will continue to eclipse the number that can<br />
be successfully treated. To stop the epidemic<br />
and to provide care for all, a fundamentally<br />
different approach is needed.<br />
Gene therapy with HSCs<br />
The concept of HSC-based gene therapy for<br />
HIV disease emerged in the epidemic’s first<br />
decade, when effective antiretroviral regimens<br />
were nonexistent. Multiple advances in delivering<br />
and expressing transgenes in eukaryotic<br />
cells suggested that therapeutic applications<br />
were within reach 5 . Baltimore coined the term<br />
“intracellular immunization” to describe the<br />
introduction of HIV resistance genes into<br />
HSCs to allow long-term repopulation of the<br />
host with progeny cells that would be impervious<br />
to HIV 6 . By the late 1980s, startup biotech<br />
companies were isolating and preparing<br />
human HSCs for transplantation, devising<br />
techniques and vectors to genetically modify<br />
the cells, and conducting preclinical testing 7 .<br />
During the same period, studies of HIV<br />
pathogenesis were generating data that begged<br />
for a therapeutic approach that went beyond<br />
antiretroviral drugs. On the one hand, it<br />
became clear that CD4+ T-cell depletion, the<br />
hallmark of HIV disease, was caused not simply<br />
by destruction of late-stage CD4+ T effector<br />
cells but also by the host’s inability to maintain<br />
progenitor cells, including HSCs, intrathymic<br />
T progenitor cells and central memory T cells<br />
in the periphery 8 (Fig. 1b). On the other hand,<br />
it was recognized that HIV can persist within<br />
multiple lineages of long-lived cells, including<br />
T cells and cells of the myeloid lineage (some of<br />
which appear to be progenitor cells) 2,9 . Taken<br />
together, these observations underscored the<br />
need to confer HIV resistance to both progenitor<br />
cells and their progeny.<br />
Early attempts to engineer HIV resistance<br />
into hematopoietic progenitor cells encountered<br />
insurmountable hurdles: the scientific and<br />
practical constraints of HSC-based therapies<br />
were substantial; protocols for genetic modification<br />
of HSCs were inefficient and cytotoxic;<br />
the preclinical animal models were inadequate;<br />
and the choice of anti-HIV genes was driven<br />
more by convenience (and/or patent considerations)<br />
than by data 7 . Moreover, it proved<br />
difficult to devise a business model that could<br />
support the introduction of such a dramatically<br />
different, untested and potentially toxic form of<br />
therapy into the clinic. More recently, however,<br />
two important developments have prompted<br />
a reevaluation of HSC-based therapy for HIV:<br />
a critical target—the cell-surface receptor<br />
CCR5—was identified, and an HIV-infected<br />
individual was reported to be virus free in<br />
the absence of antiretroviral medications 20<br />
months after receiving a transplant of CCR5-<br />
defective allogeneic HSCs 10 .<br />
Targeting the Achilles’ heel of HIV<br />
To enter cells, HIV must bind to either CCR5 or<br />
CXCR4, chemokine receptors present on many<br />
immune cells 11 . The vast majority of transmitted<br />
viruses use CCR5 (R5 variants). As the disease<br />
progresses, HIV evolves and often, but not<br />
always, expands its co-receptor preference to<br />
include CXCR4 (X4 variants). A small fraction<br />
of people carry a 32-base-pair deletion in the<br />
CCR5 gene, leading to a truncated gene product,<br />
CCR5 ∆32. Those who are heterozygous<br />
for CCR5 ∆32 have delayed disease progression<br />
after they acquire HIV, whereas homozygotes<br />
rarely acquire HIV 11 . Although lack of<br />
CCR5 may be associated with increased risk<br />
of developing serious sequelae of some uncommon<br />
infections 12 , it does not seem to affect life<br />
expectancy and may even be associated with a<br />
reduced risk of certain inflammatory diseases.<br />
Once the role of CCR5 became clear in the late<br />
1990s, the pharmaceutical industry devoted<br />
tremendous resources to the development<br />
of small-molecule inhibitors, one of which,<br />
maraviroc (Selzentry), is highly effective, well tolerated<br />
and now FDA approved.<br />
This important set of observations inspired<br />
several groups to pursue CCR5-targeted gene<br />
therapy 13,14 . One highly innovative approach<br />
relied on engineered zinc-finger nucleases<br />
specific for the CCR5 gene 15 . Such ‘molecular<br />
scissors’ can be delivered to cells ex vivo using<br />
methods such as integrase-defective lentiviral<br />
vectors, adenoviral vectors and plasmid DNA<br />
nucleofection. After specific binding of a pair<br />
of zinc-finger nucleases to the CCR5 gene, a<br />
double-stranded DNA break is introduced<br />
and then repaired by pathways that include<br />
error-prone nonhomologous end-joining. This<br />
can create a permanent gene disruption that is<br />
passed to daughter cells in the absence of persistent<br />
transgene expression. The end result is<br />
the functional disruption of the CCR5 gene.<br />
Figure 1 Reconstitution of an HIV-resistant lymphoid and myeloid system in an experimental model.<br />
(a) Holt et al. 1 isolated human hematopoietic stem/progenitor cells (HSPCs) and used zinc-finger<br />
nucleases (ZFNs) to disrupt the CCR5 gene, which is often required for the entry of HIV into target<br />
cells. Mice that were successfully engrafted with CCR5-disrupted HSPCs tolerated infection with<br />
HIV, whereas those engrafted with unmodified HSPCs exhibited loss of CD4+ T cells and high-level<br />
viremia. (b) Long-lived, multilineage hematopoietic stem cells (HSCs) give rise to common lymphocyte<br />
progenitors (CLPs) and progenitors of the myelo-erythroid (M/E) lineages. CLPs move through the<br />
thymus and differentiate through a series of stages, from CD3–CD4+CD8– intrathymic T progenitor<br />
(ITTP) cells to CD3+/–CD4+CD8+ double positive (DP) thymocytes to CD3+ thymocytes that are single<br />
positive for CD4 (SP4) or CD8 (SP8) to circulating naïve (N), effector (E), and memory (M) CD4+ or<br />
CD8+ T cells. All of the cell stages colored in red can be directly or indirectly disabled by HIV infection.<br />
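Why a small, imprecise repair can permanently disable a gene is easy to see with a toy model. The sketch below is an illustration only and assumes a made-up DNA string, cut position and helper names; it does not model the real CCR5 sequence or the actual zinc-finger nuclease target site. It shows that deleting a number of bases that is not a multiple of three shifts the reading frame, so every downstream codon changes. The same arithmetic applies to the naturally occurring 32-base-pair deletion mentioned earlier, since 32 is not divisible by 3.

```python
def apply_deletion(seq: str, cut: int, length: int) -> str:
    """Remove `length` bases starting at index `cut`, mimicking an
    error-prone repair that loses bases at the break site."""
    if cut < 0 or length < 0 or cut + length > len(seq):
        raise ValueError("deletion falls outside the sequence")
    return seq[:cut] + seq[cut + length:]


def is_frameshift(indel_length: int) -> bool:
    """An insertion or deletion shifts the reading frame unless its
    length is a multiple of three (one codon = three bases)."""
    return indel_length % 3 != 0


def codons(seq: str) -> list[str]:
    """Split a sequence into complete triplets, dropping any remainder."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


if __name__ == "__main__":
    toy_gene = "ATGAAACCCGGGTTT"             # hypothetical sequence, not CCR5
    edited = apply_deletion(toy_gene, 3, 2)  # lose 2 bases at the "break"
    print(codons(toy_gene))                  # original reading frame
    print(codons(edited))                    # downstream codons all differ
    print(is_frameshift(2), is_frameshift(3), is_frameshift(32))
```

In-frame losses (multiples of three) remove whole codons and may leave a partly functional protein, which is one reason a real editing experiment would need to confirm disruption at the protein level rather than infer it from the edit alone.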
Previous work using this approach demonstrated<br />
its feasibility in human peripheral blood<br />
CD4+ T cells 15 , and unpublished data from a<br />
phase 1 trial suggest that autologous CD4+<br />
T cells modified in this way can be reinfused<br />
safely into HIV-infected individuals (P. Tebas,<br />
University of Pennsylvania, personal communication).<br />
Although of great interest, this strategy<br />
does not disrupt CCR5 in HSCs and thus<br />
would not enable the long-term generation of<br />
both T and myeloid-lineage cells resistant to<br />
HIV infection. Evidence supporting such a leap<br />
came from another quarter.<br />
The Berlin patient: an instructive N of 1<br />
For all of those engaged in the care and treatment<br />
of patients with HIV disease, the world<br />
changed in 2009 with the remarkable story of<br />
a stably treated, HIV-infected individual—the<br />
‘Berlin patient’—who developed acute myeloid<br />
leukemia and was transplanted with HSCs from<br />
a human leukocyte antigen–matched, homozygous<br />
CCR5 ∆32 donor (ref. 10). Combination antiretroviral<br />
therapy was discontinued the day before<br />
the transplant. Twenty months later, HIV could<br />
not be detected in any of the patient’s tissues<br />
examined, even when very sensitive techniques<br />
were used. Given disappointing treatment outcomes<br />
in the past, the HIV research community<br />
is hesitant to use the word ‘cure’, but this<br />
single case could very well be the first example<br />
to fit the bill.<br />
It is important to emphasize that this road<br />
to a cure was arduous and will not be available<br />
to the vast majority of patients. The Berlin<br />
patient underwent fully ablative conditioning<br />
with a potentially lethal regimen that included<br />
fludarabine (Fludara), Ara-C, amsacrine<br />
(Amerkin, Amsidyl, Amsidine), cyclosporin,<br />
mycophenolate mofetil (CellCept), antithymocyte<br />
globulin and 4 Gy of total body irradiation.<br />
Graft-versus-host disease developed<br />
during the post-transplant period. Owing to<br />
recurrent acute myeloid leukemia, a second<br />
stem cell transplantation using cells from the<br />
same donor was performed one year after the<br />
first transplant, which again required exposure<br />
to myeloablative therapy, including irradiation.<br />
No one believes that this approach<br />
will soon be used beyond the highly unusual<br />
indications for which allogeneic transplantations<br />
are normally performed. However, the<br />
example of the Berlin patient does provide a<br />
strong rationale for the development of CCR5-<br />
targeted stem cell therapy.<br />
This case also provides fascinating insights<br />
into HIV pathogenesis, some of which may be<br />
relevant to future attempts at HIV eradication.<br />
For example, it is not entirely clear why HIV<br />
did not rebound after combination antiretroviral<br />
therapy was discontinued. According to<br />
genotypic assays, the patient likely harbored<br />
a minority (2.9%) of X4 variant viruses. Also,<br />
host-derived CCR5-expressing myeloid cells,<br />
which are permissive for HIV infection, persisted<br />
for at least five months after the transplant.<br />
Given this volatile combination of<br />
residual CXCR4-tropic virus and long-lived<br />
CCR5-expressing targets, HIV replication and<br />
spread should have continued even as the rest<br />
of the hematopoietic system was being replaced<br />
by homozygous CCR5 ∆32 donor cells.<br />
There are at least two possible explanations<br />
for this surprising result. First, the low-level<br />
X4 variant may have been a poorly fit dual-tropic<br />
virus that was dependent on CCR5 for<br />
replication, whereas the number of residual<br />
CCR5-expressing myeloid cells was too low to<br />
support systemic replication of the CCR5-tropic<br />
variants. Second, it is possible that the myeloablative<br />
preparative regimen itself contributed to<br />
the cure by destroying latently infected T and<br />
myeloid cells and by reducing the numbers of<br />
susceptible activated CD4+ T cells (HIV more<br />
readily infects activated rather than resting<br />
target cells). It is also possible that the ongoing<br />
graft-versus-host disease may have acted to<br />
clear residual susceptible target cells. Detailed<br />
exploration of these and other mechanisms will<br />
surely provide profound insights into almost<br />
any possible intervention aimed at HIV eradication<br />
in the future, and should be pursued.<br />
Disruption of CCR5 in autologous HSCs<br />
The strategy of Holt et al. (ref. 1) is related to the<br />
treatment received by the Berlin patient but<br />
is potentially relevant to a larger number of<br />
patients (Fig. 1a). The authors obtained human<br />
CD34+ hematopoietic stem/progenitor cells<br />
(HSPCs) (a population enriched in HSCs)<br />
from umbilical cord blood and stimulated them<br />
to divide with Flt-3 and thrombopoietin. The<br />
cells were nucleofected with plasmids expressing<br />
CCR5-specific zinc-finger nucleases.<br />
A mean of 17% of the cells were successfully<br />
modified, 5–7% of which were estimated to<br />
be homozygous CCR5–. Modified or unmodified<br />
CD34+ cells were then transplanted<br />
into nonobese diabetic/severe combined<br />
immunodeficient/interleukin 2rγ null (NOD/<br />
SCID/IL2rγ null or NOG) mice, a model known<br />
to support multilineage human hematopoiesis.<br />
As expected, mice engrafted with unmodified<br />
stem cells and subsequently challenged with<br />
CCR5-tropic HIV (Bal) showed high levels<br />
of viremia and loss of peripheral and tissue-based<br />
human T cells. Remarkably, in animals<br />
repopulated with CCR5-disrupted HSPCs, the<br />
virus levels were lower and CD4+ T cells were<br />
not depleted, either in the peripheral blood<br />
or in the hematolymphoid tissues (e.g., bone<br />
marrow, thymus, spleen and small intestine).<br />
The preservation of human CD4+ T cells in<br />
the experimental group was due to selection<br />
for multiple independent clones of successfully<br />
gene-modified cells. The frequency of<br />
cells containing evidence of CCR5 disruption<br />
increased to >80% in the peripheral blood<br />
and to >40% in multiple tissues by week 12<br />
of infection.<br />
These experiments raise a number of technical<br />
issues and derivative questions. For<br />
instance, do genetically modified HSCs confer<br />
benefit to a mouse that is already infected<br />
(the situation most closely approximating the<br />
therapeutic need in humans)? Does a CCR5-<br />
disrupted hematopoietic compartment confer<br />
protection against infection by X4 viral variants?<br />
Is the CCR5-disrupted immune system<br />
normal? Are there long-term toxicities that will<br />
become evident later? Is off-target cleavage by<br />
the zinc-finger nucleases a significant concern<br />
(e.g., the CCR2 gene may also be targeted by<br />
this nuclease; ref. 15)? These and other issues can be<br />
resolved with further work. In the meantime,<br />
the data of Holt et al. (ref. 1) show convincingly that<br />
a relatively small number of gene-modified<br />
HSCs can be rapidly selected to ultimately<br />
confer resistance to HIV in vivo.<br />
Next steps<br />
If stem cell–based gene therapy for HIV is to<br />
become a reality in the clinic, a number of<br />
nontrivial theoretical and practical concerns<br />
must be addressed. First, in the current era,<br />
when clinicians are increasingly concerned<br />
about the ‘toxicity’ of ongoing viral replication,<br />
will patients and their healthcare providers be<br />
willing to allow HIV to replicate at high levels<br />
in the absence of antiretroviral therapy so that<br />
CCR5-deficient cells can be selected? There is<br />
now a growing consensus that HIV replication<br />
causes significant and perhaps irreversible<br />
harm to many organs, including those of the<br />
cardiovascular, renal, hepatic and neurologic<br />
systems (ref. 4), so this approach must be assumed<br />
to carry some risk.<br />
Second, will a partially effective antiviral<br />
intervention (which is what the gene-modified<br />
cells represent) select for the outgrowth of a<br />
resistant virus population, such as X4 variants?<br />
The history of HIV therapeutics is absolutely<br />
clear on this issue: if HIV is allowed to<br />
replicate in the presence of a selective pressure,<br />
it will find a way to survive. This concern<br />
is even more pressing as it is widely believed<br />
that X4 variants are more virulent than R5<br />
variants. Although X4 variants are only infrequently<br />
selected in patients treated with small-molecule<br />
CCR5 inhibitors (e.g., maraviroc),<br />
this is only true when a fully suppressive regimen<br />
is used from day one. It is not likely that<br />
transplantation of gene-modified HSCs will<br />
be fully suppressive at first, particularly if partially<br />
myeloablative therapy is used.<br />
Third, will ablative therapy be needed to<br />
allow stem cell engraftment and, if so, will<br />
short- and long-term toxicity preclude its use<br />
in those most likely to be offered this intervention<br />
first? Those most in need of aggressive<br />
interventions typically have dual-tropic<br />
virus and are therefore unlikely to respond<br />
to any approach based on disruption of<br />
CCR5 (ref. 16). And with advanced disease,<br />
they have a paucity of HSCs and damaged<br />
hematopoietic microenvironments (such as<br />
bone marrow, thymus and lymph node) that<br />
would normally support the maturation of<br />
modified HSCs.<br />
Finally, the mechanism whereby HIV causes<br />
CD4+ T-cell depletion remains unclear (ref. 8).<br />
Although HIV can clearly kill cells directly,<br />
many if not most cells in an HIV-infected individual<br />
die as a consequence of indirect viral<br />
effects. Generalized activation of the immune<br />
system, for example, is harmful to the function<br />
of T and myeloid cells and to the regeneration<br />
of multiple lineages. These indirect effects may<br />
persist even as the virus is driven to extinction<br />
by the gradual emergence of an HIV-resistant<br />
T-cell population.<br />
Reaching for blue sky<br />
Although the above concerns are daunting,<br />
the epidemic is not going to disappear,<br />
the science of stem cells is becoming more<br />
tractable, sociopolitical forces are forging<br />
new perspectives in healthcare, and now is<br />
not the time to stop. From our perspective,<br />
HSC-based gene therapy for HIV disease may<br />
make a significant impact on the worldwide<br />
epidemic if two goals can be met. First, it is<br />
essential to find a way to deliver these therapies<br />
to all in need, in a manner that is safe,<br />
affordable and generally available around<br />
the world. Many clever approaches to do this<br />
have been proposed in the past, and more<br />
will surely emerge. Second, the preclinical<br />
and clinical development of these strategies<br />
requires a sustainable financial model. Such<br />
a model may involve reprioritization of governmental<br />
efforts, creative plans to incentivize<br />
existing pharmaceutical and healthcare delivery<br />
systems, and global assistance programs<br />
motivated by a common desire for a world<br />
free of HIV. This may seem like a formidable<br />
exercise, but it is worth noting that if one-shot,<br />
modified HSC-based gene therapy can<br />
be made efficacious and accessible in the context<br />
of HIV disease, similar approaches will<br />
likely be applicable to a host of other chronic<br />
diseases, infectious and otherwise. If so, the<br />
treatment paradigms of the future will look<br />
vastly different from today’s. In the same way<br />
that problems associated with the reliance on<br />
fossil fuels have stimulated the development of<br />
alternative strategies of energy delivery, so too<br />
may the ongoing crisis in the HIV epidemic<br />
spark novel approaches to the provision of<br />
healthcare in the future.<br />
Conclusion<br />
The progress in HIV therapeutics over the past<br />
15 years has been tremendous. The life expectancy<br />
of most people who present with HIV<br />
disease today in resource-rich regions is on<br />
the order of decades. Yet antiretroviral drugs<br />
have intrinsic limitations that are unlikely to<br />
be surmounted. What is needed, therefore, is<br />
a ‘game changer’, such as a cure for HIV infection<br />
or an effective vaccine. Could a one-shot<br />
manipulation of HSCs be the answer? We will<br />
not know unless we continue to move these<br />
new technologies into the clinic. Even if CCR5-<br />
targeted gene therapy is not the ultimate solution,<br />
human studies are certain to be highly<br />
informative with regard to HIV pathogenesis<br />
and human immunology.<br />
ACKNOWLEDGMENTS<br />
The authors wish to acknowledge amfAR, Project<br />
Inform, TAG and the AIDS Policy Project for<br />
supporting and stimulating cross-disciplinary<br />
discussion on the issues outlined in this commentary.<br />
The authors’ work that contributed to this review<br />
was supported by the National Institute of Allergy<br />
and Infectious Diseases (R01 AI087145 and<br />
K24AI069994 to S.G.D. and R37 AI40312 and DP1<br />
OD00329 to J.M.M.), the University of California,<br />
San Francisco (UCSF) Center for AIDS Research<br />
(P30 MH59037), the UCSF Clinical and Translational<br />
Science Institute (UL1 RR024131), the Harvey V.<br />
Berneking Living Trust and amfAR. J.M.M. is a<br />
recipient of the National Institutes of Health (NIH)<br />
Director’s Pioneer Award Program, part of the NIH<br />
Roadmap for Medical Research.<br />
COMPETING FINANCIAL INTERESTS<br />
The authors declare competing financial<br />
interests: details accompany the full-text HTML<br />
version of the paper at http://www.nature.com/naturebiotechnology/.<br />
1. Holt, N. et al. Nat. Biotechnol. 28, 839–847 (2010).<br />
2. Siliciano, J.D. et al. Nat. Med. 9, 727–728 (2003).<br />
3. Kuller, L.H. et al. PLoS Med. 5, e203 (2008).<br />
4. Phillips, A.N., Neaton, J. & Lundgren, J.D. AIDS 22, 2409–2418 (2008).<br />
5. Friedman, A.D., Triezenberg, S.J. & McKnight, S.L. <strong>Nature</strong> 335, 452–454 (1988).<br />
6. Baltimore, D. <strong>Nature</strong> 335, 395–396 (1988).<br />
7. Rossi, J.J., June, C.H. & Kohn, D.B. Nat. Biotechnol. 25, 1444–1454 (2007).<br />
8. McCune, J.M. <strong>Nature</strong> 410, 974–979 (2001).<br />
9. McCune, J.M. Cell 82, 183–188 (1995).<br />
10. Hutter, G. et al. N. Engl. J. Med. 360, 692–698 (2009).<br />
11. Moore, J.P., Kitchen, S.G., Pugach, P. & Zack, J.A. AIDS Res. Hum. Retroviruses 20, 111–126 (2004).<br />
12. Glass, W.G. et al. J. Exp. Med. 203, 35–40 (2006).<br />
13. DiGiusto, D.L. et al. Sci. Transl. Med. 2, 36ra43 (2010).<br />
14. Shimizu, S. et al. Blood 115, 1534–1544 (2010).<br />
15. Perez, E.E. et al. Nat. Biotechnol. 26, 808–816 (2008).<br />
16. Hunt, P.W. et al. J. Infect. Dis. 194, 926–930 (2006).<br />
Microarrays in the clinic<br />
Guy W Tillinghast<br />
The MicroArray Quality Control (MAQC) consortium has evaluated methods<br />
for making clinically useful predictions from large-scale gene expression data.<br />
Guy Tillinghast is at the Riverside Cancer Care<br />
Center, Newport News, Virginia, USA.<br />
e-mail: guy.tillinghast@rivhs.com<br />
Clinical application of gene expression microarrays (ref. 1)<br />
and other ’omics technologies is widely<br />
expected to usher in a new era of personalized<br />
medicine. But although DNA microarrays are<br />
beginning to be used in patient care (refs. 2,3), progress<br />
has been slow, in part because of analytic<br />
challenges and concerns about accuracy and<br />
reproducibility. In this issue, the MAQC consortium<br />
presents the results of a large study,<br />
MAQC-II (ref. 4), to evaluate methods for building<br />
genomic classifiers: software programs that<br />
convert microarray profiles of an individual<br />
sample into a prediction, such as membership<br />
in a clinical class. The results show that<br />
microarray algorithms can be reliable enough<br />
to justify clinical application, at least within<br />
certain contexts. More broadly, the findings<br />
of MAQC-II on microarray classifiers may<br />
be useful for analyzing data from other high-throughput<br />
assays.<br />
Existing clinical predictors have well-known<br />
limitations, especially with respect to complex<br />
diseases such as cancer. Given two individuals<br />
who present identical clinical parameters, one<br />
may respond to a therapy whereas the other may<br />
not. In principle, genome-wide data should be<br />
able to discriminate between them. The most<br />
common goals of a clinical test are to make a<br />
diagnosis or to determine an appropriate therapy.<br />
In light of statistical considerations, these<br />
goals depend on the prevalence of a disease,<br />
suggesting that clinical DNA microarray tests<br />
will augment, and not supplant, other clinical<br />
information. Thus, a possible strategy would<br />
be to first use traditional clinical predictors to<br />
broadly identify patients who might benefit<br />
from a treatment, and to then use an expensive<br />
assay, such as a microarray, to eliminate<br />
those for whom the treatment is unlikely to<br />
be effective.<br />
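The dependence on prevalence can be made concrete with Bayes' rule. The sketch below is illustrative only: the function name and the 90% sensitivity and specificity figures are assumptions chosen for the example, not values reported by MAQC-II.<br />

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Bayes' rule: probability that a positive test result is a true positive."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical test with 90% sensitivity and 90% specificity (assumed values),
# applied to cohorts in which the condition is rare, uncommon or common.
for prevalence in (0.01, 0.10, 0.50):
    ppv = positive_predictive_value(0.90, 0.90, prevalence)
    print(f"prevalence {prevalence:.2f} -> positive predictive value {ppv:.3f}")
```

Under these assumed figures, the same test is much less informative in an unselected population than in a cohort already enriched by clinical predictors, which is the rationale for using the microarray as a second step.<br />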
Despite this promise, DNA microarrays have<br />
not been rapidly adopted in clinical practice.<br />
One reason is the noise that results from analyzing<br />
thousands of genes, which can lead to<br />
false predictions. Consequently, microarrays<br />
have been criticized because studies of the<br />
same clinical groups using different microarray<br />
measurements or analytic methods have<br />
often yielded dissimilar lists of differentially<br />
expressed genes. A second concern is the inherent<br />
error in the technology. Error stems from<br />
high background at the bottom of the dynamic<br />
range, saturation at the top of the dynamic<br />
range, and nonlinearity, at least with measurements<br />
of some transcripts.<br />
Many statistical methods have been developed<br />
to address these challenges, including<br />
approaches for grouping samples and genes,<br />
data normalization schemes to allow meaningful<br />
comparisons across samples, multiple testing<br />
procedures to select differentially expressed<br />
genes and ‘cross-validation’ methods for using<br />
samples to train prediction algorithms while<br />
reducing bias. These methods are applied<br />
sequentially to transform massive data sets of<br />
raw microarray gene expression profiles into<br />
clinically useful classifiers (Fig. 1a). As the<br />
optimal combination of methods is difficult<br />
to determine, MAQC-II sought to evaluate<br />
approaches to building classifiers.<br />
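As one example of a multiple testing procedure, the following sketch applies the Benjamini-Hochberg step-up rule to a list of per-gene P values. It is a generic illustration, not the procedure used by any MAQC-II team; the function name and the P values are hypothetical.<br />

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return the indices of hypotheses rejected at the given false discovery rate."""
    m = len(p_values)
    # Rank the hypotheses from smallest to largest P value.
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        # Step-up rule: remember the largest rank whose P value is under its threshold.
        if p_values[i] <= fdr * rank / m:
            cutoff_rank = rank
    # Everything ranked at or below the cutoff is called significant.
    return sorted(order[:cutoff_rank])


# Hypothetical P values for ten genes (illustrative only).
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
selected = benjamini_hochberg(p_values, fdr=0.05)
print("genes called differentially expressed:", selected)
```

The threshold grows with rank, so the rule is less conservative than dividing the significance level by the number of genes while still bounding the expected share of false discoveries.<br />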
Clinical use of microarrays is particularly<br />
challenging owing to the variability of<br />
the arrays themselves and to the variability<br />
between patients and between laboratories<br />
performing the analyses. These effects fall<br />
under the rubric of ‘batch effects’ and cause<br />
false positives. Moreover, before MAQC-II, it<br />
had not been clear whether classifiers trained<br />
on an initial data set would be able to make<br />
accurate predictions based on completely<br />
independent samples collected at a later date.<br />
The five-step process for building a classifier<br />
in MAQC-II involved designing the experiment,<br />
collecting microarray data, creating a predictive<br />
model, validating the model internally with<br />
the training samples and validating the model<br />
externally with new samples obtained independently<br />
from the training data. MAQC-II<br />
enlisted 36 teams of data analysts within government<br />
agencies, academia and industry. The<br />
teams were given six microarray data sets and<br />
charged with predicting 13 ‘endpoints’ potentially<br />
relevant to clinical or preclinical applications.<br />
The data sets included toxicological<br />
studies of chemicals on rodents and expression<br />
profiles of human cancer patients. In total, the<br />
teams built >30,000 classifiers using hundreds<br />
of combinations of analytic methods. A team of<br />
referees comprising biostatisticians and experienced<br />
data analysts chose one ‘candidate’ model<br />
that was expected to have the best performance<br />
for each endpoint from among models nominated<br />
by each of the 36 teams.<br />
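The distinction between internal and external validation can be sketched in a few lines. The example below is a toy under stated assumptions: the data are simulated, the nearest-centroid classifier is a stand-in chosen for brevity, and none of the names come from MAQC-II.<br />

```python
import random


def centroid(rows):
    """Mean profile of a list of samples."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def train(samples, labels):
    """Nearest-centroid classifier: one mean profile per class."""
    return {c: centroid([s for s, l in zip(samples, labels) if l == c])
            for c in set(labels)}


def predict(model, sample):
    """Assign the class whose centroid is closest (squared Euclidean distance)."""
    def dist(center):
        return sum((a - b) ** 2 for a, b in zip(sample, center))
    return min(model, key=lambda c: dist(model[c]))


def accuracy(model, samples, labels):
    hits = sum(predict(model, s) == l for s, l in zip(samples, labels))
    return hits / len(labels)


def cross_validate(samples, labels, k=5):
    """Internal validation: hold out each fold once and train on the rest."""
    scores = []
    for fold in range(k):
        held_out = set(range(fold, len(samples), k))
        tr_x = [s for i, s in enumerate(samples) if i not in held_out]
        tr_y = [l for i, l in enumerate(labels) if i not in held_out]
        te_x = [samples[i] for i in sorted(held_out)]
        te_y = [labels[i] for i in sorted(held_out)]
        scores.append(accuracy(train(tr_x, tr_y), te_x, te_y))
    return sum(scores) / k


def simulate(n, rng, shift=2.0, genes=20):
    """Simulated 'expression' data: class 1 is shifted in the first five genes."""
    samples, labels = [], []
    for i in range(n):
        label = i % 2
        samples.append([rng.gauss(shift * label if g < 5 else 0.0, 1.0)
                        for g in range(genes)])
        labels.append(label)
    return samples, labels


rng = random.Random(0)
train_x, train_y = simulate(60, rng)   # training cohort
new_x, new_y = simulate(40, rng)       # independent cohort collected "later"

internal = cross_validate(train_x, train_y)
external = accuracy(train(train_x, train_y), new_x, new_y)
print(f"internal (5-fold cross-validation) accuracy: {internal:.2f}")
print(f"external validation accuracy: {external:.2f}")
```

Internal validation reuses the training samples through cross-validation, whereas external validation scores the final model on samples it never saw during training.<br />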
Next, the consortium analyzed how well<br />
the models classified samples. Performance<br />
was measured using several metrics, but the<br />
one most familiar to clinicians is the receiver<br />
operating characteristic area under the curve<br />
(AUC), a metric that varies between 0 and<br />
1, where 0.5 indicates performance no better<br />
than chance and 1 means that all samples<br />
are correctly classified and none misclassified.<br />
For most of the endpoints, the candidate<br />
[Figure 1 graphic. Panel a labels: Tissue sample, Microarray, Classifier, Prediction, Treatment plan; process evaluated in MAQC-II: Remove batch effects, Normalize, Select features, Train algorithm, Internal validation. Panel b axes: True-positive rate (sensitivity) against False-positive rate (1 – specificity); curves labeled AUC = 0.991, AUC = 0.956, AUC = 0.787 and AUC = 0.615.]<br />
Figure 1 Using microarrays to make clinical predictions. (a) Current clinical decision-making processes can be refined by gene expression–based<br />
predictions generated by microarray classifiers (top). MAQC-II evaluated methods for constructing classifiers (bottom). Constructing a classifier from<br />
raw microarray data requires processing the data using a sequence of analytic steps (colored boxes). Many different approaches have been developed to<br />
solve each step (represented as dots above each box). In MAQC-II, >30,000 classifiers were constructed to test different combinations of analytic steps<br />
to predict 13 clinical and preclinical ‘endpoints’. (b) Curves showing the range of performance of classifiers developed for different data sets as part of<br />
MAQC-II. Performance is quantified using AUC. Data sets are characterized by the ratio of positive to negative samples in the cohort (P/N). Classifiers<br />
performed well for some endpoints, such as the sex of patients. The ~400 genes exclusively present on the Y chromosome made this an easy-to-predict<br />
positive control (red, training set P/N 1.44). The most difficult-to-predict endpoint was the overall survival of multiple myeloma patients, which has<br />
traditionally been difficult for other tests as well (orange, training set P/N 0.34). Classifiers for liver toxicity in rats (blue, training set P/N 0.58) and<br />
pathological complete remission in breast cancer (green, training set P/N 0.34) showed intermediate performance.<br />
microarray-based classifiers performed far<br />
better than chance on the independent validation<br />
data set, with a range of 0.62–0.99.<br />
Moreover, the performance of the refereeselected<br />
candidate models was better than that<br />
of nominated models, suggesting that expert<br />
advice can enhance the modeling outcome.<br />
Notably, classifier performance was found<br />
to depend heavily on the endpoint being predicted<br />
(Fig. 1b). However, it is evident from<br />
inspecting the data that there is a linear correlation<br />
between the AUC performance and<br />
the ratio of positive to negative samples in the<br />
cohort (‘training set P/N’). The composition of<br />
the training set is known to affect classification<br />
performance, and extreme imbalance, such as<br />
with the breast cancer and multiple myeloma<br />
endpoints (Fig. 1b, orange and green), may have<br />
adversely affected performance. Alternatively,<br />
the genetics of neuroblastoma and certainly<br />
the rodent data sets may be less variable and<br />
hence more tractable to modeling (Fig. 1b,<br />
blue). Moreover, genetic variation typically<br />
accumulates over time, making the genomes<br />
of the patients with breast cancer and multiple<br />
myeloma more variable than those with neuroblastoma<br />
and therefore less consistent with the<br />
reference genome from which the microarray<br />
platforms were constructed. These substantial<br />
differences in endpoints may have affected the<br />
validation AUC results.<br />
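As a rough illustration of the AUC measure used in this comparison, the sketch below computes it from classifier scores with the rank-based identity: the AUC equals the fraction of positive/negative sample pairs in which the positive sample receives the higher score. The scores and labels are invented for the example and do not come from MAQC-II.<br />

```python
def auc(scores, labels):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity.

    Equals the probability that a randomly chosen positive sample scores
    higher than a randomly chosen negative one; ties count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for six samples (1 = positive endpoint).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
print(auc(scores, labels))
```

A value of 0.5 from this function corresponds to chance-level ranking and 1.0 to perfect separation of the two classes.<br />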
Several findings from MAQC-II may help<br />
bring the technology closer to clinical use.<br />
Microarray experiments should be designed<br />
to minimize batch effects, such as those introduced<br />
by different laboratories or material<br />
lots. There should be a plan for detecting such<br />
effects (e.g., by testing for unexpected genes<br />
that are expressed in different experimental<br />
conditions), and the same statistical test used<br />
to detect differentially expressed genes should<br />
be applied to all samples 5 . A gene that is differentially<br />
expressed in a pattern that matches<br />
the grouping of samples into batches should<br />
be examined closely and probably not used<br />
in a classifier.<br />
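One way to look for genes whose expression tracks the batch grouping is sketched below. It assumes two batches labeled "A" and "B", uses Welch's t statistic as a stand-in for whatever differential-expression test a study actually adopts, and applies an arbitrary cutoff; the gene names and values are made up.<br />

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two groups of expression values."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        # Degenerate case: no spread within either group.
        return 0.0 if mean(a) == mean(b) else float("inf")
    return (mean(a) - mean(b)) / se


def batch_associated_genes(expression, batches, threshold=4.0):
    """Return genes whose expression differs strongly between batches.

    expression: dict mapping gene name -> list of values, one per sample
    batches:    list of batch labels ("A" or "B"), one per sample
    threshold:  illustrative cutoff on |t|, not a recommended value
    """
    flagged = []
    for gene, values in expression.items():
        group_a = [v for v, b in zip(values, batches) if b == "A"]
        group_b = [v for v, b in zip(values, batches) if b == "B"]
        if abs(welch_t(group_a, group_b)) > threshold:
            flagged.append(gene)
    return flagged


# Toy data: gene1 shifts with the batch, gene2 does not.
batches = ["A", "A", "A", "B", "B", "B"]
expression = {
    "gene1": [5.0, 5.1, 4.9, 7.0, 7.1, 6.9],
    "gene2": [5.0, 6.0, 4.0, 5.1, 5.9, 4.1],
}
print(batch_associated_genes(expression, batches))
```

Genes flagged this way are candidates for closer inspection before they are allowed into a classifier.<br />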
Related to batch effects, quality control metrics<br />
should be used to distinguish variation in<br />
gene expression caused by laboratory artifact<br />
from that caused by clinical phenotype. Quality control<br />
metrics are formulated to assess specific<br />
aspects of laboratory processing, such as RNA<br />
degradation or faulty equipment. These metrics<br />
can be used to adjust gene expression measurements<br />
or to identify problem microarrays. In<br />
the MAQC-II project, rather than adjusting<br />
measurements to account for laboratory noise,<br />
data analysts did not use samples that appeared<br />
to have quality control problems.<br />
Several factors were found to influence<br />
classifier performance more than the type of<br />
algorithm used. One of these is the inherent<br />
difficulty of the biological phenomena being<br />
predicted. Another is the method for tuning<br />
the algorithm. Inexperience in tuning can be a<br />
major source of bias in the final classifier, especially<br />
if the predictive algorithm is not tuned<br />
for the population of interest. For example, in<br />
a population with low prevalence of a disease,<br />
it may be more desirable to have a test that<br />
makes few false predictions.<br />
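The effect of prevalence on a test's usefulness can be made concrete with Bayes' rule. The sketch below assumes a test with 90% sensitivity and 90% specificity (illustrative figures only) and computes the positive predictive value, the share of positive calls that are correct, at several prevalences.<br />

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Fraction of positive predictions that are true positives."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Same hypothetical test, applied to populations with different prevalence.
for prevalence in (0.5, 0.1, 0.01):
    ppv = positive_predictive_value(0.9, 0.9, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}")
```

As prevalence falls, false positives make up a growing share of all positive calls, which is why a classifier tuned on a balanced cohort may need a stricter decision threshold in a low-prevalence population.<br />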
The results of MAQC-II highlight two priorities<br />
for future work. First, the field needs<br />
rigorous standards for reporting the steps<br />
used to develop a classifier, its parameters<br />
of use and the appropriate quality metrics.<br />
Examples in the literature 2 may provide useful<br />
starting points. A classifier submitted for<br />
publication or for regulatory approval should<br />
specify how to use it to classify new samples—<br />
for example, the normalization and batch<br />
effect correction procedures to perform, the<br />
essential quality control checks and how to<br />
handle quality control flaws. The final report<br />
of a prediction algorithm should provide the<br />
variability (for example, the standard error) of the performance<br />
measure as well as an estimate of<br />
the bias. A prediction report based on analysis<br />
of an individual patient sample should be<br />
accompanied by a report of quality metrics<br />
and their normal values and a report of batch<br />
effect measures that could provide a clinician<br />
with a sense of whether a microarray is within<br />
the range of the samples for which the test<br />
was developed 5 .<br />
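A standard error for a performance measure can be estimated in several ways; the sketch below uses a simple bootstrap over validation samples, which is one common choice and not necessarily the procedure used in MAQC-II. The 80-of-100 accuracy figure, the number of resamples and the seed are assumptions made for the example.<br />

```python
import random
from statistics import stdev


def bootstrap_se(correct, n_resamples=1000, seed=0):
    """Bootstrap standard error of classification accuracy.

    correct: list of 0/1 flags, 1 where the classifier was right
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(correct)
    estimates = []
    for _ in range(n_resamples):
        # Resample the validation set with replacement and rescore it.
        resample = [correct[rng.randrange(n)] for _ in range(n)]
        estimates.append(sum(resample) / n)
    return stdev(estimates)


# Hypothetical validation set: 80 of 100 samples classified correctly.
correct = [1] * 80 + [0] * 20
accuracy = sum(correct) / len(correct)
se = bootstrap_se(correct)
print(f"accuracy={accuracy:.2f}  bootstrap SE={se:.3f}")
```

Reporting the estimate together with its standard error lets a reader judge whether two classifiers differ by more than sampling noise.<br />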
Second, methods are needed to combine<br />
microarray predictions with existing clinical<br />
decision-making tools, such as nomograms<br />
(graphical charts for performing calculations).<br />
In constructing a nomogram, it will be necessary<br />
to determine how to balance the data from<br />
a microarray classifier with traditional clinical<br />
predictors. In addition, approaches should be<br />
developed to handle variability. For instance,<br />
the microarray chips used in MAQC-II have<br />
already been replaced by newer versions.<br />
A key observation of MAQC-II—namely,<br />
that some endpoints seem inherently more<br />
predictable than others, regardless of the<br />
analytic methods used—suggests that gene<br />
expression microarrays may not capture a<br />
sufficiently rich snapshot of disease physiology.<br />
In such cases, complementary technologies,<br />
which measure mRNA expression,<br />
protein levels, genetic mutation, copy number<br />
variation, gene silencing or regulatory RNA<br />
expression, could be considered. Alternatively,<br />
the best technology may vary by tumor type.<br />
High-throughput sequencing, in particular,<br />
offers advantages over microarrays in that<br />
coverage of the genome is less biased and the<br />
dynamic range is larger 6 . With luck, the results<br />
of MAQC-II will be useful for shepherding<br />
other high-throughput technologies toward<br />
the clinic as well.<br />
COMPETING FINANCIAL INTERESTS<br />
The author declares no competing financial interests.<br />
1. DeRisi, J.L., Iyer, V.R. & Brown, P.O. Science 278, 680–686 (1997).<br />
2. Dumur, C.I. et al. J. Mol. Diagn. 10, 67–77 (2008).<br />
3. Buyse, M. et al. J. Natl. Cancer Inst. 98, 1183–1192 (2006).<br />
4. The MicroArray Quality Control (MAQC) consortium. Nat. Biotechnol. 28, 827–838 (2010).<br />
5. Luo, J. et al. Pharmacogenomics J. 10, 278–291 (2010).<br />
6. Schuster, S.C. Nat. Methods 5, 16–18 (2008).<br />
Shaking up genome engineering<br />
KA Tipton & John Dueber<br />
A new method generates genome-scale modified bacteria with<br />
unprecedented ease.<br />
KA Tipton and John Dueber are at the<br />
University of California Berkeley, Berkeley,<br />
California, USA.<br />
e-mail: jdueber@berkeley.edu<br />
Systematic approaches to mutate and characterize<br />
the function of every gene in a microbe have<br />
been hampered by the need to manually create<br />
thousands of separate strains through tedious<br />
genetic manipulation. In this issue, Warner<br />
et al. 1 describe an approach to create and characterize<br />
rationally modified versions of almost<br />
every gene in Escherichia coli. Using this strategy,<br />
the authors quickly zero in on genes that<br />
influence industrially relevant traits, such as<br />
tolerance to toxins in a biofuel feedstock. The<br />
method enables single genome modifications<br />
to be probed rapidly and comprehensively and<br />
correlated to a phenotype, yielding information<br />
that lays a foundation for gene mapping and for<br />
engineering strains with desired phenotypes.<br />
Until now, systematic phenotyping of mutants<br />
in yeasts 2,3 and E. coli 4 has been accomplished<br />
by Herculean manual efforts to create thousands<br />
of mutant strains, each with a different single-gene<br />
knockout. Although the resulting strain<br />
collections have proven valuable, it remains<br />
a challenge to create, on a genome scale, new<br />
collections of mutants for targeted applications<br />
or to control gene expression levels using<br />
a strong promoter, an inducible promoter or a<br />
low-efficiency ribosome binding site.<br />
In contrast, the method of Warner et al. 1 ,<br />
trackable multiplex recombineering (TRMR,<br />
pronounced ‘tremor’; Fig. 1), offers a fast<br />
and cheap approach for creating collections of<br />
mutants. Impressively, the authors were able to<br />
construct libraries containing up- and downregulated<br />
versions of 96% of the genes in the<br />
E. coli genome in one week at a materials cost<br />
of ~$1 per targeted gene.<br />
The first step in TRMR is to obtain thousands<br />
of 189-base-pair oligonucleotides that<br />
target and uniquely identify every E. coli gene.<br />
Each of these oligos consists of a barcode tag<br />
unique to a gene and regions of homology that<br />
flank the targeted gene in the genome. Warner<br />
et al. 1 purchased the oligos, which were made<br />
on a programmable microarray. Next, using<br />
a clever cloning strategy, they appended the<br />
oligos to DNA elements that modulate gene<br />
expression. Attaching the targeting oligos to<br />
the strong PLtetO-1 promoter created a DNA<br />
cassette that was expected to upregulate the<br />
targeted gene after incorporation into the<br />
genome. Conversely, attaching the targeting<br />
oligo to a weak ribosome binding site produced<br />
a DNA cassette that downregulated the<br />
targeted gene. An antibiotic resistance gene<br />
allowed selection for the genetic modifications.<br />
As a result of the DNA synthesis and<br />
manipulation steps, Warner et al. 1 created<br />
two libraries of linear DNA fragments, each<br />
with 4,077 DNA cassettes pooled together in<br />
a single tube.<br />
These libraries of DNA oligonucleotides were<br />
used to modify the E. coli genome by means of<br />
recombineering, a homologous recombination–<br />
based method in E. coli expressing λ phage<br />
recombination factors (λgam, bet and exo) 5 .<br />
Growth on antibiotic medium selects for successful<br />
recombinants, and the sites of recombination<br />
are determined by homology of the<br />
targeting oligos to genomic regions flanking<br />
each gene.<br />
[Figure 1 panel labels: E. coli + multiplex oligonucleotide library (left); E. coli strains with modified gene expression levels (middle); selection in new environmental conditions (right).]<br />
Figure 1 TRMR enables genome-scale selection of rational modifications to the expression of single<br />
genes. A multiplex library of oligonucleotides is synthesized to encode a unique barcode tag and regions<br />
of homology flanking individual target genes in the E. coli genome (left). A series of cloning steps<br />
generates linear DNA fragments that contain sequences necessary for up- or downregulating the<br />
expression of each target gene. E. coli are transformed with this library of linear fragments to create a<br />
collection of genetically modified strains (middle, green cells containing a modified genetic network).<br />
The modifications alter the functional linkages between genes. (Lines in the networks represent<br />
linkages, with thickness being the strength of the link. Circles represent genes, with translucency<br />
and a dashed outline representing attenuated expression.) The E. coli strain collection is grown<br />
on medium containing an environmental challenge of interest (right). The identities and relative<br />
abundances of individual survivors are determined by sequencing colonies using universal primer<br />
sequences. Alternatively, survivors are determined in bulk by microarray analysis of the barcode tags.<br />
Importantly, the basic TRMR strategy is amenable to rapid iteration such that the most promising gene<br />
modifications are used to seed subsequent cycles of mutation and selection (dotted arrow).<br />
The resulting collections of modified E. coli<br />
strains were then challenged by growth in<br />
environmental conditions of interest. Warner<br />
et al. 1 measured the relative fitness of each<br />
modified strain by isolating genomic DNA,<br />
amplifying the barcode tags using PCR and<br />
hybridizing the amplified DNA to a microarray<br />
that contains probes complementary to<br />
each tag. A signal on the microarray identifies<br />
strains that grew. To demonstrate the<br />
approach, the authors selected for growth in<br />
media containing salicin, d-fucose, valine or<br />
methylglyoxal. These compounds inhibit cell<br />
growth by different mechanisms. Salicin is a<br />
carbon source that normally cannot be metabolized.<br />
d-fucose is an analogue of arabinose<br />
that inhibits the ability of E. coli to metabolize<br />
this sugar. Valine acts as a feedback inhibitor<br />
of growth-limiting leucine and isoleucine biosynthesis.<br />
Methylglyoxal presents an oxidative<br />
stress if present in elevated concentrations.<br />
These conditions demonstrated the effectiveness<br />
of TRMR in identifying gene-trait relationships<br />
and in identifying genes that were<br />
not expected to be involved in resistance to the<br />
given cellular stress, thus supporting the power<br />
of a genome-scale, unbiased approach.<br />
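The barcode readout described above can be turned into a per-strain score in many ways. The sketch below shows one simple option, the log2 change in each barcode's share of the total signal before and after selection. It is an illustration under assumed inputs, not the calculation used by Warner et al.; the tag names, signal values and pseudocount are hypothetical.<br />

```python
from math import log2


def barcode_enrichment(before, after, pseudocount=1.0):
    """Log2 change in each barcode's share of the total signal.

    before, after: dicts mapping barcode tag -> signal or count
    pseudocount:   added to every value so absent tags do not give log(0)
    """
    n_tags = len(before)
    total_before = sum(before.values()) + pseudocount * n_tags
    total_after = sum(after.get(tag, 0.0) for tag in before) + pseudocount * n_tags
    scores = {}
    for tag, signal in before.items():
        share_before = (signal + pseudocount) / total_before
        share_after = (after.get(tag, 0.0) + pseudocount) / total_after
        scores[tag] = log2(share_after / share_before)
    return scores


# Hypothetical barcode signals for three modified strains.
before = {"geneA_up": 100, "geneB_up": 100, "geneC_down": 100}
after = {"geneA_up": 700, "geneB_up": 90, "geneC_down": 10}
ranked = sorted(barcode_enrichment(before, after).items(), key=lambda kv: -kv[1])
for tag, score in ranked:
    print(f"{tag}: {score:+.2f}")
```

Positive scores mark strains that gained ground under the selective condition, and negative scores mark strains that were depleted.<br />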
In a particularly challenging and exciting<br />
application of TRMR, Warner et al. 1 grew their<br />
libraries of strains in lignocellulosic hydrolysate<br />
derived from corn stover. Hydrolysates<br />
represent a complex potpourri of molecules<br />
toxic to E. coli. It has been difficult to predict<br />
a priori which genes would best confer resistance<br />
to growth inhibitors in the hydrolysates 6 .<br />
This problem is thus well suited to test the<br />
authors’ methods. Among the modified genes<br />
that conferred improved growth were genes<br />
with expected functions as well as several<br />
with seemingly disparate cellular functions,<br />
including primary metabolism, RNA metabolism,<br />
sugar transporters, secondary metabolism,<br />
vitamin processes and antioxidant activities. In<br />
one notable result, the authors identified the<br />
antioxidant ahpC, a gene not previously linked<br />
to growth on hydrolysates, which, when upregulated,<br />
considerably improved both growth rate<br />
and final biomass levels.<br />
TRMR has many potential uses. Warner<br />
et al. 1 note that it could easily be applied iteratively,<br />
with strains selected after one round of<br />
TRMR used as the starting strains for a second<br />
round, thereby accumulating beneficial<br />
genome alterations (Fig. 1, dotted arrow).<br />
Such iterative processing can take advantage<br />
of the same pool of oligos already synthesized.<br />
Parallel microarray analysis of the barcode<br />
tags present in the selected survivors should<br />
produce additional layers of information about<br />
genetic contributors to fitness. For instance,<br />
the ability to track combinations of alterations<br />
in a stepwise fashion as they accumulate has<br />
the potential to provide snapshots of genetic<br />
interaction data that, if taken at a high enough<br />
frequency, may uncover network connections<br />
in conditions particularly relevant to industrial<br />
and biotechnological settings.<br />
TRMR is also valuable because it identifies<br />
genes and network connections that<br />
could form the basis for further strain optimization.<br />
For instance, a particularly powerful<br />
combination of technologies would<br />
be to first use TRMR to identify relevant<br />
genes and then apply the recently developed<br />
multiplex automated genome engineering<br />
(MAGE) method 7 , which finely tunes the<br />
expression levels of a limited number of<br />
genes. In microbial engineering applications,<br />
such as the creation of a strain of E. coli that<br />
can metabolize lignocellulose sugars, TRMR<br />
should complement existing technologies,<br />
including directed evolution, genome-scale<br />
metabolic modeling and synthetic biology<br />
approaches for redox balancing, flux improvement<br />
and limiting the production of undesirable<br />
and toxic metabolic products.<br />
In addition to TRMR, other approaches<br />
based on genome-wide modifications are<br />
increasingly providing scientists with the ability<br />
to generate large, information-rich data sets<br />
from which new genetic information may be<br />
extracted 2–4,8,9 . TRMR heralds an approach to<br />
genetic analyses in which phenotypes are rapidly<br />
mapped to genetic modifications across the<br />
genome, simultaneously producing improved<br />
strains for immediate practical use as well as<br />
data sets enabling future rational creation of<br />
sophisticated strains.<br />
COMPETING FINANCIAL INTERESTS<br />
The authors declare no competing financial interests.<br />
1. Warner, J. et al. Nat. Biotechnol. 28, 856–862 (2010).<br />
2. Giaever, G. et al. <strong>Nature</strong> 418, 387–391 (2002).<br />
3. Kim, D.U. et al. Nat. Biotechnol. 28, 617–623 (2010).<br />
4. Baba, T. et al. Mol. Syst. Biol. 2, 2006.0008 (2006).<br />
5. Datta, S., Costantino, N. & Court, D.L. Gene 379, 109–115 (2006).<br />
6. Mohagheghi, A. & Schell, D.J. Biotechnol. Bioeng. 105, 992–996 (2010).<br />
7. Wang, H.H. et al. <strong>Nature</strong> 460, 894–898 (2009).<br />
8. Tong, A.H. et al. Science 294, 2364–2368 (2001).<br />
9. Mnaimneh, S. et al. Cell 118, 31–44 (2004).<br />
The expanding family of dendritic cell subsets<br />
Hideki Ueno, A Karolina Palucka & Jacques Banchereau<br />
The recent identification of human CD141+ dendritic cells as a counterpart<br />
of mouse CD8+ dendritic cells may be useful in developing vaccines and<br />
immunotherapies.<br />
Hideki Ueno, A. Karolina Palucka and Jacques<br />
Banchereau are at the Baylor Institute for<br />
Immunology Research and INSERM U899,<br />
Dallas, Texas, USA; A. Karolina Palucka is at<br />
the Sammons Cancer Center, Baylor University<br />
Medical Center, Dallas, Texas, USA; and<br />
A. Karolina Palucka and Jacques Banchereau<br />
are in the Department of Gene and Cell<br />
Medicine and Department of Medicine,<br />
Immunology Institute, Mount Sinai School of<br />
Medicine, New York, New York, USA.<br />
e-mail: jacquesb@baylorhealth.edu<br />
Dendritic cells (DCs) are central players in the<br />
control of immunity and tolerance, and investigation<br />
of their properties is expected to illuminate<br />
many diseases of the immune system<br />
and lead to innovative therapies. Four recent<br />
reports 1–4 in The Journal of Experimental<br />
Medicine mark new progress in our understanding<br />
of the biology of a particular human<br />
DC subset identified by co-expression of<br />
CD141 (thrombomodulin, BDCA-3) and the<br />
C-type lectin CLEC9A (DNGR-1). Collectively,<br />
the papers show that CD141+ DCs are the<br />
human counterpart of mouse CD8+ DCs. As<br />
mouse CD8+ DCs are important for the induction<br />
of cytotoxic T-lymphocyte responses<br />
through their exceptional capacity to present<br />
exogenous antigens in an HLA class I pathway<br />
(so-called cross-presentation) 5 , this discovery<br />
could have significant clinical impact if human<br />
CD141+ DCs have a similar role.<br />
DCs were discovered in 1973 by Ralph<br />
Steinman as a novel cell type in the mouse<br />
spleen and are now recognized as a group of<br />
related cell populations that efficiently present<br />
antigens. Both mice and humans have two<br />
major types of DC: myeloid DCs (mDCs, also<br />
called conventional or classical DCs), and<br />
plasmacytoid DCs (pDCs). pDCs are considered<br />
the front line in anti-viral immunity as<br />
they rapidly produce abundant type I interferon<br />
in response to viral infection. In their<br />
resting state, pDCs may be important in tolerance,<br />
including oral tolerance 6,7 . pDCs are<br />
nature biotechnology volume 28 number 8 AUGUST 2010 813
news and views<br />
© 2010 <strong>Nature</strong> America, Inc. All rights reserved.<br />
modified strain by isolating genomic DNA,<br />
amplifying the barcode tags using PCR and<br />
hybridizing the amplified DNA to a microarray<br />
that contains probes complementary to<br />
each tag. A signal on the microarray identifies<br />
strains that grew. To demonstrate the<br />
approach, the authors selected for growth in<br />
media containing salicin, d-fucose, valine or<br />
methylglyoxyl. These compounds inhibit cell<br />
growth by different mechanisms. Salicin is a<br />
carbon source that normally cannot be metabolized.<br />
d-fucose is an analogue of arabinose<br />
that inhibits the ability of E. coli to metabolize<br />
this sugar. Valine acts as a feedback inhibitor<br />
of growth-limiting leucine and isoleucine biosynthesis.<br />
Methylglyoxal presents an oxidative<br />
stress if present in elevated concentrations.<br />
These conditions demonstrated the effectiveness<br />
of TRMR in identifying gene-trait relationships<br />
and in identifying genes that were<br />
not expected to be involved in resistance to the<br />
given cellular stress, thus supporting the power<br />
of a genome-scale, unbiased approach.<br />
In a particularly challenging and exciting<br />
application of TRMR, Warner et al. 1 grew their<br />
libraries of strains in lignocellulosic hydrolysate<br />
derived from corn stover. Hydrolysates<br />
represent a complex potpourri of molecules<br />
toxic to E. coli. It has been difficult to predict<br />
a priori which genes would best confer resistance<br />
to growth inhibitors in the hydrolysates 6 .<br />
This problem is thus well suited to test the<br />
authors’ methods. Among the modified genes<br />
that conferred improved growth were genes<br />
with expected functions as well as several<br />
with seemingly disparate cellular functions,<br />
including primary metabolism, RNA metabolism,<br />
sugar transporters, secondary metabolism,<br />
vitamin processes and antioxidant activities. In<br />
one notable result, the authors identified the<br />
antioxidant ahpC, a gene not previously linked<br />
to growth on hydrolysates, which, when upregulated,<br />
considerably improved both growth rate<br />
and final biomass levels.<br />
TRMR has many potential uses. Warner<br />
et al. 1 note that it could easily be applied iteratively,<br />
with strains selected after one round of<br />
TRMR used as the starting strains for a second<br />
round, thereby accumulating beneficial<br />
genome alterations (Fig. 1, dotted arrow).<br />
Such iterative processing can take advantage<br />
of the same pool of oligos already synthesized.<br />
Parallel microarray analysis of the barcode<br />
tags present in the selected survivors should<br />
produce additional layers of information about<br />
genetic contributors to fitness. For instance,<br />
the ability to track combinations of alterations<br />
in a stepwise fashion as they accumulate has<br />
the potential to provide snapshots of genetic<br />
interaction data that, if taken at a high enough<br />
frequency, may uncover network connections<br />
in conditions particularly relevant to industrial<br />
and biotechnological settings.<br />
TRMR is also valuable because it identifies<br />
genes and network connections that<br />
could form the basis for further strain optimization.<br />
For instance, a particularly powerful<br />
combination of technologies would<br />
be to first use TRMR to identify relevant<br />
genes and then apply the recently developed<br />
multiplex automated genome engineering<br />
(MAGE) method 7 , which finely tunes the<br />
expression levels of a limited number of<br />
genes. In microbial engineering applications,<br />
such as the creation of a strain of E. coli that<br />
can metabolize lignocellulose sugars, TRMR<br />
should complement existing technologies,<br />
including directed evolution, genome-scale<br />
metabolic modeling and synthetic biology<br />
approaches for redox balancing, flux improvement<br />
and limiting the production of undesirable<br />
and toxic metabolic products.<br />
In addition to TRMR, other approaches<br />
based on genome-wide modifications are<br />
increasingly providing scientists with the ability<br />
to generate large, information-rich data sets<br />
from which new genetic information may be<br />
extracted 2–4,8,9 . TRMR heralds an approach to<br />
genetic analyses in which phenotypes are rapidly<br />
mapped to genetic modifications across the<br />
genome, simultaneously producing improved<br />
strains for immediate practical use as well as<br />
data sets enabling future rational creation of<br />
sophisticated strains.<br />
COMPETING FINANCIAL INTERESTS<br />
The authors declare no competing financial interests.<br />
1. Warner, J. et al. Nat. Biotechnol. 28, 856–862<br />
(2010).<br />
2. Giaever, G. et al. <strong>Nature</strong> 418, 387–391 (2002).<br />
3. Kim, D.U. et al. Nat. Biotechnol. 28, 617–623<br />
(2010).<br />
4. Baba, T. et al. Mol. Syst. Biol. 2, 2006.0008 (2006).<br />
5. Datta, S., Costantino, N. & Court, D.L. Gene 379,<br />
109–115 (2006).<br />
6. Mohagheghi, A. & Schell, D.J. Biotechnol. Bioeng. 105,<br />
992–996 (2010).<br />
7. Wang, H.H. et al. <strong>Nature</strong> 460, 894–898 (2009).<br />
8. Tong, A.H. et al. Science 294, 2364–2368 (2001).<br />
9. Mnaimneh, S. et al. Cell 118, 31–44 (2004).<br />
The expanding family of dendritic<br />
cell subsets<br />
Hideki Ueno, A Karolina Palucka & Jacques Banchereau<br />
The recent identification of human CD141 + dendritic cells as a counterpart<br />
of mouse CD8 + dendritic cells may be useful in developing vaccines and<br />
immunotherapies.<br />
Hideki Ueno, A. Karolina Palucka and Jacques<br />
Banchereau are at the Baylor Institute for<br />
Immunology Research and INSERM U899,<br />
Dallas, Texas, USA; A. Karolina Palucka is at<br />
the Sammons Cancer Center, Baylor University<br />
Medical Center, Dallas, Texas, USA; and<br />
A. Karolina Palucka and Jacques Banchereau<br />
are in the Department of Gene and Cell<br />
Medicine and Department of Medicine,<br />
Immunology Institute, Mount Sinai School of<br />
Medicine, New York, New York, USA.<br />
e-mail: jacquesb@baylorhealth.edu<br />
Dendritic cells (DCs) are central players in the<br />
control of immunity and tolerance, and investigation<br />
of their properties is expected to illuminate<br />
many diseases of the immune system<br />
and lead to innovative therapies. Four recent<br />
reports 1–4 in The Journal of Experimental<br />
Medicine mark new progress in our understanding<br />
of the biology of a particular human<br />
DC subset identified by co-expression of<br />
CD141 (thrombomodulin, BDCA-3) and the<br />
C-type lectin CLEC9A (DNGR-1). Collectively,<br />
the papers show that CD141 + DCs are the<br />
human counterpart of mouse CD8 + DCs. As<br />
mouse CD8 + DCs are important for the induction<br />
of cytotoxic T-lymphocyte responses<br />
through their exceptional capacity to present<br />
exogenous antigens in an MHC class I pathway<br />
(so-called cross-presentation) 5 , this discovery<br />
could have significant clinical impact if human<br />
CD141 + DCs have a similar role.<br />
DCs were discovered in 1973 by Ralph<br />
Steinman as a novel cell type in the mouse<br />
spleen and are now recognized as a group of<br />
related cell populations that efficiently present<br />
antigens. Both mice and humans have two<br />
major types of DC: myeloid DCs (mDCs, also<br />
called conventional or classical DCs), and<br />
plasmacytoid DCs (pDCs). pDCs are considered<br />
the front line in anti-viral immunity as<br />
they rapidly produce abundant type I interferon<br />
in response to viral infection. In their<br />
resting state, pDCs may be important in tolerance,<br />
including oral tolerance 6,7 . pDCs are<br />
themselves composed of at least two subsets<br />
with different functional properties 8 .<br />
Similarly, mDCs comprise different subsets<br />
with unique localization, phenotype and functions<br />
(Fig. 1). In human skin, the epidermis<br />
hosts Langerhans cells, whereas the dermis<br />
contains CD1a + DCs and CD14 + DCs. The<br />
latter DC subset is involved in the generation of<br />
humoral immunity, partly through secretion of<br />
interleukin (IL)-12, which stimulates the differentiation<br />
of activated B cells into plasma cells<br />
and also promotes the differentiation of naive<br />
CD4 + T cells into T follicular helper cells 9,10 ,<br />
a CD4 + T-cell subset that promotes antibody<br />
responses. In contrast, Langerhans cells efficiently<br />
prime antigen-specific CD8 + T cells,<br />
possibly by means of IL-15 (ref. 9). The functions<br />
of the predominant CD1a + dermal DCs<br />
are as yet unknown.<br />
Figure 1 Contribution of human myeloid DC subsets to the regulation of adaptive immunity. The<br />
humoral and cellular arms of adaptive immunity are regulated by different human mDC subsets.<br />
Humoral immunity is preferentially regulated by CD14 + dermal DCs by means of IL-12, which acts<br />
directly on B cells and promotes the development of T follicular helper cells (Tfh). Cellular immunity<br />
is preferentially regulated by Langerhans cells, possibly through IL-15 and a dedicated subset of CD4 +<br />
T cells specialized to help CD8 + T cells (CTL Th cells). Given their capacity to cross-present antigens<br />
to CD8 + T cells, CD141 + DCs are likely to be involved in the development of cytotoxic T-lymphocyte<br />
responses. CD141 + DCs might also be involved in the development of humoral responses through<br />
IL-12 secretion. This hypothesis is supported by mouse in vivo antigen-targeting studies showing that<br />
CD8 + DCs, the mouse counterpart of human CD141 + DCs, can induce both cytotoxic T-lymphocyte and<br />
humoral responses 12,13 , although the mechanisms may be different. It will be important to determine<br />
whether and how CD141 + DCs are related to Langerhans cells and to dermal DCs, and how these DC<br />
subsets shape adaptive immunity.<br />
Human DCs expressing CD141 were originally<br />
found in blood as a subset of mDCs distinct<br />
from CD1c + mDCs 11 . The new reports 1–4<br />
argue that CD141 + DCs are the human counterpart<br />
of mouse CD8 + DCs on the basis of<br />
results from several different experimental<br />
approaches, including detailed functional and<br />
phenotypic analysis 1,3 , as well as the discovery<br />
of a chemokine receptor expressed on both<br />
cell types 2,4 .<br />
First, like mouse CD8 + DCs, human CD141 +<br />
DCs are present in secondary lymphoid organs<br />
such as tonsils and spleen 1,3 . Further studies<br />
are needed to determine whether they are also<br />
present in tissues.<br />
Second, although human CD141 + DCs do<br />
not express CD8, they share with mouse CD8 +<br />
DCs expression of other surface molecules,<br />
including CLEC9A 1,3,12,13 and the adhesion<br />
molecule, NECL2 (refs. 3,14). NECL2 binds to<br />
class I–restricted T cell–associated molecule,<br />
a cell-surface protein primarily expressed by<br />
natural killer cells, natural killer T cells and<br />
activated CD8 + T cells 14 .<br />
Third, human CD141 + DCs uniquely express<br />
the chemokine receptor XCR1 (refs. 2,4), in<br />
line with the unique expression of XCR1 by<br />
mouse CD8 + DCs shown previously. XCR1<br />
expressed in both human and mouse DCs is<br />
functional, as the cells migrate in response to<br />
the ligand XCL1 (refs. 2,4), a secreted protein<br />
known to be produced by natural killer cells<br />
and activated CD8 + T cells. These observations<br />
suggest a potential for interactions<br />
between human CD141 + DCs/mouse CD8 +<br />
DCs and natural killer cells or CD8 + T cells,<br />
which might be a mechanism involved in the<br />
efficient induction of cytotoxic T lymphocyte<br />
responses. For example, interferon (IFN)-γ<br />
released by natural killer cells and/or CD8 +<br />
T cells might stimulate CD141 + DCs/CD8 +<br />
DCs to secrete more IL-12 (refs. 2,4).<br />
Fourth, all of the new studies 1–4 demonstrate<br />
that human CD141 + DCs are highly efficient in<br />
inducing CD8 + T-cell responses through their<br />
capacity to cross-present exogenous antigens.<br />
This evidence suggests that human CD141 +<br />
DCs participate in the development of cytotoxic<br />
lymphocyte responses in vivo.<br />
Fifth, human CD141 + DCs and mouse CD8 +<br />
DCs express the transcription factors Batf3<br />
and IRF-8 (refs. 1,3), both of which are strictly<br />
required for the development of mouse CD8 +<br />
DCs 5 . In contrast, CD141 + DCs do not express<br />
IRF4 (refs. 1,3), a transcription factor required<br />
for the development of other mouse spleen<br />
CD4 + DCs 5 . Thus, CD141 + DCs and mouse<br />
CD8 + DCs might share a common developmental<br />
pathway.<br />
Finally, two of the studies 1,3 show similarities<br />
between human CD141 + DCs and mouse<br />
CD8 + DCs in the expression of Toll-like<br />
receptors (TLRs). TLRs belong to the family<br />
of pattern recognition receptors through<br />
which DCs sense microbes and dying cells.<br />
Engagement of these receptors by pathogen-<br />
and danger-associated molecular patterns<br />
expressed by microbes and dying cells<br />
triggers DC maturation, a complex series of<br />
events that includes expression of new surface<br />
molecules, secretion of cytokines and a<br />
reduction in antigen capture. Different DC<br />
subsets express different sets of pattern recognition<br />
receptors, particularly in humans,<br />
which provides flexibility in responding to<br />
different microbes.<br />
Similar to mouse CD8 + DCs, human CD141 +<br />
DCs are found to express TLR3 and TLR8,<br />
and stimulation with their respective ligands<br />
(poly I:C and poly U) induces their maturation<br />
and cytokine secretion. In contrast<br />
to the relatively limited TLR expression by<br />
CD141 + DCs, it is known that CD1c + DCs,<br />
another blood mDC subset, express a wide<br />
array, including TLR4, 5 and 7. Whether<br />
human CD141 + DCs express other pattern<br />
recognition receptors, such as NOD-like<br />
receptors and RIG-I-like receptors, has yet<br />
to be determined.<br />
The identification of the human counterpart<br />
of mouse CD8 + DCs opens the possibility<br />
of translating to humans knowledge<br />
generated in the mouse. There are still many<br />
infectious diseases for which no efficient vaccines<br />
are available, including AIDS, malaria,<br />
hepatitis C infection and tuberculosis. Most<br />
of these would benefit from the induction of<br />
potent cytotoxic T lymphocytes to eliminate<br />
the infected cells. Similarly, strong cytotoxic<br />
T-lymphocyte responses would be beneficial<br />
in the context of cancer immunotherapy.<br />
Thus, it may be possible to exploit CD141 +<br />
DCs in the ‘DC-targeting’ vaccination strategy,<br />
in which vaccines are generated from<br />
recombinant anti-DC antibodies fused to<br />
selected antigens 15 . Studies in mice have<br />
shown that targeting antigen to DCs in this<br />
manner in vivo results in potent antigenspecific<br />
CD4 + and CD8 + T-cell immunity 15 ,<br />
provided adjuvants are co-administered to<br />
activate the targeted DCs. Indeed, antibodies<br />
to CLEC9A allowed targeting of antigen to<br />
mouse CD8 + DCs in vivo, inducing potent<br />
cytotoxic T-lymphocyte responses when<br />
combined with anti-CD40 administration 12<br />
and potent antibody responses even without<br />
co-administration of adjuvants 13 .<br />
It should be emphasized, however, that<br />
translating mouse immunological data to<br />
the clinic is fraught with uncertainty, as 65<br />
million years of independent evolution have<br />
produced many nuances that distinguish the<br />
human and mouse immune systems 16 . As one<br />
example, other human DCs, such as CD1c +<br />
DCs 1,3 and epidermal Langerhans cells 9 , can<br />
also cross-present antigens. Thus, it remains<br />
to be determined whether and how human<br />
CD141 + mDCs are related to other mDC<br />
subsets and how all the mDC subsets cooperate<br />
in shaping adaptive immunity.<br />
COMPETING FINANCIAL INTERESTS<br />
The authors declare no competing financial interests.<br />
1. Jongbloed, S.L. et al. J. Exp. Med. 207, 1247–1260<br />
(2010).<br />
2. Bachem, A. et al. J. Exp. Med. 207, 1273–1281<br />
(2010).<br />
3. Poulin, L.F. et al. J. Exp. Med. 207, 1261–1271<br />
(2010).<br />
4. Crozat, K. et al. J. Exp. Med. 207, 1283–1292<br />
(2010).<br />
5. Shortman, K. & Heath, W.R. Immunol. Rev. 234,<br />
18–31 (2010).<br />
6. Goubier, A. et al. Immunity 29, 464–475 (2008).<br />
7. Liu, Y.J. Annu. Rev. Immunol. 23, 275–306 (2005).<br />
8. Matsui, T. et al. J. Immunol. 182, 6815–6823<br />
(2009).<br />
9. Klechevsky, E. et al. Immunity 29, 497–510 (2008).<br />
10. Schmitt, N. et al. Immunity 31, 158–169 (2009).<br />
11. Dzionek, A. et al. J. Immunol. 165, 6037–6046<br />
(2000).<br />
12. Sancho, D. et al. J. Clin. Invest. 118, 2098–2110<br />
(2008).<br />
13. Caminschi, I. et al. Blood 112, 3264–3273 (2008).<br />
14. Galibert, L. et al. J. Biol. Chem. 280, 21955–21964<br />
(2005).<br />
15. Bonifaz, L.C. et al. J. Exp. Med. 199, 815–824<br />
(2004).<br />
16. Mestas, J. & Hughes, C.C. J. Immunol. 172, 2731–<br />
2738 (2004).<br />
research highlights<br />
Lung on a chip<br />
Efforts to mimic the alveolar-capillary interface, the fundamental functional<br />
unit of the lung, in cell culture have been frustrated primarily by the challenge<br />
of replicating the structural and functional properties of the system while<br />
simulating the mechanical changes associated with normal breathing. Huh et al.<br />
recreate the behavior of lung tissue in a microfluidic device<br />
by lining a thin (10 µm), porous and flexible membrane<br />
with human alveolar epithelial cells on one side and human<br />
pulmonary microvascular endothelial cells on the other.<br />
Application and release of a vacuum to two flanking chambers<br />
causes the membrane with its adherent tissue layers to stretch<br />
and then relax to its original size, thus recreating the dynamic<br />
mechanical distortion of the alveolar-capillary interface caused<br />
by breathing. The device reproduces organ-level responses to<br />
bacterial infection and inflammatory cytokines, and its use<br />
suggests that mechanical strain can promote nanoparticle-induced<br />
toxicity. These findings underscore the potential of<br />
the chip for evaluating the safety and efficacy of new drugs for<br />
lung disorders, or the effects of environmental toxins.<br />
(Science 328, 1662–1668, 2010)<br />
PH<br />
miRNAs, Dicer and metastasis<br />
MicroRNAs (miRNAs) play a key role in the pathogenesis of cancer.<br />
Although the overexpression of individual miRNAs is important<br />
in numerous tumors, a global downregulation of miRNA levels is a<br />
hallmark of cancer. Martello et al. now show that members of the<br />
miR-103/107 family suppress the expression of Dicer, the enzyme<br />
responsible for the maturation of pre-miRNAs into miRNAs. Levels of<br />
miR-103/107 are inversely proportional to Dicer abundance in cancer<br />
cell lines and high miR-103/107 expression correlates with metastasis<br />
and poor prognosis in breast cancer. In mouse models of breast cancer,<br />
nonmetastatic cell lines can be converted to an invasive phenotype by<br />
miR-103/107 expression. Therapeutic targeting of the miRNAs with<br />
a specific antisense molecule reduces the number of lung metastases,<br />
making these miRNAs promising targets for antimetastatic drugs,<br />
although no effect on the growth of the primary tumor was observed.<br />
The miR-103/107 molecules promote an epithelial-to-mesenchymal<br />
transition, a developmental program associated with increased mobility<br />
and loss of cell adhesion that is frequently observed in metastatic<br />
cancer. (Cell 141, 1195–1207, 2010)<br />
ME<br />
Written by Kathy Aschheim, Laura DeFrancesco, Markus Elsner,<br />
Peter Hare & Craig Mak<br />
Fungal histone acetylation inhibitors<br />
Targeting fungal histone acetylation may provide a new source of drugs<br />
against Candida albicans infections, a particular problem for immunocompromised<br />
individuals, research by Wurtele et al. suggests. The authors<br />
set out to determine whether a fungal histone acetyltransferase enzyme<br />
(RTT109) not found in humans would make a good drug target. The<br />
particular modification that the enzyme makes—acetylation of lysine 56<br />
on histone 3 (H3 Lys56)—is found on close to 30% of C. albicans histones,<br />
whereas only 1% of human histones bear the mark. Knocking out both<br />
copies of RTT109 creates strains with greater sensitivity to certain antifungal<br />
agents; repressing the activity of the HST3 deacetylase enzyme led<br />
to fungal cell death. The effects were also mirrored by nicotinamide, an<br />
inhibitor of NAD-dependent deacetylases. A/J mice, a model particularly<br />
sensitive to C. albicans infection, which were injected with an HST3-<br />
repressed strain of the fungus or an RTT109-deleted strain failed to show<br />
signs of infection. Once again, nicotinamide treatment mirrored the effects<br />
of HST3 repression, but only in strains with wild-type RTT109, suggesting<br />
that nicotinamide, which acts as an anti-inflammatory, exerts its effects<br />
on infection through its interaction with the histone deacetylase pathway.<br />
Finally, the researchers showed that whereas some fungal pathogens are<br />
sensitive in various degrees to nicotinamide, all tested clinical isolates of<br />
C. albicans, the fungus with the greatest impact on human health, were<br />
sensitive. (Nat. Med. 16, 774–780, 2010)<br />
LD<br />
iPS cells from blood<br />
As researchers contemplate clinical applications of induced pluripotent stem<br />
(iPS) cells, one practical consideration is the accessibility of the donor cells<br />
used for reprogramming. So far, most human iPS cells have been derived<br />
from fibroblasts collected through skin biopsies, a procedure that requires an<br />
incision and stitches. Following three 2009 papers on the reprogramming of<br />
human hematopoietic stem/progenitor cells from cord blood or from adults<br />
after mobilization by granulocyte colony stimulating factor, three new studies<br />
describe iPS cells from unmobilized adult blood cells. All three groups rely<br />
on the standard ‘Yamanaka’ reprogramming factors (OCT4, SOX2, KLF4,<br />
C-MYC), but Loh et al. and Staerk et al. deliver these with retroviruses,<br />
whereas Seki et al. use the nonintegrating Sendai virus. The latter method<br />
appears more efficient, allowing iPSCs to be generated from samples as small<br />
as 1 ml. Like keratinocytes from plucked hair (Nat. Biotechnol. 26, 1276–1284,<br />
2008), peripheral blood cells may provide a convenient source of iPS cells in<br />
a clinical context. (Cell Stem Cell 7, 15–19; 20–24; 11–14, 2010) KA<br />
Antibody therapy for thrombosis<br />
Small-molecule therapeutics, such as aspirin and clopidogrel (Plavix),<br />
reduce the risk for heart attack and stroke by inhibiting platelets but at<br />
the cost of increased risk for excessive bleeding. Tucker et al. demonstrate<br />
an alternative strategy in baboons based on reducing platelet counts using<br />
neutralizing antibodies. This strategy was tested using a vascular graft<br />
model that mimics a damaged blood vessel at risk for thrombosis. Animals<br />
with fewer circulating platelets showed less potential for thrombosis in<br />
the graft model. Notably, the blood of these animals did not take longer<br />
to clot after cutting the animals’ forearm, whereas aspirin treatment led to<br />
a statistically significant increase in bleeding time. Tucker et al. reduced<br />
platelet counts by treating animals with serum containing polyclonal neutralizing<br />
antibodies raised in baboons against thrombopoietin, a hormone<br />
essential for platelet production. Drugs that can be safely used to inhibit<br />
platelet production will be required before this strategy can be tested in<br />
humans. (Sci. Transl. Med. 2, 37ra45, 2010)<br />
CM<br />
Analysis<br />
Discovery and characterization of chromatin states for<br />
systematic annotation of the human genome<br />
Jason Ernst 1,2 & Manolis Kellis 1,2<br />
A plethora of epigenetic modifications have been described<br />
in the human genome and shown to play diverse roles in gene<br />
regulation, cellular differentiation and the onset of disease.<br />
Although individual modifications have been linked to the<br />
activity levels of various genetic functional elements, their<br />
combinatorial patterns are still unresolved and their potential<br />
for systematic de novo genome annotation remains untapped.<br />
Here, we use a multivariate Hidden Markov Model to reveal<br />
‘chromatin states’ in human T cells, based on recurrent and<br />
spatially coherent combinations of chromatin marks. We define<br />
51 distinct chromatin states, including promoter-associated,<br />
transcription-associated, active intergenic, large-scale repressed<br />
and repeat-associated states. Each chromatin state shows<br />
specific enrichments in functional annotations, sequence<br />
motifs and specific experimentally observed characteristics,<br />
suggesting distinct biological roles. This approach provides a<br />
complementary functional annotation of the human genome<br />
that reveals the genome-wide locations of diverse classes of<br />
epigenetic function.<br />
The primary DNA sequence of the human genome encodes the<br />
genetic information of each cell, but numerous epigenetic modifications<br />
can modulate the interpretation of the primary sequence.<br />
These modifications contribute to the diversity of phenotypes found<br />
across different human cell types, play key roles in the establishment<br />
and maintenance of cellular identity during development and have<br />
been associated with DNA repair, replication and human disease.<br />
Post-translational modifications in the tails of histone proteins that<br />
package DNA into chromatin constitute perhaps the most versatile<br />
type of such epigenetic information. More than a dozen positions of<br />
multiple histone proteins can undergo a number of modifications,<br />
such as acetylation and mono-, di- or tri-methylation 1,2 .<br />
More than 100 distinct histone modifications have been described,<br />
leading to the ‘histone code hypothesis’ that specific combinations of<br />
chromatin modifications would encode distinct biological functions 3 .<br />
Others, however, have instead proposed that individual epigenetic<br />
marks act in additive ways and the multitude of modifications simply<br />
contributes to stability and robustness 4 . The specific combinations of<br />
epigenetic modifications that are biologically meaningful, and their<br />
corresponding functional roles, are still largely unknown.<br />
1 MIT Computer Science and Artificial Intelligence Laboratory, Cambridge,<br />
Massachusetts, USA. 2 Broad Institute of MIT and Harvard, Cambridge,<br />
Massachusetts, USA. Correspondence should be addressed to M.K.<br />
(manoli@mit.edu).<br />
Published online 25 July 2010; doi:10.1038/nbt.1662<br />
To directly address these questions, we introduce an approach for<br />
the de novo discovery of ‘chromatin states’ (Fig. 1, Supplementary<br />
Table 1 and Supplementary Fig. 1), or biologically meaningful and<br />
spatially coherent combinations of chromatin marks, by performing<br />
a systematic genome-wide analysis based on a multivariate Hidden<br />
Markov Model (HMM). Multivariate HMMs are graphical probabilistic<br />
models that model multiple ‘observed’ inputs as generated by<br />
unobserved ‘hidden’ states, using transitions between hidden states<br />
to model spatial relationships (Online Methods).<br />
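As a compact way to read this description, the joint probability of a standard HMM can be written as below.<br />
This is the generic textbook factorization, not a formula quoted from the paper. The symbols are assumptions<br />
introduced here: x_t is the vector of observed mark calls in interval t, z_t is the hidden chromatin state of<br />
that interval, \pi is an initial state distribution, a_{ij} is the probability of moving from state i to state j<br />
in the next interval, and e_k(x) is the probability that state k emits the observation x.<br />

```latex
P(x_{1:T}, z_{1:T})
  = \pi_{z_1}\, e_{z_1}(x_1)\,
    \prod_{t=2}^{T} a_{z_{t-1} z_t}\, e_{z_t}(x_t)
```

Under this reading, the spatial relationships between neighboring intervals enter only through the a_{ij} terms,<br />
and the combinations of marks enter only through the e_k(x) terms.<br />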
Our model captures two types of chromatin information. The frequency<br />
with which different chromatin marks are found in combination<br />
with each other is captured by a vector of ‘emission’ probabilities<br />
associated with each chromatin state (Fig. 2 and Supplementary<br />
Figs. 2 and 3), and the frequency with which different chromatin<br />
states occur in spatial relation to each other along the genome<br />
is encoded in a ‘transition’ probability vector associated with each<br />
state. These spatial relationships capture both the spreading of certain<br />
chromatin domains across the genome, as well as the functional ordering<br />
of different states such as from intergenic regions to promoter regions<br />
and transcribed regions (Supplementary Notes and Supplementary<br />
Figs. 4–6). Biologically, the genomic locations associated with a<br />
given chromatin state may correspond to specific types of functional<br />
elements, such as transcription start sites (TSS), enhancers, active genes,<br />
repressed genes, exons or heterochromatin, which can be inferred<br />
solely from the corresponding combinations of chromatin marks in<br />
their spatial context, even though no information about these annotations<br />
is given to the model as input.<br />
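The interplay of emission and transition probabilities can be illustrated with a minimal Python sketch.<br />
Everything in it is a toy assumption made for illustration: the two state names, the three marks, all<br />
probability values and the input intervals are invented, and each mark is treated as an independent<br />
presence/absence call. Viterbi decoding is used here as one standard way to obtain a most probable state<br />
path; the text does not specify which decoding procedure the authors apply.<br />

```python
import math

# Hypothetical toy parameters: 2 states, 3 marks (values are illustrative only).
MARKS = ["H3K4me3", "H3K36me3", "H3K27me3"]
EMISSION = {            # P(mark present | state), one independent call per mark
    "promoter":    [0.90, 0.05, 0.05],
    "transcribed": [0.05, 0.80, 0.05],
}
TRANSITION = {          # P(next state | current state)
    "promoter":    {"promoter": 0.7, "transcribed": 0.3},
    "transcribed": {"promoter": 0.1, "transcribed": 0.9},
}
INITIAL = {"promoter": 0.5, "transcribed": 0.5}


def log_emission(state, calls):
    """Log-probability of a binary mark vector, multiplying per-mark terms."""
    total = 0.0
    for p, present in zip(EMISSION[state], calls):
        total += math.log(p if present else 1.0 - p)
    return total


def viterbi(observations):
    """Most probable state path for a list of binary mark vectors."""
    states = list(EMISSION)
    score = {s: math.log(INITIAL[s]) + log_emission(s, observations[0])
             for s in states}
    back = []
    for calls in observations[1:]:
        new_score, pointers = {}, {}
        for s in states:
            # Best predecessor of state s at this interval.
            prev = max(states,
                       key=lambda r: score[r] + math.log(TRANSITION[r][s]))
            new_score[s] = (score[prev] + math.log(TRANSITION[prev][s])
                            + log_emission(s, calls))
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Four consecutive intervals: promoter-like calls, then transcription-like calls.
intervals = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]]
path = viterbi(intervals)
print(path)
```

With these invented numbers the decoded path switches from the promoter-like state to the transcribed-like<br />
state between the second and third intervals, which mirrors the ordered transitions described in the text.<br />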
We applied our model to the largest data set of chromatin mark<br />
information available, consisting of the genome-wide occupancy data<br />
for a set of 38 different histone methylation and acetylation marks and<br />
for the histone variant H2AZ, RNA polymerase II (PolII) and CTCF in<br />
human CD4 T-cells. The maps were previously obtained using chromatin<br />
immunoprecipitation followed by next generation sequencing<br />
(ChIP-seq) (Online Methods) 5,6 . To understand the biological importance<br />
of the resulting chromatin states, we undertook a large-scale,<br />
systematic data-mining effort, bringing to bear dozens of genomewide<br />
data sets including gene annotations, expression information,<br />
evolutionary conservation, regulatory motif instances, compositional<br />
biases, genome-wide association data, transcription-factor binding,<br />
DNaseI hypersensitivity and nuclear lamina maps.<br />
This work provides an unbiased and systematic chromatin-driven<br />
annotation for every region of the genome at a 200 base pair resolution,<br />
refining previously described epigenetic states and introducing<br />
additional ones. Regardless of whether these chromatin states are<br />
causal in directing regulatory processes, or simply reinforcing independent<br />
regulatory decisions, these annotations should provide a<br />
resource for interpreting biological and medical data sets, such as<br />
genome-wide association studies for diverse phenotypes, and could<br />
potentially help to identify new classes of functional elements.<br />
RESULTS<br />
Chromatin states model and comparison to previous work<br />
Previous analyses have largely focused on characterizing the marks<br />
predictive of specific classes of genomic elements defined a priori such<br />
as transcribed regions, promoters or putative enhancers, and using<br />
the characterization to identify new instances of these classes 5–12 .<br />
[Figure 1 tracks, chr 7: 116,260–116,360 kb around CAPZA2 (scale bar, 50 kb). Chromatin states shown: 3, 5, 7, 8, 10, 11, 13, 15, 16, 17, 18, 19, 24, 25, 26, 36, 37, 38, 39, 43, 44 and 51, grouped in the legend as promoter, transcribed, active intergenic, repressed and repetitive states.<br />
Acetylation marks: H3K14ac, H3K23ac, H4K12ac, H2AK9ac, H4K16ac, H2AK5ac, H4K91ac, H3K4ac, H2BK20ac, H3K18ac, H2BK120ac, H3K27ac, H2BK5ac, H2BK12ac, H3K36ac, H4K5ac, H4K8ac, H3K9ac.<br />
Other tracks: PolII, CTCF, H2AZ.<br />
Methylation marks: H3K4me3, H3K4me2, H3K4me1, H3K9me1, H3K79me3, H3K79me2, H3K79me1, H3K27me1, H2BK5me1, H4K20me1, H3K36me3, H3K36me1, H3R2me1, H3R2me2, H3K27me2, H3K27me3, H4R3me2, H3K9me2, H3K9me3, H4K20me3.]<br />
Figure 1 Example of chromatin state annotation. Input chromatin mark information and resulting chromatin state annotation for a 120-kb region of<br />
human chromosome 7 surrounding the CAPZA2 gene. For each 200-bp interval, the input ChIP-Seq sequence tag count (black bars) is processed into a<br />
binary presence or absence call for each of 18 acetylation marks (light blue), 20 methylation marks (pink) and CTCF/PolII/H2AZ (brown). The precise<br />
combination of these marks in each interval in their spatial context is used to infer the most probable chromatin state assignment (colored boxes). Although<br />
chromatin states were learned independently of any prior genome annotation, they correlate strongly with upstream and downstream promoters (red),<br />
5′-proximal and distal transcribed regions (purple), active intergenic regions (yellow), repressed (gray) and repetitive (blue) regions (state descriptions<br />
shown in Supplementary Table 1). This example illustrates that even when the signal coming from chromatin marks is noisy, the resulting chromatin state<br />
annotation is very robust, directly interpretable and shows a strong correspondence with the gene annotation. Several spatially coherent transitions are seen<br />
from large-scale repressed to active intergenic regions near active genes, from upstream to downstream promoter states surrounding the TSS and from<br />
5′-proximal to distal transcribed regions along the body of the gene. The frequent transitions to state 16 correlate with annotated Alu elements (57%<br />
overlap versus 4% and 25% for states 13 and 15, respectively). Transitions to state 13 are likely due to enhancer elements in the first intron of CAPZA2,<br />
a region where regulatory elements are commonly found and correlate with several enhancer marks. The maximum-probability state assignments are shown<br />
here, and the full posterior probability for each state in this region is shown in Supplementary Figure 1.<br />
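The processing of tag counts into presence or absence calls per 200-bp interval, as described in the<br />
caption above, might be sketched in Python as follows. This is a minimal illustration under stated<br />
assumptions: the function names, the made-up tag positions and the fixed count threshold are all<br />
hypothetical. The authors' actual calling rule is given in their Online Methods, which are not part of<br />
this text, and it may well differ from a simple fixed cutoff.<br />

```python
from collections import Counter

BIN_SIZE = 200  # interval width in base pairs, as stated in the text


def bin_tag_counts(tag_positions, region_start, region_end):
    """Count sequence tags falling in each 200-bp interval of a region."""
    n_bins = (region_end - region_start + BIN_SIZE - 1) // BIN_SIZE
    counts = Counter((pos - region_start) // BIN_SIZE
                     for pos in tag_positions
                     if region_start <= pos < region_end)
    return [counts.get(i, 0) for i in range(n_bins)]


def binarize(counts, threshold):
    """Turn tag counts into presence (1) / absence (0) calls.

    The fixed threshold is a placeholder assumption, not the paper's rule.
    """
    return [1 if c >= threshold else 0 for c in counts]


tags = [5, 120, 180, 450, 610, 615, 640, 790]   # made-up tag positions
counts = bin_tag_counts(tags, region_start=0, region_end=800)
calls = binarize(counts, threshold=3)
print(counts, calls)
```

Running one such call per mark and stacking the results gives, for every interval, the binary vector<br />
of mark calls that a model of this kind would take as its observation.<br />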
818 VOLUME 28 NUMBER 8 AUGUST 2010 nature biotechnology
An unsupervised (without using prior knowledge) local chromatin<br />
pattern discovery method 13 first demonstrated that many of the<br />
patterns previously associated with promoters and enhancers could<br />
be discovered de novo, but did not discover patterns associated with<br />
broader domains and left the vast majority of the genome unannotated<br />
(Supplementary Fig. 7).<br />
Unsupervised HMM approaches that modeled chromatin mark<br />
signal intensity levels using multivariate normals or nonparametric<br />
histograms 14–18 have been previously used, but in contrast we use<br />
a binarization approach that explicitly models the presence/absence<br />
frequency of each mark. Specifically, we make a local call of whether a<br />
mark was present in each 200-bp interval, and use a Bernoulli random<br />
variable to model the probability of detection of each mark in isolation,<br />
and a product of independent probabilities to model the probability<br />
of each combination of marks (Online Methods). Our approach<br />
has the advantage that the model parameters are directly interpretable<br />
as the frequencies of each mark and each mark combination, in<br />
contrast to previous approaches for which the biological significance<br />
of the parameters corresponding to varying signal intensity levels for<br />
each mark is often unclear. Moreover, the binarization also makes our<br />
Figure 2 panel content (condensed from plot labels).<br />
State groups: promoter states; transcribed states; active intergenic; repressive; repetitive.<br />
(a) Marks, in column order: H3K14ac, H3K23ac, H4K12ac, H2AK9ac, H4K16ac, H2AK5ac, H4K91ac, H3K4ac, H2BK20ac, H3K18ac, H2BK120ac, H3K27ac, H2BK5ac, H2BK12ac, H3K36ac, H4K5ac, H4K8ac, H3K9ac, PolII, CTCF, H2AZ, H3K4me3, H3K4me2, H3K4me1, H3K9me1, H3K79me3, H3K79me2, H3K79me1, H3K27me1, H2BK5me1, H4K20me1, H3K36me3, H3K36me1, H3R2me1, H3R2me2, H3K27me2, H3K27me3, H4R3me2, H3K9me2, H3K9me3, H4K20me3. Color scale: chromatin mark frequency (ticks at 0.01, 0.08 and 1).<br />
(b) Columns, in order: Percent of genome; % ±2 kb TSS; Percent of TSS; xF TSS exact; % RefSeq gene; Expression level; xF ZNF gene; xF 5′ UTR; xF All exons; xF Spliced exons; xF 3′ UTR; xF TES; xF Conserved; xF DNaseI; xF TF binding; xF CpG island; % GC; % Lamina; % Repeat.<br />
(c) State descriptions, in state order:<br />
State 1: Promoter upstream high expr; potential enh looping<br />
State 2: Promoter upstream med expr; potential enh looping<br />
State 3: Promoter upstream low expr; potential enh looping<br />
State 4: Repressed promoter<br />
State 5: TSS low-med expr; most GC rich<br />
State 6: TSS med expr<br />
State 7: TSS high expr<br />
State 8: Transcribed promoter; highest expr, TSS for active genes<br />
State 9: Transcribed promoter; highest expr, downstream<br />
State 10: Transcribed promoter; high expr, near TSS<br />
State 11: Transcribed promoter; high expr, downstream<br />
State 12: Transcribed 5′ proximal, higher expr, open chr, TF binding<br />
State 13: Transcribed 5′ proximal, higher expr, open chr<br />
State 14: Transcribed 5′ proximal, high expr, open chr<br />
State 15: Transcribed 5′ proximal, high expr<br />
State 16: Transcribed 5′ proximal, med expr; Alu repeats<br />
State 17: Transcribed less 5′ proximal, med expr, open chr<br />
State 18: Transcribed less 5′ proximal, med expr<br />
State 19: Transcribed less 5′ proximal, lower expr; Alu repeats<br />
State 20: Candidate strong enhancer in transcribed regions<br />
State 21: Spliced exons/GC rich; open chr, TF binding<br />
State 22: Spliced exons/GC rich<br />
State 23: Spliced exons/GC rich; Alu repeats<br />
State 24: Transcribed 5′ distal; exons<br />
State 25: Transcribed further 5′ distal; exons<br />
State 26: Transcribed 5′ distal; Alu repeats<br />
State 27: End of transcription; exons; high expr<br />
State 28: ZNF genes; KAP-1 repressed state<br />
State 29: Cand strong distal enh; higher open chr; higher target expr<br />
State 30: Cand strong distal enh; high open chr; higher target expr<br />
State 31: Intergenic H2AZ with open chr/TF binding. Cand. distal enh<br />
State 32: Candidate weak distal enhancer<br />
State 33: Candidate distal enhancer<br />
State 34: Proximal to active enhancers; Alu repeats<br />
State 35: Active intergenic regions not enhancer specific<br />
State 36: Active intergenic further from enhancers; Alu repeats<br />
State 37: Non-repressive intergenic domains; Alu repeats<br />
State 38: H2AZ specific state<br />
State 39: CTCF island; candidate insulator<br />
State 40: Unmappable<br />
State 41: Heterochr; nuclear lamina; most AT rich<br />
State 42: Heterochr; nuclear lamina; ERVL repeats<br />
State 43: Heterochr; lower gene depletion<br />
State 44: Heterochr; ERVL repeats: lower gene/exon depletion<br />
State 45: Specific repression<br />
State 46: Simple repeats (CA)n, (TG)n<br />
State 47: L1/LTR repeats<br />
State 48: Satellite repeat<br />
State 49: Satellite repeat; moderate mapping bias<br />
State 50: Satellite repeat; high mapping bias<br />
State 51: Satellite repeat/rRNA; extreme mapping bias<br />
Final row of panel b: Genome total/average<br />
Figure 2 Chromatin state definition and functional interpretation. (a) Chromatin mark combinations associated with each state. Each row shows the specific<br />
combination of marks associated with each chromatin state and the frequencies between 0 and 1 with which they occur (color scale). These correspond to<br />
the emission probability parameters of the Hidden Markov Model (HMM) learned across the genome during model training (values shown in Supplementary<br />
Fig. 2). Marks and states colored as in Figure 1. (b) Genomic and functional enrichments of chromatin states. %, percentage; xF, fold enrichment. In order,<br />
columns are: percentage of the genome assigned to the state; percentage of state that overlaps a 200-bp interval within 2 kb of an annotated RefSeq TSS;<br />
percentage of RefSeq TSS found in the state; fold enrichment for TSS; percentage of state overlapping a RefSeq transcribed region; average expression level<br />
of genomic intervals overlapping the state; fold enrichment for zinc-finger–named genes; fold enrichment for RefSeq 5′ untranslated region (5′-UTR) exons<br />
and introns; fold enrichment for RefSeq exons; fold enrichment for spliced exons (2nd exon or later); fold enrichment for RefSeq 3′ untranslated region<br />
(3′-UTR) exons and introns; fold enrichment for RefSeq transcription end sites (TES); fold enrichment for PhastCons conserved elements; fold enrichment<br />
for DNaseI hypersensitive sites; median fold enrichment for transcription factor binding sites over a set of experiments (expanded in Supplementary<br />
Fig. 23); fold-enrichment for CpG islands; percentage of GC nucleotides; percent overlapping experimental nuclear lamina data; percent overlapping a<br />
RepeatMasker element (expanded in Supplementary Fig. 31). All enrichments are based on the posterior probability assignments. Genome total indicates<br />
the total percentage of 200-bp intervals intersecting the feature or the genome average for expression and percent GC. (c) Brief description of biological state<br />
function and interpretation (chr, chromatin; enh, enhancer, full descriptions in Supplementary Table 1).<br />
Figure 3 panel content (condensed from plot labels).<br />
(a) GO category enrichment by chromatin state at the TSS of the corresponding gene; each cell gives fold enrichment with the corrected P-value in parentheses.<br />
GO category; State 3; State 4; State 5; State 6; State 7; State 8<br />
Cell cycle phase; 2.70 (10^−7); 0.57 (1.0); 1.61 (10^−3); 1.45 (1.0); 1.15 (1.0); 1.51 (1.0)<br />
Embryonic development; 1.24 (1.0); 2.82 (10^−22); 1.07 (1.0); 0.85 (1.0); 0.54 (1.0); 1.00 (1.0)<br />
Chromatin; 1.20 (1.0); 0.48 (1.0); 2.17 (10^−7); 1.64 (1.0); 0.85 (1.0); 0.85 (1.0)<br />
Response to DNA damage; 1.20 (1.0); 0.35 (1.0); 1.55 (0.07); 2.13 (10^−11); 1.97 (10^−4); 0.84 (1.0)<br />
RNA processing; 0.49 (1.0); 0.26 (1.0); 1.31 (1.0); 1.91 (10^−11); 2.64 (10^−24); 2.46 (10^−4)<br />
T-cell activation; 0.77 (1.0); 0.88 (1.0); 1.27 (1.0); 0.70 (1.0); 0.79 (1.0); 4.72 (10^−7)<br />
(b) Fold enrichment versus distance from transcription start site (−2,000 to 2,000 bp). Dual peaking: states 1–3. TSS centered: states 4–7. Downstream: states 8–11.<br />
(c) Positional plots for states 12–28. Axis labels: Number of genes; Fold enrichment; Distance from transcription start site; Distance from spliced exon start; Distance from transcription end site.<br />
Figure 3 Promoter and transcribed chromatin states show distinct functional and positional enrichments. (a) Distinct Gene Ontology (GO) functional<br />
enrichments (fold and corrected P-values) found for genes associated with different promoter states at their TSS. For additional states and GO terms, see<br />
Supplementary Figure 29. (b) Distinct positional biases of promoter states with respect to nearest RefSeq TSS distinguish states peaking upstream, only<br />
downstream and centered at the TSS. (c) Positional biases of transcribed states with respect to TSS, nearest spliced exon start and transcription end<br />
sites (TES). These distinguish 5′-proximal states (12–23, left panel), 5′-distal states (24–28), states strongly enriched for spliced exons (middle panel,<br />
see also Supplementary Fig. 24 for plot for states 24–28) and TES-associated states (with state 27 being particularly precisely positioned, right panel).<br />
model less prone to forming states that overfit potentially insignificant<br />
variations in signal intensity levels. In contrast to models that use a<br />
multivariate normal distribution, our method avoids this strong parametric<br />
assumption, which is generally violated by the often relatively<br />
small discrete counts found in ChIP-seq experiments, enabling more<br />
robust models to be inferred. In comparison to the models previously<br />
inferred based on a nonparametric histogram strategy 18 , our binarization<br />
approach uses an order of magnitude fewer parameters per state,<br />
further increasing model robustness and interpretability.<br />
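The binarized emission model described above can be illustrated with a minimal sketch. It assumes a fixed count threshold for the presence call (a hypothetical stand-in; the actual calling procedure is given in the Online Methods), and the function names and mark frequencies are illustrative only.<br />

```python
def binarize(counts, threshold):
    """Turn per-interval tag counts into 0/1 presence calls.

    The fixed threshold is a hypothetical simplification, not the
    paper's calling procedure.
    """
    return [1 if c >= threshold else 0 for c in counts]


def emission_probability(observed, mark_freqs):
    """P(observed mark combination | state), treating each mark as an
    independent Bernoulli variable with its state-specific frequency."""
    p = 1.0
    for present, freq in zip(observed, mark_freqs):
        p *= freq if present else (1.0 - freq)
    return p


# Toy state with three marks; frequencies are made up for illustration.
toy_state_freqs = [0.9, 0.8, 0.05]
calls = binarize([12, 7, 0], threshold=4)           # marks 1 and 2 present
prob = emission_probability(calls, toy_state_freqs)  # 0.9 * 0.8 * (1 - 0.05)
```

Each state needs only one parameter per mark in this formulation, which is why the emission parameters can be read directly as mark frequencies.<br />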
We developed a procedure for learning sets of chromatin states<br />
across a range of model complexities. For a given number of states and<br />
from a set of initial parameters, standard expectation maximization<br />
based procedures enable simultaneous local optimization of the state<br />
definitions (emission and transition probabilities) and the corresponding<br />
genome annotation consistent with the observed data. However,<br />
the inferred model and its quality can depend on the initial set of<br />
parameters, which can confound comparisons of models with different<br />
numbers of states learned from independent initializations. We therefore<br />
used a two-stage process that first selected a 79-state model, which<br />
had the highest complexity-penalized likelihood score across a large<br />
compendium of randomly initialized models of varying complexity.<br />
We then pruned and optimized this model down to smaller numbers<br />
of states, leading to a model with 51 states that were relatively<br />
consistently recovered across the compendium of models, and that<br />
sufficiently captured all states found in larger models for which we<br />
could give a distinct biological interpretation (see Online Methods).<br />
This enabled us to maintain a relatively small number of states while<br />
capturing most of the unique biology uncovered across our compendium<br />
of randomly initialized models. In other words, this<br />
procedure enabled us to maximize biological interpretability while<br />
minimizing model complexity. We further ensured that general<br />
properties of the resulting model validated our approach, including<br />
robustness to varying thresholds and different background models,<br />
and independence of marks given a chromatin state (Supplementary<br />
Notes, Supplementary Figs. 8–21 and Supplementary Table 2).<br />
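To make the idea of a complexity-penalized likelihood concrete, the sketch below counts the free parameters of a binary-emission HMM and applies a BIC-style penalty. The BIC form is an assumption used for illustration; the penalty actually used is described in the Online Methods and is not reproduced here. The log-likelihood values are toy numbers.<br />

```python
import math


def n_free_parameters(n_states, n_marks):
    """Free parameters of an HMM with one Bernoulli emission per mark."""
    emission = n_states * n_marks            # one frequency per state and mark
    transition = n_states * (n_states - 1)   # each row sums to 1
    initial = n_states - 1                   # initial distribution sums to 1
    return emission + transition + initial


def penalized_score(log_likelihood, n_states, n_marks, n_intervals):
    """BIC-style score (assumed form): higher is better."""
    k = n_free_parameters(n_states, n_marks)
    return log_likelihood - 0.5 * k * math.log(n_intervals)


# Toy comparison of three model sizes (log-likelihoods are invented).
toy_log_likelihoods = {2: -5200.0, 3: -5000.0, 4: -4990.0}
scores = {
    k: penalized_score(ll, k, n_marks=3, n_intervals=10_000)
    for k, ll in toy_log_likelihoods.items()
}
best_n_states = max(scores, key=scores.get)
```

In this toy case the 4-state model has the best raw likelihood, but the penalty favors the 3-state model. With the 41 marks used in the study, `n_free_parameters(51, 41)` gives 4,691 under this parameterization.<br />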
We next describe the likely biological functions of the 51 discovered<br />
chromatin states, divided into five large groups.<br />
Promoter-associated states<br />
The first group of states, states 1–11, all had high enrichment for<br />
promoter regions: 40–89% of each state was within 2 kb of a RefSeq<br />
TSS, compared with 2.7% genome-wide (P < 10^−200 for all states).<br />
Figure 4 SNP and GWAS enrichments for<br />
chromatin states. (a) Several chromatin states<br />
show enrichments for disease association<br />
data sets. For each state is shown: genome<br />
percentage; fold enrichment for SNPs from the<br />
HapMap CEU population; fold enrichment from a<br />
collection of 1,640 GWAS SNPs associated with<br />
a variety of diseases and traits from numerous<br />
studies 25 ; fold enrichment of GWAS SNPs<br />
relative to the HapMap CEU SNP enrichment;<br />
significance of GWAS SNPs relative to the<br />
underlying SNP frequency (when the corrected<br />
P-value < 0.01). (b) Example of intergenic<br />
SNP in GWAS-enriched state 33, found 40 kb<br />
downstream of the IKZF2 gene and associated<br />
with plasma eosinophil count levels 26 . SNP<br />
significance as reported 26 is shown for each<br />
SNP in the region (blue circles) and associated<br />
chromatin state annotation (similar to Fig. 1).<br />
Red circle denotes top SNP and its overlap with<br />
state 33. In addition to top SNPs, secondary<br />
SNPs were also frequently found at or near<br />
GWAS-enriched states in several cases.<br />
These states accounted for 59% of all RefSeq TSS although they<br />
covered only 1.3% of the genome. These states all had a high frequency of<br />
H3K4me3 in common, as well as significant enrichments for DNaseI<br />
hypersensitive sites, CpG islands, evolutionarily conserved motifs and<br />
bound transcription factors (Fig. 2). They differed, however, in the<br />
presence and levels of other associated marks, primarily H3K79me2/3,<br />
H4K20me1, H3K4me1/2 and H3K9me1, and of numerous acetylations,<br />
leading to varying strengths of the aforementioned functional<br />
enrichments and varying expression levels of the downstream genes<br />
(Supplementary Figs. 22 and 23).<br />
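As a worked check of the figures quoted above, fold enrichment can be taken as the fraction of a feature falling in a set of states divided by the fraction of the genome those states cover. This ratio definition is an assumption about how the reported enrichments are computed, and the function name is illustrative.<br />

```python
def fold_enrichment(observed_fraction, background_fraction):
    """Ratio of an observed fraction to its genome-wide background."""
    return observed_fraction / background_fraction


# 59% of RefSeq TSS fall in states covering 1.3% of the genome.
tss_enrichment = fold_enrichment(0.59, 0.013)

# 40-89% of each promoter state lies within 2 kb of a TSS,
# compared with 2.7% of intervals genome-wide.
low_state = fold_enrichment(0.40, 0.027)
high_state = fold_enrichment(0.89, 0.027)
```

Under this definition the promoter states together are roughly 45-fold enriched for TSS, and individual states are roughly 15- to 33-fold enriched for the 2-kb TSS neighborhood.<br />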
Promoter states differed in the enrichment of Gene Ontology (GO)<br />
terms of associated genes including cell cycle, embryonic development,<br />
RNA processing and T-cell activation (Fig. 3a). For instance, the term<br />
‘embryonic development’ is specifically enriched in state 4, whereas<br />
the term ‘T-cell activation’ is specifically enriched in state 8. Promoter<br />
states also differed in their preferentially enriched positions with respect<br />
to the TSS of associated genes (Fig. 3b). States 4–7 were most concentrated<br />
over the TSS (showing upwards of 100-fold enrichment); states<br />
8–11 peaked between 400 bp and 1,200 bp downstream of the TSS and<br />
corresponded to transcribed promoter regions of expressed genes; and<br />
states 1–3 peaked both upstream and downstream of the TSS.<br />
Transcription-associated states<br />
The second large group of chromatin states consisted of 17<br />
transcription-associated states. These were 70–95% contained within<br />
RefSeq-annotated transcribed regions, compared to 36% for the rest<br />
of the genome (Fig. 2b; P < 10^−200 for all states). This group was not<br />
predominantly associated with a single mark but was instead defined by<br />
combinations of seven marks: H3K79me3, H3K79me2, H3K79me1,<br />
H3K27me1, H2BK5me1, H4K20me1 and H3K36me3 (Fig. 2a).<br />
Inspection of the transition frequencies between these states revealed<br />
subgroups of states that are associated with 5′-proximal or 5′-distal<br />
locations and with different expression levels (Fig. 2c, Supplementary<br />
Notes, Supplementary Table 1 and Supplementary Fig. 4).<br />
We observed several states strongly enriched for spliced exons (states<br />
21–25 and 27–28 with 5.7- to 9.7-fold enrichments) (Figs. 2b and 3c and<br />
Supplementary Fig. 24). Spliced exons were previously reported to be<br />
enriched in several individual marks 19–21 . In contrast to these previous<br />
studies, the combinatorial approach we have taken here shows that<br />
individual marks in spliced exonic states are also frequently detected in<br />
several other states that show only a modest 1.3- to 1.6-fold enrichment<br />
for spliced exons (e.g., states 12, 13, 14 and 17). This suggests that the<br />
chromatin signature of spliced exons is defined not solely by the presence<br />
of the previously reported H3K36me3, H2BK5me1, H4K20me1<br />
and H3K79me1 marks, but also by their specific combinations and by the absence<br />
of H3K4me2, H3K9me1 and H3K79me2/3.<br />
State 27 showed a 12.5-fold enrichment for transcription end sites<br />
(TES) with its enrichment peaking directly over these locations (Fig. 3c).<br />
It was characterized both by the presence of H3K36me3, PolII and<br />
H4K20me1 and the absence of H3K4me1, H3K4me2 and H3K4me3,<br />
distinguishing it from other transcribed states with higher PolII or<br />
H3K36me3 frequencies. This suggests a distinct signature for 3′ ends of<br />
genes for which, to our knowledge, no specific chromatin signature had<br />
been described before. This was further validated by a 3.4-fold signal<br />
enrichment for the elongating form of PolII surveyed in an independent<br />
study 22 (Supplementary Fig. 25), even though our input data did not<br />
distinguish between the elongating and non-elongating form.<br />
State 28 showed a 112-fold enrichment in zinc-finger genes, which<br />
comprise 58% of the state. This state was characterized by a high frequency<br />
of H3K9me3, H4K20me3 and H3K36me3 and a relatively low<br />
frequency of other marks. This specific combination has been independently<br />
reported as marking regions bound by KAP1, a zinc-finger–specific<br />
co-repressor, which also shows a specific 44-fold enrichment<br />
for state 28 (refs. 23,24). Although the association of H3K9me3 and<br />
H4K20me3 with zinc-finger genes has been previously reported 5 , the<br />
de novo discovery of this highly specific signature of zinc-finger genes<br />
illustrates the utility of the methodology and also reveals the additional<br />
presence of H3K36me3 and lower frequency of other marks as<br />
complementing the signature of zinc-finger genes.<br />
Active intergenic states<br />
The third broad class of chromatin states consisted of 11 active<br />
intergenic states (states 29–39), including several classes of candidate<br />
enhancer regions, insulator regions and other regions proximal<br />
to expressed genes (Supplementary Notes). These states were<br />
associated with higher frequencies for H3K4me1, H2AZ, numerous<br />
acetylation marks and/or CTCF and with lower frequencies<br />
for other methylation marks (Fig. 2a and Supplementary Figs. 2<br />
and 3). They occurred primarily away from promoter regions<br />
(85–97% outside 2 kb of a TSS) and outside of transcribed genes<br />
(48–64% outside of RefSeq annotations, Fig. 2b). When they overlapped<br />
gene annotations, it was mainly in regions that were repressed<br />
or not highly expressed (see expression column in Fig. 2b).<br />
States 29–33 were notable as they corresponded to smaller fractions<br />
of the genome specifically associated with greater DNaseI<br />
hypersensitivity, transcription factor binding and regulatory motif<br />
instances and are likely to represent enhancer regions (Fig. 2 and<br />
Supplementary Fig. 23). Although these candidate enhancer states all<br />
shared higher H3K4me1 frequencies, they showed differences in the<br />
expression levels of downstream genes associated with subtle differences<br />
in their specific mark combinations (Supplementary Fig. 22).<br />
For instance, genes downstream of state 30 had a consistently higher<br />
average expression level than genes downstream of state 31 (P < 0.001<br />
at 10 kb, two-sided t-test). The two states differed in the frequency of<br />
several acetylation marks (state 30 relative to 31 showed higher frequency<br />
for H2BK120ac, H3K27ac and H2BK5ac and lower frequency<br />
for H4K5ac, H4K8ac) and also in the level of H2AZ (higher in state<br />
31 than 30), suggesting that these marks may be playing a more<br />
complex role than previously thought in enhancer regions.<br />
Several active intergenic states showed significant enrichments<br />
for genome-wide association study (GWAS) hits (e.g., 3.3-fold for<br />
candidate enhancer state 33, Fig. 4a), based on a curated database<br />
of top-scoring single-nucleotide polymorphisms (SNPs) in a range<br />
of diseases and traits 25 . These states thus provide a likely common<br />
functional role and means of refining many intergenic SNPs even<br />
in the absence of other annotations. For example (Fig. 4b), a SNP<br />
reported to be strongly associated with plasma eosinophil count levels<br />
in inflammatory diseases (rs12619285) 26 and located 40 kb downstream<br />
of IKZF2 in an intergenic region devoid of annotations lies in<br />
a segment of the genome assigned to chromatin state 33, which is enriched<br />
Figure 5c columns: State; CAGE (%); CAGE (%) | not RefSeq TSS; mRNA (%); mRNA (%) | not RefSeq. Overall (%): 2; 2; 46; 16.<br />
Figure 5 Discovery power of chromatin states for genome annotation.<br />
(a) Comparison of the power to discover TSS for individual chromatin<br />
marks (red), chromatin states (blue) ordered by their TSS enrichment<br />
and a directed experimental approach based on CAGE sequence tag data<br />
read counts from all available cell types 36 (gold), whereas the chromatin<br />
states and marks use only data from CD4 T-cells. Both chromatin states<br />
and CAGE tags are compared using a receiver operating characteristic<br />
(ROC) curve that shows the false-positive (x axis) and true-positive<br />
(y axis) rates at varying prediction thresholds or increasing numbers<br />
of states in the task of predicting if a 200-bp interval intersects a<br />
RefSeq TSS. Thin red curve compares performance of H3K4me3 mark<br />
at varying intensity thresholds. (b) Comparison of the power to detect<br />
RefSeq transcribed regions for chromatin states and marks as in a, and<br />
directed experimental information coming from EST data (gold) based<br />
on sequence counts from all available cell types 37,38 . (c) Independent<br />
experimental information provides support that a significant fraction of<br />
false positives in a and b are genuine unannotated TSS and transcribed<br />
regions currently missing from RefSeq. Percentage of each state<br />
supported by a CAGE tag (column 1), and the same percentage for<br />
locations at least 2 kb away from a RefSeq TSS (column 2), suggests that<br />
many promoter-associated state assignments outside RefSeq promoters<br />
are supported by CAGE tag evidence. Similarly, percentage of each state<br />
overlapping a GenBank mRNA (column 3), and the same percentage<br />
specifically outside RefSeq genes (column 4), suggest that transcription-associated<br />
state assignments outside RefSeq genes are supported<br />
by mRNA evidence. Similar support is found by GenBank ESTs and<br />
evolutionarily conserved, predicted new exons (Supplementary Fig. 33).<br />
for GWAS hits. In contrast, the surrounding region of the genome<br />
is assigned to other active or repressed intergenic states with no<br />
significant GWAS association.<br />
Large-scale repressed states<br />
The next group of states (40–45) marked large-scale repressed and<br />
heterochromatic regions, representing 64% of the genome. The two<br />
most frequently detected modifications in total for all the states in this<br />
group were H3K27me3 and H3K9me3. State 40, covering 13% of the<br />
genome, was essentially devoid of any detected modifications, states<br />
41–42 (25% of the genome) had a higher frequency for H3K9me3 than<br />
H3K27me3, whereas states 43–45 (26% of the genome) had a higher<br />
frequency for H3K27me3. States 41–42 as compared to states 43–45<br />
showed a stronger depletion for genes, promoters and conserved elements<br />
and stronger association with nuclear lamina regions 27 and the<br />
darkest-staining chromosomal bands 28 . These states also had a higher frequency<br />
of A/T nucleotides (Fig. 2b and Supplementary Figs. 26–28).<br />
State 45 likely corresponds to targeted gene repression. It showed<br />
the highest frequency for H3K27me3 and was the only repressed<br />
state to show enrichment for TSS. The corresponding genes were<br />
enriched for development-related GO categories (Supplementary<br />
Fig. 29), similar to the repressed promoter state 4 marked by<br />
H3K4me3. However, in contrast to state 4, state 45 showed almost no<br />
change in acetylation levels in response to histone deacetylase inhibitor<br />
(HDACi) treatment (Supplementary Fig. 30), suggesting that state 4<br />
is poised for activation whereas state 45 is stably repressed 29 .<br />
Repetitive states<br />
The final group of six states (46–51) showed strong and distinct<br />
enrichments for specific repetitive elements (Supplementary Fig. 31).<br />
State 46 had a strong enrichment of simple repeats, specifically<br />
(CA)n, (TG)n or (CATG)n (44-, 45- and 302-fold, respectively), possibly<br />
due to sequence biases in ChIP-based experiments 30 . State 47<br />
was characterized specifically by H3K9me3 and enriched for L1 and<br />
LTR repeats. States 48–51 all had higher frequencies of H4K20me3<br />
and H3K9me3 and were heavily enriched for satellite repeat elements.<br />
States 49–51 showed seemingly high frequencies for numerous<br />
modifications, but also strong enrichments in sequence reads from<br />
a nonspecific antibody (IgG) control 31 (Supplementary Fig. 20),<br />
suggesting these enrichments are due to a lack of coverage for the<br />
additional copies of these repeat elements in the reference genome<br />
assembly 32 , thus illustrating the ability of our model to capture such<br />
potential artifacts by considering all marks jointly.<br />
Predictive power for genome annotation<br />
We next set out to study the predictive power of chromatin states for<br />
the discovery of functional elements. We focused on two classes of<br />
elements that benefit from ample experimental information independent<br />
of chromatin marks, TSS and transcribed regions. We found<br />
that chromatin states consistently outperformed predictions based on<br />
individual marks (Fig. 5a,b), emphasizing the importance of using<br />
Figure 6 panel content (condensed from plot labels).<br />
(a) Mark order from greedy selection, left to right: None, H3K4me2, H3K18ac, H3K4me3, H3K79me3, H2BK5me1, H3K36me3, H2BK120ac, H3K9me3, H3K4me1, H4K20me1, H2AZ, CTCF, H2BK5ac, H4K91ac, H3K27me3, H4K20me3, H3K9me1, H4K5ac, H3K79me2, H2BK20ac, H3K27me1, H3K27ac, H3K79me1, H3K27me2, PolII, H3K4ac, H3R2me1, H2AK5ac, H4K8ac, H3K36ac, H3R2me2, H3K9me2, H2BK12ac, H3K9ac, H3K36me1, H4K16ac, H4R3me2, H3K23ac, H4K12ac, H2AK9ac, H3K14ac. Rows: states.<br />
(b) Columns: State; First 10 greedy; Ref. 38.<br />
(c) y axis: Squared error (0–50); x axis: Number of marks (0–40).<br />
Figure 6 Recovery of chromatin states with subsets of marks. (a) The figure shows the ordering of marks (top, from left to right) based on a greedy<br />
forward selection algorithm to optimize a squared error penalty on state misassignments (Online Methods). Conditioned on all the marks to the left<br />
having already been profiled, the mark listed is the optimal selection for one additional mark to be profiled based on the target optimization function.<br />
Below each mark is the percentage of a state with identical assignments using the subset of marks. (b) Comparison of the percentage of each state<br />
recovered between the first ten marks based on the greedy method and the ten marks previously used 33 (Supplementary Fig. 39). The two columns after<br />
the state IDs are the proportion of the states recovered using the greedy algorithm and the set previously used 33 . (c) The figure shows a progressive<br />
decrease in squared error for state misassignment as a function of the number of marks selected based on the greedy algorithm.<br />
mark combinations and spatial genomic information (Supplementary<br />
Notes and Supplementary Fig. 32 for a comparison to k-means clustering<br />
and a supervised classifier). The prediction performance of<br />
chromatin states based on just CD4 T-cells was similar to that of cap<br />
analysis of gene expression (CAGE) tags and expressed sequence tags<br />
(ESTs) data, even though these were obtained across many diverse cell<br />
types. This was possible because active and inactive states together<br />
capture the information about genetic elements across cell type<br />
boundaries (Fig. 5 and Supplementary Figs. 33–35). Moreover, based<br />
on our 51-state model, we could predict TSS and transcribed regions<br />
when applied to occupancy data obtained for a subset of ten chromatin<br />
marks in CD36 erythrocyte precursors and CD133 hematopoietic<br />
stem cells 33 (Supplementary Fig. 36).<br />
We also found that chromatin states revealed candidate promoter<br />
and transcribed regions not in RefSeq, but further supported by independent<br />
experimental evidence. Candidate promoters overlapped with<br />
CAGE tags (Fig. 5c) and intergenic PolII (Supplementary Fig. 37), and<br />
candidate transcribed regions overlapped GenBank mRNAs (Fig. 5c)<br />
and EST data (Supplementary Fig. 33). A number of promoter and<br />
transcribed states outside known genes were also strongly enriched<br />
for previously undescribed protein-coding exons predicted using<br />
evolutionary comparisons of 29 mammals (Lin and M.K., unpublished<br />
data) (Supplementary Fig. 33). We note that some candidate promoters<br />
may represent distal enhancers, sharing promoter-associated marks<br />
potentially due to looping of enhancer to promoter regions 7 .<br />
Recovery of chromatin states using subsets of marks<br />
As the large majority of chromatin states were defined by multiple marks,<br />
we next sought to specifically study the contribution of each mark in<br />
defining chromatin states. First, we found several notable examples of<br />
both additive relationships, such as acetylation marks in promoter regions,<br />
and combinatorial relationships, such as methylation marks associated<br />
with repressive and repetitive elements (Supplementary Notes and<br />
Supplementary Fig. 38). We also evaluated varying subsets of chromatin<br />
marks in their ability to distinguish between chromatin states<br />
(Supplementary Notes and Supplementary Figs. 39–41). More generally,<br />
we sought to provide guidelines for selecting subsets of chromatin marks<br />
to survey in new cell types that would be maximally informative.<br />
As a proof of principle, we evaluated the recovery power of increasing<br />
numbers of marks in a greedy way, that is, selecting the best mark given<br />
all previously selected marks, weighing each state equally and penalizing<br />
mismatches uniformly (see Online Methods), which provided an<br />
initial unbiased recommendation of marks to survey for a new cell type<br />
(Fig. 6). We found that increasing subsets of marks rapidly converge to a<br />
fairly accurate annotation of chromatin states (Fig. 6c), providing cost-efficient<br />
recommendations for new cell types. In addition to an overall<br />
error score, this analysis provides information on the proportion of each<br />
state accurately recovered, and specific pairwise state misassignments.<br />
Such information could be incorporated in a modified scoring function<br />
to provide chromatin mark recommendations targeted to the<br />
subset of chromatin states that are of particular biological interest, or<br />
the particular state distinctions that are most important to each study.<br />
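The greedy procedure described above can be sketched generically as follows. This is a minimal illustration and not the authors' implementation: the function name `greedy_forward_selection`, the caller-supplied `error` function and the toy penalty values are all assumptions made for the example.<br />

```python
from typing import Callable, Iterable, List, Sequence, Tuple


def greedy_forward_selection(
    candidates: Iterable[str],
    error: Callable[[Sequence[str]], float],
) -> List[Tuple[str, float]]:
    """Order candidates so that each pick minimizes `error` given earlier picks.

    `error` is a hypothetical, caller-supplied function mapping a list of
    selected marks to a state-misassignment penalty (lower is better).
    """
    remaining = list(candidates)
    selected: List[str] = []
    trace: List[Tuple[str, float]] = []
    while remaining:
        # Score every one-mark extension of the current selection.
        scored = [(error(selected + [m]), m) for m in remaining]
        best_error, best_mark = min(scored)  # ties are broken by mark name
        selected.append(best_mark)
        remaining.remove(best_mark)
        trace.append((best_mark, best_error))
    return trace


# Toy penalty, invented for illustration only: each mark removes a fixed
# amount of error, and the two acetylation marks are partly redundant.
GAIN = {"H3K4me3": 20.0, "H3K27ac": 16.0, "H3K9ac": 15.5, "H3K36me3": 15.0}


def toy_error(marks: Sequence[str]) -> float:
    total = 60.0 - sum(GAIN[m] for m in marks)
    if "H3K27ac" in marks and "H3K9ac" in marks:
        total += 8.0  # the second acetylation mark adds little new information
    return total


order = greedy_forward_selection(GAIN, toy_error)
for mark, err in order:
    print(mark, err)
```

In this toy setting H3K9ac is chosen last even though its stand-alone gain exceeds that of H3K36me3, because the choice is conditioned on H3K27ac having already been selected. A targeted scoring function of the kind mentioned above would only require a different `error` argument.<br />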
DISCUSSION<br />
The discovery and systematic characterization of chromatin states presented<br />
here reveals a diverse epigenomic landscape with 51 functionally<br />
distinct chromatin states. Although the exact number of chromatin states<br />
can vary based on the number of chromatin marks surveyed and the<br />
desired resolution at which state differences are studied, our results suggest<br />
that the genome annotation resulting from these states can extend the<br />
interpretable part of the human genome, especially outside protein-coding<br />
genes. The definition of the states themselves revealed numerous insights<br />
into the combinatorial and additive roles of chromatin marks, sometimes<br />
hinting at combinations of chromatin marks that were not previously<br />
described, and the genome-wide annotation of these states exposed many<br />
previously unannotated candidate functional elements.<br />
We expect the usefulness of the methods presented here will<br />
increase as additional genome-wide epigenetic data sets become<br />
available, and as additional cell types are surveyed systematically.<br />
Chromatin states can be inferred with virtually any type of epigenetic<br />
and related information, including histone variants, DNA methylation,<br />
DNaseI hypersensitivity and binding of chromatin-associated<br />
and sequence-specific transcription factors. Although we focused on<br />
a single human cell type, the methods are generally applicable to any<br />
species, any number of cell types and even whole embryos. In mixed<br />
cell populations, however, mutually exclusive marks found in different<br />
subsets of cells could be misinterpreted as co-occurring.<br />
Specifically for understanding epigenomic dynamics, chromatin<br />
states can play a central role going forward, as they provide a uniform<br />
language for interpreting and comparing diverse epigenetic data<br />
sets, for selecting and prioritizing chromatin marks for additional<br />
cell types and for summarizing complex relationships of dozens of<br />
marks in directly interpretable chromatin states. As several large-scale<br />
data production efforts are currently underway to map the<br />
epigenomes of many more cell types, exemplified by the ENCODE 34 ,<br />
modENCODE 35 and Epigenome Roadmap projects<br />
(http://www.roadmapepigenomics.org/), chromatin states will likely play a key<br />
role in the understanding of the human epigenome and its role in<br />
development, health and disease.<br />
Methods<br />
Methods and any associated references are available in the online version<br />
of the paper at http://www.nature.com/naturebiotechnology/.<br />
Note: Supplementary information is available on the <strong>Nature</strong> Biotechnology website.<br />
Acknowledgments<br />
We thank P. Kheradpour for regulatory motif instances and M.F. Lin for predicted<br />
new exons. We thank M. Garber, A. Siepel, K. Lindblad-Toh, and E. Lander for use of<br />
comparative information on 29 mammals. We thank B. Bernstein, N. Shoresh, C. Epstein<br />
and T. Mikkelsen for helpful discussions. We thank L. Goff, C. Bristow, R. Sealfon and<br />
all members of the MIT CompBio Group for comments, feedback and support. This<br />
material is based upon work supported by the National Science Foundation under award<br />
no. 0905968 and funding from the US National Human Genome Research Institute<br />
(NHGRI) under awards U54-HG004570 and RC1-HG005334.<br />
AUTHOR CONTRIBUTIONS<br />
J.E. and M.K. developed the method, analyzed results and wrote the paper.<br />
COMPETING FINANCIAL INTERESTS<br />
The authors declare no competing financial interests.<br />
Published online at http://www.nature.com/naturebiotechnology/.<br />
Reprints and permissions information is available online at<br />
http://npg.nature.com/reprintsandpermissions/.<br />
1. Bernstein, B.E., Meissner, A. & Lander, E.S. The mammalian epigenome. Cell 128,<br />
669–681 (2007).<br />
2. Kouzarides, T. Chromatin modifications and their function. Cell 128, 693–705<br />
(2007).<br />
3. Strahl, B.D. & Allis, C.D. The language of covalent histone modifications. <strong>Nature</strong><br />
403, 41–45 (2000).<br />
4. Schreiber, S.L. & Bernstein, B.E. Signaling network model of chromatin. Cell 111,<br />
771–778 (2002).<br />
5. Barski, A. et al. High-resolution profiling of histone methylations in the human<br />
genome. Cell 129, 823–837 (2007).<br />
6. Wang, Z. et al. Combinatorial patterns of histone acetylations and methylations in<br />
the human genome. Nat. Genet. 40, 897–903 (2008).<br />
7. Heintzman, N.D. et al. Distinct and predictive chromatin signatures of transcriptional<br />
promoters and enhancers in the human genome. Nat. Genet. 39, 311–318 (2007).<br />
8. Heintzman, N.D. et al. Histone modifications at human enhancers reflect global<br />
cell-type-specific gene expression. <strong>Nature</strong> 459, 108–112 (2009).<br />
9. Guttman, M. et al. Chromatin signature reveals over a thousand highly conserved<br />
large non-coding RNAs in mammals. <strong>Nature</strong> 458, 223–227 (2009).<br />
10. Hon, G., Wang, W. & Ren, B. Discovery and annotation of functional chromatin<br />
signatures in the human genome. PLoS Comput. Biol. 5, e1000566 (2009).<br />
11. Wang, X., Xuan, Z., Zhao, X., Li, Y. & Zhang, M.Q. High-resolution human core-promoter<br />
prediction with CoreBoost_HM. Genome Res. 19, 266–275 (2009).<br />
12. Won, K.J., Chepelev, I., Ren, B. & Wang, W. Prediction of regulatory elements in<br />
mammalian genomes using chromatin signatures. BMC Bioinformatics 9, 547 (2008).<br />
13. Hon, G., Ren, B. & Wang, W. ChromaSig: a probabilistic approach to finding common<br />
chromatin signatures in the human genome. PLoS Comput. Biol. 4, e1000201<br />
(2008).<br />
14. Day, N., Hemmaplardh, A., Thurman, R.E., Stamatoyannopoulos, J.A. & Noble, W.S.<br />
Unsupervised segmentation of continuous genomic data. Bioinformatics 23,<br />
1424–1426 (2007).<br />
15. Jia, L. et al. Functional enhancers at the gene-poor 8q24 cancer-linked locus. PLoS<br />
Genet. 5, e1000597 (2009).<br />
16. Thurman, R.E., Day, N., Noble, W.S. & Stamatoyannopoulos, J.A. Identification of<br />
higher-order functional domains in the human ENCODE regions. Genome Res. 17,<br />
917 (2007).<br />
17. Schuettengruber, B. et al. Functional anatomy of polycomb and trithorax chromatin<br />
landscapes in Drosophila embryos. PLoS Biol. 7, e13 (2009).<br />
18. Jaschek, R. & Tanay, A. Spatial clustering of multivariate genomic and epigenomic<br />
information. in Proceedings of the 13th Annual International Conference on Research<br />
in Computational Molecular Biology (ed. Batzoglou, S.) 170–183 (Springer, 2009).<br />
19. Schwartz, S., Meshorer, E. & Ast, G. Chromatin organization marks exon-intron<br />
structure. Nat. Struct. Mol. Biol. 16, 990–995 (2009).<br />
20. Kolasinska-Zwierz, P. et al. Differential chromatin marking of introns and expressed<br />
exons by H3K36me3. Nat. Genet. 41, 376–381 (2009).<br />
21. Andersson, R., Enroth, S., Rada-Iglesias, A., Wadelius, C. & Komorowski, J.<br />
Nucleosomes are well positioned in exons and carry characteristic histone<br />
modifications. Genome Res. 19, 1732–1741 (2009).<br />
22. Schones, D.E. et al. Dynamic regulation of nucleosome positioning in the human<br />
genome. Cell 132, 878–898 (2008).<br />
23. Sripathy, S.P., Stevens, J. & Schultz, D.C. The KAP1 corepressor functions to<br />
coordinate the assembly of de novo HP1-demarcated microenvironments of<br />
heterochromatin required for KRAB zinc finger protein-mediated transcriptional<br />
repression. Mol. Cell. Biol. 26, 8623–8638 (2006).<br />
24. O’Geen, H. et al. Genome-wide analysis of KAP1 binding suggests autoregulation<br />
of KRAB-ZNFs. PLoS Genet. 3, e89 (2007).<br />
25. Hindorff, L.A., Junkins, H.A., Mehta, J.P. & Manolio, T.A. A catalog of published<br />
genome-wide association studies. accessed<br />
July 22, 2009.<br />
26. Gudbjartsson, D.F. et al. Sequence variants affecting eosinophil numbers associate<br />
with asthma and myocardial infarction. Nat. Genet. 41, 342–347 (2009).<br />
27. Guelen, L. et al. Domain organization of human chromosomes revealed by mapping<br />
of nuclear lamina interactions. <strong>Nature</strong> 453, 948–951 (2008).<br />
28. Furey, T.S. & Haussler, D. Integration of the cytogenetic map with the draft human<br />
genome sequence. Hum. Mol. Genet. 12, 1037–1044 (2003).<br />
29. Wang, Z. et al. Genome-wide mapping of HATs and HDACs reveals distinct functions<br />
in active and inactive genes. Cell 138, 1019–1031 (2009).<br />
30. Johnson, D.S. et al. Systematic evaluation of variability in ChIP-chip experiments<br />
using predefined DNA targets. Genome Res. 18, 393–403 (2008).<br />
31. Zang, C. et al. A clustering approach for identification of enriched domains from<br />
histone modification ChIP-Seq data. Bioinformatics 25, 1952–1958 (2009).<br />
32. Zhang, Y., Shin, H., Song, J.S., Lei, Y. & Liu, X.S. Identifying positioned nucleosomes<br />
with epigenetic marks in human from ChIP-Seq. BMC Genomics 9, 537 (2008).<br />
33. Cui, K. et al. Chromatin signatures in multipotent human hematopoietic stem cells<br />
indicate the fate of bivalent genes during differentiation. Cell Stem Cell 4, 80–93<br />
(2009).<br />
34. ENCODE Project Consortium. Identification and analysis of functional elements in<br />
1% of the human genome by the ENCODE pilot project. <strong>Nature</strong> 447, 799–816<br />
(2007).<br />
35. Celniker, S.E. et al. Unlocking the secrets of the genome. <strong>Nature</strong> 459, 927–930<br />
(2009).<br />
36. Carninci, P. et al. Genome-wide analysis of mammalian promoter architecture and<br />
evolution. Nat. Genet. 38, 626–635 (2006).<br />
37. Karolchik, D. et al. The UCSC Genome Browser Database: 2008 update. Nucleic Acids<br />
Res. 36, D773–D779 (2008).<br />
38. Benson, D.A., Karsch-Mizrachi, I., Lipman, D.J., Ostell, J. & Wheeler, D.L. GenBank:<br />
update. Nucleic Acids Res. 32, D23–D26 (2004).<br />
ONLINE METHODS<br />
Input data for modeling. The initial unprocessed data were bed files containing<br />
the genomic coordinates and strand orientation of mapped sequence reads<br />
from ChIP-seq experiments 5,6 . There was a separate bed file for each of the 18<br />
acetylations, 20 methylations, H2AZ, CTCF and PolII in CD4 T cells. We used<br />
the updated version of the H3K79me1/2/3 data, as reported 6 , which differs<br />
from the version first reported 5 .<br />
To apply the model we first divided the genome into 200-base-pair nonoverlapping<br />
intervals within which we independently made a call as to whether<br />
each of the 41 marks was detected as being present or not based on the count<br />
of tags mapping to the interval. Each tag was uniquely assigned to one interval<br />
based on the location of the 5′ end of the tag after applying a shift of 100 bases<br />
in the 5′ to 3′ direction of the tag. The threshold, t, for each mark was based<br />
on the total number of mapped reads for the mark (Supplementary Table 2),<br />
and was set to be the smallest integer t such that P(X>t)
The sequence data for computed nucleotide frequencies, CpG islands, repeats 42<br />
and conservation data were also obtained from the UCSC genome browser.<br />
The conservation data were based on PhastCons conserved elements using the<br />
44-way vertebrate alignment 43,44 (Lindblad-Toh, K. et al., Broad Institute,<br />
unpublished data). Transcription factor binding enrichments were computed<br />
for 18 experiments from numerous publications (Supplementary Fig. 23); the<br />
median enrichment over all these experiments is reported in Figure 2b. The<br />
DNaseI hypersensitivity data were as described 45 and were obtained from the UCSC genome<br />
browser. The nuclear lamina data for human fibroblasts were obtained from<br />
ref. 27. The zinc-finger genes were defined as those that had ‘ZNF’ at the beginning<br />
of the gene symbol in the RefSeq gene table. Published coordinates<br />
that were in hg17 were converted to hg18 using the liftover tool from the<br />
UCSC genome browser 46 .<br />
Expression, motif and gene ontology analyses. We obtained the processed<br />
CD4 T expression data from ref. 47 for both replicates and averaged the<br />
two replicates. We then performed a natural log transform of the averaged<br />
values and standardized all values by subtracting the mean log-transformed<br />
value and dividing by the s.d. of the log-transformed values.<br />
The genome coordinates of each probe set were obtained from the UCSC<br />
genome browser. Each 200-bp interval that overlapped a probe set was assigned the<br />
transformed expression score of that probe set. If multiple probe sets overlapped the same<br />
200-bp interval, the average of their expression values was taken.<br />
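The expression processing described above can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, probe sets are given as half-open (start, end) coordinates on a single chromosome, and the population s.d. is used because the text does not say which estimator was applied.<br />

```python
import math
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def standardize_expression(rep1: List[float], rep2: List[float]) -> List[float]:
    """Average two replicates, take natural logs, then z-score across probe sets."""
    logged = [math.log((a + b) / 2.0) for a, b in zip(rep1, rep2)]
    mu = mean(logged)
    sd = pstdev(logged)  # assumption: population s.d.; sample s.d. is equally plausible
    return [(x - mu) / sd for x in logged]


def score_intervals(
    probes: List[Tuple[int, int]], scores: List[float], bin_size: int = 200
) -> Dict[int, float]:
    """Map each bin index to the mean score of the probe sets overlapping it."""
    per_bin: Dict[int, List[float]] = {}
    for (start, end), score in zip(probes, scores):
        # A half-open probe [start, end) touches every bin from start to end - 1.
        for b in range(start // bin_size, (end - 1) // bin_size + 1):
            per_bin.setdefault(b, []).append(score)
    return {b: mean(values) for b, values in per_bin.items()}


# Toy usage with invented values: two probe sets, the first spanning two bins.
z = standardize_expression([1.0, math.e ** 2], [1.0, math.e ** 2])
bins = score_intervals([(0, 250), (150, 180)], [2.0, 4.0])
print(z, bins)
```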
We generated transcription factor motif enrichments as described 48 ,<br />
extended for position-weight matrices (PWMs) (Kheradpour, P., MIT, and<br />
M.K., unpublished data) based on the hard state assignments.<br />
Gene ontology enrichments were based on the hard state assignment of the<br />
interval containing the RefSeq annotated TSS of the gene. Enrichments were<br />
computed using the STEM software (v.1.3.4) and the Bonferroni corrected<br />
P-values are reported 49 .<br />
SNP and GWAS analysis. The HapMap CEU 50 data were downloaded from<br />
the UCSC genome browser. Significant GWAS hits were taken from ref. 25.<br />
SNPs listed as occurring multiple times were counted only once, and for the<br />
SNP set listed as a 17-marker haplotype only the first SNP was used, giving<br />
1,640 SNPs. In computing enrichment for HapMap and GWAS SNPs, if two<br />
SNPs mapped to the same interval, we counted each of them separately. To determine<br />
whether the number of GWAS SNPs in a chromatin state was significantly greater<br />
than would be expected based on the general SNP frequency in the state,<br />
we used a binomial distribution where n = 1,640 and p is the proportion of<br />
HapMap CEU SNPs assigned to the state. We applied a Bonferroni correction<br />
for testing multiple states and reported only those enrichments that remained<br />
significant at P < 0.01.<br />
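The enrichment test described above can be sketched as follows. This is an illustrative reading, not the authors' code: it assumes a one-sided upper-tail binomial test, a Bonferroni correction that multiplies each P-value by the number of states tested, and that every state holds some but not all HapMap SNPs. The function names and the toy counts are invented.<br />

```python
import math
from typing import Dict


def log_binomial_pmf(i: int, n: int, p: float) -> float:
    """log P(X = i) for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    return (
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
    )


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1."""
    if k <= 0:
        return 1.0
    tail = sum(math.exp(log_binomial_pmf(i, n, p)) for i in range(k, n + 1))
    return min(1.0, tail)  # guard against rounding slightly above 1


def enriched_states(
    gwas_counts: Dict[str, int],
    hapmap_counts: Dict[str, int],
    n_gwas: int = 1640,
    alpha: float = 0.01,
) -> Dict[str, float]:
    """Return states whose Bonferroni-corrected enrichment P-value is below alpha."""
    total_hapmap = sum(hapmap_counts.values())
    n_tests = len(hapmap_counts)
    hits: Dict[str, float] = {}
    for state, n_hap in hapmap_counts.items():
        p = n_hap / total_hapmap  # background SNP frequency of the state
        raw = binomial_upper_tail(gwas_counts.get(state, 0), n_gwas, p)
        corrected = min(1.0, raw * n_tests)
        if corrected < alpha:
            hits[state] = corrected
    return hits


# Toy usage with invented counts for two hypothetical states.
hits = enriched_states({"A": 400, "B": 1240}, {"A": 100, "B": 900})
print(sorted(hits))
```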
RefSeq TSS and gene transcripts discovery. The ROC curve for the CAGE data<br />
was based on the number of CAGE tags mapping to a 200 bp interval retrieved from<br />
the Fantom database and converted from hg17 to hg18 using the UCSC genome<br />
browser liftover tool 36 . The overlap with ESTs was based on those ESTs listed in<br />
the UCSC genome browser all_est table as of November 29, 2009 (refs. 37,38).<br />
The overlap with GenBank mRNA was based on the overlap with the UCSC genome<br />
browser mRNA listed in the table as of October 31, 2009 (refs. 37,38). The novel<br />
exon predictions are from Lin, M.F., MIT, and M.K. (unpublished data).<br />
Mark subset evaluation and selection. When evaluating the coverage of<br />
a specified subset of marks, first a posterior distribution over the states at<br />
each interval is computed using the model learned on the full set of marks,<br />
except that the marks not in the subset are omitted when computing emission<br />
probabilities. For an interval t we define here $s_{t,k}$ and $f_{t,k}$ to be the posterior<br />
assignment to state k at interval t based on the subset and full set of marks,<br />
respectively. The proportion of state k recovered with a subset of marks is<br />
defined as:<br />
$$c_k = \frac{\sum_t \min(f_{t,k},\, s_{t,k})}{\sum_t f_{t,k}}$$<br />
where the sum is over all intervals t in the genome. The ordering of marks presented<br />
without any prior biological knowledge was based on a greedy forward selection<br />
algorithm designed to select marks that would minimize this function:<br />
$$\sum_k (1 - c_k)^2$$<br />
where the sum is over all states. At each step the algorithm chooses<br />
the one additional mark that, conditioned on all previously selected<br />
marks, minimizes this function. We note that this<br />
target function considers all nonidentical state assignments to have equal loss.<br />
An extension of this approach would be to apply target functions that weigh<br />
different misassignments differently. The proportion of state k with the full<br />
set of marks that is misassigned to state i using a subset of marks, $m_{k,i}$, as<br />
presented in Supplementary Figures 39 and 40, is defined as:<br />
$$m_{k,i} = \frac{\sum_t \left( \max(f_{t,k} - s_{t,k},\, 0) \cdot \dfrac{\max(s_{t,i} - f_{t,i},\, 0)}{\sum_j \max(s_{t,j} - f_{t,j},\, 0)} \right)}{\sum_t f_{t,k}}$$<br />
The first term in the sum in the numerator represents for an interval t the<br />
amount of posterior probability assigned to state k using the full set of marks<br />
not assigned using the subset of marks. The second term represents the portion<br />
of this posterior probability that will be credited to state i. The portion<br />
credited to state i is the proportion of the surplus posterior state i received<br />
with the subset of marks in the interval relative to the total surplus posterior<br />
all states received in the interval.<br />
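The three quantities defined in this section (the recovered proportion of state k, the squared-error target and the pairwise misassignment proportion) can be computed from two posterior tables as sketched below. This is a minimal sketch, not the authors' code: the function names are hypothetical, posteriors are plain nested lists indexed as posterior[t][k], each row is assumed to sum to 1, and every state is assumed to have nonzero total posterior under the full set of marks.<br />

```python
from typing import List

Posterior = List[List[float]]  # posterior[t][k]: probability of state k at interval t


def state_recovery(full: Posterior, subset: Posterior) -> List[float]:
    """c_k = sum_t min(f[t][k], s[t][k]) / sum_t f[t][k]."""
    n_states = len(full[0])
    return [
        sum(min(f[k], s[k]) for f, s in zip(full, subset)) / sum(f[k] for f in full)
        for k in range(n_states)
    ]


def squared_error(recovery: List[float]) -> float:
    """Target minimized by the greedy selection: sum_k (1 - c_k)^2."""
    return sum((1.0 - c) ** 2 for c in recovery)


def misassignment(full: Posterior, subset: Posterior) -> List[List[float]]:
    """m[k][i]: share of state k's full-set posterior credited to state i."""
    n_states = len(full[0])
    m = [[0.0] * n_states for _ in range(n_states)]
    for f, s in zip(full, subset):
        # Surplus posterior each state gained when only the subset was used.
        surplus = [max(s[j] - f[j], 0.0) for j in range(n_states)]
        total_surplus = sum(surplus)
        if total_surplus == 0.0:
            continue  # the two posteriors agree at this interval
        for k in range(n_states):
            lost = max(f[k] - s[k], 0.0)  # posterior state k lost at this interval
            for i in range(n_states):
                m[k][i] += lost * surplus[i] / total_surplus
    totals = [sum(f[k] for f in full) for k in range(n_states)]
    return [[m[k][i] / totals[k] for i in range(n_states)] for k in range(n_states)]


# Toy posteriors for two states over four intervals (values invented).
full = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
subset = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.25, 0.75]]
c = state_recovery(full, subset)
print(c, squared_error(c), misassignment(full, subset))
```

As a sanity check, when every posterior row sums to 1 the misassignment proportions for state k sum to 1 − c_k, which holds in the toy data above.<br />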
39. Durbin, R., Eddy, S., Krogh, A. & Mitchison, G. Biological Sequence Analysis<br />
(Cambridge Univ. Press, 1998).<br />
40. Neal, R.M. & Hinton, G.E. A view of the EM algorithm that justifies incremental,<br />
sparse, and other variants. Learn. Graph. Models 89, 355–368 (1998).<br />
41. Pruitt, K.D., Tatusova, T. & Maglott, D.R. NCBI reference sequences (RefSeq): a<br />
curated non-redundant sequence database of genomes, transcripts and proteins.<br />
Nucleic Acids Res. 35, D61–D65 (2007).<br />
42. Smit, A., Hubley, R. & Green, P. RepeatMasker Open-3.0 1996-2010 .<br />
43. Miller, W. et al. 28-way vertebrate alignment and conservation track in the UCSC<br />
Genome Browser. Genome Res. 17, 1797–1808 (2007).<br />
44. Siepel, A. et al. Evolutionarily conserved elements in vertebrate, insect, worm, and<br />
yeast genomes. Genome Res. 15, 1034–1050 (2005).<br />
45. Boyle, A.P. et al. High-resolution mapping and characterization of open chromatin<br />
across the genome. Cell 132, 311–322 (2008).<br />
46. Kent, W.J. et al. The human genome browser at UCSC. Genome Res. 12, 996–1006<br />
(2002).<br />
47. Su, A.I. et al. A gene atlas of the mouse and human protein-encoding transcriptomes.<br />
Proc. Natl. Acad. Sci. USA 101, 6062–6067 (2004).<br />
48. Kheradpour, P., Stark, A., Roy, S. & Kellis, M. Reliable prediction of regulator<br />
targets using 12 Drosophila genomes. Genome Res. 17, 1919–1931 (2007).<br />
49. Ernst, J. & Bar-Joseph, Z. STEM: a tool for the analysis of short time series gene<br />
expression data. BMC Bioinformatics 7, 191 (2006).<br />
50. International HapMap Consortium. A second generation human haplotype map of<br />
over 3.1 million SNPs. <strong>Nature</strong> 449, 851–861 (2007).<br />
doi:10.1038/nbt.1662<br />
Articles<br />
The MicroArray Quality Control (MAQC)-II study of<br />
common practices for the development and validation<br />
of microarray-based predictive models<br />
MAQC Consortium *<br />
Gene expression data from microarrays are being applied to predict preclinical and clinical endpoints, but the reliability of<br />
these predictions has not been established. In the MAQC-II project, 36 independent teams analyzed six microarray data sets<br />
to generate predictive models for classifying a sample with respect to one of 13 endpoints indicative of lung or liver toxicity in<br />
rodents, or of breast cancer, multiple myeloma or neuroblastoma in humans. In total, >30,000 models were built using many<br />
combinations of analytical methods. The teams generated predictive models without knowing the biological meaning of some of<br />
the endpoints and, to mimic clinical reality, tested the models on data that had not been used for training. We found that model<br />
performance depended largely on the endpoint and team proficiency and that different approaches generated models of similar<br />
performance. The conclusions and recommendations from MAQC-II should be useful for regulatory agencies, study committees<br />
and independent investigators that evaluate methods for global gene expression analysis.<br />
As part of the United States Food and Drug Administration’s (FDA’s)<br />
Critical Path Initiative to medical product development<br />
(http://www.fda.gov/oc/initiatives/criticalpath/), the MAQC consortium began in<br />
February 2005 with the goal of addressing various microarray reliability<br />
concerns raised in publications 1–9 pertaining to reproducibility<br />
of gene signatures. The first phase of this project (MAQC-I) extensively<br />
evaluated the technical performance of microarray platforms<br />
in identifying all differentially expressed genes that would potentially<br />
constitute biomarkers. The MAQC-I found high intra-platform reproducibility<br />
across test sites, as well as inter-platform concordance of<br />
differentially expressed gene lists 10–15 and confirmed that microarray<br />
technology is able to reliably identify differentially expressed genes<br />
between sample classes or populations 16,17 . Importantly, the MAQC-I<br />
helped produce companion guidance regarding genomic data submission<br />
to the FDA (http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/ucm079855.pdf).<br />
Although the MAQC-I focused on the technical aspects of gene<br />
expression measurements, robust technology platforms alone are<br />
not sufficient to fully realize the promise of this technology. An<br />
additional requirement is the development of accurate and reproducible<br />
multivariate gene expression–based prediction models, also<br />
referred to as classifiers. Such models take gene expression data from<br />
a patient as input and as output produce a prediction of a clinically<br />
relevant outcome for that patient. Therefore, the second phase of the<br />
project (MAQC-II) has focused on these predictive models 18 , studying<br />
both how they are developed and how they are evaluated. For<br />
any given microarray data set, many computational approaches can<br />
be followed to develop predictive models and to estimate the future<br />
performance of these models. Understanding the strengths and limitations<br />
of these various approaches is critical to the formulation<br />
of guidelines for safe and effective use of preclinical and clinical<br />
genomic data. Although previous studies have compared and benchmarked<br />
individual steps in the model development process 19 , no<br />
prior published work has, to our knowledge, extensively evaluated<br />
current community practices on the development and validation of<br />
microarray-based predictive models.<br />
Microarray-based gene expression data and prediction models are<br />
increasingly being submitted by the regulated industry to the FDA<br />
to support medical product development and testing applications 20 .<br />
For example, gene expression microarray–based assays that have<br />
been approved by the FDA as diagnostic tests include the Agendia<br />
MammaPrint microarray to assess prognosis of distant metastasis in<br />
breast cancer patients 21,22 and the Pathwork Tissue of Origin Test<br />
to assess the degree of similarity of the RNA expression pattern in<br />
a patient’s tumor to that in a database of tumor samples for which<br />
the origin of the tumor is known 23 . Gene expression data have<br />
also been the basis for the development of PCR-based diagnostic<br />
assays, including the xDx Allomap test for detection of rejection of<br />
heart transplants 24 .<br />
The possible uses of gene expression data are vast and include diagnosis,<br />
early detection (screening), monitoring of disease progression,<br />
risk assessment, prognosis, complex medical product characterization<br />
and prediction of response to treatment (with regard to safety or<br />
efficacy) with a drug or device labeling intent. The ability to generate<br />
models in a reproducible fashion is an important consideration in<br />
predictive model development.<br />
A lack of consistency in generating classifiers from publicly available<br />
data is problematic and may be due to any number of factors<br />
including insufficient annotation, incomplete clinical identifiers,<br />
coding errors and/or inappropriate use of methodology 25,26 . There<br />
* A full list of authors and affiliations appears at the end of the paper. Correspondence should be addressed to L.S. (leming.shi@fda.hhs.gov or leming.shi@gmail.com).<br />
Received 2 March; accepted 30 June; published online 30 July 2010; doi:10.1038/nbt.1665<br />
are also examples in the literature of classifiers whose performance<br />
cannot be reproduced on independent data sets because of poor study<br />
design 27 , poor data quality and/or insufficient cross-validation of all<br />
model development steps 28,29 . Each of these factors may contribute<br />
to a certain level of skepticism about claims of performance levels<br />
achieved by microarray-based classifiers.<br />
Previous evaluations of the reproducibility of microarray-based<br />
classifiers, with only very few exceptions 30,31 , have been limited<br />
to simulation studies or reanalysis of previously published results.<br />
Frequently, published benchmarking studies have split data sets at<br />
random, and used one part for training and the other for validation.<br />
This design assumes that the training and validation sets are produced<br />
by unbiased sampling of a large, homogeneous population of samples.<br />
However, specimens in clinical studies are usually accrued over years<br />
and there may be a shift in the participating patient population and<br />
also in the methods used to assign disease status owing to changing<br />
practice standards. There may also be batch effects owing to time<br />
variations in tissue analysis or due to distinct methods of sample<br />
collection and handling at different medical centers. As a result,<br />
when samples are accrued sequentially, with the first cohort used to<br />
develop predictive models and subsequent patients used for validation<br />
(as was done in MAQC-II to mimic clinical reality), the training and<br />
validation sets may differ in many ways that could influence<br />
prediction performance.<br />
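The difference between a random split and a sequentially accrued split can be sketched as follows. This is a minimal illustration under stated assumptions, not the MAQC-II procedure: the `Sample` record, its `accrual_day` field and both function names are hypothetical.<br />

```python
import random
from dataclasses import dataclass


@dataclass
class Sample:
    sample_id: str
    accrual_day: int  # hypothetical field: day on which the specimen was collected


def random_split(samples, n_train, seed=0):
    """Shuffle, then split. Implicitly assumes one homogeneous population."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def sequential_split(samples, n_train):
    """Earliest samples train the model; later samples validate it."""
    ordered = sorted(samples, key=lambda s: s.accrual_day)
    return ordered[:n_train], ordered[n_train:]


# Ten made-up samples accrued 30 days apart.
samples = [Sample(f"s{i}", accrual_day=i * 30) for i in range(10)]
seq_train, seq_valid = sequential_split(samples, n_train=6)
rnd_train, rnd_valid = random_split(samples, n_train=6)
```

In the sequential split every validation sample is accrued after every training sample, so any drift in patient population or handling shows up as a gap between internal and external performance; a random split mixes the time periods and can hide that drift.<br />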
The MAQC-II project was designed to evaluate these sources of<br />
bias in study design by constructing training and validation sets at<br />
different times, swapping the test and training sets and also using<br />
data from diverse preclinical and clinical scenarios. The goals of<br />
MAQC-II were to survey approaches in genomic model development<br />
in an attempt to understand sources of variability in prediction<br />
performance and to assess the influences of endpoint signal strength<br />
in data. By providing the same data sets to many organizations for<br />
analysis, but not restricting their data analysis protocols, the project<br />
has made it possible to evaluate to what extent, if any, results depend<br />
on the team that performs the analysis. This contrasts with previous<br />
benchmarking studies that have typically been conducted by single<br />
laboratories. Enrolling a large number of organizations has also made<br />
it feasible to test many more approaches than would be practical for<br />
any single team. MAQC-II also strives to develop good modeling<br />
practice guidelines, drawing on a large international collaboration of<br />
experts and the lessons learned in the perhaps unprecedented effort<br />
of developing and evaluating >30,000 genomic classifiers to predict<br />
a variety of endpoints from diverse data sets.<br />
MAQC-II is a collaborative research project that includes<br />
participants from the FDA, other government agencies, industry<br />
and academia. This paper describes the MAQC-II structure and<br />
experimental design and summarizes the main findings and key<br />
results of the consortium, whose members have learned a great deal<br />
during the process. The resulting guidelines are general and should<br />
not be construed as specific recommendations by the FDA for<br />
regulatory submissions.<br />
RESULTS<br />
Generating a unique compendium of >30,000 prediction models<br />
The MAQC-II consortium was conceived with the primary<br />
goal of examining model development practices for generating<br />
binary classifiers in two types of data sets, preclinical and clinical<br />
(Supplementary Tables 1 and 2). To accomplish this, the project<br />
leader distributed six data sets containing 13 preclinical and clinical<br />
endpoints coded A through M (Table 1) to 36 voluntarily participating<br />
data analysis teams representing academia, industry<br />
and government institutions (Supplementary Table 3). Endpoints<br />
were coded so as to hide the identities of two negative-control endpoints<br />
(endpoints I and M, for which class labels were randomly<br />
assigned and are not predictable by the microarray data) and two<br />
positive-control endpoints (endpoints H and L, representing the<br />
sex of patients, which is highly predictable by the microarray data).<br />
Endpoints A, B and C tested teams’ ability to predict the toxicity<br />
of chemical agents in rodent lung and liver models. The remaining<br />
endpoints were predicted from microarray data sets from human<br />
patients diagnosed with breast cancer (D and E), multiple myeloma<br />
(F and G) or neuroblastoma (J and K). For the multiple myeloma<br />
and neuroblastoma data sets, the endpoints represented event-free<br />
survival (abbreviated EFS), meaning a lack of malignancy or disease<br />
recurrence, and overall survival (abbreviated OS) after 730 days<br />
(for multiple myeloma) or 900 days (for neuroblastoma) post treatment<br />
or diagnosis. For breast cancer, the endpoints represented<br />
estrogen receptor status, a common diagnostic marker of this<br />
cancer type (abbreviated ‘erpos’), and the success of treatment<br />
involving chemotherapy followed by surgical resection of a tumor<br />
(abbreviated ‘pCR’). The biological meaning of the control endpoints<br />
was known only to the project leader and not revealed to<br />
the project participants until all model development and external<br />
validation processes had been completed.<br />
To evaluate the reproducibility of the models developed by a data<br />
analysis team for a given data set, we asked teams to submit models<br />
from two stages of analyses. In the first stage (hereafter referred to as<br />
the ‘original’ experiment), each team built prediction models for up to<br />
13 different coded endpoints using six training data sets. Models were<br />
‘frozen’ against further modification, submitted to the consortium<br />
and then tested on a blinded validation data set that was not available<br />
to the analysis teams during training. In the second stage (referred<br />
to as the ‘swap’ experiment), teams repeated the model building and<br />
validation process by training models on the original validation set<br />
and validating them using the original training set.<br />
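The two-stage design can be sketched in a few lines. This is a toy illustration only: the one-feature threshold classifier, the use of accuracy as the score (the paper reports MCC) and all data values are assumptions made for brevity, and the classifier assumes that the positive class has the higher feature values.<br />

```python
def fit_threshold(train):
    """Toy one-feature classifier: threshold midway between the class means.

    `train` is a list of (value, label) pairs with labels 0 and 1.
    """
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0


def accuracy(threshold, data):
    """Fraction of samples that fall on the correct side of the threshold."""
    hits = sum((x > threshold) == (y == 1) for x, y in data)
    return hits / len(data)


def original_and_swap(set_a, set_b):
    """'Original': train on A, validate on B. 'Swap': train on B, validate on A."""
    original = accuracy(fit_threshold(set_a), set_b)
    swap = accuracy(fit_threshold(set_b), set_a)
    return original, swap


# Made-up data for illustration.
set_a = [(0.1, 0), (0.3, 0), (0.8, 1), (0.9, 1)]
set_b = [(0.2, 0), (0.4, 0), (0.7, 1), (1.0, 1)]
original_score, swap_score = original_and_swap(set_a, set_b)
```

The key property is that the model is fitted once on one set and then only scored on the other, so neither direction lets validation samples influence training.<br />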
To simulate the potential decision-making process for evaluating a<br />
microarray-based classifier, we established a process for each group<br />
to receive training data with coded endpoints, propose a data analysis<br />
protocol (DAP) based on exploratory analysis, receive feedback on<br />
the protocol and then perform the analysis and validation (Fig. 1).<br />
Analysis protocols were reviewed internally by other MAQC-II participants<br />
(at least two reviewers per protocol) and by members of the<br />
MAQC-II Regulatory Biostatistics Working Group (RBWG), a team<br />
from the FDA and industry comprising biostatisticians and others<br />
with extensive model building expertise. Teams were encouraged to<br />
revise their protocols to incorporate feedback from reviewers, but<br />
each team was eventually considered responsible for its own analysis<br />
protocol and incorporating reviewers’ feedback was not mandatory<br />
(see Online Methods for more details).<br />
We assembled two large tables from the original and swap experiments<br />
(Supplementary Tables 1 and 2, respectively) containing<br />
summary information about the algorithms and analytic steps, or<br />
‘modeling factors’, used to construct each model and the ‘internal’<br />
and ‘external’ performance of each model. Internal performance<br />
measures the ability of the model to classify the training samples,<br />
based on cross-validation exercises. External performance measures<br />
the ability of the model to classify the blinded independent validation<br />
data. We considered several performance metrics, including Matthews<br />
Correlation Coefficient (MCC), accuracy, sensitivity, specificity,<br />
area under the receiver operating characteristic curve (AUC) and<br />
root mean squared error (r.m.s.e.). These two tables contain data on<br />
>30,000 models. Here we report performance based on MCC because<br />
it is informative when the distribution of the two classes in a data set<br />
is highly skewed and because it is simple to calculate and was available<br />
for all models. MCC values range from +1 to −1, with +1 indicating<br />
perfect prediction (that is, all samples classified correctly and none<br />
incorrectly), 0 indicating random prediction and −1 indicating perfect<br />
inverse prediction.<br />
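As a worked illustration of why MCC is informative for skewed classes, the sketch below computes MCC and accuracy from confusion-matrix counts. The function names are ours, and reporting an undefined MCC (zero denominator) as 0 is an assumed convention; the class sizes are those of the endpoint J training set in Table 1 (22 positives, 216 negatives).<br />

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumed convention: an undefined MCC is reported as 0.
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(tp, fp, tn, fn):
    """Fraction of samples classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


# A classifier that calls every sample negative on a 22-positive,
# 216-negative data set looks accurate but carries no information.
all_negative = dict(tp=0, fp=0, tn=216, fn=22)
print(round(accuracy(**all_negative), 3))  # 0.908
print(mcc(**all_negative))                 # 0.0
```

The trivial majority-class rule reaches an accuracy above 0.9 yet an MCC of 0, which is the behavior described above for random prediction.<br />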
The 36 analysis teams applied many different options under each<br />
modeling factor for developing models (Supplementary Table 4)<br />
including 17 summary and normalization methods, nine batch-effect<br />
removal methods, 33 feature selection methods (between 1 and >1,000<br />
features), 24 classification algorithms and six internal validation<br />
methods. Such diversity suggests the community’s common practices are<br />
Table 1 Microarray data sets used for model development and validation in the MAQC-II project<br />
Data set code | Endpoint code | Endpoint description | Microarray platform | Training set a: number of samples | Positives (P) | Negatives (N) | P/N ratio | Validation set a: number of samples | Positives (P) | Negatives (N) | P/N ratio<br />
Hamner | A | Lung tumorigen vs. non-tumorigen (mouse) | Affymetrix Mouse 430 2.0 | 70 | 26 | 44 | 0.59 | 88 | 28 | 60 | 0.47<br />
Iconix | B | Non-genotoxic liver carcinogens vs. non-carcinogens (rat) | Amersham Uniset Rat 1 Bioarray | 216 | 73 | 143 | 0.51 | 201 | 57 | 144 | 0.40<br />
NIEHS | C | Liver toxicants vs. non-toxicants based on overall necrosis score (rat) | Affymetrix Rat 230 2.0 | 214 | 79 | 135 | 0.58 | 204 | 78 | 126 | 0.62<br />
Breast cancer (BR) | D | Pre-operative treatment response (pCR, pathologic complete response) | Affymetrix Human U133A | 130 | 33 | 97 | 0.34 | 100 | 15 | 85 | 0.18<br />
Breast cancer (BR) | E | Estrogen receptor status (erpos) | Affymetrix Human U133A | 130 | 80 | 50 | 1.6 | 100 | 61 | 39 | 1.56<br />
Multiple myeloma (MM) | F | Overall survival milestone outcome (OS, 730-d cutoff) | Affymetrix Human U133Plus 2.0 | 340 | 51 | 289 | 0.18 | 214 | 27 | 187 | 0.14<br />
Multiple myeloma (MM) | G | Event-free survival milestone outcome (EFS, 730-d cutoff) | Affymetrix Human U133Plus 2.0 | 340 | 84 | 256 | 0.33 | 214 | 34 | 180 | 0.19<br />
Multiple myeloma (MM) | H | Clinical parameter S1 (CPS1). The actual class label is the sex of the patient. Used as a “positive” control endpoint | Affymetrix Human U133Plus 2.0 | 340 | 194 | 146 | 1.33 | 214 | 140 | 74 | 1.89<br />
Multiple myeloma (MM) | I | Clinical parameter R1 (CPR1). The actual class label is randomly assigned. Used as a “negative” control endpoint | Affymetrix Human U133Plus 2.0 | 340 | 200 | 140 | 1.43 | 214 | 122 | 92 | 1.33<br />
Neuroblastoma (NB) | J | Overall survival milestone outcome (OS, 900-d cutoff) | Different versions of Agilent human microarrays | 238 | 22 | 216 | 0.10 | 177 | 39 | 138 | 0.28<br />
Neuroblastoma (NB) | K | Event-free survival milestone outcome (EFS, 900-d cutoff) | Different versions of Agilent human microarrays | 239 | 49 | 190 | 0.26 | 193 | 83 | 110 | 0.75<br />
Neuroblastoma (NB) | L | Newly established parameter S (NEP_S). The actual class label is the sex of the patient. Used as a “positive” control endpoint | Different versions of Agilent human microarrays | 246 | 145 | 101 | 1.44 | 231 | 133 | 98 | 1.36<br />
Neuroblastoma (NB) | M | Newly established parameter R (NEP_R). The actual class label is randomly assigned. Used as a “negative” control endpoint | Different versions of Agilent human microarrays | 246 | 145 | 101 | 1.44 | 253 | 143 | 110 | 1.30<br />
Comments and references<br />
Hamner: The training set was first published in 2007 (ref. 50) and the validation set was generated for MAQC-II.<br />
Iconix: The data set was first published in 2007 (ref. 51). Raw microarray intensity data, instead of ratio data, were provided for MAQC-II data analysis.<br />
NIEHS: Exploratory visualization of the data set was reported in 2008 (ref. 53). However, the phenotype classification problem was formulated specifically for MAQC-II. A large amount of additional microarray and phenotype data were provided to MAQC-II for cross-platform and cross-tissue comparisons.<br />
Breast cancer (BR): The training set was first published in 2006 (ref. 56) and the validation set was specifically generated for MAQC-II. In addition, two distinct endpoints (D and E) were analyzed in MAQC-II.<br />
Multiple myeloma (MM): The data set was first published in 2006 (ref. 57) and 2007 (ref. 58). However, patient survival data were updated and the raw microarray data (CEL files) were provided specifically for MAQC-II data analysis. In addition, endpoints H and I were designed and analyzed specifically in MAQC-II.<br />
Neuroblastoma (NB): The training data set was first published in 2006 (ref. 63). The validation set (two-color Agilent platform) was generated specifically for MAQC-II. In addition, one-color Agilent platform data were also generated for most samples used in the training and validation sets specifically for MAQC-II to compare the prediction performance of two-color versus one-color platforms. Patient survival data were also updated. In addition, endpoints L and M were designed and analyzed specifically in MAQC-II.<br />
The first three data sets (Hamner, Iconix and NIEHS) are from preclinical toxicogenomics studies, whereas the other three data sets are from clinical studies. Endpoints H and L are positive<br />
controls (sex of patient) and endpoints I and M are negative controls (randomly assigned class labels). The nature of H, I, L and M was unknown to MAQC-II participants except for the project<br />
leader until all calculations were completed.<br />
a Numbers shown are the actual number of samples used for model development or validation.<br />
Figure 1 Experimental design and timeline<br />
of the MAQC-II project. Numbers (1–11)<br />
order the steps of analysis. Step 11 indicates<br />
when the original training and validation<br />
data sets were swapped to repeat steps 4–10.<br />
See main text for description of each step.<br />
Every effort was made to ensure the complete<br />
independence of the validation data sets from<br />
the training sets. Each model is characterized<br />
by several modeling factors and seven internal<br />
and external validation performance metrics<br />
(Supplementary Tables 1 and 2). The modeling<br />
factors include: (i) organization code; (ii) data<br />
set code; (iii) endpoint code; (iv) summary and<br />
normalization; (v) feature selection method;<br />
(vi) number of features used; (vii) classification<br />
algorithm; (viii) batch-effect removal method;<br />
(ix) type of internal validation; and (x) number<br />
of iterations of internal validation. The seven<br />
performance metrics for internal validation and<br />
external validation are: (i) MCC; (ii) accuracy;<br />
(iii) sensitivity; (iv) specificity; (v) AUC;<br />
(vi) mean of sensitivity and specificity; and<br />
(vii) r.m.s.e. Standard deviations (s.d.) of the metrics are also provided for<br />
internal validation results.<br />
[Figure 1 timeline label: 1. Exploratory data analysis (36 DATs), 9/07 – 10/07.]<br />
well represented. For each of the models nominated by a team as being<br />
the best model for a particular endpoint, we compiled the list of features<br />
used for both the original and swap experiments (see the MAQC Web<br />
site at http://edkb.fda.gov/MAQC/). These comprehensive tables represent<br />
a unique resource. The results that follow describe data mining<br />
efforts to determine the potential and limitations of current practices for<br />
developing and validating gene expression–based prediction models.<br />
Performance depends on endpoint and can be estimated during training<br />
Unlike many previous efforts, the study design of MAQC-II provided<br />
the opportunity to assess the performance of many different modeling<br />
[Figure 1 timeline (9/1/2007 to 2/1/2009), step labels: 2. Data analysis protocol (DAP); 3. Review & approval of DAP by RBWG; 4. Six training data sets (13 endpoints); 5. Classifiers are frozen (mark one for validation); 6. MAQC-II’s candidate models; 7. Validation (blind test) data sets distribution; 8. Prediction results; 9-10. Meta-data distribution; 11. Swap. A face-to-face meeting is also marked.]<br />
[Figure 2a,b: external validation (MCC) plotted against internal validation (MCC), with points labeled by endpoint (A to M); panel a, r = 0.840, N = 18,060; panel b, r = 0.951, N = 13.]<br />
approaches on a clinically realistic blinded external validation data set.<br />
This is especially important in light of the intended clinical or preclinical<br />
uses of classifiers that are constructed using initial data sets and<br />
validated for regulatory approval and then are expected to accurately<br />
predict samples collected under diverse conditions perhaps months or<br />
years later. To assess the reliability of performance estimates derived<br />
during model training, we compared the performance on the internal<br />
training data set with performance on the external validation data set<br />
for each of the 18,060 models in the original experiment (Fig. 2a).<br />
Models without complete metadata were not included in the analysis.<br />
We selected 13 ‘candidate models’, representing the best model for<br />
each endpoint, before external validation was performed. We required<br />
that each analysis team nominate one model<br />
[Figure 2b,c labels: panel b plots external validation (MCC) against internal validation (MCC), with points labeled by endpoint. Panel c shows MCC for internal and external validation per endpoint; x-axis labels in printed order: NB-positive (L), NIEHS rat liver necrosis (C), MM-positive (H), BR-erpos (E), NB-EFS (K), NB-OS (J), Iconix rat liver tumor (B), BR-pCR (D), MM-EFS (G), MM-OS (F), Hamner mouse lung tumor (A), MM-negative (I), NB-negative (M); numbers of models printed along the axis, in printed order: 1796, 970, 866, 1143, 1079, 2263, 1192, 2905, 877, 863, 1569, 807, 1730.]<br />
[Figure 1 labels: 11. Swap prediction results; 12. Meta-data analysis & visualization; 10. Table of model information, listing for models 1 to n the modeling factors (MF1, MF2, MF3, ...) and the internal (IV1, IV2, IV3, ...) and external (EV1, EV2, EV3, ...) validation performance metrics.]<br />
for each endpoint they analyzed and we then<br />
selected one candidate from these nominations<br />
for each endpoint. We observed a<br />
higher correlation between internal and<br />
external performance estimates in terms<br />
[Unlabeled figure panel: axis ticks from 0.1 to 1.0; r = 0.8495, N = 17,092.]<br />
Figure 2 Model performance on internal<br />
validation compared with external validation.<br />
(a) Performance of 18,060 models that were<br />
validated with blinded validation data.<br />
(b) Performance of 13 candidate models.<br />
r, Pearson correlation coefficient; N, number<br />
of models. Candidate models with binary and<br />
continuous prediction values are marked as<br />
circles and squares, respectively, and the<br />
standard error estimate was obtained using<br />
500-times resampling with bagging of the<br />
prediction results from each model. (c) Distribution<br />
of MCC values of all models for each endpoint in<br />
internal (left, yellow) and external (right, green)<br />
validation performance. Endpoints H and L (sex of<br />
the patients) are included as positive controls and<br />
endpoints I and M (randomly assigned sample<br />
class labels) as negative controls. Boxes indicate<br />
the 25% and 75% percentiles, and whiskers<br />
indicate the 5% and 95% percentiles.<br />
Figure 3 Performance, measured using MCC,<br />
of the best models nominated by the 17 data<br />
analysis teams (DATs) that analyzed all 13<br />
endpoints in the original training-validation<br />
experiment. The median MCC value for<br />
an endpoint, representative of the level of<br />
predictability of the endpoint, was calculated<br />
based on values from the 17 data analysis<br />
teams. The mean MCC value for a data analysis<br />
team, representative of the team’s proficiency<br />
in developing predictive models, was calculated<br />
based on values from the 11 non-random<br />
endpoints (excluding negative controls I and M).<br />
Red boxes highlight candidate models. Lack<br />
of a red box in an endpoint indicates that the<br />
candidate model was developed by a data analysis<br />
team that did not analyze all 13 endpoints.<br />
[Figure 3 heat map labels. Rows (data analysis team code, top to bottom): DAT24, DAT13, DAT25, DAT11, DAT12, DAT32, DAT10, DAT20, DAT4, DAT18, DAT36, DAT29, DAT35, DAT7, DAT19, DAT33, DAT3, Median, Candidate. Columns: Mean*, followed by the endpoints, starting with L.]<br />
of MCC for the selected candidate models<br />
(r = 0.951, n = 13, Fig. 2b) than for the overall<br />
set of models (r = 0.840, n = 18,060, Fig. 2a),<br />
suggesting that extensive peer review of<br />
analysis protocols was able to avoid selecting<br />
models that could result in less reliable<br />
predictions in external validation. Yet, even<br />
for the hand-selected candidate models, there is noticeable bias in the<br />
performance estimated from internal validation. That is, the internal<br />
validation performance is higher than the external validation performance<br />
for most endpoints (Fig. 2b). However, for some endpoints<br />
and for some model building methods or teams, internal and external<br />
performance correlations were more modest as described in the following<br />
sections.<br />
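The two quantities discussed in this paragraph, the correlation between internal and external estimates and the optimistic bias of the internal estimate, can be computed as sketched below. The function names are ours and the MCC values are made up for illustration; they are not taken from the MAQC-II tables.<br />

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def mean_optimism(internal, external):
    """Average amount by which internal MCC exceeds external MCC."""
    return sum(i - e for i, e in zip(internal, external)) / len(internal)


# Hypothetical per-model MCC values (illustration only).
internal_mcc = [0.95, 0.80, 0.45, 0.30, 0.05]
external_mcc = [0.90, 0.70, 0.35, 0.25, 0.00]

r = pearson_r(internal_mcc, external_mcc)
bias = mean_optimism(internal_mcc, external_mcc)
```

A high r with a positive bias corresponds to the pattern reported for the candidate models: internal validation ranks models well but tends to overstate their external performance.<br />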
To evaluate whether some endpoints might be more predictable<br />
than others and to calibrate performance against the positive- and<br />
negative-control endpoints, we assessed all models generated for each<br />
endpoint (Fig. 2c). We observed a clear dependence of prediction<br />
performance on endpoint. For example, endpoints C (liver necrosis<br />
score of rats treated with hepatotoxicants), E (estrogen receptor status<br />
of breast cancer patients), and H and L (sex of the multiple myeloma<br />
and neuroblastoma patients, respectively) were the easiest to predict<br />
(mean MCC > 0.7). Toxicological endpoints A and B and disease<br />
progression endpoints D, F, G, J and K were more difficult to predict<br />
(mean MCC ~0.1–0.4). Negative-control endpoints I and M were<br />
totally unpredictable (mean MCC ~0), as expected. For 11 endpoints<br />
(excluding the negative controls), a large proportion of the submitted<br />
models predicted the endpoint significantly better than chance (MCC<br />
> 0) and for a given endpoint many models performed similarly well<br />
on both internal and external validation (see the distribution of MCC<br />
in Fig. 2c). On the other hand, not all the submitted models performed<br />
equally well for any given endpoint. Some models performed<br />
no better than chance, even for some of the easy-to-predict endpoints,<br />
suggesting that additional factors were responsible for differences in<br />
model performance.<br />
Data analysis teams show different proficiency<br />
Next, we summarized the external validation performance of the<br />
models nominated by the 17 teams that analyzed all 13 endpoints<br />
(Fig. 3). Nominated models represent a team’s best assessment of its<br />
model-building effort. The mean external validation MCC per team<br />
over 11 endpoints, excluding negative controls I and M, varied from<br />
0.532 for data analysis team (DAT)24 to 0.263 for DAT3, indicating<br />
appreciable differences in performance of the models developed by different<br />
teams for the same data. Similar trends were observed when AUC<br />
[Figure 3 values: external validation MCC of the nominated models. Rows, data analysis team code; columns, endpoint.]<br />
Team | Mean* | L (NB pos) | H (MM pos) | C (Rat liver necr.) | E (BR erpos) | K (NB EFS) | J (NB OS) | B (Rat liver tumor) | D (BR pCR) | A (Mouse lung tumor) | G (MM EFS) | F (MM OS) | I (MM neg) | M (NB neg)<br />
DAT24 | 0.532 | 0.982 | 0.910 | 0.845 | 0.748 | 0.575 | 0.557 | 0.311 | 0.323 | 0.244 | 0.193 | 0.168 | 0.011 | −0.059<br />
DAT13 | 0.513 | 0.973 | 0.918 | 0.829 | 0.792 | 0.493 | 0.437 | 0.322 | 0.306 | 0.307 | 0.202 | 0.060 | 0.044 | −0.041<br />
DAT25 | 0.504 | 0.965 | 0.801 | 0.816 | 0.652 | 0.514 | 0.349 | 0.383 | 0.360 | 0.217 | 0.243 | 0.247 | 0.016 | −0.051<br />
DAT11 | 0.500 | 0.991 | 0.752 | 0.750 | 0.778 | 0.509 | 0.483 | 0.345 | 0.305 | 0.295 | 0.193 | 0.099 | 0.029 | 0.012<br />
DAT12 | 0.495 | 0.973 | 0.869 | 0.825 | 0.755 | 0.403 | 0.413 | 0.321 | 0.275 | 0.193 | 0.266 | 0.152 | −0.016 | −0.117<br />
DAT32 | 0.489 | 0.982 | 0.762 | 0.823 | 0.702 | 0.533 | 0.557 | 0.284 | 0.203 | 0.143 | 0.257 | 0.129 | 0.043 | −0.006<br />
DAT10 | 0.485 | 0.982 | 0.871 | 0.445 | 0.728 | 0.472 | 0.249 | 0.429 | 0.353 | 0.295 | 0.293 | 0.222 | 0.016 | −0.035<br />
DAT20 | 0.483 | 0.930 | 0.838 | 0.805 | 0.773 | 0.542 | 0.386 | 0.345 | 0.289 | 0.225 | 0.181 | 0.000 | 0.067 | −0.152<br />
DAT4 | 0.473 | 0.982 | 0.847 | 0.835 | 0.737 | 0.488 | 0.344 | 0.118 | 0.324 | 0.110 | 0.176 | 0.247 | −0.067 | −0.112<br />
DAT18 | 0.460 | 0.973 | 0.860 | 0.829 | 0.690 | 0.371 | 0.376 | 0.344 | 0.229 | 0.057 | 0.243 | 0.090 | −0.059 | −0.059<br />
DAT36 | 0.457 | 0.956 | 0.815 | 0.847 | 0.773 | 0.491 | 0.202 | 0.185 | 0.385 | −0.014 | 0.187 | 0.203 | 0.002 | −0.075<br />
DAT29 | 0.443 | 0.982 | 0.847 | 0.780 | 0.755 | 0.377 | 0.423 | 0.313 | −0.042 | 0.198 | 0.241 | 0.000 | 0.000 | −0.041<br />
DAT35 | 0.427 | 0.725 | 0.782 | 0.824 | 0.770 | 0.531 | 0.344 | 0.168 | 0.349 | −0.096 | 0.165 | 0.140 | 0.068 | 0.036<br />
DAT7 | 0.371 | 0.982 | 0.707 | 0.782 | 0.466 | 0.499 | 0.184 | 0.271 | 0.000 | −0.062 | 0.203 | 0.051 | 0.013 | −0.103<br />
DAT19 | 0.364 | 0.636 | 0.761 | 0.454 | 0.748 | 0.247 | 0.377 | 0.062 | 0.324 | 0.043 | 0.085 | 0.271 | 0.016 | −0.020<br />
DAT33 | 0.284 | 0.856 | 0.054 | 0.709 | 0.751 | 0.455 | −0.213 | −0.078 | 0.114 | 0.479 | −0.096 | 0.091 | 0.051 | 0.024<br />
DAT3 | 0.263 | 0.982 | 0.830 | 0.595 | 0.544 | 0.036 | −0.090 | −0.027 | 0.336 | −0.143 | −0.030 | −0.142 | −0.047 | 0.019<br />
Median | 0.488 | 0.973 | 0.830 | 0.816 | 0.748 | 0.491 | 0.376 | 0.311 | 0.306 | 0.193 | 0.193 | 0.129 | 0.016 | −0.041<br />
Candidate | 0.511 | 0.982 | 0.891 | 0.829 | 0.732 | 0.403 | 0.479 | 0.429 | 0.301 | 0.217 | 0.162 | 0.196 | 0.067 | −0.103<br />
was used as the performance metric (Supplementary Table 5) or when<br />
the original training and validation sets were swapped (Supplementary<br />
Tables 6 and 7). Table 2 summarizes the modeling approaches that<br />
were used by two or more MAQC-II data analysis teams.<br />
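The per-team mean described above can be checked directly against the values printed in Figure 3. The sketch below transcribes the rows for DAT24 and DAT3 and averages the 11 endpoints that are not negative controls; the column order (L, H, C, E, K, J, B, D, A, G, F, I, M) is our reading of the figure labels, and the variable and function names are hypothetical.<br />

```python
# Endpoint order as read from the Figure 3 column labels (an assumption).
ENDPOINTS = ["L", "H", "C", "E", "K", "J", "B", "D", "A", "G", "F", "I", "M"]
NEGATIVE_CONTROLS = {"I", "M"}

# External validation MCC of nominated models, transcribed from Figure 3.
FIG3_MCC = {
    "DAT24": [0.982, 0.910, 0.845, 0.748, 0.575, 0.557, 0.311,
              0.323, 0.244, 0.193, 0.168, 0.011, -0.059],
    "DAT3": [0.982, 0.830, 0.595, 0.544, 0.036, -0.090, -0.027,
             0.336, -0.143, -0.030, -0.142, -0.047, 0.019],
}


def team_mean_mcc(values):
    """Mean MCC over the endpoints that are not negative controls."""
    kept = [v for e, v in zip(ENDPOINTS, values) if e not in NEGATIVE_CONTROLS]
    return sum(kept) / len(kept)


for team, values in FIG3_MCC.items():
    print(team, round(team_mean_mcc(values), 3))
# DAT24 0.532
# DAT3 0.263
```

Both results agree with the means quoted in the text for the highest- and lowest-ranked teams.<br />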
Many factors may have played a role in the difference of external validation<br />
performance between teams. For instance, teams used different<br />
modeling factors, criteria for selecting the nominated models, and software<br />
packages and code. Moreover, some teams may have been more<br />
proficient at microarray data modeling and better at guarding against<br />
clerical errors. We noticed substantial variations in performance among<br />
the many K-nearest neighbor algorithm (KNN)-based models developed<br />
by four analysis teams (Supplementary Fig. 1). Follow-up investigations<br />
identified a few possible causes leading to the discrepancies in<br />
performance 32 . For example, DAT20 fixed the parameter ‘number of<br />
neighbors’ K = 3 in its data analysis protocol for all endpoints, whereas<br />
DAT18 varied K from 3 to 15 with a step size of 2. This investigation<br />
also revealed that even a detailed but standardized description of model<br />
building requested from all groups failed to capture many important<br />
tuning variables in the process. The subtle modeling differences not<br />
captured may have contributed to the differing performance levels<br />
achieved by the data analysis teams. The differences in performance<br />
for the models developed by various data analysis teams can also be<br />
observed from the changing patterns of internal and external validation<br />
performance across the 13 endpoints (Fig. 3, Supplementary<br />
Tables 5–7 and Supplementary Figs. 2–4). Our observations highlight<br />
the importance of good modeling practice in developing and validating<br />
microarray-based predictive models including reporting of computational<br />
details for results to be replicated 26 . In light of the MAQC-II<br />
experience, recording structured information about the steps and<br />
parameters of an analysis process seems highly desirable to facilitate<br />
peer review and reanalysis of results.<br />
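One possible way to record such structured information is sketched below. The field names follow the ten modeling factors listed in the Figure 1 legend, plus a free-form `tuning` field for variables a fixed form might miss; the class itself, the field names and all example values are hypothetical, except that the K grid mirrors the range from 3 to 15 in steps of 2 that the text attributes to DAT18.<br />

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class ModelRecord:
    """One model's analysis settings, serializable for review or reanalysis."""
    organization_code: str
    data_set_code: str
    endpoint_code: str
    summary_normalization: str
    feature_selection: str
    n_features: int
    classifier: str
    batch_effect_removal: str
    internal_validation: str
    n_validation_iterations: int
    # Tuning variables not covered by the fixed fields above.
    tuning: dict = field(default_factory=dict)


# Placeholder values for illustration; not an actual MAQC-II submission.
record = ModelRecord(
    organization_code="DAT-example",
    data_set_code="NB",
    endpoint_code="J",
    summary_normalization="Loess",
    feature_selection="T-Test",
    n_features=50,
    classifier="KNN",
    batch_effect_removal="None",
    internal_validation="5-fold cross-validation",
    n_validation_iterations=10,
    tuning={"k_grid": list(range(3, 16, 2))},  # K = 3, 5, ..., 15
)

serialized = json.dumps(asdict(record), indent=2)
```

Keeping the tuning variables in the same machine-readable record as the main modeling factors makes differences such as a fixed K versus a searched K visible at review time.<br />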
Swap and original analyses lead to consistent results<br />
To evaluate the reproducibility of the models generated by each team,<br />
we correlated the performance of each team’s models on the original<br />
training data set to performance on the validation data set and<br />
repeated this calculation for the swap experiment (Fig. 4). The correlation<br />
varied from 0.698–0.966 on the original experiment and from<br />
Table 2 Modeling factor options frequently adopted by MAQC-II data analysis teams<br />
Original analysis (training → validation)<br />
Modeling factor | Option | Number of teams | Number of endpoints | Number of models<br />
Summary and normalization | Loess | 12 | 3 | 2,563<br />
Summary and normalization | RMA | 3 | 7 | 46<br />
Summary and normalization | MAS5 | 11 | 7 | 4,947<br />
Batch-effect removal | None | 10 | 11 | 2,281<br />
Batch-effect removal | Mean shift | 3 | 11 | 7,279<br />
Feature selection | SAM | 4 | 11 | 3,771<br />
Feature selection | FC+P | 8 | 11 | 4,711<br />
Feature selection | T-Test | 5 | 11 | 400<br />
Feature selection | RFE | 2 | 11 | 647<br />
Number of features | 0~9 | 10 | 11 | 393<br />
Number of features | 10~99 | 13 | 11 | 4,445<br />
Number of features | ≥1,000 | 3 | 11 | 474<br />
Number of features | 100~999 | 10 | 11 | 4,298<br />
Classification algorithm | DA | 4 | 11 | 103<br />
Classification algorithm | Tree | 5 | 11 | 358<br />
Classification algorithm | NB | 4 | 11 | 924<br />
Classification algorithm | KNN | 8 | 11 | 6,904<br />
Classification algorithm | SVM | 9 | 11 | 986<br />
Analytic options used by two or more of the 14 teams that submitted models for all endpoints in both<br />
the original and swap experiments. RMA, robust multichip analysis; SAM, significance analysis of<br />
microarrays; FC, fold change; RFE, recursive feature elimination; DA, discriminant analysis; Tree,<br />
decision tree; NB, naive Bayes; KNN, K-nearest neighbors; SVM, support vector machine.<br />
0.443–0.954 on the swap experiment. For all but three teams (DAT3,<br />
DAT10 and DAT11) the original and swap correlations were within<br />
±0.2, and all but three others (DAT4, DAT13 and DAT36) were within<br />
±0.1, suggesting that the model building process was relatively robust,<br />
at least with respect to generating models with similar performance.<br />
For some data analysis teams the internal validation performance<br />
drastically overestimated the performance of the same model in predicting<br />
the validation data. Examination of some of those models<br />
revealed several reasons, including bias in the feature selection and<br />
cross-validation process 28 , findings consistent with what was observed<br />
from a recent literature survey 33 .<br />
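One common way to avoid the feature selection bias mentioned here is to repeat feature selection inside every cross-validation fold, so that held-out samples never influence which features are chosen. The sketch below illustrates this with a deliberately simple selector (difference of class means) and a nearest-centroid classifier; both are stand-ins chosen for brevity, not methods attributed to any MAQC-II team, and the data are made up.<br />

```python
def class_means(rows, labels, cls):
    """Per-feature mean of the rows that belong to class `cls`."""
    members = [r for r, y in zip(rows, labels) if y == cls]
    return [sum(col) / len(members) for col in zip(*members)]


def select_features(rows, labels, k):
    """Rank features by absolute difference of class means; keep the top k."""
    m0, m1 = class_means(rows, labels, 0), class_means(rows, labels, 1)
    ranked = sorted(range(len(m0)), key=lambda j: abs(m1[j] - m0[j]), reverse=True)
    return ranked[:k]


def predict(train_rows, train_labels, test_row, features):
    """Nearest-centroid prediction restricted to the selected features."""
    m0 = class_means(train_rows, train_labels, 0)
    m1 = class_means(train_rows, train_labels, 1)
    d0 = sum((test_row[j] - m0[j]) ** 2 for j in features)
    d1 = sum((test_row[j] - m1[j]) ** 2 for j in features)
    return 1 if d1 < d0 else 0


def cross_validate(rows, labels, k_features, n_folds):
    """k-fold CV in which feature selection is redone within each fold."""
    correct = 0
    for fold in range(n_folds):
        train_idx = [i for i in range(len(rows)) if i % n_folds != fold]
        test_idx = [i for i in range(len(rows)) if i % n_folds == fold]
        tr_rows = [rows[i] for i in train_idx]
        tr_labels = [labels[i] for i in train_idx]
        # Selection sees only the training part of this fold.
        features = select_features(tr_rows, tr_labels, k_features)
        for i in test_idx:
            correct += predict(tr_rows, tr_labels, rows[i], features) == labels[i]
    return correct / len(rows)


# Made-up data: feature 0 separates the classes, feature 1 does not.
rows = [[0.0, 1.0], [0.2, 2.0], [0.1, 3.0], [1.0, 1.0], [1.2, 2.0], [1.1, 3.0]]
labels = [0, 0, 0, 1, 1, 1]
cv_accuracy = cross_validate(rows, labels, k_features=1, n_folds=3)
```

Selecting features once on all samples and only then cross-validating the classifier would let the held-out samples leak into model building, which is the kind of bias that inflates internal performance estimates.<br />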
Previously, reanalysis of a widely cited single study 34 found that<br />
the results in the original publication were very fragile—that is, not<br />
reproducible if the training and validation sets were swapped 35 . Our<br />
observations, except for DAT3, DAT11 and DAT36 with correlation<br />
65%<br />
of the variability in the external validation performance. All other<br />
factors explain 1%.<br />
The BLUPs reveal the effect of each level of a factor on the corresponding<br />
MCC value. The BLUPs of the main endpoint effect show<br />
that rat liver necrosis, breast cancer estrogen receptor status and the<br />
sex of the patient (endpoints C, E, H and L) are relatively easy to<br />
predict, contributing an advantage of ~0.2–0.4 to the corresponding<br />
MCC values. The remaining endpoints are relatively hard to<br />
predict, contributing a disadvantage of about −0.1 to −0.2 to<br />
the corresponding MCC values. The main factors of normalization,<br />
classification algorithm, the number of selected features and<br />
the feature selection method have an impact of −0.1 to 0.1 on the<br />
corresponding MCC values. Loess normalization was applied to the<br />
endpoints (J, K and L) for the neuroblastoma data set with the twocolor<br />
Agilent platform and has 0.1 advantage to MCC values. Among<br />
the Microarray Analysis Suite version 5 (MAS5), Robust Multichip<br />
Analysis (RMA) and dChip normalization methods that were<br />
applied to all endpoints (A, C, D, E, F, G and H) for Affymetrix data,<br />
the dChip method has a lower BLUP than the others. Because<br />
normalization methods are partially confounded with endpoints, it<br />
may not be suitable to compare methods between different confounded<br />
groups. Among classification methods, discriminant analysis has the<br />
largest positive impact of 0.056 on the MCC values. Regarding the<br />
number of selected features, larger bin number has better impact on<br />
the average across endpoints. The bin number is assigned by applying<br />
the ceiling function to the log base 10 of the number of selected features.<br />
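The binning rule just described can be sketched as follows. This is a minimal illustration, not the study's implementation: the function name `feature_bin` is hypothetical, and integer arithmetic is used in place of a floating-point logarithm so that exact powers of ten fall on the expected boundary.<br />

```python
def feature_bin(n_features: int) -> int:
    """Return ceil(log10(n_features)) using exact integer arithmetic."""
    if n_features < 1:
        raise ValueError("n_features must be a positive integer")
    bin_number = 0
    power = 1  # invariant: power == 10 ** bin_number
    # Find the smallest k such that 10**k >= n_features.
    while power < n_features:
        power *= 10
        bin_number += 1
    return bin_number


if __name__ == "__main__":
    for n in (5, 10, 99, 100, 999, 1000, 5000):
        print(n, feature_bin(n))
```

Under this rule a model with 11 to 100 features falls in bin 2 and one with 101 to 1,000 features falls in bin 3.<br />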
All the feature selection methods have a slight impact of −0.025 to 0.025<br />
[Figure 4 scatter plot. x axis: correlation in original analysis (training → validation); y axis: correlation in swap analysis (validation → training); both axes span 0.4–1.0, and points are labeled by data analysis team number.]<br />
Figure 4 Correlation between internal and external validation is<br />
dependent on data analysis team. Pearson correlation coefficients<br />
between internal and external validation performance in terms of MCC are<br />
displayed for the 14 teams that submitted models for all 13 endpoints<br />
in both the original (x axis) and swap (y axis) analyses. The unusually low<br />
correlation in the swap analysis for DAT3, DAT11 and DAT36 is a result<br />
of their failure to accurately predict the positive endpoint H, likely due to<br />
operator errors (Supplementary Table 6).<br />
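As a rough sketch of the quantity plotted in Figure 4, the snippet below computes a Pearson correlation coefficient between internal and external validation MCC values for a single team. The function name `pearson` and the MCC values are hypothetical and are not taken from the study.<br />

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-endpoint MCC values for one team (not study data).
internal_mcc = [0.10, 0.35, 0.80, 0.20, 0.75, 0.30, 0.15, 0.95]
external_mcc = [0.05, 0.30, 0.78, 0.25, 0.70, 0.20, 0.10, 0.97]

if __name__ == "__main__":
    print(round(pearson(internal_mcc, external_mcc), 3))
```

Running the same calculation once on the original analysis and once on the swap analysis would give one point per team in a plot of this kind.<br />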
[Figure 5a bar chart. x axis: percentage of variation. Factors shown: endpoint; summary normalization; classification algorithm; number of features; feature selection; validation iterations; organization; batch effect removal; organization*classification algorithm; the interaction of each of these with endpoint; and residual.]<br />
on MCC values except for recursive feature elimination (RFE), which<br />
has an impact of −0.006. In the plots of the four selected interactions,<br />
the estimated BLUPs vary across endpoints. The large variation across<br />
endpoints implies that the impact of the corresponding modeling factor on<br />
different endpoints can be very different. Among the four interaction<br />
plots (see Supplementary Fig. 6 for a clear labeling of each interaction<br />
term), the corresponding BLUPs of the three-way interaction<br />
of organization, classification algorithm and endpoint show the highest<br />
variation. This may be due to different tuning parameters applied<br />
to individual algorithms for different organizations, as was the case<br />
for KNN 32 .<br />
We also analyzed the relative importance of modeling factors on<br />
external-validation prediction performance using a decision tree<br />
model 38 . The analysis results revealed observations (Supplementary<br />
Fig. 7) largely consistent with those above. First, the endpoint code<br />
was the most influential modeling factor. Second, feature selection<br />
method, normalization and summarization method, classification<br />
method and organization code also contributed to prediction performance,<br />
but their contribution was relatively small.<br />
Feature list stability is correlated with endpoint predictability<br />
Prediction performance is the most important criterion for evaluating<br />
the performance of a predictive model and its modeling process.<br />
However, the robustness and mechanistic relevance of the model and<br />
the corresponding gene signature are also important (Supplementary<br />
Fig. 8). That is, given comparable prediction performance between<br />
two modeling processes, the preferable one is the one that yields a more<br />
robust and reproducible gene signature across similar data sets (e.g., by swapping the<br />
training and validation sets), and is therefore less susceptible to<br />
sporadic fluctuations in the data, or the one that provides new insights<br />
into the underlying biology. Reproducibility or stability of<br />
feature sets is best studied by running the same model selection protocol<br />
on two distinct collections of samples. In this case, that scenario became<br />
possible only after the blind validation data were distributed to the data<br />
analysis teams, which were then asked to repeat their analysis after swapping<br />
their original training and test sets. Supplementary Figures 9 and 10<br />
show that, although the feature space is extremely large for microarray<br />
data, different teams and protocols were able to consistently select the<br />
best-performing features. Analysis of the lists of features indicated that,<br />
for endpoints that were relatively easy to predict, various data analysis teams<br />
arrived at models that shared more features, and the overlap<br />
of the lists from the original and swap analyses was greater than that<br />
for more difficult endpoints (Supplementary Figs. 9–11). Therefore,<br />
the stability of feature lists can be associated with the difficulty<br />
of the prediction problem (Supplementary Fig. 11), although<br />
multiple models with different feature lists and comparable performance<br />
can be found from the same data set 39 . Functional analysis of the<br />
most frequently selected genes by all data analysis protocols shows<br />
[Figure 5b panels (y axis: BLUP): endpoint; number of features; summary normalization; feature selection method; classification algorithm; classification algorithm*endpoint; number of features*endpoint; summary normalization*endpoint; organization*classification*endpoint. Endpoints A–L are grouped as Tox, BR, MM and NB.]<br />
Figure 5 Effect of modeling factors on estimates of model performance. (a) Random-effect models of external validation performance (MCC) were<br />
developed to estimate a distinct variance component for each modeling factor and several selected interactions. The estimated variance components<br />
were then divided by their total in order to compare the proportion of variability explained by each modeling factor. The endpoint code contributes the<br />
most to the variability in external validation performance. (b) The BLUP plots of the corresponding factors having proportion of variation larger than 1%<br />
in a. Endpoint abbreviations (Tox., preclinical toxicity; BR, breast cancer; MM, multiple myeloma; NB, neuroblastoma). Endpoints H and L are the sex<br />
of the patient. Summary normalization abbreviations (GA, genetic algorithm; RMA, robust multichip analysis). Classification algorithm abbreviations<br />
(ANN, artificial neural network; DA, discriminant analysis; Forest, random forest; GLM, generalized linear model; KNN, K-nearest neighbors; Logistic,<br />
logistic regression; ML, maximum likelihood; NB, Naïve Bayes; NC, nearest centroid; PLS, partial least squares; RFE, recursive feature elimination;<br />
SMO, sequential minimal optimization; SVM, support vector machine; Tree, decision tree). Feature selection method abbreviations (Bscatter, between-class<br />
scatter; FC, fold change; KS, Kolmogorov-Smirnov algorithm; SAM, significance analysis of microarrays).<br />
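The normalization step described in panel a of the caption, dividing each estimated variance component by the total, can be sketched as below. The function name and the component values are hypothetical; fitting the random-effect model itself is outside the scope of this sketch.<br />

```python
from typing import Dict


def variance_proportions(components: Dict[str, float]) -> Dict[str, float]:
    """Express each variance component as a percentage of their total."""
    total = sum(components.values())
    if total <= 0:
        raise ValueError("total variance must be positive")
    return {name: 100.0 * value / total for name, value in components.items()}


# Hypothetical variance components, chosen only for illustration.
example = {"endpoint": 0.065, "organization": 0.004, "residual": 0.031}

if __name__ == "__main__":
    for name, percent in variance_proportions(example).items():
        print(f"{name}: {percent:.1f}%")
```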
that many of these genes represent biological processes that are highly<br />
relevant to the clinical outcome that is being predicted 36 . The sex-based<br />
endpoints have the best overlap, whereas more difficult survival<br />
endpoints (in which disease processes are confounded by many other<br />
factors) have only marginally better overlap with biological processes<br />
relevant to the disease than that expected by random chance.<br />
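One simple way to quantify the overlap between the feature lists from the original and swap analyses is the Jaccard index, sketched below. The text does not state which overlap measure was used, so the choice of Jaccard is an assumption, and the probe identifiers are placeholders.<br />

```python
from typing import Iterable


def jaccard_overlap(list_a: Iterable[str], list_b: Iterable[str]) -> float:
    """Size of the intersection divided by the size of the union."""
    set_a, set_b = set(list_a), set(list_b)
    union = set_a | set_b
    if not union:
        return 0.0  # two empty lists share nothing by convention here
    return len(set_a & set_b) / len(union)


# Placeholder feature lists (not taken from the study).
original_features = ["probe_1", "probe_2", "probe_3", "probe_4"]
swap_features = ["probe_1", "probe_2", "probe_4", "probe_9"]

if __name__ == "__main__":
    print(jaccard_overlap(original_features, swap_features))
```

A value near 1 would indicate a stable feature list across the two analyses, and a value near 0 an unstable one.<br />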
Summary of MAQC-II observations and recommendations<br />
The MAQC-II data analysis teams comprised a diverse group, some<br />
of whom were experienced microarray analysts whereas others were<br />
graduate students with little experience. In aggregate, the group’s<br />
composition likely mimicked the broad scientific community engaged<br />
in building and publishing models derived from microarray data. The<br />
more than 30,000 models developed by 36 data analysis teams for<br />
13 endpoints from six diverse clinical and preclinical data sets are a<br />
rich source from which to highlight several important observations.<br />
First, model prediction performance was largely endpoint (biology)<br />
dependent (Figs. 2c and 3). The incorporation of multiple data<br />
sets and endpoints (including positive and negative controls) in the<br />
MAQC-II study design made this observation possible. Some endpoints<br />
are highly predictable given the nature of the data, which<br />
makes it possible to build good models, provided that sound modeling<br />
procedures are used. Other endpoints are inherently difficult to predict<br />
regardless of the model development protocol.<br />
Second, there are clear differences in proficiency between data<br />
analysis teams (organizations) and such differences are correlated<br />
with the level of experience of the team. For example, the top-performing<br />
teams shown in Figure 3 were mainly industrial participants<br />
with many years of experience in microarray data analysis, whereas<br />
bottom-performing teams were mainly less-experienced graduate<br />
students or researchers. Based on results from the positive and negative<br />
endpoints, we noticed that simple errors were sometimes made,<br />
suggesting rushed efforts due to lack of time or unnoticed implementation<br />
flaws. This observation strongly suggests that mechanisms are<br />
needed to ensure the reliability of results presented to the regulatory<br />
agencies, journal editors and the research community. By examining<br />
the practices of teams whose models did not perform well, future<br />
studies might be able to identify pitfalls to be avoided. Likewise,<br />
practices adopted by top-performing teams can provide the basis for<br />
developing good modeling practices.<br />
Third, the internal validation performance from well-implemented,<br />
unbiased cross-validation shows a high degree of concordance with the<br />
external validation performance in a strict blinding process (Fig. 2).<br />
This observation was not possible from previously published studies<br />
owing to the small number of available endpoints tested in them.<br />
Fourth, many models with similar performance can be developed<br />
from a given data set (Fig. 2). Similar prediction performance is<br />
attainable when using different modeling algorithms and parameters,<br />
and simple data analysis methods often perform as well as more<br />
complicated approaches 32,40 . Although it is not essential to include<br />
the same features in these models to achieve comparable prediction<br />
performance, endpoints that were easier to predict generally yielded<br />
models with more common features, when analyzed by different<br />
teams (Supplementary Fig. 11).<br />
Finally, applying good modeling practices appeared to be more<br />
important than the actual choice of a particular algorithm over the<br />
others within the same step in the modeling process. This can be seen<br />
in the diverse choices of the modeling factors used by teams that produced<br />
models that performed well in the blinded validation (Table 2)<br />
where modeling factors did not universally contribute to variations in<br />
model performance among well-performing teams (Fig. 5).<br />
Summarized below are the model building steps recommended to<br />
the MAQC-II data analysis teams. These may be applicable to model<br />
building practitioners in the general scientific community.<br />
Step one (design). There is no exclusive set of steps and procedures,<br />
in the form of a checklist, to be followed by any practitioner for all<br />
problems. However, normal good practice on the study design and<br />
the ratio of sample size to classifier complexity should be followed.<br />
The frequently used options for normalization, feature selection and<br />
classification are good starting points (Table 2).<br />
Step two (pilot study or internal validation). This can be accomplished<br />
by bootstrap or cross-validation such as the ten repeats of a<br />
fivefold cross-validation procedure adopted by most MAQC-II teams.<br />
The samples from the pilot study are not replaced for the pivotal<br />
study; rather they are augmented to achieve an ‘appropriate’ target size.<br />
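Ten repeats of fivefold cross-validation can be sketched as an index generator like the one below. This is an illustrative, unstratified version under assumed names (`repeated_kfold`, `seed`); it is not the code used by any MAQC-II team, and a real analysis might also balance the classes within each fold.<br />

```python
import random
from typing import Iterator, List, Tuple


def repeated_kfold(
    n_samples: int, n_folds: int = 5, n_repeats: int = 10, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for repeated k-fold cross-validation."""
    if not 2 <= n_folds <= n_samples:
        raise ValueError("n_folds must be between 2 and n_samples")
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # a fresh random partition for every repeat
        for fold in range(n_folds):
            test = order[fold::n_folds]  # every n_folds-th shuffled sample
            held_out = set(test)
            train = [i for i in order if i not in held_out]
            yield train, test


if __name__ == "__main__":
    splits = list(repeated_kfold(n_samples=23))
    print(len(splits))  # 10 repeats x 5 folds
```

Averaging a performance metric over all 50 splits gives a single internal validation estimate.<br />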
Step three (pivotal study or external validation). Many investigators<br />
assume that the most conservative approach to a pivotal study is to<br />
simply obtain a test set completely independent of the training set(s).<br />
However, it is good to keep in mind the exchange 34,35 regarding the<br />
fragility of results when the training and validation sets are swapped.<br />
Results from further resampling (including simple swapping as in<br />
MAQC-II) across the training and validation sets can provide important<br />
information about the reliability of the models and the modeling<br />
procedures, but the complete separation of the training and validation<br />
sets should be maintained 41 .<br />
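The swap analysis can be pictured as running one modeling procedure in both directions while never pooling the two sets. The sketch below assumes hypothetical `fit` and `score` callables and uses a toy one-feature threshold classifier with made-up data.<br />

```python
from typing import Callable, Dict, List, Tuple, TypeVar

Data = TypeVar("Data")
Model = TypeVar("Model")


def original_and_swap(
    fit: Callable[[Data], Model],
    score: Callable[[Model, Data], float],
    training_set: Data,
    validation_set: Data,
) -> Dict[str, float]:
    """Score a modeling procedure in both directions without mixing the sets."""
    original_model = fit(training_set)  # training -> validation
    swap_model = fit(validation_set)  # validation -> training
    return {
        "original": score(original_model, validation_set),
        "swap": score(swap_model, training_set),
    }


def fit_threshold(data: List[Tuple[float, int]]) -> float:
    """Toy classifier: midpoint between the two class means."""
    pos = [x for x, label in data if label == 1]
    neg = [x for x, label in data if label == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def accuracy(threshold: float, data: List[Tuple[float, int]]) -> float:
    """Fraction of samples on the correct side of the threshold."""
    hits = sum((x > threshold) == (label == 1) for x, label in data)
    return hits / len(data)


# Made-up (value, label) pairs, for illustration only.
train = [(0.1, 0), (0.3, 0), (0.8, 1), (0.9, 1)]
valid = [(0.2, 0), (0.6, 0), (0.7, 1), (1.0, 1)]

if __name__ == "__main__":
    print(original_and_swap(fit_threshold, accuracy, train, valid))
```

A large gap between the two scores would suggest that the result depends on which samples happened to be used for training.<br />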
Finally, a perennial issue concerns reuse of the independent validation<br />
set after modifications to an originally designed and validated<br />
data analysis algorithm or protocol. Such a process turns the validation<br />
set into part of the design or training set 42 . Ground rules must<br />
be developed for avoiding this approach and penalizing it when it<br />
occurs; and practitioners should guard against using it before such<br />
ground rules are well established.<br />
DISCUSSION<br />
MAQC-II conducted a broad observational study of the current community<br />
landscape of gene-expression profile–based predictive model<br />
development. Microarray gene expression profiling is among the most<br />
commonly used analytical tools in biomedical research. Analysis of<br />
the high-dimensional data generated by these experiments involves<br />
multiple steps and several critical decision points that can profoundly<br />
influence the soundness of the results 43 . An important requirement<br />
of a sound internal validation is that it must include feature selection<br />
and parameter optimization within each iteration to avoid overly optimistic<br />
estimations of prediction performance 28,29,44 . To what extent<br />
this information has been disseminated and followed by the scientific<br />
community in current microarray analysis remains unknown 33 .<br />
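The requirement that feature selection sit inside each cross-validation iteration can be sketched as follows. The classifier (nearest centroid), the ranking statistic (absolute difference of class means) and all function names are assumptions made for illustration; the point is only that the held-out fold plays no part in choosing features.<br />

```python
import random
from typing import List, Sequence


def select_top_features(X: Sequence[Sequence[float]], y: Sequence[int], k: int) -> List[int]:
    """Rank features by the absolute difference between class means."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]

    def gap(j: int) -> float:
        mean_pos = sum(row[j] for row in pos) / len(pos)
        mean_neg = sum(row[j] for row in neg) / len(neg)
        return abs(mean_pos - mean_neg)

    return sorted(range(len(X[0])), key=gap, reverse=True)[:k]


def nearest_centroid_predict(X_train, y_train, X_test, features):
    """Assign each test sample to the class with the closest centroid."""
    centroids = {}
    for label in (0, 1):
        rows = [row for row, lab in zip(X_train, y_train) if lab == label]
        centroids[label] = [sum(row[j] for row in rows) / len(rows) for j in features]
    predictions = []
    for row in X_test:
        distance = {
            label: sum((row[j] - c) ** 2 for j, c in zip(features, centroid))
            for label, centroid in centroids.items()
        }
        predictions.append(min(distance, key=distance.get))
    return predictions


def cross_validated_accuracy(X, y, k_features=5, n_folds=5, seed=0):
    """k-fold accuracy with feature selection repeated inside every fold."""
    order = list(range(len(y)))
    random.Random(seed).shuffle(order)
    correct = 0
    for fold in range(n_folds):
        test_idx = order[fold::n_folds]
        held_out = set(test_idx)
        train_idx = [i for i in order if i not in held_out]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        # Features are chosen from the training fold only; the held-out
        # samples never influence which features the classifier uses.
        features = select_top_features(X_train, y_train, k_features)
        predicted = nearest_centroid_predict(
            X_train, y_train, [X[i] for i in test_idx], features
        )
        correct += sum(p == y[i] for p, i in zip(predicted, test_idx))
    return correct / len(y)


if __name__ == "__main__":
    rng = random.Random(1)
    # Pure-noise data: the labels carry no signal.
    X = [[rng.gauss(0, 1) for _ in range(200)] for _ in range(40)]
    y = [i % 2 for i in range(40)]
    print(cross_validated_accuracy(X, y))
```

Selecting features once on all samples and only then cross-validating would let the held-out samples leak into the model, which is the bias the paragraph above warns against.<br />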
Concerns have been raised that results published by one group of<br />
investigators often cannot be confirmed by others even if the same<br />
data set is used 26 . An inability to confirm results may stem from any<br />
of several reasons: (i) insufficient information is provided about the<br />
methodology that describes which analysis has actually been done;<br />
(ii) data preprocessing (normalization, gene filtering and feature<br />
selection) is too complicated and insufficiently documented to be<br />
reproduced; or (iii) incorrect or biased complex analytical methods 26<br />
are performed. A distinct but related concern is that genomic data may<br />
yield prediction models that, even if reproducible on the discovery<br />
data set, cannot be extrapolated well in independent validation. The<br />
MAQC-II project provided a unique opportunity to address some of<br />
these concerns.<br />
Notably, we did not place restrictions on the model building methods<br />
used by the data analysis teams. Accordingly, they adopted numerous<br />
different modeling approaches (Table 2 and Supplementary Table 4).<br />
For example, feature selection methods varied widely, from statistical<br />
significance tests, to machine learning algorithms, to those more<br />
reliant on differences in expression amplitude, to those employing<br />
knowledge of putative biological mechanisms associated with the<br />
endpoint. Prediction algorithms also varied widely. To make internal<br />
validation performance results comparable across teams for different<br />
models, we recommended that a model’s internal performance be<br />
estimated using ten repeats of fivefold cross-validation, but this<br />
recommendation was not strictly followed by all teams, which also<br />
allowed us to survey internal validation approaches. The diversity of<br />
analysis protocols used by the teams is likely to closely resemble that<br />
of current research going forward, and in this context mimics reality.<br />
In terms of the space of modeling factors explored, MAQC-II is a survey<br />
of current practices rather than a randomized, controlled experiment;<br />
therefore, care should be taken in interpreting the results. For<br />
example, some teams did not analyze all endpoints, causing missing<br />
data (models) that may be confounded with other modeling factors.<br />
Overall, the procedure followed to nominate MAQC-II candidate<br />
models was quite effective in selecting models that performed reasonably<br />
well during validation using independent data sets, although<br />
generally the selected models did not do as well in validation as in<br />
training. The drop in performance associated with the validation<br />
highlights the importance of not relying solely on internal validation<br />
performance, and points to the need to subject every classifier to at<br />
least one external validation. The selection of the 13 candidate models<br />
from many nominated models was achieved through a peer-review<br />
collaborative effort of many experts and could be described as slow,<br />
tedious and sometimes subjective (e.g., a data analysis team could<br />
only contribute one of the 13 candidate models). Even though they<br />
were still subject to over-optimism, the internal and external performance<br />
estimates of the candidate models were more concordant than<br />
those of the overall set of models. Thus the review was productive in<br />
identifying characteristics of reliable models.<br />
An important lesson learned through MAQC-II is that it is almost<br />
impossible to retrospectively retrieve and document decisions that<br />
were made at every step during the feature selection and model development<br />
stage. This lack of complete description of the model building<br />
process is likely to be a common reason for the inability of different<br />
data analysis teams to fully reproduce each other’s results 32 . Therefore,<br />
although meticulously documenting the classifier building procedure<br />
can be cumbersome, we recommend that all genomic publications<br />
include supplementary materials describing the model building and<br />
evaluation process in an electronic format. MAQC-II is making available<br />
six data sets with 13 endpoints that can be used in the future as a<br />
benchmark to verify that software used to implement new approaches<br />
performs as expected. Subjecting new software to benchmarks against<br />
these data sets could reassure potential users that the software is<br />
mature enough to be used for the development of predictive models<br />
in new data sets. It would seem advantageous to develop alternative<br />
ways to help determine whether specific implementations of modeling<br />
approaches and performance evaluation procedures are sound, and to<br />
identify procedures to capture this information in public databases.<br />
The findings of the MAQC-II project suggest that when the same<br />
data sets are provided to a large number of data analysis teams, many<br />
groups can generate similar results even when different model building<br />
approaches are followed. This is concordant with studies 29,33 that<br />
found that given good quality data and an adequate number of informative<br />
features, most classification methods, if properly used, will yield<br />
similar predictive performance. This also confirms reports 6,7,39 on<br />
small data sets by individual groups that have suggested that several<br />
different feature selection methods and prediction algorithms can<br />
yield many models that are distinct, but have statistically similar<br />
performance. Taken together, these results provide perspective on<br />
the large number of publications in the bioinformatics literature that<br />
have examined the various steps of the multivariate prediction model<br />
building process and identified elements that are critical for achieving<br />
reliable results.<br />
An important and previously underappreciated observation from<br />
MAQC-II is that different clinical endpoints represent very different<br />
levels of classification difficulty. For some endpoints the currently<br />
available data are sufficient to generate robust models, whereas for<br />
other endpoints currently available data do not seem to be sufficient<br />
to yield highly predictive models. An analysis done as part of the<br />
MAQC-II project and that focused on the breast cancer data demonstrates<br />
these points in more detail 40 . It is also important to point out<br />
that for some clinically meaningful endpoints studied in the MAQC-II<br />
project, gene expression data did not seem to significantly outperform<br />
models based on clinical covariates alone, highlighting the challenges<br />
in predicting the outcome of patients in a heterogeneous population<br />
and the potential need to combine gene expression data with<br />
clinical covariates (unpublished data).<br />
The accuracy of the clinical sample annotation information may<br />
also contribute to the difficulty of obtaining accurate prediction results<br />
on validation samples. For example, some samples were misclassified<br />
by almost all models (Supplementary Fig. 12). This is true even for some<br />
samples within the positive control endpoints H and L, as shown<br />
in Supplementary Table 8. Clinical information for the neuroblastoma<br />
patients for whom the positive control endpoint L was uniformly<br />
misclassified was rechecked, and the sex of three out of eight cases<br />
(NB412, NB504 and NB522) was found to be incorrectly annotated.<br />
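A screen for samples that nearly every model misclassifies, which may point to annotation errors of this kind, could look like the sketch below. The function names, the 0.9 threshold and the sample identifiers are hypothetical and are not part of the MAQC-II protocol.<br />

```python
from typing import Dict, List


def misclassification_rates(
    true_labels: Dict[str, int], predictions: List[Dict[str, int]]
) -> Dict[str, float]:
    """Fraction of models that misclassify each sample."""
    rates = {}
    for sample, truth in true_labels.items():
        votes = [model[sample] for model in predictions if sample in model]
        if votes:  # skip samples that no model predicted
            rates[sample] = sum(v != truth for v in votes) / len(votes)
    return rates


def flag_for_recheck(rates: Dict[str, float], threshold: float = 0.9) -> List[str]:
    """Samples misclassified by at least `threshold` of the models."""
    return sorted(s for s, r in rates.items() if r >= threshold)


# Hypothetical annotations and predictions from four models.
labels = {"S1": 1, "S2": 0, "S3": 1}
model_predictions = [
    {"S1": 1, "S2": 0, "S3": 0},
    {"S1": 1, "S2": 0, "S3": 0},
    {"S1": 0, "S2": 0, "S3": 0},
    {"S1": 1, "S2": 0, "S3": 0},
]

if __name__ == "__main__":
    rates = misclassification_rates(labels, model_predictions)
    print(flag_for_recheck(rates))
```

Flagged samples are candidates for a manual check of the clinical record and are not proof of an annotation error.<br />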
The companion MAQC-II papers published elsewhere give more<br />
in-depth analyses of specific issues such as the clinical benefits of<br />
genomic classifiers (unpublished data), the impact of different<br />
modeling factors on prediction performance 45 , the objective assessment<br />
of microarray cross-platform prediction 46 , cross-tissue prediction<br />
47 , one-color versus two-color prediction comparison 48 ,<br />
functional analysis of gene signatures 36 and recommendation of a<br />
simple yet robust data analysis protocol based on the KNN 32 . For<br />
example, we systematically compared the classification performance<br />
resulting from one- and two-color gene-expression profiles of<br />
478 neuroblastoma samples and found that analyses based on either<br />
platform yielded similar classification performance 48 . This newly generated<br />
one-color data set has been used to evaluate the applicability of<br />
the KNN-based simple data analysis protocol to future data sets 32 . In<br />
addition, the MAQC-II Genome-Wide Association Working Group<br />
assessed the variabilities in genotype calling due to experimental or<br />
algorithmic factors 49 .<br />
In summary, MAQC-II has demonstrated that current methods<br />
commonly used to develop and assess multivariate gene-expression<br />
based predictors of clinical outcome were used appropriately by<br />
most of the analysis teams in this consortium. However, differences<br />
in proficiency emerged and this underscores the importance<br />
of proper implementation of otherwise robust analytical methods.<br />
Observations based on analysis of the MAQC-II data sets may be<br />
applicable to other diseases. The MAQC-II data sets are publicly<br />
available and are expected to be used by the scientific community<br />
as benchmarks to ensure proper modeling practices. The experience<br />
with the MAQC-II clinical data sets also reinforces the notion that<br />
clinical classification problems represent several different degrees<br />
of prediction difficulty that are likely to be associated with whether<br />
mRNA abundances measured in a specific data set are informative for<br />
the specific prediction problem. We anticipate that including other<br />
types of biological data at the DNA, microRNA, protein or metabolite<br />
levels will enhance our capability to more accurately predict<br />
the clinically relevant endpoints. The good modeling practice guidelines<br />
established by MAQC-II and lessons learned from this unprecedented<br />
collaboration provide a solid foundation from which other<br />
high-dimensional biological data could be more reliably used for the<br />
purpose of predictive and personalized medicine.<br />
Methods<br />
Methods and any associated references are available in the online<br />
version of the paper at http://www.nature.com/naturebiotechnology/.<br />
Accession codes. All MAQC-II data sets are available through<br />
GEO (series accession number: GSE16716), the MAQC Web site<br />
(http://www.fda.gov/nctr/science/centers/toxicoinformatics/maqc/),<br />
ArrayTrack (http://www.fda.gov/nctr/science/centers/toxicoinformatics/ArrayTrack/)<br />
or CEBS (http://cebs.niehs.nih.gov/) accession<br />
number: 009-00002-0010-000-3.<br />
Note: Supplementary information is available on the <strong>Nature</strong> Biotechnology website.<br />
Acknowledgments<br />
The MAQC-II project was funded in part by the FDA’s Office of Critical Path<br />
Programs (to L.S.). Participants from the National Institutes of Health (NIH) were<br />
supported by the Intramural Research Program of NIH, Bethesda, Maryland or<br />
the Intramural Research Program of the NIH, National Institute of Environmental<br />
Health Sciences (NIEHS), Research Triangle Park, North Carolina. J.F. was<br />
supported by the Division of Intramural Research of the NIEHS under contract<br />
HHSN273200700046U. Participants from the Johns Hopkins University were<br />
supported by grants from the NIH (1R01GM083084-01 and 1R01RR021967-01A2<br />
to R.A.I. and T32GM074906 to M.M.). Participants from the Weill Medical College<br />
of Cornell University were partially supported by the Biomedical Informatics<br />
Core of the Institutional Clinical and Translational Science Award RFA-RM-07-<br />
002. F.C. acknowledges resources from The HRH Prince Alwaleed Bin Talal Bin<br />
Abdulaziz Alsaud Institute for Computational Biomedicine and from the David A.<br />
Cofrin Center for Biomedical Information at Weill Cornell. The data set from The<br />
Hamner Institutes for Health Sciences was supported by a grant from the American<br />
Chemistry Council’s Long Range Research Initiative. The breast cancer data set<br />
was generated with support of grants from NIH (R-01 to L.P.), The Breast Cancer<br />
Research Foundation (to L.P. and W.F.S.) and the Faculty Incentive Funds of the<br />
University of Texas MD Anderson Cancer Center (to W.F.S.). The data set from<br />
the University of Arkansas for Medical Sciences was supported by National Cancer<br />
Institute (NCI) PO1 grant CA55819-01A1, NCI R33 Grant CA97513-01, Donna D.<br />
and Donald M. Lambert Lebow Fund to Cure Myeloma and Nancy and Steven<br />
Grand Foundation. We are grateful to the individuals whose gene expression data<br />
were used in this study. All MAQC-II participants freely donated their time and<br />
reagents for the completion and analyses of the MAQC-II project. The MAQC-II<br />
consortium also thanks R. O’Neill for his encouragement and coordination among<br />
FDA Centers on the formation of the RBWG. The MAQC-II consortium gratefully<br />
dedicates this work in memory of R.F. Wagner who enthusiastically worked on the<br />
MAQC-II project and inspired many of us until he unexpectedly passed away in<br />
June 2008.<br />
DISCLAIMER<br />
This work includes contributions from, and was reviewed by, individuals at the<br />
FDA, the Environmental Protection Agency (EPA) and the NIH. This work has<br />
been approved for publication by these agencies, but it does not necessarily reflect<br />
official agency policy. Certain commercial materials and equipment are identified<br />
in order to adequately specify experimental procedures. In no case does such<br />
identification imply recommendation or endorsement by the FDA, the EPA or the<br />
NIH, nor does it imply that the items identified are necessarily the best available<br />
for the purpose.<br />
COMPETING FINANCIAL INTERESTS<br />
The authors declare competing financial interests: details accompany the full-text<br />
HTML version of the paper at http://www.nature.com/naturebiotechnology/.<br />
Published online at http://www.nature.com/naturebiotechnology/.<br />
Reprints and permissions information is available online at<br />
http://npg.nature.com/reprintsandpermissions/.<br />
1. Marshall, E. Getting the noise out of gene arrays. Science 306, 630–631 (2004).<br />
2. Frantz, S. An array of problems. Nat. Rev. Drug Discov. 4, 362–363 (2005).<br />
3. Michiels, S., Koscielny, S. & Hill, C. Prediction of cancer outcome with microarrays:<br />
a multiple random validation strategy. Lancet 365, 488–492 (2005).<br />
4. Ntzani, E.E. & Ioannidis, J.P. Predictive ability of DNA microarrays for cancer<br />
outcomes and correlates: an empirical assessment. Lancet 362, 1439–1444<br />
(2003).<br />
5. Ioannidis, J.P. Microarrays and molecular research: noise discovery? Lancet 365,<br />
454–455 (2005).<br />
6. Ein-Dor, L., Kela, I., Getz, G., Givol, D. & Domany, E. Outcome signature genes in<br />
breast cancer: is there a unique set? Bioinformatics 21, 171–178 (2005).<br />
7. Ein-Dor, L., Zuk, O. & Domany, E. Thousands of samples are needed to generate<br />
a robust gene list for predicting outcome in cancer. Proc. Natl. Acad. Sci. USA<br />
103, 5923–5928 (2006).<br />
8. Shi, L. et al. QA/QC: challenges and pitfalls facing the microarray community and<br />
regulatory agencies. Expert Rev. Mol. Diagn. 4, 761–777 (2004).<br />
9. Shi, L. et al. Cross-platform comparability of microarray technology: intra-platform<br />
consistency and appropriate data analysis procedures are essential. BMC<br />
Bioinformatics 6 Suppl 2, S12 (2005).<br />
10. Shi, L. et al. The MicroArray Quality Control (MAQC) project shows inter- and<br />
intraplatform reproducibility of gene expression measurements. Nat. Biotechnol.<br />
24, 1151–1161 (2006).<br />
11. Guo, L. et al. Rat toxicogenomic study reveals analytical consistency across<br />
microarray platforms. Nat. Biotechnol. 24, 1162–1169 (2006).<br />
12. Canales, R.D. et al. Evaluation of DNA microarray results with quantitative gene<br />
expression platforms. Nat. Biotechnol. 24, 1115–1122 (2006).<br />
13. Patterson, T.A. et al. Performance comparison of one-color and two-color platforms<br />
within the MicroArray Quality Control (MAQC) project. Nat. Biotechnol. 24,<br />
1140–1150 (2006).<br />
14. Shippy, R. et al. Using RNA sample titrations to assess microarray platform<br />
performance and normalization techniques. Nat. Biotechnol. 24, 1123–1131<br />
(2006).<br />
15. Tong, W. et al. Evaluation of external RNA controls for the assessment of microarray<br />
performance. Nat. Biotechnol. 24, 1132–1139 (2006).<br />
16. Irizarry, R.A. et al. Multiple-laboratory comparison of microarray platforms. Nat.<br />
Methods 2, 345–350 (2005).<br />
17. Strauss, E. Arrays of hope. Cell 127, 657–659 (2006).<br />
18. Shi, L., Perkins, R.G., Fang, H. & Tong, W. Reproducible and reliable microarray<br />
results through quality control: good laboratory proficiency and appropriate data<br />
analysis practices are essential. Curr. Opin. Biotechnol. 19, 10–18 (2008).<br />
19. Dudoit, S., Fridlyand, J. & Speed, T.P. Comparison of discrimination methods for<br />
the classification of tumors using gene expression data. J. Am. Stat. Assoc. 97,<br />
77–87 (2002).<br />
20. Goodsaid, F.M. et al. Voluntary exploratory data submissions to the US FDA and<br />
the EMA: experience and impact. Nat. Rev. Drug Discov. 9, 435–445 (2010).<br />
21. van ‘t Veer, L.J. et al. Gene expression profiling predicts clinical outcome of breast<br />
cancer. Nature 415, 530–536 (2002).<br />
22. Buyse, M. et al. Validation and clinical utility of a 70-gene prognostic signature for<br />
women with node-negative breast cancer. J. Natl. Cancer Inst. 98, 1183–1192<br />
(2006).<br />
23. Dumur, C.I. et al. Interlaboratory performance of a microarray-based gene expression<br />
test to determine tissue of origin in poorly differentiated and undifferentiated<br />
cancers. J. Mol. Diagn. 10, 67–77 (2008).<br />
24. Deng, M.C. et al. Noninvasive discrimination of rejection in cardiac allograft recipients<br />
using gene expression profiling. Am. J. Transplant. 6, 150–160 (2006).<br />
25. Coombes, K.R., Wang, J. & Baggerly, K.A. Microarrays: retracing steps. Nat. Med.<br />
13, 1276–1277, author reply 1277–1278 (2007).<br />
26. Ioannidis, J.P.A. et al. Repeatability of published microarray gene expression<br />
analyses. Nat. Genet. 41, 149–155 (2009).<br />
27. Baggerly, K.A., Edmonson, S.R., Morris, J.S. & Coombes, K.R. High-resolution serum<br />
proteomic patterns for ovarian cancer detection. Endocr. Relat. Cancer 11,<br />
583–584, author reply 585–587 (2004).<br />
28. Ambroise, C. & McLachlan, G.J. Selection bias in gene extraction on the basis of<br />
microarray gene-expression data. Proc. Natl. Acad. Sci. USA 99, 6562–6566<br />
(2002).<br />
29. Simon, R. Using DNA microarrays for diagnostic and prognostic prediction. Expert<br />
Rev. Mol. Diagn. 3, 587–595 (2003).<br />
30. Dobbin, K.K. et al. Interlaboratory comparability study of cancer gene expression<br />
analysis using oligonucleotide microarrays. Clin. Cancer Res. 11, 565–572<br />
(2005).<br />
31. Shedden, K. et al. Gene expression-based survival prediction in lung adenocarcinoma:<br />
a multi-site, blinded validation study. Nat. Med. 14, 822–827 (2008).<br />
32. Parry, R.M. et al. K-nearest neighbors (KNN) models for microarray gene-expression<br />
analysis and reliable clinical outcome prediction. Pharmacogenomics J. 10, 292–309<br />
(2010).<br />
33. Dupuy, A. & Simon, R.M. Critical review of published microarray studies for cancer<br />
outcome and guidelines on statistical analysis and reporting. J. Natl. Cancer Inst.<br />
99, 147–157 (2007).<br />
34. Dave, S.S. et al. Prediction of survival in follicular lymphoma based on molecular<br />
features of tumor-infiltrating immune cells. N. Engl. J. Med. 351, 2159–2169<br />
(2004).<br />
35. Tibshirani, R. Immune signatures in follicular lymphoma. N. Engl. J. Med. 352,<br />
1496–1497, author reply 1496–1497 (2005).<br />
36. Shi, W. et al. Functional analysis of multiple genomic signatures demonstrates that<br />
classification algorithms choose phenotype-related genes. Pharmacogenomics J. 10,<br />
310–323 (2010).<br />
37. Robinson, G.K. That BLUP is a good thing: the estimation of random effects.<br />
Stat. Sci. 6, 15–32 (1991).<br />
38. Hothorn, T., Hornik, K. & Zeileis, A. Unbiased recursive partitioning: a conditional<br />
inference framework. J. Comput. Graph. Statist. 15, 651–674 (2006).<br />
39. Boutros, P.C. et al. Prognostic gene signatures for non-small-cell lung cancer. Proc.<br />
Natl. Acad. Sci. USA 106, 2824–2828 (2009).<br />
40. Popovici, V. et al. Effect of training sample size and classification difficulty on the<br />
accuracy of genomic predictors. Breast Cancer Res. 12, R5 (2010).<br />
41. Yousef, W.A., Wagner, R.F. & Loew, M.H. Assessing classifiers from two independent<br />
data sets using ROC analysis: a nonparametric approach. IEEE Trans. Pattern Anal.<br />
Mach. Intell. 28, 1809–1817 (2006).<br />
42. Gur, D., Wagner, R.F. & Chan, H.P. On the repeated use of databases for testing<br />
incremental improvement of computer-aided detection schemes. Acad. Radiol. 11,<br />
103–105 (2004).<br />
43. Allison, D.B., Cui, X., Page, G.P. & Sabripour, M. Microarray data analysis: from<br />
disarray to consolidation and consensus. Nat. Rev. Genet. 7, 55–65 (2006).<br />
44. Wood, I.A., Visscher, P.M. & Mengersen, K.L. Classification based upon gene expression<br />
data: bias and precision of error rates. Bioinformatics 23, 1363–1370 (2007).<br />
45. Luo, J. et al. A comparison of batch effect removal methods for enhancement of<br />
prediction performance using MAQC-II microarray gene expression data.<br />
Pharmacogenomics J. 10, 278–291 (2010).<br />
46. Fan, X. et al. Consistency of predictive signature genes and classifiers generated using<br />
different microarray platforms. Pharmacogenomics J. 10, 247–257 (2010).<br />
47. Huang, J. et al. Genomic indicators in the blood predict drug-induced liver injury.<br />
Pharmacogenomics J. 10, 267–277 (2010).<br />
48. Oberthuer, A. et al. Comparison of performance of one-color and two-color gene-expression<br />
analyses in predicting clinical endpoints of neuroblastoma patients.<br />
Pharmacogenomics J. 10, 258–266 (2010).<br />
49. Hong, H. et al. Assessing sources of inconsistencies in genotypes and their effects<br />
on genome-wide association studies with HapMap samples. Pharmacogenomics J.<br />
10, 364–374 (2010).<br />
Leming Shi 1 , Gregory Campbell 2 , Wendell D Jones 3 , Fabien Campagne 4 , Zhining Wen 1 , Stephen J Walker 5 ,<br />
Zhenqiang Su 6 , Tzu-Ming Chu 7 , Federico M Goodsaid 8 , Lajos Pusztai 9 , John D Shaughnessy Jr 10 ,<br />
André Oberthuer 11 , Russell S Thomas 12 , Richard S Paules 13 , Mark Fielden 14 , Bart Barlogie 10 , Weijie Chen 2 ,<br />
Pan Du 15 , Matthias Fischer 11 , Cesare Furlanello 16 , Brandon D Gallas 2 , Xijin Ge 17 , Dalila B Megherbi 18 ,<br />
W Fraser Symmans 19 , May D Wang 20 , John Zhang 21 , Hans Bitter 22 , Benedikt Brors 23 , Pierre R Bushel 13 ,<br />
Max Bylesjo 24 , Minjun Chen 1 , Jie Cheng 25 , Jing Cheng 26 , Jeff Chou 13 , Timothy S Davison 27 , Mauro Delorenzi 28 ,<br />
Youping Deng 29 , Viswanath Devanarayan 30 , David J Dix 31 , Joaquin Dopazo 32 , Kevin C Dorff 33 , Fathi Elloumi 31 ,<br />
Jianqing Fan 34 , Shicai Fan 35 , Xiaohui Fan 36 , Hong Fang 6 , Nina Gonzaludo 37 , Kenneth R Hess 38 ,<br />
Huixiao Hong 1 , Jun Huan 39 , Rafael A Irizarry 40 , Richard Judson 31 , Dilafruz Juraeva 23 , Samir Lababidi 41 ,<br />
Christophe G Lambert 42 , Li Li 7 , Yanen Li 43 , Zhen Li 31 , Simon M Lin 15 , Guozhen Liu 44 , Edward K Lobenhofer 45 ,<br />
Jun Luo 21 , Wen Luo 46 , Matthew N McCall 40 , Yuri Nikolsky 47 , Gene A Pennello 2 , Roger G Perkins 1 , Reena Philip 2 ,<br />
Vlad Popovici 28 , Nathan D Price 48 , Feng Qian 6 , Andreas Scherer 49 , Tieliu Shi 50 , Weiwei Shi 47 , Jaeyun Sung 48 ,<br />
Danielle Thierry-Mieg 51 , Jean Thierry-Mieg 51 , Venkata Thodima 52 , Johan Trygg 24 , Lakshmi Vishnuvajjala 2 ,<br />
Sue Jane Wang 8 , Jianping Wu 53 , Yichao Wu 54 , Qian Xie 55 , Waleed A Yousef 56 , Liang Zhang 53 , Xuegong Zhang 35 ,<br />
Sheng Zhong 57 , Yiming Zhou 10 , Sheng Zhu 53 , Dhivya Arasappan 6 , Wenjun Bao 7 , Anne Bergstrom Lucas 58 ,<br />
Frank Berthold 11 , Richard J Brennan 47 , Andreas Buness 59 , Jennifer G Catalano 41 , Chang Chang 50 ,<br />
Rong Chen 60 , Yiyu Cheng 36 , Jian Cui 50 , Wendy Czika 7 , Francesca Demichelis 61 , Xutao Deng 62 ,<br />
Damir Dosymbekov 63 , Roland Eils 23 , Yang Feng 34 , Jennifer Fostel 13 , Stephanie Fulmer-Smentek 58 ,<br />
James C Fuscoe 1 , Laurent Gatto 64 , Weigong Ge 1 , Darlene R Goldstein 65 , Li Guo 66 , Donald N Halbert 67 ,<br />
Jing Han 41 , Stephen C Harris 1 , Christos Hatzis 68 , Damir Herman 69 , Jianping Huang 36 , Roderick V Jensen 70 ,<br />
Rui Jiang 35 , Charles D Johnson 71 , Giuseppe Jurman 16 , Yvonne Kahlert 11 , Sadik A Khuder 72 , Matthias Kohl 73 ,<br />
Jianying Li 74 , Li Li 75 , Menglong Li 76 , Quan-Zhen Li 77 , Shao Li 36 , Zhiguang Li 1 , Jie Liu 1 , Ying Liu 35 , Zhichao Liu 1 ,<br />
Lu Meng 35 , Manuel Madera 18 , Francisco Martinez-Murillo 2 , Ignacio Medina 78 , Joseph Meehan 6 , Kelci Miclaus 7 ,<br />
Richard A Moffitt 20 , David Montaner 78 , Piali Mukherjee 33 , George J Mulligan 79 , Padraic Neville 7 ,<br />
Tatiana Nikolskaya 47 , Baitang Ning 1 , Grier P Page 80 , Joel Parker 3 , R Mitchell Parry 20 , Xuejun Peng 81 ,<br />
Ron L Peterson 82 , John H Phan 20 , Brian Quanz 39 , Yi Ren 83 , Samantha Riccadonna 16 , Alan H Roter 84 ,<br />
Frank W Samuelson 2 , Martin M Schumacher 85 , Joseph D Shambaugh 86 , Qiang Shi 1 , Richard Shippy 87 ,<br />
Shengzhu Si 88 , Aaron Smalter 39 , Christos Sotiriou 89 , Mat Soukup 8 , Frank Staedtler 85 , Guido Steiner 90 ,<br />
Todd H Stokes 20 , Qinglan Sun 53 , Pei-Yi Tan 7 , Rong Tang 2 , Zivana Tezak 2 , Brett Thorn 1 , Marina Tsyganova 63 ,<br />
Yaron Turpaz 91 , Silvia C Vega 92 , Roberto Visintainer 16 , Juergen von Frese 93 , Charles Wang 62 , Eric Wang 21 ,<br />
Junwei Wang 50 , Wei Wang 94 , Frank Westermann 23 , James C Willey 95 , Matthew Woods 21 , Shujian Wu 96 ,<br />
Nianqing Xiao 97 , Joshua Xu 6 , Lei Xu 1 , Lun Yang 1 , Xiao Zeng 44 , Jialu Zhang 8 , Li Zhang 8 , Min Zhang 1 ,<br />
Chen Zhao 50 , Raj K Puri 41 , Uwe Scherf 2 , Weida Tong 1 & Russell D Wolfinger 7<br />
1 National Center for Toxicological Research, US Food and Drug Administration, Jefferson, Arkansas, USA. 2 Center for Devices and Radiological Health, US Food and<br />
Drug Administration, Silver Spring, Maryland, USA. 3 Expression Analysis Inc., Durham, North Carolina, USA. 4 Department of Physiology and Biophysics and HRH<br />
Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Medical College of Cornell University, New York, New York, USA.<br />
5 Wake Forest Institute for Regenerative Medicine, Wake Forest University, Winston-Salem, North Carolina, USA. 6 Z-Tech, an ICF International Company at NCTR/FDA,<br />
Jefferson, Arkansas, USA. 7 SAS Institute Inc., Cary, North Carolina, USA. 8 Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring,<br />
Maryland, USA. 9 Breast Medical Oncology Department, University of Texas (UT) M.D. Anderson Cancer Center, Houston, Texas, USA. 10 Myeloma Institute for Research<br />
and Therapy, University of Arkansas for Medical Sciences, Little Rock, Arkansas, USA. 11 Department of Pediatric Oncology and Hematology and Center for Molecular<br />
Medicine (CMMC), University of Cologne, Cologne, Germany. 12 The Hamner Institutes for Health Sciences, Research Triangle Park, North Carolina, USA. 13 National<br />
Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, North Carolina, USA. 14 Roche Palo Alto LLC, South San Francisco,<br />
California, USA. 15 Biomedical Informatics Center, Northwestern University, Chicago, Illinois, USA. 16 Fondazione Bruno Kessler, Povo-Trento, Italy. 17 Department of<br />
Mathematics & Statistics, South Dakota State University, Brookings, South Dakota, USA. 18 CMINDS Research Center, Department of Electrical and Computer<br />
Engineering, University of Massachusetts Lowell, Lowell, Massachusetts, USA. 19 Department of Pathology, UT M.D. Anderson Cancer Center, Houston, Texas, USA.<br />
20 Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, Georgia, USA. 21 Systems Analytics Inc., Waltham,<br />
Massachusetts, USA. 22 Hoffmann-LaRoche, Nutley, New Jersey, USA. 23 Department of Theoretical Bioinformatics, German Cancer Research Center (DKFZ),<br />
Heidelberg, Germany. 24 Computational Life Science Cluster (CLiC), Chemical Biology Center (KBC), Umeå University, Umeå, Sweden. 25 GlaxoSmithKline, Collegeville,<br />
Pennsylvania, USA. 26 Medical Systems Biology Research Center, School of Medicine, Tsinghua University, Beijing, China. 27 Almac Diagnostics Ltd., Craigavon, UK.<br />
28 Swiss Institute of Bioinformatics, Lausanne, Switzerland. 29 Department of Biological Sciences, University of Southern Mississippi, Hattiesburg, Mississippi, USA.<br />
30 Global Pharmaceutical R&D, Abbott Laboratories, Souderton, Pennsylvania, USA. 31 National Center for Computational Toxicology, US Environmental Protection<br />
Agency, Research Triangle Park, North Carolina, USA. 32 Department of Bioinformatics and Genomics, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain.<br />
33 HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Medical College of Cornell University, New York, New York,<br />
USA. 34 Department of Operation Research and Financial Engineering, Princeton University, Princeton, New Jersey, USA. 35 MOE Key Laboratory of Bioinformatics<br />
and Bioinformatics Division, TNLIST / Department of Automation, Tsinghua University, Beijing, China. 36 Institute of Pharmaceutical Informatics, College of<br />
Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, China. 37 Roche Palo Alto LLC, Palo Alto, California, USA. 38 Department of Biostatistics,<br />
UT M.D. Anderson Cancer Center, Houston, Texas, USA. 39 Department of Electrical Engineering & Computer Science, University of Kansas, Lawrence, Kansas, USA.<br />
40 Department of Biostatistics, Johns Hopkins University, Baltimore, Maryland, USA. 41 Center for Biologics Evaluation and Research, US Food and Drug<br />
Administration, Bethesda, Maryland, USA. 42 Golden Helix Inc., Bozeman, Montana, USA. 43 Department of Computer Science, University of Illinois at Urbana-<br />
Champaign, Urbana, Illinois, USA. 44 SABiosciences Corp., a Qiagen Company, Frederick, Maryland, USA. 45 Cogenics, a Division of Clinical Data Inc., Morrisville,<br />
North Carolina, USA. 46 Ligand Pharmaceuticals Inc., La Jolla, California, USA. 47 GeneGo Inc., Encinitas, California, USA. 48 Department of Chemical and<br />
Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA. 49 Spheromics, Kontiolahti, Finland. 50 The Center for Bioinformatics and<br />
The Institute of Biomedical Sciences, School of Life Science, East China Normal University, Shanghai, China. 51 National Center for Biotechnology Information,<br />
National Institutes of Health, Bethesda, Maryland, USA. 52 Rockefeller Research Laboratories, Memorial Sloan-Kettering Cancer Center, New York, New York, USA.<br />
53 CapitalBio Corporation, Beijing, China. 54 Department of Statistics, North Carolina State University, Raleigh, North Carolina, USA. 55 SRA International (EMMES),<br />
Rockville, Maryland, USA. 56 Helwan University, Helwan, Egypt. 57 Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA.<br />
58 Agilent Technologies Inc., Santa Clara, California, USA. 59 F. Hoffmann-La Roche Ltd., Basel, Switzerland. 60 Stanford Center for Biomedical Informatics Research,<br />
Stanford University, Stanford, California, USA. 61 Department of Pathology and Laboratory Medicine and HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute<br />
for Computational Biomedicine, Weill Medical College of Cornell University, New York, New York, USA. 62 Cedars-Sinai Medical Center, UCLA David Geffen School of<br />
Medicine, Los Angeles, California, USA. 63 Vavilov Institute for General Genetics, Russian Academy of Sciences, Moscow, Russia. 64 DNAVision SA, Gosselies, Belgium.<br />
65 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland. 66 State Key Laboratory of Multi-phase Complex Systems, Institute of Process<br />
Engineering, Chinese Academy of Sciences, Beijing, China. 67 Abbott Laboratories, Abbott Park, Illinois, USA. 68 Nuvera Biosciences Inc., Woburn, Massachusetts,<br />
USA. 69 Winthrop P. Rockefeller Cancer Institute, University of Arkansas for Medical Sciences, Little Rock, Arkansas, USA. 70 Virginia Tech, Blacksburg, Virginia, USA.<br />
71 BioMath Solutions, LLC, Austin, Texas, USA. 72 Bioinformatic Program, University of Toledo, Toledo, Ohio, USA. 73 Department of Mathematics, University of<br />
Bayreuth, Bayreuth, Germany. 74 Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, North Carolina, USA. 75 Pediatric Department,<br />
Stanford University, Stanford, California, USA. 76 College of Chemistry, Sichuan University, Chengdu, Sichuan, China. 77 University of Texas Southwestern Medical<br />
Center (UTSW), Dallas, Texas, USA. 78 Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain. 79 Millennium Pharmaceuticals Inc., Cambridge,<br />
Massachusetts, USA. 80 RTI International, Atlanta, Georgia, USA. 81 Takeda Global R & D Center, Inc., Deerfield, Illinois, USA. 82 Novartis Institutes of Biomedical<br />
Research, Cambridge, Massachusetts, USA. 83 W.M. Keck Center for Collaborative Neuroscience, Rutgers, The State University of New Jersey, Piscataway, New Jersey,<br />
USA. 84 Entelos Inc., Foster City, California, USA. 85 Biomarker Development, Novartis Institutes of BioMedical Research, Novartis Pharma AG, Basel, Switzerland.<br />
86 Genedata Inc., Lexington, Massachusetts, USA. 87 Affymetrix Inc., Santa Clara, California, USA. 88 Department of Chemistry and Chemical Engineering, Hefei<br />
Teachers College, Hefei, Anhui, China. 89 Institut Jules Bordet, Brussels, Belgium. 90 Biostatistics, F. Hoffmann-La Roche Ltd., Basel, Switzerland. 91 Lilly Singapore<br />
Centre for Drug Discovery, Immunos, Singapore. 92 Microsoft Corporation, US Health Solutions Group, Redmond, Washington, USA. 93 Data Analysis Solutions DA-SOL<br />
GmbH, Greifenberg, Germany. 94 Cornell University, Ithaca, New York, USA. 95 Division of Pulmonary and Critical Care Medicine, Department of Medicine, University of<br />
Toledo Health Sciences Campus, Toledo, Ohio, USA. 96 Bristol-Myers Squibb, Pennington, New Jersey, USA. 97 OpGen Inc., Gaithersburg, Maryland, USA.<br />
ONLINE METHODS<br />
MAQC-II participants. MAQC-II participants can be grouped into several<br />
categories. Data providers are the participants who provided data sets to the<br />
consortium. The MAQC-II Regulatory Biostatistics Working Group, whose<br />
members included a number of biostatisticians, provided guidance and standard<br />
operating procedures for model development and performance estimation. One<br />
or more data analysis teams were formed at each organization. Each data analysis<br />
team actively analyzed the data sets and produced prediction models. Other participants<br />
also contributed to discussion and execution of the project. The 36 data<br />
analysis teams listed in Supplementary Table 3 developed data analysis protocols<br />
and predictive models for one or more of the 13 endpoints. The teams included<br />
more than 100 scientists and engineers with diverse backgrounds in machine<br />
learning, statistics, biology, medicine and chemistry, among others. They volunteered<br />
tremendous time and effort to conduct the data analysis tasks.<br />
Six data sets including 13 prediction endpoints. To increase the chance<br />
that MAQC-II would reach generalized conclusions, consortium members<br />
strongly believed that they needed to study several data sets, each of high<br />
quality and sufficient size, which would collectively represent a diverse set of<br />
prediction tasks. Accordingly, significant early effort went toward the selection<br />
of appropriate data sets. Over ten nominated data sets were reviewed<br />
for quality of sample collection and processing consistency, and quality of<br />
microarray and clinical data. Six data sets with 13 endpoints were ultimately<br />
selected among those nominated during a face-to-face project meeting with<br />
extensive deliberations among many participants (Table 1). Importantly, three<br />
preclinical (toxicogenomics) and three clinical data sets were selected to test<br />
whether baseline practice conclusions could be generalized across these rather<br />
disparate experimental types. An important criterion for data set selection<br />
was the anticipated support of MAQC-II by the data provider and the commitment<br />
to continue experimentation to provide a large external validation<br />
test set of comparable size to the training set. The three toxicogenomics data<br />
sets would allow the development of predictive models that predict toxicity<br />
of compounds in animal models, a prediction task of interest to the pharmaceutical<br />
industry, which could use such models to speed up the evaluation of<br />
toxicity for new drug candidates. The three clinical data sets were for endpoints<br />
associated with three diseases, breast cancer (BR), multiple myeloma (MM)<br />
and neuroblastoma (NB). Each clinical data set had more than one endpoint,<br />
and together incorporated several types of clinical applications, including<br />
treatment outcome and disease prognosis. The MAQC-II predictive modeling<br />
was limited to binary classification problems; therefore, continuous endpoint<br />
values such as overall survival (OS) and event-free survival (EFS) times were<br />
dichotomized using a ‘milestone’ cutoff applied to the censored data. Prediction endpoints<br />
were chosen to span a wide range of prediction difficulty. Two endpoints,<br />
H (CPS1) and L (NEP_S), representing the sex of the patients, were used as<br />
positive control endpoints, as they are easily predictable by microarrays. Two<br />
other endpoints, I (CPR1) and M (NEP_R), representing randomly assigned<br />
class labels, were designed to serve as negative control endpoints, as they<br />
are not supposed to be predictable. Data analysis teams were not aware of<br />
the characteristics of endpoints H, I, L and M until their swap prediction<br />
results had been submitted. If a data analysis protocol fails to yield models that<br />
accurately predict endpoints H and L, or if it claims to yield models that<br />
accurately predict endpoints I and M, something<br />
must have gone wrong.<br />
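The control-endpoint logic described above can be sketched as a simple sanity check. This is a minimal illustration, not the consortium's procedure: the choice of the Matthews correlation coefficient (MCC) as the score, the function names `mcc` and `check_controls`, and the thresholds `pos_min` and `neg_max` are all assumptions made for this example.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def check_controls(scores, pos=("H", "L"), neg=("I", "M"),
                   pos_min=0.8, neg_max=0.2):
    """Return the control endpoints whose scores look suspicious.

    scores maps endpoint code -> validation score (e.g., MCC).
    pos_min and neg_max are hypothetical thresholds for illustration.
    """
    flagged = []
    for endpoint in pos:
        # Positive controls (sex) should be easy to predict.
        if endpoint in scores and scores[endpoint] < pos_min:
            flagged.append(endpoint)
    for endpoint in neg:
        # Negative controls (random labels) should not be predictable.
        if endpoint in scores and scores[endpoint] > neg_max:
            flagged.append(endpoint)
    return flagged
```

A protocol for which `check_controls` returns a non-empty list would warrant inspection before its results on the other endpoints are trusted.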
The Hamner data set (endpoint A) was provided by The Hamner Institutes<br />
for Health Sciences. The study objective was to apply microarray gene expression<br />
data from the lung of female B6C3F1 mice exposed to a 13-week treatment<br />
of chemicals to predict increased lung tumor incidence in the 2-year<br />
rodent cancer bioassays of the National Toxicology Program 50 . If successful,<br />
the results may form the basis of a more efficient and economical approach<br />
for evaluating the carcinogenic activity of chemicals. Microarray analysis was<br />
performed using Affymetrix Mouse Genome 430 2.0 arrays on three to four<br />
mice per treatment group, and a total of 70 mice were analyzed and used as<br />
MAQC-II’s training set. Additional data from another set of 88 mice were<br />
collected later and provided as MAQC-II’s external validation set.<br />
The Iconix data set (endpoint B) was provided by Iconix Biosciences.<br />
The study objective was to assess, upon short-term exposure, hepatic tumor<br />
induction by nongenotoxic chemicals 51 , as there are currently no accurate and<br />
well-validated short-term tests to identify nongenotoxic hepatic tumorigens,<br />
thus necessitating an expensive 2-year rodent bioassay before a risk assessment<br />
can begin. The training set consists of hepatic gene expression data from 216<br />
male Sprague-Dawley rats treated for 5 d with one of 76 structurally and mechanistically<br />
diverse nongenotoxic hepatocarcinogens and nonhepatocarcinogens.<br />
The validation set consists of 201 male Sprague-Dawley rats treated for 5 d with<br />
one of 68 structurally and mechanistically diverse nongenotoxic hepatocarcinogens<br />
and nonhepatocarcinogens. Gene expression data were generated using the<br />
Amersham Codelink Uniset Rat 1 Bioarray (GE HealthCare) 52 . The separation<br />
of the training set and validation set was based on the time when the microarray<br />
data were collected; that is, microarrays processed earlier in the study<br />
were used as training and those processed later were used as validation.<br />
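A time-based separation of this kind can be sketched as follows. The record layout (sample identifier paired with a processing date), the function name `split_by_processing_date` and the cutoff date are hypothetical; the source states only that earlier-processed arrays formed the training set and later ones the validation set.

```python
from datetime import date


def split_by_processing_date(samples, cutoff):
    """Split samples chronologically.

    samples: iterable of (sample_id, processing_date) pairs.
    cutoff:  arrays processed before this date go to training,
             those processed on or after it go to validation.
    """
    training, validation = [], []
    for sample_id, processed_on in samples:
        if processed_on < cutoff:
            training.append(sample_id)
        else:
            validation.append(sample_id)
    return training, validation


# Hypothetical example records, for illustration only.
example = [
    ("array_01", date(2004, 3, 1)),
    ("array_02", date(2005, 7, 15)),
    ("array_03", date(2004, 11, 30)),
]
train_ids, valid_ids = split_by_processing_date(example, date(2005, 1, 1))
```

Splitting by time rather than at random means that batch and drift effects between early and late processing are reflected in the validation performance.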
The NIEHS data set (endpoint C) was provided by the National Institute<br />
of Environmental Health Sciences (NIEHS) of the US National Institutes<br />
of Health. The study objective was to use microarray gene expression data<br />
acquired from the liver of rats exposed to hepatotoxicants to build classifiers<br />
for prediction of liver necrosis. The gene expression ‘compendium’ data set<br />
was collected from 418 rats exposed to one of eight compounds (1,2-dichlorobenzene,<br />
1,4-dichlorobenzene, bromobenzene, monocrotaline, N-nitrosomorpholine,<br />
thioacetamide, galactosamine and diquat dibromide). All eight<br />
compounds were studied using standardized procedures, that is, a common<br />
array platform (Affymetrix Rat 230 2.0 microarray), experimental procedures<br />
and data retrieval and analysis processes. For details of the experimental<br />
design see ref. 53. Briefly, for each compound, four to six male, 12-week-old<br />
F344 rats were exposed to a low dose, mid dose(s) and a high dose of the toxicant<br />
and sacrificed 6, 24 and 48 h later. At necropsy, liver was harvested for<br />
RNA extraction, histopathology and clinical chemistry assessments.<br />
Animal use in the studies was approved by the respective Institutional<br />
Animal Use and Care Committees of the data providers and was conducted<br />
in accordance with the National Institutes of Health (NIH) guidelines<br />
for the care and use of laboratory animals. Animals were housed in fully<br />
accredited American Association for Accreditation of Laboratory Animal<br />
Care facilities.<br />
The human breast cancer (BR) data set (endpoints D and E) was contributed<br />
by the University of Texas M.D. Anderson Cancer Center. Gene expression data<br />
from 230 stage I–III breast cancers were generated from fine needle aspiration<br />
specimens of newly diagnosed breast cancers before any therapy. The biopsy<br />
specimens were collected sequentially during a prospective pharmacogenomic<br />
marker discovery study between 2000 and 2008. These specimens represent<br />
70–90% pure neoplastic cells with minimal stromal contamination 54 . Patients<br />
received 6 months of preoperative (neoadjuvant) chemotherapy including<br />
paclitaxel (Taxol), 5-fluorouracil, cyclophosphamide and doxorubicin<br />
(Adriamycin) followed by surgical resection of the cancer. Response to preoperative<br />
chemotherapy was categorized as a pathological complete response<br />
(pCR = no residual invasive cancer in the breast or lymph nodes) or residual<br />
invasive cancer (RD), and used as endpoint D for prediction. Endpoint E is the<br />
clinical estrogen-receptor status as established by immunohistochemistry 55 .<br />
RNA extraction and gene expression profiling were performed in multiple<br />
batches over time using Affymetrix U133A microarrays. Genomic analysis of<br />
a subset of this sequentially accrued patient population was reported previously<br />
56 . For each endpoint, the first 130 cases were used as a training set and<br />
the next 100 cases were used as an independent validation set.<br />
The multiple myeloma (MM) data set (endpoints F, G, H and I) was contributed<br />
by the Myeloma Institute for Research and Therapy at the University<br />
of Arkansas for Medical Sciences. Gene expression profiling of highly purified<br />
bone marrow plasma cells was performed in newly diagnosed patients with<br />
MM 57–59 . The training set consisted of 340 cases enrolled in total therapy 2<br />
(TT2) and the validation set comprised 214 patients enrolled in total therapy 3<br />
(TT3) 59 . Plasma cells were enriched by anti-CD138 immunomagnetic bead<br />
selection of mononuclear cell fractions of bone marrow aspirates in a central<br />
laboratory. All samples applied to the microarray contained >85% plasma<br />
cells as determined by two-color flow cytometry (CD38 + and CD45 − /dim)<br />
performed after selection. Dichotomized overall survival (OS) and event-free<br />
survival (EFS) were determined based on a 2-year milestone cutoff. A gene<br />
expression model of high-risk multiple myeloma was developed and validated<br />
by the data provider 58 and later on validated in three additional independent<br />
data sets 60–62 .<br />
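As a minimal sketch of the 2-year milestone dichotomization described above, the following<br />
Python function maps a follow-up time and an event indicator to a binary class label. The<br />
function name, the use of months as the time unit and the handling of patients censored<br />
before the milestone (returned as None) are assumptions made for illustration; the text<br />
does not state how such cases were treated.<br />

```python
from typing import Optional

MILESTONE_MONTHS = 24.0  # 2-year milestone cutoff stated in the text


def dichotomize(time_months: float, event: bool) -> Optional[int]:
    """Hypothetical helper: binary label at the 2-year milestone.

    Returns 1 if the event (death for OS, any event for EFS) occurred on or
    before the milestone, 0 if the patient reached the milestone without the
    event, and None if follow-up ended earlier without an event (assumption:
    such cases are left unlabeled).
    """
    if event and time_months <= MILESTONE_MONTHS:
        return 1
    if time_months >= MILESTONE_MONTHS:
        return 0
    return None


if __name__ == "__main__":
    # Illustrative inputs only, not patient data.
    for t, e in [(10.0, True), (30.0, True), (30.0, False), (10.0, False)]:
        print(t, e, dichotomize(t, e))
```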
doi:10.1038/nbt.1665<br />
The neuroblastoma (NB) data set (endpoints J, K, L and M) was contributed<br />
by the Children’s Hospital of the University of Cologne, Germany. Tumor<br />
samples were checked by a pathologist before RNA isolation; only samples<br />
with ≥60% tumor content were used and total RNA was isolated from ~50 mg<br />
of snap-frozen neuroblastoma tissue obtained before chemotherapeutic<br />
treatment. First, 502 preexisting 11 K Agilent dye-flipped, dual-color replicate<br />
profiles for 251 patients were provided 63 . Of these, profiles of 246 neuroblastoma<br />
samples passed an independent MAQC-II quality assessment by majority<br />
decision and formed the MAQC-II training data set. Subsequently, 514 dye-flipped<br />
dual-color 11 K replicate profiles for 256 independent neuroblastoma<br />
tumor samples were generated and profiles for 253 samples were selected to<br />
form the MAQC-II validation set. Of note, for one patient of the validation<br />
set, two different tumor samples were analyzed using both versions of the<br />
2 × 11K microarray (see below). All dual-color gene-expression profiles of the MAQC-II<br />
training set were generated using a customized 2 × 11K neuroblastoma-related<br />
microarray 63 . Furthermore, 20 patients of the MAQC-II validation set were<br />
also profiled using this microarray. Dual-color profiles of the remaining<br />
patients of the MAQC-II validation set were generated using a slightly revised<br />
version of the 2 × 11K microarray. This version V2.0 of the array comprised<br />
200 novel oligonucleotide probes whereas 100 oligonucleotide probes of the<br />
original design were removed due to consistent low expression values (near<br />
background) observed in the training set profiles. These minor modifications<br />
of the microarray design resulted in a total of 9,986 probes present on both<br />
versions of the 2 × 11K microarray. The experimental protocol did not differ<br />
between the two sets, and gene-expression profiling was performed as described 63 .<br />
Furthermore, single-color gene-expression profiles were generated for 478/499<br />
neuroblastoma samples of the MAQC-II dual-color training and validation sets<br />
(training set 244/246; validation set 234/253). For the remaining 21 samples<br />
no single-color data were available, due to either shortage of tumor material<br />
of these patients (n = 15), poor experimental quality of the generated single-color<br />
profiles (n = 5), or correlation of one single-color profile to two different<br />
dual-color profiles for the one patient profiled with both versions of the 2 ×<br />
11K microarrays (n = 1). Single-color gene-expression profiles were generated<br />
using customized 4 × 44K oligonucleotide microarrays produced by Agilent<br />
Technologies. These 4 × 44K microarrays included all probes represented by<br />
Agilent’s Whole Human Genome Oligo Microarray and all probes of the version<br />
V2.0 of the 2 × 11K customized microarray that were not present in the<br />
former probe set. Labeling and hybridization were performed following the<br />
manufacturer’s protocol as described 48 .<br />
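The neuroblastoma sample counts reported above can be cross-checked with simple arithmetic.<br />
The short Python sketch below uses only numbers given in the text; the variable names are<br />
illustrative.<br />

```python
# Dual-color profiles selected for the MAQC-II training and validation sets.
training_dual, validation_dual = 246, 253
# Samples for which single-color profiles were also available.
training_single, validation_single = 244, 234
# Reasons given in the text for missing single-color data.
missing_reasons = {
    "shortage of tumor material": 15,
    "poor quality of single-color profile": 5,
    "one profile matching two dual-color profiles": 1,
}

total_dual = training_dual + validation_dual
total_single = training_single + validation_single
missing = total_dual - total_single

# The three stated reasons should account for every missing sample.
assert missing == sum(missing_reasons.values())
print(f"{total_single}/{total_dual} samples with single-color data, {missing} missing")
```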
Sample annotation information along with clinical co-variates of the patient<br />
cohorts is available at the MAQC web site (http://edkb.fda.gov/MAQC/). The<br />
institutional review boards of the respective providers of the clinical microarray<br />
data sets had approved the research studies, and all subjects had provided<br />
written informed consent to both treatment protocols and sample procurement,<br />
in accordance with the Declaration of Helsinki.<br />
MAQC-II effort and data analysis procedure. This section provides details<br />
about some of the analysis steps presented in Figure 1. In steps 2–4, a first<br />
round of analysis was conducted in which each data analysis team analyzed<br />
MAQC-II data sets to generate predictive models and associated performance<br />
estimates. After this first round of analysis, most participants attended<br />
a consortium meeting where approaches were presented and discussed. The<br />
meeting helped members decide on a common performance evaluation protocol,<br />
which most data analysis teams agreed to follow to render performance<br />
statistics comparable across the consortium. It should be noted that some data<br />
analysis teams decided not to follow the recommendations for performance<br />
evaluation protocol and used instead an approach of their choosing, resulting<br />
in various internal validation approaches in the final results. Data analysis<br />
teams were given 2 months to implement the revised analysis protocol (the<br />
group recommended using fivefold stratified cross-validation with ten repeats<br />
across all endpoints for the internal validation strategy) and submit their final<br />
models. The amount of metadata to collect for characterizing the modeling<br />
approach used to derive each model was also discussed at the meeting.<br />
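The recommended internal validation scheme (fivefold stratified cross-validation with ten<br />
repeats) can be sketched as follows. This is an illustrative pure-Python implementation, not<br />
the consortium's code; the function name, the random seed and the round-robin assignment of<br />
shuffled samples to folds are assumptions.<br />

```python
import random
from collections import defaultdict


def repeated_stratified_kfold(labels, n_splits=5, n_repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated stratified CV.

    Within each repeat, the indices of every class are shuffled and dealt
    round-robin into n_splits folds, so each fold keeps roughly the same
    class proportions as the full training set.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    for repeat in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        for indices in by_class.values():
            shuffled = indices[:]
            rng.shuffle(shuffled)
            for pos, i in enumerate(shuffled):
                folds[pos % n_splits].append(i)
        for k in range(n_splits):
            test_idx = sorted(folds[k])
            train_idx = sorted(
                i for j, fold in enumerate(folds) if j != k for i in fold
            )
            yield repeat, k, train_idx, test_idx


if __name__ == "__main__":
    # Hypothetical class labels for a 130-sample training set.
    example_labels = [0] * 80 + [1] * 50
    n = sum(1 for _ in repeated_stratified_kfold(example_labels))
    print(n, "train/test splits")
```

Each of the 50 train/test splits holds out roughly one fifth of every class; a model would be<br />
fitted on the training indices and scored on the held-out indices in each split.<br />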
For each endpoint, each team was also required to select one of its<br />
submitted models as its nominated model. No specific guideline was given<br />
and groups could select nominated models according to any objective or<br />
subjective criteria. Because the consortium lacked an agreed-upon reference<br />
performance measure (Supplementary Fig. 13), it was not clear how the<br />
nominated models would be evaluated, and data analysis teams ranked models<br />
by different measures or combinations of measures. Data analysis teams were<br />
encouraged to report a common set of performance measures for each model<br />
so that models could be reranked consistently a posteriori. Models trained<br />
with the training set were frozen (step 6). MAQC-II selected for each endpoint<br />
one model from the up to 36 nominations as the MAQC-II candidate<br />
for validation (step 6).<br />
External validation sets lacking class labels for all endpoints were distributed<br />
to the data analysis teams. Each data analysis team used its previously<br />
frozen models to make class predictions on the validation data set (step 7).<br />
The sample-by-sample prediction results were submitted to MAQC-II by<br />
each data analysis team (step 8). Results were used to calculate the external<br />
validation performance metrics for each model. Calculations were carried<br />
out by three independent groups not involved in developing models, which<br />
were provided with validation class labels. Data analysis teams that still had<br />
no access to the validation class labels were given an opportunity to correct<br />
apparent clerical mistakes in prediction submissions (e.g., inversion of class<br />
labels). Class labels were then distributed to enable data analysis teams to<br />
check prediction performance metrics and perform in-depth analysis of results.<br />
A table of performance metrics was assembled from information collected in<br />
steps 5 and 8 (step 10, Supplementary Table 1).<br />
To check the consistency of modeling approaches, the original validation and<br />
training sets were swapped and steps 4–10 were repeated (step 11). Briefly, each<br />
team used the validation class labels and the validation data sets as a training<br />
set. Prediction models and evaluation performance were collected by internal<br />
and external validation (considering the original training set as a validation<br />
set). Data analysis teams were asked to apply the same data analysis protocols<br />
that they used for the original ‘Blind’ Training → Validation analysis. Swap<br />
analysis results are provided in Supplementary Table 2. It should be noted<br />
that during the swap experiment, the data analysis teams inevitably already<br />
had access to the class label information for samples in the swap validation set,<br />
that is, the original training set.<br />
Model summary information tables. To enable a systematic comparison of<br />
models for each endpoint, a table of information was constructed containing<br />
a row for each model from each data analysis team, with columns containing<br />
three categories of information: (i) modeling factors that describe the model<br />
development process; (ii) performance metrics from internal validation; and<br />
(iii) performance metrics from external validation (Fig. 1; step 10).<br />
Each data analysis team was requested to report several modeling factors for<br />
each model they generated. These modeling factors are organization code, data<br />
set code, endpoint code, summary or normalization method, feature selection<br />
method, number of features used in final model, classification algorithm,<br />
internal validation protocol, validation iterations (number of repeats of cross-validation<br />
or bootstrap sampling) and batch-effect-removal method. A set of<br />
valid entries for each modeling factor was distributed to all data analysis teams<br />
in advance of model submission, to help consolidate a common vocabulary<br />
that would support analysis of the completed information table. It should be<br />
noted that since modeling factors are self-reported, two models that share a<br />
given modeling factor may still differ in their implementation of the modeling<br />
approach described by the modeling factor.<br />
The seven performance metrics for internal validation and external validation<br />
are MCC (Matthews Correlation Coefficient), accuracy, sensitivity, specificity,<br />
AUC (area under the receiver operating characteristic curve), binary<br />
AUC (that is, mean of sensitivity and specificity) and r.m.s.e. For internal<br />
validation, s.d. for each performance metric is also included in the table.<br />
Missing entries indicate that the data analysis team has not submitted the<br />
requested information.<br />
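Several of the metrics listed above follow directly from the confusion matrix of a binary<br />
classifier. The sketch below computes MCC, accuracy, sensitivity, specificity and the binary<br />
AUC (mean of sensitivity and specificity) from 0/1 labels. AUC and r.m.s.e. are left out<br />
because this sketch does not model continuous prediction scores. The function name and the<br />
convention of returning 0 when the MCC denominator is zero are assumptions.<br />

```python
import math


def binary_metrics(y_true, y_pred):
    """Hypothetical helper: confusion-matrix metrics for 0/1 class labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    n = tp + tn + fp + fn
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumption: report MCC as 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "mcc": mcc,
        "accuracy": (tp + tn) / n if n else float("nan"),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "binary_auc": (sensitivity + specificity) / 2,
    }


if __name__ == "__main__":
    # Illustrative labels only.
    truth = [1, 1, 1, 0, 0, 0, 0, 1]
    preds = [1, 1, 0, 0, 0, 1, 0, 1]
    print(binary_metrics(truth, preds))
```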
In addition, the lists of features used in the data analysis team’s nominated<br />
models are recorded as part of the model submission for functional analysis<br />
and reproducibility assessment of the feature lists (see the MAQC Web site at<br />
http://edkb.fda.gov/MAQC/).<br />
Selection of nominated models by each data analysis team and selection<br />
of MAQC-II candidate and backup models by RBWG and the steering<br />
committee. In addition to providing results to generate the model information<br />
table, each team nominated a single model for each endpoint as its preferred<br />
model for validation, resulting in a total of 323 nominated models, 318 of<br />
which were applied to the prediction of the validation sets. These nominated<br />
models were peer reviewed, debated and ranked for each endpoint by the<br />
RBWG before validation set predictions. The rankings were given to the<br />
MAQC-II steering committee, and those members not directly involved in<br />
developing models selected a single model for each endpoint, forming the 13<br />
MAQC-II candidate models. If there was sufficient evidence through documentation<br />
to establish that the data analysis team had followed the guidelines<br />
of good classifier principles for model development outlined in the standard<br />
operating procedure (Supplementary Data), then their nominated models<br />
were considered as potential candidate models. The nomination and selection<br />
of candidate models occurred before the validation data were released.<br />
Selection of one candidate model for each endpoint across MAQC-II was<br />
performed to reduce multiple selection concerns. This selection process proved<br />
interesting and time consuming, but worthwhile, as participants had<br />
different viewpoints and criteria in ranking the data analysis protocols and<br />
selecting the candidate model for an endpoint. One additional criterion was<br />
to select the 13 candidate models such that no data analysis team contributed<br />
more than one of them, to ensure that a variety<br />
of approaches to model development was considered. For each endpoint, a<br />
backup model was also selected under the same selection process and criteria<br />
as for the candidate models. The 13 candidate models selected by MAQC-II<br />
indeed performed well in the validation prediction (Figs. 2c and 3).<br />
50. Thomas, R.S., Pluta, L., Yang, L. & Halsey, T.A. Application of genomic biomarkers<br />
to predict increased lung tumor incidence in 2-year rodent cancer bioassays. Toxicol.<br />
Sci. 97, 55–64 (2007).<br />
51. Fielden, M.R., Brennan, R. & Gollub, J. A gene expression biomarker provides early<br />
prediction and mechanistic assessment of hepatic tumor induction by nongenotoxic<br />
chemicals. Toxicol. Sci. 99, 90–100 (2007).<br />
52. Ganter, B. et al. Development of a large-scale chemogenomics database to improve<br />
drug candidate selection and to understand mechanisms of chemical toxicity and<br />
action. J. Biotechnol. 119, 219–244 (2005).<br />
53. Lobenhofer, E.K. et al. Gene expression response in target organ and whole blood<br />
varies as a function of target organ injury phenotype. Genome Biol. 9, R100<br />
(2008).<br />
54. Symmans, W.F. et al. Total RNA yield and microarray gene expression profiles from<br />
fine-needle aspiration biopsy and core-needle biopsy samples of breast carcinoma.<br />
Cancer 97, 2960–2971 (2003).<br />
55. Gong, Y. et al. Determination of oestrogen-receptor status and ERBB2 status of<br />
breast carcinoma: a gene-expression profiling study. Lancet Oncol. 8, 203–211<br />
(2007).<br />
56. Hess, K.R. et al. Pharmacogenomic predictor of sensitivity to preoperative<br />
chemotherapy with paclitaxel and fluorouracil, doxorubicin, and cyclophosphamide<br />
in breast cancer. J. Clin. Oncol. 24, 4236–4244 (2006).<br />
57. Zhan, F. et al. The molecular classification of multiple myeloma. Blood 108,<br />
2020–2028 (2006).<br />
58. Shaughnessy, J.D. Jr. et al. A validated gene expression model of high-risk multiple<br />
myeloma is defined by deregulated expression of genes mapping to chromosome 1.<br />
Blood 109, 2276–2284 (2007).<br />
59. Barlogie, B. et al. Thalidomide and hematopoietic-cell transplantation for multiple<br />
myeloma. N. Engl. J. Med. 354, 1021–1030 (2006).<br />
60. Zhan, F., Barlogie, B., Mulligan, G., Shaughnessy, J.D. Jr. & Bryant, B. High-risk<br />
myeloma: a gene expression based risk-stratification model for newly diagnosed<br />
multiple myeloma treated with high-dose therapy is predictive of outcome in<br />
relapsed disease treated with single-agent bortezomib or high-dose dexamethasone.<br />
Blood 111, 968–969 (2008).<br />
61. Chng, W.J., Kuehl, W.M., Bergsagel, P.L. & Fonseca, R. Translocation t(4;14) retains<br />
prognostic significance even in the setting of high-risk molecular signature.<br />
Leukemia 22, 459–461 (2008).<br />
62. Decaux, O. et al. Prediction of survival in multiple myeloma based on gene<br />
expression profiles reveals cell cycle and chromosomal instability signatures in<br />
high-risk patients and hyperdiploid signatures in low-risk patients: a study of the<br />
Intergroupe Francophone du Myelome. J. Clin. Oncol. 26, 4798–4805 (2008).<br />
63. Oberthuer, A. et al. Customized oligonucleotide microarray gene expression-based<br />
classification of neuroblastoma patients outperforms current clinical risk<br />
stratification. J. Clin. Oncol. 24, 5070–5078 (2006).<br />
articles<br />
Human hematopoietic stem/progenitor cells modified<br />
by zinc-finger nucleases targeted to CCR5 control<br />
HIV-1 in vivo<br />
Nathalia Holt 1 , Jianbin Wang 2 , Kenneth Kim 2 , Geoffrey Friedman 2 , Xingchao Wang 3 , Vanessa Taupin 3 ,<br />
Gay M Crooks 4 , Donald B Kohn 4 , Philip D Gregory 2 , Michael C Holmes 2 & Paula M Cannon 1<br />
CCR5 is the major HIV-1 co-receptor, and individuals homozygous for a 32-bp deletion in CCR5 are resistant to infection by<br />
CCR5-tropic HIV-1. Using engineered zinc-finger nucleases (ZFNs), we disrupted CCR5 in human CD34 + hematopoietic stem/<br />
progenitor cells (HSPCs) at a mean frequency of 17% of the total alleles in a population. This procedure produces both mono- and<br />
bi-allelically disrupted cells. ZFN-treated HSPCs retained the ability to engraft NOD/SCID/IL2rγ null mice and gave rise to polyclonal<br />
multi-lineage progeny in which CCR5 was permanently disrupted. Control mice receiving untreated HSPCs and challenged with<br />
CCR5-tropic HIV-1 showed profound CD4 + T-cell loss. In contrast, mice transplanted with ZFN-modified HSPCs underwent<br />
rapid selection for CCR5 −/− cells, had significantly lower HIV-1 levels and preserved human cells throughout their tissues. The<br />
demonstration that a minority of CCR5 −/− HSPCs can populate an infected animal with HIV-1-resistant, CCR5 −/− progeny supports<br />
the use of ZFN-modified autologous hematopoietic stem cells as a clinical approach to treating HIV-1.<br />
The entry of HIV-1 into target cells involves sequential binding of<br />
the viral gp120 Env protein to the CD4 receptor and a chemokine<br />
co-receptor 1 . CCR5 is the major co-receptor used by HIV-1 and is<br />
expressed on key T-cell subsets that are depleted during HIV-1 infection,<br />
including memory T cells 2 . A genetic 32-bp deletion in CCR5<br />
(CCR5Δ32) is relatively common in Western European populations<br />
and confers resistance to HIV-1 infection and AIDS in homozygotes<br />
3,4 . The absence of any other significant phenotype associated<br />
with a lack of CCR5 (refs. 5–7) has spurred the development of<br />
therapies aimed at blocking the virus–CCR5 interaction, and CCR5<br />
antagonists have proved to be an effective salvage therapy in patients<br />
with drug-resistant strains of HIV-1 (ref. 8).<br />
Recently, the ability of CCR5 −/− mobilized CD34 + peripheral blood<br />
cells to generate HIV-resistant progeny that suppress HIV-1 replication in<br />
vivo was demonstrated in an HIV-infected patient undergoing transplantation<br />
from a homozygous CCR5Δ32 donor during treatment for acute<br />
myeloid leukemia 9 . The donor cells conferred long-term control of HIV-1<br />
replication and restored the patient’s CD4 + T-cell levels in the absence of<br />
antiretroviral drug therapy. These clinical data support the potential of<br />
gene or stem cell therapies based on the elimination of CCR5. However,<br />
the risks associated with allogeneic transplantation and the impracticality<br />
of obtaining sufficient numbers of matched CCR5Δ32 donors 10<br />
mean that broader application of this approach will require methods for<br />
generating autologous CCR5 −/− cells. Various gene therapy approaches<br />
to block CCR5 expression are being evaluated, including CCR5-specific<br />
ribozymes 11,12 , siRNAs 13 and intrabodies 14 . The targeted cell populations<br />
include both mature T cells and CD34 + HSPCs. Loss of CCR5 in HSPCs<br />
appears to have no adverse effects on hematopoiesis 12,13,15 .<br />
An alternative approach is the use of engineered ZFNs to permanently<br />
disrupt the CCR5 open reading frame. ZFNs comprise a series<br />
of linked zinc fingers engineered to bind specific DNA sequences<br />
and fused to an endonuclease domain 16 . Concerted binding of two<br />
juxtaposed ZFNs on DNA, followed by dimerization of the endonuclease<br />
domains, generates a double-stranded break at the DNA<br />
target. Such double-stranded breaks are rapidly repaired by cellular<br />
repair pathways, notably the mutagenic nonhomologous end-joining<br />
pathway, which leads to frequent disruption of the gene due to the<br />
addition or deletion of nucleotides at the break site 17,18 . A significant<br />
advantage of this approach is that permanent gene disruption can<br />
result from only transient ZFN expression.<br />
CD4 + T cells modified by CCR5-targeted ZFNs 19 are currently being<br />
evaluated in a clinical trial. However, disruption of CCR5 in HSPCs<br />
is likely to provide a more durable anti-viral effect and to give rise to<br />
CCR5 −/− cells in both the lymphoid and myeloid compartments that<br />
HIV-1 infects. To evaluate this approach, we optimized the delivery of<br />
CCR5-specific ZFNs to human CD34 + HSPCs and transplanted the<br />
modified cells into nonobese diabetic/severe combined immunodeficient/interleukin<br />
2rγ null (NOD/SCID/IL2rγ null ; NSG) mice, which support<br />
both human hematopoiesis 20 and HIV-1 infection 13 . Infection of<br />
the mice with a CCR5-tropic strain of HIV-1 led to rapid selection for<br />
CCR5 – human cells, a significant reduction in viral load and protection<br />
of human T-cell populations in the key tissues that HIV-1 infects. These<br />
findings suggest that ZFN engineering of autologous HSPCs may enable<br />
long-term control of HIV-1 in infected individuals.<br />
1 Keck School of Medicine of the University of Southern California, Los Angeles, California, USA. 2 Sangamo BioSciences, Inc., Richmond, California, USA. 3 Childrens<br />
Hospital Los Angeles, Los Angeles, California, USA. 4 David Geffen School of Medicine at the University of California Los Angeles, Los Angeles, California, USA.<br />
Correspondence should be addressed to P.M.C. (pcannon@usc.edu).<br />
Received 20 October 2009; accepted 24 June 2010; published online 2 July 2010; corrected online 22 July 2010; doi:10.1038/nbt.1663<br />
Figure 1 ZFN-mediated disruption of CCR5 in<br />
CD34 + HSPCs. (a) Representative gel showing<br />
extent of CCR5 disruption in CD34 + HSPCs<br />
24 h after nucleofection with ZFN-expressing<br />
plasmids (ZFN) or mock nucleofected (mock).<br />
Neg. is untreated CD34 + HSPCs. CCR5<br />
disruption was measured by PCR amplification<br />
across the ZFN target site, followed by Cel<br />
1 nuclease digestion and quantification of<br />
products by PAGE. (b) Graph showing mean<br />
± s.d. percentage of human CD45 + cells in<br />
peripheral blood of mice at 8 weeks after<br />
transplantation with either untreated, mock<br />
nucleofected or ZFN nucleofected CD34 +<br />
HSPCs (n = 5 each group). (c) FACS profiles<br />
of human cells from various organs of one<br />
representative mouse into which ZFN-treated<br />
CD34 + HSPCs were transplanted. Cells were<br />
gated on FSC/SSC (forward scatter/side<br />
scatter) to remove debris. Staining for human<br />
CD45, a pan leukocyte marker, was used to<br />
reveal the level of engraftment with human<br />
cells in each organ. CD45 + -gated populations<br />
were further analyzed for subsets, as indicated:<br />
CD19 (B cells) in bone marrow, CD14<br />
(monocytes/macrophages) in lung, CD4 and<br />
CD8 (T cells) in thymus and spleen and CD3 (T<br />
cells) in the small intestine (lamina propria).<br />
The CD45 + population from the small intestine<br />
was further analyzed for CD4 and CCR5<br />
expression. Peripheral blood cells from CD45 +<br />
and lymphoid gates were analyzed for CD4<br />
and CD8 expression. The percentage of cells<br />
in each indicated area is shown. No staining<br />
was observed with isotype-matched control<br />
antibodies (Supplementary Fig. 1) or in animals<br />
receiving no human graft (data not shown).<br />
Figure 1a gel lane values (CD34 + cells): Neg., 0%; Mock, 0%; ZFN, 16%.<br />
RESULTS<br />
Efficient disruption of CCR5 in human CD34 + HSPCs<br />
Gene delivery methods suitable to express ZFNs include plasmid<br />
DNA nucleofection 16 , integrase-defective lentiviral vectors 21 and<br />
adenoviral vectors 19 . Although nonviral methods are attractive,<br />
nucleofection can be associated with relatively high toxicity for<br />
human CD34 + HSPCs and loss of engraftment potential 22 , although,<br />
more recently, less toxic outcomes have been described 23–25 . We<br />
evaluated different parameters to identify nucleofection conditions<br />
that allowed efficient disruption of CCR5 while limiting toxicity. The<br />
extent of CCR5 disruption was quantified using PCR amplification<br />
across the CCR5 locus, denaturation and reannealing of products,<br />
and digestion with the Cel 1 nuclease, which preferentially cleaves<br />
DNA at distorted duplexes caused by mismatches. The Cel 1 nuclease<br />
assay detects a linear range of CCR5 disruption between 0.69% and<br />
44% of the total alleles in a population, with an upper limit of sensitivity<br />
of 70–80% disruption (ref. 19 and data not shown). We used<br />
this assay to monitor CCR5 disruption as only a minority of human<br />
CD34 + cells expresses CCR5 (ref. 26), making it difficult to measure<br />
CCR5 expression by flow cytometry.<br />
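To illustrate how a mismatch-cleavage readout relates to allele disruption, the sketch below<br />
converts the fraction of cleaved PCR product into an estimated percentage of modified alleles.<br />
The conversion is a commonly used estimate for this type of assay and is an assumption here;<br />
the text does not state the formula the authors applied. It supposes that, after denaturation<br />
and random reannealing, only wild-type/wild-type duplexes escape cleavage, so the uncleaved<br />
fraction equals (1 − m)², where m is the fraction of modified alleles. The function name is<br />
hypothetical.<br />

```python
import math


def percent_modified_alleles(fraction_cleaved: float) -> float:
    """Estimate % modified alleles from the cleaved fraction of PCR product.

    Assumption: uncleaved fraction = (1 - m)**2 for modified-allele fraction m,
    hence m = 1 - sqrt(1 - fraction_cleaved).
    """
    if not 0.0 <= fraction_cleaved <= 1.0:
        raise ValueError("fraction_cleaved must lie between 0 and 1")
    return 100.0 * (1.0 - math.sqrt(1.0 - fraction_cleaved))


if __name__ == "__main__":
    # Illustrative input: about 31% cleaved product corresponds to about 17% of alleles.
    print(round(percent_modified_alleles(0.3111), 1))
```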
Using CD34 + HSPCs harvested from umbilical cord blood and optimized<br />
nucleofection conditions, we achieved mean disruption rates of<br />
17% ± 10 (n = 21) of the total CCR5 alleles in the population (Fig. 1a).<br />
Similar results were also achieved using CD34 + HSPCs isolated from<br />
human fetal liver (data not shown). Previous studies in human cell<br />
lines 16 and primary human T cells 19 have shown that the percentage<br />
of bi-allelically modified cells in a ZFN-treated population is 30–40%<br />
of the total number of disrupted alleles detected by the Cel 1 assay. We<br />
therefore estimated that 5–7% of ZFN-treated cells would be CCR5 −/− ,<br />
although this was not directly measured.<br />
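The 5–7% estimate follows from multiplying the mean allele disruption rate by the reported<br />
30–40% range. A minimal Python sketch of that arithmetic, using only the figures quoted above<br />
(the function and variable names are illustrative):<br />

```python
def biallelic_estimate(allele_disruption_pct, fraction_low=0.30, fraction_high=0.40):
    """Return (low, high) estimated % of bi-allelically disrupted cells."""
    return allele_disruption_pct * fraction_low, allele_disruption_pct * fraction_high


if __name__ == "__main__":
    # Mean disruption of 17% of total CCR5 alleles, as reported in the text.
    low, high = biallelic_estimate(17.0)
    print(f"{low:.1f}% to {high:.1f}%")
```

For a mean disruption of 17%, this gives roughly 5.1% to 6.8%, consistent with the stated<br />
estimate of 5–7%.<br />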
We evaluated toxicity by measuring induction of apoptosis. Although<br />
nucleofection increased toxicity to human CD34 + cells threefold compared<br />
to untreated cells, inclusion of the ZFN plasmids had no additional<br />
effect compared to mock nucleofected controls (data not shown).<br />
Overall, we consider that any adverse effects of nucleofection on cell<br />
viability may be offset by the high levels of CCR5 disruption achieved<br />
as well as the speed and simplicity of the procedure compared to viral<br />
vector systems 19,21 .<br />
ZFN-modified CD34 + HSPCs are capable of multi-lineage<br />
engraftment in NSG mice<br />
NSG mice can be engrafted with human CD34 + HSPCs 20 and thereby<br />
provide a rigorous readout of the hematopoietic potential of genetically<br />
modified HSPCs. We evaluated the effects of nucleofection and/<br />
or CCR5 disruption by transplanting both untreated and ZFN-treated<br />
human CD34 + HSPCs into 1-d-old mice that had received low-dose<br />
(150 cGy) radiation. Engraftment of human cells was efficient and rapid,<br />
typically resulting in 40% human CD45 + leukocytes<br />
in the peripheral blood at 8 weeks after<br />
transplantation. The animals showed no obvious<br />
toxicity or ill health of the kind reported for higher<br />
radiation doses 27 . ZFN-treated cells engrafted<br />
NSG mice as efficiently as untreated control<br />
cells (Fig. 1b), with no statistically significant<br />
difference between the two groups (Student’s<br />
t-test, P = 0.26).<br />
Eight to 12 weeks after transplantation, we<br />
analyzed engraftment of various mouse tissues with human CD45 +<br />
leukocytes and with cells from specific hematopoietic lineages (Fig.<br />
1c). Human cells were detected using human-specific antibodies, and<br />
specificity was confirmed using both unengrafted animals and isotype-matched<br />
antibody controls (Supplementary Fig. 1). High levels of<br />
human cells were found in both the peripheral blood and tissues, ranging<br />
from 5–15% in the intestine to >50% in blood, spleen and bone marrow,<br />
and >90% in the thymus (Supplementary Table 1). CD4 + and CD8 +<br />
T cells were present in multiple organs, including the thymus, spleen,<br />
and both the intraepithelial and lamina propria regions of the small and<br />
large intestines; B-cell progenitors were present in the bone marrow; and<br />
CD14 + macrophages and/or monocytes were detected in the lung. Of<br />
particular interest was the large population of human CD4 + CCR5 + cells<br />
in the intestines, as these cells are targeted by both HIV-1 in humans 28–31<br />
and SIV in primates 32–34 . Overall, the profile of human cells in mice<br />
receiving ZFN-treated CD34 + HSPCs was indistinguishable from that<br />
of mice transplanted with unmodified cells, both with respect to the<br />
percentage of human cells in each tissue and the frequencies of different<br />
subsets (Supplementary Table 1), suggesting that ZFN-modified CD34 +<br />
HSPCs are functionally normal.<br />
ZFN-treated CD34 + HSPCs produce CCR5-disrupted progeny<br />
after secondary transplantation<br />
To evaluate whether ZFN treatment of the bulk CD34 + population<br />
modified true SCID-repopulating stem cells, we harvested<br />
bone marrow from an animal 18 weeks after engraftment<br />
with ZFN-treated CD34 + HSPCs, in which the extent of CCR5<br />
disruption in the bone marrow was 11% (Table 1). This marrow<br />
was transplanted into three 8-week-old recipients.<br />
At the same time, bone marrow from a control animal engrafted with<br />
untreated CD34 + HSPCs was transplanted into three additional animals.<br />
Analysis of the peripheral blood of the secondary recipients 8<br />
weeks later revealed that all six animals had engrafted and that there<br />
was no significant difference in the percentage of human CD45 + leukocytes<br />
between the ZFN-treated and control groups. Furthermore,<br />
human cells in the blood of the ZFN cohort had levels of CCR5 disruption<br />
(12–20%) that slightly exceeded the level in the original donor marrow<br />
(Table 1). These data demonstrate that ZFN activity<br />
can lead to permanent disruption of CCR5 in SCID-repopulating<br />
stem cells and that such modified cells retain their engraftment and<br />
differentiation potential.<br />
Table 1 Secondary transplantation of ZFN-treated HSPCs<br />
Donor animals a | CD45 b blood (%) | Cel 1 c BM (%) | Secondary recipients | CD45 b blood (%) | Cel 1 c blood (%)<br />
ZFN (1) | 41 | 11 | ZFN (3) | 34 ± 5 | 16 ± 4<br />
Neg. (1) | 47 | 0 | Neg. (3) | 37 ± 7 | 0 ± 0<br />
a Bone marrow (BM) was harvested from donor mice engrafted with ZFN-treated HSPCs (ZFN) or untreated HSPCs (Neg.) and<br />
transplanted into three secondary recipients for each BM. b Levels of human CD45 + cells were measured in blood of both donor<br />
and recipient mice at 8 weeks post-transplantation. c CCR5 disruption rates, measured by Cel 1 analysis of donor BM at time of<br />
harvest and in blood of recipient mice at 10 weeks post-transplantation.<br />
Protection of CD4 + T cells in peripheral blood of NSG mice after<br />
HIV-1 infection<br />
Engrafted animals at 8–12 weeks after transplantation that had received<br />
either unmodified or ZFN-treated CD34 + HSPCs were challenged with<br />
the CCR5-tropic virus HIV-1 BAL . This strain of HIV-1 causes a robust<br />
infection and significant CD4 + T-cell depletion in humanized mouse<br />
models 35,36 , mimicking the human infection, in which depletion of<br />
CD4 + CCR5 + lymphocytes results from a combination of direct infection,<br />
systemic immune activation 36 and the upregulation of CCR5 on thymic<br />
precursors 37,38 . After infection, blood samples were collected from the<br />
mice every 2 weeks and analyzed for HIV-1 RNA levels, T-cell subsets and<br />
the extent of CCR5 disruption. At 8–12 weeks after infection, animals were<br />
euthanized and multiple tissues analyzed (Supplementary Fig. 2).<br />
Changes in the ratio of CD4 + to CD8 + T cells in the peripheral blood<br />
are characteristic of progressive infection in individuals with AIDS 39,40 .<br />
We therefore examined the CD4/CD8 ratio in blood samples from individual<br />
mice both before and after infection and found that the mean<br />
ratio before infection was similar for both the untreated and ZFN-treated<br />
groups. After HIV-1 challenge, the ratios became highly skewed in the<br />
control group owing to the pronounced loss of CD4 + cells, whereas the<br />
ZFN-treated animals maintained normal ratios (Fig. 2a,b).<br />
[Figure 2 panels: (a) FACS plots of CD4 versus CD8 in blood (CD45, lymphoid gate) for Uninf. (3), Neg. (3) and ZFN (9) cohorts; (b) CD4 + /CD8 + ratio pre-infection and post-infection for Neg. and ZFN, with P = 0.8892 and P = 0.0001.]<br />
Figure 2 Protection of human CD4 + T cells in peripheral blood of HIV-infected mice previously engrafted with ZFN-modified CD34 + HSPCs. (a) FACS plots<br />
showing human CD4 + and CD8 + T cells in peripheral blood of representative animals from each of three cohorts: uninfected mice previously engrafted with<br />
either untreated or ZFN-treated CD34 + HSPCs (Uninf.), and HIV-1 infected animals previously engrafted with either untreated (Neg.) or ZFN-treated (ZFN)<br />
CD34 + HSPCs, at 4 weeks post-infection. The total number of animals analyzed in each cohort is indicated. Cells were gated on FSC/SSC to remove debris,<br />
on human CD45, and a lymphoid gate applied. Percentage of cells in indicated compartments is shown. (b) Ratio of human CD4 + to CD8 + lymphocytes in<br />
peripheral blood of individual mice into which untreated (Neg.) or ZFN-modified CD34 + HSPCs were transplanted, measured pre-infection and at 6–8 weeks<br />
post-infection. Statistical analysis comparing Neg. and ZFN cohorts at each time point is shown.<br />
Protection of human cells in mouse tissues after HIV-1 infection<br />
We next analyzed the human cells present in various mouse tissues 12<br />
weeks after infection with HIV-1 BAL . NSG mice into which unmodified<br />
cells were transplanted displayed a characteristic loss of certain<br />
human cell populations, whereas the ZFN-treated cohort retained<br />
normal human cell profiles throughout their tissues despite HIV-1<br />
challenge (Fig. 3a). In the intestines and spleen, which are the organs<br />
harboring the highest percentage of human CD4 + CCR5 + cells in<br />
this model (Supplementary Fig. 3), we observed specific depletion<br />
of CD4 + T cells from the spleen and the complete loss of all human<br />
lymphocytes from the intestines of untreated animals, whereas these<br />
populations were fully preserved in the ZFN-treated cohort (Fig. 3b).<br />
In the bone marrow, which is not a major target organ of HIV-1 infection,<br />
levels of human CD45 + cells were similar in all three groups.<br />
Notably, HIV-1 BAL infection resulted in the loss of virtually all<br />
human cells from the thymus of mice receiving untreated CD34 +<br />
HSPCs by 12 weeks after infection (Fig. 3a). Depletion of thymocytes<br />
has been proposed to occur as a consequence of the upregulation<br />
of CCR5 on these cells during HIV-1 infection 37,38 , and likely<br />
contributed both to the observed depletion in the thymus and to the<br />
reduction in the numbers of mature CD4 + and CD8 + T cells observed<br />
in other tissues.<br />
[Figure 3 panels: (a) FACS plots for bone marrow, spleen, thymus and small intestine from Uninf. (3), Neg. (3) and ZFN (9) cohorts; axes include CD45, SSC, CD4, CD8 and CD3. (b) Immunohistochemistry of small intestine (anti-CD3) and spleen (anti-CD4) for No graft (2), Neg. (2) and ZFN (2) uninfected animals and HIV-1 infected Neg. (3) and ZFN (9) animals.]<br />
Figure 3 Effects of HIV-1 infection on human cells in HSPC-engrafted NSG mice. (a) FACS<br />
analysis of human cells in tissues of representative NSG mice from three cohorts: uninfected<br />
mice previously engrafted with either untreated or ZFN-treated CD34 + HSPCs (Uninf.), and<br />
HIV-1 infected animals previously engrafted with either untreated (Neg.) or ZFN-treated (ZFN)<br />
CD34 + HSPCs. Mice were necropsied at 12 weeks post-infection or at the equivalent time point<br />
for uninfected animals. The total number of animals analyzed in each cohort is indicated. FACS<br />
analysis was performed as described in Figure 1. Small intestine sample is lamina propria, and<br />
similar results were obtained when samples from the large intestine were analyzed. Percentage<br />
of cells in indicated compartments is shown. (b) Immunohistochemical analysis of human CD3<br />
expression in small intestine, and CD4 expression in spleen of representative NSG mice, into<br />
which untreated (Neg.) or ZFN-treated (ZFN) CD34 + HSPCs were transplanted, with and without<br />
HIV-1 infection. Animals were necropsied at 12 weeks after infection or at the same time point<br />
for uninfected animals. Control animals receiving no human CD34 + HSPCs (no graft) were also<br />
analyzed. The number of animals analyzed in each cohort is shown. Scale bars, 50 µm.<br />
HIV-1 infection rapidly selects for<br />
CCR5 – T cells<br />
We examined whether the survival of T cells in<br />
the mice receiving ZFN-treated CD34 + HSPCs<br />
was the result of selection for ZFN-modified<br />
progeny. We measured the percentage of disrupted<br />
CCR5 alleles in the blood of mice at<br />
sequential time points after HIV-1 challenge,<br />
using both the Cel 1 assay and a specific PCR<br />
amplification that detects a common 5-bp<br />
duplication at the ZFN target site that typically<br />
accounts for 10–30% of total modifications 19 .<br />
Both assays revealed a rapid increase in the frequency<br />
of ZFN-disrupted alleles, reaching the<br />
upper limit of the Cel 1 assay by 4 weeks after<br />
infection (Fig. 4a).<br />
We also examined levels of CCR5 disruption<br />
in multiple tissues from ZFN-treated animals,<br />
either uninfected or 12 weeks after HIV-1 BAL<br />
challenge, and observed a sharp increase<br />
in CCR5 disruption after HIV-1 infection<br />
(Fig. 4b). FACS analysis of the spleen and intestine<br />
revealed that, in contrast to uninfected animals,<br />
in which ~25% of CD4 + cells were also<br />
CCR5 + , very little or no CCR5 expression was<br />
detected in the CD4 + T cells that persisted in<br />
the ZFN-treated animals (Fig. 4c,d). Together,<br />
these data suggest that the protection of CD4 +<br />
lymphocytes in ZFN-treated mice was a consequence<br />
of selection for CCR5 – , HIV-1-resistant<br />
cells derived from ZFN-edited cells.<br />
Heterogeneity of CCR5 modifications<br />
suggests polyclonal origins<br />
ZFN-induced double-stranded breaks<br />
repaired by nonhomologous end-joining<br />
result in highly heterogeneous changes at<br />
the targeted locus 19 . We used this property<br />
to investigate whether the CCR5 – cells that<br />
developed in mice that received ZFN-treated<br />
CD34 + HSPCs were polyclonal in origin.<br />
Sequencing of 60 individual CCR5 alleles<br />
amplified from the large intestine of an<br />
HIV-1-infected mouse into which ZFN-treated<br />
CD34 + HSPCs were previously transplanted<br />
revealed that 59 alleles harbored mutations<br />
at the ZFN target site (Fig. 5). As previously<br />
reported for this ZFN pair 19 , a high proportion (13 out of 59) of the<br />
mutated loci contained a characteristic 5-bp duplication, with the<br />
remaining 46 clones bearing 36 unique sequences. In contrast, all<br />
alleles sequenced from a mouse receiving untreated CD34 + HSPCs<br />
contained the wild-type sequence (data not shown). The high degree<br />
of sequence diversity observed strongly suggests that multiple stem<br />
or progenitor cells were modified by the ZFNs. These findings also<br />
predict that the overwhelming majority of cells selected by HIV-1 BAL<br />
infection would be CCR5 −/− , which is in agreement with the data<br />
from flow cytometry analysis (Fig. 4c).<br />
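The dash notation used for the sequenced alleles in Figure 5 can be tallied mechanically. The following is a minimal sketch, not the authors' analysis: it assumes reads are already aligned to the wild-type sequence and that deleted bases appear as dashes, and the helper names (`with_deletion`, `deleted_bases`) are hypothetical. It recreates three rows of Figure 5 with their reported clone counts and checks what share of the 59 mutated alleles carried the 5-bp duplication.<br />

```python
from collections import Counter

# Wild-type CCR5 sequence around the ZFN target site, copied from Figure 5.
WILD_TYPE = "gttttgtgggcaacatgctggtcatcctcatcctgataaactgcaaaaggctgaagagcatgactgaca"


def with_deletion(start: int, length: int) -> str:
    """Return an aligned read with `length` bases from 0-based `start` shown as dashes.

    Hypothetical helper: it only recreates the dash notation of the figure.
    """
    return WILD_TYPE[:start] + "-" * length + WILD_TYPE[start + length:]


def deleted_bases(aligned_read: str) -> int:
    """Count deleted bases, which the figure marks with '-' characters."""
    return aligned_read.count("-")


# Three rows of Figure 5, keyed by aligned read, with the clone counts
# printed beside them in the figure.
clones = Counter({
    with_deletion(31, 1): 1,   # the -1 allele, found once
    with_deletion(32, 5): 3,   # the -5 allele, found 3 times
    with_deletion(28, 14): 5,  # the -14 allele, found 5 times
})

for read, n_clones in clones.items():
    print(f"-{deleted_bases(read)} bp, {n_clones} clone(s): {read}")

# Share of mutated alleles carrying the 5-bp duplication (13 of 59 in the text).
duplication_share = 13 / 59
print(f"5-bp duplication: {duplication_share:.0%} of mutated alleles")
```

The computed share, about 22%, falls within the 10–30% range that the text gives for this duplication.<br />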
Presence of ZFN-modified cells controls HIV-1 replication in vivo<br />
Quantitative PCR analysis of HIV-1 RNA levels in the peripheral<br />
blood of animals revealed that peak viremia occurred at 6 weeks after<br />
infection for animals that received transplants of either untreated or<br />
ZFN-treated CD34 + HSPCs (Fig. 6a), although the levels were significantly<br />
lower (P = 0.03) in the ZFN cohort. By 8 weeks after infection,<br />
viral loads in both cohorts were dropping but there continued<br />
to be a statistically significant difference between the two groups (P<br />
= 0.001). Measurements of p24 levels in the blood by enzyme-linked<br />
immunosorbent assay (ELISA) corroborated these findings, with a<br />
significant difference (P = 0.02) in antigenemia between the two<br />
groups observed by the 6-week time point (data not shown).<br />
[Figure 4 axis and panel labels: CCR5 disruption in peripheral blood (%); CCR5 disruption (%); Small intestine; Number cells per 5,000 human CD45 + cells; CCR5.]<br />
Figure 4 HIV-1 infection selects for disrupted CCR5 alleles. (a) Mean ± s.d. levels of CCR5 disruption (Cel 1 assay, black bars) in sequential<br />
peripheral blood samples taken from mice into which ZFN-treated CD34 + HSPCs were transplanted and which were subsequently infected with HIV-1.<br />
Upper limit of linearity of Cel 1 assay is 44% (ref. 19) and is indicated by the dotted line; upper limit of sensitivity of assay is 70–80%.<br />
White bars show the frequency of a common 5-bp duplication at the ZFN target site that typically comprises 10–30% of total CCR5 mutations 19 .<br />
Numbers of mice analyzed at each time point, and in each assay, are shown above the appropriate bar. (b) Mean ± s.d. levels of CCR5 disruption<br />
(Cel 1 assay) in indicated tissues from mice into which ZFN-treated CD34 + HSPCs were transplanted; mice were necropsied at 12 weeks after<br />
infection (black bars) or at an equivalent time point for uninfected ZFN-treated animals (white bars). Numbers analyzed in each group are shown<br />
above the appropriate bar. One representative Cel 1 analysis from the large intestine (lamina propria) of uninfected and infected mice is shown.<br />
Animals receiving untreated cells gave no Cel 1 digestion products at any time point analyzed (data not shown). Asterisk indicates levels too<br />
low to quantify. (c) Contour FACS analyses of human CD4 + cells in the small intestine (lamina propria) and spleen of one representative animal<br />
from each indicated cohort are shown. Cells were gated on FSC/SSC to remove debris and gated on human CD45 and CD4. Numbers indicate the<br />
percentage of cells that are CCR5 + . (d) Mean ± s.d. numbers of human CD4 + cells (gray bars) and CD4 + CCR5 + cells (white bars) per 5,000<br />
human CD45 + cells analyzed from different sections of the intestine and from the indicated cohorts. Asterisk indicates levels too low to<br />
quantify. Number of animals analyzed in each cohort is indicated. Abbr. S, small intestine; L, large intestine; E, intraepithelial lymphocytes;<br />
P, lamina propria lymphocytes; BM, bone marrow.<br />
These differences between the two cohorts are more striking when<br />
the levels of human CD4 + T cells are also considered (Fig. 6a), as the<br />
loss of CD4 + T cells in the untreated mice probably contributed to the<br />
lowering of overall viral levels seen as the infection progressed. The<br />
continued presence of virus in the blood, despite acute loss of CD4 +<br />
cells, also occurs during progression to AIDS, where high viral load<br />
measurements in serum are typically observed when T-cell death is<br />
rapidly occurring 41 . In contrast, CD4 + T-cell levels in the ZFN-treated<br />
mice rebounded after the 2-week nadir and recovered to normal levels<br />
by 4 weeks after infection. In contrast to these findings with<br />
HIV-1 BAL , ZFN-treated mice challenged with a CXCR4-tropic HIV-1 strain<br />
did not control viral levels or preserve CD4 + T cells, confirming that<br />
the mechanism is CCR5 specific (Supplementary Fig. 4).<br />
We also measured HIV-1 levels in intestinal samples. In tissues<br />
harvested at 8 and 9 weeks after infection, viral levels in the ZFN-treated<br />
mice were 4 orders of magnitude lower than in the untreated<br />
controls. By the 10- and 12-week time points, HIV-1 RNA was undetectable<br />
in the ZFN-treated mice (Fig. 6b). This drop in viral load<br />
occurred despite the maintenance of normal numbers of human<br />
T lymphocytes in the intestines and other tissues (Fig. 3). These<br />
observations are consistent with a strong selective pressure for HIV-resistant<br />
CCR5 −/− cells to replace CCR5-expressing cells, leading to<br />
control of viral replication.<br />
[Figure 4 labels, continued: legend, Total disruptions and 5 bp duplication; x axis, Weeks post-infection; tissues, Thymus, Lung, Spleen, BM, SE, SP, LE, LP; cohorts, Uninf. (3), Neg. (3), ZFN (9); representative large intestine Cel 1 products (%), Uninf. 8, HIV-1 56.]<br />
Figure 5 sequence panel<br />
Wild-type (1)<br />
gttttgtgggcaacatgctggtcatcctcatcctgataaactgcaaaaggctgaagagcatgactgaca wt<br />
Deletions (43)<br />
gttttgtgggcaacatgctggtcatcctcat-ctgataaactgcaaaaggctgaagagcatgactgaca -1<br />
gttttgtgggcaacatgctggtcatcctcatcctgat--actgcaaaaggctgaagagcatgactgaca -2<br />
gttttgtgggcaacatgctggtcatcctcatcctg--aaactgcaaaaggctgaagagcatgactgaca -2 2X<br />
gttttgtgggcaacatgctggtcatcc---tcctgataaactgcaaaaggctgaagagcatgactgaca -3<br />
gttttgtgggcaacatgctggtcatcctcatc----taaactgcaaaaggctgaagagcatgactgaca -4<br />
gttttgtgggcaacatgctggtcatcctcatc-----aaactgcaaaaggctgaagagcatgactgaca -5 3X<br />
gttttgtgggcaacatgctggAcatcctcatcctgat------caaaaggctgaagagcatgactgaca -6<br />
gttttgtgggcaacatgctggtcatcctcatc------aaTtgcaaaaggctgaagagcatgactgaca -6<br />
gttttgtgggcaacatgctggtcatcctcatcctgat-------aaaaggctgaagagcatgactgaca -7<br />
gttttgtgggcaacatgctggtcat-------ctgataaactgcaaaaggctgaagagcatgactgaca -7<br />
gttttgtgggcaacatgctggtcatcctcatc--------ctgcaaaaggctgaagagcatgactgaca -8<br />
gttttgtgggcaacatgctggtcatcctcatcctgat--------aaaggctgaagagcatgactgaca -8<br />
gttttgtgggcaacatgctggtcatcctc--------aaactgcaaaaggctgaagagcatgactgaca -8<br />
gttttgtgggcaacatgctggtcatcc--------ataaactgcaaaaggctAaagagcatgactgaca -8<br />
gttttgtgggcaacatgctggtcatcctcat---------ctgcaaaaggctgaagagcatgactgaca -9<br />
gttttgtgggcaacatgctggtcatcctcatcctgat----------aggctgaagagcatgactgaca -10<br />
gttttgtgggcaacatgctggt----------ctgataaactgcaaaaggctgaagagcatgactgaca -10<br />
gttttgtgggcaacatgctggtcatcctcatc-----------caaaaggctgaagagcatgactgaca -11<br />
gttttgtgggcaacatgctggtcatcctca-----------tgcaaaaggctgaagagcatgactgaca -11 2X<br />
gttttgtgggcaacatgctggtcatcctcatc------------aaaaggctgaaAagGatgactgaca -12<br />
gttttgtgggcaacatgctg------------ctgGtaaactgcaaaaggctgaagagcatgactgaca -12<br />
gttttgtgggcaacatgctggtcatcct--------------gcaaaaggctgaagagcatgactgaca -14 5X<br />
gttttgtgggcaacatgctggtcat---------------ctgcaaaaggctgaagagcatgactgaca -15<br />
gttttgtgggcaacatgctggtcatcct---------------caaaaggctgaagagcatgactgaca -15 2X<br />
gttttgtgggcaacatgctggtcatcctcatcctgataa----------------gagcatgactgaca -16<br />
gttttgtgggcaacatgctggtcatcctcatcctgat-----------------Cgagcatgactgaca -17<br />
gttttgtgggcaacatgctggtcatcctcatcctga-------------------gagcatgactgaca -19<br />
gttttgtgggcaacatgctggtcatcctcatc-------------------tgaagagcatgactgaca -19<br />
gttttgtgggcaacatgctggtcatcctcatcctgat--------------------gcatgactgaca -20<br />
gttttgtgggcaacatgctggtcatcctcatc----------------------agagcatgactgaca -22<br />
gttttgtgggcaacatgc--------------------------aaaaggctgaagagcatgactgaca -26<br />
gttttgtgggcaa------------------------------caaaaggctgaagagcatgactgaca -30<br />
gttttgtgggcaacatgctggtcatcctcatcctg--------------------------------ca -32<br />
gttttgtgggcaacatgctggt---------------------------------------------ca -45<br />
Insertions (16)<br />
gttttgtgggcaacatgctggtcatcctcatcctCTgataaactgcaaaaggctgaagagcatgactga +2<br />
gttttgtgggcaacatgctggtcatcctcatcctgataTAaactgcaaaaggctgaagagcatgactga +2<br />
gttttgtgggcaacatgctggtcatcctcatcctgatCTGATaaactgcaaaaggctgaagagcatgac +5 13X<br />
DISCUSSION<br />
Despite major advances in anti-retroviral therapy, HIV-1 infection<br />
remains an epidemic cause of morbidity and mortality. Effective antiretroviral<br />
therapy often involves costly, multi-drug regimens that are<br />
not well tolerated by a significant percentage of patients 42 . Even<br />
successful adherence to the therapy does not eradicate the virus, and<br />
HIV-1 levels can rebound rapidly if therapy is discontinued 43 .<br />
An alternative approach to controlling HIV-1 replication is engineering<br />
of the body’s immune cells to be resistant to infection 44 . In this regard,<br />
the CCR5 co-receptor is an attractive target because of the HIV-resistant<br />
phenotype of homozygous CCR5Δ32 individuals 3 . In the present study,<br />
we identified conditions that allow efficient disruption of CCR5 in<br />
human CD34 + HSPCs and demonstrated that such modified cells<br />
generate CCR5 −/− , HIV-resistant progeny in a mouse model of human<br />
hematopoiesis and HIV-1 infection, leading to control of HIV-1 replication.<br />
These findings suggest that transplantation of autologous HSPCs<br />
modified by CCR5-specific ZFNs may provide a permanent supply of<br />
HIV-resistant progeny that could replace cells killed by HIV-1, reconstitute<br />
the immune system and control viral replication long term in the<br />
absence of anti-retroviral therapy.<br />
Figure 5 ZFN activity produces heterogeneous mutations in CCR5. Sequence analysis was performed on 60 cloned human CCR5 alleles,<br />
PCR amplified from intraepithelial cells from the large intestine of an HIV-infected mouse into which ZFN-treated CD34 + HSPCs were previously<br />
transplanted, and at 12 weeks post-infection. The number of nucleotides deleted or inserted at the ZFN target site (underlined) in each clone<br />
is indicated on the right of each sequence, together with the number of times the sequence was found. Dashes (–) indicate deleted bases<br />
compared to the wild-type sequence; uppercase letters are point mutations; underlined upper case letters are inserted bases. Some specific<br />
mutations of CCR5 occurred more frequently, in particular a 5-bp duplication at the ZFN target site that was identified 13 times (bottom<br />
sequence). No mutations in CCR5 were observed in a similar analysis performed on control samples from a mouse receiving unmodified CD34 +<br />
HSPCs (data not shown).<br />
The high levels of CCR5 disruption that we achieved were possible<br />
because of an efficient gene editing technology based on ZFNs.<br />
ZFNs can be designed to bind to a specific genomic DNA sequence<br />
and effect permanent knockout of the targeted gene 19,45–47 . Only transient expression<br />
of the ZFNs is required during a brief period of ex vivo culture, and the genetic mutation<br />
is present for the life of the cell and its progeny. Thus, a major shortcoming of other gene<br />
therapy technologies, the need for continued expression of a foreign transgene, is avoided.<br />
Moreover, unlike approaches based on small molecules, antibodies or RNA interference 44 ,<br />
ZFN-mediated gene disruption can completely eliminate CCR5 from the surface of<br />
cells through bi-allelic modification. By using an optimized nucleofection<br />
procedure, we were able to overcome the technical challenges to<br />
ZFN-induced genome editing in CD34 + cells previously reported 21 and<br />
achieve, on average, disruption at 17% of the loci, which we estimate will<br />
produce 5–7% bi-allelically modified cells.<br />
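The relationship between 17% disruption of loci and an estimated 5–7% of bi-allelically modified cells can be made explicit with simple bookkeeping. The sketch below is illustrative only and is not the authors' calculation, which the text does not describe. It assumes a diploid locus in which each cell has zero, one or two disrupted alleles, so that the allele-level rate equals the bi-allelic fraction plus half the mono-allelic fraction. The function names are hypothetical, and the independent-hit figure is included only for comparison.<br />

```python
def biallelic_if_independent(allele_rate: float) -> float:
    """Fraction of cells with both alleles disrupted if the two alleles
    of every cell were hit independently (an assumption, for comparison)."""
    return allele_rate ** 2


def monoallelic_fraction(allele_rate: float, biallelic: float) -> float:
    """Fraction of cells with exactly one disrupted allele.

    For a diploid locus: allele_rate = biallelic + monoallelic / 2.
    """
    return 2 * (allele_rate - biallelic)


allele_rate = 0.17  # mean disruption of CCR5 loci reported in the text

print(f"independent alleles: {biallelic_if_independent(allele_rate):.1%} bi-allelic")
for biallelic in (0.05, 0.07):  # the estimated range given in the text
    mono = monoallelic_fraction(allele_rate, biallelic)
    print(
        f"bi-allelic {biallelic:.0%}: mono-allelic {mono:.0%}, "
        f"any modification {biallelic + mono:.0%}"
    )
```

Under these assumptions, independent hits would give roughly 2.9% bi-allelic cells, so an estimate of 5–7% would be consistent with disruption being concentrated in a subset of cells. This reading is an inference from the arithmetic rather than a statement made in the text.<br />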
The safety and efficacy of T lymphocytes modified with CCR5-<br />
targeted ZFNs are currently being evaluated in a phase 1 clinical trial.<br />
In a preclinical study, investigation of the specificity of the same CCR5-<br />
targeted ZFNs as used in this study revealed off-target cleavage events in<br />
T cells at significant levels only at the homologous CCR2 locus 19 . Studies<br />
in mice have not detected any deleterious phenotype associated with<br />
loss of CCR2 (ref. 48), and human genetic studies have even suggested<br />
a beneficial phenotype from the loss of this gene in HIV-infected individuals<br />
49 . Although not analyzed here, modification of CD34 + HSPCs<br />
with these same CCR5 ZFN reagents is likely to result in similar, low<br />
levels of off-target cleavage events. Any safety concerns associated with<br />
nonspecific cleavage must be evaluated in larger, future studies.<br />
Although T lymphocytes are the primary target of HIV-1 infection,<br />
ZFN modification of HSPCs may allow longer-term production of<br />
CCR5 −/− cells in patients. The scientific rationale for CCR5 modification<br />
of HSPCs is supported by the recent finding that an HIV + leukemia patient<br />
receiving a transplant from a CCR5 −/− donor was effectively cured of his<br />
infection, despite discontinuing antiretroviral therapy 9 . As shown by our<br />
data, ZFN-modified HSPCs retained full functionality and gave rise to<br />
CCR5 – cells in lineages relevant to HIV-1 pathogenesis. ZFNs delivered to<br />
purified CD34 + cell populations by nucleofection were capable of modifying<br />
true SCID-repopulating stem cells, and the high levels of CCR5 editing<br />
were maintained after secondary transplantation.<br />
The experimental mouse model of HIV-1 infection used in these<br />
studies revealed a strong selection for CCR5 – progeny during acute<br />
infection with a CCR5-tropic strain of HIV-1. This suggests that<br />
CCR5 −/− stem cells, even if the minority, produced sufficient numbers<br />
of CCR5 −/− progeny to support immune reconstitution and inhibit<br />
HIV-1 replication. Such selection is consistent with clinical observations<br />
from genetic diseases such as adenosine deaminase deficiency<br />
(ADA)-SCID, X-linked SCID and Wiskott-Aldrich syndrome, in which<br />
normal hematopoietic cells have a selective advantage, so that spontaneous<br />
monoclonal reversions can lead to selective outgrowth of such cells<br />
and amelioration of symptoms 50–53 .<br />
The observation of almost complete replacement of human T cells in<br />
the intestines of the infected mice with CCR5 – cells is consistent with<br />
this tissue harboring the majority of the body’s CD4 + CCR5 + effector<br />
memory cells. A characteristic feature of HIV-1 replication in mucosal<br />
tissues is an ongoing cycle of T-cell death and the recruitment of replacement<br />
T cells, which, in an activated state, are highly permissive for HIV-1<br />
infection 37 . This is especially true in the gut mucosa, a key battleground<br />
in HIV-1 infection 54–56 . We also observed a strong selection for CCR5 –<br />
cells in the thymus, suggesting that CCR5 – cells would be selected at both<br />
a precursor stage in the thymus and at an effector stage in the mucosa.<br />
Ultimately, the presence of HIV-resistant CCR5 – cells in mucosal tissues<br />
should both protect individual cells from infection and help to break<br />
the cycle of immune hyperactivation that may underlie much of the<br />
pathology of AIDS 57 .<br />
[Figure 6 panels: (a) HIV-1 RNA copies/ml blood (left) and CD4 + in blood (%) (right) versus weeks post-infection for Neg. (3) and ZFN (9); (b) HIV-1 RNA copies/10^6 cells in small intestine and large intestine at 8, 9, 10 and 12 weeks post-infection for Neg. and ZFN.]<br />
Figure 6 Control of HIV-1 replication in mice receiving ZFN-treated CD34 +<br />
HSPCs. (a) Mean ± s.d. levels of HIV-1 RNA (left) and percent CD4 +<br />
human T cells (right) in peripheral blood of mice into which untreated (Neg.)<br />
or ZFN-treated CD34 + HSPCs were transplanted, at indicated times post-infection.<br />
Dashed line is limit of detection of assay. Asterisk indicates a<br />
statistically significant difference between two groups (P < 0.05). (b) Mean<br />
± s.d. HIV-1 RNA levels in small and large intestine lamina propria from<br />
Neg. or ZFN mice, from animals necropsied between 8 and 12 weeks post-infection.<br />
Numbers of mice analyzed at each time point are shown above the<br />
appropriate bar. Dashed line indicates limits of detection of assay. Asterisk<br />
indicates undetectable levels.<br />
Although antiretroviral therapy is highly effective in many patients, the<br />
associated costs and potential for side effects can be considerable when<br />
extrapolated over a lifetime. In contrast, our approach may provide a<br />
one-shot treatment that would be most suited to the setting of autologous<br />
HSPC transplantation. Procedures for isolating and processing HSPCs<br />
for autologous or allogeneic transplantation are well established. The use<br />
of a patient’s own stem cells may remove the requirement for full ablation<br />
of the marrow hematopoietic compartment and the immune suppression<br />
that is necessary in allogeneic transplantation. Indeed, the toxicity of such<br />
regimens is one reason that allogeneic stem cell transplantation from<br />
CCR5Δ32 donors is not a realistic treatment option for HIV + patients in<br />
the absence of other conditions that necessitate the transplant.<br />
Of note, certain HIV-infected individuals, such as AIDS lymphoma<br />
patients, already undergo full ablation and autologous HSPC rescue<br />
as part of their therapy 58 and may be suitable candidates for HSPC-based<br />
gene therapies 44 . In addition, the experience of autologous HSPC<br />
transplantation in gene therapy treatments for ADA-SCID 59,60 , chronic<br />
granulomatous disease 61 and X-linked adrenoleukodystrophy 62 is that<br />
nonmyeloablative conditioning can facilitate engraftment of gene-modified<br />
autologous HSPCs with minimal associated toxicity. It is possible<br />
that the use of nonmyeloablative regimens, together with the selective<br />
advantage conferred on CCR5 −/− progeny, could prove an effective combination<br />
for HIV + patients receiving ZFN-treated autologous HSPCs.<br />
Targeting CCR5 is not expected to provide protection against viruses<br />
that use alternate co-receptors such as CXCR4. Although only a handful<br />
of cases of HIV-1 infection of CCR5Δ32 homozygotes have been<br />
reported 63,64 , CXCR4-tropic viruses have been associated with accelerated<br />
disease progression 65 , so that selection for such strains could be an<br />
undesirable consequence of targeting CCR5. However, this outcome is<br />
not generally observed in patients treated with CCR5 inhibitors unless<br />
CXCR4-tropic viruses were present before therapy, and resistance to<br />
these drugs occurs by viral adaptation to the drug-bound form of CCR5<br />
(refs. 66,67). Notably, although the patient who received the CCR5Δ32<br />
transplant harbored CXCR4-tropic virus before the procedure, his HIV-1<br />
infection was still controlled long term 9,10 . Similar to the recommendations<br />
for CCR5 inhibitors, it may be prudent to restrict CCR5 ZFN treatment<br />
of HSPCs to individuals with no detectable CXCR4-tropic virus.<br />
In contrast to the acute HIV-1 infection modeled in this study, HIV-1<br />
patients usually present in a chronic phase of the disease, and their viral<br />
levels can be effectively controlled by antiretroviral therapy. The requirement<br />
for the selective pressure of active HIV-1 replication in the success<br />
of this, or other, anti-HIV gene therapies is at present unknown. It has<br />
been suggested that low-level viral replication continues in certain sanctuary<br />
sites, even in well-controlled patients on antiretroviral therapy 43,68 ,<br />
which could provide a low level of selection, although drug intensification<br />
trials have not provided evidence of ongoing replication 69 . It is also<br />
possible that the high levels of CCR5 disruption we achieved without<br />
selection, if extrapolated to HIV+ patients, could be sufficient to provide<br />
a therapeutic effect even in the absence of a strong selective pressure.<br />
Alternatively, ZFN knockout of CCR5 in HSPCs could be viewed<br />
as a backup strategy in the event that antiretroviral therapy fails or is<br />
withdrawn. It may also be possible to incorporate antiretroviral therapy<br />
interruptions into an overall therapeutic strategy, as recently described<br />
for HIV-infected individuals receiving autologous HSPCs engineered<br />
with anti-HIV ribozymes, where gene-marked progeny were found at<br />
higher levels after treatment interruptions 70 .<br />
In summary, our data demonstrate that transient ZFN treatment of<br />
human CD34+ HSPCs can efficiently disrupt CCR5 while yielding cells<br />
that remain competent to engraft and support hematopoiesis. In the<br />
presence of CCR5-tropic HIV-1, CCR5−/− progeny rapidly replaced cells<br />
depleted by the virus, leading to a polyclonal population that ultimately<br />
preserved human immune cells in multiple tissues. Our findings indicate<br />
that the modification of only a minority of human CD34+ HSPCs may<br />
provide the same strong anti-viral benefit as was conferred by a complete<br />
CCR5Δ32 stem cell transplantation in a patient 9 . They further<br />
suggest that a partially modified autologous transplant, administered<br />
under only mildly ablative transplantation regimens, may also be effective,<br />
opening up the treatment to many more HIV-infected individuals.<br />
Finally, the identification of conditions that allow the efficient use of<br />
ZFNs in human CD34+ HSPCs suggests the use of this technology in<br />
other diseases for which HSPC modification may be curative.<br />
METHODS<br />
Methods and any associated references are available in the online version<br />
of the paper at http://www.nature.com/naturebiotechnology/.<br />
Note: Supplementary information is available on the <strong>Nature</strong> Biotechnology website.<br />
ACKNOWLEDGMENTS<br />
We would like to thank A. Cuddihy, S. Ge, R. Hollis and N. Smiley for expert<br />
technical assistance; C. Lutzko, V. Garcia, R. Akkina, B. Torbett and M. McCune for<br />
advice regarding humanized mice; and M. McCune for communicating unpublished<br />
data. This work was supported by funding from the California HIV/AIDS Research<br />
Project (P.M.C.), The Saban Research Institute (V.T.), and the National Heart, Lung,<br />
and Blood Institute P01 HL73104 (G.M.C., D.B.K. and P.M.C.).<br />
AUTHOR CONTRIBUTIONS<br />
N.H. performed most of the experiments; J.W., K.K., G.F. and X.W. developed assays<br />
and analyzed samples; V.T. contributed to discussions; N.H., G.M.C., D.B.K., P.D.G.,<br />
M.C.H. and P.M.C. designed the experiments and analyzed data; N.H. and P.M.C.<br />
wrote the manuscript.<br />
COMPETING FINANCIAL INTERESTS<br />
The authors declare competing financial interests: details accompany the full-text<br />
HTML version of the paper at http://www.nature.com/naturebiotechnology/.<br />
Published online at http://www.nature.com/naturebiotechnology/.<br />
Reprints and permissions information is available online at<br />
http://npg.nature.com/reprintsandpermissions/.<br />
1. Wu, L. et al. CD4-induced interaction of primary HIV-1 gp120 glycoproteins with the<br />
chemokine receptor CCR-5. <strong>Nature</strong> 384, 179–183 (1996).<br />
2. deRoda Husman, A.M., Blaak, H., Brouwer, M. & Schuitemaker, H. CC chemokine<br />
receptor 5 cell-surface expression in relation to CC chemokine receptor 5 genotype and<br />
the clinical course of HIV-1 infection. J. Immunol. 163, 4597–4603 (1999).<br />
3. Samson, M. et al. Resistance to HIV-1 infection in Caucasian individuals bearing mutant<br />
alleles of the CCR-5 chemokine receptor gene. <strong>Nature</strong> 382, 722–725 (1996).<br />
4. Novembre, J. et al. The geographic spread of the CCR5 Delta32 HIV-resistance allele.<br />
PLoS Biol. 3, e339 (2005).<br />
5. Glass, W.G. et al. CCR5 deficiency increases risk of symptomatic West Nile virus<br />
infection. J. Exp. Med. 203, 35–40 (2006).<br />
6. Kantarci, O.H. et al. CCR5∆32 polymorphism effects on CCR5 expression, patterns<br />
of immunopathology and disease course in multiple sclerosis. J. Neuroimmunol. 169,<br />
137–143 (2005).<br />
7. Rossol, M. et al. Negative association of the chemokine receptor CCR5 d32 polymorphism<br />
with systemic inflammatory response, extra-articular symptoms and joint<br />
erosion in rheumatoid arthritis. Arthritis Res. Ther. 11, R91–98 (2009).<br />
8. Dau, B. & Holodniy, M. Novel targets for antiretroviral therapy: clinical progress to<br />
date. Drugs 69, 31–50 (2009).<br />
9. Hutter, G. et al. Long-term control of HIV by CCR5 Delta32/Delta32 stem-cell transplantation.<br />
N. Engl. J. Med. 360, 692–698 (2009).<br />
10. Hutter, G., Schneider, T. & Thiel, E. Transplantation of selected or transgenic blood<br />
stem cells—a future treatment for HIV/AIDS? J. Int. AIDS Soc. 12, 10–14 (2009).<br />
11. Anderson, J. et al. Safety and efficacy of a lentiviral vector containing three anti-HIV<br />
genes–CCR5 ribozyme, tat-rev siRNA, and TAR decoy–in SCID-hu mouse-derived T<br />
cells. Mol. Ther. 15, 1182–1188 (2007).<br />
12. Bai, J. et al. Characterization of anti-CCR5 ribozyme-transduced CD34+ hematopoietic<br />
progenitor cells in vitro and in a SCID-hu mouse model in vivo. Mol. Ther. 1, 244–254<br />
(2000).<br />
13. Kumar, P. et al. T cell-specific siRNA delivery suppresses HIV-1 infection in humanized<br />
mice. Cell 134, 577–586 (2008).<br />
14. Swan, C.H. et al. T-cell protection and enrichment through lentiviral CCR5 intrabody<br />
gene delivery. Gene Ther. 13, 1480–1492 (2006).<br />
15. Swan, C.H. & Torbett, B.E. Can gene delivery close the door to HIV-1 entry after<br />
escape? J. Med. Primatol. 35, 236–247 (2006).<br />
16. Urnov, F.D. et al. Highly efficient endogenous human gene correction using designed<br />
zinc-finger nucleases. <strong>Nature</strong> 435, 646–651 (2005).<br />
17. Jasin, M. et al. Genetic manipulation of genomes with rare-cutting endonucleases.<br />
Trends Genet. 12, 224–228 (1996).<br />
18. Sonoda, E. et al. Differential usage of non-homologous end-joining and homologous<br />
recombination in double strand break repair. DNA Repair (Amst.) 5, 1021–1029<br />
(2006).<br />
19. Perez, E.E. et al. Establishment of HIV-1 resistance in CD4+ T cells by genome editing<br />
using zinc-finger nucleases. Nat. Biotechnol. 26, 808–816 (2008).<br />
20. Ishikawa, F. et al. Development of functional human blood and immune systems in NOD/<br />
SCID/IL2 receptor {gamma} chain(null) mice. Blood 106, 1565–1573 (2005).<br />
21. Lombardo, A. et al. Gene editing in human stem cells using zinc finger nucleases<br />
and integrase-defective lentiviral vector delivery. Nat. Biotechnol. 25, 1298–1306<br />
(2007).<br />
22. Hollis, R.P. et al. Stable gene transfer to human CD34(+) hematopoietic cells using<br />
the Sleeping Beauty transposon. Exp. Hematol. 34, 1333–1343 (2006).<br />
23. Sumiyoshi, T. et al. Stable transgene expression in primitive human CD34+ hematopoietic<br />
stem/progenitor cells, using the Sleeping Beauty transposon system. Hum. Gene<br />
Ther. 20, 1607–1626 (2009).<br />
24. Mátés, L. et al. Molecular evolution of a novel hyperactive Sleeping Beauty transposase<br />
enables robust stable gene transfer in vertebrates. Nat. Genet. 41, 753–761<br />
(2009).<br />
25. Xue, X. et al. Stable gene transfer and expression in cord blood-derived CD34+<br />
hematopoietic stem and progenitor cells by a hyperactive Sleeping Beauty transposon<br />
system. Blood 114, 1319–1330 (2009).<br />
26. Basu, S. & Broxmeyer, H.E. CCR5 ligands modulate CXCL12-induced chemotaxis,<br />
adhesion, and Akt phosphorylation of human cord blood CD34+ cells. J. Immunol.<br />
183, 7478–7488 (2009).<br />
27. Watanabe, S. et al. Hematopoietic stem cell-engrafted NOD/SCID/IL2Rgamma null<br />
mice develop human lymphoid systems and induce long-lasting HIV-1 infection with<br />
specific humoral immune responses. Blood 109, 212–218 (2007).<br />
28. Brenchley, J.M. et al. CD4+ T cell depletion during all stages of HIV disease occurs<br />
predominantly in the gastrointestinal tract. J. Exp. Med. 200, 749–759 (2004).<br />
29. Brenchley, J.M. et al. HIV disease: fallout from a mucosal catastrophe? Nat. Immunol.<br />
7, 235–239 (2006).<br />
30. Guadalupe, M. et al. Severe CD4+ T-cell depletion in gut lymphoid tissue during<br />
primary human immunodeficiency virus type 1 infection and substantial delay in<br />
restoration following highly active antiretroviral therapy. J. Virol. 77, 11708–11717<br />
(2003).<br />
31. Talal, A.H. et al. Effect of HIV-1 infection on lymphocyte proliferation in gut-associated<br />
lymphoid tissue. J. Acquir. Immune Defic. Syndr. 26, 208–217 (2001).<br />
32. Li, Q. et al. Peak SIV replication in resting memory CD4+ T cells depletes gut lamina<br />
propria CD4+ T cells. <strong>Nature</strong> 434, 1148–1152 (2005).<br />
33. Mattapallil, J.J. et al. Massive infection and loss of memory CD4+ T cells in multiple<br />
tissues during acute SIV infection. <strong>Nature</strong> 434, 1093–1097 (2005).<br />
34. Veazey, R.S. et al. Gastrointestinal tract as a major site of CD4+ T cell depletion and<br />
viral replication in SIV infection. Science 280, 427–431 (1998).<br />
35. Berges, B.K. et al. HIV-1 infection and CD4 T cell depletion in the humanized<br />
Rag2−/−gamma c−/− (RAG-hu) mouse model. Retrovirology 3, 76–90 (2006).<br />
36. Appay, V. & Sauce, D. Immune activation and inflammation in HIV-1 infection: causes<br />
and consequences. J. Pathol. 214, 231–241 (2008).<br />
37. Stoddart, C.A. et al. IFN-alpha-induced upregulation of CCR5 leads to expanded HIV<br />
tropism in vivo. PLoS Pathog. 6, e1000766 (2010).<br />
38. Choudhary, S.K. et al. R5 human immunodeficiency virus type 1 infection of fetal<br />
thymic organ culture induces cytokine and CCR5 expression. J. Virol. 79, 458–471<br />
(2005).<br />
39. Kahn, J.O. & Walker, B.D. Acute human immunodeficiency virus type 1 infection. N.<br />
Engl. J. Med. 339, 33–39 (1998).<br />
40. Margolick, J.B. et al. Impact of inversion of the CD4/CD8 ratio on the natural history<br />
of HIV-1 infection. J. Acquir. Immune Defic. Syndr. 42, 620–626 (2007).<br />
41. Henrard, D.R. et al. Natural History of HIV-1 cell-free viremia. J. Am. Med. Assoc.<br />
274, 554–558 (1995).<br />
42. Chen, R.Y. et al. Distribution of health care expenditures for HIV-infected patients.<br />
Clin. Infect. Dis. 42, 1003–1010 (2006).<br />
43. Richman, D.D. et al. The challenge of finding a cure for HIV infection. Science 323,<br />
1304–1307 (2009).<br />
44. Rossi, J.J., June, C.H. & Kohn, D.B. Genetic therapies against HIV. Nat. Biotechnol.<br />
25, 1444–1454 (2007).<br />
45. Bibikova, M. et al. Targeted chromosomal cleavage and mutagenesis in Drosophila<br />
using zinc-finger nucleases. Genetics 161, 1169–1175 (2002).<br />
46. Doyon, Y. et al. Heritable targeted gene disruption in zebrafish using designed zinc-finger<br />
nucleases. Nat. Biotechnol. 26, 702–708 (2008).<br />
47. Santiago, Y. et al. Targeted gene knockout in mammalian cells by using engineered<br />
zinc-finger nucleases. Proc. Natl. Acad. Sci. USA 105, 5809–5814 (2008).<br />
48. Peters, W., Dupuis, M. & Charo, I.F. A mechanism for the impaired IFN-gamma production<br />
in C–C chemokine receptor 2 (CCR2) knockout mice: Role of CCR2 in linking<br />
the innate and adaptive immune responses. J. Immunol. 165, 7072–7077 (2000).<br />
49. Smith, M.W. et al. CCR2 chemokine receptor and AIDS progression. Nat. Med. 3,<br />
1052–1053 (1997).<br />
50. Davis, B.R. & Candotti, F. Revertant somatic mosaicism in the Wiskott-Aldrich syndrome.<br />
Immunol. Res. 44, 127–131 (2009).<br />
51. Hirschhorn, R. et al. Spontaneous in vivo reversion to normal of an inherited mutation<br />
in a patient with adenosine deaminase deficiency. Nat. Genet. 13, 290–295 (1996).<br />
52. Hirschhorn, R. et al. In vivo reversion to normal of inherited mutations in humans.<br />
J. Med. Genet. 40, 721–728 (2003).<br />
53. Stephan, V. et al. Atypical X-linked severe combined immunodeficiency due to possible<br />
spontaneous reversion of the genetic defect in T cells. N. Engl. J. Med. 335,<br />
1563–1567 (1996).<br />
54. Chun, T.W. et al. Persistence of HIV in gut-associated lymphoid tissue despite long-term<br />
antiretroviral therapy. J. Infect. Dis. 197, 714–720 (2008).<br />
55. Lackner, A.A. et al. The gastrointestinal tract and AIDS pathogenesis. Gastroenterology<br />
136, 1965–1978 (2009).<br />
56. Picker, L.J. Immunopathogenesis of acute AIDS virus infection. Curr. Opin. Immunol.<br />
18, 399–405 (2006).<br />
57. Veazey, R.S., Marx, P.A. & Lackner, A.A. The mucosal immune system: primary target<br />
for HIV infection and AIDS. Trends Immunol. 22, 626–633 (2001).<br />
58. Krishnan, A. et al. Autologous stem cell transplantation for HIV associated lymphoma.<br />
Blood 98, 3857–3859 (2001).<br />
59. Aiuti, A. et al. Correction of ADA-SCID by stem cell gene therapy combined with<br />
nonmyeloablative conditioning. Science 296, 2410–2413 (2002).<br />
60. Aiuti, A. et al. Gene therapy for immunodeficiency due to adenosine deaminase<br />
deficiency. N. Engl. J. Med. 360, 447–458 (2009).<br />
61. Ott, M.G. et al. Correction of X-linked chronic granulomatous disease by gene therapy,<br />
augmented by insertional activation of MDS1–EVI1, PRDM16 or SETBP1. Nat. Med.<br />
12, 401–409 (2006).<br />
62. Cartier, N. et al. Hematopoietic stem cell gene therapy with a lentiviral vector in<br />
X-linked adrenoleukodystrophy. Science 326, 818–823 (2009).<br />
63. Biti, R. et al. HIV-1 infection in an individual homozygous for the CCR5 deletion allele.<br />
Nat. Med. 3, 252–253 (1997).<br />
64. Oh, D.Y. et al. CCR5Delta32 genotypes in a German HIV-1 seroconverter cohort and<br />
report of HIV-1 infection in a CCR5Delta32 homozygous individual. PLoS ONE 3,<br />
e2747–2753 (2008).<br />
65. Weiser, B. et al. HIV-1 coreceptor usage and CXCR4-specific viral load predict clinical<br />
disease progression during combination antiretroviral therapy. AIDS 22, 469–479<br />
(2008).<br />
66. Ogert, R.A. et al. Mapping Resistance to the CCR5 co-receptor antagonist vicriviroc<br />
using heterologous chimeric HIV-1 envelope genes reveals key determinants in the<br />
C2–V5 domain of gp120. Virology 373, 387–399 (2008).<br />
67. Soulie, C. et al. Primary genotypic resistance of HIV-1 to CCR5 antagonist treatment-naïve<br />
patients. AIDS 22, 2212–2214 (2008).<br />
68. Palmer, S. et al. Low-level viremia persists for at least 7 years in patients on suppressive<br />
antiretroviral therapy. Proc. Natl. Acad. Sci. USA 105, 3879–3884 (2008).<br />
69. Dinoso, J.B. et al. Treatment intensification does not reduce residual HIV-1 viremia<br />
in patients on highly active antiretroviral therapy. Proc. Natl. Acad. Sci. USA 106,<br />
9403–9408 (2009).<br />
70. Mitsuyasu, R.T. et al. Phase 2 gene therapy trial of an anti-HIV ribozyme in autologous<br />
CD34+ cells. Nat. Med. 15, 285–292 (2009).<br />
ONLINE METHODS<br />
Hematopoietic stem/progenitor cell isolation. Human CD34+ HSPCs were<br />
isolated from umbilical cord blood collected from normal deliveries at local<br />
hospitals, according to guidelines approved by the Children’s Hospital Los<br />
Angeles Committee on Clinical Investigation, or as waste cord blood material<br />
from StemCyte Corp. Immunomagnetic enrichment for CD34+ cells<br />
was performed using the magnetic-activated cell sorting (MACS) system<br />
(Miltenyi Biotec), per the manufacturer’s instructions, with the modification<br />
that the initial purified CD34+ population was put through a second<br />
column and washed three times with 3 ml of the supplied buffer per wash<br />
before the final elution. This additional step gave a >99% pure CD34+ population,<br />
as measured by FACS analysis using the anti-CD34 antibody, 8G12<br />
(BD Biosciences).<br />
Nucleofection of CD34+ HSPCs with ZFN expression plasmids. Freshly<br />
isolated CD34+ cells were stimulated for 5–12 h in X-VIVO 10 media (Lonza)<br />
containing 2 nM L-glutamine, 50 ng/ml SCF, 50 ng/ml Flt-3 and 50 ng/ml<br />
TPO (R&D Systems). 1 × 10⁶ cells were nucleofected with 2.5 µg each of a<br />
plasmid pair expressing ZFNs binding upstream (ZFN-L) or downstream<br />
(ZFN-R) of codon Leu55 within TM1 of human CCR5 (ref. 19). The CD34+<br />
cell/DNA mix was processed in an X series Amaxa Nucleofector (Lonza)<br />
using the U-01 setting and the human CD34+ nucleofector solution, according<br />
to the manufacturer’s instructions. Following nucleofection, cells were<br />
immediately placed in pre-warmed IMDM media (Lonza) containing 26%<br />
FBS (Mediatech), 0.35% BSA, 2 nM L-glutamine, 0.5% 10⁻³ mol/l hydrocortisone<br />
(Stem Cell Technologies), 5 ng/ml IL-3, 10 ng/ml IL-6 and 25 ng/ml<br />
SCF (R&D Systems). Cells were allowed to recover in this media for 2–12 h<br />
before injection into mice.<br />
Apoptosis assay. CD34+ HSPCs were collected at 24 h post-nucleofection<br />
and analyzed for the percent of viable cells marked for apoptosis using the<br />
PE apoptosis detection kit (BD Biosciences) according to the manufacturer’s<br />
instructions. Cells were stained with 7-AAD (stains nonviable cells) and annexin<br />
V (detects apoptotic cells) and analyzed using a FACScan flow cytometer (BD<br />
Biosciences). This double staining allowed the identification of cells in the<br />
early stages of apoptosis.<br />
NSG mouse transplantation. NOD.Cg-Prkdc scid Il2rg tm1Wj/SzJ (NOD/<br />
SCID/IL2rγ null, NSG) mice 71 were obtained from Jackson Laboratories.<br />
Neonatal mice within 48 h of birth received 150 cGy radiation, then 2–4 h<br />
later 1 × 10⁶ ZFN-modified or mock-treated human CD34+ HSPCs in 50 µl<br />
PBS containing 1% heparin were injected through the facial vein. For secondary<br />
transplantations, bone marrow was harvested by needle aspiration<br />
from the upper and lower limbs of 18-week-old animals previously engrafted<br />
with human CD34+ HSPCs, filtered through a 70 µm nylon mesh screen<br />
(Fisher Scientific) and washed in PBS. The cells were transplanted into three<br />
8-week-old mice that had previously received 350 cGy radiation, using retro-orbital<br />
injection of 2 × 10⁷ bone marrow cells per mouse. Mouse cohorts are<br />
described in Supplementary Table 2.<br />
Analysis of CCR5 disruption. The percentage of CCR5 alleles disrupted by<br />
ZFN treatment was measured by performing PCR across the ZFN target site<br />
followed by digestion with the Surveyor (Cel 1) nuclease (Transgenomic),<br />
which detects heteroduplex formation, as previously described 19 . Briefly,<br />
genomic DNA was extracted from mouse tissues and subjected to nested PCR<br />
amplification using human CCR5-specific primers, with the resulting radiolabeled<br />
products digested with Cel 1 nuclease and resolved by PAGE. The<br />
ratio of cleaved to uncleaved products was calculated to give a measure of<br />
the frequency of gene disruption. The assay is sensitive enough to detect<br />
single-nucleotide changes and has a linear detection range between 0.69 and<br />
44% 19 .<br />
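The conversion from band intensities to a disruption frequency can be sketched as follows. The text above states only that the ratio of cleaved to uncleaved products was calculated; the square-root correction used here is a commonly applied assumption for heteroduplex cleavage assays (random reannealing of strands) and is not confirmed by the source. The function names and the range check are illustrative.

```python
import math

# Linear detection range quoted in the text for the Cel 1 assay (percent).
LINEAR_RANGE_PERCENT = (0.69, 44.0)


def cleaved_fraction(cleaved_signal, uncleaved_signal):
    """Fraction of total PCR product signal found in the cleaved bands."""
    total = cleaved_signal + uncleaved_signal
    if total <= 0:
        raise ValueError("total signal must be positive")
    return cleaved_signal / total


def disruption_percent(cleaved_signal, uncleaved_signal):
    """Estimate percent of disrupted alleles from band intensities.

    Assumption (not from the source): strands reanneal at random, so the
    uncleaved fraction equals (1 - m)^2 for a disrupted-allele fraction m.
    """
    f = cleaved_fraction(cleaved_signal, uncleaved_signal)
    return 100.0 * (1.0 - math.sqrt(1.0 - f))


def in_linear_range(percent):
    """True if the estimate falls inside the quoted linear detection range."""
    low, high = LINEAR_RANGE_PERCENT
    return low <= percent <= high
```

Under this assumption, a lane with 75% of its signal in cleaved bands corresponds to 50% disrupted alleles, which lies outside the quoted linear range and would need to be interpreted with caution.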
In addition, a common 5-bp (pentamer) duplication that occurs<br />
after nonhomologous end-joining repair of ZFN-cleaved CCR5 (ref. 19)<br />
was detected by PCR. The first-round PCR product generated during<br />
Cel 1 analysis was diluted 1:5,000 and 5 µl used in a Taqman qPCR reaction<br />
using primers (5′-GGTCATCCTCATCCTGATCTGA-3′ and<br />
5′-GATGATGAAGAAGATTCCAGAGAAGAAG-3′) and probe<br />
5′-FAM d(CCTTCTTACTGTCCCCTTCTGGGCTCAC) BHQ-1-3′ (Biosearch<br />
Technologies), and analyzed using a 7900HT real-time PCR machine<br />
(Applied Biosystems). At the same time, 5 µl of a 1:50,000 dilution of<br />
the PCR product were used in a Taqman qPCR reaction using primers<br />
(5′-CCAAAAAATCAATGTGAAGCAAATC-3′ and<br />
5′-TGCCCACAAAACCAAAGATG-3′) and probe 5′-FAM d(CAGCCCGCCTCCTGCCTCC)<br />
BHQ-1-3′ to detect total copies of human CCR5. Data were analyzed using<br />
software supplied by the manufacturer and the frequency of pentamer insertions<br />
in CCR5 calculated. The assay is sensitive enough to detect a single<br />
pentamer insertion event in 100,000 cells (data not shown).<br />
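Because the two qPCR reactions use different dilutions of the same first-round product, the measured copy numbers have to be scaled back before they are compared. The sketch below shows that arithmetic only. It assumes that each reaction yields an absolute copy number (for example from a standard curve), which the text does not specify; the authors used the manufacturer's software, and the function name here is hypothetical.

```python
# Dilutions of the first-round PCR product stated in the text.
PENTAMER_DILUTION = 5_000      # 1:5,000 for the pentamer-specific reaction
TOTAL_CCR5_DILUTION = 50_000   # 1:50,000 for the total human CCR5 reaction


def pentamer_frequency(pentamer_copies, total_ccr5_copies,
                       pentamer_dilution=PENTAMER_DILUTION,
                       total_dilution=TOTAL_CCR5_DILUTION):
    """Fraction of CCR5 copies carrying the 5-bp duplication.

    Both inputs are copies measured in equal volumes (5 µl) of the diluted
    template; each is scaled by its dilution factor before taking the ratio.
    """
    if total_ccr5_copies <= 0:
        raise ValueError("total CCR5 copies must be positive")
    undiluted_pentamer = pentamer_copies * pentamer_dilution
    undiluted_total = total_ccr5_copies * total_dilution
    return undiluted_pentamer / undiluted_total
```

For example, 100 pentamer copies against 1,000 total CCR5 copies gives a frequency of 0.01 once the tenfold difference in dilution is taken into account.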
ZFN-induced modifications of CCR5 were analyzed by directly sequencing<br />
cloned CCR5 alleles, isolated by PCR amplification as described above, and<br />
TOPO-TA cloning (Invitrogen). Plasmid DNA was isolated from 60 individual<br />
bacterial colonies for each tissue analyzed.<br />
HIV-1 infection and analysis. A cell-free virus stock of HIV-1 BaL and a<br />
molecular clone of HIV-1 NL4-3 were obtained from the AIDS Research and<br />
Reference Reagent Program (ARRRP), Division of AIDS, NIAID, NIH from<br />
material deposited by Suzanne Gartner, Mikulas Popovic, Robert Gallo and<br />
Malcolm Martin. HIV-1 BaL virus was propagated in PM1 cells (obtained from<br />
the ARRRP, deposited by Marvin Reitz) and harvested 10 d post-infection.<br />
HIV-1 NL4-3 viruses were generated by transient transfection of 293T<br />
cells (ATCC). Viruses were titrated using the Alliance HIV-1 p24 ELISA<br />
kit (PerkinElmer) and by TCID50 analysis on U373-MAGI cells (ARRRP,<br />
deposited by Michael Emerman and Adam Geballe). Mice to be infected<br />
with HIV-1 were anesthetized with inhalant 2.5% isoflurane and injected<br />
intraperitoneally with virus stocks containing 200 ng p24, 7 × 10⁴ TCID50<br />
units, in 100 µl total volume.<br />
HIV-1 levels in peripheral blood or tissues harvested at necropsy were<br />
determined by extracting RNA from 5 × 10⁵ cells using the MasterPure<br />
Complete DNA and RNA Purification Kit (Epicentre Biotechnologies) and<br />
performing Taqman qPCR using a primer and probe set targeting the HIV-1<br />
LTR region, as previously described 72 . In addition, p24 levels were measured<br />
in blood samples by ELISA.<br />
Mouse blood and tissue collection. Peripheral blood samples were collected<br />
every 2 weeks starting at 8 weeks of age, using retro-orbital sampling. Whole<br />
blood was blocked in FBS (Mediatech) for 30 min, the red blood cells were<br />
lysed using Pharmlyse solution (BD Biosciences) and cells were washed with<br />
PBS. Tissue samples were collected at necropsy and processed immediately<br />
for cell isolation and FACS analysis, or kept in freezing media (IMDM plus<br />
20% DMSO) in liquid nitrogen, for later analysis and DNA extraction. Tissue<br />
samples were manually agitated in PBS before filtering through a sterile 70<br />
µm nylon mesh screen (Fisher Scientific) and suspension cell preparations<br />
produced as previously described 19 . Intestinal samples were processed as<br />
previously described 73 , with the modification that the mononuclear cell<br />
population was isolated after incubation in citrate buffer and collagenase<br />
enzyme for 2 h, followed by nylon wool filtration (Amersham Biosciences)<br />
and ficoll-hypaque gradient isolation (GE Healthcare).<br />
Analysis of human cells in mouse tissues. FACS analysis of human cells was<br />
performed using a FACSCalibur instrument (BD Biosciences) with either<br />
BD CellQuest Pro version 5.2 (BD Biosciences) or FlowJo software version<br />
8.8.6 for Macintosh (Treestar). The gating strategy comprised an initial<br />
forward scatter versus side scatter (FSC/SSC) gate to exclude debris, followed<br />
by a human CD45 gate. For analysis of lymphocyte populations in peripheral<br />
blood, a further lymphoid gate (low side scatter) was also applied to exclude<br />
cells of monocytic origin 74 . All antibodies used were fluorochrome conjugated<br />
and human specific, and obtained from BD Biosciences: CD45 (clone 2D1),<br />
CD19 (clone HIB19), CD14 (clone MϕP9), CD3 (clone SK7), CD4 (clone<br />
SK3), CD8 (clone HIT8a), CCR5 (2D7). Gates were set using fluorescence<br />
minus one controls, where cells were stained with all antibodies except the one<br />
of interest. Specificity was also confirmed using isotype-matched nonspecific<br />
antibodies (BD Biosciences) (Supplementary Fig. 1) and with tissues from<br />
animals that had not been engrafted with human cells.<br />
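The sequential gating described above can be illustrated as a chain of filters over per-event measurements. This is a simplified sketch, not the authors' analysis, which was done in CellQuest Pro or FlowJo. All threshold values and names below are hypothetical placeholders; in the study, gates were set from fluorescence-minus-one controls rather than fixed numbers, and a real debris gate is drawn as a region on the FSC/SSC plot rather than as two cut-offs.

```python
from dataclasses import dataclass


@dataclass
class Event:
    fsc: float   # forward scatter
    ssc: float   # side scatter
    cd45: float  # human CD45 fluorescence


# Hypothetical thresholds for illustration only.
FSC_MIN = 200.0           # below this (or SSC_MIN), treat the event as debris
SSC_MIN = 50.0
CD45_MIN = 100.0          # human CD45-positive cut-off
LYMPHOID_SSC_MAX = 400.0  # low side scatter, used to exclude monocytic cells


def gate_human_cells(events):
    """Step 1: FSC/SSC gate to exclude debris. Step 2: human CD45 gate."""
    not_debris = [e for e in events if e.fsc >= FSC_MIN and e.ssc >= SSC_MIN]
    return [e for e in not_debris if e.cd45 >= CD45_MIN]


def gate_lymphoid(events):
    """Step 3 (peripheral blood only): keep low side-scatter human cells."""
    return [e for e in gate_human_cells(events) if e.ssc <= LYMPHOID_SSC_MAX]
```

Lineage markers such as CD3, CD4 or CD8 would then be evaluated only on the events that pass these gates.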
Immunohistochemical analysis of human CD3 and CD4 expression,<br />
respectively, in the small intestine and spleen tissue from HSPC-engrafted<br />
mice was performed on fixed paraffin-embedded tissue sections, as previously<br />
described 73 . Controls included isotype-matched nonspecific antibodies and<br />
unengrafted NSG mice.<br />
Statistical analysis. All statistical analysis was performed using GraphPad<br />
Prism version 5.0b for Mac OSX (GraphPad Software). Unpaired two-tailed<br />
t-tests were performed assuming equal variance to calculate P-values. A 95%<br />
confidence interval was used to determine significance. A minimum of three<br />
data points was used for each analysis.<br />
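For readers who want to reproduce the test outside GraphPad Prism, the following is a minimal sketch of an unpaired two-tailed t-test with a pooled (equal) variance, written with the Python standard library only. It is not the authors' code. The P value is obtained by numerically integrating the Student t density with Simpson's rule, which is an implementation choice made here to avoid external dependencies.

```python
import math
from statistics import mean, variance


def _t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def unpaired_t_test(a, b, steps=2000):
    """Return (t, df, two-tailed P) assuming equal variances in both groups."""
    n1, n2 = len(a), len(b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two data points")
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("zero variance in both groups; t is undefined")
    t = (mean(a) - mean(b)) / se

    # Simpson's rule (steps must be even) for the area between 0 and |t|.
    upper = abs(t)
    h = upper / steps
    total = _t_pdf(0.0, df) + _t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * _t_pdf(i * h, df)
    half_area = total * h / 3

    p = max(0.0, 1.0 - 2.0 * half_area)
    return t, df, p
```

A result would be called significant at the 95% confidence level used in the text when the returned P value is below 0.05.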
71. Shultz, L.D. et al. Human lymphoid and myeloid cell development in NOD/LtSz-scid<br />
IL2R gamma null mice engrafted with mobilized human hematopoietic stem cells.<br />
J. Immunol. 174, 6477–6489 (2005).<br />
72. Rouet, F. et al. Transfer and evaluation of an automated, low-cost real-time reverse<br />
transcription-PCR test for diagnosis and monitoring of human immunodeficiency<br />
virus type 1 infection in a West African resource-limited setting. J. Clin. Microbiol.<br />
43, 2709–2717 (2005).<br />
73. Sun, Z. et al. Intrarectal transmission, systemic infection, and CD4+ T cell depletion<br />
in humanized mice infected with HIV-1. J. Exp. Med. 204, 705–714 (2007).<br />
74. Loken, M.R. et al. Establishing lymphocyte gates for immunophenotyping by flow<br />
cytometry. Cytometry 11, 453–459 (1990).<br />
doi:10.1038/nbt.1663<br />
articles<br />
Cell type of origin influences the molecular and<br />
functional properties of mouse induced pluripotent<br />
stem cells<br />
Jose M Polo 1–4 , Susanna Liu 5 , Maria Eugenia Figueroa 6 , Warakorn Kulalert 1–4 , Sarah Eminli 1–4 ,<br />
Kah Yong Tan 1,4,7 , Effie Apostolou 1–4 , Matthias Stadtfeld 1–4 , Yushan Li 6 , Toshi Shioda 2 , Sridaran Natesan 8 ,<br />
Amy J Wagers 1,4,7 , Ari Melnick 6 , Todd Evans 5 & Konrad Hochedlinger 1–4<br />
Induced pluripotent stem cells (iPSCs) have been derived from various somatic cell populations through ectopic expression of defined<br />
factors. It remains unclear whether iPSCs generated from different cell types are molecularly and functionally similar. Here we<br />
show that iPSCs obtained from mouse fibroblasts, hematopoietic and myogenic cells exhibit distinct transcriptional and epigenetic<br />
patterns. Moreover, we demonstrate that cellular origin influences the in vitro differentiation potentials of iPSCs into embryoid bodies<br />
and different hematopoietic cell types. Notably, continuous passaging of iPSCs largely attenuates these differences. Our results<br />
suggest that early-passage iPSCs retain a transient epigenetic memory of their somatic cells of origin, which manifests as differential<br />
gene expression and altered differentiation capacity. These observations may influence ongoing attempts to use iPSCs for disease<br />
modeling and could also be exploited in potential therapeutic applications to enhance differentiation into desired cell lineages.<br />
iPSCs are usually obtained from fibroblasts after infection with viral constructs<br />
expressing the four transcription factors Oct4, Sox2, Klf4 and<br />
c-Myc 1–10 . In addition, other cell types, including blood 2,4,11 , stomach<br />
and liver cells 1 , keratinocytes 12,13 , melanocytes 14 , pancreatic β cells 7 and<br />
neural progenitors 3,15–17 have been reprogrammed into iPSCs. Although<br />
these iPSC lines have been shown to express pluripotency genes and<br />
support the differentiation into cell types of all three germ layers, recent<br />
studies detected substantial molecular and functional differences among<br />
iPSCs derived from distinct cell types. For example, iPSCs produced<br />
from various fibroblasts, stomach and liver cells showed different propensities<br />
to form tumors in mice, although the underlying molecular<br />
mechanisms remain elusive 18 . Another study identified persistent donor<br />
cell–specific gene expression patterns in human iPSCs produced from<br />
different cell types, suggesting an influence of the somatic cell of origin<br />
on the molecular properties of resultant iPSCs 19 . Whether cellular origin<br />
also affected the functional properties of iPSCs remained unexplored<br />
in that report. Of note, the findings of some of these studies may be<br />
confounded by the presence of different viral insertions in individual<br />
iPSC lines and by the fact that the analyzed iPSC lines were of different<br />
genetic background, which can affect both gene expression patterns 20<br />
and the functionality 9,21 of cells. Indeed, we have recently shown that<br />
many mouse iPSC lines derived from different somatic cell types show<br />
aberrant silencing of a surprisingly small set of transcripts compared with<br />
embryonic stem cells (ESCs) 22 . However, our study did not investigate<br />
whether additional cell-of-origin–specific differences may exist in iPSC<br />
lines derived from different cell types.<br />
Patient-specific iPSCs are a valuable tool for the study of disease and<br />
possibly for the development of therapies 20,23–26 . Thus, resolving the question<br />
of whether iPSCs produced from different cell types are molecularly<br />
and functionally equivalent is crucial for using these cells to model disease,<br />
which entails detecting subtle differences in the differentiation potential<br />
of patient-derived iPSCs 24,27 . Furthermore, the identification of somatic<br />
cells that influence the differentiation capacities of resultant iPSCs into<br />
desired cell lineages could be useful in a therapeutic setting.<br />
To assess whether iPSCs derived from different somatic cell types are<br />
distinguishable, we compared the transcriptional and epigenetic<br />
patterns, as well as the in vitro differentiation potentials, of iPSCs produced<br />
from four genetically identical adult mouse cell types that differed<br />
only in the lineage from which they were derived.<br />
RESULTS<br />
Genetically matched iPSCs derived from different cell types<br />
Because the genetic background of ESCs can influence their transcriptional<br />
and functional behaviors, we used a previously described<br />
‘secondary system’ to generate genetically identical iPSCs 2,28 (Fig. 1a).<br />
Briefly, iPSCs were generated from somatic cells using doxycycline-inducible<br />
lentiviruses expressing Oct4, Sox2, Klf4 and c-Myc 29 , and<br />
then injected into blastocysts to produce isogenic chimeric mice.<br />
1 Howard Hughes Medical Institute and Department of Stem Cell and Regenerative Biology, Harvard University and Harvard Medical School, Cambridge,<br />
Massachusetts, USA. 2 Massachusetts General Hospital Cancer Center, Charlestown, Massachusetts, USA. 3 Massachusetts General Hospital Center for<br />
Regenerative Medicine, Boston, Massachusetts, USA. 4 Harvard Stem Cell Institute, Cambridge, Massachusetts, USA. 5 Department of Surgery, Weill Cornell<br />
Medical College, New York, New York, USA. 6 Department of Medicine, Hematology Oncology Division, Weill Cornell Medical College, New York, New York, USA.<br />
7 Joslin Diabetes Center, Boston, Massachusetts, USA. 8 Sanofi-Aventis Cambridge Genomics Center, Cambridge, Massachusetts, USA. Correspondence should be<br />
addressed to K.H. (khochedlinger@helix.mgh.harvard.edu).<br />
Received 26 March; accepted 9 July; published online 19 July 2010; doi:10.1038/nbt1667<br />
[Figure 1 graphic; panel labels recovered from the layout. (a) Secondary iPSC clone carrying dox-inducible copies of Oct4, Sox2, Klf4 and c-Myc; blastocyst injection; chimera no. 1 yields granulocytes and SMP cells, chimera no. 2 yields B cells and TTFs; + dox gives Gra-iPSC, SMP-iPSC, B-iPSC and TTF-iPSC lines, analyzed by gene expression, DNA methylation, ChIP for histone modifications and in vitro differentiation. (b) Bar charts for Cxcr4, Itgb1, Gr-1 and Lysozyme; y axis, fold GAPDH. (c) Heat maps for chi no. 1 and chi no. 2. (d) Dendrograms of SMP-iPSC1–3 and Gra-iPSC1–3 (chi no. 1) and of B-iPSC1–3 and TTF-iPSC1–3 (chi no. 2).]<br />
Figure 1 iPSCs derived from different cell types are transcriptionally distinguishable. (a) Flow chart explaining the derivation and analysis of genetically<br />
matched iPSCs from different cell types. Secondary iPSCs were first injected into blastocysts to generate chimeric mice, from which the indicated somatic<br />
cell types were isolated. Exposure of these cells to doxycycline (dox) then gave rise to iPSCs. ChIP, chromatin immunoprecipitation. (b) Quantification of<br />
the expression levels of Cxcr4, Itgb1, Gr-1 and Lysozyme by quantitative PCR in SMP-iPSCs, in red, and Gra-iPSCs, in gray. The values were normalized to<br />
GAPDH expression; the error bars depict the s.e.m. (n = 3). (c) Heat map showing top 104 probes with highest variance in their expression levels. Left panel,<br />
SMP-iPSCs and Gra-iPSCs derived from chimera no. 1. Right panel, TTF-iPSCs and B-iPSCs derived from chimera no. 2. (d) Hierarchical, unsupervised<br />
clustering of iPSC expression profiles using the correlation distance and the Ward method. SMP-iPSCs and Gra-iPSCs were derived from chimera no. 1 (left<br />
panel), TTF-iPSCs and B-iPSCs originate from chimera no. 2 (right panel). Chi no. 1, chimera no. 1; chi no. 2, chimera no. 2.<br />
Thus, isolation of different cell types from these chimeras and their<br />
subsequent exposure to doxycycline gave rise to iPSCs with the same<br />
genetic makeup. In this study, we focused on iPSCs derived from tail<br />
tip–derived fibroblasts (TTFs), splenic B cells (B), bone marrow–<br />
derived granulocytes and skeletal muscle precursors (SMPs) 30 , which<br />
were continuously cultured for 2–3 weeks (passage 4 to 6) after picking.<br />
The pluripotency of some of these cell lines has been previously<br />
documented 2 , or was analyzed in this study (Supplementary Table<br />
1 and Supplementary Fig. 1). All cell lines grew at similar rates and<br />
independently of viral transgene expression (Supplementary Fig.<br />
2) and upregulated the endogenous pluripotency genes Nanog,<br />
Sox2 and Oct4, indicating successful molecular reprogramming<br />
(Supplementary Table 1). Moreover, all lines gave rise to differentiated<br />
teratomas, and all tested lines supported the development of<br />
chimeric animals upon blastocyst injection, demonstrating their<br />
pluripotency (Supplementary Table 1). We therefore concluded that<br />
the cell lines analyzed here qualify as bona fide iPSC lines.<br />
iPSCs produced from different cell types are transcriptionally<br />
distinguishable<br />
We first evaluated whether iPSCs derived from defined somatic cell<br />
types retain gene expression patterns indicative of their cells of origin.<br />
Specifically, we assessed the expression of cell lineage–specific<br />
candidate genes in iPSCs derived from granulocytes (Gra-iPSCs)<br />
and SMPs (SMP-iPSCs). As expected, the SMP markers Cxcr4 and<br />
Integrin B1 (Itgb1) and the granulocyte markers Lysozyme (also known as<br />
Lyz1 and Lyz2) and Gr-1 (also known as Ly6g) were expressed at considerably<br />
higher levels in the somatic cells of origin than in resultant<br />
[Figure 2 graphic; panel labels recovered from the layout. (a) Dendrograms for chi no. 1 and chi no. 2. (b) Correspondence analysis plots of Gra-iPSC1–3 and SMP-iPSC1–3 (chi no. 1) and of B-iPSC1–3 and TTF-iPSC1–3 (chi no. 2); scale d = 0.02. (c) Percent of methylation (0% to 100%, or not analyzed) at Gr-1, Lysozyme, Itgb1 and Cxcr4 in SMP-iPSC1–3, Gra-iPSC1–3 and ESC. (d) Bar charts for Gr-1, Cxcr4, Lysozyme and Itgb1; y axis, percent input; series H3Ac, H3K4me3, H3K27me3 and IgG; samples Gra, SMP, Gra-iPSC and SMP-iPSC.]<br />
Figure 2 iPSCs derived from different cell types exhibit distinguishable epigenetic signatures. (a) Hierarchical unsupervised clustering analysis of<br />
HELP genome-wide methylation data from indicated iPSC lines. (b) Correspondence analysis of SMP-iPSCs and Gra-iPSCs (left panel) from chimera<br />
no. 1, TTF-iPSCs and B-iPSCs (right panel) from chimera no. 2. (c) Graphic representation of DNA methylation quantification of specific CpGs<br />
(circles) in the promoter regions of the indicated candidate genes using EpiTYPER DNA methylation analyses. Yellow indicates 0% methylation and<br />
blue 100% methylation. (d) Chromatin immunoprecipitation (ChIP) for H3 pan-acetylated (H3Ac, in blue), H3K4 trimethylated (H3K4me3, in green),<br />
H3K27 trimethylated (H3K27me3, in red) and isotype control (IgG, in light blue) of granulocytes (Gra), SMPs, Gra-iPSCs and SMP-iPSCs. Chi no. 1,<br />
chimera no. 1; chi no. 2, chimera no. 2. The error bars depict the s.e.m. (n = 3).<br />
[Figure 3 graphic, first part of the extracted layout. Experimental outline labels: 5,000 cells/ml, 6 days, EBs, dissociate and plate 100,000/ml. Bar chart of EB diameter (in arbitrary units) for B-iPSC, TTF-iPSC, Gra-iPSC and SMP-iPSC. P-value annotations present in this part: P < 0.001 (twice), P < 0.05.]<br />
iPSCs (Supplementary Fig. 3). Moreover, SMP-iPSCs expressed substantially<br />
higher levels of Cxcr4 and Itgb1 than did Gra-iPSCs (Fig.<br />
1b), and Gra-iPSCs showed higher expression levels of Lysozyme<br />
and Gr-1 compared with SMP-iPSCs (Fig. 1b). Together, these data<br />
suggest that iPSCs retain a transcriptional memory of their somatic<br />
cell of origin.<br />
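The "fold GAPDH" values in Fig. 1b can be illustrated with a minimal sketch. It assumes the common ΔCt approach (relative expression = 2^-(Ct target - Ct reference)) and roughly 100% amplification efficiency; the text states only that values were normalized to GAPDH, so this method is an assumption.<br />
The function names and the Ct values are hypothetical and are not data from the study.<br />

```python
from math import sqrt
from statistics import mean, stdev


def relative_expression(ct_target, ct_reference):
    """Expression of a target gene relative to a reference gene, 2**-(dCt).

    Assumption: one doubling per PCR cycle (about 100% efficiency).
    """
    return 2.0 ** -(ct_target - ct_reference)


def mean_and_sem(values):
    """Mean and standard error of the mean (sample s.d. / sqrt(n))."""
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical Ct values for three replicates (illustrative only).
cxcr4_ct = [26.0, 26.0, 27.0]
gapdh_ct = [22.0, 22.0, 22.0]

fold_gapdh = [relative_expression(t, r) for t, r in zip(cxcr4_ct, gapdh_ct)]
avg, sem = mean_and_sem(fold_gapdh)
print(f"mean fold GAPDH = {avg:.4f}, s.e.m. = {sem:.4f} (n = {len(fold_gapdh)})")
```

With n = 3, as in the figure legend, the s.e.m. is the sample standard deviation divided by the square root of 3.<br />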
To test this notion globally, we compared the transcriptional profiles<br />
of iPSC lines originating from SMPs (n = 3) with those derived from<br />
granulocytes (n = 3), as well as expression profiles of iPSC lines originating<br />
from B cells (n = 3) with those produced from TTFs (n = 3).<br />
Note that iPSCs were compared with each other only if they originated<br />
from the same chimeric mouse (SMP-iPSCs versus Gra-iPSCs and<br />
B-iPSCs versus TTF-iPSCs) (Fig. 1a) to eliminate potential variability<br />
[Figure 3 graphic, second part of the extracted layout. Culture labels from the experimental outline: EPO, IL-3/M-CSF, cytokines; durations: 4 days, 7 days, 8 days. Image titles: eryPs, macrophages, mixed colonies. Bar chart y axes: EryP colonies, Macrophage colonies, Mixed colonies, each shown for chi no. 1 and chi no. 2. P-value annotations present in this part: P < 0.07, P < 0.05.]<br />
Figure 3 iPSCs derived from different cell types have distinctive in vitro differentiation potentials. (a) Experimental<br />
outline. iPSCs were first differentiated into embryoid bodies. At day 6, embryoid bodies were dissociated and<br />
plated in conditions to favor differentiation into erythrocyte progenitors (eryP) and macrophage and mixed<br />
hematopoietic colonies. (b) Phase contrast images showing embryoid bodies derived from B-iPSCs, TTF-iPSCs,<br />
Gra-iPSCs and SMP-iPSCs at the same magnification. (c) Quantification of embryoid body sizes derived from B-iPSCs,<br />
TTF-iPSCs, Gra-iPSCs and SMP-iPSCs; the diameter of the embryoid bodies was measured using arbitrary units<br />
(AU). The error bars depict the s.e.m. (n = 30). (d) Representative images of erythrocyte progenitors (eryPs),<br />
macrophage colonies and mixed hematopoietic colonies. (e–g) Quantification of in vitro differentiation potentials<br />
of the different iPSCs into EryPs (e), macrophage colonies (f) and mixed hematopoietic colonies (g). Chi no. 1,<br />
chimera no. 1; chi no. 2, chimera no. 2. The error bars depict the s.e.m. (n = 12).<br />
between different experiments and<br />
individual animals. All iPSC lines<br />
analyzed were between passage (p)<br />
4 and 6. There were 1,388 genes differentially<br />
expressed (twofold, corrected<br />
P = 0.05) between SMP-iPSCs<br />
and Gra-iPSCs, and 1,090 genes<br />
between B-iPSCs and TTF-iPSCs<br />
(Supplementary Table 2). An analysis<br />
of the 100 genes with the greatest<br />
range of expression levels across<br />
all samples indicated that iPSCs<br />
with the same cell of origin clustered<br />
together (Fig. 1c). Consistent<br />
with this observation, unsupervised<br />
hierarchical clustering (Fig.<br />
1d) as well as principal component<br />
analysis (Supplementary Fig. 4)<br />
of all genes placed SMP-iPSCs and<br />
Gra-iPSCs, as well as B-iPSCs and<br />
TTF-iPSCs, into different groups<br />
according to their cells of origin.<br />
Notably, Gene Ontology (GO)<br />
analysis of the 100 genes with the<br />
greatest range of expression between<br />
SMP-iPSCs and Gra-iPSCs indicated<br />
an enrichment for genes belonging<br />
to the categories ‘myofibril’ (7.6-<br />
fold enrichment), ‘contractile fiber’<br />
(7.3-fold enrichment) and ‘muscle<br />
development’ (5.9-fold enrichment)<br />
as well as ‘B-cell activation’<br />
(6.8-fold enrichment) and ‘leukocyte<br />
activation’ (3.7-fold enrichment)<br />
(when compared with the<br />
expected background). Together,<br />
these results show that genetically<br />
identical iPSCs obtained from four<br />
different somatic cell types are distinguishable<br />
from each other using<br />
genome-wide transcriptional analyses,<br />
further supporting the notion<br />
that the donor cell type influences<br />
the overall gene expression pattern<br />
of resultant iPSCs.<br />
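Two calculations in the paragraph above can be sketched briefly: picking the genes with the greatest range of expression across samples, and computing a fold enrichment against an expected background.<br />
This is a minimal sketch under stated assumptions. "Range" is taken as maximum minus minimum, and fold enrichment as the observed fraction of annotated genes divided by the background fraction. The function names and all numbers are hypothetical, and the authors' actual software is not described in this section.<br />

```python
def top_range_genes(expression, n):
    """Return the n gene names with the largest (max - min) spread across samples.

    `expression` maps a gene name to its expression values, one per sample.
    """
    spread = {gene: max(vals) - min(vals) for gene, vals in expression.items()}
    return sorted(spread, key=spread.get, reverse=True)[:n]


def fold_enrichment(hits_in_list, list_size, hits_in_background, background_size):
    """Observed fraction of annotated genes in the list over the background fraction."""
    observed = hits_in_list / list_size
    expected = hits_in_background / background_size
    return observed / expected


# Hypothetical expression values for four samples (illustrative only).
toy_expression = {
    "geneA": [1.0, 1.1, 5.0, 5.2],
    "geneB": [2.0, 2.1, 2.0, 2.2],
    "geneC": [0.5, 3.0, 0.4, 3.1],
}
selected = top_range_genes(toy_expression, 2)

# Hypothetical counts: 5 of 100 selected genes carry a term found on 250 of 20,000 genes.
enrichment = fold_enrichment(5, 100, 250, 20000)
print(selected, round(enrichment, 2))
```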
To determine the effect on gene<br />
expression patterns of deriving<br />
iPSCs from different animals in<br />
independent experiments, we compared the expression profiles of<br />
Gra-iPSCs derived from chimera no. 1 (n = 3) with Gra-iPSCs from<br />
chimera no. 2 (n = 3) as well as with SMP-iPSCs from chimera no. 1<br />
and TTF-iPSCs from chimera no. 2 (Fig. 1a). Hierarchical clustering<br />
separated Gra-iPSCs according to their origin from different animals,<br />
suggesting a significant contribution of this experimental variable to<br />
gene expression patterns (Supplementary Fig. 5). However, when the<br />
expression data from TTF-iPSCs and SMP-iPSCs were included in the<br />
analysis, we found that differences due to cell of origin were stronger<br />
than those arising from variations in experimental conditions or animals.<br />
These data reinforce the observation that iPSCs derived from<br />
different somatic cell types are transcriptionally distinguishable, even<br />
when they originate from different animals.<br />
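The clustering logic used throughout this section can be sketched in a few lines. The figure legend names the correlation distance and the Ward method; to stay short and dependency-free, the sketch below keeps the correlation distance (1 minus Pearson's r) but swaps in average linkage, so it is not the authors' procedure.<br />
The sample names and expression values are hypothetical.<br />

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (undefined for constant input)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_distance(x, y):
    """Distance that is 0 for perfectly correlated profiles and 2 for anti-correlated ones."""
    return 1.0 - pearson(x, y)


def cluster(profiles, k):
    """Agglomerative clustering with average linkage, merged down to k clusters."""
    clusters = [[name] for name in profiles]

    def linkage(a, b):
        dists = [correlation_distance(profiles[i], profiles[j]) for i in a for j in b]
        return sum(dists) / len(dists)

    while len(clusters) > k:
        # Find and merge the closest pair of clusters.
        pairs = ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical expression profiles over four genes (illustrative only).
toy_profiles = {
    "SMP-iPSC1": [5.0, 4.8, 1.0, 1.2],
    "SMP-iPSC2": [5.1, 4.9, 1.1, 0.9],
    "Gra-iPSC1": [1.0, 1.2, 5.0, 4.7],
    "Gra-iPSC2": [0.9, 1.1, 4.8, 5.0],
}
groups = cluster(toy_profiles, 2)
print(groups)
```

In this toy input the lines group by cell of origin because profiles from the same source are strongly correlated, which is the pattern the text reports for early-passage iPSCs.<br />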
Figure 4 Continuous passaging of iPSCs abrogates transcriptional, epigenetic and functional differences. (a) Hierarchical<br />
unsupervised clustering of expression profiles from B-iPSCs, T-iPSCs, TTF-iPSCs and Gra-iPSCs from chimera no. 2. Left panel<br />
shows clustering analysis of all iPSC samples at p4, the middle panel at p10 and the right panel at p16. (b) Number of<br />
differentially expressed probes between pairs of iPSC samples used in a; iPSCs at p4 are shown in blue bars, iPSCs at p10<br />
are shown in orange bars and iPSCs at p16 are shown in red bars. The number of differentially expressed probes between iPSCs<br />
was calculated using a pairwise analysis (twofold), with t-test P = 0.05, with Benjamini and Hochberg correction (n = 3).<br />
(c) Venn diagram and GO analysis showing overlap of genes that change from p4 to p16 in Gra-iPSCs, TTF-iPSCs and B-iPSCs.<br />
Red line marks functional GO cluster of genes shared between all three iPSC groups. Black line marks functional GO cluster<br />
of genes shared by at least two of the iPSC groups. Functional ontology cluster analysis was performed using the DAVID<br />
algorithm. (d) Hierarchical unsupervised clustering using HELP genome-wide methylation profiles of B-iPSCs and TTF-iPSCs at<br />
p16. (e–g) Quantification of in vitro differentiation potentials of B-iPSCs and TTF-iPSCs at p16 into EryPs (e), macrophage<br />
colonies (f) and mixed hematopoietic colonies (g). The error bars depict the s.e.m. (n = 9).<br />
[Figure 4 graphic; labels recovered from the layout. (a) Dendrograms of T-iPSC1–3, TTF-iPSC1–3, Gra-iPSC1–3 and B-iPSC1–3 at p4, p10 and p16. (b) y axis, differentially expressed probes; comparisons: T-iPSC vs. B-iPSC, T-iPSC vs. TTF-iPSC, B-iPSC vs. TTF-iPSC, T-iPSC vs. Gra-iPSC, B-iPSC vs. Gra-iPSC, Gra-iPSC vs. TTF-iPSC; legend p4, p10, p16. (c) Venn diagram sets: Gra-iPSC p4 vs. p16, TTF-iPSC p4 vs. p16, B-iPSC p4 vs. p16; region counts as extracted (assignment to regions not recoverable): 685, 474, 125, 56, 508, 68, 15.<br />
Organ development: EGLN1, EN2, AA409316, GYS1, IQGAP2, LOXL3, MGP, NDRG1, NOPE, PHF21A, BC021588, CYB5R3, SNRPD3, NM_008681, NM_030247.<br />
Functional clusters, in extracted order: Tube morphogenesis; Positive regulation of cellular process; Morphogenesis of a branching structure; Response to heat; Organ development; mRNA metabolic process; Cellular component assembly; Cartilage and skeletal development; Regulation of cell cycle; Tissue development; Spermatogenesis.<br />
Enrichment scores, in extracted order: 2.75; 2.09; 2.04; 1.98; 1.96; 1.95; 1.79; 1.75; 1.69; 1.41; 1.40.<br />
(d) Dendrogram of B-iPSC1–3 and TTF-iPSC1–3. (e–g) Bar charts of EryP colonies, macrophage colonies and mixed colonies for B-iPSC and TTF-iPSC.]<br />
To exclude the possibility that the observed gene expression differences<br />
were due to the specific secondary system used, we derived<br />
iPSCs from SMPs, granulocytes, B cells and peritoneal fibroblasts<br />
from reprogrammable mice 31 , which carry dox-inducible copies of<br />
all four reprogramming factors in a defined genomic locus. All iPSC<br />
lines grew independently of dox and gave rise to differentiated teratomas<br />
(Supplementary Fig. 6a). Analysis of gene expression profiles of<br />
these lines at p4 showed clustering according to their cells of origin,<br />
with the exception of peritoneal fibroblast–derived iPSCs, which<br />
may be a consequence of the heterogeneity of the starting population.<br />
Collectively, these results corroborate the notion that iPSCs<br />
generated from different cell types exhibit distinct transcriptional<br />
patterns (Supplementary Fig. 6b).<br />
iPSCs derived from different cell types exhibit distinguishable<br />
epigenetic patterns<br />
We next asked whether the differential gene expression patterns we<br />
observed correlated with differences in epigenetic marks. To this end, we<br />
performed a genome-wide, restriction enzyme–based methylation analysis<br />
of promoters termed ‘HpaII tiny fragment enrichment by ligation-mediated<br />
PCR’ (HELP) on the same cell lines we used for expression<br />
analysis. Unsupervised hierarchical clustering showed that Gra-iPSCs<br />
and SMP-iPSCs, as well as B-iPSCs and TTF-iPSCs, which clustered<br />
separately in the transcriptional assays, were also distinguishable based<br />
on their methylation patterns (Fig. 2a). Correspondence analysis of the<br />
same samples corroborated this finding (Fig. 2b), indicating that the<br />
donor cell type affects not only the overall transcriptional pattern but<br />
also the promoter methylation pattern of resultant iPSCs.<br />
Despite the separation of Gra-iPSCs from SMP-iPSCs and of<br />
TTF-iPSCs from B-iPSCs (Fig. 2a,b) by hierarchical clustering, we<br />
detected few loci that were differentially methylated with statistical<br />
significance using supervised analysis (69 genes between Gra-iPSCs<br />
and SMP-iPSCs and 0 genes between B-iPSCs and TTF-iPSCs;<br />
Supplementary Table 3). To complement these results, we interrogated<br />
the DNA methylation status at the promoter regions of the<br />
previously analyzed markers Cxcr4, Itgb1, Lysozyme and Gr-1 (Fig.<br />
1b) using EpiTYPER DNA methylation analysis, which quantifies<br />
gene-specific CpG methylation. We failed to detect differences in the<br />
methylation levels of these candidate genes between SMP-iPSCs and<br />
Gra-iPSCs (Fig. 2c), further indicating that methylation differences<br />
are more subtle than the observed gene expression differences and<br />
raising the possibility that other chromatin marks may be responsible<br />
for the observed expression differences.<br />
Indeed, we observed high levels of the activating marks H3Ac and<br />
H3K4me3 and low levels of the repressive mark H3K27me3 at the promoters<br />
of Cxcr4 and Itgb1 in SMPs and at the promoters of Lysozyme and<br />
Gr-1 in granulocytes, respectively, consistent with their abundant expression<br />
in these cell types (Fig. 2d). Notably, SMP-iPSCs, which showed<br />
higher expression levels of Cxcr4 and Itgb1 than did Gra-iPSCs (Fig.<br />
1b), were enriched for H3K4me3 compared with Gra-iPSCs at these<br />
two genes. A similar pattern was observed for the granulocyte-specific<br />
genes in Gra-iPSCs compared with SMP-iPSCs, with Gr-1 and Lysozyme<br />
being elevated for H3K4me3 (Fig. 2d). These data show that the observed<br />
expression differences among iPSCs derived from different cell types may<br />
be predominantly the consequence of differences in histone marks, further<br />
suggesting that iPSCs retain an epigenetic memory of their cells of origin.<br />
iPSCs derived from different cell types have distinctive in vitro<br />
differentiation potentials<br />
Because the gene expression differences we observed among different iPSC<br />
lines affected genes known to be involved in the lineage-specific differentiation<br />
and function of the somatic cell types from which they were derived, we<br />
reasoned that these differences might affect their capacity to differentiate<br />
into defined cell lineages. Thus, we evaluated the autonomous differentiation<br />
potential of the four types of iPSC lines by assessing their abilities to<br />
produce embryoid bodies, erythrocyte progenitors, macrophages and mixed<br />
hematopoietic colonies using established semiquantitative differentiation<br />
protocols (Fig. 3a). Most notably, TTF-iPSCs produced significantly smaller<br />
and fewer embryoid bodies compared with all the other iPSC lines<br />
(P < 0.001; Fig. 3b,c). Moreover, the embryoid bodies derived from TTF-iPSCs<br />
generated relatively few erythrocyte, macrophage and mixed colony<br />
progenitors compared with B-iPSCs derived from the same animal despite<br />
equal numbers of input cells, indicating striking differences in the<br />
differentiation potentials of these iPSCs (Fig. 3d–g). In contrast, SMP-iPSCs<br />
and Gra-iPSCs showed equivalent abilities to produce embryoid bodies<br />
(Fig. 3d–g). However, Gra-iPSCs gave rise to erythrocyte, macrophage<br />
and mixed colonies at higher efficiencies than SMP-iPSCs, suggesting<br />
a pattern of differentiation that reflects their cells of origin. Together,<br />
these data show that the cell type of origin may bias the differentiation<br />
potential of resultant iPSC lines.<br />
[Figure 5 graphic; labels recovered from the layout. Stages: cell of origin; partially reprogrammed cells (no endogenous pluripotent gene expression; no contribution to chimeras; teratoma formation); early passage iPSC (activation of endogenous pluripotency genes; promoter demethylation; teratoma formation; chimera contribution; transcriptionally distinguishable; transient epigenetic memory; altered differentiation); late passage iPSC (activation of endogenous pluripotency genes; promoter demethylation; teratoma formation; chimera contribution). Phase labels: reprogramming (transgene-dependent phase); reprogramming (transgene-independent phase).]<br />
Figure 5 Model summarizing the presented data. iPSCs derived from different somatic cell<br />
types retain a transient epigenetic and transcriptional memory of their cell type of origin at early<br />
passage, despite acquiring pluripotent gene expression, transgene-independent growth and the<br />
ability to contribute to tissues in chimeras. Continuous passaging resolves these differences,<br />
giving rise to iPSCs that are molecularly and functionally indistinguishable. Note the difference<br />
between early passage iPSCs and partially reprogrammed cells, which require continuous<br />
viral transgene expression and fail to activate endogenous pluripotency genes or support the<br />
development of viable mice.<br />
Continuous passaging of iPSCs abrogates transcriptional,<br />
epigenetic and functional differences<br />
Previously published data suggest that early-passage human iPSCs<br />
derived from fibroblasts are transcriptionally distinct from late-passage<br />
iPSCs 32 . However, that study did not examine the effect of passaging on<br />
iPSC functionality. We therefore wondered whether continuous passaging<br />
of the various iPSC lines would eliminate the observed differences<br />
in gene expression and differentiation potential. For this analysis, we<br />
added to the B-iPSC/TTF-iPSC group, studied before (Figs. 1 and 2a,b), a<br />
new set of T cell– and granulocyte-derived iPSCs, which were all derived<br />
from chimera no. 2. These 12 iPSC lines were subjected to several additional<br />
rounds of passaging under identical culture conditions, and RNA<br />
was harvested at p10 and p16 for expression profiling. Whereas unsupervised<br />
hierarchical clustering of these cell lines at early passage (p4)<br />
clearly separated each of the different iPSC lines according to their cells<br />
of origin (Fig. 4a, left panel), unsupervised clustering of these lines at p10<br />
showed that B-iPSCs, TTF-iPSCs and T-iPSCs were indistinguishable<br />
from each other, whereas the Gra-iPSCs still clustered together (Fig. 4a,<br />
middle panel). Further passaging of these cells until p16 entirely eliminated<br />
these differences (Fig. 4a, right panel). Together, these data indicate<br />
that continuous cell division resolves transcriptional differences<br />
among iPSC lines. Consistent with this observation, the total number<br />
of differentially expressed genes between various pairs of iPSC lines<br />
derived from different cellular origins was reduced from ~500–2,000<br />
in early-passage cultures to only ~50 or even 0 in late-passage cultures,<br />
further demonstrating that after extensive in vitro propagation, these<br />
iPSC lines have become very similar to each other (Fig. 4b).<br />
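The counts of differentially expressed genes rest on two filters named in the text and in the Fig. 4 legend: a twofold change and a t-test P value of 0.05 after Benjamini and Hochberg correction.<br />
The sketch below shows how such a filter might be applied once per-gene P values are available. Treating the thresholds as inclusive, and expressing the fold change on a log2 scale, are assumptions. The function names and numbers are hypothetical.<br />

```python
from math import log2


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def differentially_expressed(genes, log2_fold_changes, p_values, alpha=0.05, min_fold=2.0):
    """Genes with at least `min_fold` change (either direction) and adjusted p <= alpha."""
    adjusted = benjamini_hochberg(p_values)
    threshold = log2(min_fold)
    return [
        gene
        for gene, lfc, q in zip(genes, log2_fold_changes, adjusted)
        if abs(lfc) >= threshold and q <= alpha
    ]


# Hypothetical per-gene statistics (illustrative only).
toy_genes = ["g1", "g2", "g3", "g4"]
toy_lfc = [1.5, 0.5, -2.0, 3.0]
toy_p = [0.01, 0.04, 0.03, 0.005]
print(differentially_expressed(toy_genes, toy_lfc, toy_p))
```

Repeating the same filter for each pair of iPSC groups at p4, p10 and p16 would give per-pair counts of the kind plotted in Fig. 4b.<br />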
Analysis of the genes whose expression changed between p4 and p16<br />
in Gra-iPSCs, B-iPSCs and TTF-iPSCs showed 25% overlap with at least<br />
one of the other two groups of iPSC lines, suggesting that iPSCs undergo<br />
some common changes during passaging, irrespective of their cell of origin<br />
(Fig. 4c). GO analysis of these changes indicated a strong enrichment<br />
for developmental regulators. Moreover, the only GO cluster common to<br />
all three groups was ‘organ development’, indicating that the passaging of<br />
iPSCs results in a change of differentiation-associated gene expression<br />
patterns (Fig. 4c). The expression levels of the pluripotency genes Sox2<br />
and Oct4, which are high already at early passage (Supplementary Table<br />
1), increased even further during the passaging process, supporting the<br />
notion that the pluripotency network becomes increasingly solidified<br />
during culture (Supplementary Fig. 7), consistent with a previous report<br />
showing gradual upregulation of pluripotency-associated genes upon<br />
passaging of human iPSC lines 32 .<br />
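The overlap figure quoted above (genes changed between p4 and p16 that are shared with at least one other iPSC group) is a set calculation, sketched below.<br />
The gene names are hypothetical placeholders, and the function name is an assumption introduced for illustration.<br />

```python
def shared_fraction(own, *others):
    """Fraction of genes in `own` that also appear in at least one of the other sets."""
    if not own:
        return 0.0
    shared = own & set().union(*others)
    return len(shared) / len(own)


# Hypothetical sets of genes changed between p4 and p16 (illustrative only).
gra_changed = {"a", "b", "c", "d"}
ttf_changed = {"a", "x"}
b_changed = {"d", "y", "z"}

gra_shared = shared_fraction(gra_changed, ttf_changed, b_changed)
common_to_all = gra_changed & ttf_changed & b_changed
print(gra_shared, sorted(common_to_all))
```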
To evaluate whether the passaging of iPSCs attenuates the observed<br />
epigenetic differences, we performed HELP analysis on B-iPSCs and<br />
TTF-iPSCs at late passage. In contrast to early-passage iPSCs, the late-passage<br />
iPSCs could not be separated by hierarchical unsupervised<br />
clustering analysis based on their cells of origin (Fig. 4d). Accordingly,<br />
the methylation levels of histones at candidate genes in Gra-iPSCs and<br />
SMP-iPSCs became indistinguishable (Supplementary Fig. 8). Notably,<br />
several of the analyzed loci showed an enrichment for both H3K4me3<br />
and H3K27me3, indicative of bivalent domains that are characteristic of<br />
pluripotent stem cells 33 . Thus, continuous passaging leads to an equilibration<br />
of the epigenetic differences detected in early-passage iPSCs.<br />
Two possible mechanisms could account for the observed loss of<br />
epigenetic and transcriptional memory with increased passage number:<br />
(i) passive replication-dependent loss of somatic marks in the majority<br />
of iPSCs and (ii) selection of rare, preexisting, fully reprogrammed cells<br />
over time. Because the selection model predicts that such rare clones<br />
would have a growth or survival advantage, we would expect to see<br />
impaired growth rates of bulk iPSC cultures at early passage compared<br />
with late passage, which we did not observe (Supplementary Fig. 9a).<br />
We also did not detect significant differences when the growth rates of<br />
single-cell clones established from early and late passage iPSC lines were<br />
examined using a colorimetric assay (XTT assay) that detects metabolic<br />
activity (Supplementary Fig. 10) or by measuring the increase<br />
in cell numbers on three consecutive days (Supplementary Figs. 11<br />
and 12). Similarly, an analysis of the colony formation efficiency of<br />
single cell–sorted iPSCs from early- and late-passage cultures did not<br />
yield detectable differences (Supplementary Fig. 9b). Collectively, these<br />
data argue against the presence of rare subclones that become selected<br />
over time and are consistent with the notion that all iPSC lines gradually<br />
resolve transcriptional and epigenetic differences with increased<br />
passaging. However, our results do not exclude a combined model<br />
involving passive resolution of epigenetic marks as well as selection<br />
of multiple clones.<br />
Finally, we asked whether the similar transcriptional and epigenetic<br />
patterns of late-passage iPSCs derived from distinct cells of origin would<br />
translate into an equalization of their differentiation potentials. We first<br />
performed an embryoid-body formation assay at different passages for<br />
TTF-iPSCs and B-iPSCs, which showed a strong difference at early passage.<br />
TTF-iPSCs gave rise to embryoid bodies similar in size to those of B-iPSCs<br />
around p10–p12 (Supplementary Fig. 13a,b) and were indistinguishable<br />
at p16 (Supplementary Fig. 13c,d). Moreover, embryoid bodies derived<br />
from TTF-iPSCs and B-iPSCs at p16 differentiated into similar numbers<br />
of erythrocyte (Fig. 4e), macrophage (Fig. 4f) and mixed-colony progenitors<br />
(Fig. 4g), thus proving that extensive cellular passaging eliminates<br />
differences in the differentiation potentials of these iPSCs.<br />
DISCUSSION<br />
Our study shows that genetically matched iPSCs retain a transient transcriptional<br />
and epigenetic memory of their cell of origin at early passage,<br />
which can substantially affect their potential to differentiate into<br />
embryoid bodies and different hematopoietic cell types (Fig. 5). These<br />
molecular and functional differences are lost upon continuous passaging,<br />
however, indicating that complete reprogramming is a gradual<br />
process that continues beyond the acquisition of a bona fide iPSC state<br />
as measured by the activation of endogenous pluripotency genes, viral<br />
transgene–independent growth and the ability to differentiate into<br />
cell types of all three germ layers. Notably, the previously seen silencing<br />
of the Dlk1-Dio3 locus in many iPSC lines 22 is not affected by the<br />
passaging of cells (data not shown). Of note, the early-passage iPSCs<br />
described here are different from “partially reprogrammed iPSCs” 34,35 ,<br />
which depend on the continuous expression of viral transgenes and do<br />
not activate and demethylate pluripotency genes or contribute to the<br />
formation of viable chimeras (Fig. 5).<br />
The mechanism by which passaging eliminates the molecular and<br />
functional differences between iPSCs of different origins remains to<br />
be determined. Three key observations argue against the possibility<br />
of selective expansion of a rare subset of completely reprogrammed<br />
iPSCs: (i) both early- and late-passage iPSCs had similar proliferation<br />
rates; (ii) there was little variability in the growth rate of single-cell<br />
iPSC clones from early- and late-passage lines; and (iii) the number<br />
of passages required to resolve cell-of-origin differences was dependent<br />
upon the starting cell type. These observations suggest that the<br />
consolidation of the pluripotent transcriptional network upon passaging<br />
is a slow process, potentially facilitated by a positive feedback<br />
mechanism that gradually resolves the residual cell-of-origin–specific<br />
epigenetic marks and transcriptional patterns. In accordance with this<br />
idea is the finding that telomeres become gradually elongated with<br />
increased passage number of iPSCs 36 . Our results are also consistent<br />
with the previous observation that cloned embryos often retain donor<br />
cell–specific transcriptional patterns and do not efficiently activate<br />
embryonic genes over many cell divisions 37–40 , suggesting possible<br />
similarities in the mechanisms of reprogramming by nuclear transfer<br />
and induced pluripotency.<br />
Because of the lack of ESC lines genetically matched to the secondary<br />
iPSC lines used here, we did not include ESC lines in our<br />
comparative analysis. Nevertheless, the present results may help to<br />
explain some of the previously reported differences between ESCs<br />
and iPSCs 41,42 . Some of these studies compared late-passage ESC lines<br />
with iPSC lines of undefined, but presumably earlier, passage that may<br />
not yet have reached an ESC-equivalent ground state. It should be<br />
informative to revisit these studies with genetically matched, transgene-free<br />
late-passage iPSCs to determine whether this abrogates such<br />
gene expression and differentiation differences.<br />
The observed tendency of early-passage iPSC lines to differentiate preferentially<br />
into the cell lineage of origin could potentially be exploited<br />
in clinical settings to produce certain somatic cell types that have been<br />
difficult to obtain from ESCs thus far. However, these data also serve as a<br />
cautionary note for ongoing attempts to recapitulate disease phenotypes<br />
in vitro using patient-specific, early-passage iPSC lines, as the epigenetic,<br />
transcriptional and functional ‘immaturity’ of these cells might confound<br />
the data obtained from them. Further elucidation of the molecular indicators<br />
of fully reprogrammed iPSCs should help in the establishment of<br />
standardized iPSC lines that can be compared with confidence in basic<br />
biological and drug discovery studies.<br />
METHODS<br />
Methods and any associated references are available in the online version<br />
of the paper at http://www.nature.com/naturebiotechnology/.<br />
Accession code. GEO: GSE22043, GSE22827, GSE22908.<br />
Note: Supplementary information is available on the <strong>Nature</strong> Biotechnology website.<br />
ACKNOWLEDGMENTS<br />
We thank N. Maherali and R. Walsh for helpful suggestions and critical reading of<br />
the manuscript, B. Wittner for statistical advice, J. LaVecchio, G. Buruzula,<br />
K. Folz-Donahue and L. Prickett for expert cell sorting and K. Coser for technical<br />
assistance. J.M.P. was supported by an MGH ECOR fellowship, E.A. by a Jane<br />
Coffin Childs fellowship, M.S. by a Schering fellowship and K.Y.T. by the Agency<br />
of Science, Technology and Research Singapore. Support to A.M. was from the<br />
Lymphoma Society, SCOR no. 7132-08; to T.E. from National Institutes of Health<br />
(NIH) grant HL056182 and NYSTEM; to A.J.W. in part from the Burroughs<br />
Wellcome Fund, Harvard Stem Cell Institute, Peabody Foundation, and NIH 1<br />
DP2 OD004345-01, and the Joslin Diabetes Center DERC (P30DK036836); to<br />
K.H. from Howard Hughes Medical Institute, the NIH Director’s Innovator Award<br />
and the Harvard Stem Cell Institute. The content is solely the responsibility of the<br />
authors and does not necessarily represent the official views of the NIH.<br />
AUTHOR CONTRIBUTIONS<br />
J.M.P. and K.H. conceived the study, interpreted results and wrote the manuscript;<br />
J.M.P. performed most of the experiments with help from W.K.; S.L. and T.E.<br />
performed and interpreted in vitro differentiation assays; M.E.F and A.M.<br />
performed and analyzed HELP methylation experiments; K.Y.T. and A.J.W. isolated<br />
SMPs and derived most SMP-iPSCs; T.S. and S.N. performed expression arrays;<br />
and S.E., E.A. and M.S. provided essential study material. All authors gave critical<br />
input to the manuscript draft.<br />
COMPETING FINANCIAL INTERESTS<br />
The authors declare competing financial interests: details accompany the full-text<br />
HTML version of the paper at http://www.nature.com/naturebiotechnology/.<br />
Published online at http://www.nature.com/naturebiotechnology/.<br />
Reprints and permissions information is available online at http://npg.nature.com/<br />
reprintsandpermissions/.<br />
Note added in proof: We thank George Daley for sharing unpublished results,<br />
which show similar differences in DNA methylation patterns and differentiation<br />
propensity of iPSCs derived from distinctive cell types. Of note, this report 43 also<br />
suggests that somatic cell nuclear transfer more faithfully reprograms cells to a<br />
pluripotent state than transcription factor overexpression.<br />
1. Aoi, T. et al. Generation of pluripotent stem cells from adult mouse liver and stomach<br />
cells. Science 321, 699–702 (2008).<br />
2. Eminli, S. et al. Differentiation stage determines potential of hematopoietic cells for<br />
reprogramming into induced pluripotent stem cells. Nat. Genet. 41, 968–976 (2009).<br />
3. Eminli, S., Utikal, J., Arnold, K., Jaenisch, R. & Hochedlinger, K. Reprogramming of<br />
neural progenitor cells into induced pluripotent stem cells in the absence of exogenous<br />
Sox2 expression. Stem Cells 26, 2467–2474 (2008).<br />
4. Hanna, J. et al. Direct reprogramming of terminally differentiated mature B lymphocytes<br />
to pluripotency. Cell 133, 250–264 (2008).<br />
5. Lowry, W.E. et al. Generation of human induced pluripotent stem cells from dermal<br />
fibroblasts. Proc. Natl. Acad. Sci. USA 105, 2883–2888 (2008).<br />
6. Park, I.H. et al. Reprogramming of human somatic cells to pluripotency with defined<br />
factors. <strong>Nature</strong> 451, 141–146 (2008).<br />
7. Stadtfeld, M., Brennand, K. & Hochedlinger, K. Reprogramming of pancreatic beta<br />
cells into induced pluripotent stem cells. Curr. Biol. 18, 890–894 (2008).<br />
8. Takahashi, K. et al. Induction of pluripotent stem cells from adult human fibroblasts<br />
by defined factors. Cell 131, 861–872 (2007).<br />
9. Takahashi, K. & Yamanaka, S. Induction of pluripotent stem cells from mouse embryonic<br />
and adult fibroblast cultures by defined factors. Cell 126, 663–676 (2006).<br />
10. Yu, J. et al. Induced pluripotent stem cell lines derived from human somatic cells.<br />
Science 318, 1917–1920 (2007).<br />
11. Loh, Y.H. et al. Generation of induced pluripotent stem cells from human blood. Blood<br />
113, 5476–5479 (2009).<br />
12. Aasen, T. et al. Efficient and rapid generation of induced pluripotent stem cells from<br />
human keratinocytes. Nat. Biotechnol. 26, 1276–1284 (2008).<br />
13. Maherali, N. et al. A high-efficiency system for the generation and study of human<br />
induced pluripotent stem cells. Cell Stem Cell 3, 340–345 (2008).<br />
14. Utikal, J., Maherali, N., Kulalert, W. & Hochedlinger, K. Sox2 is dispensable for the<br />
reprogramming of melanocytes and melanoma cells into induced pluripotent stem<br />
cells. J. Cell Sci. 122, 3502–3510 (2009).<br />
15. Kim, J.B. et al. Pluripotent stem cells induced from adult neural stem cells by reprogramming<br />
with two factors. <strong>Nature</strong> 454, 646–650 (2008).<br />
16. Shi, Y. et al. A combined chemical and genetic approach for the generation of induced<br />
pluripotent stem cells. Cell Stem Cell 2, 525–528 (2008).<br />
17. Silva, J. et al. Promotion of reprogramming to ground state pluripotency by signal<br />
inhibition. PLoS Biol. 6, e253 (2008).<br />
18. Miura, K. et al. Variation in the safety of induced pluripotent stem cell lines. Nat.<br />
Biotechnol. 27, 743–745 (2009).<br />
19. Ghosh, Z. et al. Persistent donor cell gene expression among human induced pluripotent<br />
stem cells contributes to differences with human embryonic stem cells. PLoS<br />
One 5, e8975 (2010).<br />
20. Soldner, F. et al. Parkinson’s disease patient-derived induced pluripotent stem cells<br />
free of viral reprogramming factors. Cell 136, 964–977 (2009).<br />
21. Okita, K., Ichisaka, T. & Yamanaka, S. Generation of germline-competent induced<br />
pluripotent stem cells. <strong>Nature</strong> 448, 313–317 (2007).<br />
22. Stadtfeld, M. et al. Aberrant silencing of imprinted genes on chromosome 12qF1 in<br />
mouse induced pluripotent stem cells. <strong>Nature</strong> 465, 175–181 (2010).<br />
23. Dimos, J.T. et al. Induced pluripotent stem cells generated from patients with ALS<br />
can be differentiated into motor neurons. Science 321, 1218–1221 (2008).<br />
24. Ebert, A.D. et al. Induced pluripotent stem cells from a spinal muscular atrophy<br />
patient. <strong>Nature</strong> 457, 277–280 (2009).<br />
25. Park, I.H. et al. Disease-specific induced pluripotent stem cells. Cell 134, 877–886<br />
(2008).<br />
26. Saha, K. & Jaenisch, R. Technical challenges in using human induced pluripotent<br />
stem cells to model disease. Cell Stem Cell 5, 584–595 (2009).<br />
27. Lee, G. et al. Modelling pathogenesis and treatment of familial dysautonomia using<br />
patient-specific iPSCs. <strong>Nature</strong> 461, 402–406 (2009).<br />
28. Wernig, M. et al. A drug-inducible transgenic system for direct reprogramming of<br />
multiple somatic cell types. Nat. Biotechnol. 26, 916–924 (2008).<br />
29. Stadtfeld, M., Maherali, N., Breault, D.T. & Hochedlinger, K. Defining molecular<br />
cornerstones during fibroblast to iPS cell reprogramming in mouse. Cell Stem Cell 2,<br />
230–240 (2008).<br />
30. Cerletti, M. et al. Highly efficient, functional engraftment of skeletal muscle stem<br />
cells in dystrophic muscles. Cell 134, 37–47 (2008).<br />
31. Stadtfeld, M., Maherali, N., Borkent, M. & Hochedlinger, K. A reprogrammable mouse<br />
strain from gene-targeted embryonic stem cells. Nat. Methods 7, 53–55 (2010).<br />
32. Chin, M.H. et al. Induced pluripotent stem cells and embryonic stem cells are distinguished<br />
by gene expression signatures. Cell Stem Cell 5, 111–123 (2009).<br />
33. Bernstein, B.E. et al. A bivalent chromatin structure marks key developmental genes<br />
in embryonic stem cells. Cell 125, 315–326 (2006).<br />
34. Mikkelsen, T.S. et al. Dissecting direct reprogramming through integrative genomic<br />
analysis. <strong>Nature</strong> 454, 49–55 (2008).<br />
35. Sridharan, R. et al. Role of the murine reprogramming factors in the induction of<br />
pluripotency. Cell 136, 364–377 (2009).<br />
36. Marion, R.M. et al. Telomeres acquire embryonic stem cell characteristics in induced<br />
pluripotent stem cells. Cell Stem Cell 4, 141–154 (2009).<br />
37. Boiani, M., Eckardt, S., Scholer, H.R. & McLaughlin, K.J. Oct4 distribution and<br />
level in mouse clones: consequences for pluripotency. Genes Dev. 16, 1209–1219<br />
(2002).<br />
38. Bortvin, A. et al. Incomplete reactivation of Oct4-related genes in mouse embryos<br />
cloned from somatic nuclei. Development 130, 1673–1680 (2003).<br />
39. Ng, R.K. & Gurdon, J.B. Epigenetic memory of active gene transcription is inherited<br />
through somatic cell nuclear transfer. Proc. Natl. Acad. Sci. USA 102, 1957–1962<br />
(2005).<br />
40. Ng, R.K. & Gurdon, J.B. Epigenetic memory of an active gene state depends on histone<br />
H3.3 incorporation into chromatin in the absence of transcription. Nat. Cell Biol. 10,<br />
102–109 (2008).<br />
41. Feng, Q. et al. Hemangioblastic derivatives from human induced pluripotent stem<br />
cells exhibit limited expansion and early senescence. Stem Cells 28, 704–712<br />
(2010).<br />
42. Hu, B.Y. et al. Neural differentiation of human induced pluripotent stem cells follows<br />
developmental principles but with variable potency. Proc. Natl. Acad. Sci. USA 107,<br />
4335–4340 (2010).<br />
43. Kim, K. et al. Epigenetic memory in induced pluripotent stem cells. <strong>Nature</strong><br />
doi:10.1038/nature09342 (19 July 2010).<br />
ONLINE METHODS<br />
Generation of iPSC lines. iPSC lines were generated as described previously<br />
2 . Briefly, iPSC-derived somatic cells were isolated from chimeras<br />
by fluorescence-activated cell sorting (FACS) and plated on feeders in the<br />
presence of cytokines under ESC culture conditions. Resultant iPSC colonies<br />
were picked and expanded in the absence of doxycycline and used for<br />
subsequent analyses.<br />
SMP isolation. Myofiber-associated cells were prepared from intact<br />
limb muscles (extensor digitorum longus, gastrocnemius, quadriceps,<br />
soleus, transversus abdominis and triceps brachii) as described<br />
previously 44,45 . Briefly, intact mouse limb muscles were digested<br />
with collagenase II to dissociate individual myofibers. These were<br />
triturated and digested with collagenase II and dispase to release<br />
myofiber-associated cells. The myofiber-associated cells were next<br />
fractionated by FACS, using the following marker profiles for each<br />
population: (i) SMPs: CD45⁻ Sca-1⁻ Mac-1⁻ CXCR4⁺ β1-integrin⁺; (ii)<br />
myoblast-containing population: CD45⁻ Sca-1⁻ Mac-1⁻ CXCR4⁻; (iii)<br />
Sca-1⁺ mesenchymal cells: CD45⁻ Sca-1⁺ Mac-1⁻. After the initial sort, cells<br />
were resorted by FACS using the same gating profile to increase the<br />
purity of the obtained population 46 .<br />
Blastocyst injections. For blastocyst injections, female BDF1 mice were<br />
superovulated by intraperitoneal injection of PMS and hCG and mated<br />
to BDF1 stud males. Zygotes were isolated from females with a vaginal<br />
plug 24 h after hCG injection. Zygotes for 2n injections were cultured<br />
for 3 d in vitro in KSOM media, blastocysts were identified, injected with<br />
ESCs or iPSCs and transferred into pseudopregnant recipient females.<br />
Teratoma formation. iPSCs were harvested by trypsinization, preplated<br />
onto untreated culture plates to remove feeders as well as differentiating<br />
cells and injected into flanks of nonobese diabetic/severe combined<br />
immunodeficient (NOD/SCID) mice, using ~5 million cells per injection.<br />
The mice were euthanized 3–5 weeks after injection, and teratomas were dissected<br />
out and processed for histological analysis.<br />
Cellular growth assays. To measure the clonal growth potential of iPSCs,<br />
SSEA1-positive cells from the different iPSC lines were sorted into<br />
96-well plates by FACS (BD). After 7 d, the presence of iPSC colonies<br />
was scored based on morphology. To establish growth rates, the different<br />
bulk iPSC lines or derivative subclones were plated in six gelatinized<br />
wells of a 12-well plate and each day the number of cells was counted<br />
in duplicate using a Countess cell counter (Invitrogen). For colorimetric<br />
measurement of growth, iPSC lines were subcloned into 96-well plates<br />
and after 7 d, the cells were exposed to XTT (TOX-2) (Sigma) reagent<br />
overnight and the absorbance at 450 nm measured with a multiwell plate<br />
reader (Molecular Devices).<br />
Cell culture. ESCs and iPSCs were cultured in ESC medium (DMEM<br />
with 15% FBS, l-glutamine, penicillin-streptomycin, nonessential amino<br />
acids, β-mercaptoethanol and 1,000 U/ml leukemia inhibitory factor) on<br />
irradiated feeder cells. TTF cultures were established by trypsin digestion<br />
of tail-tip biopsies taken from newborn (3–8 d of age) chimeric mice<br />
produced by blastocyst injection of iPSCs.<br />
RNA isolation. ESCs and iPSCs grown on 35-mm dishes were harvested<br />
when they reached about 50% confluency and preplated on<br />
nongelatinized T25 flasks for 45 min to remove feeder cells. Cells<br />
were spun down and the pellet used for isolation of total RNA using<br />
the miRNeasy Mini Kit (Qiagen) without DNase digestion. RNA was<br />
eluted from the columns using 50 µl RNase-free water or TE buffer,<br />
pH 7.5 (10 mM Tris-HCl and 0.1 mM EDTA) and quantified using a<br />
Nanodrop (Nanodrop Technologies).<br />
Quantitative PCR. cDNA was produced with the First Strand cDNA<br />
Synthesis Kit (Roche) using 1 µg of total RNA input. Real-time quantitative<br />
PCR reactions were set up in triplicate using 5 µl of cDNA<br />
(1:100 dilution) with the Brilliant II SYBR Green QPCR Master Mix<br />
(Stratagene) and run on a Mx3000P QPCR System (Stratagene). Primer<br />
sequences are listed in Supplementary Table 4.<br />
mRNA profiling. Total RNA samples (RIN (RNA integrity number) > 9)<br />
were subjected to transcriptomic analyses using the Affymetrix HT MG-430A<br />
mRNA expression microarray as previously described.<br />
Statistical analyses. Hierarchical clustering was performed using the<br />
GeneSifter software (Geospiza), using correlation distance followed by<br />
clustering with Ward’s method. The differentially expressed genes<br />
(twofold) were calculated using a t-test (P = 0.05) with Benjamini and<br />
Hochberg correction. Principal component analysis was performed using<br />
the GeneSifter software. Gene ontology analysis was performed using the<br />
DAVID software 47 , with the classification stringency set to ‘high’.<br />
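The differential-expression filter described above (a twofold change combined with a t-test and Benjamini and Hochberg correction) can be illustrated with a minimal sketch. This is not the GeneSifter implementation; the function names, the 0.05 threshold applied to the adjusted P value and the example P values are assumptions made for illustration only.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def passes_filter(fold_change, adjusted_p, min_fold=2.0, alpha=0.05):
    """Keep a gene that changes at least min_fold in either direction."""
    changed = fold_change >= min_fold or fold_change <= 1.0 / min_fold
    return changed and adjusted_p <= alpha


raw_p = [0.01, 0.04, 0.03, 0.20]  # made-up P values for four hypothetical genes
adj_p = benjamini_hochberg(raw_p)
```

In this sketch a gene is retained only if both conditions hold, so a large fold change with a weak adjusted P value is still discarded.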
Embryoid body formation. Before plating embryoid bodies, the iPSCs<br />
were depleted of mouse embryonic fibroblasts by splitting the cells 1:3<br />
onto gelatin-coated plates on each day, for 2 consecutive days. On the 3rd<br />
day (designated day 0), iPSCs were trypsinized and plated at a density of<br />
5,000 cells/ml in Iscove’s Modified Dulbecco’s Medium (IMDM) with<br />
15% FCS (Atlanta Biologicals), 10% protein-free hybridoma medium<br />
(PFHM-II; Gibco), 2 mM l-glutamine (Gibco), 200 µg/ml transferrin<br />
(Roche), 0.5 mM ascorbic acid (Sigma) and 4.5 × 10⁻⁴ M monothioglycerol<br />
(MTG; Sigma). Differentiation was carried out in 60-mm ethylene<br />
oxide–treated Petri grade dishes (Parter Medical). The embryoid bodies<br />
were left to differentiate until day 6, when the cells were harvested to<br />
assay for hematopoietic colonies.<br />
Hematopoietic colony formation assays. Day 6 embryoid bodies were<br />
collected by gravity, dissociated with trypsin and then passed several<br />
times through a 20 gauge needle to ensure dissociation. For the growth<br />
of hematopoietic progenitors, the cells were then seeded at a density<br />
of 100,000 cells/ml in IMDM containing 1% methylcellulose (Fluka<br />
Biochemika), 15% plasma-derived serum (PDS; Animal Technologies),<br />
5% PFHM-II and specific cytokines as follows: primitive erythrocytes<br />
(erythropoietin (EPO, 2 U/ml)); macrophages (IL-3 (10 ng/ml), M-CSF<br />
(5 ng/ml)); megakaryocytes (IL-3 (10 ng/ml), IL-11 (5 ng/ml), thrombopoietin<br />
(TPO, 5 ng/ml)); mixed colonies (SCF (5 ng/ml), IL-3 (10 ng/ml),<br />
G-CSF (30 ng/ml), GM-CSF (10 ng/ml), IL-11 (5 ng/ml), IL-6 (5 ng/ml),<br />
TPO (5 ng/ml) and M-CSF (5 ng/ml)). All cytokines were purchased<br />
from R&D Systems. Primitive erythrocyte colonies (eryPs) were counted<br />
on day 10 (4 d after embryoid body harvest). Macrophage colonies were<br />
counted on day 13 (7 d after embryoid body harvest). Mixed colonies<br />
were counted on day 14 (8 d after embryoid body harvest) and consist of<br />
a layer of macrophages, a layer of granulocytes, and a central core of red<br />
erythroid cells. Statistical analysis was performed using the Krward software.<br />
P values were calculated using the nonparametric Wilcoxon test.<br />
HELP DNA methylation analysis. High molecular weight DNA<br />
was isolated from iPSCs using the PureGene kit from Qiagen and<br />
the HELP (HpaII tiny fragment enrichment by ligation-mediated<br />
PCR) assay was carried out as previously described 1,2 . Briefly, 1 µg<br />
of genomic DNA was digested overnight with either HpaII or MspI<br />
(New England Biolabs). On the following day, the reactions were<br />
extracted once with phenol-chloroform and resuspended in 11 µl of<br />
10 mM Tris-HCl pH 8.0 and the digested DNA was used to set up an<br />
overnight ligation of the JHpaII adaptor using T4 DNA ligase. The<br />
adaptor-ligated DNA was used to carry out the PCR amplification<br />
of the HpaII- and MspI-digested DNA as previously described 48 .<br />
All samples for microarray hybridization were processed at the<br />
Roche-NimbleGen Service Laboratory. Samples were labeled using<br />
Cy-labeled random primers (9-mers) and then hybridized onto a<br />
mouse custom-designed oligonucleotide array (50-mers) covering<br />
25,720 HpaII amplifiable fragments (HAF) (>50,000 CpGs), annotated<br />
to 15,465 unique gene symbols (Roche NimbleGen, Design<br />
name: 2006-10-26_MM5_HELP_Promoter Design ID = 4803).<br />
HpaII-amplifiable fragments are defined as genomic sequences contained<br />
between two flanking HpaII sites found within 200–2,000 bp<br />
of each other, and each fragment is represented on the array by 15 individual<br />
probes, randomly distributed across the microarray slide. HAF were<br />
first realigned to the MM9 July 2007 build of the mouse genome and<br />
then annotated to the nearest transcription start site (TSS), allowing<br />
for a maximum distance of 5 kb from the TSS. Scanning was<br />
performed using a GenePix 4000B scanner (Axon Instruments) as<br />
previously described 49 . Quality control and data analysis of HELP<br />
microarrays were performed as described 50 .<br />
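As an illustration of the HAF definition above, the sketch below scans a sequence for HpaII recognition sites (CCGG) and reports the intervals between consecutive sites that are 200–2,000 bp apart. It is a simplified assumption of how such fragments could be enumerated, not the procedure used to design the array; the function names are hypothetical, and spacing is measured between site start positions, which the text does not specify.

```python
def hpaii_sites(sequence):
    """Return 0-based start positions of every CCGG site in the sequence."""
    seq = sequence.upper()
    return [i for i in range(len(seq) - 3) if seq[i:i + 4] == "CCGG"]


def amplifiable_fragments(sequence, min_len=200, max_len=2000):
    """Return (start, end) pairs of consecutive sites min_len..max_len bp apart."""
    sites = hpaii_sites(sequence)
    fragments = []
    for left, right in zip(sites, sites[1:]):
        if min_len <= right - left <= max_len:
            fragments.append((left, right))
    return fragments


# Toy sequence with CCGG sites at positions 0, 50, 350 and 2,854.
toy = "CCGG" + "A" * 46 + "CCGG" + "T" * 296 + "CCGG" + "A" * 2500 + "CCGG"
fragments = amplifiable_fragments(toy)
```

Only the middle interval qualifies in the toy sequence: the first pair of sites is too close together and the last pair is too far apart.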
Signal intensities at each HpaII-amplifiable fragment were calculated<br />
as a robust (25% trimmed) mean of their component probe-level signal<br />
intensities. Any fragments found within the level of background MspI<br />
signal intensity, measured as 2.5 mean-absolute-differences (MAD) above<br />
the median of random probe signals, were categorized as ‘failed’. These<br />
failed loci therefore represent the population of fragments that did not<br />
amplify by PCR, whatever the biological (e.g., genomic deletions and<br />
other sequence errors) or experimental cause. On the other hand, ‘methylated’<br />
loci were so designated when the level of HpaII signal intensity<br />
was similarly indistinguishable from background. PCR-amplifying fragments<br />
(those not flagged as either methylated or failed) were normalized<br />
using an intra-array quantile approach wherein HpaII/MspI ratios are<br />
aligned across density-dependent sliding windows of fragment size–sorted<br />
data. DNA methylation was therefore measured as the log₂(HpaII/MspI)<br />
ratio, where HpaII reflects the hypomethylated fraction of the genome<br />
and MspI represents the whole genome reference. Analysis of normalized<br />
data revealed the presence of a bimodal distribution. For each sample,<br />
a cutoff was selected at the point that most clearly separated these two<br />
populations and the data were centered around this point. Each fragment<br />
was then categorized as either methylated, if the centered log HpaII/MspI<br />
ratio < 0, or hypomethylated if on the other hand the log ratio > 0.<br />
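The per-fragment calls described above can be sketched as follows. This is a simplified reading of the text, not the published pipeline: the quantile normalization across fragment-size windows is omitted, the cutoff between the two modes is supplied by the caller, the 25% trim is assumed to apply to each tail, and MAD is computed here as the median absolute deviation from the median. All function names are hypothetical.

```python
import math
import statistics


def trimmed_mean(values, trim=0.25):
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return sum(kept) / len(kept)


def background_threshold(random_probe_signals, n_mad=2.5):
    """Median of the random-probe signals plus n_mad absolute deviations."""
    med = statistics.median(random_probe_signals)
    mad = statistics.median([abs(x - med) for x in random_probe_signals])
    return med + n_mad * mad


def classify_fragment(hpaii_probes, mspi_probes, threshold, cutoff):
    """Call one fragment 'failed', 'methylated' or 'hypomethylated'."""
    hpaii = trimmed_mean(hpaii_probes)
    mspi = trimmed_mean(mspi_probes)
    if mspi <= threshold:
        return "failed"       # MspI signal within background: no amplification
    if hpaii <= threshold:
        return "methylated"   # HpaII signal indistinguishable from background
    centered = math.log2(hpaii / mspi) - cutoff
    # A centered ratio of exactly 0 is not covered by the text; it is
    # treated as hypomethylated here.
    return "methylated" if centered < 0 else "hypomethylated"


threshold = background_threshold([10, 12, 11, 13, 9])
call = classify_fragment([100] * 4, [400] * 4, threshold, cutoff=-1.0)
```

Checking the MspI channel first keeps fragments that never amplified from being miscalled as methylated.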
HELP data analysis. Statistical analysis was performed using R 2.9 and<br />
BioConductor 51 . Unsupervised hierarchical clustering of HELP data was<br />
performed using the subset of probe sets (n = 3745) with s.d. > 1 across<br />
all cases. We used 1 − Pearson correlation distance, followed by a Lingoes<br />
transformation of the distance matrix to a Euclidean one and subsequent<br />
clustering using Ward’s method. Correspondence analysis was performed<br />
using the BioConductor package MADE4. The top 100 genes whose<br />
methylation status varied the most across the different groups were identified<br />
as those with the greatest s.d. across all samples.<br />
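Two of the steps above, the 1 − Pearson correlation distance and the ranking of loci by s.d., can be sketched in plain Python. The analysis itself was run in R and BioConductor; the Lingoes transformation and Ward clustering are not reproduced here, the use of the sample (n − 1) s.d. is an assumption, and the function names and example values are hypothetical.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_distance(x, y):
    """1 - Pearson correlation: 0 for identical trends, 2 for opposite ones."""
    return 1.0 - pearson(x, y)


def most_variable(rows, top_n):
    """rows maps a locus name to its values across samples; rank by s.d."""
    ranked = sorted(rows, key=lambda name: statistics.stdev(rows[name]),
                    reverse=True)
    return ranked[:top_n]


sample_a = [1.0, 2.0, 3.0, 4.0]
sample_b = [2.0, 4.0, 6.0, 8.0]   # same trend as sample_a
sample_c = [4.0, 3.0, 2.0, 1.0]   # opposite trend
loci = {"g1": [0.0, 0.1, 0.0], "g2": [-2.0, 2.0, 0.0], "g3": [1.0, 1.5, 0.5]}
top_two = most_variable(loci, 2)
```

The same s.d. ranking could serve both for the s.d. > 1 filter applied before clustering and for selecting the top 100 loci.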
Quantitative DNA methylation analysis by MassARRAY EpiTyping.<br />
Validation of HELP findings was performed by matrix-assisted laser<br />
desorption ionization/time-of-flight (MALDI-TOF) mass spectrometry<br />
using EpiTyper by MassARRAY (Sequenom) on bisulfite-converted<br />
DNA following manufacturer’s instructions 52 but using the Fast Start<br />
High Fidelity Taq polymerase from Roche for the PCR amplification<br />
of the bisulfite-converted DNA. MassArray primers were designed to<br />
cover the promoter regions of the indicated genes. (Primer sequences<br />
available as Supplementary Table 5).<br />
Chromatin immunoprecipitation (ChIP). Cells were fixed in 1%<br />
formaldehyde for 10 min, quenched with glycine and washed three<br />
times with PBS. Cells were then resuspended in lysis buffer and<br />
sonicated 10 × 30 s in a Bioruptor (Diagenode) to shear the chromatin<br />
to an average length of 600 bp. Supernatants were precleared<br />
using protein-A agarose beads (Roche) and 10% input was collected.<br />
Immunoprecipitations were performed using polyclonal antibodies<br />
to H3K4trimethylated, H3K27trimethylated, H3 pan-acetylation and<br />
normal rabbit serum (Upstate). DNA-protein complexes were pulled<br />
down using protein-A agarose beads and washed. DNA was recovered<br />
by overnight incubation at 65 °C to reverse cross-links and purified<br />
using QIAquick PCR purification columns (Qiagen). Enrichment of<br />
the modified histones at different genes was detected by quantitative<br />
real-time PCR using the primers listed in Supplementary Table 4.<br />
44. Conboy, I.M., Conboy, M.J., Smythe, G.M. & Rando, T.A. Notch-mediated restoration<br />
of regenerative potential to aged muscle. Science 302, 1575–1577 (2003).<br />
45. Sherwood, R.I. et al. Isolation of adult mouse myogenic progenitors: functional heterogeneity<br />
of cells within and engrafting skeletal muscle. Cell 119, 543–554 (2004).<br />
46. Cheshier, S.H., Morrison, S.J., Liao, X. & Weissman, I.L. In vivo proliferation and cell<br />
cycle kinetics of long-term self-renewing hematopoietic stem cells. Proc. Natl. Acad.<br />
Sci. USA 96, 3120–3125 (1999).<br />
47. Huang, D.W. et al. Systematic and integrative analysis of large gene lists using DAVID<br />
Bioinformatics Resources. Nat. Protoc. 4, 44–57 (2009).<br />
48. Figueroa, M.E., Melnick, A. & Greally, J.M. Genome-wide determination of DNA methylation<br />
by Hpa II tiny fragment enrichment by ligation-mediated PCR (HELP) for the<br />
study of acute leukemias. Methods Mol. Biol. 538, 395–407 (2009).<br />
49. Selzer, R.R. et al. Analysis of chromosome breakpoints in neuroblastoma at sub-kilobase<br />
resolution using fine-tiling oligonucleotide array CGH. Genes Chromosom. Cancer<br />
44, 305–319 (2005).<br />
50. Thompson, R.F. et al. An analytical pipeline for genomic representations used for<br />
cytosine methylation studies. Bioinformatics 24, 1161–1167 (2008).<br />
51. Culhane, A.C., Thioulouse, J., Perriere, G. & Higgins, D.G. MADE4: an R package<br />
for multivariate analysis of gene expression data. Bioinformatics 21, 2789–2790<br />
(2005).<br />
52. Ehrich, M. et al. Quantitative high-throughput analysis of DNA methylation patterns<br />
by base-specific cleavage and mass spectrometry. Proc. Natl. Acad. Sci. USA 102,<br />
15785–15790 (2005).<br />
doi:10.1038/nbt.1667<br />
Articles<br />
Rapid profiling of a microbial genome using mixtures<br />
of barcoded oligonucleotides<br />
Joseph R Warner¹, Philippa J Reeder¹, Anis Karimpour-Fard², Lauren B A Woodruff¹ & Ryan T Gill¹<br />
A fundamental goal in biotechnology and biology is the development of approaches to better understand the genetic basis of<br />
traits. Here we report a versatile method, trackable multiplex recombineering (TRMR), whereby thousands of specific genetic<br />
modifications are created and evaluated simultaneously. To demonstrate TRMR, in a single day we modified the expression of<br />
>95% of the genes in Escherichia coli by inserting synthetic DNA cassettes and molecular barcodes upstream of each gene.<br />
Barcode sequences and microarrays were then used to quantify population dynamics. Within a week we mapped thousands of<br />
genes that affect E. coli growth in various media (rich, minimal and cellulosic hydrolysate) and in the presence of several growth<br />
inhibitors (β-glucoside, D-fucose, valine and methylglyoxal). This approach can be applied to a broad range of traits to identify<br />
targets for future genome-engineering endeavors.<br />
Microbial genomes hold the potential for tremendous combinatorial<br />
diversity, comprising a sequence space of $4^{4,600,000}$. Researchers’ ability<br />
to search this diversity for genetic features that affect pertinent traits<br />
remains limited by the number of individuals that can be tested, which<br />
is a small fraction of all possibilities. Thus, there is a demand for strategies<br />
for first defining relevant genetic variation and then thoroughly<br />
searching that space. This issue has been studied in great depth at the<br />
level of individual genes 1,2 , where high-throughput protein engineering<br />
methods are available for introducing specific mutations and then<br />
mapping the effects of such mutations onto protein activity. Advances<br />
in genomics 3 , and more recently multiplex DNA synthesis 4–8 and<br />
homologous recombination (or recombineering) 9–11 , now enable the<br />
extension of such a strategy to the genome scale.<br />
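To give a sense of scale for a sequence space of four possible bases at each of roughly 4.6 million positions, the sketch below computes its base-10 logarithm rather than the number itself. The genome length is taken from the exponent quoted in the text; the function name is illustrative only.<br />

```python
import math


def log10_sequence_space(genome_length_bp, alphabet_size=4):
    """Return log10(alphabet_size ** genome_length_bp) without building the huge integer."""
    return genome_length_bp * math.log10(alphabet_size)


if __name__ == "__main__":
    exponent = log10_sequence_space(4_600_000)
    # The sequence space is about 10 raised to this exponent.
    print(f"log10 of the sequence space: {exponent:,.0f}")
```

The exponent comes out at about 2.77 million, so any library of testable individuals samples a vanishingly small part of this space.<br />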
Advances in genomics have resulted in several methods for highly<br />
parallel mapping of genes to traits, such as profiling of gene-knockout<br />
and plasmid-based libraries 12–20 . In some instances, microarray<br />
technology has been used to enable parallel tracking of genetically<br />
distinct individuals throughout growth in selective environments.<br />
One such tool, molecular barcoding 12,17 , involves the replacement<br />
of every gene in Saccharomyces cerevisiae with a specific DNA<br />
sequence that could be tracked via microarray. Although these tools<br />
are a powerful way to profile the effect of mutation, the difficulty<br />
of specifically creating new mutations limits these studies to one of<br />
two types of mutations that have previously been introduced (insertions<br />
or increases in copy number). These limitations have challenged<br />
efforts to apply these methods for dissecting phenotypes and reengineering<br />
phenotypes that rely upon the coordinated action of multiple<br />
genes and mutations.<br />
Research over the past decade has resulted in recombination-based<br />
methods (recombineering) that make it easier to specifically modify<br />
the E. coli genome using synthetic DNA (synDNA) 9–11,21–23 . Recently,<br />
a recombineering-based method, called MAGE, was reported 24 ,<br />
whereby the expression levels of 24 genes were optimized in parallel<br />
to improve lycopene production more than all previously reported<br />
efforts, in considerably less time. This demonstration was enabled by<br />
a priori knowledge of what genes to modify, which is not known in<br />
many genome-engineering efforts, such as engineering growth and<br />
tolerance. Here we describe TRMR, a complementary method for<br />
simultaneously mapping genetic modifications that affect a trait of<br />
interest. The method combines parallel DNA synthesis, recombineering<br />
and molecular barcode technology to enable rapid modification of<br />
all E. coli genes (Fig. 1 and Supplementary Fig. 1). We demonstrate<br />
this general approach through the construction of two comprehensive<br />
E. coli genomic libraries comprising 8,000 distinct mutations and<br />
gene-trait mapping of these cells in seven environments.<br />
Results<br />
Synthetic DNA cassettes for promoter replacement<br />
We designed a comprehensive library of synDNA cassettes that<br />
have predictable effects when inserted into the genome of E. coli.<br />
Although various genetic features could have been incorporated into<br />
the cassettes (such as point mutations or sequences affecting mRNA<br />
stability, translational efficiency and other processes), we chose to<br />
demonstrate TRMR using functional modifications that either generally<br />
increase the expression of a target gene, called ‘up’, or generally<br />
decrease the gene’s expression, called ‘down’. The up cassette contains<br />
a strong and repressible PLtetO-1 promoter 25 and ribosome binding<br />
site (RBS) 26 sequences, which in general will increase downstream<br />
gene transcription and translation (Fig. 2). The down cassette was<br />
designed to replace the native RBS with an inert sequence that will<br />
generally cause a decrease in translation initiation. Both cassette<br />
designs include a blasticidin-S resistance gene 27 , allowing for selection<br />
of recombinant alleles. Molecular barcodes 12 (also called ‘tags’)<br />
were incorporated to track the presence of each synDNA oligo and to<br />
1 Department of Chemical and Biological Engineering, University of Colorado, Boulder, Colorado, USA. 2 School of Medicine, University of Colorado at Health Science<br />
Center, Denver, Colorado, USA. Correspondence should be addressed to R.T.G. (rtg@colorado.edu).<br />
Received 4 February; accepted 8 June; published online 18 July 2010; doi:10.1038/nbt.1653<br />
856 VOLUME 28 NUMBER 8 AUGUST 2010 nature biotechnology
track each allele (engineered cell) within the<br />
mixed population on a barcode microarray 28<br />
(Supplementary Notes).<br />
Because the length of the synDNA cassettes<br />
used here is beyond the current<br />
capabilities of commercially available oligo<br />
library synthesis, we developed a strategy<br />
for multiplex cassette construction that<br />
involves the ligation of sequences shared by<br />
all cassettes to a mixture of shorter oligos<br />
specific to each targeted gene. Construction<br />
of this library was complicated by the fact<br />
that each synDNA cassette must contain<br />
unique sequences in the flanking positions<br />
that are homologous to the chromosome<br />
where the cassette is to be inserted. This is<br />
traditionally accomplished by using PCR to<br />
amplify a DNA cassette with primers that<br />
contain the flanking homology regions 21,29 .<br />
Using such a method to construct thousands<br />
of alleles is resource- and time-intensive 3,11,19 , thus limiting the number<br />
and type of allelic libraries that can be investigated.<br />
To address these issues, we developed a procedure to generate thousands<br />
of synDNAs containing multiple desirable sequence features<br />
(such as homology regions and expression modulators) that can be<br />
carried out in a complex mixture. Briefly, ‘targeting oligos’ were first<br />
synthesized on a microarray. Then, we ligated these to the cassette<br />
that modifies gene function, amplified the resulting product with<br />
rolling-circle amplification and then cleaved the long amplified DNA<br />
molecule into the synDNAs (Fig. 2a–c).<br />
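The ligate, amplify and cleave sequence can be pictured with a toy string model. This is a simplified sketch, not the authors' protocol: sequences are placeholder labels, the cut site is a single character, and the handling of priming sites and of any bases left behind by cleavage is ignored. All function names are hypothetical.<br />

```python
def ligate_circle(h1, h2, tag_block, shared, cut):
    """Write the ligated circle linearly, starting at H2:
    H2, cut site, H1, barcode block, shared (marker + functional) DNA."""
    return h2 + cut + h1 + tag_block + shared


def rolling_circle(circle, copies):
    """Model rolling-circle amplification as a linear concatemer of the circle."""
    return circle * copies


def cleave(concatemer, cut):
    """Cut at every repeat of the cut site and keep only complete internal units."""
    pieces = concatemer.split(cut)
    return pieces[1:-1]  # the first and last pieces are partial units


if __name__ == "__main__":
    circle = ligate_circle("[H1_geneX]", "[H2_geneX]", "[P3-TagX-P2]", "[marker+Up]", "|")
    syn_dnas = cleave(rolling_circle(circle, copies=4), "|")
    # Each unit now carries the homology arms at its two ends.
    print(syn_dnas[0])
```

Cutting between H2 and H1 is what moves the two homology regions from the middle of the target oligo to the flanks of the finished cassette.<br />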
Targeting oligos were designed for every protein-coding gene in the<br />
E. coli MG1655 genome (Supplementary Table 1 and Supplementary<br />
Notes). In all, 8,154 targeting oligos were designed to create two possible<br />
expression alleles for 4,077 genes. Targeting regions were chosen<br />
such that DNA cassettes would insert upstream of genes, replace the<br />
translation start codon and account for gene overlap. Once designed,<br />
the set of targeting oligos, each 189 nucleotides long, was purchased<br />
through limited access at a cost of roughly $1 per unique oligo<br />
(Oligonucleotide Library Synthesis, Agilent).<br />
To test cassette design and construction and to optimize the procedure<br />
for allele production, we attempted promoter replacement<br />
for the lacZ and galK genes. After optimizing design, we were able<br />
to efficiently generate these alleles using the procedures outlined in<br />
Figure 2. Alleles were isolated as colonies and all showed the expected<br />
change in regulation and expression of the lacZ gene (Fig. 2d,e) or<br />
the galK gene. Furthermore, in PCR confirmations and sequencing,<br />
30 of 30 alleles tested showed the correct site of insertion. By counting<br />
colonies we estimated that we were able to routinely generate at<br />
least 75 alleles per microliter of cells transformed and determined<br />
that yields increased linearly with transformation volumes from<br />
40 µl up to 400 µl tested. With increases in scale, it is conceivable that<br />
one could generate $10^5$–$10^7$ alleles in a single day, enough to profile<br />
several modifications of every E. coli gene.<br />
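The scaling argument can be written out as simple arithmetic. The sketch below uses the reported lower-bound yield of 75 alleles per microliter and assumes that the linear relationship continues beyond the largest volume tested, which the text does not demonstrate. Function names are illustrative.<br />

```python
ALLELES_PER_UL = 75  # lower-bound yield per microliter of cells, from the text


def expected_alleles(volume_ul, yield_per_ul=ALLELES_PER_UL):
    """Expected number of alleles for a transformation volume, assuming linear scaling."""
    return volume_ul * yield_per_ul


def volume_needed_ul(target_alleles, yield_per_ul=ALLELES_PER_UL):
    """Transformation volume (in microliters) needed to reach a target number of alleles."""
    return target_alleles / yield_per_ul


if __name__ == "__main__":
    print(expected_alleles(400))      # largest volume tested in the text
    print(volume_needed_ul(100_000))  # extrapolation beyond the tested range
```

At the largest tested volume this gives 30,000 alleles, so reaching the lower end of the projected range would need somewhat more than a milliliter of cells under the same assumption.<br />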
Efficient construction of genome-scale allele libraries<br />
Using a library of 8,154 targeting oligos, we attempted to construct<br />
4,077 up synDNA oligos and 4,077 down synDNA oligos<br />
in separate pools. Both oligo pools were constructed in 1 week<br />
and resulted in enough material for several rounds of multiplex<br />
recombineering. The synDNA oligos were then used in a day of<br />
[Figure 1 panel labels: mixture of ≈8,000 unique oligomers with functional and targeting regions; (iv) enrichment of improved cells; (v) multiplex identification by microarray, frequency of designed mutation $F_x = C_x/C_{\mathrm{tot}}$; (vi) genome mapping, fitness conferred by mutation $W' = F_{x,f}/F_{x,i}$.]<br />
Figure 1 TRMR method. (i) Design DNA cassettes encoding the suite of mutations of interest.<br />
(ii) Synthesize those cassettes, along with associated molecular barcodes, in a single pool.<br />
(iii) Introduce cassettes into recombination-proficient E. coli 46 and produce thousands of variants,<br />
each with a distinct region of the chromosome that is engineered. (iv) Perform selections or screens<br />
on the mixture of variants to enrich for those possessing a desired trait. (v) Quantify changes in<br />
allele frequency using molecular barcode technology 47 . (vi) Use these frequency measurements<br />
to map specific genetic changes onto the trait of interest. $C_x$, concentration of allele x; $C_{\mathrm{tot}}$,<br />
total concentration; $F_{x,f}$ and $F_{x,i}$, final and initial allele frequencies (see equations in Results).<br />
recombineering experiments, separately generating thousands of<br />
up and down recombinant colonies. Colonies were scraped from<br />
plates and frozen in aliquots for subsequent experiments.<br />
To confirm that desired mutant alleles were generated, we PCR<br />
amplified and sequenced barcode tags from 390 colonies. Sequencing<br />
of the cassette and neighboring chromosome DNA indicated that in<br />
34 of 34 distinct alleles, the cassettes had inserted into the correct<br />
location of the genome. Sequencing also provided an estimate of the<br />
number of alleles containing an error in DNA sequence. Outside<br />
of the barcode sequences, DNA errors were observed in only three<br />
of 34 alleles, two of which had errors in regions of the cassette that<br />
should not affect allele identification or function. The barcode tag<br />
sequences provide an estimate of DNA errors present in the initial<br />
oligo libraries because barcodes are not subject to the experimental<br />
bias (bias includes selection for correct sequences during PCR<br />
amplification and during homologous recombination) that would<br />
filter out incorrect sequences. High fidelity of the molecular barcode<br />
sequences is also required to accurately detect the presence of each<br />
allele in cell mixtures. Only 5% of the 390 sequenced tags showed<br />
an error, usually substitution or loss of a single nucleotide. The<br />
high percentage of correct alleles observed here is a first indication<br />
that complex oligonucleotide mixtures may be used to engineer and<br />
identify thousands of distinct genomic loci with high fidelity.<br />
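A count such as 34 correct insertions out of 34 tested can be turned into a conservative statement about the underlying rate. The sketch below is an added illustration, not an analysis from the source: it computes the exact one-sided (Clopper-Pearson) lower confidence bound for the case in which every trial succeeds.<br />

```python
def lower_bound_all_successes(n_trials, confidence=0.95):
    """One-sided exact lower bound on the success probability when
    all n_trials succeed: solve p ** n_trials = 1 - confidence for p."""
    if n_trials < 1:
        raise ValueError("n_trials must be at least 1")
    alpha = 1.0 - confidence
    return alpha ** (1.0 / n_trials)


if __name__ == "__main__":
    bound = lower_bound_all_successes(34)
    print(f"Lower 95% bound on correct-insertion rate: {bound:.3f}")
```

For 34 of 34 the bound is a little under 92%, which is consistent with the high fidelity described while showing how much uncertainty a sample of this size leaves.<br />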
To assess our ability to make complete and uniform libraries in<br />
multiplex, we used Affymetrix Geneflex TAG4 arrays 28 to measure<br />
the concentration of each barcode tag in the synDNA mixture (before<br />
recombineering) and in genomic DNA from cell mixtures (after<br />
recombineering). We observed microarray signals from hybridization<br />
of each of the 8,154 library tags, ten positive-control tags that<br />
we spiked into the samples to calculate tag concentrations (see<br />
Supplementary Fig. 2), and 1,642 negative-control tags used to provide<br />
a measure of background hybridization and noise. The barcode<br />
signals from the synDNA mixtures indicated that 8,016 of the oligos<br />
were present (detected above background). Therefore, we successfully<br />
generated nearly complete (98%) up and down oligo libraries.<br />
Microarray analysis of the cell mixtures indicated successful generation<br />
of at least 7,829 unique alleles (96% of designed alleles; Fig. 3a<br />
and Supplementary Table 2). We found that the concentration of<br />
each unique allele depended on the concentration of synDNA used<br />
[Figure 2 panel labels: (a) two mixtures of target oligos (geneX, geneY, geneZ; up and down) are joined to shared DNA in steps i–iv to give two mixtures of 4,077 synDNA oligos. (b) Target oligo (189 nucleotides): P1, H2, cut site, H1, P3, Tag, P2; synDNA oligo (~800 base pairs): H1, P3, Tag, P2, antibiotic resistance, Up/Down, H2. (c) Recombineering enzymes in the E. coli cell; 4,077 genes targeted simultaneously. (d) Up allele (761 bp insertion): Tag, blasticidin resistance, PLtetO-1 and RBS upstream of lacZ; down allele (703 bp insertion): Tag, blasticidin resistance, no RBS. (e) Wild type, lacZ up and lacZ down on glucose + X-gal and on IPTG + X-gal.]<br />
Figure 2 Multiplex strategy to rapidly generate cell mixtures with defined genetic modifications. (a) Construction of synDNA library. (i) ‘Target’ oligos<br />
that contain chromosome homology and barcodes are synthesized on a chip, cleaved from the chip, amplified by two rounds of PCR and modified<br />
with (ligation) sequences by uracil excision 48 . (ii) This pool of target oligos is ligated with oligos containing a selectable marker and promoter and<br />
RBS variants (Shared DNA), resulting in a pool of DNA circles. (iii) DNA circles are copied into a pool of linear concatemers by rolling-circle<br />
amplification 49 . (iv) Concatemers are cleaved at a repeating site linking the homology regions to provide a pool of synDNA ready for multiplex<br />
recombineering. (b) Schematic of target oligos and synDNA oligos for gene x. Red, unique regions; black, shared regions; P, PCR priming site;<br />
H, chromosome targeting region; Tag, barcode tag sequence; Up/Down, functional region. Sequence is shown for amplifying barcode tags and for<br />
functional regions (promoter sequence in italic, RBS in bold, start codon underlined). (c) Pool of synDNA oligos is inserted into electrocompetent E. coli<br />
cells. Recombineering enzymes catalyze the insertion of the synDNA oligos at thousands of unique loci in the genome. (d) Schematic of lacZ alleles<br />
used to test the method. Up allele is designed to increase gene transcription and translation. Down allele is designed to decrease translation. (e) LacZ<br />
up and down alleles yield the intended phenotypes. Up mutation of the lacZ gene causes cells to turn blue on the surface of agar containing glucose and<br />
X-gal. Down mutation of the lacZ gene causes cells to remain colorless on the surface of agar containing IPTG and X-gal.<br />
to construct that allele (Supplementary Fig. 3). After normalization<br />
of the concentration of each allele for differences in synDNA concentrations<br />
used in recombineering, the s.d. for generating each mutant<br />
was ± 65% of the average, distributed uniformly around the genome<br />
(Fig. 3b). We also observed a modest dependence of recombineering<br />
frequency on the hybridization free energy 30 of the homology regions<br />
(Supplementary Fig. 4).<br />
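The detection step described above, in which a tag counts as present only if its signal exceeds the background defined by negative-control tags, can be sketched as follows. The thresholding rule here (an empirical quantile of the negative controls) is an assumption made for illustration; the source states only that a threshold was chosen to keep false positives low. Names and example values are hypothetical.<br />

```python
def detection_threshold(negative_control_signals, max_false_positive_rate):
    """Return a threshold such that the fraction of negative controls
    strictly above it does not exceed max_false_positive_rate."""
    if not negative_control_signals:
        raise ValueError("need at least one negative-control signal")
    if not 0.0 <= max_false_positive_rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    ordered = sorted(negative_control_signals)
    n = len(ordered)
    allowed = int(max_false_positive_rate * n)  # negatives allowed above the threshold
    return ordered[n - allowed - 1]


def detected_fraction(tag_signals, threshold):
    """Fraction of library tags whose signal is strictly above the threshold."""
    detected = sum(1 for signal in tag_signals.values() if signal > threshold)
    return detected / len(tag_signals)


if __name__ == "__main__":
    negatives = [20, 35, 40, 42, 55, 60, 61, 70, 90, 260]  # invented values
    library = {"tagA": 1200.0, "tagB": 30.0, "tagC": 4800.0, "tagD": 300.0}
    cutoff = detection_threshold(negatives, 0.1)
    print(cutoff, detected_fraction(library, cutoff))
```

Applied to the counts in the text, the same ratio gives 8,016/8,154 ≈ 98% for the synDNA mixture and 7,829/8,154 ≈ 96% for the cell mixtures.<br />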
A small percentage of the alleles were not detected (4%), and in all<br />
these cases the preceding synDNA was either absent or found in low<br />
concentrations. In subsequent attempts to create allele libraries, most<br />
of these missing alleles were detected, suggesting that the alleles were<br />
initially not detected because of low concentrations of the synDNA<br />
oligos. These results indicate that the uniformity of cell mixtures in<br />
future multiplex recombineering experiments may easily be improved<br />
by supplementation with synDNA oligos that are initially present in<br />
low concentrations. Improvements in the uniformity of the initial<br />
mixture should enable the more efficient identification of cells with<br />
improved traits.<br />
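The supplementation idea can be sketched in code: compute the per-allele efficiency as allele concentration divided by synDNA concentration, summarize the spread as a coefficient of variation, and flag oligos that are scarce in the starting pool. The cutoff of one tenth of the median is a hypothetical choice for illustration and does not come from the source.<br />

```python
import statistics


def recombineering_efficiency(allele_conc, syn_dna_conc):
    """Per-allele efficiency: allele concentration divided by synDNA concentration."""
    return {
        name: allele_conc[name] / syn_dna_conc[name]
        for name in allele_conc
        if syn_dna_conc.get(name, 0) > 0
    }


def coefficient_of_variation(values):
    """Sample standard deviation expressed as a fraction of the mean."""
    values = list(values)
    return statistics.stdev(values) / statistics.mean(values)


def oligos_to_supplement(syn_dna_conc, fraction_of_median=0.1):
    """Names of oligos whose concentration is below a fraction of the pool median."""
    cutoff = fraction_of_median * statistics.median(syn_dna_conc.values())
    return sorted(name for name, conc in syn_dna_conc.items() if conc < cutoff)


if __name__ == "__main__":
    syn = {"geneA_up": 10.0, "geneB_up": 12.0, "geneC_up": 0.5, "geneD_up": 11.0}
    alleles = {"geneA_up": 5.0, "geneB_up": 6.0, "geneD_up": 4.4}
    eff = recombineering_efficiency(alleles, syn)
    print(coefficient_of_variation(eff.values()))
    print(oligos_to_supplement(syn))
```

In this framing, the reported s.d. of ±65% of the average corresponds to a coefficient of variation of about 0.65 across alleles.<br />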
Notably, a single researcher was able to create these two genome-scale<br />
up and down allele libraries in a single day, demonstrating that<br />
multiplex recombineering is a rapid strategy for reprogramming<br />
thousands of genes.<br />
Genome-scale mapping of alleles to selectable traits<br />
To illustrate the potential of TRMR to rapidly generate and identify<br />
cells with new traits, we plated the cell mixtures on agar medium<br />
supplemented to create four different conditions (salicin, D-fucose,<br />
methylglyoxal and valine) in which wild-type E. coli typically do not<br />
grow. Colonies representing resistant mutants arose from our allele<br />
mixtures at frequencies >100-fold greater than from unmodified control<br />
cells that relied on spontaneous mutation to generate resistance<br />
(Supplementary Table 3). We characterized individual colonies (83<br />
total) by sequencing the barcode tags. Additionally, we used TAG4<br />
microarrays to characterize the populations obtained by scraping all<br />
colonies off of the surfaces of selection plates. Using microarray data,<br />
we ranked each allele in each condition according to fitness (fitness of<br />
[Figure 3a axis labels: allele tag signals against number of probes, with the detection threshold marked; inset, unassigned tag signals against number of probes.]<br />
Figure 3 Analysis of synDNA and cell library. (a) Histogram showing<br />
the distribution of barcode signals of the up and down allele libraries<br />
detected by the TAG4 microarray. The unassigned tag signals (shown<br />
in gray) provide a measure of the background signal for each probe on<br />
the microarray. Probes that are assigned to unique alleles are shown in<br />
green. The unassigned tag signals have a low signal distribution (inset),<br />
and the threshold is shown for signals that are significantly above the<br />
background signal. The threshold for detection was such that the rate<br />
of false positives would be less than 2.2%. (b) TAG4 microarray results<br />
showing the distribution of synDNA oligos and alleles plotted by genomic<br />
location on the circular E. coli genome. Blue, up library; red, down<br />
library; inner circles, the concentration of each unique synDNA oligo<br />
before recombineering; outer circles, efficiency of generating each allele,<br />
calculated by dividing allele concentration by synDNA concentration.<br />
[Figure 3b label: up library, pop. 3,869.]<br />
allele x = $W'_x = F_{x,f} / F_{x,i}$, which is the ratio of the final allele frequency<br />
($F_x$ = concentration of x/total concentration) after growth to the initial<br />
allele frequency). The allele fitness determined by microarray agreed<br />
well with the results from picking and sequencing individual colonies<br />
(Fig. 4 and Supplementary Table 4).<br />
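The fitness measure defined above, the ratio of an allele's final frequency to its initial frequency, can be computed directly from barcode concentrations. The following is a minimal sketch with invented concentrations; the function names are illustrative and this is not the authors' analysis code.<br />

```python
def allele_frequencies(concentrations):
    """F_x = C_x / C_tot for every allele x."""
    total = sum(concentrations.values())
    return {allele: conc / total for allele, conc in concentrations.items()}


def fitness(initial_conc, final_conc):
    """W'_x = F_{x,f} / F_{x,i}; alleles missing after growth get fitness 0."""
    f_initial = allele_frequencies(initial_conc)
    f_final = allele_frequencies(final_conc)
    return {
        allele: f_final.get(allele, 0.0) / f_i
        for allele, f_i in f_initial.items()
        if f_i > 0
    }


def rank_by_fitness(fitness_by_allele):
    """Alleles sorted from highest to lowest fitness."""
    return sorted(fitness_by_allele.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    before = {"hns_down": 1.0, "xylA_up": 1.0, "sodC_down": 2.0}  # invented values
    after = {"hns_down": 6.0, "xylA_up": 1.0, "sodC_down": 1.0}   # invented values
    for allele, w in rank_by_fitness(fitness(before, after)):
        print(allele, round(w, 3))
```

Because frequencies are normalized to the total, the measure reports enrichment relative to the rest of the pool: a value above 1 means the allele increased its share of the population during selection.<br />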
Constructing mutants with beneficial traits and identifying the<br />
genetic cause has traditionally been a slow and laborious process.<br />
Using TRMR, we were able to rapidly identify traits present in our<br />
cell mixtures that are consistent with previous<br />
studies and identify unexpected genetic<br />
modifications that could be used in future<br />
metabolic engineering. The allele(s) that<br />
conferred the highest frequency or fitness<br />
from these selections were reconstructed<br />
separately to confirm that improved growth is<br />
due to the insertion of the identified cassette.<br />
These alleles are summarized in Figure 4 and<br />
described in detail below.<br />
Salicin is a carbon source that E. coli normally<br />
cannot metabolize owing to repression<br />
of the enzymes BglF and BglB. We identified<br />
the hns down mutation, using both array<br />
[Figure 3b label: down library, pop. 3,960. Figure 4a,b labels: frozen cell mixture; salicin, hns.]<br />
results and sequencing, as having the greatest effect on fitness in<br />
medium supplemented with salicin. Mutations in the hns (histone-like<br />
nucleoid structuring protein) regulator 31 are known to confer<br />
improved growth on salicin. Its identification here confirms that the<br />
TRMR method can effectively uncover gene-trait relationships.<br />
D-fucose is a nonmetabolizable analog of arabinose that inhibits the<br />
ability of E. coli to use arabinose as a carbon source by inhibiting induction<br />
of the L-arabinose operon. We identified the xylA up allele, which<br />
causes overexpression of xylA and xylB, as conferring the ability to grow<br />
in the presence of D-fucose. Notably, these results suggest that E. coli<br />
xylose isomerase (XylA) may have in vivo L-arabinose isomerase activity.<br />
This discovery is corroborated by the observation that overexpression<br />
of E. coli xylAB in Pseudomonas putida confers the ability to metabolize<br />
both xylose and L-arabinose 32 . Such a trait is of potential value for the<br />
efficient use of cellulosic biomass as a renewable feedstock.<br />
Methylglyoxal is an important intracellular metabolite because it<br />
can be used as an intermediate for production of commodity chemicals<br />
and because, when metabolism is disrupted, it can accumulate,<br />
Recovered<br />
cells<br />
Growth on selective agars<br />
Microarray analysis, allele sequencing & reconstruction, phenotype validation<br />
xyIA<br />
D-fucose<br />
Methylglyoxal<br />
sodC<br />
ilvN<br />
Valine<br />
Figure 4 Trait-conferring genotypes identified<br />
in four selective environments. (a) Up and down<br />
alleles were recovered from frozen cultures and<br />
spread on agar medium in conditions where<br />
wild-type cells would not grow (indicated as<br />
column headings). (b) Fitness (W′) calculated<br />
by microarray detection of barcode tags was<br />
plotted for each allele by genomic location.<br />
Blue, up allele; red, down allele. (c) Known<br />
or hypothesized mechanisms whereby the<br />
identified genomic modifications confer<br />
the ability to grow. High-fitness alleles were<br />
detected on microarrays, except for leuL down,<br />
which was identified by sequencing of barcode<br />
tags within colonies.<br />
[Figure 4c panel labels: hns down (salicin; H-NS, BglF, BglB; glycolysis); xylA up (D-fucose; D-xylose, L-arabinose; XylA, XylB; pentose phosphate pathway); sodC down (methylglyoxal; toxic oxygen radicals; SodC); leuL down (valine; overexpression of LeuABCD; 2-KIV; leucine and isoleucine); ilvN down (valine; pyruvate + 2-ketobutyrate; IlvB; acetohydroxybutyrate; isoleucine).]<br />
Figure 5 Alleles identified during pooled growth in<br />
media and cellulosic hydrolysate. (a) TRMR alleles<br />
were recovered from frozen cultures and allowed<br />
to grow in a rich medium, minimal medium or<br />
cellulosic hydrolysate. (b) Allele frequencies after<br />
growth in media plotted by genomic location. Inner<br />
circle, rich; outer circle, minimal; blue, up allele;<br />
red, down allele; black, control allele frequency × 10.<br />
(c) Allele fitness in minimal medium plotted<br />
against fitness of the same allele in rich medium.<br />
Shapes describe the affected gene function as<br />
determined by clusters of orthologous groups:<br />
◊, information storage and processing; , cellular<br />
processes; , metabolism; ×, poorly characterized;<br />
blue, up allele; red, down allele; black, control<br />
allele. Fitness trend was fit to a line shown in<br />
black ($R^2$ = 0.748). (d) The fitness of down<br />
alleles compared with the corresponding up<br />
alleles. Brown, rich medium; green, minimal<br />
medium. For alleles that cluster toward either<br />
the x or the y axis, the up allele and the down<br />
allele report opposite effects. Inset shows fitness<br />
benefits (W′ > 1) of top 40 alleles for growth in<br />
minimal medium, and the fitness effects (usually<br />
detrimental, W′ < 1) of the orthogonal alleles.<br />
(e) Fitness (lnW′) plotted by genomic location of<br />
alleles isolated after growth in hydrolysate. Inner<br />
circle, 15–17% hydrolysate; outer circle, 18–20%<br />
hydrolysate; blue, up allele; red, down allele.<br />
Some alleles conferring high fitness are labeled.<br />
(f) Growth curves of isolated variants in cellulosic<br />
hydrolysate. Each growth curve is the average of<br />
three replicates. Curves are fit with a Gompertz<br />
function 50 (black). Alleles are denoted with roman<br />
numerals, as follows: (i) puuE down (pale blue),<br />
(ii) yciV down (purple), (iii) ygaZ up (green), (iv) lpp<br />
down (pink), (v) ugpE down (blue), (vi) ptsI down<br />
(pale green), (vii) wild-type MG1655 (red),<br />
(viii) ahpC up (blue). Error bars are minimal and are<br />
not shown for clarity. $A_{600}$, absorbance at 600 nm.<br />
(g) Percent change in biomass productivity and<br />
maximum growth rate for isolated variants grown<br />
in hydrolysate relative to E. coli MG1655 grown<br />
in hydrolysate. Biomass productivity (gray bars) is<br />
the area under each growth curve. Growth rate (red<br />
bars) is the maximum growth rate as calculated<br />
from the Gompertz function. Values are the average<br />
of three replicates; error bars denote s.d.<br />
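The two summary quantities named in this caption, biomass productivity as the area under a growth curve and the maximum growth rate from a Gompertz fit, can be illustrated with a short sketch. The caption does not say which Gompertz parametrization was used; the modified (Zwietering) form below is an assumption, chosen because its parameter mu equals the maximum slope. Parameter values are invented and no curve fitting is performed.<br />

```python
import math


def gompertz(t, a, mu, lag):
    """Modified Gompertz curve (assumed form): asymptote a, maximum slope mu, lag time lag."""
    return a * math.exp(-math.exp(mu * math.e / a * (lag - t) + 1.0))


def area_under_curve(times, values):
    """Trapezoidal area under a sampled growth curve (biomass productivity)."""
    area = 0.0
    for i in range(1, len(times)):
        area += (times[i] - times[i - 1]) * (values[i] + values[i - 1]) / 2.0
    return area


def max_growth_rate(times, values):
    """Largest finite-difference slope between consecutive samples."""
    return max(
        (values[i] - values[i - 1]) / (times[i] - times[i - 1])
        for i in range(1, len(times))
    )


def percent_change(variant, wild_type):
    """Percent change of a variant's value relative to the wild-type value."""
    return (variant - wild_type) / wild_type * 100.0


if __name__ == "__main__":
    times = [i * 0.01 for i in range(1401)]  # 0 to 14 h
    wild_type = [gompertz(t, 1.0, 0.20, 3.0) for t in times]  # invented parameters
    variant = [gompertz(t, 1.4, 0.30, 2.0) for t in times]    # invented parameters
    print(percent_change(area_under_curve(times, variant), area_under_curve(times, wild_type)))
    print(percent_change(max_growth_rate(times, variant), max_growth_rate(times, wild_type)))
```

With this parametrization the numerically estimated maximum slope recovers mu, so the growth-rate comparison can equally be read straight from fitted parameters.<br />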
[Figure 5 panel and axis labels: (a) freezer stock grown in minimal nutrients, rich nutrients, 15–17% cellulosic hydrolysate or 18–20% cellulosic hydrolysate. (c) Rich medium allele fitness (W′) against minimal medium allele fitness (W′). (d) Up allele fitness (W′) against down allele fitness (W′); inset, fitness gain and loss. (e) Labeled alleles include cyaA, ygjQ, ahpC up, ilvM, eutL, ptsI, moeA, ybaB, ydjG, lsrA and yciV. (f) Time (h) against number of cells ($A_{600}$). (g) % change in biomass productivity and growth rate relative to wild type for puuE down, ptsI down, lpp down, yciV down, ygaZ up and ugpE down; two values are annotated as 248 ± 18% and 233 ± 22%.]<br />
resulting in oxidative damage and eventual cell death 33 . We used<br />
TRMR to discover a previously unknown phenotype: decreased<br />
expression of sodC, which produces a superoxide-mediating enzyme 34 ,<br />
confers resistance to exogenous methylglyoxal, possibly by affecting<br />
superoxide concentrations in the periplasm.<br />
Excess valine causes feedback inhibition of leucine and isoleucine<br />
biosynthesis, leading to inhibition of cell growth as these amino<br />
acids become scarce. Microarray results identified ilvN down as the<br />
allele conferring the best growth, and this genomic region has been<br />
implicated in several previous studies 35,36 . Unexpectedly, sequencing<br />
showed that the leuL down allele also could grow well on valine plates.<br />
The leuL down mutation would cause increased expression of the<br />
leucine biosynthesis operon leuABCD by circumventing the alleged<br />
transcription attenuation caused by leuL 37 . Mutations of this operon<br />
have not previously been associated with valine resistance. However, a<br />
recent attempt to increase production of noncanonical amino acids in<br />
engineered E. coli cells demonstrated that overexpression of leuABCD<br />
shifts metabolite pools from valine toward isoleucine and leucine 38 .<br />
Genome-scale quantitative growth phenotypes<br />
To further demonstrate that TRMR performs well at the genome scale,<br />
we combined the up and down allele libraries and measured fitness in<br />
liquid cultures that contained rich or minimal nutrients (Fig. 5a). The<br />
liquid cultures were allowed to grow for an average of eight generations,<br />
before and after which aliquots of cells were plated for analysis of<br />
individuals or frozen for microarray analysis. Additionally, an aliquot<br />
of control cells (barcoded and kanamycin resistant; Supplementary<br />
Notes) was spiked into the culture at the start of selections. A known<br />
concentration of these control cells was used to assess the ability of<br />
barcode technology to measure allele concentrations during pooled<br />
growth. The control cells also serve as a wild-type standard with which<br />
the fitness of alleles can be compared.<br />
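Using the spiked-in control cells as a wild-type standard amounts to dividing each allele's fitness by that of the control. The sketch below shows this normalization and, as an optional further step that the source does not describe, a log fitness per generation using the average of eight generations mentioned above. Names and values are hypothetical.<br />

```python
import math


def relative_to_control(fitness_by_allele, control_name):
    """Divide every allele's fitness by the fitness of the control cells."""
    w_control = fitness_by_allele[control_name]
    return {
        allele: w / w_control
        for allele, w in fitness_by_allele.items()
        if allele != control_name
    }


def log_fitness_per_generation(relative_fitness, generations):
    """ln(relative fitness) / generations; an assumed convention, not from the source."""
    return {
        allele: math.log(w) / generations
        for allele, w in relative_fitness.items()
        if w > 0  # alleles lost from the culture have no finite log fitness
    }


if __name__ == "__main__":
    measured = {"alleleX": 1.6, "alleleY": 0.4, "control": 0.8}  # invented values
    relative = relative_to_control(measured, "control")
    print(relative)
    print(log_fitness_per_generation(relative, generations=8))
```

After normalization, values above 1 indicate alleles that outgrew the control cells in the pooled culture and values below 1 indicate alleles that grew more slowly.<br />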
Using barcode microarrays, we simultaneously tracked all of<br />
the alleles, which were reduced to approximately 2,500 alleles after<br />
growth selections (Fig. 5b). The number of control cells in the<br />
populations determined by microarray was not substantially different<br />
from estimates of control-cell numbers obtained from counting<br />
kanamycin-resistant colonies. Microarrays revealed that the majority<br />
of alleles had similar growth phenotypes in both rich and minimal<br />
media (Fig. 5c, x-y diagonal). Noteworthy alleles that do not fit this<br />
trend are those that allow growth in the rich medium but are no<br />
longer observed in the minimal medium (Fig. 5c, alleles along x axis).<br />
Consistent with previous observations 19 , many of these alleles consist<br />
of changes in the expression of genes involved in metabolism. Also<br />
of interest are those alleles that confer faster growth than that of the<br />
control cells in the minimal medium (a list of fitness values can be<br />
found in Supplementary Table 5).<br />
These experiments also offer the first genome-wide glimpse of<br />
generally orthogonal expression alleles grown competitively in the<br />
same culture. We anticipated that if a particular up allele shows a<br />
fitness benefit, then the down allele is likely to show a negative effect<br />
on fitness, possibly being lost from the culture, and vice versa. This<br />
is often the case (see Fig. 5d, allele clustering toward the axes), providing<br />
further evidence that our synthetic cassettes are generally<br />
causing the intended effects at genome-wide loci. Exceptions such as<br />
improved growth resulting from both up and down expression alleles<br />
in the same environment may be due to secondary effects (such as<br />
increased transcription of multiple downstream genes) and require<br />
further investigation.<br />
Mapping tolerance to lignocellulosic hydrolysate<br />
We next applied TRMR to identify genes that improve tolerance to<br />
lignocellulosic hydrolysate derived from corn stover (provided by the<br />
US National Renewable Energy Laboratory). This class of feedstocks<br />
contains a variable array of growth inhibitors (known inhibitors<br />
include organic acids, aldehydes and phenolic-based compounds) 39,40 .<br />
To take hydrolysate variability into account, we measured growth of<br />
variants bearing our alleles in several mixtures of hydrolysate and<br />
minimal medium.<br />
Microarray analysis of the alleles indicated that only a small<br />
subset of the population remained after each selection (Fig. 5e;<br />
see Supplementary Table 6 for fitness values and gene ontology<br />
analysis). Many of the modifications that improved growth in lower<br />
concentrations of corn stover hydrolysate affected genes known to be<br />
involved in primary metabolism (pgi up, eno up and tdcG up), RNA<br />
metabolism (rlmG down, rimM up, rsmE down and rrmA down) and<br />
transport of sugars (ptsI down, ptsI up and directly downstream crr).<br />
Growth in higher concentrations of hydrolysate selected alleles related<br />
to secondary metabolism (ispF up and dxs up), vitamin metabolic<br />
processes (nadD up, menD up, apbE up, pabC up, dxs up and ribB<br />
up) and antioxidant activity (ahpC up, tpx up and bcp up). The down<br />
mutation of the adenylate cyclase gene (cyaA) conferred a growth<br />
advantage in every selection.<br />
To confirm that the mutations conferred fitness advantages, we<br />
isolated seven alleles after the selections and characterized growth<br />
in hydrolysate relative to unmutated E. coli (Fig. 5f,g). All seven<br />
alleles (ahpC up, ugpE down, puuE down, ptsI down, ygaZ up, yciV<br />
down and lpp down) yielded improvement in either growth rate or<br />
biomass productivity relative to the wild-type strain. Notably, the<br />
up allele of ahpC resulted in a large improvement. The ahpC gene<br />
and its downstream counterpart ahpF have not previously been<br />
identified as important for growth in hydrolysate. However, they<br />
have been implicated in resistance to organic solvents 41 and various<br />
oxidants 42,43 , possibly indicating that during growth in cellulosic<br />
hydrolysate, reactive oxygen species in the form of peroxides and<br />
other oxidants are present or forming as a result of imbalances in<br />
metabolism 44 . In addition to identifying several important targets for<br />
future genome-engineering endeavors, many of which would have<br />
been difficult to predict a priori, these profiling studies shed light on<br />
general mechanisms of hydrolysate toxicity (such as the presence of<br />
oxidants) and growth advantage in hydrolysate (such as metabolism<br />
of preferred carbon sources).<br />
Discussion<br />
We have described a new method for the genome-scale mapping<br />
of genes to traits and have shown that this method can increase<br />
the throughput of genetic studies by several orders of magnitude.<br />
Although some of the trait-conferring modifications we identified<br />
correspond to previously identified genomic regions, the majority<br />
would have been difficult to predict. Such unanticipated outcomes<br />
provide insight into many uncharacterized genes and, in some cases,<br />
into known genes with uncharacterized functions. We have already<br />
begun applying this method toward understanding a range of traits of<br />
importance in biotechnology, including improved growth in industrially<br />
relevant conditions and enhanced product formation.<br />
We have designed TRMR to be easy to use and versatile. The<br />
molecular cloning procedures were accomplished within a week by<br />
a single researcher, with two additional days providing enough cells<br />
for 60 genome-wide selection and screening studies. Notably, data<br />
acquisition and analysis from TRMR are similar to genomics methods<br />
currently used by the yeast community and are amenable to a range<br />
of freely and commercially available software packages. The primary<br />
challenge to the broad dissemination of this method is the acquisition<br />
of oligonucleotide libraries, which will be overcome as DNA synthesis<br />
technologies continue to improve.<br />
We envision that a broad range of additional studies could be performed<br />
using the basic TRMR platform described here by changing<br />
the targeting, functional or tracking design. For example, although the<br />
functional regions we used were promoters and translation sites, one<br />
might conceivably use sites associated with additional functions such<br />
as switches, oscillators or sensors 45 . Moreover, the TRMR approach<br />
is not limited to engineering or examining the E. coli genome. The<br />
design could be adapted for rapidly engineering yeast and a range<br />
of Gram-negative bacteria 23 , provided the host has sufficient transformation<br />
and recombination capabilities. Additionally, TRMR may<br />
be carried out recursively, allowing for the accumulation of multiple<br />
beneficial mutations within a genome. Researchers could produce<br />
second- and third-generation recombinant cells by removing the antibiotic<br />
cassette between rounds of recombineering to allow isolation<br />
of cells containing an additional mutation, by using different antibiotic<br />
cassettes in the modular construction of the synDNA oligos so<br />
that different antibiotics could be used to isolate recombinants after<br />
each round of TRMR, or by eliminating altogether the need to isolate<br />
recombinants by relying on the increased efficiency of recombineering<br />
strategies such as those used in MAGE 24 . Integration of TRMR<br />
into directed-evolution programs would provide genome-scale construction<br />
and tracking of combinations of mutations, which would<br />
improve both the understanding and engineering of complex traits.<br />
Methods<br />
Methods and any associated references are available in the online<br />
version of the paper at http://www.nature.com/naturebiotechnology/.<br />
Note: Supplementary information is available on the <strong>Nature</strong> Biotechnology website.<br />
Acknowledgments<br />
We thank D. Court (Center for Cancer Research, National Cancer Institute at<br />
Frederick, Maryland) for sharing plasmid pSIM5, C. Nislow and G. Giaever<br />
(University of Toronto, Ontario) for help with microarray analysis, A. Mohagheghi<br />
and M. Zhang (US National Renewable Energy Laboratory) for hydrolysate<br />
samples, M. O’Donnell for help in preparation of selective agar plates, Agilent for<br />
access to the Oligonucleotide Library Synthesis product, and H. Marshall and the<br />
University of Colorado Microarray Facility for molecular barcode genotyping.<br />
The authors appreciate financial support provided by Shell, the Colorado Center<br />
for Biorefining and Biofuels (http://www.C2B2web.org) and the Colorado Energy<br />
Initiative (http://rasei.colorado.edu).<br />
Author Contributions<br />
J.R.W. and R.T.G. conceived the study; J.R.W. designed and performed all<br />
experiments except for growth selections and allele confirmations in hydrolysate,<br />
which were conducted by P.J.R.; A.K.-F. aided J.R.W. in selection of targeting<br />
sequences and selection of barcode tags; A.K.-F. and P.J.R. assigned gene ontology<br />
terms; L.B.A.W. aided J.R.W. in selection design and microarray analysis; L.B.A.W.<br />
constructed circle plots; P.J.R., A.K.-F. and L.B.A.W. helped in manuscript<br />
preparation; J.R.W. and R.T.G. wrote the manuscript; R.T.G. supervised all aspects<br />
of the study.<br />
COMPETING FINANCIAL INTERESTS<br />
The authors declare no competing financial interests.<br />
Published online at http://www.nature.com/naturebiotechnology/.<br />
Reprints and permissions information is available online at<br />
http://npg.nature.com/reprintsandpermissions/.<br />
1. Fox, R.J. et al. Improving catalytic function by ProSAR-driven enzyme evolution.<br />
Nat. Biotechnol. 25, 338–344 (2007).<br />
2. Turner, N.J. Directed evolution drives the next generation of biocatalysts. Nat. Chem.<br />
Biol. 5, 567–573 (2009).<br />
3. Winzeler, E.A. et al. Functional characterization of the S. cerevisiae genome by<br />
gene deletion and parallel analysis. Science 285, 901–906 (1999).<br />
4. Fodor, S. et al. Light-directed, spatially addressable parallel chemical synthesis.<br />
Science 251, 767–773 (1991).<br />
5. Blanchard, A.P., Kaiser, R.J. & Hood, L.E. High-density oligonucleotide arrays.<br />
Biosens. Bioelectron. 11, 687–690 (1996).<br />
6. Singh-Gasson, S. et al. Maskless fabrication of light-directed oligonucleotide microarrays<br />
using a digital micromirror array. Nat. Biotechnol. 17, 974–978 (1999).<br />
7. Cleary, M.A. et al. Production of complex nucleic acid libraries using highly parallel<br />
in situ oligonucleotide synthesis. Nat. Methods 1, 241–248 (2004).<br />
8. Ghindilis, A. et al. CombiMatrix oligonucleotide arrays: genotyping and gene<br />
expression assays employing electrochemical detection. Biosens. Bioelectron. 22,<br />
1853–1860 (2007).<br />
9. Yu, D. et al. An efficient recombination system for chromosome engineering in<br />
Escherichia coli. Proc. Natl. Acad. Sci. USA 97, 5978–5983 (2000).<br />
10. Murphy, K. Use of bacteriophage lambda recombination functions to promote gene<br />
replacement in Escherichia coli. J. Bacteriol. 180, 2063–2071 (1998).<br />
11. Zhang, Y., Buchholz, F., Muyrers, J. & Stewart, A.F. A new logic for DNA engineering<br />
using recombination in Escherichia coli. Nat. Genet. 20, 123–128 (1998).<br />
12. Shoemaker, D.D., Lashkari, D.A., Morris, D., Mittmann, M. & Davis, R.W. Quantitative<br />
phenotypic analysis of yeast deletion mutants using a highly parallel molecular<br />
bar-coding strategy. Nat. Genet. 14, 450–456 (1996).<br />
13. Cho, R.J. et al. Parallel analysis of genetic selections using whole genome<br />
oligonucleotide arrays. Proc. Natl. Acad. Sci. USA 95, 3752–3757 (1998).<br />
14. Gill, R.T. et al. Genome wide screening for trait conferring genes using DNA microarrays.<br />
Proc. Natl. Acad. Sci. USA 99, 7033–7038 (2002).<br />
15. Lynch, M.D., Warnecke, T. & Gill, R.T. SCALEs: multiscale analysis of library<br />
enrichment. Nat. Methods 4, 87–93 (2007).<br />
16. Badarinarayana, V. et al. Selection analyses of insertional mutants using subgenic<br />
resolution arrays. Nat. Biotechnol. 19, 1060–1065 (2001).<br />
17. Giaever, G. et al. Functional profiling of the Saccharomyces cerevisiae genome.<br />
<strong>Nature</strong> 418, 387–391 (2002).<br />
18. Ho, C.H. et al. A molecular barcoded yeast ORF library enables mode-of-action<br />
analysis of bioactive compounds. Nat. Biotechnol. 27, 369–377 (2009).<br />
19. Baba, T. et al. Construction of Escherichia coli K-12 in-frame, single-gene knockout<br />
mutants: the Keio collection. Mol. Syst. Biol. 2, 1–11 (2006).<br />
20. Kitagawa, M. et al. Complete set of ORF clones of Escherichia coli ASKA library<br />
(a complete set of E. coli K-12 ORF archive): unique resources for biological<br />
research. DNA Res. 12, 291–299 (2006).<br />
21. Datsenko, K. & Wanner, B. One-step inactivation of chromosomal genes in<br />
E. coli K12 using PCR products. Proc. Natl. Acad. Sci. USA 97, 6640–6645<br />
(2000).<br />
22. Ellis, H.M., Yu, D., DiTizio, T. & Court, D.L. High efficiency mutagenesis, repair,<br />
and engineering of chromosomal DNA using single-stranded oligonucleotides.<br />
Proc. Natl. Acad. Sci. USA 98, 6742–6746 (2001).<br />
23. Datta, S., Costantino, N., Zhou, X. & Court, D.L. Identification and analysis of<br />
recombineering functions from Gram-negative and Gram-positive bacteria and their<br />
phages. Proc. Natl. Acad. Sci. USA 105, 1626–1631 (2008).<br />
24. Wang, H. et al. Programming cells by multiplex genome engineering and accelerated<br />
evolution. <strong>Nature</strong> 460, 894–898 (2009).<br />
25. Lutz, R. & Bujard, H. Independent and tight regulation of transcriptional units in<br />
Escherichia coli via the LacR/O, the TetR/O and AraC/I1–I2 regulatory elements.<br />
Nucleic Acids Res. 25, 1203–1210 (1997).<br />
26. Shine, J. & Dalgarno, L. The 3′-terminal sequence of Escherichia coli 16S ribosomal<br />
RNA: complementarity to nonsense triplets and ribosome binding sites. Proc. Natl.<br />
Acad. Sci. USA 71, 1342–1346 (1974).<br />
27. Kimura, M., Takatsuki, A. & Yamaguchi, I. Blasticidin S deaminase gene from<br />
Aspergillus terreus (BSD): a new drug resistance gene for transfection of mammalian<br />
cells. Biochim. Biophys. Acta 1219, 653–659 (1994).<br />
28. Pierce, S.E. et al. A unique and universal molecular barcode array. Nat. Methods<br />
3, 601–603 (2006).<br />
29. Baudin, A., Ozier-Kalogeropoulos, O., Denouel, A., Lacroute, F. & Cullin, C. A simple<br />
and efficient method for direct gene deletion in Saccharomyces cerevisiae.<br />
Nucleic Acids Res. 21, 3329–3330 (1993).<br />
30. Markham, N.R. & Zuker, M. DINAMelt web server for nucleic acid melting prediction.<br />
Nucleic Acids Res. 33, W577–W581 (2005).<br />
31. Defez, R. & de Felice, M. Cryptic operon for beta-glucoside metabolism in<br />
Escherichia coli K12: genetic evidence for a regulatory protein. Genetics 97, 11–25<br />
(1981).<br />
32. Meijnen, J.P., de Winde, J.H. & Ruijssenaars, H.J. Engineering Pseudomonas putida<br />
S12 for efficient utilization of D-xylose and L-arabinose. Appl. Environ. Microbiol.<br />
74, 5031–5037 (2008).<br />
33. Zhu, M.M., Skraly, F.A. & Cameron, D.C. Accumulation of methylglyoxal in<br />
anaerobically grown Escherichia coli and its detoxification by expression of the<br />
Pseudomonas putida glyoxalase I gene. Metab. Eng. 3, 218–225 (2001).<br />
34. Gort, A.S., Ferber, D.M. & Imlay, J.A. The regulation and role of the periplasmic copper,<br />
zinc superoxide dismutase of Escherichia coli. Mol. Microbiol. 32, 179–191 (1999).<br />
35. Sutton, A., Newman, T., Francis, M. & Freundlich, M. Valine-resistant Escherichia<br />
coli K-12 strains with mutations in the ilvB operon. J. Bacteriol. 148, 998–1001<br />
(1981).<br />
36. Weinstock, O., Sella, C., Chipman, D.M. & Barak, Z. Properties of subcloned<br />
subunits of bacterial acetohydroxy acid synthases. J. Bacteriol. 174, 5560–5566<br />
(1992).<br />
37. Wessler, S.R. & Calvo, J.M. Control of leu operon expression in Escherichia coli by<br />
a transcription attenuation mechanism. J. Mol. Biol. 149, 579–597 (1981).<br />
38. Sycheva, E.V. et al. Overproduction of noncanonical amino acids by Escherichia<br />
coli cells. Microbiology 76, 712–718 (2007).<br />
39. Chen, S.F., Mowery, R.A., Castleberry, V.A., van Walsum, G.P. & Chambliss, C.K.<br />
High-performance liquid chromatography method for simultaneous determination<br />
of aliphatic acid, aromatic acid and neutral degradation products in biomass<br />
pretreatment hydrolysates. J. Chromatogr. A 1104, 54–61 (2006).<br />
40. Mohagheghi, A. & Schell, D.J. Impact of recycling stillage on conversion of dilute sulfuric<br />
acid pretreated corn stover to ethanol. Biotechnol. Bioeng. 105, 992–996 (2010).<br />
41. Ferrante, A.A., Augliera, J., Lewis, K. & Klibanov, A.M. Cloning of an organic<br />
solvent-resistance gene in Escherichia coli: the unexpected role of alkylhydroperoxide<br />
reductase. Proc. Natl. Acad. Sci. USA 92, 7617–7621 (1995).<br />
42. Poole, L.B. Bacterial defenses against oxidants: mechanistic features of cysteine-based<br />
peroxidases and their flavoprotein reductases. Arch. Biochem. Biophys. 433,<br />
240–254 (2005).<br />
43. Seaver, L.C. & Imlay, J.A. Alkyl hydroperoxide reductase is the primary scavenger<br />
of endogenous hydrogen peroxide in Escherichia coli. J. Bacteriol. 183, 7173–7181<br />
(2001).<br />
44. Kohanski, M.A., Dwyer, D.J., Hayete, B., Lawrence, C.A. & Collins, J.J. A common<br />
mechanism of cellular death induced by bactericidal antibiotics. Cell 130, 797–810<br />
(2007).<br />
45. Lu, T.K., Khalil, A.S. & Collins, J.J. Next-generation synthetic gene networks.<br />
Nat. Biotechnol. 27, 1139–1150 (2009).<br />
46. Datta, S., Costantino, N. & Court, D.L. A set of recombineering plasmids for<br />
gram-negative bacteria. Gene 379, 109–115 (2006).<br />
47. Pierce, S.E., Davis, R.W., Nislow, C. & Giaever, G. Genome-wide analysis of barcoded<br />
Saccharomyces cerevisiae gene-deletion mutants in pooled cultures. Nat. Protoc.<br />
2, 2958–2974 (2007).<br />
48. Nour-Eldin, H.H., Hansen, B.G., Norholm, M.H.H., Jensen, J.K. & Halkier, B.A.<br />
Advancing uracil-excision based cloning towards an ideal technique for cloning PCR<br />
fragments. Nucleic Acids Res. 34, e122 (2006).<br />
49. Dean, F.B., Nelson, J.R., Giesler, T.L. & Lasken, R.S. Rapid amplification of plasmid<br />
and phage DNA using Phi29 DNA polymerase and multiply-primed rolling circle<br />
amplification. Genome Res. 11, 1095–1099 (2001).<br />
50. Perni, S., Andrew, P.W. & Shama, G. Estimating the maximum growth rate from microbial<br />
growth curves: definition is everything. Food Microbiol. 22, 491–495 (2005).<br />
ONLINE METHODS<br />
Strains, DNA and reagents. Escherichia coli MG1655 (wild type) was obtained<br />
from ATCC (ATCC 700926). Genomic sequences were obtained from GenBank<br />
U00096.2, and gene annotation was from the Ecogene database version 2.20<br />
(http://www.ecogene.org/). Pseudogenes and insertion elements were excluded<br />
from the protein-coding genes that were targeted. The kanamycin-resistant<br />
control strain (also called JWKAN) was constructed from E. coli ATCC 700926,<br />
with nucleotide 3,909,796 replaced with a barcoded kanamycin cassette 21<br />
(Supplementary Notes). Up and down DNA cassettes were constructed using<br />
PCR and cloned into the pEM7/BSD plasmid (Invitrogen, Supplementary<br />
Notes). Oligonucleotide libraries were purchased from Agilent; all other oligonucleotides<br />
were purchased from Integrated DNA Technologies with standard<br />
desalting except where noted. The pSIM5 plasmid 46 was a gift from D. Court.<br />
All reagents were obtained from common commercial sources. All enzymes<br />
were from New England Biolabs except where noted. All sequencing was performed<br />
by Macrogen USA or Eurofins MWG Operon. Recipes and additional<br />
information can be found in Supplementary Notes.<br />
Preparation of synthetic DNA and recombineering. A portion of the oligonucleotide<br />
library provided by Agilent (8,154 unique 189-mers) was amplified<br />
by two rounds of PCR. Products were treated with the USER enzymes (New<br />
England Biolabs), purified and ligated to the up cassette. Rolling-circle amplification,<br />
nuclease treatment and purification resulted in 8–10 μg synDNA. This<br />
procedure was also carried out in parallel to separately generate TRMR down<br />
synDNA. More details are available in Supplementary Notes.<br />
E. coli cells containing the recombineering plasmid pSIM5 were grown<br />
in 800 ml SOB cultures at 30 °C and made recombineering proficient with<br />
minor modifications to reported methods 46 . Briefly, when cells reached an<br />
optical density at 600 nm of 0.7, flasks were transferred to water baths at 42 °C<br />
to induce the λRed enzymes for 15 min. Flasks were then transferred to an<br />
ice-water bath and cells were kept close to 4 °C for the remaining steps. Cells<br />
were collected by centrifugation and suspended in cold deionized water. Cell<br />
collection and washing were repeated once more, then cells were suspended to a<br />
final volume of 6.4 ml in water. Aliquots of cells (400 μl) were transformed in<br />
a 0.2-cm electrocuvette with approximately 1 μg of up or down synDNA and a<br />
pulse of 12.5 kV cm⁻¹. Transformation was carried out eight times to generate<br />
the up allele library and eight times to generate the down allele library. The<br />
cells from each transformation were recovered in 12 ml SOC medium for 1 h<br />
at 37 °C. Cells were collected by centrifugation and resuspended in 30 ml MA<br />
salts (Supplementary Notes). Centrifugation and resuspension were repeated<br />
twice more, with the final resuspension to a volume of 2 ml in MA salts. The<br />
up and down allele libraries were separately spread onto a total of 40 low-salt<br />
LB agar plates containing blasticidin-S (90 μg ml⁻¹) and allowed to grow at<br />
37 °C for 22 h. Colonies were scraped from the agar plates and up and down<br />
allele libraries were each suspended in a total of 35 ml LB. Cells were collected<br />
by centrifugation and suspended to 3 × 10⁹ cells per milliliter in LB medium<br />
containing 16% (vol/vol) glycerol and blasticidin-S (90 μg ml⁻¹). Aliquots of<br />
the up or down cell mixtures were stored at −80 °C.<br />
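As a consistency check on the volumes and settings quoted above, the following short sketch computes how many 400-μl aliquots a 6.4-ml cell suspension yields and the pulse voltage implied by a field strength of 12.5 kV cm⁻¹ across a 0.2-cm cuvette gap. The function name is hypothetical, and the calculation assumes no volume is lost during handling.<br />

```python
def electroporation_plan(total_volume_ul, aliquot_volume_ul, field_kv_per_cm, gap_cm):
    """Return (number of whole aliquots, pulse voltage in kV).

    Assumes the full suspension volume is available for aliquoting and that
    the field strength is simply voltage divided by the cuvette gap width.
    """
    n_aliquots = total_volume_ul // aliquot_volume_ul
    pulse_kv = field_kv_per_cm * gap_cm
    return n_aliquots, pulse_kv


# Values quoted in the text: 6.4 ml of cells, 400-ul aliquots,
# 12.5 kV/cm in a 0.2-cm electrocuvette.
aliquots, pulse_kv = electroporation_plan(6400, 400, 12.5, 0.2)
```

Under these assumptions the suspension gives 16 aliquots, which matches the eight up and eight down transformations described, and the pulse corresponds to about 2.5 kV.<br />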
Screens and selections. Freezer stocks were used to inoculate 50 ml low-salt<br />
LB medium containing 80 μg ml⁻¹ blasticidin-S with 5 × 10⁸ TRMR up cells<br />
and 5 × 10⁸ TRMR down cells. This culture was allowed to grow with shaking<br />
at 37 °C to an optical density at 600 nm of 0.8. The cells were centrifuged<br />
at 4,500g for 6 min, decanted and suspended in 30 ml of MA salts. The cells<br />
were collected once more by centrifugation and suspended in MA salts to a<br />
concentration of 5 × 10⁸ cells per milliliter. The JWKAN cells were added to a<br />
final concentration of 7.7 × 10⁴ cells per milliliter. A 1.7-ml aliquot of the cell<br />
library (called the recovery culture) was frozen for microarray analysis, and<br />
the remainder was used for various growth selections.<br />
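The spike-in concentrations above fix the barcode frequency expected for the control strain in the recovery culture. The following minimal sketch works through that arithmetic; the function name is hypothetical, and it assumes that adding the JWKAN cells does not appreciably dilute the library.<br />

```python
def expected_spike_fraction(library_cells_per_ml, control_cells_per_ml):
    """Fraction of all cells in the mixture that are control cells.

    Assumes both concentrations refer to the same final volume.
    """
    total = library_cells_per_ml + control_cells_per_ml
    return control_cells_per_ml / total


# Concentrations quoted in the text: 5e8 library cells and 7.7e4 JWKAN
# control cells per milliliter.
control_fraction = expected_spike_fraction(5e8, 7.7e4)
library_cells_per_control = 5e8 / 7.7e4
```

With these values the control makes up roughly 0.015% of the population, or about one control cell per 6,500 library cells, which is the frequency the barcode measurement would be expected to recover before selection.<br />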
Liquid selections were carried out with shaking at 37 °C in 600 ml of MOPS<br />
minimal medium containing 2 mM phosphate and 4% (wt/vol) glucose or in<br />
600 ml LB medium. Each medium was inoculated with 2.4 × 10⁸ cells from a<br />
recovery culture and allowed to grow to an optical density at 600 nm of 1.0–1.2.<br />
Cells were collected from each culture by centrifugation of 10-ml aliquots<br />
at 4,500g for 6 min, decanted and stored at −80 °C for microarray analysis.<br />
Growth results are the average of three array hybridizations.<br />
Hydrolysate growth selections were carried out in various dilutions of<br />
hydrolysate in minimal media (15%, 16%, 17%, 18%, 19% and 20%). During<br />
selections, cell samples were taken for microarray analysis of populations, and<br />
cells were plated to isolate and identify individual alleles growing as colonies.<br />
Unique alleles from selections were identified and confirmed by PCR and<br />
studied for growth characteristics in hydrolysate. All growth curves were done<br />
in complete triplicate. More details are available in Supplementary Notes.<br />
Growth on various selective agars was carried out by spreading a total of<br />
0.7 × 10⁸ cells of the allele mixtures recovered from freezer aliquots on five plates<br />
for each selective condition (salicin, D-fucose plus L-arabinose, valine, and<br />
methylglyoxal; plate recipes in Supplementary Notes). Plates were incubated<br />
at 37 °C until colonies were visible (1–3 d). Selection for galK down alleles was<br />
carried out on plates containing 2-deoxygalactose 9 , and screens were carried<br />
out on MacConkey agar containing 1% (wt/vol) D-galactose. Screens of lacZ<br />
up alleles were carried out on LB agar plates containing 0.2% (wt/vol) glucose<br />
and 40 μg ml⁻¹ X-gal. Screens of lacZ down alleles were carried out on LB<br />
agar plates containing 0.05% (wt/vol) IPTG and 40 μg ml⁻¹ X-gal. Selections<br />
for control cells were carried out on LB agar plates containing kanamycin<br />
(30 μg ml⁻¹).<br />
Microarray tracking. Genomic DNA was extracted from ~10⁹ E. coli cells<br />
using the Purelink Genomic Mini kit (Invitrogen). Barcode tags were amplified in<br />
300-μl PCR reactions (final concentrations: 1× PCR buffer, 2.5 mM MgCl₂,<br />
0.2 mM each dNTP, 1 μM each primer 5′-GTAGCACACGAGGTCTCT-3′ and<br />
Biotin-5′-TACGACTCACTATAGGGAGA-3′, 0.6 U μl⁻¹ Taq polymerase and<br />
0.5 μg genomic DNA or 30 pg synDNA). Reactions were cycled 25 times with<br />
an annealing temperature of 55 °C. Barcode tags were purified by agarose gel<br />
electrophoresis and extraction using the QIAquick gel extraction protocols<br />
(Qiagen; buffer QX1 was substituted for buffer QG). Tag purification was shown to reduce<br />
background hybridization. Microarray hybridizations to the Geneflex Tag4<br />
16K V2 array (Affymetrix) were carried out according to published procedures<br />
47 with the following modifications: 600 ng of purified tags (combined<br />
up tags and down tags) were hybridized along with ten tags (amplified and<br />
purified as above) included at known concentrations (0.5 pM to 10 nM).<br />
Intensity values were calculated for each tag after removal of replicate outliers<br />
and averaging of unmasked replicates using software (raw_file_maker.pl) that can be downloaded from<br />
http://chemogenomics.stanford.edu/supplements/04tag/download.html. Background hybridization was calculated from the<br />
average intensity of 1,642 unused tag probes; threshold intensity was set to<br />
background hybridization plus 2 s.d. The intensities of the ten spiked tags<br />
were used to calculate allele concentrations from array signals and correct for<br />
array saturation (Supplementary Fig. 2). Barcode frequencies were calculated<br />
by dividing barcode concentrations by the total concentration of all barcodes<br />
detected on the array.<br />
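The thresholding and normalization steps described above can be sketched as follows. This is a minimal illustration and not the authors' analysis code: the function names and example values are hypothetical, the sample standard deviation is assumed (the text does not say whether sample or population s.d. was used), a tag is treated as detected only when its intensity is strictly above the threshold, and the concentrations are assumed to have already been calibrated from the spiked tags.<br />

```python
from statistics import mean, stdev


def detection_threshold(unused_probe_intensities, n_sd=2.0):
    """Background (mean of unused tag probes) plus n_sd standard deviations."""
    background = mean(unused_probe_intensities)
    return background + n_sd * stdev(unused_probe_intensities)


def barcode_frequencies(intensities, concentrations, threshold):
    """Frequency of each detected barcode among all detected barcodes.

    intensities and concentrations are dicts keyed by tag name. Tags at or
    below the threshold are treated as undetected and left out of the total.
    """
    detected = {
        tag: concentrations[tag]
        for tag, signal in intensities.items()
        if signal > threshold
    }
    total = sum(detected.values())
    if total == 0:
        return {}
    return {tag: conc / total for tag, conc in detected.items()}


# Hypothetical example: three unused probes and three barcode tags.
threshold = detection_threshold([10, 12, 14])
frequencies = barcode_frequencies(
    {"tagA": 100, "tagB": 20, "tagC": 15},
    {"tagA": 3.0, "tagB": 1.0, "tagC": 5.0},
    threshold,
)
```

In this example the undetected tag is excluded before normalization, so the remaining frequencies sum to one, matching the description of dividing each barcode concentration by the total for all barcodes detected on the array.<br />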
doi:10.1038/nbt.1653<br />
letters<br />
Implications of the presence of N-glycolylneuraminic<br />
acid in recombinant therapeutic glycoproteins<br />
Darius Ghaderi 1,2 , Rachel E Taylor 1 , Vered Padler-Karavani 1 , Sandra Diaz 1 & Ajit Varki 1<br />
Recombinant glycoprotein therapeutics produced in nonhuman<br />
mammalian cell lines and/or with animal serum are often<br />
modified with the nonhuman sialic acid N-glycolylneuraminic<br />
acid (Neu5Gc; refs. 1,2). This documented contamination<br />
has generally been ignored in drug development because<br />
healthy individuals were not thought to react to Neu5Gc<br />
(ref. 2). However, recent findings indicate that all humans<br />
have Neu5Gc-specific antibodies, sometimes at high levels 3,4 .<br />
Working with two monoclonal antibodies in clinical use,<br />
we demonstrate the presence of covalently bound Neu5Gc<br />
in cetuximab (Erbitux) but not panitumumab (Vectibix).<br />
Anti-Neu5Gc antibodies from healthy humans interact with<br />
cetuximab in a Neu5Gc-specific manner and generate immune<br />
complexes in vitro. Mice with a human-like defect in Neu5Gc<br />
synthesis generate antibodies to Neu5Gc after injection with<br />
cetuximab, and circulating anti-Neu5Gc antibodies can<br />
promote drug clearance. Finally, we show that the Neu5Gc<br />
content of cultured human and nonhuman cell lines and their<br />
secreted glycoproteins can be reduced by adding a human<br />
sialic acid to the culture medium. Our findings may be relevant<br />
to improving the half-life, efficacy and immunogenicity of<br />
glycoprotein therapeutics.<br />
Therapeutic glycoproteins, including antibodies, growth factors,<br />
cytokines, hormones and clotting factors, generate sales with annual<br />
double-digit growth rates 5 . They must often be produced in mammalian<br />
expression systems because of the crucial influence of the location,<br />
number and structure of N-glycans on their yields, bioactivity, solubility,<br />
stability against proteolysis, immunogenicity and rate of clearance<br />
from the bloodstream 6–8 .<br />
Two differences between the protein glycosylation apparatus of<br />
humans and rodents account for major potential differences between<br />
the N-glycans on glycoproteins made in cultured human cells and<br />
those made using rodent cell lines. First, humans cannot synthesize a<br />
terminal Galα1-3Gal motif (known as alpha-Gal) on N-glycans. As a<br />
consequence, they express antibodies against this structure 9 . Second,<br />
unlike other mammals, humans cannot biosynthesize the sialic acid<br />
Neu5Gc because the human gene CMAH, encoding CMP-N-acetylneuraminic<br />
acid hydroxylase, the enzyme responsible for producing<br />
CMP-Neu5Gc from CMP-N-acetylneuraminic acid (CMP-Neu5Ac),<br />
is irreversibly mutated 10 . The use of cultured human cells to address<br />
this issue is not a solution, as Neu5Gc can be taken up from animal<br />
products present in the culture medium and then metabolically incorporated<br />
into secreted glycoproteins 11 .<br />
Owing largely to limitations of the assays originally used to detect<br />
anti-Neu5Gc antibodies, including the fact that only a small number<br />
of possible Neu5Gc-containing epitopes were tested, healthy humans<br />
were long believed to show no immune reaction to Neu5Gc (ref. 2).<br />
Subsequent reports that all humans possess anti-Neu5Gc antibodies 3 ,<br />
sometimes at high levels, approaching 0.1–0.2% of circulating IgG 3,4 ,<br />
have led to re-evaluation of the potential significance of Neu5Gc<br />
contamination 7,8 . Especially in light of trends toward administering<br />
increasingly higher amounts of certain biotherapeutics over longer<br />
periods of time, some biopharmaceutical companies are exploring<br />
steps to reduce levels of Neu5Gc in their products 12 .<br />
Given that they are produced using nonhuman cell lines, animal<br />
serum or serum-derived factors, or a combination of these, it is likely<br />
that most recombinant therapeutic glycoproteins carry some Neu5Gc.<br />
However, given the diversity of products and production protocols,<br />
it is difficult to make generalizations. Thus, we chose to compare<br />
two US Food and Drug Administration (FDA)-approved monoclonal<br />
antibodies with the same therapeutic target, the EGF receptor. The<br />
first, Erbitux (cetuximab, obtained from the University of California,<br />
San Diego Pharmacy), is a chimeric antibody produced in mouse<br />
myeloma cells 13,14 . The second, Vectibix (panitumumab, obtained<br />
from Amgen), is a fully human antibody produced in Chinese<br />
hamster ovary (CHO) cells 15 . The samples studied were preparations<br />
that would normally be administered to patients.<br />
We first performed enzyme-linked immunosorbent assays (ELISAs)<br />
using an affinity-purified polyclonal chicken Neu5Gc-specific antibody<br />
preparation that is highly monospecific for Neu5Gc (ref. 16,<br />
alongside a nonreactive control IgY). Bound Neu5Gc was easily<br />
detectable on cetuximab but not on panitumumab (Fig. 1a). Sialidase<br />
pretreatment abolished binding, confirming specificity. Western blot<br />
analysis also showed sialidase-sensitive anti-Neu5Gc IgY reactivity<br />
on the heavy chains of cetuximab but not those of panitumumab<br />
(Fig. 1b). The specificity of anti-Neu5Gc IgY binding was reaffirmed<br />
by pretreatment with mild sodium periodate under conditions that<br />
selectively cleave sialic acid side chains (Fig. 1c) and abolish reactivity<br />
of such antibodies 3,16 . Finally, we quantified sialic acids on the therapeutic<br />
antibodies, as described in Online Methods. Panitumumab<br />
carries 0.22 mol of sialic acids per mole of protein, with
Figure 1 ELISA and western blot detection of Neu5Gc on biotherapeutic antibodies by Neu5Gc IgY antibodies from chickens or IgG antibodies from<br />
normal human serum. Cetuximab (Cet) and panitumumab (Pan) were treated with active sialidase to eliminate sialic acid epitopes or with heat-inactivated<br />
sialidase as control. (a,b) Samples were used for ELISA (a) or western blot (b), in which Neu5Gc was detected using an affinity-purified<br />
chicken anti-Neu5Gc IgY or control IgY. A 495 , absorbance at 495 nm. ***P < 0.001, paired two-tailed t-test. (c) In an additional ELISA, Cet and Pan<br />
were used for coating, then blocked, and sialic acid epitopes were modified chemically using mild sodium metaperiodate pretreatment. The reaction<br />
was stopped using sodium borohydride. As a control, periodate and borohydride were mixed and then added to the wells (the borohydride inactivates the<br />
periodate). ELISA samples were studied at least in triplicate and data shown are means ± s.d. ***P < 0.001, paired two-tailed t-test. (d) Cet and Pan<br />
were pretreated with mild periodate as in c and used to coat ELISA wells before blocking and incubation with human anti-Neu5Gc IgG that had been<br />
purified from the serum of healthy humans and biotinylated as previously described 4 . Samples were studied in triplicate and data shown are means ±<br />
s.d. ***P < 0.001, paired two-tailed t-test. (e) Cet and Pan (1 μg each) were treated with sialidase or heat-inactivated sialidase as in a, separated by<br />
SDS-PAGE, Coomassie-stained, blotted (see b), and Neu5Gc detected using biotinylated human anti-Neu5Gc IgG. (f) Immune complex formation with<br />
Cet or Pan in whole human serum was detected using the CIC (C1Q) ELISA Kit (Buehlmann) as described in the manufacturer’s guidelines. Absorbance<br />
was measured at 405 nm. Samples were studied in triplicate and data shown are means ± s.d. **P < 0.01, paired two-tailed t-test. Gels in b and e were<br />
cropped for clarity of presentation. Full-length blots and gels are presented in Supplementary Figures 1–4.<br />
In contrast, cetuximab carries 1.84 mol of sialic acids per mole of<br />
protein, mostly as Neu5Gc (see Supplementary Table 1). The differences<br />
probably reflect different cell-expression systems. For example,<br />
in contrast to CHO cells, murine myeloma cell lines express a greater<br />
proportion of sialic acids as Neu5Gc (ref. 17; see Supplementary<br />
Tables 2 and 3 for a listing of other potential examples). Pull-down<br />
assays of cetuximab with SNA-agarose (modified with the lectin<br />
Sambucus nigra agglutinin, which recognizes α2-6-linked sialic acids),<br />
followed by ELISAs of unbound proteins, showed that only about half<br />
of cetuximab molecules actually carry bound sialic acids and Neu5Gc<br />
(data not shown). Such heterogeneity is typical for glycoproteins.<br />
To address the potential significance of the high levels of anti-<br />
Neu5Gc antibodies found in certain humans, we affinity purified anti-<br />
Neu5Gc antibodies from normal human sera and biotinylated them<br />
exactly as previously described 4 , before their analysis using ELISA<br />
and western blotting assays (Fig. 1d,e). As with Neu5Gc-specific<br />
chicken IgY, these affinity-purified human Neu5Gc-specific antibodies<br />
reacted with cetuximab but not with panitumumab. Again,<br />
reactivity was abrogated by pretreatment with mild sodium periodate<br />
(Fig. 1d) or sialidase (Fig. 1e).<br />
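The significance values quoted in the figure legends come from paired two-tailed t-tests on triplicate readings. As a rough illustration of how such a test statistic is formed, the following Python sketch computes the paired t statistic and its degrees of freedom. The function name `paired_t` and the absorbance values are hypothetical, chosen only for illustration; they are not data from this study.

```python
import math
from statistics import mean, stdev


def paired_t(a, b):
    """Return (t, df) for a paired t-test on two equal-length samples.

    t is the mean of the pairwise differences divided by its standard error.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two samples of equal length >= 2")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    n = len(diffs)
    return mean(diffs) / (sd / math.sqrt(n)), n - 1


if __name__ == "__main__":
    # Hypothetical triplicate A495 readings (NOT values from the paper).
    mock_treated = [0.52, 0.55, 0.50]
    periodate_treated = [0.08, 0.10, 0.07]
    t, df = paired_t(mock_treated, periodate_treated)
    print(f"t = {t:.2f}, df = {df}")
```

With triplicates there are only two degrees of freedom, so the two-tailed critical value at P = 0.05 is 4.303; the P value itself would normally be read from a t distribution with a statistics package.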
To further address potential clinical relevance, we studied whether<br />
addition of cetuximab to normal human sera is capable of promoting<br />
the formation of immune complexes (Fig. 1f). Cetuximab formed<br />
immune complexes in a human serum with high levels of anti-Neu5Gc<br />
antibodies (serum S34; ref. 4) but not in a low-titer serum (serum S30;<br />
ref. 4). In contrast, we detected no formation of immune complexes<br />
with either serum in the presence of panitumumab. Assuming that<br />
similar interactions occur between cetuximab and circulating anti-<br />
Neu5Gc antibodies in humans, these complexes could potentially fix<br />
complement and cause untoward reactions in some patients and/or<br />
affect half-life, possibly explaining some reported clinical differences<br />
between cetuximab and panitumumab 13,15 .<br />
We next evaluated whether Neu5Gc affects clearance rate when circulating<br />
anti-Neu5Gc antibodies are present. To mimic the situation<br />
in humans, we used mice with a human-like defect in the Cmah<br />
gene, which encodes the enzyme that generates activated Neu5Gc<br />
(CMP-Neu5Gc) 18 . Such mice can make anti-Neu5Gc antibodies upon<br />
immunization with glycosidically bound, but not free, Neu5Gc 19–21 .<br />
However, the previous studies reporting these mouse anti-Neu5Gc<br />
antibodies used whole rodent or chimpanzee cells for immunization<br />
19,20 , an artificial approach. In contrast, feeding of Neu5Gc (which<br />
is present in mouse chow) does not induce a human-like immune<br />
response in the mutant mice 21 . We could not immunize the mice with<br />
cetuximab itself, as other antibodies directed against the partly human<br />
IgG protein backbone would confound any results. To most closely<br />
mimic the situation in humans, we therefore immunized with Neu5Gc-loaded<br />
Haemophilus influenzae (see Online Methods and ref. 21;<br />
this is very similar to the mechanism by which human Neu5Gc-specific<br />
antibodies appear to be generated naturally 21 ). Given the great variability<br />
in isotypes and affinities of the naturally occurring human<br />
anti-Neu5Gc antibodies, as well as their different relative reactivities<br />
against various Neu5Gc-containing antigens 4 , it is impractical to<br />
model all possible human conditions. We therefore chose to mimic<br />
a situation in a human with relatively high levels of the IgG antibodies<br />
against the kind of Neu5Gc epitope (Neu5Gcα2-6Galβ1-4Glc-)<br />
found in cetuximab 22 . It also happens that this epitope is commonly<br />
recognized by human anti-Neu5Gc antibodies 4 .<br />
Each of the therapeutic antibodies, cetuximab and panitumumab,<br />
was injected intravenously at levels estimated to ensure a concentration<br />
of 1 μg ml −1 in the extracellular fluid volume according to mouse<br />
body weight 23 . Next, sera pooled from naïve, control-immunized or<br />
Neu5Gc-immunized syngeneic mice were passively transferred via<br />
intraperitoneal injection, ensuring equal starting concentrations of<br />
circulating Neu5Gc-specific antibodies. Anti-Neu5Gc IgG levels in<br />
the pooled sera from Neu5Gc-immunized mice were quantified using<br />
ELISA with a Neu5Gcα2-6Galβ1-4Glc-conjugate as a target, as previously<br />
described 4 (97.5 μg ml −1 , data not shown). The amount of<br />
pooled antibody injected was then calculated to achieve an approximate<br />
Figure 2 Effects of Neu5Gc-specific antibodies on the kinetics of<br />
therapeutic antibodies in mice with a human-like Neu5Gc deficiency,<br />
levels of anti-Neu5Gc IgG in mice after injections of the therapeutic<br />
antibodies, and binding of IgG Neu5Gc-specific antibodies from whole<br />
human serum to Neu5Gc on the Fab fragment of cetuximab. (a) Cmah-null<br />
mice were first injected intravenously with either of the therapeutic<br />
antibodies, cetuximab (Cet) or panitumumab (Pan). Serum from Cmah-null<br />
mice containing anti-Neu5Gc antibodies (or serum from naïve mice or<br />
control-immunized mice) was then passively transferred by intraperitoneal<br />
injection. Mice were bled periodically after the passive transfer of serum.<br />
Concentrations of Cet or Pan in the isolated sera were determined by<br />
sandwich ELISA. Absorbance was measured at 495 nm. The y axis starts<br />
at 60% to better display the difference in kinetics. Error bars, s.d.;<br />
***P < 0.001, unpaired two-tailed t-test. (b) Cmah-null mice were injected<br />
intravenously with Cet, Pan or mouse IgG weekly and were bled initially<br />
and after the third intravenous injection. To detect Neu5Gc-specific<br />
antibodies by ELISA, we coated wells with human (Neu5Gc-deficient)<br />
or chimpanzee (Neu5Gc-positive) serum glycoproteins (upper chart), or<br />
alternatively with human or bovine fibrinogen (lower chart). Data were<br />
obtained in triplicate. (c) Fab fragments of Cet and Pan were isolated using<br />
the Pierce Fab Preparation Kit according to the manufacturer’s manual for use as target molecules in ELISA (1 μg per well). Sialic acid–specific binding<br />
was determined using mild sodium metaperiodate pretreatment. Wells were then blocked and incubated with human sera (S30 and S34, with low and<br />
high anti-Neu5Gc IgG titers, respectively 4 ). Binding of human IgG was detected using anti-human IgG-Fc. Absorbance was measured at 490 nm and<br />
ELISA samples were studied in triplicate. Error bars, s.d.; *P < 0.05, paired two-tailed t-test.<br />
starting concentration of 4 μg ml −1 IgG in the extracellular fluid<br />
volumes of the mice, which is about a fourfold excess of anti-Neu5Gc<br />
antibodies compared to the injected drug in the mice, and similar to<br />
levels found in some humans 4 .<br />
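The passive-transfer dosing described above is a simple mass balance, which can be sketched in Python as follows. The function names and the 5 ml extracellular fluid volume are assumptions made for illustration (the text does not state the volume used, only that it was estimated from body weight); the 97.5 μg ml −1 serum titer and the 4 and 1 μg ml −1 target concentrations are taken from the text. The sketch ignores the small volume added by the injection itself.

```python
def serum_volume_ml(target_ug_per_ml, ecf_volume_ml, serum_titer_ug_per_ml):
    """Volume of pooled serum (ml) that delivers enough IgG to reach the
    target concentration in the extracellular fluid (ECF) volume."""
    if serum_titer_ug_per_ml <= 0 or ecf_volume_ml <= 0:
        raise ValueError("titer and ECF volume must be positive")
    required_mass_ug = target_ug_per_ml * ecf_volume_ml
    return required_mass_ug / serum_titer_ug_per_ml


def fold_excess(antibody_ug_per_ml, drug_ug_per_ml):
    """Mass-concentration ratio of anti-Neu5Gc IgG to injected drug."""
    return antibody_ug_per_ml / drug_ug_per_ml


if __name__ == "__main__":
    ASSUMED_ECF_ML = 5.0  # hypothetical value, not given in the paper
    vol = serum_volume_ml(4.0, ASSUMED_ECF_ML, 97.5)
    print(f"serum to inject: {vol:.3f} ml")
    print(f"excess over drug: {fold_excess(4.0, 1.0):.0f}-fold")
```

The fold excess here is a ratio of mass concentrations, which matches the roughly fourfold excess quoted in the text.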
Clearance was monitored by a sandwich ELISA specific for human<br />
IgG-Fc. Although both drugs had a similar clearance rate in mice<br />
pre-injected with serum from naïve or control-immunized mice,<br />
circulating levels of cetuximab decreased significantly (P < 0.001)<br />
when Neu5Gc-specific antibodies were pre-injected (Fig. 2a).<br />
Assuming that a similar interaction between cetuximab and<br />
circulating anti-Neu5Gc antibodies occurs in patients, there could<br />
be relevant effects on clearance rate and efficacy. This might help to<br />
explain the wide range of half-life values reported for such antibodies<br />
in clinical studies 14,15 .<br />
To further simulate the clinical situation, we injected equal<br />
amounts of cetuximab or panitumumab intravenously into Neu5Gc-deficient<br />
Cmah −/− mice in typical human dosages (4 μg per gram of<br />
body weight) at weekly intervals. To exclude any effect of the human<br />
portion of the protein (cetuximab) or of the fully human protein<br />
(panitumumab) in mice, we also injected murine IgG as a positive<br />
control, as it happens to carry Neu5Gc as the predominant sialic<br />
acid (Supplementary Table 1). Notably, cetuximab and murine IgG<br />
(but never panitumumab) induced a Neu5Gc-specific IgG immune<br />
response (Fig. 2b). As with humans, responses of individual mice<br />
varied greatly, and more positive signals were obtained with the<br />
Neu5Gc epitope mixture found in chimp serum than that in bovine<br />
fibrinogen. Thus, even patients without pre-existing high levels of<br />
anti-Neu5Gc antibodies may be at risk of developing them after injection<br />
of Neu5Gc-carrying agents, potentially affecting the outcome<br />
of subsequent injections. Moreover, repeated injections of Neu5Gc-carrying<br />
agents could result in the accumulation of this nonhuman<br />
sugar in human tissues. Together with Neu5Gc-specific antibodies,<br />
accumulation of Neu5Gc in tissues can mediate chronic inflammation<br />
and potentially facilitate progression of diseases such as cancer 19 and<br />
atherosclerosis 24 . Thus, chronic use of Neu5Gc-bearing therapeutics<br />
might increase future risk of such diseases.<br />
Finally, we studied direct binding of anti-Neu5Gc antibodies from<br />
whole human sera to both cetuximab and panitumumab. To avoid<br />
excessive cross-reactivity involving the secondary reagent, we prepared<br />
Fab fragments of both of the agents, used them to coat ELISA<br />
plate wells, exposed them to human sera and then detected serum<br />
antibody binding with a human IgG-Fc–specific secondary antibody<br />
(note that cetuximab is known to have an additional glycosylation<br />
site in the V-region 21 ). We detected mild periodate–sensitive binding<br />
of serum IgG from a high–anti-Neu5Gc titer serum (S34, ref. 4,<br />
which had >15 μg ml −1 IgG antibodies against Neu5Gcα2-6Galβ1-4Glc-)<br />
to the Fab fragments of cetuximab but not to those of panitumumab<br />
(Fig. 2c). In contrast, incubation with another human serum<br />
containing very low Neu5Gc-antibodies (serum S30, ref. 4, which had<br />
Figure 3 An approach to reducing Neu5Gc contamination in biotherapeutic products. (a,b) Human 293T cells were grown in the presence of 5 mM Neu5Gc<br />
for 3 d. The cells were then washed with PBS and split into two identical cultures, and 5 mM Neu5Ac was added to one of the cultures. Cells were harvested<br />
as described in Online Methods, and the Neu5Gc and Neu5Ac content of both the ethanol-soluble (a) and ethanol-precipitable proteins (b) was analyzed<br />
by HPLC. (c–e) Feeding of CHO cells with free Neu5Ac reduced Neu5Gc in the whole-cell membranes and in secreted glycoproteins. Stably transfected<br />
CHO-KI cells expressing a recombinant soluble IgG-Fc fusion protein were grown in the absence or presence of 5 mM Neu5Ac. The individually collected<br />
medium was centrifuged to remove cell debris and adjusted to 5 mM Tris-HCl pH 8. The fusion protein was purified using protein A–Sepharose, and sialic<br />
acid content was determined by DMB-HPLC analysis as described in Online Methods (c). Total cell membranes from the same CHO cells were prepared and<br />
used for DMB-HPLC analysis (d). CHO membrane proteins from d were separated by SDS-PAGE and transferred onto nitrocellulose membranes. Expression<br />
of Neu5Gc (e) was detected by incubating with polyclonal affinity-purified chicken anti-Neu5Gc IgY, as described in Online Methods.<br />
small amounts of Neu5Gc in recombinant glycoproteins produced in<br />
CHO cells 1,16 , we next asked whether feeding Neu5Ac could reduce total<br />
glycoprotein Neu5Gc levels in CHO cells. This was successful for all membrane<br />
glycoproteins and for a secreted recombinant protein (Fig. 3c–e).<br />
Similar feeding of murine myeloma cells with Neu5Ac did not substantially<br />
reduce the higher initial Neu5Gc content (~70–80% of sialic acids),<br />
most probably because of the higher baseline levels of Cmah in these cells.<br />
Regardless, given that the CHO cell expresses its own Cmah enzyme, these<br />
data suggest a novel mechanism, in addition to Neu5Ac competing out<br />
recycled Neu5Gc. Whatever that mechanism, reduction of Neu5Gc content<br />
of a recombinant glycoprotein can be achieved even in a nonhuman<br />
Cmah-positive cell line that starts with low levels of Neu5Gc.<br />
Despite their successful use for a variety of indications, infusion-related<br />
reactions, immunogenicity and accelerated clearance remain<br />
important concerns for many therapeutic glycoproteins 7,25 . The incidence<br />
and severity of an immune reaction depends on the interplay<br />
of infused agents with the immune system and can vary greatly from<br />
patient to patient. Understanding the underlying nature of these<br />
events will help to identify patients at risk with the use of specific<br />
markers. Humanized and fully human antibodies have been developed<br />
to reduce immunogenicity due to peptide epitopes 5 . However, the<br />
potential immunogenicity of the glycans they carry has not been as<br />
well considered. It is known that immune reactions can be mediated<br />
by binding of pre-existing IgEs against the nonhuman alpha-Gal epitope<br />
carried by some agents, such as cetuximab 13 . However, in our studies<br />
alpha-Gal residues are not an issue, as Cmah-null mice already express<br />
this sequence and do not have antibodies against it.<br />
A further concern arises here because pre-existing antibodies<br />
against a glycan on a glycoprotein can secondarily enhance antibody<br />
reactivity against the underlying protein backbone 26 , perhaps because<br />
immune complexes are cleared efficiently by Fc receptors into dendritic<br />
cells and other antigen-presenting cells 27,28 . Such a mechanism<br />
might help explain why patients’ immunogenicity to some glycoprotein<br />
therapeutics sometimes increases over time 26,29,30 . If this were<br />
true, it would likely have a further impact in long-term replacement<br />
therapy with recombinant therapeutic glycoproteins.<br />
Our findings suggest that the potential significance of the presence<br />
of Neu5Gc on glycoprotein biotherapeutics should be revisited.<br />
Despite a natural tendency to downplay potential new problems<br />
involving currently useful drugs, it is worthwhile to consider lessons<br />
from other fields, where initial enthusiasm was not balanced by full<br />
appreciation of immunological implications 31 . With this in mind, we<br />
have also suggested that Neu5Gc contamination of stem cells and<br />
other cell types intended for human therapy could pose risks 32,33 . In<br />
addition, others have recently reported that Cmah-null mice can reject<br />
Neu5Gc-positive wild-type organ transplants via complement-fixing<br />
Neu5Gc-specific antibodies 20 .<br />
For new drugs, it may be possible to avoid Neu5Gc contamination<br />
from the outset by using Neu5Gc-deficient cells and media.<br />
Meanwhile, as an immediate practical solution, we have also demonstrated<br />
a nontoxic way to reduce the Neu5Gc content of some currently<br />
used expression systems and their secreted glycoproteins, by simply<br />
adding Neu5Ac to the culture media. This could bypass the need<br />
to establish new Neu5Gc-deficient cell lines for already approved<br />
drugs. The addition of Neu5Ac to the media could also potentially<br />
increase total sialylation of a glycoprotein biotherapeutic agent. But<br />
if anything, such an increase would only be beneficial—for example,<br />
leading to a longer half-life of the agent in vivo.<br />
Methods<br />
Methods and any associated references are available in the online<br />
version of the paper at http://www.nature.com/naturebiotechnology/.<br />
Note: Supplementary information is available on the <strong>Nature</strong> Biotechnology website.<br />
Acknowledgments<br />
This work was supported by US National Institutes of Health grants R01-GM32373<br />
and R01-CA38701 to A.V. and The International Sephardic Education Foundation<br />
for V.P.-K. Haemophilus influenzae strain 2019 was a generous gift from M. Apicella,<br />
Department of Microbiology, University of Iowa.<br />
AUTHOR CONTRIBUTIONS<br />
All authors helped design the studies; D.G. and S.D. performed the research; R.E.T.<br />
and V.P.-K. generated crucial reagents; D.G. and A.V. wrote the paper; and all<br />
authors read the paper.<br />
COMPETING FINANCIAL INTERESTS<br />
The authors declare competing financial interests: details accompany the full-text<br />
HTML version of the paper at http://www.nature.com/naturebiotechnology/.<br />
Published online at http://www.nature.com/naturebiotechnology/.<br />
Reprints and permissions information is available online at http://npg.nature.com/<br />
reprintsandpermissions/.<br />
1. Hokke, C.H. et al. Sialylated carbohydrate chains of recombinant human<br />
glycoproteins expressed in Chinese hamster ovary cells contain traces of<br />
N-glycolylneuraminic acid. FEBS Lett. 275, 9–14 (1990).<br />
2. Noguchi, A., Mukuria, C.J., Suzuki, E. & Naiki, M. Failure of human immunoresponse<br />
to N-glycolylneuraminic acid epitope contained in recombinant human erythropoietin.<br />
Nephron 72, 599–603 (1996).<br />
3. Tangvoranuntakul, P. et al. Human uptake and incorporation of an immunogenic<br />
nonhuman dietary sialic acid. Proc. Natl. Acad. Sci. USA 100, 12045–12050<br />
(2003).<br />
4. Padler-Karavani, V. et al. Diversity in specificity, abundance, and composition of<br />
anti-Neu5Gc antibodies in normal humans: potential implications for disease.<br />
Glycobiology 18, 818–830 (2008).<br />
5. Aggarwal, S. What's fueling the biotech engine—2007. Nat. Biotechnol. 26,<br />
1227–1233 (2008).<br />
6. Arnold, J.N., Wormald, M.R., Sim, R.B., Rudd, P.M. & Dwek, R.A. The impact of<br />
glycosylation on the biological function and structure of human immunoglobulins.<br />
Annu. Rev. Immunol. 25, 21–50 (2007).<br />
7. Durocher, Y. & Butler, M. Expression systems for therapeutic glycoprotein production.<br />
Curr. Opin. Biotechnol. 20, 700–707 (2009).<br />
8. Higgins, E. Carbohydrate analysis throughout the development of a protein<br />
therapeutic. Glycoconj. J. 27, 211–225 (2009).<br />
9. Galili, U. Immune response, accommodation, and tolerance to transplantation<br />
carbohydrate antigens. Transplantation 78, 1093–1098 (2004).<br />
10. Varki, A. Glycan-based interactions involving vertebrate sialic-acid-recognizing<br />
proteins. <strong>Nature</strong> 446, 1023–1029 (2007).<br />
11. Bardor, M., Nguyen, D.H., Diaz, S. & Varki, A. Mechanism of uptake and incorporation<br />
of the non-human sialic acid N-glycolylneuraminic acid into human cells. J. Biol.<br />
Chem. 280, 4228–4237 (2005).<br />
12. Borys, M.C. et al. Effects of culture conditions on N-glycolylneuraminic acid<br />
(Neu5Gc) content of a recombinant fusion protein produced in CHO cells. Biotechnol.<br />
Bioeng. 105, 1048–1057 (2009).<br />
13. Chung, C.H. et al. Cetuximab-induced anaphylaxis and IgE specific for galactose-alpha-1,3-galactose.<br />
N. Engl. J. Med. 358, 1109–1117 (2008).<br />
14. Delbaldo, C. et al. Pharmacokinetic profile of cetuximab (Erbitux) alone and in<br />
combination with irinotecan in patients with advanced EGFR-positive adenocarcinoma.<br />
Eur. J. Cancer 41, 1739–1745 (2005).<br />
15. Saadeh, C.E. & Lee, H.S. Panitumumab: a fully human monoclonal antibody with<br />
activity in metastatic colorectal cancer. Ann. Pharmacother. 41, 606–613<br />
(2007).<br />
16. Diaz, S.L. et al. Sensitive and specific detection of the non-human sialic acid<br />
N-glycolylneuraminic acid in human tissues and biotherapeutic products. PLoS ONE<br />
4, e4241 (2009).<br />
17. Muchmore, E.A., Milewski, M., Varki, A. & Diaz, S. Biosynthesis of N-glycolyneuraminic<br />
acid. The primary site of hydroxylation of N-acetylneuraminic acid<br />
is the cytosolic sugar nucleotide pool. J. Biol. Chem. 264, 20216–20223<br />
(1989).<br />
18. Hedlund, M. et al. N-glycolylneuraminic acid deficiency in mice: implications for<br />
human biology and evolution. Mol. Cell. Biol. 27, 4340–4346 (2007).<br />
19. Hedlund, M., Padler-Karavani, V., Varki, N.M. & Varki, A. Evidence for a human-specific<br />
mechanism for diet and antibody-mediated inflammation in carcinoma<br />
progression. Proc. Natl. Acad. Sci. USA 105, 18936–18941 (2008).<br />
20. Tahara, H. et al. Immunological property of antibodies against N-glycolylneuraminic<br />
acid epitopes in cytidine monophospho-n-acetylneuraminic acid hydroxylase-deficient<br />
mice. J. Immunol. 184, 3269–3275 (2010).<br />
21. Taylor, R.E. et al. Novel mechanism for the generation of human xeno-autoantibodies<br />
against the non-human sialic acid N-glycolylneuraminic acid. J. Exp.<br />
Med. published online, doi: 10.1084/jem.20100575 (12 July 2010).<br />
22. Qian, J. et al. Structural characterization of N-linked oligosaccharides on monoclonal<br />
antibody cetuximab by the combination of orthogonal matrix-assisted laser<br />
desorption/ionization hybrid quadrupole-quadrupole time-of-flight tandem mass<br />
spectrometry and sequential enzymatic digestion. Anal. Biochem. 364, 8–18<br />
(2007).<br />
23. Axworthy, D.B. et al. Cure of human carcinoma xenografts by a single dose of<br />
pretargeted yttrium-90 with negligible toxicity. Proc. Natl. Acad. Sci. USA 97,<br />
1802–1807 (2000).<br />
24. Pham, T. et al. Evidence for a novel human-specific xeno-auto-antibody response<br />
against vascular endothelium. Blood 114, 5225–5235 (2009).<br />
25. Jahn, E.M. & Schneider, C.K. How to systematically evaluate immunogenicity of<br />
therapeutic proteins—regulatory considerations. New Biotechnol. 25, 280–286<br />
(2009).<br />
26. Galili, U. et al. Enhancement of antigen presentation of influenza virus hemagglutinin<br />
by the natural human anti-Gal antibody. Vaccine 14, 321–328 (1996).<br />
27. Benatuil, L. et al. The influence of natural antibody specificity on antigen<br />
immunogenicity. Eur. J. Immunol. 35, 2638–2647 (2005).<br />
28. Abdel-Motal, U.M., Wigglesworth, K. & Galili, U. Mechanism for increased<br />
immunogenicity of vaccines that form in vivo immune complexes with the natural<br />
anti-Gal antibody. Vaccine 27, 3072–3082 (2009).<br />
29. Koren, E. et al. Recommendations on risk-based strategies for detection and<br />
characterization of antibodies against biotechnology products. J. Immunol. Methods<br />
333, 1–9 (2008).<br />
30. Shankar, G., Pendley, C. & Stein, K.E. A risk-based bioanalytical strategy for the<br />
assessment of antibody immune responses against biological drugs. Nat. Biotechnol.<br />
25, 555–561 (2007).<br />
31. Wilson, J.M. Medicine. A history lesson for stem cells. Science 324, 727–728<br />
(2009).<br />
32. Martin, M.J., Muotri, A., Gage, F. & Varki, A. Human embryonic stem cells express<br />
an immunogenic nonhuman sialic acid. Nat. Med. 11, 228–232 (2005).<br />
33. Martin, M.J., Muotri, A., Gage, F. & Varki, A. Response to Cerdan et al.: Complement<br />
targeting of nonhuman sialic acid does not mediate cell death of human embryonic<br />
stem cells. Nat. Med. 12, 1115 (2006).<br />
34. Van Hoeyveld, E. & Bossuyt, X. Evaluation of seven commercial ELISA kits compared<br />
with the C1q solid-phase binding RIA for detection of circulating immune complexes.<br />
Clin. Chem. 46, 283–285 (2000).<br />
35. Campagnari, A.A., Gupta, M.R., Dudas, K.C., Murphy, T.F. & Apicella, M.A. Antigenic<br />
diversity of lipooligosaccharides of nontypable Haemophilus influenzae. Infect.<br />
Immun. 55, 882–887 (1987).<br />
36. Greiner, L.L. et al. Nontypeable Haemophilus influenzae strain 2019 produces a<br />
biofilm containing N-acetylneuraminic acid that may mimic sialylated O-linked<br />
glycans. Infect. Immun. 72, 4249–4260 (2004).<br />
37. Gagneux, P. et al. Proteomic comparison of human and great ape blood plasma<br />
reveals conserved glycosylation and differences in thyroid hormone metabolism.<br />
Am. J. Phys. Anthropol. 115, 99–109 (2001).<br />
38. Debeire, P., Montreuil, J., Moczar, E., van Halbeek, H. & Vliegenthart, J.F.G. Primary<br />
structure of two major glycans of bovine fibrinogen. Eur. J. Biochem. 151, 607–611<br />
(1985).<br />
ONLINE METHODS<br />
Mice. The Cmah-null mice used for this study have been described previously 18<br />
and were backcrossed to C57BL/6 mice for over ten generations. All experiments<br />
were approved by the University of California, San Diego Institutional<br />
Review Board committee responsible for approving animal experiments.<br />
Sialidase treatment of therapeutic antibodies. One milligram each of cetuximab<br />
or panitumumab (obtained from the University of California, San Diego<br />
pharmacy or the manufacturer) was treated with 50 mU of active or heat-inactivated<br />
Arthrobacter ureafaciens sialidase (EY Laboratories) in 100 mM<br />
sodium acetate pH 5.5, at 37 °C for 24 h. Samples were used for ELISA or<br />
western blots.<br />
Periodate treatment of therapeutic antibodies on ELISA plate. Untreated<br />
cetuximab and panitumumab (1 μg per well) were used for coating, then<br />
blocked with PBST for 2 h and incubated with freshly made 2 mM sodium<br />
metaperiodate in PBS for 20 min at 4 °C in the dark. The reaction was stopped<br />
by addition of 200 mM sodium borohydride to a final concentration of 20 mM.<br />
As a control, periodate and borohydride were premixed and then added to the<br />
wells (the borohydride inactivates the periodate). To remove resulting borates,<br />
wells were then washed three times with 100 mM sodium acetate with 100 mM<br />
NaCl pH 5.5 before further analysis.<br />
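The quenching step above is a simple dilution: a 200 mM borohydride stock is added until the well reaches 20 mM. The following is a minimal sketch of that arithmetic, not part of the published protocol. The function name and the 100 μl well volume are illustrative assumptions, since the text does not state the reaction volume.

```python
def stock_volume_to_add(initial_volume_ul, stock_mm, final_mm):
    """Volume of stock (in microliters) to add so the mixture reaches final_mm.

    Solves stock_mm * v / (initial_volume_ul + v) = final_mm for v,
    i.e. the added stock is itself diluted by the liquid already in the well.
    """
    if not 0 < final_mm < stock_mm:
        raise ValueError("final concentration must lie between 0 and the stock concentration")
    return final_mm * initial_volume_ul / (stock_mm - final_mm)


# Hypothetical 100 ul of periodate solution per well (volume not given in the text).
volume_ul = stock_volume_to_add(initial_volume_ul=100.0, stock_mm=200.0, final_mm=20.0)
print(f"add {volume_ul:.1f} ul of 200 mM sodium borohydride")
```

The same relation would apply to other stock additions described in these methods, provided the starting volume is known.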
ELISA detection of Neu5Gc on therapeutic antibodies. For the ELISA, wells<br />
were coated with 1 μg of cetuximab or panitumumab (either before sialidase<br />
treatment or after periodate treatment), blocked with TBST for 2 h and then<br />
incubated with affinity-purified chicken anti-Neu5Gc IgY or control IgY for 1 h<br />
(1:20,000 in TBST). Binding of IgY was detected using horseradish peroxidase<br />
(HRP)-conjugated donkey anti-chicken IgY (1:50,000 in TBST) and developed<br />
with O-phenylenediamine in citrate-phosphate buffer, pH 5.5, with absorbance<br />
measured at 495 nm. ELISA samples were studied at least in triplicate. Similar<br />
to the ELISA with the anti-Neu5Gc chicken IgY, human anti-Neu5Gc IgG that<br />
had been purified from the serum of healthy humans and biotinylated (exactly<br />
as described in ref. 4) was also used as the primary antibody (1:100 in TBST).<br />
Binding of the human antibodies to the therapeutic antibodies was detected<br />
using HRP-conjugated streptavidin (1:10,000) followed by development as<br />
described above. Samples were studied in triplicate.<br />
Western blot detection of Neu5Gc on therapeutic antibodies. For western<br />
blot detection, cetuximab or panitumumab (1 μg per lane) was separated<br />
by 12.5% SDS-PAGE and Coomassie-stained or blotted on nitrocellulose<br />
membranes. Blotted membranes were blocked with TBST containing 0.5%<br />
cold-water fish-skin gelatin overnight at 4 °C and subsequently incubated<br />
with affinity-purified chicken anti-Neu5Gc IgY for 4 h at room temperature<br />
(1:100,000 in TBST). Binding of the chicken anti-Neu5Gc IgY was detected<br />
using HRP-conjugated donkey anti-chicken IgY for 1 h (1:50,000 in TBST),<br />
followed by incubation with SuperSignal West Pico Substrate (Pierce) as per<br />
the manufacturer’s recommendation, exposure to X-ray film and development<br />
of the film. Similar to the western blot with the chicken anti-Neu5Gc IgY,<br />
purified biotinylated human anti-Neu5Gc IgG was also used as the primary<br />
antibody (1:100 in TBST). Binding of the human antibodies to the therapeutic<br />
antibodies was detected using HRP-conjugated streptavidin (1:10,000 in<br />
TBST) followed by development as described above.<br />
CIC-C1q binding assay. Immune complex formation was detected using<br />
the CIC (C1Q) ELISA Kit (Buehlmann) as described in the manufacturer’s<br />
guidelines 34 . Briefly, 100 μl of human serum with low or high anti-Neu5Gc<br />
(S30 and S34, respectively 4 ) was incubated with 40 μg of cetuximab or panitumumab<br />
for 14 h at 4 °C. We applied 1:50 dilutions of the mix to human<br />
C1q–coated ELISA wells and incubated for 1 h at 25 °C. Binding was detected<br />
using alkaline phosphatase–conjugated protein A. After another washing step,<br />
the enzyme substrate (para-nitrophenylphosphate) was added, followed by a<br />
stopping step. The absorbance was measured at 405 nm. Samples were studied<br />
in triplicate.<br />
Generation of murine Neu5Gc-specific antibodies. Haemophilus influenzae<br />
strain 2019 (ref. 35) was a generous gift from M. Apicella, Department of<br />
Microbiology, University of Iowa. Bacteria were grown to mid log phase in<br />
sialic acid–free media 36 with or without addition of 1 mM Neu5Gc 21 , heat-killed<br />
and injected intraperitoneally (200 μl of culture at an absorbance of<br />
600 nm of 0.4) into Cmah-null mice.<br />
Effects of anti-Neu5Gc antibodies on in vivo kinetics of therapeutic antibodies.<br />
Cetuximab or panitumumab in PBS (0.24 μg per gram mouse body<br />
weight) was injected intravenously, and 14 h later, mouse serum pooled from<br />
syngeneic Cmah-null mice containing anti-Neu5Gc antibodies (or pooled<br />
serum from syngeneic naïve or control-immunized mice) was passively transferred<br />
via intraperitoneal injection into syngeneic Cmah-null mice that were<br />
prescreened for the absence of pre-existing antibodies against human IgG.<br />
Mice were bled 0, 2, 8, 32, 56 and 80 h after the passive transfer of mouse<br />
serum. For quantification of therapeutic antibody concentrations in the sera,<br />
wells of ELISA plates were coated with 1 μg of anti-human IgG (Biorad), then<br />
blocked with TBST for 2 h and incubated with 1:500 dilutions of the sera in<br />
each well. Captured therapeutic antibodies were detected by HRP-conjugated<br />
anti-human Fc (Jackson; 1:10,000), with development by O-phenylenediamine<br />
in citrate-phosphate buffer, pH 5.5, and absorbance measured at 495 nm<br />
(n = 5 for injections of both control sera groups; n = 10 for injections of anti-<br />
Neu5Gc serum groups).<br />
Quantification of Neu5Gc-specific IgG antibodies in Neu5Gc-immunized<br />
mice. A Neu5Gcα2-6Galβ1-4Glc-conjugate 4 (1 μg per well) and serial dilutions<br />
of mouse IgG as standards (0.625–20 ng per well) were used for coating<br />
overnight, then blocked with PBST for 2 h and incubated with pooled serum<br />
from Neu5Gc-immunized mice (1:250 dilution) for 2 h at 25 °C. Binding<br />
of mouse IgG was detected using HRP-conjugated goat anti-mouse IgG-Fc<br />
(Jackson; 1:10,000 in PBST) and developed with O-phenylenediamine in<br />
citrate-phosphate buffer, pH 5.5, with absorbance measured at 490 nm. ELISA<br />
samples were studied in triplicate.<br />
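As an illustration of how such a standard curve might be used, the sketch below builds a dilution series and interpolates an unknown. The stated range of 0.625–20 ng per well is consistent with six two-fold dilutions, but the dilution factor is an assumption here, as are the function names and all absorbance values, which are invented for the example.

```python
def twofold_series(top_ng, n_points):
    """Two-fold serial dilution starting at top_ng (assumed dilution factor)."""
    return [top_ng / 2**i for i in range(n_points)]


def interpolate_ng(sample_abs, curve):
    """Piecewise-linear interpolation of IgG amount from (ng, absorbance) standards."""
    points = sorted(curve, key=lambda p: p[1])
    for (ng_lo, a_lo), (ng_hi, a_hi) in zip(points, points[1:]):
        if a_lo <= sample_abs <= a_hi:
            if a_hi == a_lo:  # flat segment: no slope to interpolate along
                return ng_lo
            frac = (sample_abs - a_lo) / (a_hi - a_lo)
            return ng_lo + frac * (ng_hi - ng_lo)
    raise ValueError("absorbance outside the range of the standards")


amounts = twofold_series(20.0, 6)                      # 20 ng down to 0.625 ng
hypothetical_abs = [1.60, 0.80, 0.40, 0.20, 0.10, 0.05]  # invented readings
curve = list(zip(amounts, hypothetical_abs))
print(interpolate_ng(0.60, curve))
```

Readings outside the standards raise an error rather than extrapolate, which is the conservative choice for a sketch like this one.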
Levels of anti-Neu5Gc IgG after injections of the antibodies. Cmah-null mice<br />
were injected intravenously with 4 μg antibody per gram of mouse body weight<br />
in PBS weekly for 3 weeks. Mice were bled initially, and again 1 week after the<br />
third intravenous injection. Wells of ELISA plates were coated with 1:1,000<br />
dilutions of human (Neu5Gc-deficient) or chimpanzee (Neu5Gc-positive)<br />
serum glycoproteins (note that the only major difference between human<br />
and chimp serum glycosylation is the absence or presence of Neu5Gc;<br />
ref. 37). Alternatively, wells were coated with human or bovine fibrinogen,<br />
which carry Neu5Ac or Neu5Gc on otherwise identical N-glycans 38 . Wells<br />
were then blocked with TBST for 2 h followed by incubation with 1:100<br />
dilutions of the mouse sera. Binding of the mouse antibodies was detected<br />
using HRP-conjugated goat anti-mouse IgG Fc fragment (1:10,000 in TBST).<br />
Neu5Gc-specific binding (change in absorbance at 495 nm) was determined<br />
by subtracting the background signal of the wells coated with human serum or<br />
human fibrinogen (no Neu5Gc) from the signal of chimpanzee serum–coated<br />
or bovine fibrinogen–coated wells (containing Neu5Gc). Data were obtained<br />
in triplicate (n = 5 for injection of mouse IgG; n = 4 for injection of panitumumab;<br />
n = 6 for injection of cetuximab).<br />
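The background subtraction described above can be sketched as follows. This is an illustrative calculation only: the function name is hypothetical, the readings are invented triplicates, and averaging the replicates before subtraction is an assumption about how the triplicate data were combined.

```python
from statistics import mean


def neu5gc_specific_signal(neu5gc_positive_wells, neu5gc_negative_wells):
    """Change in absorbance attributable to Neu5Gc.

    Subtracts the mean signal of wells coated with Neu5Gc-free material
    (human serum or human fibrinogen) from the mean signal of wells coated
    with Neu5Gc-containing material (chimpanzee serum or bovine fibrinogen).
    """
    return mean(neu5gc_positive_wells) - mean(neu5gc_negative_wells)


# Invented triplicate A495 readings for one mouse serum sample.
chimp_serum_wells = [0.82, 0.79, 0.85]
human_serum_wells = [0.21, 0.19, 0.20]
delta_a495 = neu5gc_specific_signal(chimp_serum_wells, human_serum_wells)
print(f"Neu5Gc-specific binding: {delta_a495:.2f}")
```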
An approach to reduce Neu5Gc contamination in biotherapeutic products.<br />
Human 293T kidney cells were grown in DME supplemented with 10% (vol/vol)<br />
fetal calf serum. Cells were lifted from the culture plate using 20 mM EDTA in<br />
PBS and allowed to grow to 50% confluence. At this point, buffered 100 mM<br />
Neu5Gc was added to the culture in duplicate for a final 5 mM concentration,<br />
and the cells were grown in this supplemented media for 3 d. At the end of<br />
this Neu5Gc pulse, the cells were once again lifted using 20 mM EDTA in<br />
PBS, pelleted, washed once with PBS to remove any excess Neu5Gc and then<br />
suspended in 30 ml of growth medium. We added 5 ml of this cell suspension<br />
to each of five P-100 dishes. We immediately harvested the last aliquot of cell<br />
suspension, at time 0, by pelleting the cells, washing once with PBS, suspending<br />
them in 1 ml of PBS and transferring them to a 1.5-ml microcentrifuge<br />
tube. The cells were repelleted and frozen until all time points were collected.<br />
Buffered 100 mM Neu5Ac was added to each of the other five plates for the<br />
‘Neu5Ac chase’ and an equivalent amount of media added to the ‘minus chase’<br />
samples. We harvested cells at days 1, 2, 3, 4 and 5 by scraping them into the<br />
culture media, collecting by pelleting, washing once with PBS, transferring<br />
them to a 1.5-ml microcentrifuge tube, pelleting and freezing the cell pellet.<br />
At the end of the 5 d of chase, all collected cell pellets were homogenized in<br />
300 μl of ice-cold 20 mM potassium phosphate pH 7 using a 3- to 20-s burst<br />
with a Fisher Sonicator. We precipitated glycoconjugate-bound sialic acids by<br />
adding 700 μl of 100% ice-cold ethanol (final 70% (vol/vol) ethanol) and<br />
incubating at −20 °C overnight. The samples were spun at 20,000g for 15 min<br />
and the supernatants transferred to clean tubes and dried on a speed vac.<br />
The precipitated glycoconjugates and dried ethanol supernatants were each<br />
suspended in 100 μl of 20 mM potassium phosphate pH 7 by sonication. Sialic<br />
acids were released from both fractions by acid hydrolysis with 2 M acetic<br />
acid (final) and incubation at 80 °C for 3 h. Samples were passed through<br />
a Microcon-10 filter and the filtrate derivatized with DMB (1,2-diamino-4,<br />
5-methylenedioxybenzene) reagent for analysis of sialic acids by HPLC.<br />
A similar approach was taken with CHO cells stably expressing a Siglec-Fc<br />
protein in the medium, except that the Neu5Gc pulse was omitted and the<br />
secreted glycoproteins were captured on protein A–Sepharose beads. The<br />
cells were also processed similarly, except that total cell membranes were<br />
pelleted by centrifugation. The sialic acid content of the secreted proteins and<br />
cell membranes was determined by acid hydrolysis, DMB derivatization and<br />
HPLC. The cell membranes were also studied by western blotting with the<br />
chicken anti-Neu5Gc IgY, as described above.<br />
doi:10.1038/nbt.1651<br />
l e t t e r s<br />
Global analysis of lysine ubiquitination by ubiquitin<br />
remnant immunoaffinity profiling<br />
Guoqiang Xu, Jeremy S Paige & Samie R Jaffrey<br />
Protein ubiquitination is a post-translational modification<br />
(PTM) that regulates various aspects of protein function by<br />
different mechanisms. Characterization of ubiquitination has<br />
lagged behind that of smaller PTMs, such as phosphorylation,<br />
largely because of the difficulty of isolating and identifying<br />
peptides derived from the ubiquitinated portion of proteins.<br />
To address this issue, we generated a monoclonal antibody<br />
that enriches for peptides containing lysine residues modified<br />
by diglycine, an adduct left at sites of ubiquitination after<br />
trypsin digestion. We use mass spectrometry to identify 374<br />
diglycine-modified lysines on 236 ubiquitinated proteins from<br />
HEK293 cells, including 80 proteins containing multiple<br />
sites of ubiquitination. Seventy-two percent of these proteins<br />
and 92% of the ubiquitination sites do not appear to have<br />
been reported previously. Ubiquitin remnant profiling of the<br />
multi-ubiquitinated proteins proliferating cell nuclear antigen<br />
(PCNA) and tubulin α-1A reveals differential regulation of<br />
ubiquitination at specific sites by microtubule inhibitors,<br />
demonstrating the effectiveness of our method to characterize<br />
the dynamics of lysine ubiquitination.<br />
Protein ubiquitination occurs on a wide variety of eukaryotic proteins<br />
and affects processes ranging from protein degradation and subcellular<br />
localization to gene expression and DNA repair 1 . Ubiquitination<br />
involves the transfer of ubiquitin to a target protein using E1 ubiquitin–<br />
activating enzymes, E2 ubiquitin–conjugating enzymes and E3 ubiquitin<br />
ligases 1 . This process typically leads to the formation of an amide<br />
linkage comprising the ε-amine of lysine of the target protein and the<br />
C terminus of ubiquitin, and can involve ubiquitination at distinct sites<br />
within the same protein, although the roles of ubiquitination at distinct<br />
sites are incompletely understood. The human genome is predicted to<br />
encode 16 E1, 53 E2 and 527 E3 proteins 2 , which underscores the likely<br />
importance of ubiquitination in molecular signaling.<br />
In most cases, proteins suspected to be ubiquitinated have been<br />
identified based on their susceptibility to proteasome-mediated degradation,<br />
as evidenced by their increased levels following application<br />
of proteasome inhibitors. These proteins are immunopurified and<br />
ubiquitin adducts are confirmed by anti-ubiquitin immunoblotting 3 .<br />
Mutagenesis experiments can identify ubiquitination sites 4 . Global<br />
identification of ubiquitinated proteins has been performed by<br />
purifying ubiquitinated proteins, using ubiquitin-binding proteins<br />
such as anti-ubiquitin antibodies 5 , or by purifying hexahistidine<br />
(His 6 )-tagged ubiquitin-protein conjugates 6 . The enriched set of proteins<br />
is then proteolyzed and subjected to tandem mass spectrometry<br />
(MS/MS). However, as only one or a few lysines are typically modified<br />
in any ubiquitinated protein, most peptides do not exhibit any<br />
ubiquitin-derived modifications 7 . This introduces uncertainty as to<br />
whether they are derived from the nonubiquitinated portion of a<br />
protein or from coprecipitated proteins.<br />
Alternatively, proteolytic digests can be screened for peptides that<br />
contain remnants of ubiquitin modification. Digestion of ubiquitin-conjugated<br />
proteins results in peptides that contain a ubiquitin remnant<br />
derived from the ubiquitin C terminus. The three C-terminal<br />
residues of ubiquitin are Arg-Gly-Gly, with the C-terminal glycine<br />
conjugated to a lysine residue in the target. After digestion with<br />
trypsin, ubiquitin is cleaved after arginine, leaving a Gly-Gly dipeptide<br />
remnant on the conjugated lysine. Therefore, tryptic digests<br />
will include peptides that contain a diglycine-modified lysine,<br />
indicating the prior conjugation of ubiquitin to that region of the<br />
target protein. The diglycine-modified lysine serves as a signature<br />
of ubiquitination and also identifies the specific site of modification.<br />
Sequencing of ubiquitin remnant–containing peptides in tryptic<br />
digests has been used to identify 110 ubiquitination sites from<br />
yeast expressing His 6 -ubiquitin 7 . Despite the availability of these<br />
approaches for several years, analysis of the Swiss-Prot database<br />
indicates that only 255 mammalian proteins have been reported to<br />
be ubiquitinated based on experimental evidence. In most cases, the<br />
ubiquitination sites have not been identified. Direct enrichment of<br />
ubiquitin remnant–containing peptides would facilitate the high-throughput<br />
identification of ubiquitination sites.<br />
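The origin of the diglycine remnant can be illustrated with a small in silico digest. The sketch below is not the authors' software. It applies a simplified trypsin rule (cleave after Lys or Arg unless the next residue is Pro) to the 76-residue human ubiquitin sequence as it is commonly listed, which should be checked against a sequence database before being relied on. The function name is hypothetical.

```python
# Human ubiquitin (76 residues) as commonly listed; verify against a database.
UBIQUITIN = (
    "MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQ"
    "LEDGRTLSDYNIQKESTLHLVLRLRGG"
)


def tryptic_peptides(sequence):
    """Split a protein after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal fragment
    return peptides


peptides = tryptic_peptides(UBIQUITIN)
# The C-terminal fragment is the part that stays attached to the substrate
# lysine through the isopeptide bond after digestion.
print("C-terminal tryptic fragment of ubiquitin:", peptides[-1])
```

Because cleavage occurs after the arginine in the Arg-Gly-Gly tail, the last fragment is the Gly-Gly dipeptide, which is the signature the antibody described in this study is designed to recognize on lysine.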
To identify ubiquitinated proteins and simultaneously report their<br />
sites of ubiquitination, we generated an antibody that recognizes<br />
peptides containing the ubiquitin remnant left after trypsin digestion<br />
of ubiquitinated proteins. To prepare a protein antigen containing<br />
diglycine-modified lysines, we first reacted purified lysine-rich histone<br />
III-S protein with t-butyloxycarbonyl-Gly-Gly-N-hydroxysuccinimide<br />
(Boc-Gly-Gly-NHS) to introduce amide-linked Boc-Gly-Gly adducts<br />
on all amines (Fig. 1a). Nearly complete modification of the amines was<br />
confirmed by the reduction in labeling of the Boc-Gly-Gly–modified<br />
protein by the lysine-modifying reagent biotin-NHS, as assessed by<br />
anti-biotin immunoblotting (Fig. 1b). The modified protein was treated<br />
with trifluoroacetic acid (TFA) to remove the Boc moiety. Quantitative<br />
Department of Pharmacology, Weill Medical College, Cornell University, New York, New York, USA. Correspondence should be addressed to S.R.J.<br />
(srj2003@med.cornell.edu).<br />
Received 25 March; accepted 11 June; published online 18 July 2010; doi:10.1038/nbt.1654<br />
Figure 1 Generation of monoclonal antibodies that selectively recognize<br />
diglycine-modified lysines. (a) The antigen used to raise antibodies was<br />
synthesized by modifying the ε-amines of all lysines in a histone with<br />
t-butyloxycarbonyl-Gly-Gly-N-hydroxysuccinimide (Boc-Gly-Gly-NHS)<br />
and then removing the Boc group by treatment with TFA. The lysines<br />
in the final protein contain Gly-Gly adducts on the ε-amine of all lysine<br />
residues. (b) To validate the synthesis of Gly-Gly–modified histone,<br />
we monitored the reaction of the histone with Boc-Gly-Gly-NHS by<br />
detecting amines, such as those in unmodified lysine, through the<br />
reaction of the protein with the amine-modifying agent biotin-NHS,<br />
and subsequent western blot analysis with an anti-biotin antibody.<br />
Amines in the histone were completely lost after treatment with<br />
Boc-Gly-Gly-NHS, indicating complete modification of all the<br />
lysines in the histone. Removal of the Boc protecting group with<br />
TFA resulted in the formation of an amine at the N terminus of the<br />
Gly-Gly adduct. This step was essentially complete, as the TFA-treated<br />
protein exhibited nearly complete recovery of amine reactivity.<br />
The position of the bands in the different samples is slightly<br />
shifted due to the different molecular weights and number of<br />
positive charges in the modified and unmodified samples. The<br />
bands above 50 kDa represent impurities in the histone sample.<br />
(c) We evaluated the specificity of the GX41 monoclonal antibody by western blot analysis of β-lactoglobulin, lysozyme or rat brain lysate, in which<br />
the lysines were either unmodified (A), or modified with Boc-Gly-Gly (B) or Gly-Gly- (C) adducts, respectively.<br />
conversion of the Boc-Gly-Gly adduct, which does not contain an<br />
amine, to Gly-Gly, which contains an amine, was confirmed by the<br />
reactivity of the TFA-treated protein with biotin-NHS (Fig. 1b).<br />
We injected the diglycine-modified histone into mice, and screened<br />
hybridoma lines for antibodies that specifically recognize proteins<br />
containing diglycine-modified lysines. Hybridoma line GX41 generated<br />
monoclonal antibodies that exhibited pronounced specificity for<br />
proteins containing the diglycine-modified lysines. The antibodies<br />
failed to interact with unmodified lysozyme or lactoglobulin (Fig. 1c),<br />
or with either of these proteins after they had been modified with Boc-Gly-<br />
Gly. However, the antibody recognized Gly-Gly–modified lysozyme<br />
and lactoglobulin obtained after removal of the Boc group.<br />
These results indicate that the antibody recognizes Gly-Gly–modified<br />
lysines, and suggest that the antibody only recognizes Gly-Gly adducts<br />
that contain an unmodified primary amine. Similarly, the antibody<br />
exhibits negligible reactivity with rat brain lysate (Fig. 1c), or brain<br />
lysate modified with Boc-Gly-Gly, but exhibits substantial reactivity<br />
with Gly-Gly–modified proteins from brain lysate. Notably, the brain<br />
lysate includes highly abundant proteins containing internal Gly-<br />
Gly peptide sequences, such as β-actin, glyceraldehyde-3-phosphate<br />
dehydrogenase and α-tubulin, as well as histone H2A, which contains<br />
an internal Gly-Gly-Lys sequence. This indicates that internal Gly-Gly<br />
sequences are not recognized by the antibody. Additionally, peptides<br />
that contain Gly-Gly as the first two amino acids are not recognized<br />
(Supplementary Fig. 1). Together, these data indicate that the antibody<br />
recognizes Gly-Gly sequences that are present as an adduct on<br />
the ε-amine of lysine.<br />
We next investigated whether the anti–diglycyl-lysine antibody was<br />
able to immunoprecipitate peptides containing Gly-Gly–modified lysine.<br />
A flow chart for sample preparation, immunoprecipitation, and MS/MS<br />
analysis is shown in Figure 2a. We prepared a peptide containing an<br />
N-terminal Gly-Gly sequence (GGDRVYIHPFHL), and a peptide containing<br />
a diglycyl adduct on lysine (Ac-SYSMEHFRWGK*PV-NH 2 ; K*<br />
and Ac represent Gly-Gly–modified lysine and an acetyl group, respectively).<br />
An equimolar mixture of the peptides was immunoprecipitated<br />
with the anti–diglycyl-lysine antibody, resulting in selective enrichment<br />
(≥50×) of the peptide containing the Gly-Gly–modified lysine (Fig. 2b).<br />
Additionally, this peptide was quantitatively immunoprecipitated<br />
with a nearly 100% yield (Supplementary Fig. 2). These experiments<br />
demonstrated that the GX41 antibody is capable of enriching peptides<br />
containing diglycine-modified lysines and does not immunoprecipitate<br />
peptides containing a Gly-Gly sequence at their N termini.<br />
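One way to express the enrichment reported for the two-peptide mixture is as a ratio of signal ratios before and after immunoprecipitation. The sketch below shows that calculation under stated assumptions: the function name and all intensities are invented, and the authors' exact procedure for deriving the factor from the MALDI-TOF signals is not described beyond a before-and-after comparison.

```python
def enrichment_factor(target_before, control_before, target_after, control_after):
    """Fold change in the target:control signal ratio caused by purification."""
    ratio_before = target_before / control_before
    ratio_after = target_after / control_after
    return ratio_after / ratio_before


# Invented relative intensities. When the control peptide is no longer
# detectable after purification, the noise floor can stand in for its signal,
# which makes the result a lower bound rather than an exact value.
factor = enrichment_factor(
    target_before=80.0,    # diglycyl-lysine peptide, starting material
    control_before=100.0,  # N-terminal Gly-Gly peptide, starting material
    target_after=100.0,    # diglycyl-lysine peptide, after purification
    control_after=2.0,     # noise floor used in place of an absent peak
)
print(f"enrichment of at least {factor:.1f}-fold")
```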
We next sought to assess the diversity of lysine ubiquitination in<br />
cultured cells. To distinguish diglycine remnants derived from ubiquitin<br />
from those originating from less common ubiquitin-like proteins<br />
(such as ISG15 and NEDD8, which also leave a diglycine remnant<br />
on lysines after trypsinization 8 ), we used HEK293 cells expressing<br />
His 6 -tagged ubiquitin. Ubiquitinated proteins were purified<br />
by immobilized metal-affinity chromatography, before proteolysis<br />
and anti–diglycyl-lysine immunopurification. Ubiquitin remnant–<br />
containing peptides were subjected to liquid chromatography (LC)-<br />
MS/MS followed by database searching and spectral validation. To<br />
minimize alterations in ubiquitination levels after cell lysis, 5 mM<br />
chloroacetamide was included in lysis buffer to inhibit deubiquitinase<br />
and ubiquitin ligase activity 9 . To measure post-lysis ubiquitination,<br />
we spiked a lysate with excess glutathione S-transferase. This protein<br />
showed no detectable level of ubiquitination (Supplementary Fig. 3),<br />
suggesting that negligible ubiquitination occurred after cell lysis.<br />
MS/MS spectra of ubiquitin remnant–containing peptides exhibited<br />
normal y- and b-ion series, typically with a pair of ions separated by<br />
a mass of 242.14 Da, consistent with the masses of a lysine residue<br />
(128.09 Da) and a Gly-Gly adduct (114.04 Da) on the ε-amine of lysine.<br />
Whereas most peptides contained a single diglycine-modified lysine<br />
(Fig. 2c), 17 peptides contained two diglycine-modified lysines. The<br />
majority (>92%) of ubiquitin remnant–containing peptides have a +3 or<br />
+4 charge (Supplementary Fig. 4), which reflects the additional charge<br />
from the N-terminal amine on the Gly-Gly adduct. Gly-Gly–modified<br />
lysines as the C-terminal residue of peptides were also detected (~2% of<br />
total) (Supplementary Fig. 5), and reflect use of the Gly-Gly–modified<br />
lysine as a substrate for trypsin, as described previously 10 .<br />
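The 242.14 Da spacing quoted above can be reproduced from elemental compositions. The sketch below is a worked check, not the authors' code. It uses standard monoisotopic element masses and the residue formulas of lysine and glycine (the free amino acid minus water). The dictionary and function names are illustrative.

```python
# Standard monoisotopic masses of the elements involved (Da).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}

# Residue compositions: free amino acid minus one water.
LYS_RESIDUE = {"C": 6, "H": 12, "N": 2, "O": 1}
GLY_RESIDUE = {"C": 2, "H": 3, "N": 1, "O": 1}


def composition_mass(formula):
    """Monoisotopic mass of an elemental composition given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


lysine = composition_mass(LYS_RESIDUE)
diglycine = 2 * composition_mass(GLY_RESIDUE)
modified_lysine = lysine + diglycine

print(f"lysine residue:   {lysine:.2f} Da")
print(f"Gly-Gly adduct:   {diglycine:.2f} Da")
print(f"modified residue: {modified_lysine:.2f} Da")
```

The sum of the unrounded masses rounds to 242.14 Da, whereas adding the two rounded values gives 242.13 Da, which accounts for the apparent discrepancy between the figures quoted in the text.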
In total, we identified 374 diglycine-modified lysines on 236 ubiquitinated<br />
mammalian proteins. Analysis of the Swiss-Prot database<br />
suggests that 72% of these proteins were not previously known to<br />
be ubiquitinated. Similarly, 92% of the ubiquitination sites that we<br />
identified were not previously known. Among the identified proteins,<br />
156 proteins have one ubiquitination site and 80 have two or more<br />
ubiquitination sites (Supplementary Table 1 and Supplementary<br />
Fig. 6). To validate the ubiquitination detected using the ubiquitin<br />
Figure 2 Profiling immunopurified ubiquitin<br />
remnant–containing peptides to identify<br />
ubiquitinated proteins. (a) Strategy to identify<br />
ubiquitinated proteins by immunoprecipitation<br />
of peptides containing diglycyl-lysine,<br />
followed by MS analysis. (b) Confirmation<br />
of antibody specificity using two peptides,<br />
GGDRVYIHPFHL and Ac-SYSMEHFRWGK*PV-<br />
NH 2 . Equimolar amounts (0.3 nmol) of the two<br />
peptides were mixed and immunoprecipitated<br />
with immobilized anti–diglycyl-lysine<br />
monoclonal antibody. Matrix-assisted laser<br />
desorption ionization/time-of-flight (MALDI-<br />
TOF)-MS analysis for the starting material<br />
and the antibody-purified material suggests<br />
an enrichment factor of at least 50, based on<br />
the comparison of the MS signals of the two<br />
peptides before and after immunoprecipitation.<br />
(c) Representative annotated MS/MS spectra<br />
of two ubiquitin remnant–containing peptides<br />
obtained by immunoprecipitation from a<br />
HEK293 cell lysate. The sequence of<br />
the ubiquitinated peptide, including the<br />
diglycine-modified lysine (K * ), is indicated<br />
and the fragment ions are labeled. The<br />
symbols \, / and | represent b-ions, y-ions,<br />
and both b-ions and y-ions, respectively.<br />
(d) Biochemical verification of the<br />
ubiquitination of six proteins. Proteins were<br />
immunoprecipitated using target-specific<br />
antibodies and the immunoprecipitate<br />
was detected by western blotting using an<br />
anti-ubiquitin antibody. IgG was used as a<br />
control for nonspecific immunoprecipitation.<br />
The proteasome inhibitor N-acetyl-Leu-<br />
Leu-norleucinal (LLnL) was added to allow<br />
accumulation of the ubiquitinated protein.<br />
remnant–profiling approach, we selected a subset of six proteins identified<br />
by MS and assessed whether they were ubiquitinated in cells.<br />
Lysates from HEK293 cells were immunoprecipitated with antibodies<br />
specific for the protein under investigation and immunoblotted<br />
using an anti-ubiquitin antibody (Fig. 2d). In these experiments, the<br />
HEK293 cells were not transfected with plasmids expressing His 6 -<br />
tagged ubiquitin. In each case, the immunopurified protein exhibits<br />
anti-ubiquitin immunoreactivity consistent with the endogenous<br />
ubiquitination of these proteins.<br />
The ubiquitination targets include disease-related proteins, such<br />
as 14-3-3ε, ataxin, β-catenin, BRCA1-associated protein and TTRAP<br />
(TRAF and TNF receptor-associated protein). The proteins identified<br />
by ubiquitin remnant profiling have roles in numerous biological<br />
processes, of which the largest number involve metabolism, cell cycle/<br />
apoptosis and signal transduction (Fig. 3a). Additionally, we identified<br />
proteins that influence the trafficking, localization and structure of<br />
proteins, as well as regulate the immune system, consistent with previously<br />
reported roles for ubiquitination 11–14 . Ubiquitination of many<br />
ubiquitin-conjugating enzymes, ubiquitin ligases and 26S proteasome<br />
regulatory subunits also supports previous studies that reported the<br />
prevalence of ubiquitination of proteins involved in proteasome<br />
degradation pathways 15,16 . Some of the proteins found to be ubiquitinated<br />
extend earlier findings regarding the role of ubiquitination in<br />
certain cellular processes. For example, although histone H2 ubiquitination<br />
has been described 3 , we found that histone H1, H3 and H4 isoforms<br />
are also ubiquitinated, as are subunits of histone acetyltransferases and<br />
histone deacetylases. These findings support the idea that ubiquitin<br />
contributes to epigenetic gene regulation through multiple pathways.<br />
Many heat shock proteins, such as HSP70, HSP105, and<br />
HSC71, are ubiquitinated, linking ubiquitination to stress responses.<br />
Ubiquitination of several heterogeneous nuclear ribonucleoproteins<br />
reveals a role for ubiquitination in mRNA processing, metabolism,<br />
transport and splicing. Our studies also identify numerous transcription<br />
factors, splicing factors, DNA repair proteins and kinases. This<br />
supports the well-characterized role for ubiquitination in regulating<br />
cellular signal transduction.<br />
The subcellular distribution of the detected proteins is likely to<br />
reflect, in part, the subcellular fractions that were used for MS/MS<br />
analysis. Subcellular localization analysis of the identified proteins<br />
indicates that the largest share of the ubiquitinated proteins is cytosolic<br />
(48.2%; Fig. 3a, right panel), which is consistent with the general observation<br />
that ubiquitination occurs primarily in the cytosolic compartment of the<br />
cell 12 . Many of the identified proteins are localized to the nucleus, and<br />
several proteins are localized to the mitochondria, suggesting a role for<br />
ubiquitination in regulating aspects of mitochondrial function.<br />
We next wanted to gain insight into how lysine ubiquitination<br />
might be regulated at the level of primary and secondary structure.<br />
Interestingly, ubiquitin remnant–modified lysines have a slight<br />
tendency to be localized in regions enriched in small hydrophobic<br />
residues, such as alanine, leucine, isoleucine, glycine, proline and<br />
valine (Supplementary Fig. 7a). Examination of a six-amino-acid<br />
window adjacent to ubiquitinated lysines in the human proteome<br />
revealed that cysteine, histidine and lysine are found at a ~40%<br />
lower frequency than when they are adjacent to lysines in general<br />
Figure 3 Bioinformatic analysis of ubiquitinated<br />
proteins and ubiquitin-modified lysines. (a) Pie<br />
charts of biological processes and subcellular<br />
localization of ubiquitinated proteins analyzed<br />
using the PANTHER and PENCE Proteome<br />
Analyst databases, respectively. Proteins were<br />
designated ‘other’ if their localizations or<br />
functions were not annotated in the database.<br />
(b) Backbone amino acid sequence analysis<br />
of ubiquitinated peptides. A density map of<br />
the ratios of the frequencies of each of the<br />
20 amino acids adjacent to the ubiquitinated<br />
lysines and adjacent to lysines in general was<br />
plotted using MATLAB. Several amino acids<br />
are slightly enriched at certain positions,<br />
such as leucine at +2, valine at −2, alanine<br />
at −5, glycine at +6, and tyrosine at −1 and<br />
+1, as determined by Rosner’s test at 95%<br />
confidence. (c) Ubiquitinated lysines (Ub<br />
Lys) possess an increased solvent-accessible<br />
area (SAA) relative to lysines in general. The<br />
distribution of SAA of both populations of lysines indicates an increase in SAA among ubiquitinated lysines. The two distributions are significantly<br />
different (Student’s t test, P < 0.001). The results were obtained from an analysis of 89 PDB structures (140 ubiquitinated lysines, 3,970 total lysines).<br />
(d) Distribution of secondary structures of all lysines and ubiquitinated lysines obtained from an analysis of 89 PDB structures. The disordered region<br />
was predicted by DisEMBL for all ubiquitinated proteins identified by our MS experiments. χ 2 test: P < 0.001.<br />
(Supplementary Fig. 7a). Analysis involving Motif-x 17 identified<br />
K*XL as a potential consensus ubiquitination site. This motif appears<br />
to be ~1.8 times more common among ubiquitinated lysines than<br />
lysines in general (Supplementary Fig. 7b). To compare all 20 amino<br />
acids for their propensity to be found at specific residues adjacent to<br />
ubiquitinated lysines, we prepared a density map that indicates the<br />
frequency of each amino acid at any of the ten proximal positions on<br />
either side of the ubiquitinated lysines, compared to the frequency of<br />
that amino acid next to lysines in general, as assessed by surveying<br />
the human proteome (Fig. 3b). This analysis shows that there is only<br />
a subtle enrichment for specific residues at some positions, such as<br />
leucine at the +2 position, valine at the −2 position, alanine at the −5<br />
position, glycine at the +6 position, and tyrosine at the −1 and +1<br />
positions. In contrast, an analysis of ubiquitinated proteins in yeast 7<br />
indicates a significant enrichment of aspartic acid, glutamic acid,<br />
histidine and proline at some positions (Supplementary Fig. 7c).<br />
To determine whether the sequence of the immunogen affected the<br />
specificity of the immunoprecipitated peptides, we generated a similar<br />
density map to present the frequency of each amino acid adjacent to<br />
the Gly-Gly–modified lysines in the immunogen. Although there are<br />
[Figure 3 panel data (graphics not reproduced). (a) Biological processes: metabolism 49.6%, cell cycle/apoptosis 13.0%, signal transduction 9.1%, other/unclassified 9.1%, protein trafficking/localization 8.3%, structure 4.4%, immunity/defence 4.2%, small-molecule transport 2.3%. Subcellular localization: cytoplasm 48.2%, nucleus 28.7%, other 15.8%, mitochondria 3.6%, endoplasmic reticulum 2.4%, Golgi 2.4%, plasma membrane 1.2%. (b) Density map of the 20 amino acids at positions –10 to +10; ratio scale 0–2.5. (c) Percentage of All Lys and Ub Lys in relative solvent accessible area bins of 0–20, 20–40, 40–60, 60–80 and 80–100%. (d) Fraction of All Lys and Ub Lys in helix, strand, coil and disordered regions.]<br />
marked amino acid preferences adjacent to lysine in the immunogen<br />
(Supplementary Fig. 7d), these preferences are not seen in peptides<br />
pulled down by the anti–diglycyl-lysine antibody (Supplementary<br />
Fig. 7d). This suggests that the sequence of the immunogen used to<br />
generate our immunoaffinity reagent does not substantially bias the<br />
sequences of the peptides that the antibody recovers.<br />
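The density-map calculation described above can be sketched in a few lines. The following Python sketch is illustrative only: the authors plotted their map in MATLAB, and the function names, the 0-based site indexing and the toy sequence are assumptions made for this example rather than details from the study. For each flanking position, it computes the ratio of an amino acid's frequency next to ubiquitinated lysines to its frequency next to lysines in general.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def flanking_windows(sequence, sites, flank=10):
    """Yield {offset: residue} for each 0-based lysine index in `sites`."""
    for site in sites:
        window = {}
        for offset in range(-flank, flank + 1):
            pos = site + offset
            # Skip the lysine itself and positions beyond the sequence ends.
            if offset != 0 and 0 <= pos < len(sequence):
                window[offset] = sequence[pos]
        yield window


def position_frequencies(windows, flank=10):
    """Return {offset: {amino acid: frequency}} pooled over all windows."""
    counts = {o: Counter() for o in range(-flank, flank + 1) if o != 0}
    for window in windows:
        for offset, residue in window.items():
            counts[offset][residue] += 1
    freqs = {}
    for offset, counter in counts.items():
        total = sum(counter.values())
        freqs[offset] = {
            aa: (counter[aa] / total if total else 0.0) for aa in AMINO_ACIDS
        }
    return freqs


def enrichment_map(ub_freqs, bg_freqs):
    """Ratio of frequencies near ubiquitinated lysines to those near all lysines."""
    ratios = {}
    for offset, per_aa in ub_freqs.items():
        ratios[offset] = {}
        for aa, freq in per_aa.items():
            background = bg_freqs[offset][aa]
            # Undefined when the residue never occurs at this offset in the background.
            ratios[offset][aa] = freq / background if background > 0 else float("nan")
    return ratios


# Toy example (hypothetical sequence; lysines at 0-based indices 1, 5 and 9).
sequence = "GKALGKAAGKAL"
all_lysines = [i for i, aa in enumerate(sequence) if aa == "K"]
ub_lysines = [1, 9]  # assumed ubiquitinated sites, for illustration only
ub = position_frequencies(flanking_windows(sequence, ub_lysines, flank=2), flank=2)
bg = position_frequencies(flanking_windows(sequence, all_lysines, flank=2), flank=2)
ratios = enrichment_map(ub, bg)
print(ratios[2]["L"])  # leucine at +2: (2/2) / (2/3), i.e. about 1.5
```

A ratio near 1 means no preference at that position. In a real analysis the background windows would come from all lysines in the proteome surveyed, and any enrichment would still need a statistical test such as the one named in the Figure 3 legend.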
We found that ubiquitinated lysines have a slight tendency to appear<br />
on protein surfaces in preferred structural contexts. Structural information<br />
is available in Protein Data Bank (PDB) for 89 of the proteins<br />
identified in this study. Measurements of the solvent-accessible area<br />
of lysines in these proteins indicate that ubiquitinated lysines tend<br />
to be exposed slightly more to solvent than other lysines (Fig. 3c,<br />
Student’s t test, P < 0.001). If lysines with >50% surface exposure are<br />
considered solvent exposed 18 , 60% of the ubiquitinated lysines are<br />
exposed, which is more than for lysines in general (45%). Overall,<br />
ubiquitinated lysines are ~6.5% more exposed than all the lysines.<br />
This is in agreement with a ubiquitination site survey for yeast 19 .<br />
Interestingly, in some cases, the ubiquitinated lysine is fully buried<br />
(e.g., Supplementary Fig. 8). In these proteins, ubiquitination may<br />
be regulated by stimuli that induce the exposure of the lysine to the<br />
[Figure 4 spectra (graphics not reproduced): relative intensity versus m/z for Lys0- and Lys8-labeled PCNA peptides. DLSHIGDAVVISCA 164 K*DGVK (m/z 524–534): Lys0:Lys8 = 1.08 ± 0.05. YYLAP 254 K*IEDEEGS (m/z 814–822): Lys0:Lys8 = 1.47 ± 0.09.]<br />
Figure 4 Colchicine differentially regulates the ubiquitination of two<br />
lysines in PCNA. HEK293 cells were grown in SILAC medium containing<br />
either light (Lys0) or heavy (Lys8) lysine, and transfected with a plasmid<br />
expressing His 6 -ubiquitin. Whereas Lys0-labeled cells were treated with<br />
10 μM colchicine, Lys8-labeled cells were treated with vehicle for 16 h.<br />
Identical amounts of cells from each treatment were mixed and processed<br />
for MS analysis of ubiquitin remnant–containing peptides. The relative<br />
ratio of MS signals between Lys0- and Lys8-labeled peptides was used<br />
for relative quantification of the change in ubiquitination at K164 and<br />
K254. The observed ratio was normalized to the change in PCNA protein<br />
abundance in the two samples by measuring two unmodified PCNA<br />
peptides in the initial mixed cell lysate (Supplementary Fig. 11). The<br />
observation that the ion intensity of the novel ubiquitination site (K254)<br />
is about 20% of that of K164 suggests that its ubiquitination may be less<br />
common or more transient than that of K164. This may explain why it was not<br />
detected previously in mutagenesis studies 33 . All data are the averages<br />
of experiments repeated three times. Note that the peptide ubiquitinated<br />
at K254 is the C-terminal tryptic peptide of the protein so that the last<br />
amino acid is neither K nor R, and the charge state of this peptide is +2.<br />
surface. Analysis of the local secondary structure surrounding all<br />
lysines and ubiquitinated lysines indicates that ubiquitinated lysines<br />
prefer helical structures compared to all lysines, although ubiquitination<br />
sites can also be found in other structural contexts (Fig. 3d).<br />
Additional crystal structures of proteins that are susceptible to ubiquitination<br />
are needed to fully assess the solvent exposure and structural<br />
contexts of ubiquitinated lysines.<br />
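The exposure comparison above reduces to a threshold test and a histogram over relative solvent-accessible area values. The Python sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, the input values are invented, and relative accessibility is assumed to be already computed per lysine on a 0–100% scale. Only the >50% cutoff and the 20%-wide bins come from the text and Figure 3c.

```python
def exposed_fraction(relative_saa_percent, threshold=50.0):
    """Fraction of lysines whose relative solvent-accessible area exceeds `threshold` (%)."""
    values = list(relative_saa_percent)
    if not values:
        raise ValueError("no lysines supplied")
    exposed = sum(1 for value in values if value > threshold)
    return exposed / len(values)


def saa_histogram(relative_saa_percent, bin_width=20):
    """Percentage of lysines in each bin (0-20, 20-40, ...); 100% falls in the last bin."""
    values = list(relative_saa_percent)
    if not values:
        raise ValueError("no lysines supplied")
    n_bins = 100 // bin_width
    counts = [0] * n_bins
    for value in values:
        index = min(int(value // bin_width), n_bins - 1)
        counts[index] += 1
    return [100.0 * count / len(values) for count in counts]


# Invented accessibility values for five lysines, for illustration only.
example = [10, 55, 80, 50, 60]
print(exposed_fraction(example))
print(saa_histogram(example))
```

Applying the same two functions to the ubiquitinated lysines and to all lysines in the same structures would give the two quantities compared in the text: the exposed fractions and the binned distributions.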
Recently, a large number of lysine acetylation sites have been discovered<br />
by proteomic approaches 20–23 . Although only 0.6% of lysines<br />
are predicted to be acetylated based on yeast studies 24 , >20% of the<br />
lysines that we found to be ubiquitinated are also sites of acetylation.<br />
For example, all the ubiquitinated lysines in H2B, H3.1 and H4 were<br />
reported to be acetylated. In the case of tubulin α-1A, four of the six<br />
ubiquitinated lysines were reported to be acetylated. The surprisingly<br />
high degree of concordance of lysine ubiquitination and acetylation<br />
sites suggests that acetylation of a specific lysine residue could serve<br />
as a means to prevent lysine ubiquitination 25 , or vice versa. A BLAST<br />
analysis of ubiquitination sites in human proteins against mouse, rat<br />
and yeast revealed that modified lysines are statistically more conserved<br />
between these species than lysines in general (Supplementary<br />
Fig. 9). This suggests that the pathways leading to the ubiquitination<br />
of these sites may be evolutionarily conserved.<br />
In cases where a protein is ubiquitinated at more than one site, it is<br />
particularly challenging to monitor how the ubiquitination at the individual<br />
sites is independently regulated. We therefore examined two<br />
proteins exhibiting multi-ubiquitination: tubulin α-1A and PCNA,<br />
a protein that regulates cell cycle progression 26 and has been linked<br />
to tumorigenesis 27 . We labeled His 6 -ubiquitin-expressing HEK293T<br />
cells with either light (Lys0) or heavy (Lys8) lysine to quantify ubiquitination<br />
using the SILAC (stable isotope labeling by amino acids in<br />
cell culture) approach 28 (Supplementary Fig. 10). We treated cells for<br />
16 h with either vehicle (Lys8) or 10 μM colchicine (Lys0), an inhibitor<br />
of microtubule polymerization that affects progression through the<br />
cell cycle 29 , before mixing, lysing and processing cells as described in<br />
the Online Methods. We then analyzed the samples by nanoLC-MS<br />
to quantify ubiquitination at the PCNA ubiquitination sites that we<br />
had previously identified using MS/MS based on their retention time,<br />
mass-to-charge ratio (m/z) and charge states. We quantified relative<br />
ubiquitination at each modification site by normalization using protein<br />
abundance, as measured by the averaged light-to-heavy ratio of<br />
unmodified peptides detected from initial mixed cell lysate before any<br />
affinity purification 30 (Supplementary Fig. 11). Interestingly, whereas<br />
the ubiquitination of K164 was unaffected by colchicine treatment,<br />
the ubiquitination of K254 was increased by 47% (Fig. 4).<br />
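The normalization step described above can be written out as a short calculation. The Python sketch below is an illustration, not the authors' processing code: the function names and the raw input ratios are hypothetical. Only the idea of dividing a site's light-to-heavy ratio by the averaged light-to-heavy ratio of unmodified peptides, and the reported normalized ratio of 1.47 for K254, come from the text.

```python
from statistics import mean


def normalized_site_ratio(site_light_to_heavy, unmodified_light_to_heavy):
    """Divide a site's Lys0:Lys8 ratio by the mean ratio of unmodified peptides."""
    protein_ratio = mean(unmodified_light_to_heavy)
    if protein_ratio <= 0:
        raise ValueError("protein abundance ratio must be positive")
    return site_light_to_heavy / protein_ratio


def percent_change(normalized_ratio):
    """Percent change of the light (treated) channel relative to the heavy (vehicle) channel."""
    return (normalized_ratio - 1.0) * 100.0


# Hypothetical raw values: one site ratio and two unmodified-peptide ratios.
raw_site_ratio = 1.62
unmodified_ratios = [1.08, 1.12]
print(normalized_site_ratio(raw_site_ratio, unmodified_ratios))

# The reported normalized ratio for K254 corresponds to the stated increase.
print(round(percent_change(1.47), 1))  # 47.0
```

Because Lys0 cells received colchicine and Lys8 cells received vehicle, a normalized ratio above 1 indicates more ubiquitination at that site after treatment, and a ratio near 1 (as for K164) indicates no change.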
We also examined the multi-ubiquitination of tubulin α-1A.<br />
Treatment with colchicine resulted in a similar ~80% decrease in<br />
the ubiquitination of K326, K336 and K370. Surprisingly, treatment<br />
with vinblastine, which also disrupts microtubules, albeit through a<br />
distinct mechanism 31,32 , resulted in an opposite effect on ubiquitination,<br />
with a ~40% increase in ubiquitination at each of these sites<br />
(Supplementary Figs. 12 and 13). These results highlight how some<br />
ubiquitination sites may be ubiquitinated in a dynamic manner, for<br />
example, in response to specific signals, whereas other ubiquitination<br />
sites may be ‘constitutive’. In the case of both PCNA and tubulin α-1A,<br />
ubiquitin remnant profiling provided insights into how distinct<br />
ubiquitination sites respond to different experimental treatments in<br />
a manner not readily achievable with existing approaches.<br />
The ubiquitin remnant–profiling approach described here provides<br />
a simple and robust strategy to identify and quantify sites of ubiquitination<br />
in cells. It could be used to identify ubiquitination patterns<br />
in cells and tissues with altered expression of ubiquitin ligases or<br />
deubiquitinating enzymes, as well as to profile changes in ubiquitination<br />
elicited by various signaling molecules, drugs and disease states.<br />
Although the present study used cells expressing His 6 -tagged ubiquitin<br />
to reduce the likelihood of obtaining diglycine-modified peptides from<br />
ISG15- and NEDD8-modified proteins, ubiquitin-modified proteins<br />
could readily be enriched using immobilized ubiquitin-binding<br />
proteins, such as S5a, or ubiquitin antibodies 5 in cells and tissues not<br />
amenable to transfection.<br />
Methods<br />
Methods and any associated references are available in the online<br />
version of the paper at http://www.nature.com/naturebiotechnology/.<br />
Accession code. MS/MS data and the identifications are deposited<br />
in the open access public repository PRIDE (http://www.ebi.ac.uk/<br />
pride/) with the accession code of 12018.<br />
Note: Supplementary information is available on the <strong>Nature</strong> Biotechnology website.<br />
Acknowledgments<br />
We thank T. Neubert and G. Zhang (New York University) for useful suggestions,<br />
P. Zhou (Weill Cornell Medical College, WCMC) for the His 6 -ubiquitin plasmid,<br />
U. Hengst, A. Deglincerti, R. Almeida and B. Derakhshan for assistance<br />
during initial cell culturing, S. Gross and Y. Ma (WCMC Mass Spectrometry Core<br />
Facility) for helpful discussion of MS/MS analysis, and F. Campagne, L. Skrabanek and<br />
J. Sun (WCMC Institute for Computational Biomedicine) for instructions and<br />
assistance in bioinformatic analysis. The mass spectrometry work was performed<br />
at the WCMC Mass Spectrometry Core Facility using instrumentation supported<br />
by US National Institutes of Health (NIH) RR19355 and RR22615. This work<br />
was supported by grants from Weill Cornell, NIH (MH086128) (S.R.J.), and<br />
a pharmacology cancer training grant from the National Cancer Institute<br />
(T32CA062948) (G.X. and J.S.P.).<br />
AUTHOR CONTRIBUTIONS<br />
S.R.J. and G.X. conceived and designed the study. G.X. and J.S.P. conducted<br />
the experiments, and G.X. and S.R.J. analyzed the data. S.R.J. and G.X. wrote<br />
the manuscript.<br />
COMPETING FINANCIAL INTERESTS<br />
The authors declare no competing financial interests.<br />
Published online at http://www.nature.com/naturebiotechnology/.<br />
Reprints and permissions information is available online at http://npg.nature.com/<br />
reprintsandpermissions/.<br />
1. Hershko, A. & Ciechanover, A. The ubiquitin system. Annu. Rev. Biochem. 67,<br />
425–479 (1998).<br />
2. Xu, P. & Peng, J. Dissecting the ubiquitin pathway by mass spectrometry. Biochim.<br />
Biophys. Acta 1764, 1940–1947 (2006).<br />
3. Ericsson, C., Goldknopf, I.L. & Daneholt, B. Inhibition of transcription does not<br />
affect the total amount of ubiquitinated histone 2A in chromatin. Exp. Cell Res.<br />
167, 127–134 (1986).<br />
4. Galluzzi, L., Paiardini, M., Lecomte, M.C. & Magnani, M. Identification of the main<br />
ubiquitination site in human erythroid alpha-spectrin. FEBS Lett. 489, 254–258<br />
(2001).<br />
5. Tomlinson, E., Palaniyappan, N., Tooth, D. & Layfield, R. Methods for the purification<br />
of ubiquitinated proteins. Proteomics 7, 1016–1022 (2007).<br />
6. Beers, E.P. & Callis, J. Utility of polyhistidine-tagged ubiquitin in the purification<br />
of ubiquitin-protein conjugates and as an affinity ligand for the purification of<br />
ubiquitin-specific hydrolases. J. Biol. Chem. 268, 21645–21649 (1993).<br />
7. Peng, J. et al. A proteomics approach to understanding protein ubiquitination.<br />
Nat. Biotechnol. 21, 921–926 (2003).<br />
8. Srikumar, T., Jeram, S.M., Lam, H. & Raught, B. A ubiquitin and ubiquitin-like<br />
protein spectral library. Proteomics 10, 337–342 (2010).<br />
9. Hershko, A., Heller, H., Elias, S. & Ciechanover, A. Components of ubiquitin-protein<br />
ligase system. Resolution, affinity purification, and role in protein breakdown.<br />
J. Biol. Chem. 258, 8206–8214 (1983).<br />
10. Denis, N.J., Vasilescu, J., Lambert, J.P., Smith, J.C. & Figeys, D. Tryptic digestion<br />
of ubiquitin standards reveals an improved strategy for identifying ubiquitinated<br />
proteins by mass spectrometry. Proteomics 7, 868–874 (2007).<br />
11. Rechsteiner, M. Ubiquitin-mediated pathways for intracellular proteolysis. Annu.<br />
Rev. Cell Biol. 3, 1–30 (1987).<br />
12. Bonifacino, J.S. & Weissman, A.M. Ubiquitin and the control of protein fate in the<br />
secretory and endocytic pathways. Annu. Rev. Cell Dev. Biol. 14, 19–57 (1998).<br />
13. Kirkpatrick, D.S., Denison, C. & Gygi, S.P. Weighing in on ubiquitin: the expanding<br />
role of mass-spectrometry-based proteomics. Nat. Cell Biol. 7, 750–757 (2005).<br />
14. Sun, L. & Chen, Z.J. The novel functions of ubiquitination in signaling. Curr. Opin.<br />
Cell Biol. 16, 119–126 (2004).<br />
15. Etlinger, J.D., Li, S.X., Guo, G.G. & Li, N. Phosphorylation and ubiquitination of<br />
the 26S proteasome complex. Enzyme Protein 47, 325–329 (1993).<br />
16. Peters, J.M. Subunits and substrates of the anaphase-promoting complex. Exp. Cell<br />
Res. 248, 339–349 (1999).<br />
17. Schwartz, D. & Gygi, S.P. An iterative statistical approach to the identification of<br />
protein phosphorylation motifs from large-scale data sets. Nat. Biotechnol. 23,<br />
1391–1398 (2005).<br />
18. Ahmad, S. & Gromiha, M.M. NETASA: neural network based prediction of solvent<br />
accessibility. Bioinformatics 18, 819–824 (2002).<br />
19. Catic, A., Collins, C., Church, G.M. & Ploegh, H.L. Preferred in vivo ubiquitination<br />
sites. Bioinformatics 20, 3302–3307 (2004).<br />
20. Choudhary, C. et al. Lysine acetylation targets protein complexes and co-regulates<br />
major cellular functions. Science 325, 834–840 (2009).<br />
21. Gnad, F. et al. PHOSIDA (phosphorylation site database): management, structural<br />
and evolutionary investigation, and prediction of phosphosites. Genome Biol. 8,<br />
R250 (2007).<br />
22. Kim, S.C. et al. Substrate and functional diversity of lysine acetylation revealed by<br />
a proteomics survey. Mol. Cell 23, 607–618 (2006).<br />
23. Zhao, S. et al. Regulation of cellular metabolism by protein lysine acetylation.<br />
Science 327, 1000–1004 (2010).<br />
24. Basu, A. et al. Proteome-wide prediction of acetylation substrates. Proc. Natl. Acad.<br />
Sci. USA 106, 13785–13790 (2009).<br />
25. Yang, X.J. & Seto, E. Lysine acetylation: codified crosstalk with other posttranslational<br />
modifications. Mol. Cell 31, 449–461 (2008).<br />
26. Prosperi, E. Multiple roles of the proliferating cell nuclear antigen: DNA replication,<br />
repair and cell cycle control. Prog. Cell Cycle Res. 3, 193–210 (1997).<br />
27. Mayer, A. et al. The prognostic significance of proliferating cell nuclear antigen,<br />
epidermal growth factor receptor, and mdr gene expression in colorectal cancer.<br />
Cancer 71, 2454–2460 (1993).<br />
28. Ong, S.E. et al. Stable isotope labeling by amino acids in cell culture, SILAC, as<br />
a simple and accurate approach to expression proteomics. Mol. Cell. Proteomics<br />
1, 376–386 (2002).<br />
29. Jordan, M.A. Mechanism of action of antitumor drugs that interact with microtubules<br />
and tubulin. Curr. Med. Chem. Anticancer Agents 2, 1–17 (2002).<br />
30. Wisniewski, J.R. et al. Constitutive and dynamic phosphorylation and acetylation<br />
sites on NUCKS, a hypermodified nuclear protein, studied by quantitative<br />
proteomics. Proteins 73, 710–718 (2008).<br />
31. Gigant, B. et al. Structural basis for the regulation of tubulin by vinblastine. <strong>Nature</strong><br />
435, 519–522 (2005).<br />
32. Ravelli, R.B. et al. Insight into tubulin regulation from a complex with colchicine<br />
and a stathmin-like domain. <strong>Nature</strong> 428, 198–202 (2004).<br />
33. Unk, I. et al. Human SHPRH is a ubiquitin ligase for Mms2-Ubc13-dependent<br />
polyubiquitylation of proliferating cell nuclear antigen. Proc. Natl. Acad. Sci. USA<br />
103, 18107–18112 (2006).<br />
ONLINE METHODS<br />
Antigen synthesis and antibody production. Lysine-rich histone from<br />
calf thymus (type III-S, Sigma) was dissolved in 100 mM NaHCO 3 buffer<br />
(10 ml) at pH 10. 500 μl t-butyloxycarbonyl-Gly-Gly-N-hydroxysuccinimide<br />
(50 mM, Boc-Gly-Gly-NHS, ref. 34) in DMSO was added to histone solution<br />
and the reaction was carried out at 25 °C for 1 h with constant shaking on a<br />
plate rotator. This step was repeated three additional times and sample B<br />
was obtained. For deprotection of the Boc group, neat trifluoroacetic acid<br />
(6 ml, TFA, Sigma) was added and the solution was shaken for 2 h at 25 °C.<br />
The reaction was stopped by neutralizing with 10 M NaOH dropwise on<br />
ice (sample C). Samples A, B and C were dialyzed four times against 20 mM<br />
acetic acid followed by lyophilization. The degree of the reaction was assessed<br />
by anti-biotin (Sigma) western blot analysis after samples A, B and C were<br />
reacted with 5 mM biotin-NHS (Sigma) for 10 min. The same protocol<br />
was used to prepare Boc-Gly-Gly– and Gly-Gly–modified β-lactoglobulin,<br />
hen egg white lysozyme, rat brain lysate and peptides (DRVYIHPFHL and<br />
Ac-SYSMEHFRWGKPV-NH 2 ) for antibody evaluation.<br />
The antigen was injected into mice for antibody production, and hybridoma<br />
clones were made by Promab. Cells of monoclonal clones were grown<br />
in MegaCell Dulbecco’s Modified Eagle’s Medium (MegaCell DMEM, pH 7.2,<br />
Sigma) supplemented with 10% fetal bovine serum (FBS), 50 μg/ml of kanamycin, 1 mM<br />
glutamine, and cells were split and cell culture supernatant was collected<br />
every week.<br />
Hybridoma clone GX41 was obtained after screening a panel of hybridomas<br />
to assess their utility in detecting diglycine-modified lysines. Antibodies<br />
from each hybridoma clone were first evaluated by western blot analysis using<br />
Gly-Gly–modified β-lactoglobulin, lysozyme and rat brain lysate. Clones were<br />
selected based on the absence of reactivity with unmodified protein and lysates,<br />
absence of reactivity with proteins and lysate modified with Boc-Gly-Gly, and<br />
reactivity with Gly-Gly–modified proteins and lysate. The top five clones<br />
were chosen for further characterization based on their ability to recognize the largest<br />
number of bands in the Gly-Gly–modified rat brain lysate. Antibodies from<br />
these clones were purified and used for immunoprecipitation of ubiquitin<br />
remnant–containing peptides from His 6 -ubiquitin–expressing HEK 293 cells,<br />
and tandem MS identification of tryptic ubiquitinated peptides to assess the<br />
degeneracy of antibodies. Only clone GX41 pulled down peptides that contained<br />
each of the 20 amino acids N-terminal to the modified lysine and each<br />
of the 20 amino acids C-terminal to the modified lysine, suggesting that the<br />
antibody can bind peptides that contain the diglycyl-lysine in a wide range<br />
of sequence contexts. This was supported by subsequent characterization of<br />
the amino acid context of the diglycyl-lysine obtained from a larger data set of<br />
ubiquitin remnant peptides (Fig. 3b and Supplementary Fig. 7a). The GX41<br />
anti–diglycyl-lysine monoclonal antibody was found to be IgG1κ isotype. This<br />
antibody was used for all the experiments in this study.<br />
Antibody purification and coupling. Gly-Gly–modified β-lactoglobulin was<br />
coupled to Affi-Gel 10 resin (Bio-Rad) at a concentration of 5 mg protein/ml<br />
resin in HEPES buffer (pH 8) overnight at 4 °C. The resin was quenched with 1 M<br />
Tris-HCl (pH 8) and washed with three volumes of 10 mM citric acid (pH 3)<br />
and PBS. Cell culture supernatant (50 ml) from monoclonal cell lines was<br />
loaded six times onto an 8-cm column containing 1 ml Affi-Gel 10 resin coupled with<br />
Gly-Gly–modified β-lactoglobulin at 4 °C using a peristaltic pump. The resin<br />
was washed three times with 6 ml of 2× PBS and three times with 6 ml of PBS.<br />
The antibody was eluted four times with 0.5 ml 10 mM citric acid (pH 3) and<br />
immediately neutralized by 50 μl of 1 M HEPES (pH 8). The pH was adjusted<br />
to 8.5 and the antibody was concentrated by a 15 ml filter device (30 kDa<br />
molecular weight cutoff, Millipore). The antibody concentration was measured<br />
by Bradford protein assay (Bio-Rad). Typically, 0.1–0.2 mg of antibody was<br />
coupled to 20 μl Affi-Gel 10 resin according to the method described above.<br />
The antibody resin was stored in PBS buffer with 0.1% sodium azide at 4 °C.<br />
Cell culture and sample preparation. Human embryonic kidney (HEK) 293 cells<br />
were cultured in Dulbecco’s Modified Eagle’s Medium (DMEM, Invitrogen)<br />
supplemented with 4.5 g/l glucose, 10% FBS, 100 units/ml penicillin G and<br />
100 μg/ml streptomycin. When the confluence reached ~50%, cells were<br />
transfected with 10 μg of a His 6 -tagged ubiquitin plasmid per 10-cm Petri<br />
dish using the calcium phosphate transfection method. One day<br />
after transfection, cells were treated with vehicle or the proteasome inhibitor<br />
LLnL (25 μM, Calbiochem) in DMSO and incubated for 16 h before harvest. The<br />
His 6 -ubiquitin is expressed at a fraction of the level of endogenous ubiquitin<br />
(Supplementary Fig. 14) suggesting that it is unlikely to perturb endogenous<br />
ubiquitin pathways. The expression of tagged ubiquitin has been widely used<br />
in proteomics studies of protein ubiquitination 7,35,36 .<br />
Twenty 10-cm Petri dishes were cultured and cells were washed twice with<br />
ice-cold PBS. The cells were detached, collected and centrifuged at 1,000g for<br />
5 min at 4 °C. To increase coverage of ubiquitinated proteins, crude lysates, as<br />
well as subcellular fractions, including nuclear, membrane and cytosolic fractions,<br />
were prepared for analysis. For the crude lysate, the cell pellet was lysed<br />
and His 6 -tagged proteins were purified by Ni-NTA resin (Qiagen) in native<br />
and denaturing conditions according to the manufacturer’s protocol. The lysis<br />
buffer contained 5 mM chloroacetamide to alkylate cysteines and to inhibit<br />
ubiquitin ligases and deubiquitinases 9 . The membrane fraction was obtained<br />
by centrifuging at 100,000g for 60 min after removing the nuclear pellet in the<br />
presence of 250 mM sucrose. The pellet from nuclear and membrane fraction<br />
was dissolved in 8 M urea with 1% triton X-100 and 0.1% SDS and the proteins<br />
were purified by Ni-NTA resin in the presence of 10 mM β-mercaptoethanol.<br />
After immobilized metal affinity purification, ubiquitinated proteins are significantly<br />
enriched (Supplementary Fig. 15).<br />
All the samples after Ni-NTA purification were concentrated on an Amicon<br />
YM10 filter device (Millipore) and separated by SDS-PAGE. Gel pieces were<br />
treated with 10 mM dithiothreitol at 50 °C for 30 min, followed by 55 mM<br />
chloroacetamide at 25 °C for 45 min, using methods described previously 37 ,<br />
except that chloroacetamide was used in place of iodoacetamide. In-gel digestion<br />
and peptide extraction were performed as described 37 .<br />
The lyophilized peptide mixture was dissolved in 300 μl of buffer containing<br />
150 mM NaCl, 50 mM Tris-HCl (pH 7.4) and 2 mM EDTA. The sample<br />
was boiled in a water bath for 10 min to inactivate residual trypsin.<br />
The peptide mixture was incubated with 20 μl antibody resin for 4 h at 4 °C,<br />
loaded on a micro-spin column (Pierce) six times, washed three times with<br />
2× PBS and three times with PBS, and eluted six times with 20 μl 10 mM<br />
citric acid (pH 3). The eluted peptide mixture was concentrated to 20 μl for<br />
tandem MS analysis.<br />
For the MALDI-TOF-MS experiment, a sample containing ~0.3 nmol of<br />
each peptide, GGDRVYIHPFHL and Ac-SYSMEHFRWGK*PV-NH 2 , was prepared<br />
and subjected to immunoprecipitation using the agarose-immobilized<br />
antibody described above.<br />
For SILAC quantification, five 10-cm dishes of HEK293T cells were grown<br />
in the media containing either light lysine (Lys0: 12 C 6 14 N 2 -Lys) or heavy lysine<br />
(Lys8: 13 C 6 15 N 2 -Lys) (Cambridge Isotope Labs) using previously described<br />
procedures for SILAC experiments 38 . The cells were transfected with His 6 -<br />
ubiquitin plasmid as described above, and treated with vehicle or drugs<br />
(10 μM colchicine or 1 μM vinblastine, Sigma) in the presence of LLnL (PCNA:<br />
25 μM for 16 h; tubulin α-1A: 50 μM for 30 min). The cells were mixed and<br />
purified under denaturing conditions as described above without fractionation.<br />
To normalize the ubiquitinated peptides by unmodified peptides in the<br />
cell lysate, a small amount of initial mixed cell lysate was digested by trypsin<br />
followed by tandem MS analysis 30 .<br />
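The spacing between the Lys0 and Lys8 peaks of a peptide follows directly from the isotope composition given above. The Python sketch below works this out; the function name is hypothetical and the isotope mass differences are standard values rather than numbers taken from the paper. The worked case uses the K254 peptide, which contains a single lysine and is observed at charge +2 according to the Figure 4 legend.

```python
# Standard monoisotopic mass differences in Da (not taken from the paper).
MASS_SHIFT_13C = 1.0033548  # 13C minus 12C
MASS_SHIFT_15N = 0.9970349  # 15N minus 14N

# Lys8 (13C6 15N2) versus Lys0 (12C6 14N2): six carbons and two nitrogens are heavy.
LYS8_SHIFT = 6 * MASS_SHIFT_13C + 2 * MASS_SHIFT_15N  # about 8.0142 Da


def silac_mz_spacing(n_lysines, charge):
    """Expected m/z distance between the light and heavy forms of a peptide."""
    if charge < 1:
        raise ValueError("charge must be at least 1")
    if n_lysines < 0:
        raise ValueError("lysine count cannot be negative")
    return n_lysines * LYS8_SHIFT / charge


# One lysine at charge +2, as for the K254 peptide YYLAPK*IEDEEGS.
print(round(silac_mz_spacing(1, 2), 3))
```

A peptide with more than one lysine shifts by a multiple of this mass, so the number of lysines has to be counted for each peptide before the light and heavy peaks are paired.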
Mass spectrometric analysis. For MALDI-TOF-MS, samples were desalted<br />
by Millipore C18 ZipTip according to manufacturer’s protocol and eluted<br />
in 2 μl of solvent containing 50% acetonitrile and 0.1% TFA in the presence of<br />
10 mg/ml α-cyano-4-hydroxycinnamic acid (Sigma). The masses of the samples<br />
were analyzed in the reflector mode by Voyager-DE PRO MALDI-TOF-MS<br />
(Applied Biosystems).<br />
The samples purified from cell lysate were analyzed by nanoLC Q-TOF<br />
MS/MS (Agilent) to obtain peptide sequence information using settings as<br />
described previously³⁹. Briefly, 8 μl of the peptide mixture was loaded onto<br />
an enrichment column with 97% solvent A and 3% solvent B at a flow rate<br />
of 3 μl/min. Solvent A consisted of 0.1% formic acid (Fluka) and solvent B of<br />
90% acetonitrile (Fisher) and 0.1% formic acid. Peptides were eluted with a<br />
gradient from 3% to 40% solvent B in 20 min, followed by a steep gradient<br />
to 90% solvent B in 5 min at a flow rate of 0.3 μl/min. Mass spectra were<br />
acquired in the positive-ion mode with automated data-dependent MS/MS<br />
on the five most intense ions from precursor MS scans and every selected<br />
precursor peak was analyzed twice within 3 min. In some runs, a list of previously<br />
identified peptides was excluded from MS/MS fragmentation.<br />
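The elution gradient described above can be written as a small piecewise function. This Python sketch<br />
assumes that both ramps are linear and that time is counted from the start of elution; the text gives<br />
only the end points and durations, and the function name is hypothetical.<br />

```python
def percent_b(t_min):
    """Return the percentage of solvent B at t_min minutes into elution.

    Assumes linear ramps: 3% to 40% B over the first 20 min, then
    40% to 90% B over the next 5 min, holding at 90% afterwards.
    """
    if t_min < 0:
        raise ValueError("time must be non-negative")
    if t_min <= 20:
        return 3.0 + (40.0 - 3.0) * t_min / 20.0
    if t_min <= 25:
        return 40.0 + (90.0 - 40.0) * (t_min - 20.0) / 5.0
    return 90.0


if __name__ == "__main__":
    for t in (0, 10, 20, 22.5, 25):
        print(t, percent_b(t))
```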
Database search of MS/MS spectra for peptide and protein identification.<br />
Analysis of MS/MS spectra for peptide and protein identification was<br />
performed by protein database searching with Spectrum Mill software<br />
(Rev A.03.02, Agilent) against the Swiss-Prot database (v57.2, May 5, 2009)<br />
containing a concatenated reverse database with the same entries and the<br />
same length for each protein, as described⁴⁰. The use of a decoy database<br />
to evaluate the false-positive rate for modified peptides may underestimate<br />
the false identifications as protein modifications can greatly expand the<br />
search space. Raw spectra were first extracted to MS/MS spectra that could<br />
be assigned to at least four y- or b-series ions. Scans with the same precursor<br />
within a mass window of ±0.4 m/z were merged within a time frame of ±15 s,<br />
charges up to a maximum of 7 were assigned to the precursor ion and the<br />
¹²C peak was determined by the Data Extractor. Key search parameters were<br />
a minimum matched peak intensity of 50%, a precursor mass tolerance of<br />
±20 p.p.m., and a product mass tolerance of ±40 p.p.m. The fixed modification<br />
was carbamidomethylation of cysteines (the same modification produced by<br />
chloroacetamide), and the variable modifications were Gly-Gly modification of<br />
lysines and oxidation of methionines. It should be noted that there are potentially<br />
a large number of naturally occurring sequence variants in mammals, but<br />
very limited data in the databases on these sequences. These variants may<br />
be missed or misidentified if the sequence variation lies in the same peptide<br />
that contains the diglycine-modified lysine. The maximal number of<br />
diglycine modifications was set to two. Trypsin was selected as the enzyme for<br />
sample digestion and four missed cleavages were allowed during the database<br />
search. The thresholds used for peptide identification were a Spectrum Mill<br />
score of ≥ 9, an SPI% (the percentage of the scored peak intensity) of ≥ 50%<br />
and a difference between forward and reverse scores of ≥ 2. Under these<br />
criteria, the false-positive rate is […] 1, there is a commensurately<br />
higher likelihood for Pro at the −1 position to be adjacent to a ubiquitinated<br />
lysine. The highest relative ratio detected was 2.3 and the range of the color<br />
map was set from 0 to 2.5. The density map was prepared with MATLAB. The<br />
enriched amino acids were obtained by determining the outliers with 95%<br />
confidence using Rosner’s test⁴⁶.<br />
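The precursor mass tolerance and the peptide identification thresholds given earlier in this section<br />
can be expressed as a short Python sketch. The Match record and all function names are hypothetical.<br />
The decoy-based estimate (twice the accepted reverse hits divided by all accepted hits) is one common<br />
way to use a concatenated forward/reverse database and is assumed here rather than taken from the text.<br />

```python
from dataclasses import dataclass


@dataclass
class Match:
    score: float          # Spectrum Mill score
    spi_percent: float    # scored peak intensity, in percent
    fwd_minus_rev: float  # forward score minus reverse score
    is_decoy: bool        # True if the hit is to a reversed sequence


def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz, theoretical_mz, tol_ppm=20.0):
    """True if the precursor lies within +/- tol_ppm of the theoretical m/z."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


def passes_thresholds(m):
    """Apply the three cut-offs stated in the text."""
    return m.score >= 9 and m.spi_percent >= 50 and m.fwd_minus_rev >= 2


def decoy_rate(matches):
    """Assumed estimate: 2 x accepted decoy hits / all accepted hits."""
    accepted = [m for m in matches if passes_thresholds(m)]
    if not accepted:
        return 0.0
    decoys = sum(1 for m in accepted if m.is_decoy)
    return 2 * decoys / len(accepted)
```

Because variable modifications enlarge the search space, an estimate of this kind may understate the true<br />
error for modified peptides, as the text notes.<br />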
To assess the structural features of ubiquitinated lysine residues in human<br />
proteins, we searched for crystal structures of all the ubiquitinated proteins in<br />
the Protein Data Bank (PDB). In total, 89 PDB structures (Supplementary<br />
Table 2) contained lysines that we found to be susceptible to ubiquitination<br />
(140 modified lysines and 3,970 total lysines). When multiple PDB<br />
structures were reported for a ubiquitinated protein, the structure with the best<br />
quality was used. The secondary structure types for lysines were determined<br />
using the program DSSP⁴⁷. H and G were considered to be helix, E and B to<br />
be strand, and S, T and others to be coil. The fraction of each secondary structure<br />
type among modified lysines was compared to that among all the lysine residues in<br />
the 89 PDB structures. Disordered regions were predicted by DisEMBL⁴⁸<br />
for all identified ubiquitinated proteins and the information for modified<br />
lysines and all lysines was extracted. The relative solvent-accessible area<br />
for the modified and all lysines in the 89 crystal structures was calculated<br />
using NACCESS⁴⁹ with a probe radius of 1.4 Å, which corresponds to the size of a<br />
water molecule.<br />
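The three-state grouping of DSSP codes described above can be sketched in Python as follows. The<br />
function names and the example input are hypothetical; only the grouping rule (H and G as helix, E and<br />
B as strand, everything else as coil) comes from the text.<br />

```python
from collections import Counter

HELIX_CODES = {"H", "G"}
STRAND_CODES = {"E", "B"}


def classify(dssp_code):
    """Map one DSSP code to helix, strand or coil using the rule in the text."""
    if dssp_code in HELIX_CODES:
        return "helix"
    if dssp_code in STRAND_CODES:
        return "strand"
    return "coil"  # S, T and any other code, including blank


def class_fractions(dssp_codes):
    """Fraction of residues in each class for an iterable of DSSP codes."""
    counts = Counter(classify(code) for code in dssp_codes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no residues supplied")
    return {name: counts[name] / total for name in ("helix", "strand", "coil")}


if __name__ == "__main__":
    # One DSSP code per lysine residue (made-up example).
    print(class_fractions("HHGEBT I"))
```

Running the same function once on the modified lysines and once on all lysines gives the two sets of<br />
fractions that are compared.<br />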
doi:10.1038/nbt.1654<br />
34. Derrien, D. et al. Muramyl dipeptide bound to poly-l-lysine substituted with mannose<br />
and gluconoyl residues as macrophage activators. Glycoconj. J. 6, 241–255 (1989).<br />
35. Kirkpatrick, D.S., Weldon, S.F., Tsaprailis, G., Liebler, D.C. & Gandolfi, A.J.<br />
Proteomic identification of ubiquitinated proteins from human cells expressing<br />
His-tagged ubiquitin. Proteomics 5, 2104–2111 (2005).<br />
36. Xu, P. et al. Quantitative proteomics reveals the function of unconventional ubiquitin<br />
chains in proteasomal degradation. Cell 137, 133–145 (2009).<br />
37. Shevchenko, A., Wilm, M., Vorm, O. & Mann, M. Mass spectrometric sequencing<br />
of proteins from silver-stained polyacrylamide gels. Anal. Chem. 68, 850–858 (1996).<br />
38. de Godoy, L.M. et al. Status of complete proteome analysis by mass spectrometry:<br />
SILAC labeled yeast as a model system. Genome Biol. 7, R50 (2006).<br />
39. Xu, G., Shin, S.B. & Jaffrey, S.R. Global profiling of protease cleavage sites by<br />
chemoselective labeling of protein N-termini. Proc. Natl. Acad. Sci. USA 106,<br />
19310–19315 (2009).<br />
40. Elias, J.E. & Gygi, S.P. Target-decoy search strategy for increased confidence in<br />
large-scale protein identifications by mass spectrometry. Nat. Methods 4, 207–214<br />
(2007).<br />
41. Silva, J.C. et al. Quantitative proteomic analysis by accurate mass retention time<br />
pairs. Anal. Chem. 77, 2187–2200 (2005).<br />
42. Mortensen, P. et al. MSQuant, an open source platform for mass spectrometry-based<br />
quantitative proteomics. J. Proteome Res. 9, 393–403 (2010).<br />
43. Thomas, P.D. et al. PANTHER: a library of protein families and subfamilies indexed<br />
by function. Genome Res. 13, 2129–2141 (2003).<br />
44. Dennis, G. Jr. et al. DAVID: database for annotation, visualization, and integrated<br />
discovery. Genome Biol. 4, 3 (2003).<br />
45. Lu, Z. et al. Predicting subcellular localization of proteins using machine-learned<br />
classifiers. Bioinformatics 20, 547–556 (2004).<br />
46. Rosner, J. Test of auditory analysis skills (TAAS) in helping children overcome<br />
learning difficulties: a step-by-step guide for parents and teachers (Academic<br />
Therapy, New York, 1979).<br />
47. Kabsch, W. & Sander, C. Dictionary of protein secondary structure: pattern<br />
recognition of hydrogen-bonded and geometrical features. Biopolymers 22,<br />
2577–2637 (1983).<br />
48. Linding, R. et al. Protein disorder prediction: implications for structural proteomics.<br />
Structure 11, 1453–1459 (2003).<br />
49. Hubbard, S.J., Campbell, S.F. & Thornton, J.M. Molecular recognition. Conformational<br />
analysis of limited proteolytic sites and serine proteinase protein inhibitors. J. Mol.<br />
Biol. 220, 507–530 (1991).<br />
careers and recruitment<br />
Second quarter biotech job picture<br />
Michael Francisco<br />
In the second quarter of 2010, biotech and pharmaceutical postings on<br />
the three representative job databases tracked by <strong>Nature</strong> Biotechnology<br />
(Tables 1 and 2) remained largely unchanged from the previous quarter<br />
(Nat. Biotechnol. 28, 527, 2010). Noteworthy increases in job openings<br />
were seen from instrument systems and consumables manufacturers Life<br />
Technologies (Carlsbad, CA, USA), PerkinElmer (Waltham, MA, USA),<br />
Bio-Rad Laboratories (Hercules, CA, USA) and Illumina (San Diego).<br />
Table 3 shows selected downsizings within the life science industry.<br />
<strong>Nature</strong> Biotechnology will continue to follow hiring and firing trends<br />
throughout 2010.<br />
Michael Francisco is Senior Editor, <strong>Nature</strong> Biotechnology.<br />
Table 1 Who’s hiring? Advertised openings at the 25 largest biotech<br />
companies<br />
Companyᵃ; Number of employees; Number of advertised openingsᵇ: Monster, Biospace, <strong>Nature</strong>jobs<br />
Monsanto 21,700 0 0 31<br />
Amgen 16,800 29 29 1<br />
Genentech 11,186 6 26 100<br />
Genzyme 11,000 63 0 0<br />
Life Technologies 9,700 71 89 0<br />
PerkinElmer 7,900 52 0 0<br />
Bio-Rad Laboratories 6,600 12 17 0<br />
Biomerieux 6,140 9 0 0<br />
Millipore 5,900 15 11 0<br />
IDEXX Laboratories 4,700 14 0 0<br />
Biogen Idec 4,700 38 104 1<br />
Gilead Sciences 3,441 0 26 0<br />
WuXi PharmaTech 3,172 0 0 0<br />
Qiagen 3,041 0 0 1<br />
Cephalon 2,780 0 0 0<br />
Biocon 2,772 0 0 0<br />
Celgene 2,441 13 10 0<br />
Biotest 2,108 0 11 0<br />
Actelion 2,054 4 1 0<br />
Amylin Pharmaceuticals 1,800 9 3 0<br />
Elan 1,687 7 2 0<br />
Illumina 1,536 2 27 9<br />
Albany Molecular Research 1,357 0 0 0<br />
Vertex Pharmaceuticals 1,322 40 58 2<br />
CK Life Sciences 1,315 0 0 0<br />
ᵃ As defined in <strong>Nature</strong> Biotechnology’s survey of public companies (27, 710–721, 2009). ᵇ As<br />
searched on Monster.com, Biospace.com and <strong>Nature</strong>jobs.com, 21 July 2010. Jobs may overlap.<br />
Table 2 Advertised job openings at the ten largest pharma companies<br />
Companyᵃ; Number of employees; Number of advertised openingsᵇ: Monster, Biospace, <strong>Nature</strong>jobs<br />
Johnson & Johnson 119,200 522 8 19<br />
Bayer 106,200 78 27 3<br />
GlaxoSmithKline 103,483 5 1 2<br />
Sanofi-Aventis 99,495 12 1 4<br />
Novartis 98,200 144 94 20<br />
Pfizer 86,600 2 81 88<br />
Roche 78,604 35 31 20<br />
Abbott Laboratories 68,697 67 42 1<br />
AstraZeneca 67,400 71 7 4<br />
Merck & Co. 59,800 0 10 0<br />
ᵃ Data obtained from MedAdNews. ᵇ As searched on Monster.com, Biospace.com and <strong>Nature</strong>jobs.com,<br />
21 July 2010. Jobs may overlap.<br />
Table 3 Selected biotech and pharma downsizings<br />
Company; Number of employees cut; Details<br />
Albany Molecular Research 80 Restructured its US operations, including reducing head<br />
count by about 10% and suspending operations at one of<br />
its research laboratories in Rensselaer, New York.<br />
Cell Therapeutics 36 Reduced head count by 29% to 88 to conserve cash,<br />
with the cuts coming mostly from sales and marketing.<br />
GTC Biotherapeutics 50 Will restructure and reduce head count by 46% to 59 to<br />
save cash.<br />
Helicos BioSciences 40 Reduced head count by 50% to 40 and plans to refocus<br />
its business on molecular diagnostics development.<br />
InterMune 60 Reduced head count by 40% to 85, with the cuts<br />
coming predominantly in the commercial and discovery<br />
research areas.<br />
Lonza Group 193 Reducing head count by 6% to 2,899 at its R&D and<br />
production site in Visp, Switzerland, to save cash.<br />
Myriad Pharmaceuticals 21 Restructured and reduced head count by 13% to about<br />
140 to focus on its cancer pipeline.<br />
Novartis 383 Will restructure Novartis Pharmaceuticals and reduce head<br />
count at the US unit. Thirty-five percent of the cuts will<br />
be achieved by not filling vacant positions. The cuts will<br />
primarily come from “headquarter-based functions,” with<br />
minimal impact on the commercial sales organization.<br />
Pfizer 6,000 Announced plans to restructure its global manufacturing<br />
plant network and reduce manufacturing head count by<br />
18% to 27,000 over the next five years. Plans to close<br />
eight sites in Puerto Rico, Ireland and the US and reduce<br />
operations at another six sites.<br />
Sanofi-Aventis 400 Cuts will primarily come from US sales force, which<br />
previously had 5,700 employees.<br />
Takeda Pharmaceutical ~1,900 Will reduce head count by about 10% to reduce costs in<br />
its fiscal year 2010 ending March 31, 2011.<br />
Source: BioCentury.<br />
people<br />
Biogen Idec (Cambridge, MA, USA) has named George Scangos<br />
as its new CEO as well as a member of the board of<br />
directors, replacing the recently retired Jim Mullen. Scangos<br />
joins Biogen Idec from Exelixis, where he has served as president<br />
and CEO since 1996. Previously, he spent 10 years at Bayer,<br />
leaving as president of Bayer Biotechnology.<br />
“George’s appointment is the culmination of the board’s<br />
comprehensive selection process to identify the best leader to<br />
take Biogen Idec to the next level,” says chairman William D.<br />
Young. “Science is at the heart of our business, and George has<br />
an exceptional scientific background, as well as significant operational expertise and a<br />
strong leadership track record.”<br />
Nile Therapeutics (San Francisco) has appointed<br />
Richard B. Brewer as executive chairman.<br />
Brewer brings over 35 years of operational,<br />
financial and business development expertise<br />
to Nile. He currently serves as chairman of Arca<br />
Biopharma and was previously CEO and president<br />
of Scios, COO of Heartport and senior vice<br />
president of US marketing at Genentech.<br />
BioVex (Woburn, MA, USA) has appointed<br />
Kapil Dhingra to its board of directors.<br />
Dhingra spent nearly ten years at<br />
Hoffmann-La Roche, culminating in his<br />
appointment as vice president and head of<br />
oncology clinical development.<br />
Myriad Genetics (Salt Lake City, UT, USA) has<br />
announced the appointment of Gary A. King<br />
to the newly created position of executive vice<br />
president of international operations. King has<br />
over 25 years of life sciences experience, most<br />
recently as CEO of AverDx. Prior to AverDx,<br />
he was vice president, international operations<br />
at Biosite.<br />
Dean Mitchell has been named president and CEO of Lux Biosciences<br />
(Jersey City, NJ, USA). Mitchell was formerly president and CEO of<br />
Alpharma and Guilford Pharmaceuticals. He is also a nonexecutive board<br />
member of ISTA Pharmaceuticals, Intrexon and Talecris Biotherapeutics.<br />
Diagnostic kit developer Ingen Biosciences<br />
(Chilly-Mazarin, France) has appointed<br />
Karine Mignon-Godefroy as director of<br />
research and development. She joins Ingen<br />
from the blood virus division of Bio-Rad,<br />
where she was director of international projects.<br />
Before Bio-Rad, she held the post of R&D<br />
manager at BMD.<br />
Frank Morich, CEO of NOXXON Pharma<br />
(Berlin), has announced his intention to leave<br />
the company effective August 15 to take up<br />
the position of executive vice president, international<br />
operations of Takeda Pharmaceutical<br />
Company. Iain Buchanan, a director of<br />
NOXXON, will assume the role of interim<br />
CEO and will support the board during its<br />
search for a permanent replacement. Buchanan<br />
has over 30 years of experience in the pharma<br />
and biotech industry, most recently as CEO<br />
of Novexel.<br />
Marine biotechnology company Aquapharm<br />
Biodiscovery (Oban, UK) has named Tim<br />
Morley as CSO. Morley has over 20 years of experience<br />
in the pharmaceutical industry, including<br />
previous positions as research and strategic<br />
project director at Quotient Biodiagnostics,<br />
vice president preclinical sciences at Ardana<br />
Bioscience and senior director molecular and<br />
cellular pharmacology at Vernalis.<br />
Exelixis (S. San Francisco, CA, USA) has<br />
announced the appointment of Michael<br />
Morrissey as president and CEO, succeeding<br />
George Scangos. Morrissey will also become<br />
a member of the board of directors. He joined<br />
Exelixis in 2000 and served as executive vice<br />
president, discovery before his appointment<br />
as president of research and development in<br />
January 2007.<br />
Illumina (San Diego) has announced the<br />
appointment of Nicholas J. Naclerio to the position<br />
of senior vice president, corporate development.<br />
Naclerio formerly served as cofounder<br />
and executive chairman of Quanterix, raising<br />
$15 million in venture financing to launch the<br />
company. In addition, Illumina has named<br />
to its board of directors Gerald Möller, who<br />
currently serves as an advisor at HBM Bio<br />
Ventures, a Swiss investment firm. Previously,<br />
Möller spent 23 years at Boehringer Mannheim<br />
and Roche, where he held a number of leadership<br />
positions including CEO of the worldwide<br />
Boehringer Mannheim Group and head<br />
of global development and strategic marketing,<br />
pharmaceuticals for Roche.<br />
BrainStorm Cell Therapeutics (New York and<br />
Petach Tikva, Israel) has named Liat Sossover<br />
as CFO. Sossover has served in senior financial<br />
positions at a number of publicly traded and<br />
private companies, most recently as vice president,<br />
finance at ForeScout Technologies.<br />
James F. Young has been appointed to the board<br />
of directors of 3-V Biosciences (Menlo Park,<br />
CA, USA). He currently serves on the board of<br />
directors of Novavax. Previously, he served as<br />
head of MedImmune’s R&D organization and<br />
was directly involved in the development of<br />
approximately 20 clinical programs.<br />
Patrick J. Zenner has been elected to the board<br />
of directors of Par Pharmaceutical (Woodcliff<br />
Lake, NJ, USA). Zenner retired in January 2001<br />
from Hoffmann-La Roche, where he had served as<br />
president and CEO since 1993. He currently<br />
serves as chairman of the board of ArQule<br />
and Exact Sciences and as a director of West<br />
Pharmaceutical Services.<br />